llmproxy is an OpenAI-compatible proxy and load balancer for local or
self-hosted LLM backends. It combines the proxy API, dashboard, diagnostics,
and an MCP server endpoint in a layered Nuxt and Nitro workspace.
This repository is structured as a host app plus reusable lower-level apps under
`apps/` (documented in `apps/README.md`).
The short version:
- `llmproxy` is the host product app
- `ai-client` / `ai-proxy` / `ai-server` split backend connectivity, orchestration, and HTTP surface
- `mcp-client` / `mcp-server` split outbound and inbound MCP concerns
- infrastructure apps like `config`, `ajv`, `sse`, and `tool-registry` stay reusable
Cross-app imports are expected to go through app public surfaces or `shared`, not
through another app's `server/services/*`, `server/utils/*`, or frontend
internals such as `app/utils/*`. For reusable browser helpers, apps should
publish a top-level `*-client.ts` surface.
The shared app guide also defines the distinction between `*-capability.ts` and
`*-runtime.ts`, so new layers use the same public-surface pattern consistently.
- OpenAI-compatible forwarding for `POST /v1/chat/completions`
- Aggregated `GET /v1/models`
- Load balancing across multiple backends with `maxConcurrency`
- Queueing when local backends are fully utilized
- Dashboard under `/dashboard` with live status, request inspection, playground, configuration, and diagnostics
- Built-in connectors for `openai`, `llama.cpp`, and `ollama`
- Built-in MCP server endpoint under `POST /mcp`
- Separate `mcp-client` layer for outbound MCP integrations
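The load-balancing and queueing behavior can be pictured with a small sketch. This is illustrative only (the `Backend` shape and `pickBackend` helper are hypothetical, not llmproxy's internals): a backend is eligible while its in-flight count is below `maxConcurrency`, and when none is eligible the request waits in a queue.

```typescript
// Hypothetical sketch of maxConcurrency-based backend selection.
interface Backend {
  baseUrl: string;
  maxConcurrency: number;
  active: number; // requests currently in flight
}

// Prefer the backend with the most free capacity; `undefined` means
// every backend is saturated and the caller should queue the request.
function pickBackend(backends: Backend[]): Backend | undefined {
  const available = backends.filter((b) => b.active < b.maxConcurrency);
  available.sort(
    (a, b) => (b.maxConcurrency - b.active) - (a.maxConcurrency - a.active),
  );
  return available[0];
}

const pool: Backend[] = [
  { baseUrl: "http://127.0.0.1:8080", maxConcurrency: 2, active: 2 },
  { baseUrl: "http://127.0.0.1:8081", maxConcurrency: 4, active: 1 },
];
```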
- `GET /v1/models`
- `POST /v1/chat/completions`
Other OpenAI-style routes such as `POST /v1/completions`,
`POST /v1/responses`, `POST /v1/embeddings`, audio routes, or image routes
are currently not implemented and return `501`.
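The 501 behavior can be pictured as a simple allowlist check. This is a hypothetical sketch, not llmproxy's actual handler code:

```typescript
// Hypothetical sketch: OpenAI-style routes the proxy implements return
// normally; any other OpenAI-style route gets a 501 Not Implemented.
const implemented = new Set(["GET /v1/models", "POST /v1/chat/completions"]);

function statusFor(method: string, path: string): number {
  return implemented.has(`${method} ${path}`) ? 200 : 501;
}
```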
Requirements:
- Node.js `^20.19.0 || >=22.12.0`
- npm
Install dependencies:

```
npm install
```

Start development mode:

```
npm run dev
```

Build for production:

```
npm run build
```

Start the production build locally:

```
npm start
```

After startup:
- Proxy API: `http://localhost:3000/v1/...`
- Dashboard: `http://localhost:3000/dashboard`
- Requests: `http://localhost:3000/dashboard/logs`
- Playground: `http://localhost:3000/dashboard/playground`
- Diagnostics: `http://localhost:3000/dashboard/diagnostics`
- Config: `http://localhost:3000/dashboard/config`
To use a different local host or port, set the standard Nuxt or Nitro
environment variables such as `HOST`, `PORT`, `NUXT_HOST`, `NUXT_PORT`,
`NITRO_HOST`, or `NITRO_PORT` before starting the app.
Every completed or rejected request is also emitted as one NDJSON line on
stdout.
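Because each line is a standalone JSON object, the stream is easy to post-process, for example when piping stdout into your own tooling. A minimal sketch; the field names in the sample line are hypothetical, so inspect your own output for the actual shape:

```typescript
// Split a chunk of NDJSON into one parsed object per non-empty line.
function parseNdjson(chunk: string): Array<Record<string, any>> {
  return chunk
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));
}

// Hypothetical sample of two request log lines.
const sample =
  '{"model":"llama-3","status":200}\n{"model":"llama-3","status":429}\n';
const entries = parseNdjson(sample);
```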
Build the image:

```
docker build -t llmproxy .
```

Run the container:

```
docker run --rm -p 4100:4100 -v llmproxy-data:/data llmproxy
```

Inside the container, `DATA_DIR` defaults to `/data`, so the persisted
AI backend config lives at `DATA_DIR/config/ai-client/config.json`, which
resolves to `/data/config/ai-client/config.json` there. Outbound MCP client
registrations live separately at `DATA_DIR/config/mcp-client/config.json`.
Missing config files are created automatically on first startup.
Run the regular test suite:

```
npm test
```

Run type checking:

```
npm run typecheck
```

Verify the production build:

```
npm run build
```

Run stress and soak tests:

```
npm run test:memory
npm run test:chaos-memory
npm run test:streams-500
```

By default, `DATA_DIR` resolves to `.data` locally. The ai-client app stores
backend and runtime settings under `DATA_DIR/config/ai-client/config.json`,
which resolves to `.data/config/ai-client/config.json`. Set `DATA_DIR` to move
that base directory. If no config file exists yet, it is created automatically
on first start with default values.
Outbound MCP client registrations are stored separately under
`DATA_DIR/config/mcp-client/config.json`. Those entries are owned by the
mcp-client app and can be changed through the llmproxy admin API.
Each app also provides an external `config.schema.json` file in its app root.
The config app aggregates those schemas at `GET /api/config/schema` and uses
them to validate persisted config writes.
Important fields:

- `recentRequestLimit`: number of request entries retained in memory, default `1000`
- `baseUrl`: upstream backend URL
- `connector`: `openai`, `llama.cpp`, or `ollama`
- `maxConcurrency`: concurrent requests allowed per backend
- `models`: optional model allowlist such as `["*"]` or `["llama-*"]`
- `healthPath`: optional backend health endpoint
- `apiKey` or `apiKeyEnv`: optional upstream authentication
- `timeoutMs`: optional per-backend request timeout
- `monitoringTimeoutMs`: optional health-check timeout
- `monitoringIntervalMs`: optional health-check interval
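Putting those fields together, a persisted backend entry could look roughly like this. The surrounding layout, including the `backends` key, is an assumption; `GET /api/config/schema` returns the authoritative shape:

```json
{
  "recentRequestLimit": 1000,
  "backends": [
    {
      "baseUrl": "http://127.0.0.1:8080",
      "connector": "llama.cpp",
      "maxConcurrency": 2,
      "models": ["llama-*"],
      "healthPath": "/health",
      "timeoutMs": 120000
    }
  ]
}
```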
Backend changes become active immediately. The server configuration values
`requestTimeoutMs`, `queueTimeoutMs`, `healthCheckIntervalMs`, and
`recentRequestLimit` also apply immediately.
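The `models` allowlist accepts glob-style entries such as `["*"]` and `["llama-*"]`. One plausible reading of how such patterns match is sketched below; llmproxy's exact matching rules may differ, so treat this as a hypothetical illustration:

```typescript
// Hypothetical allowlist matching: "*" matches everything, a trailing
// "*" matches by prefix, anything else must match exactly.
function modelAllowed(model: string, allowlist: string[]): boolean {
  return allowlist.some((pattern) => {
    if (pattern === "*") return true;
    if (pattern.endsWith("*")) return model.startsWith(pattern.slice(0, -1));
    return model === pattern;
  });
}
```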
The MCP endpoint is configured by the mcp-server app itself through Nuxt
runtime config. Set `private.mcpEnabled` in the mcp-server layer runtime
config, for example via `NUXT_PRIVATE_MCP_ENABLED=false`, if you want to
disable `/mcp`.
- Requests are buffered so routing can use the requested `model`.
- If the client does not request streaming, llmproxy still streams upstream internally and returns a normal JSON response at the end.
- `connector: "openai"` strips non-standard sampler fields such as `top_k`, `min_p`, and `repeat_penalty`.
- `connector: "llama.cpp"` preserves those fields on the OpenAI-compatible route surface.
- `connector: "ollama"` translates requests to native Ollama routes such as `/api/chat` and `/api/tags`.
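The connector-specific handling of sampler fields can be sketched as follows. This is illustrative only (the `prepareBody` helper is hypothetical, and the sketch covers only field stripping, not the route translation the ollama connector performs):

```typescript
// Illustrative sketch: the "openai" connector drops non-standard sampler
// fields before forwarding; other connectors pass the body through.
type Connector = "openai" | "llama.cpp" | "ollama";

const NON_STANDARD_SAMPLERS = ["top_k", "min_p", "repeat_penalty"];

function prepareBody(
  connector: Connector,
  body: Record<string, any>,
): Record<string, any> {
  if (connector !== "openai") return body; // e.g. llama.cpp keeps the fields
  const cleaned = { ...body };
  for (const field of NON_STANDARD_SAMPLERS) delete cleaned[field];
  return cleaned;
}
```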
A short technical overview of routing, connectors, retention, and configuration behavior is available in `apps/llmproxy/docs/architecture.md`.