
Bug: MCP tools permanently lost mid-session after single transient listTools() failure — no retry, no reconnect #17099

@HawkingRadiation42

Description

MCP tools from a stdio-transport server intermittently vanish mid-session, typically within 5-10 minutes of normal use. The server process keeps running; it has not crashed. Once the tools disappear, they do not come back until OpenCode is restarted.
The model sees:

```text
Model tried to call unavailable tool 'seadev_run'.
Available tools: bash, read, glob, grep, edit, write, task, webfetch, ...
```

The seadev_* tools are completely absent from the available list: they are not erroring, they are simply gone. Tools from a second MCP server (f5-confluence_*) remain available in the same session.

Root Cause (verified from source)

In packages/opencode/src/mcp/index.ts, the MCP.tools() function (line 609) is called on every LLM step via resolveTools() in prompt.ts:836. It calls client.listTools() for each connected MCP client.
Lines 621-635:

```typescript
const toolsResults = await Promise.all(
  connectedClients.map(async ([clientName, client]) => {
    const toolsResult = await client.listTools().catch((e) => {
      log.error("failed to get tools", { clientName, error: e.message })
      const failedStatus = {
        status: "failed" as const,
        error: e instanceof Error ? e.message : String(e),
      }
      s.status[clientName] = failedStatus
      delete s.clients[clientName]          // ← permanently removes client
      return undefined
    })
    return { clientName, client, toolsResult }
  }),
)
```

Three problems

| # | Problem | Impact |
|---|---------|--------|
| 1 | delete s.clients[clientName] permanently removes the MCP client from the singleton state | The client is gone for the lifetime of the process, so its tools vanish permanently |
| 2 | No retry logic | A single transient failure (timeout, pipe hiccup, GC pause) permanently evicts a healthy server |
| 3 | No reconnection, and no onclose/onerror handlers on MCP clients | Nothing ever recreates a deleted client |

Compare this with create() at startup (line 509), where listTools() is wrapped in withTimeout(). The runtime tools() call has no such wrapper: it relies on the MCP SDK's internal request timeout and, on failure, permanently deletes the client instead of retrying.
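For illustration, a runtime listTools() call could be bounded the same way as the startup path. This is a standalone sketch: the withTimeout below is a hypothetical stand-in written for this example (OpenCode's own withTimeout() is not shown in this issue and may have a different signature), and hangingListTools is a fake that never settles, standing in for a stalled stdio pipe.

```typescript
// Hypothetical timeout wrapper; OpenCode's own withTimeout() may differ.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Fake listTools() that never settles, like a request stuck on a hung pipe.
const hangingListTools = () => new Promise<{ tools: string[] }>(() => {});

withTimeout(hangingListTools(), 50).catch((e: Error) => {
  console.log(e.message); // "timed out after 50 ms"
});
```

Note that a timeout alone only turns a hang into a prompt rejection; with the current catch handler that rejection would still evict the client, so it only helps when combined with a retry or reconnect policy.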

Evidence

| Check | Result |
|-------|--------|
| MCP server process alive? | Yes: ps aux confirms the Python process is still running with the same PID hours after the tools vanished |
| Crash in OpenCode logs? | No: zero error/disconnect entries for the affected MCP server in ~/.local/share/opencode/log/ |
| Other MCP server affected? | No: f5-confluence_* tools remained available in the same session |
| Restart fixes it? | Yes: restarting OpenCode re-creates the client via create() in the Instance.state() initializer |

Code path

```text
prompt.ts:604    → resolveTools() called every LLM loop iteration
prompt.ts:836    → MCP.tools() called
mcp/index.ts:621 → client.listTools() called per-client with .catch()
mcp/index.ts:630 → catch handler: delete s.clients[clientName]  ← permanent eviction
                   (no retry, no reconnect, no onclose handler)
```

The state is a singleton (Instance.state), so the deletion persists for the entire lifetime of the process.
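The failure mode can be reproduced without OpenCode. The sketch below is a simplified, hypothetical model: State stands in for the Instance.state singleton, FakeClient for an MCP client, and tools() copies only the catch-and-delete shape of the excerpt from mcp/index.ts above. A server that fails exactly once loses its tools for every later call, even though it is healthy again.

```typescript
// Simplified, hypothetical model of the eviction bug; not OpenCode's real types.
interface FakeClient {
  listTools(): Promise<{ tools: { name: string }[] }>;
}

interface State {
  clients: Record<string, FakeClient>;
  status: Record<string, { status: "connected" | "failed"; error?: string }>;
}

// Mirrors the catch-and-delete pattern from mcp/index.ts.
async function tools(s: State): Promise<string[]> {
  const results = await Promise.all(
    Object.entries(s.clients).map(async ([clientName, client]) => {
      const result = await client.listTools().catch((e) => {
        s.status[clientName] = { status: "failed", error: e instanceof Error ? e.message : String(e) };
        delete s.clients[clientName]; // permanent: nothing ever re-adds the client
        return undefined;
      });
      return result ? result.tools.map((t) => `${clientName}_${t.name}`) : [];
    }),
  );
  return results.flat();
}

// A server whose second listTools() call fails transiently; every other call succeeds.
function flakyServer(): FakeClient {
  let calls = 0;
  return {
    async listTools() {
      calls++;
      if (calls === 2) throw new Error("transient failure");
      return { tools: [{ name: "run" }] };
    },
  };
}

// Three consecutive LLM steps against a fresh state.
async function demo(): Promise<string[][]> {
  const s: State = { clients: { seadev: flakyServer() }, status: {} };
  const runs: string[][] = [];
  for (let i = 0; i < 3; i++) runs.push(await tools(s));
  return runs;
}

demo().then((runs) => console.log(JSON.stringify(runs))); // [["seadev_run"],[],[]]
```

The third step returns nothing even though the fake server would have answered: the client is no longer in the map, so it is never asked.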

Suggested Fix

Option A (minimal): retry listTools() 2-3 times with a short backoff before evicting:

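```typescript
// retry is a helper still to be written: call the function up to `attempts` times, `delay` ms apart
const toolsResult = await retry(() => client.listTools(), { attempts: 3, delay: 1000 })
  .catch((e) => {
    // Only evict after all retries are exhausted
    log.error("failed to get tools", { clientName, error: e instanceof Error ? e.message : String(e) })
    delete s.clients[clientName]
    return undefined
  })
```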
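Neither the MCP SDK nor OpenCode (as far as this issue shows) provides retry. A minimal sketch of such a helper, assuming the { attempts, delay } option shape used in Option A and doubling the delay after each failed attempt (a simple exponential backoff chosen here for illustration), could look like this:

```typescript
// Hypothetical helper matching the retry(fn, { attempts, delay }) call shape in Option A.
interface RetryOptions {
  attempts: number; // total tries, including the first (expected to be >= 1)
  delay: number; // ms to wait before the first retry; doubled after each further failure
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retry<T>(fn: () => Promise<T>, { attempts, delay }: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (e) {
      lastError = e;
      // Back off before the next try; don't sleep after the final failure.
      if (attempt < attempts) await sleep(delay * 2 ** (attempt - 1));
    }
  }
  // Every attempt failed: surface the last error so the caller can decide whether to evict.
  throw lastError;
}

// Usage: a call that fails twice with a transient error, then succeeds.
let calls = 0;
retry(
  async () => {
    calls++;
    if (calls < 3) throw new Error("transient");
    return "tools listed";
  },
  { attempts: 3, delay: 10 },
).then((result) => console.log(result, "after", calls, "calls")); // tools listed after 3 calls
```

With attempts: 3 and delay: 1000, the worst case adds about 3 seconds (1 s + 2 s) to one LLM step before the server is given up on; a fixed delay would be equally valid if that bound matters more.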

Option B (robust): on failure, attempt to create() a new client from the same MCP config, and mark the server as "reconnecting" rather than "failed".
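One way Option B could look, as a sketch under stated assumptions: the createClient parameter stands in for OpenCode's create(), and the ToolClient, Status, and State types are hypothetical simplifications. On failure the server is marked "reconnecting", a replacement client is built, and the replacement is swapped in only after it has answered listTools().

```typescript
// Hypothetical types; OpenCode's real client, config, and status shapes differ.
interface ToolClient {
  listTools(): Promise<{ tools: { name: string }[] }>;
}

type Status = "connected" | "reconnecting" | "failed";

interface State {
  clients: Record<string, ToolClient>;
  status: Record<string, Status>;
}

// On failure, try to build a replacement client instead of deleting the entry outright.
async function listToolsOrReconnect(
  s: State,
  name: string,
  createClient: (name: string) => Promise<ToolClient>, // hypothetical stand-in for create()
): Promise<string[]> {
  const toNames = (r: { tools: { name: string }[] }) => r.tools.map((t) => `${name}_${t.name}`);
  try {
    return toNames(await s.clients[name].listTools());
  } catch {
    s.status[name] = "reconnecting";
  }
  try {
    const fresh = await createClient(name);
    const result = await fresh.listTools();
    s.clients[name] = fresh; // swap in the new client only after it has answered
    s.status[name] = "connected";
    return toNames(result);
  } catch {
    // Keep the old entry so the next step tries again; only the status changes.
    // (A real implementation would also close any half-created client here.)
    s.status[name] = "failed";
    return [];
  }
}

// Usage: the existing client's pipe is broken, but a freshly created one works.
const broken: ToolClient = { listTools: () => Promise.reject(new Error("pipe closed")) };
const healthy: ToolClient = { listTools: async () => ({ tools: [{ name: "run" }] }) };
const state: State = { clients: { seadev: broken }, status: { seadev: "connected" } };
listToolsOrReconnect(state, "seadev", async () => healthy).then((names) => {
  console.log(names, state.status.seadev); // seadev_run is listed again and the status is "connected"
});
```

Because the entry is never deleted, a failed reconnect is retried on the next LLM step instead of being final.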
Option C (defensive): register client.onclose handlers after creating MCP clients (near line 476) so that a closed connection triggers automatic reconnection:

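```typescript
client.onclose = () => {
  log.warn("MCP client closed, reconnecting", { key })
  // trigger reconnection here, e.g. by re-running create() for this server's config
}
client.onerror = (error) => {
  log.warn("MCP client error", { key, error: error.message })
}
```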

At minimum, the delete s.clients[clientName] on line 630 should be removed or guarded by a retry counter. A single transient listTools() failure should not permanently kill an otherwise healthy MCP connection.
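A retry counter can also span LLM steps rather than a single call. The sketch below uses a hypothetical FailureTracker class (not existing OpenCode code) that reports a client for eviction only after maxConsecutiveFailures failed listTools() calls in a row and resets the streak on any success, so an isolated hiccup costs at most one step's worth of that server's tools.

```typescript
// Hypothetical per-client counter: evict only after N consecutive listTools() failures.
class FailureTracker {
  private readonly failures = new Map<string, number>();
  private readonly maxConsecutiveFailures: number;

  constructor(maxConsecutiveFailures: number) {
    this.maxConsecutiveFailures = maxConsecutiveFailures;
  }

  // Call after a successful listTools(); clears the client's failure streak.
  recordSuccess(clientName: string): void {
    this.failures.delete(clientName);
  }

  // Call after a failed listTools(); returns true once the client should be evicted.
  recordFailure(clientName: string): boolean {
    const count = (this.failures.get(clientName) ?? 0) + 1;
    this.failures.set(clientName, count);
    return count >= this.maxConsecutiveFailures;
  }
}

// Usage: fail, fail, succeed, fail across four LLM steps with a limit of 3.
const tracker = new FailureTracker(3);
const stepOutcomes = [false, false, true, false];
const evictions = stepOutcomes.map((ok) => {
  if (ok) {
    tracker.recordSuccess("seadev");
    return false;
  }
  return tracker.recordFailure("seadev");
});
console.log(evictions); // no step triggers eviction: the success on step 3 resets the streak
```

In the catch handler, delete s.clients[clientName] would then run only when recordFailure() returns true.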

Related Issues

Plugins

opencode-beads

OpenCode version

1.2.24

Steps to reproduce

  1. Configure two MCP servers in opencode.json: one fast/local server, and one that wraps subprocess calls (e.g. a custom FastMCP server that runs SSH commands).
  2. Start OpenCode and verify that both MCP servers connect and all tools are available.
  3. Use the session normally for 5-10 minutes, invoking tools from both servers.
  4. At some point, the tools from one server silently vanish and the model reports "Model tried to call unavailable tool". The other server's tools remain available.
  5. Verify that the MCP server process is still running (ps aux | grep <server-name>).
  6. Restart OpenCode; the tools reappear immediately.

Screenshot and/or share link

No response

Operating System

macOS Tahoe 26.3

Terminal

iTerm2

Metadata


Assignees

Labels

bug: Something isn't working
core: Anything pertaining to core functionality of the application (opencode server stuff)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
