CLOSE_WAIT socket leak causes persistent CPU spin on macOS (kqueue) even when fully idle #665

@trssantos

Bug Description

After a ClaudeSDKClient session ends (via client.disconnect() with timeout + force-kill fallback), the Python process continues to burn ~24% CPU indefinitely, even when completely idle with no active sessions.

The root cause is a leaked CLOSE_WAIT TCP socket to the Anthropic API that remains registered in the kqueue event loop. Since CLOSE_WAIT sockets are permanently "readable" (EOF pending), kqueue returns them as ready on every poll cycle, causing the asyncio event loop to busy-spin.

How This Differs from #378

Issue #378 describes close() hanging during the call because _deliver_cancellation spins. This issue is about what happens afterwards. Even when disconnect() completes, or times out and the subprocess is force-killed:

  1. A TCP socket to the Anthropic API remains in CLOSE_WAIT state
  2. The socket FD stays registered in kqueue
  3. The asyncio event loop spins polling this permanently-readable FD
  4. CPU stays at ~24% with the process doing absolutely nothing

Reproduction

We run a long-lived FastAPI daemon that uses ClaudeSDKClient for periodic tasks. Between tasks, the daemon should be near 0% CPU.

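# Simplified pattern (runs inside an async function)
import asyncio
import os
import signal

from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient

client = ClaudeSDKClient(ClaudeAgentOptions(max_turns=5))
# ... use client ...

# Disconnect with timeout (workaround from #378)
try:
    await asyncio.wait_for(client.disconnect(), timeout=5.0)
except asyncio.TimeoutError:
    # Force-kill the subprocess.
    # subprocess_pid: PID of the SDK's CLI subprocess (how we obtain it is omitted here)
    os.kill(subprocess_pid, signal.SIGKILL)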

After this sequence, lsof shows the leaked socket:

Python  71226 user   13u   IPv6 ...   TCP [local]:59274->[2600:9000:2134:...]:https (CLOSE_WAIT)

And the macOS sample tool confirms the kqueue spin:

789/889 samples in:
  select_kqueue_control_impl → kevent  (should be blocking, but returns immediately)
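
For anyone who wants to check for the same symptom from inside the process, without lsof or sample, here is a minimal sketch. It compares CPU time against wall-clock time while the event loop is supposed to be idle. The function name idle_cpu_fraction is ours and not part of the SDK. The value is only meaningful if nothing else in the process is doing real work during the interval.

```python
import asyncio
import time


async def idle_cpu_fraction(interval: float = 1.0) -> float:
    """Return process CPU time divided by wall time over `interval` seconds.

    While this coroutine sleeps, a healthy idle loop blocks in the selector
    and the ratio stays near 0. A loop that keeps waking up on a
    permanently-ready FD burns CPU during the sleep and the ratio rises.
    """
    cpu_start = time.process_time()   # user + system CPU time of this process
    wall_start = time.monotonic()
    await asyncio.sleep(interval)
    cpu_used = time.process_time() - cpu_start
    wall_elapsed = time.monotonic() - wall_start
    return cpu_used / wall_elapsed
```

Calling `await idle_cpu_fraction(5.0)` from a periodic task in the daemon and logging the result should be enough to tell a spinning loop from a quiet one.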

Evidence

  • Process state: Daemon fully idle — no active sessions, no scheduled tasks running
  • CPU: 23.7% sustained, for over 1 hour
  • lsof output: CLOSE_WAIT TCP socket (FD 13) to Anthropic API endpoint, never closed
  • sample output: 88.7% of samples are in the kevent call, yet the CPU is not idle. kevent returns immediately because the CLOSE_WAIT socket is permanently readable
  • No orphaned pipes: Subprocess pipes were properly closed (we implemented a workaround for that). The socket is from the SDK's internal HTTP transport, not the subprocess stdio.
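
To make the lsof check repeatable, a small parser along these lines could be run against the output of `lsof -p <pid>`. This is a sketch: close_wait_fds is a name we made up, and it assumes the default lsof column layout (COMMAND, PID, USER, FD, ...) with the TCP state in parentheses as the last field, as in the line shown earlier.

```python
import re


def close_wait_fds(lsof_output: str) -> list[int]:
    """Extract the numeric FDs of sockets that lsof reports as CLOSE_WAIT."""
    fds = []
    for line in lsof_output.splitlines():
        fields = line.split()
        # Need at least COMMAND PID USER FD plus the trailing state field.
        if len(fields) < 5 or fields[-1] != "(CLOSE_WAIT)":
            continue
        # The FD column looks like "13u": digits followed by an access-mode letter.
        match = re.match(r"\d+", fields[3])
        if match:
            fds.append(int(match.group()))
    return fds
```

For the lsof line quoted in the reproduction above, this returns `[13]`.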

Root Cause Analysis

The SDK (or its HTTP transport layer) opens HTTPS connections to the Anthropic API. When the remote server closes the connection (sends FIN):

  1. The local TCP stack ACKs the FIN → socket enters CLOSE_WAIT
  2. The SDK never calls close() on the socket
  3. The socket's FD remains registered in kqueue (via asyncio's event loop)
  4. kqueue reports it as readable every poll cycle (EOF is pending)
  5. asyncio event loop never blocks → CPU spin
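
Steps 1, 3 and 4 can be reproduced without the SDK. The sketch below uses a loopback TCP connection instead of the Anthropic API, and the stdlib selectors module instead of asyncio. selectors picks kqueue on macOS and epoll on Linux, and asyncio's selector event loop is built on it. The function name count_ready_polls is ours.

```python
import selectors
import socket


def count_ready_polls(polls: int = 5) -> int:
    """Count how many of `polls` select() calls report a CLOSE_WAIT socket as ready."""
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.close()  # peer sends FIN; `client` ends up in CLOSE_WAIT

    sel = selectors.DefaultSelector()
    sel.register(client, selectors.EVENT_READ)
    ready = 0
    try:
        for _ in range(polls):
            # We never read from or close `client`, so the pending EOF
            # keeps the FD readable on every poll.
            if sel.select(timeout=1.0):
                ready += 1
    finally:
        sel.unregister(client)
        sel.close()
        client.close()
        listener.close()
    return ready
```

Every poll comes back ready instead of blocking for the timeout. An event loop sitting on such an FD behaves the same way.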

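The asyncio.wait_for() workaround from #378 doesn't help here because it only bounds how long disconnect() may take. It does nothing to close the leaked socket or remove its FD from kqueue.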

Suggested Fix

The SDK's transport layer should:

  1. Track all opened sockets (HTTP connections to the API, not just subprocess pipes)
  2. Close them in transport.close() — ensure close() on the TCP socket is called
  3. Deregister from the event loop — remove the FD from kqueue/epoll before closing
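
A rough sketch of what points 1 to 3 could look like. SocketTracker is a hypothetical helper, not an existing SDK class. It assumes the transport can get hold of the socket objects it opens, which may not be true if an HTTP library owns them. It also assumes a selector-based asyncio loop, since remove_reader/remove_writer are not available on Windows' proactor loop.

```python
import asyncio
import socket


class SocketTracker:
    """Hypothetical helper: remembers sockets so they can all be closed later."""

    def __init__(self) -> None:
        self._socks: set[socket.socket] = set()

    def track(self, sock: socket.socket) -> socket.socket:
        self._socks.add(sock)
        return sock

    def close_all(self, loop: asyncio.AbstractEventLoop) -> int:
        """Deregister each tracked socket from the loop, then close it."""
        closed = 0
        for sock in list(self._socks):
            fd = sock.fileno()
            if fd == -1:
                continue  # already closed elsewhere
            # Remove the FD from kqueue/epoll first so the loop never
            # polls a closed (and possibly reused) descriptor.
            loop.remove_reader(fd)
            loop.remove_writer(fd)
            sock.close()
            closed += 1
        self._socks.clear()
        return closed
```

transport.close() would then call `tracker.close_all(asyncio.get_running_loop())` after shutting down the subprocess.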

Alternatively, a defensive cleanup in Query.close():

async def close(self) -> None:
    self._closed = True
    if self._tg:
        self._tg.cancel_scope.cancel()
        with suppress(anyio.get_cancelled_exc_class()):
            try:
                with anyio.fail_after(5.0):
                    await self._tg.__aexit__(None, None, None)
            except TimeoutError:
                pass
    await self.transport.close()
    # Defensive: close any remaining sockets to prevent kqueue spin
    # (_close_leaked_fds is a proposed helper, not an existing SDK method)
    self._close_leaked_fds()
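
_close_leaked_fds() does not exist yet. One way it could decide which sockets to close is to peek for a pending EOF. The sketch below is an assumption-heavy illustration: peer_has_closed and close_half_closed are names we invented, and it only works on plain socket.socket objects, because ssl.SSLSocket rejects non-zero recv() flags. It relies on MSG_DONTWAIT, which is available on macOS and Linux.

```python
import socket


def peer_has_closed(sock: socket.socket) -> bool:
    """Return True if the peer has sent EOF and no unread data remains."""
    try:
        # Peek one byte without consuming it and without blocking.
        data = sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except (BlockingIOError, InterruptedError):
        return False  # nothing pending: the connection is still open
    except OSError:
        return True   # reset or otherwise unusable
    return data == b""  # empty read means EOF (the CLOSE_WAIT case)


def close_half_closed(socks) -> int:
    """Close every socket in `socks` whose peer has gone away; return the count."""
    closed = 0
    for sock in socks:
        if sock.fileno() != -1 and peer_has_closed(sock):
            sock.close()
            closed += 1
    return closed
```

Any FD that is still registered as a reader would need to be removed from the event loop before close() is called, as in point 3 of the suggested fix.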

Environment

  • claude-agent-sdk: 0.1.45
  • Python: 3.13.5
  • Platform: macOS 15.6.1 (Darwin 24.6.0), ARM64 (Apple Silicon)
  • Event loop: asyncio with kqueue selector
  • Use case: Long-running FastAPI daemon with periodic ClaudeSDKClient sessions
