claude-code/src/utils/udsClient.ts
Dosion 4266149820 fix: keep UDS peer failures structured (#375)
* fix: keep UDS peer failures structured

CodeRabbit and Claude cross-review identified that timeout and raw peer connection failures should share one observable error contract. UDS peer failures now use UdsPeerConnectionError consistently, and connectToPeer hands the socket lifecycle back to the caller after a successful connection instead of retaining an internal timeout or error listener.
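A minimal caller-side sketch of what a single error contract allows, assuming the UdsPeerConnectionError shape in udsClient.ts. The class is re-declared here as a simplified stand-in so the snippet runs on its own; classifyFailure and PeerFailure are hypothetical names that do not exist in the repository.

```typescript
// Simplified stand-in for the class exported by udsClient.ts (assumption:
// only the name, socketPath field, and message format are mirrored here).
class UdsPeerConnectionError extends Error {
  readonly socketPath: string

  constructor(socketPath: string, cause: unknown) {
    const detail = cause instanceof Error ? cause.message : String(cause)
    super(`Failed to connect to peer at ${socketPath}: ${detail}`)
    this.name = 'UdsPeerConnectionError'
    this.socketPath = socketPath
  }
}

// Hypothetical result type for illustration only.
type PeerFailure =
  | { kind: 'peer-unreachable'; socketPath: string; detail: string }
  | { kind: 'other'; detail: string }

// Timeouts and raw socket errors share one class, so one check covers both.
function classifyFailure(error: unknown): PeerFailure {
  if (error instanceof UdsPeerConnectionError) {
    return {
      kind: 'peer-unreachable',
      socketPath: error.socketPath,
      detail: error.message,
    }
  }
  return {
    kind: 'other',
    detail: error instanceof Error ? error.message : String(error),
  }
}
```

With timeouts and connection failures arriving as the same class, a caller needs one instanceof check instead of matching on message strings.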

The tests cover the real socket paths with capability files, timeout behavior, connection failure structure, post-connect listener handoff, AgentSummary rescheduling observations, and platform-specific mailbox directory errno handling.

Constraint: Preserve the 5000ms production timeout default while allowing tests to exercise timeout paths quickly.

Rejected: Suppress CodeRabbit warnings in tests | would hide the real timeout/error contract gap.

Rejected: Keep connectToPeer post-connect error listener | it would silently swallow caller-owned socket errors.

Confidence: high

Scope-risk: narrow

Directive: Keep UDS send/connect timeout and socket-error paths on the same structured peer error contract.

Tested: bun test src/utils/__tests__/udsMessaging.test.ts src/services/AgentSummary/__tests__/agentSummary.test.ts src/utils/__tests__/teammateMailbox.test.ts

Tested: bunx tsc --noEmit --pretty false

Tested: bun run lint

Tested: bun run test:all

Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage

Tested: bun run build

Tested: bun run build:vite

Tested: omx ask claude simplify review artifact .omx/artifacts/claude-review-only-cross-check-for-pr-374-on-branch-codex-codecov-r-2026-04-27T08-17-47-309Z.md

Tested: omx ask claude security review artifact .omx/artifacts/claude-security-review-cross-check-for-pr-374-current-working-tree--2026-04-27T08-26-54-079Z.md

Not-tested: GitHub-hosted CodeRabbit refresh until pushed.

* docs: clarify UDS peer socket ownership

CodeRabbit's #375 pass found that connectToPeer now correctly hands socket errors to the caller, but the JSDoc needed to spell out that contract. The lifecycle test also uses a less brittle post-connect timeout so slow CI does not turn the ownership check into a connection-speed race.

Constraint: The raw socket API intentionally detaches its internal listener after successful connect so caller-owned errors are not swallowed.

Rejected: Keep the test timeout at 50ms | it tests scheduler speed instead of socket lifecycle ownership.

Confidence: high

Scope-risk: narrow

Directive: connectToPeer callers must attach their own error listener immediately after awaiting the socket.

Tested: bun test src/utils/__tests__/udsMessaging.test.ts

Tested: bunx tsc --noEmit --pretty false

Tested: bun run lint

Tested: git diff --check

Tested: bun run test:all

Not-tested: GitHub-hosted CodeRabbit refresh until pushed.

* fix: close peer socket listener handoff window

CodeRabbit and Claude review found that documenting caller-owned raw socket errors still left a Promise handoff window and a stale timeout-listener risk. The peer connection API now requires a caller error handler and installs it before resolving, while cleanup removes internal error and timeout listeners on every path.

Constraint: Keep the fix precise to PR #375 review feedback and avoid warning suppression or fallback behavior.
Rejected: Leave the behavior documented only | still permits an unhandled socket error window between resolve and caller listener attachment.
Rejected: Keep a no-op internal error listener | would silently swallow caller-owned socket errors.
Confidence: high
Scope-risk: narrow
Directive: Do not add raw connectToPeer callers without providing a real onSocketError handler and capability handshake.
Tested: bun test src/utils/__tests__/udsMessaging.test.ts src/services/AgentSummary/__tests__/agentSummary.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun run test:all
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run build
Tested: bun run build:vite
Tested: bun audit
Not-tested: Manual external ACP peer runtime beyond repository tests.

* fix: use a deadline timer for peer connects

The raw socket handoff no longer needs Socket#setTimeout; an ordinary connection deadline keeps the timeout behavior while avoiding an internal socket timeout listener that has no reliable UDS integration path to exercise.
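The deadline-timer shape described above can be sketched roughly as follows. This is an illustration under assumptions, not the repository code: connectWithDeadline is a hypothetical name, and a plain Error stands in for UdsPeerConnectionError so the snippet is self-contained.

```typescript
import { createConnection, type Socket } from 'node:net'

// Hypothetical helper; the shape follows the commit description only.
function connectWithDeadline(
  socketPath: string,
  onSocketError: (error: Error) => void,
  timeoutMs = 5000,
): Promise<Socket> {
  return new Promise<Socket>((resolve, reject) => {
    const conn = createConnection(socketPath)
    let settled = false

    // Ordinary timer: bounds the connect phase without registering a
    // 'timeout' listener on the socket.
    const deadline = setTimeout(
      () => fail(new Error('Connection timed out')),
      timeoutMs,
    )

    function cleanup(): void {
      clearTimeout(deadline)
      conn.off('error', fail)
    }

    function fail(cause: Error): void {
      if (settled) return
      settled = true
      cleanup()
      conn.destroy()
      // The real code rejects with UdsPeerConnectionError; a plain Error
      // stands in here.
      reject(
        new Error(`Failed to connect to peer at ${socketPath}: ${cause.message}`),
      )
    }

    conn.on('error', fail)
    conn.once('connect', () => {
      if (settled) return
      settled = true
      cleanup()
      // Install the caller's handler before resolving so no socket error
      // can fire while nobody is listening.
      conn.on('error', onSocketError)
      resolve(conn)
    })
  })
}
```

After a successful connect, the only remaining 'error' listener is the caller's, and the timer has been cleared, so nothing internal outlives the handoff.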

Constraint: Keep Codecov coverage honest without adding ignore pragmas, mocks, or fallback suppression.

Rejected: c8 ignore on the timeout listener | hides the uncovered branch instead of simplifying the lifecycle.

Rejected: keep Socket#setTimeout listener | leaves a socket listener lifecycle to manage for a connect-only deadline.

Confidence: high

Scope-risk: narrow

Directive: Keep connectToPeer errors caller-owned via onSocketError and reject pre-connect failures with UdsPeerConnectionError.

Tested: bun test src/utils/__tests__/udsMessaging.test.ts src/services/AgentSummary/__tests__/agentSummary.test.ts

Tested: bunx tsc --noEmit --pretty false

Tested: bun run lint

Tested: bun test src/utils/__tests__/udsMessaging.test.ts --coverage --coverage-reporter lcov --coverage-dir coverage-uds

Tested: bun run test:all

Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage

Tested: bun run build

Tested: bun run build:vite

Tested: bun audit

Not-tested: Manual external ACP peer runtime beyond repository tests.

---------

Co-authored-by: unraid <local@unraid.local>
2026-04-27 20:16:09 +08:00


/**
 * UDS Client: connect to peer Claude Code sessions via Unix Domain Sockets.
 *
 * Peers are discovered by reading the PID-file registry in ~/.claude/sessions/
 * (written by concurrentSessions.ts) and checking each entry's
 * `messagingSocketPath` field. A peer is "alive" if its PID is running and
 * its socket accepts a ping/pong round-trip.
 */
import { createConnection, type Socket } from 'net'
import { readdir, readFile } from 'fs/promises'
import { join } from 'path'
import { getClaudeConfigHomeDir } from './envUtils.js'
import { logForDebugging } from './debug.js'
import { errorMessage, isFsInaccessible } from './errors.js'
import { isProcessRunning } from './genericProcessUtils.js'
import { jsonParse, jsonStringify } from './slowOperations.js'
import type { SessionKind } from './concurrentSessions.js'
import { MAX_UDS_FRAME_BYTES, type UdsMessage } from './udsMessaging.js'
import { attachUdsResponseReader, getChunkBytes } from './udsResponseReader.js'

// ---------------------------------------------------------------------------
// Types
// ---------------------------------------------------------------------------

export type PeerSession = {
  pid: number
  sessionId?: string
  cwd?: string
  startedAt?: number
  kind?: SessionKind
  name?: string
  messagingSocketPath?: string
  entrypoint?: string
  bridgeSessionId?: string | null
  alive: boolean
}

export class UdsPeerConnectionError extends Error {
  readonly socketPath: string

  constructor(socketPath: string, cause: unknown) {
    super(
      `Failed to connect to peer at ${socketPath}: ${errorMessage(cause)}`,
      { cause },
    )
    this.name = 'UdsPeerConnectionError'
    this.socketPath = socketPath
  }
}

// ---------------------------------------------------------------------------
// Session directory
// ---------------------------------------------------------------------------

function getSessionsDir(): string {
  return join(getClaudeConfigHomeDir(), 'sessions')
}

// ---------------------------------------------------------------------------
// Discovery
// ---------------------------------------------------------------------------

/**
 * List all sessions in the PID registry whose process is still running.
 * Entries whose PID is no longer running are skipped; removing their stale
 * files is left to concurrentSessions.ts. Sockets are not probed here; use
 * isPeerAlive() to check that a peer actually answers.
 */
export async function listAllLiveSessions(): Promise<PeerSession[]> {
  const dir = getSessionsDir()
  let files: string[]
  try {
    files = await readdir(dir)
  } catch (e) {
    if (!isFsInaccessible(e)) {
      logForDebugging(`[udsClient] readdir failed: ${errorMessage(e)}`)
    }
    return []
  }

  const results: PeerSession[] = []
  for (const file of files) {
    if (!/^\d+\.json$/.test(file)) continue
    const pid = parseInt(file.slice(0, -5), 10)
    if (!isProcessRunning(pid)) {
      // Stale entry: skip (concurrentSessions handles cleanup)
      continue
    }
    try {
      const raw = await readFile(join(dir, file), 'utf8')
      const data = jsonParse(raw) as Record<string, unknown>
      results.push({
        pid,
        sessionId: data.sessionId as string | undefined,
        cwd: data.cwd as string | undefined,
        startedAt: data.startedAt as number | undefined,
        kind: data.kind as SessionKind | undefined,
        name: data.name as string | undefined,
        messagingSocketPath: data.messagingSocketPath as string | undefined,
        entrypoint: data.entrypoint as string | undefined,
        bridgeSessionId: data.bridgeSessionId as string | null | undefined,
        alive: true,
      })
    } catch {
      // Unreadable or corrupted file: skip
    }
  }
  return results
}

/**
 * List peer sessions that have a UDS messaging socket (i.e. can receive
 * messages). Excludes the current process.
 */
export async function listPeers(): Promise<PeerSession[]> {
  const all = await listAllLiveSessions()
  return all.filter(s => s.pid !== process.pid && s.messagingSocketPath != null)
}

async function findAuthTokenForSocketPath(
  socketPath: string,
): Promise<string | undefined> {
  const { readUdsCapabilityToken } = await import('./udsMessaging.js')
  return readUdsCapabilityToken(socketPath)
}

// ---------------------------------------------------------------------------
// Connection helpers
// ---------------------------------------------------------------------------

/**
 * Probe a UDS socket to check if a server is listening (ping/pong).
 * Returns true if the peer responds within the timeout. Returns false when
 * no auth token is available, the socket errors, the response exceeds
 * MAX_UDS_FRAME_BYTES, or the timeout elapses first.
 */
export async function isPeerAlive(
  socketPath: string,
  timeoutMs = 3000,
  authToken?: string,
): Promise<boolean> {
  const token = authToken ?? (await findAuthTokenForSocketPath(socketPath))
  if (!token) return false
  return new Promise<boolean>(resolve => {
    const conn = createConnection(socketPath, () => {
      const ping: UdsMessage = {
        type: 'ping',
        ts: new Date().toISOString(),
        meta: { authToken: token },
      }
      conn.write(jsonStringify(ping) + '\n')
    })
    let resolved = false
    const timer = setTimeout(() => {
      if (!resolved) {
        resolved = true
        conn.destroy()
        resolve(false)
      }
    }, timeoutMs)
    let buffer = ''
    conn.on('data', chunk => {
      // Refuse to buffer more than one maximum-size frame
      if (
        Buffer.byteLength(buffer, 'utf8') + getChunkBytes(chunk) >
        MAX_UDS_FRAME_BYTES
      ) {
        if (!resolved) {
          resolved = true
          clearTimeout(timer)
          conn.destroy()
          resolve(false)
        }
        return
      }
      buffer += chunk.toString()
      // Substring match: any buffered data containing "pong" counts as alive
      if (buffer.includes('"pong"')) {
        if (!resolved) {
          resolved = true
          clearTimeout(timer)
          conn.end()
          resolve(true)
        }
      }
    })
    conn.on('error', () => {
      if (!resolved) {
        resolved = true
        clearTimeout(timer)
        resolve(false)
      }
    })
  })
}

/**
 * Send a text message to a peer's UDS socket. This is the high-level helper
 * used by SendMessageTool for `uds:<path>` addresses.
 *
 * Rejects with a plain Error when no auth token is found for the peer.
 * Timeouts reject with UdsPeerConnectionError, and socket errors are wrapped
 * the same way through formatSocketError. timeoutMs is applied with
 * Socket#setTimeout, so it bounds socket inactivity rather than total send
 * time.
 */
export async function sendToUdsSocket(
  targetSocketPath: string,
  message: string | Record<string, unknown>,
  timeoutMs = 5000,
): Promise<void> {
  // Lazily import to avoid a circular dependency at module-load time
  const { getUdsMessagingSocketPath, parseUdsTarget } = await import(
    './udsMessaging.js'
  )
  const target = parseUdsTarget(targetSocketPath)
  const authToken = await findAuthTokenForSocketPath(target.socketPath)
  if (!authToken) {
    throw new Error(`No auth token found for peer at ${target.socketPath}`)
  }
  const data = typeof message === 'string' ? message : jsonStringify(message)
  const udsMsg: UdsMessage = {
    type: 'text',
    data,
    ts: new Date().toISOString(),
  }
  udsMsg.from = getUdsMessagingSocketPath()
  return new Promise<void>((resolve, reject) => {
    let settled = false
    let conn: ReturnType<typeof createConnection>
    // Settle exactly once: destroy on failure, end gracefully on success
    const finish = (error?: Error): void => {
      if (settled) return
      settled = true
      if (error) {
        conn.destroy(error)
        reject(error)
      } else {
        conn.end()
        resolve()
      }
    }
    conn = createConnection(target.socketPath, () => {
      udsMsg.meta = { ...udsMsg.meta, authToken }
      conn.write(jsonStringify(udsMsg) + '\n', err => {
        if (err) finish(err)
      })
    })
    attachUdsResponseReader(conn, {
      maxFrameBytes: MAX_UDS_FRAME_BYTES,
      onSettled: finish,
      formatSocketError: err =>
        new UdsPeerConnectionError(target.socketPath, err),
    })
    conn.setTimeout(timeoutMs, () => {
      finish(
        new UdsPeerConnectionError(
          target.socketPath,
          new Error('Connection timed out'),
        ),
      )
    })
  })
}

/**
 * Connect to a peer and return the raw socket for bidirectional communication.
 * The caller owns the post-connect lifecycle through onSocketError, which is
 * attached before the Promise resolves so peer socket errors cannot be
 * swallowed or surface through a listener handoff window.
 * Pre-connect failures reject with UdsPeerConnectionError.
 * This only opens the transport; callers still own any capability handshake.
 */
export function connectToPeer(
  socketPath: string,
  onSocketError: (error: Error) => void,
  timeoutMs = 5000,
): Promise<Socket> {
  return new Promise<Socket>((resolve, reject) => {
    const conn = createConnection(socketPath)
    let settled = false
    // Plain deadline timer for the connect phase (not Socket#setTimeout)
    const timeout = setTimeout(
      fail,
      timeoutMs,
      new Error('Connection timed out'),
    )

    function cleanupListeners(): void {
      clearTimeout(timeout)
      conn.off('error', fail)
    }

    function fail(cause: unknown): void {
      if (settled) {
        return
      }
      settled = true
      cleanupListeners()
      conn.destroy()
      reject(new UdsPeerConnectionError(socketPath, cause))
    }

    conn.once('connect', () => {
      if (settled) {
        return
      }
      settled = true
      cleanupListeners()
      // Swap the internal listener for the caller's before resolving
      conn.on('error', onSocketError)
      resolve(conn)
    })
    conn.on('error', fail)
  })
}

/**
 * Disconnect a previously connected peer socket.
 */
export function disconnectPeer(socket: Socket): void {
  if (!socket.destroyed) {
    socket.end()
  }
}
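
The helpers in this file write each message as one JSON document followed by a newline and cap buffered input at MAX_UDS_FRAME_BYTES. A minimal sketch of reading that framing is shown below. It is an assumption-laden illustration: LineFrameDecoder and Frame are hypothetical names, and the actual reader in udsResponseReader.ts is not shown here and may work differently.

```typescript
// Hypothetical message shape; the real UdsMessage type lives in udsMessaging.ts.
type Frame = { type: string; [key: string]: unknown }

function byteLength(text: string): number {
  return new TextEncoder().encode(text).length
}

// Hypothetical decoder for newline-delimited JSON frames with a size cap.
class LineFrameDecoder {
  private pending = ''
  private readonly maxFrameBytes: number

  constructor(maxFrameBytes: number) {
    this.maxFrameBytes = maxFrameBytes
  }

  // Feed one chunk; returns every frame completed by it.
  // Throws RangeError on an oversized frame and SyntaxError on invalid JSON.
  push(chunk: string): Frame[] {
    const lines = (this.pending + chunk).split('\n')
    // The last element is the unfinished frame ('' if the chunk ended on '\n')
    this.pending = lines.pop() ?? ''
    const frames: Frame[] = []
    for (const line of lines) {
      this.assertWithinLimit(line)
      if (line.trim() === '') continue
      frames.push(JSON.parse(line) as Frame)
    }
    this.assertWithinLimit(this.pending)
    return frames
  }

  private assertWithinLimit(text: string): void {
    if (byteLength(text) > this.maxFrameBytes) {
      throw new RangeError(`Frame exceeds ${this.maxFrameBytes} bytes`)
    }
  }
}
```

Parsing whole frames lets a reader check the `type` field of each message, so a text message that merely mentions "pong" in its data is not mistaken for a pong reply.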