mirror of
https://github.com/claude-code-best/claude-code.git
synced 2026-06-17 13:55:50 +00:00
* test: keep Codecov coverage on real agent communication paths PR #369 was merged before the final Codecov coverage fix landed, so this follow-up carries only the incremental real-path tests needed on top of main. The tests exercise AgentSummary lifecycle branches, mailbox fail-closed behavior, UDS client connection failure through a real capability file, and UDS response-reader framing without mock.module, warning suppression, feature fallback, or production-code churn. Constraint: PR #369 is already merged; this branch must contain only the incremental Codecov repair on top of latest main Rejected: Reopen or keep pushing the merged PR branch | merged PR refs do not update and would leave Codecov stale Rejected: Mock bun:bundle or hide warnings | would reintroduce cross-test pollution and pseudo coverage Rejected: Keep unrelated SendMessageTool production diff | it created avoidable patch-coverage debt without improving the runtime path Confidence: high Scope-risk: narrow Directive: Keep these coverage tests on real paths; do not replace them with output suppression or feature-flag mocks Tested: bunx tsc --noEmit --pretty false Tested: bun run lint Tested: bun test src\utils\__tests__\teammateMailbox.test.ts Tested: bun test src\services\AgentSummary\__tests__\agentSummary.test.ts src\services\AgentSummary\__tests__\summaryContext.test.ts src\utils\__tests__\teammateMailbox.test.ts src\utils\__tests__\udsMessaging.test.ts src\utils\__tests__\udsResponseReader.test.ts packages\builtin-tools\src\tools\SendMessageTool\__tests__\udsRecipientSanitization.test.ts Tested: bun run test:all Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage Tested: bun run build Tested: bun run build:vite Tested: bun audit Tested: git diff --check Tested: Claude simplify review GO (.omx/artifacts/claude-simplify-codecov-20260427-1521.md) Tested: Claude security review GO (.omx/artifacts/claude-security-codecov-20260427-1522.md) Not-tested: GitHub-hosted Codecov upload after this amended commit until PR checks rerun * test: keep review assertions tied to real failure paths CodeRabbit flagged three non-blocking but valid review gaps: platform-specific mailbox errno checks, brittle UDS connection-failure message assertions, and missing AgentSummary reschedule proof after fork errors. This keeps the fixes narrow by tightening the affected assertions and adding a structured UDS connection error for tests to assert behavior instead of prose. Constraint: PR #374 is a review follow-up and must not hide warnings, skip tests, or merge the PR. Rejected: Matching the UDS failure message literal | preserves the brittle coupling CodeRabbit flagged. Rejected: Asserting only that mailbox writes throw | would allow unrelated pre-path failures to pass. Confidence: high Scope-risk: narrow Directive: Keep UDS connection-failure tests on structured error data, not display wording. Tested: bun test src/services/AgentSummary/__tests__/agentSummary.test.ts src/utils/__tests__/teammateMailbox.test.ts src/utils/__tests__/udsMessaging.test.ts Tested: bunx tsc --noEmit --pretty false Tested: bun run lint Tested: bun run test:all Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage Tested: bun run build Tested: bun run build:vite Not-tested: GitHub-hosted CodeRabbit refresh until pushed. * test: remove brittle review follow-up assumptions CodeRabbit's second pass found two valid brittleness issues and one suggested callback-reference assertion that would not match production behavior. This keeps the production behavior unchanged: timers still schedule the summarizer closure, tests now assert timer-handle identity, and UDS connection errors use native Error.cause instead of shadowing it. Constraint: Do not manufacture behavior just to satisfy a review hint; assertions must match the real AgentSummary scheduling contract. Rejected: Assert a fresh scheduled callback function | scheduleNext intentionally passes the same runSummary closure each time. Rejected: Store a custom cause field on UdsPeerConnectionError | native Error.cause is available under ESNext/Bun. Confidence: high Scope-risk: narrow Directive: Timer tests should assert returned handle identity for ownership, not incidental numeric values. Tested: bun test src/services/AgentSummary/__tests__/agentSummary.test.ts src/utils/__tests__/udsMessaging.test.ts Tested: bunx tsc --noEmit --pretty false Tested: bun run lint Tested: bun run test:all Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage Tested: bun run build Tested: bun run build:vite Not-tested: GitHub-hosted CodeRabbit refresh until pushed. * test: enforce structured UDS timeout failures CodeRabbit's follow-up surfaced a real consistency gap: UDS send socket errors used UdsPeerConnectionError while response timeouts still rejected a generic Error. Timeouts now use the same structured peer failure contract, and the test exercises that path through a short explicit timeout instead of waiting for the production default. The AgentSummary unchanged-fingerprint test now also asserts that the second unchanged tick does not log errors, preserving the existing behavior checks without changing production scheduling semantics. Constraint: Keep the production timeout default at 5000ms while allowing tests to exercise the timeout path quickly. Rejected: Leave timeout failures as generic Error | callers would need separate handling for the same peer connection failure class. Confidence: high Scope-risk: narrow Directive: Keep UDS send timeout and socket-error branches on the same structured error contract. Tested: bun test src/services/AgentSummary/__tests__/agentSummary.test.ts src/utils/__tests__/udsMessaging.test.ts Tested: bunx tsc --noEmit --pretty false Tested: bun run lint Tested: bun run test:all Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage Tested: bun run build Tested: bun run build:vite Not-tested: GitHub-hosted CodeRabbit refresh until pushed. --------- Co-authored-by: unraid <local@unraid.local>
229 lines
6.7 KiB
TypeScript
229 lines
6.7 KiB
TypeScript
import { beforeEach, describe, expect, test } from 'bun:test'
|
|
import { asAgentId } from '../../../types/ids.js'
|
|
import type { Message } from '../../../types/message.js'
|
|
import type {
|
|
CacheSafeParams,
|
|
ForkedAgentResult,
|
|
} from '../../../utils/forkedAgent.js'
|
|
import {
|
|
type AgentSummaryDependencies,
|
|
startAgentSummarization,
|
|
} from '../agentSummary.js'
|
|
|
|
const transcriptMessages = [
|
|
{ type: 'user', message: { content: 'start' }, uuid: 'u1' },
|
|
{
|
|
type: 'assistant',
|
|
message: { content: [{ type: 'text', text: 'working' }] },
|
|
uuid: 'a1',
|
|
},
|
|
{ type: 'user', message: { content: 'continue' }, uuid: 'u2' },
|
|
] as unknown as Message[]
|
|
|
|
type ForkCall = {
|
|
cacheSafeParams: CacheSafeParams
|
|
}
|
|
|
|
describe('startAgentSummarization', () => {
|
|
let scheduled: (() => void | Promise<void>) | undefined
|
|
let handle: { stop: () => void } | undefined
|
|
let forkCalls: ForkCall[]
|
|
let updateCalls: Array<{ taskId: string; summary: string }>
|
|
let transcriptMessagesForTest: Message[]
|
|
let debugLogs: string[]
|
|
let loggedErrors: Error[]
|
|
let clearedHandles: unknown[]
|
|
let scheduledCount: number
|
|
let lastTimerHandle: unknown
|
|
|
|
function startTestSummarization(
|
|
dependencies: AgentSummaryDependencies = {},
|
|
): { stop: () => void } {
|
|
return startAgentSummarization(
|
|
'task-1',
|
|
asAgentId('a0000000000000000'),
|
|
{
|
|
forkContextMessages: [
|
|
{ type: 'user', message: { content: 'stale' }, uuid: 'old' },
|
|
],
|
|
model: 'claude-test',
|
|
} as unknown as CacheSafeParams,
|
|
() => undefined,
|
|
{
|
|
clearTimeout: ((timeoutId: unknown) => {
|
|
clearedHandles.push(timeoutId)
|
|
}) as typeof clearTimeout,
|
|
getAgentTranscript: async () => ({
|
|
messages: transcriptMessagesForTest,
|
|
contentReplacements: [],
|
|
}),
|
|
isPoorModeActive: () => false,
|
|
logError: error => {
|
|
loggedErrors.push(
|
|
error instanceof Error ? error : new Error(String(error)),
|
|
)
|
|
},
|
|
logForDebugging: message => {
|
|
debugLogs.push(message)
|
|
},
|
|
runForkedAgent: async (args: ForkCall) => {
|
|
forkCalls.push(args)
|
|
return {
|
|
messages: [
|
|
{
|
|
type: 'assistant',
|
|
message: {
|
|
content: [{ type: 'text', text: 'Reading udsClient.ts' }],
|
|
},
|
|
},
|
|
],
|
|
} as unknown as ForkedAgentResult
|
|
},
|
|
setTimeout: ((callback: TimerHandler) => {
|
|
if (typeof callback !== 'function') {
|
|
throw new Error('Expected timer callback')
|
|
}
|
|
scheduledCount += 1
|
|
scheduled = callback as () => void | Promise<void>
|
|
lastTimerHandle = { id: scheduledCount }
|
|
return lastTimerHandle as ReturnType<typeof setTimeout>
|
|
}) as unknown as typeof setTimeout,
|
|
updateAgentSummary: (taskId: string, summary: string) => {
|
|
updateCalls.push({ taskId, summary })
|
|
},
|
|
...dependencies,
|
|
},
|
|
)
|
|
}
|
|
|
|
beforeEach(() => {
|
|
forkCalls = []
|
|
updateCalls = []
|
|
scheduled = undefined
|
|
handle = undefined
|
|
transcriptMessagesForTest = transcriptMessages
|
|
debugLogs = []
|
|
loggedErrors = []
|
|
clearedHandles = []
|
|
scheduledCount = 0
|
|
lastTimerHandle = undefined
|
|
})
|
|
|
|
test('summarizes bounded transcript once and skips unchanged fingerprints', async () => {
|
|
handle = startTestSummarization()
|
|
|
|
expect(typeof scheduled).toBe('function')
|
|
await scheduled!()
|
|
|
|
expect(forkCalls).toHaveLength(1)
|
|
expect(updateCalls).toEqual([
|
|
{ taskId: 'task-1', summary: 'Reading udsClient.ts' },
|
|
])
|
|
|
|
const forkContext = forkCalls[0].cacheSafeParams.forkContextMessages ?? []
|
|
expect(forkContext.map(message => String(message.uuid))).toEqual([
|
|
'u1',
|
|
'a1',
|
|
'u2',
|
|
])
|
|
expect(forkContext.some(message => String(message.uuid) === 'old')).toBe(
|
|
false,
|
|
)
|
|
|
|
await scheduled!()
|
|
|
|
expect(forkCalls).toHaveLength(1)
|
|
expect(updateCalls).toHaveLength(1)
|
|
expect(loggedErrors).toEqual([])
|
|
})
|
|
|
|
test('skips summarization when filtering leaves too little bounded context', async () => {
|
|
transcriptMessagesForTest = [
|
|
{ type: 'user', message: { content: 'start' }, uuid: 'u1' },
|
|
{
|
|
type: 'assistant',
|
|
uuid: 'a1',
|
|
message: {
|
|
content: [{ type: 'tool_use', id: 'missing', name: 'Read' }],
|
|
},
|
|
},
|
|
{ type: 'user', message: { content: 'continue' }, uuid: 'u2' },
|
|
] as unknown as Message[]
|
|
|
|
handle = startTestSummarization()
|
|
|
|
expect(typeof scheduled).toBe('function')
|
|
await scheduled!()
|
|
|
|
expect(forkCalls).toEqual([])
|
|
expect(updateCalls).toEqual([])
|
|
expect(debugLogs).toContain(
|
|
'[AgentSummary] Skipping summary for task-1: no bounded context available',
|
|
)
|
|
})
|
|
|
|
test('skips summarization before building context when transcript is too short', async () => {
|
|
transcriptMessagesForTest = transcriptMessages.slice(0, 2)
|
|
handle = startTestSummarization()
|
|
|
|
expect(typeof scheduled).toBe('function')
|
|
await scheduled!()
|
|
|
|
expect(forkCalls).toEqual([])
|
|
expect(updateCalls).toEqual([])
|
|
expect(debugLogs).toContain(
|
|
'[AgentSummary] Skipping summary for task-1: not enough messages (2)',
|
|
)
|
|
})
|
|
|
|
test('skips and reschedules while poor mode is active', async () => {
|
|
handle = startTestSummarization({
|
|
isPoorModeActive: () => true,
|
|
})
|
|
|
|
expect(typeof scheduled).toBe('function')
|
|
const initialScheduledCount = scheduledCount
|
|
const initialTimerHandle = lastTimerHandle
|
|
await scheduled!()
|
|
|
|
expect(forkCalls).toEqual([])
|
|
expect(updateCalls).toEqual([])
|
|
expect(debugLogs).toContain(
|
|
'[AgentSummary] Skipping summary — poor mode active',
|
|
)
|
|
expect(scheduledCount).toBe(initialScheduledCount + 1)
|
|
expect(lastTimerHandle).not.toBe(initialTimerHandle)
|
|
})
|
|
|
|
test('logs summary errors and schedules the next timer', async () => {
|
|
const error = new Error('fork failed')
|
|
handle = startTestSummarization({
|
|
runForkedAgent: async () => {
|
|
throw error
|
|
},
|
|
})
|
|
|
|
expect(typeof scheduled).toBe('function')
|
|
const initialScheduledCount = scheduledCount
|
|
const initialTimerHandle = lastTimerHandle
|
|
await scheduled!()
|
|
|
|
expect(loggedErrors).toEqual([error])
|
|
expect(updateCalls).toEqual([])
|
|
expect(scheduledCount).toBe(initialScheduledCount + 1)
|
|
expect(lastTimerHandle).not.toBe(initialTimerHandle)
|
|
})
|
|
|
|
test('stop clears the pending summary timer', () => {
|
|
handle = startTestSummarization()
|
|
const pendingHandle = lastTimerHandle
|
|
|
|
handle.stop()
|
|
|
|
expect(debugLogs).toContain(
|
|
'[AgentSummary] Stopping summarization for task-1',
|
|
)
|
|
expect(clearedHandles).toEqual([pendingHandle])
|
|
})
|
|
})
|