mirror of
https://github.com/claude-code-best/claude-code.git
synced 2026-06-17 13:55:50 +00:00
feat: harden autonomy lifecycle, OOM bounds, and provider-boundary finalization
This PR consolidates a coordinated batch of fixes around autonomy run/flow lifecycle, scheduled task deduplication, provider-boundary state finalization, and matching memory-bound treatments for adjacent long-running subsystems (REPL fullscreen scrollback, skill-search/skill-learning runtime activation). All changes were developed and reviewed together because they touched the same lifecycle invariants and were uncovered by the same long-running session reproductions.
## Lifecycle correctness
- Queued autonomy prompts are not injected unless the persisted run was successfully claimed; queued-run claiming is now terminal-safe, so a run that was already consumed, cancelled, or failed cannot slip back into `queued`.
- Autonomy run/flow finalization happens on completion, provider error, generator close, and cancellation — not just the happy path. New `src/__tests__/queryAutonomyProviderBoundary.test.ts` covers these provider-boundary transitions.
- `requestManagedAutonomyFlowCancel` and `resumeManagedAutonomyFlowPrompt` carry `rootDir` and `currentDir` explicitly across detached async boundaries (proactive-tick, cron, daemon restart) instead of inferring from process state.
- Active runs/flows are protected from janitor pruning so a running step cannot be garbage-collected mid-flight (`src/utils/autonomyAuthority.ts`).
- Heartbeat parser now ignores fenced code blocks; the two-phase commit window for autonomy state transitions is documented in `docs/internals/autonomy-jira.md`.
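A minimal sketch of the claim-before-inject and terminal-safety rules listed above, assuming an in-memory `Map` as the store. The names `RunRecord`, `claimQueuedRun`, `finalizeRun`, and `injectPromptIfClaimed` are hypothetical and are not the API of `src/utils/autonomyRuns.ts`, which works against persisted state.

```typescript
// Hypothetical types and helpers for illustration only.
type RunStatus = 'queued' | 'running' | 'completed' | 'failed' | 'cancelled'
type TerminalStatus = 'completed' | 'failed' | 'cancelled'

interface RunRecord {
  runId: string
  status: RunStatus
}

const TERMINAL: ReadonlySet<RunStatus> = new Set<RunStatus>([
  'completed',
  'failed',
  'cancelled',
])

// Claim a run only if it is still queued. A running or terminal run
// is never moved back to `queued`, and is never claimed twice.
function claimQueuedRun(
  store: Map<string, RunRecord>,
  runId: string,
): RunRecord | null {
  const run = store.get(runId)
  if (!run || run.status !== 'queued') return null
  const claimed: RunRecord = { ...run, status: 'running' }
  store.set(runId, claimed)
  return claimed
}

// Finalization is idempotent: the first terminal status wins, whichever
// path (completion, provider error, generator close, cancel) reports it.
function finalizeRun(
  store: Map<string, RunRecord>,
  runId: string,
  status: TerminalStatus,
): RunRecord | null {
  const run = store.get(runId)
  if (!run) return null
  if (TERMINAL.has(run.status)) return run
  const done: RunRecord = { ...run, status }
  store.set(runId, done)
  return done
}

// The prompt is injected only after a successful claim.
function injectPromptIfClaimed(
  store: Map<string, RunRecord>,
  runId: string,
  inject: () => void,
): boolean {
  if (claimQueuedRun(store, runId) === null) return false
  inject()
  return true
}
```

In this sketch a second `injectPromptIfClaimed` call for the same run returns `false` without injecting, and a late `finalizeRun(..., 'completed')` after a cancel leaves the run `cancelled`.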
## Ownership and dedup
- `src/utils/autonomyRuns.ts`: ownership stamping (run id + rootDir carried end-to-end), source-based dedup against active runs.
- `src/hooks/useScheduledTasks.ts`: scheduled ticks deduplicate against runs already active on the same source label.
- `src/utils/processUserInput/processSlashCommand.tsx`: forked slash commands now thread the autonomy `runId` so completion finalizers can find the originating run for deferred completion.
- New `src/utils/autonomyQueueLifecycle.ts` and tests collect the queue-side lifecycle invariants in one place.
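The source-based dedup described above could look roughly like the following. This is a sketch under stated assumptions: the record shape, the helper names `isActive` and `createRunForSource`, and the choice to treat both `queued` and `running` as active are illustrative and not taken from the repository.

```typescript
// Hypothetical record shape; the real one lives in src/utils/autonomyRuns.ts.
type RunStatus = 'queued' | 'running' | 'completed' | 'failed' | 'cancelled'

interface RunRecord {
  runId: string
  sourceId: string
  rootDir: string
  status: RunStatus
}

// Assumption: a run counts as active while it is queued or running.
function isActive(run: RunRecord): boolean {
  return run.status === 'queued' || run.status === 'running'
}

// Create a run for a source unless one is already active for the same
// source and root directory. Returns null when the fire is deduplicated.
function createRunForSource(
  runs: RunRecord[],
  sourceId: string,
  rootDir: string,
): RunRecord | null {
  const duplicate = runs.some(
    run =>
      run.sourceId === sourceId && run.rootDir === rootDir && isActive(run),
  )
  if (duplicate) return null
  const run: RunRecord = {
    runId: `run-${runs.length + 1}`,
    sourceId,
    rootDir, // ownership stamp carried with the run
    status: 'queued',
  }
  runs.push(run)
  return run
}
```

With this shape, repeated scheduled fires for one source collapse to a single active run, and a new run is created only after the previous one reaches a terminal status.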
## Memory bounds (related, same review pass)
- `src/screens/REPL.tsx`: caps fullscreen scrollback after the compact boundary and updates trailing progress rows in place. Long-running fullscreen sessions could otherwise retain thousands of post-compaction messages and duplicate progress rows, keeping Ink trees alive long after their useful context had moved on.
- `src/services/skillSearch/*` and `src/services/skillLearning/*`: runtime activation is strictly opt-in via existing env toggles; session caches are capped so long-running processes cannot grow them without bound. Build presence is preserved so operators can still discover and opt into the slash commands.
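One common way to cap a session cache is a small least-recently-used map. The sketch below is an illustration of that general technique, not the implementation in `src/services/skillSearch/*`; the class name `BoundedCache` and its eviction policy are assumptions.

```typescript
// Hypothetical bounded cache: evicts the least recently used entry
// once the configured capacity is exceeded.
class BoundedCache<K, V> {
  private readonly entries = new Map<K, V>()
  private readonly maxEntries: number

  constructor(maxEntries: number) {
    if (!Number.isInteger(maxEntries) || maxEntries < 1) {
      throw new RangeError('maxEntries must be a positive integer')
    }
    this.maxEntries = maxEntries
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined
    const value = this.entries.get(key) as V
    // Re-insert to mark the key as most recently used
    // (Map preserves insertion order).
    this.entries.delete(key)
    this.entries.set(key, value)
    return value
  }

  set(key: K, value: V): void {
    this.entries.delete(key)
    this.entries.set(key, value)
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K
      this.entries.delete(oldest)
    }
  }

  get size(): number {
    return this.entries.size
  }
}
```

The cache never holds more than `maxEntries` items, so its memory use stays bounded no matter how long the process runs.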
## CI / test contract
- `tests/integration/dependency-overrides.test.ts`: smoke test no longer drives Mermaid's browser renderer; it validates the package-resolution contract directly so CI does not regress on unrelated browser timing.
- New `tests/integration/autonomy-lifecycle-user-flow.test.ts`: end-to-end CLI subprocess flow exercising `status --deep`, `flows`, `flow <id>`, `flow resume`, `flow cancel` against persisted state.
- `src/entrypoints/cli.tsx`: `claude autonomy …` routes through an entrypoint fast path that reuses the slash-command formatter without booting the full interactive CLI. Stdout is flushed before forced exit so coverage subprocesses do not terminate with empty stdout.
- `packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts`: stabilized to prevent audit flake under coverage.
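The flush-before-exit behaviour mentioned for `src/entrypoints/cli.tsx` can be sketched as follows. This is an assumption about the general pattern (exit from the write callback), not a copy of the entrypoint code; `WritableLike` and `writeThenExit` are hypothetical names.

```typescript
// Minimal stream shape needed for the sketch; Node's process.stdout
// offers a compatible write(chunk, callback) overload.
interface WritableLike {
  write(chunk: string, callback: (error?: Error | null) => void): boolean
}

// Exit only after the stream reports that the chunk was handed off,
// so a piped parent process does not see truncated or empty output.
function writeThenExit(
  out: WritableLike,
  text: string,
  exit: (code: number) => void,
  code = 0,
): void {
  out.write(text, () => {
    // Write errors are ignored in this sketch; the process exits either way.
    exit(code)
  })
}
```

In a Node or Bun CLI this would typically be called as `writeThenExit(process.stdout, output, process.exit)`.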
## Tests added
- `src/__tests__/queryAutonomyProviderBoundary.test.ts`
- `src/hooks/__tests__/useScheduledTasks.test.ts`
- `src/utils/__tests__/autonomyAuthority.test.ts`
- `src/utils/__tests__/autonomyFlows.test.ts` (extended)
- `src/utils/__tests__/autonomyPersistence.test.ts` (extended)
- `src/utils/__tests__/autonomyQueueLifecycle.test.ts`
- `src/utils/__tests__/autonomyRuns.test.ts` (extended)
- `src/utils/processUserInput/__tests__/processSlashCommand.test.ts`
- `tests/integration/autonomy-lifecycle-user-flow.test.ts`
## Docs
- `docs/agent/sur-loop-scheduled-oom.md`: System Understanding Report covering the scheduled/loop OOM problem, the call graphs investigated, and the lifecycle invariants this PR establishes.
- `docs/agent/sur-skill-overflow-bugs.md`: SUR for the related skill-overflow context.
- `docs/internals/autonomy-jira.md`: documents the two-phase commit window and ownership stamping invariants.
- `docs/memory-leak-audit.md`: audit notes covering the REPL/scrollback and skill-search bounds.
## Invariants this PR establishes
1. Queued autonomy prompts are not injected unless the persisted run was successfully claimed.
2. Terminal run/flow states are terminal — completion, failure, and cancellation all finalize state regardless of which provider/error path triggered them.
3. Autonomy run/flow `rootDir` is carried explicitly across detached async boundaries instead of inferred from a shared singleton.
4. State-only CLI subcommands (`autonomy status|runs|flows|flow …`) bypass full interactive bootstrap so they do not hold unrelated handles open.
5. REPL fullscreen scrollback and skill-search/skill-learning session caches are explicitly bounded.
## Validation
```bash
bun run typecheck
CI=true GITHUB_ACTIONS=true bun test # 3996 pass / 0 fail across 305 files
bun test src/__tests__/queryAutonomyProviderBoundary.test.ts \
src/hooks/__tests__/useScheduledTasks.test.ts \
src/utils/__tests__/autonomy{Runs,Flows,Authority,QueueLifecycle,Persistence}.test.ts \
src/utils/processUserInput/__tests__/processSlashCommand.test.ts \
tests/integration/autonomy-lifecycle-user-flow.test.ts
```
## Origin
This PR is the consolidated, upstream-targeted version of two fork-side review PRs (fix/loop-scheduled-autonomy-oom and fix/autonomy-lifecycle). The fork-side review history is preserved at https://github.com/amDosion/claude-code-bast/pull/7 . The fork's own internal `chore: keep fork current with upstream` sync commits and the `docs: update contributors` automation are intentionally not included in this PR.
The autonomy CLI handler `rootDir` threading that the fork added (78f64d8a, 98d04ddb) is intentionally omitted here because upstream `a2cfaf91` ("fix: resolve full-suite run failures in the RemoteTriggerTool and autonomy tests") already made the equivalent change, with an additional `currentDir` option. Keeping the upstream version avoids regressing that improvement.
src/utils/processUserInput/__tests__/processSlashCommand.test.ts (new file, 375 lines)
@@ -0,0 +1,375 @@
import { afterEach, beforeEach, describe, expect, mock, test } from 'bun:test'
import type { QueuedCommand } from '../../../types/textInputTypes'
import {
  resetStateForTests,
  setCwdState,
  setOriginalCwd,
  setProjectRoot,
} from '../../../bootstrap/state'
import {
  createAutonomyQueuedPrompt,
  getAutonomyRunById,
  listAutonomyRuns,
  markAutonomyRunRunning,
} from '../../autonomyRuns'
import { resetAutonomyAuthorityForTests } from '../../autonomyAuthority'
import { createScheduledTaskQueuedCommand } from '../../../hooks/useScheduledTasks'
import {
  cleanupTempDir,
  createTempDir,
} from '../../../../tests/mocks/file-system'

let runAgentBlocker: Promise<void> | null = null
let releaseRunAgentBlocker: (() => void) | null = null
let runAgentStartCount = 0
let originalNodeEnv: string | undefined
let originalAnthropicApiKey: string | undefined
const commandQueue: QueuedCommand[] = []

function enqueue(command: QueuedCommand): void {
  commandQueue.push({ ...command, priority: command.priority ?? 'next' })
}

function enqueuePendingNotification(command: QueuedCommand): void {
  commandQueue.push({ ...command, priority: command.priority ?? 'later' })
}

function getCommandQueue(): QueuedCommand[] {
  return [...commandQueue]
}

function hasCommandsInQueue(): boolean {
  return commandQueue.length > 0
}

function resetCommandQueue(): void {
  commandQueue.length = 0
}

function createMessageQueueManagerMock() {
  return {
    enqueue,
    enqueuePendingNotification,
    getCommandQueue,
    hasCommandsInQueue,
    resetCommandQueue,
  }
}

function holdRunAgent(): void {
  runAgentBlocker = new Promise(resolve => {
    releaseRunAgentBlocker = resolve
  })
}

function releaseRunAgent(): void {
  releaseRunAgentBlocker?.()
  runAgentBlocker = null
  releaseRunAgentBlocker = null
}

mock.module('bun:bundle', () => ({
  feature: (name: string) => name === 'KAIROS',
}))

mock.module(
  '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js',
  () => ({
    runAgent: async function* () {
      runAgentStartCount += 1
      if (runAgentBlocker) {
        await runAgentBlocker
      }
      yield {
        type: 'assistant',
        uuid: 'assistant-1',
        timestamp: new Date().toISOString(),
        message: {
          id: 'msg_1',
          type: 'message',
          role: 'assistant',
          model: 'test-model',
          content: [{ type: 'text', text: 'forked command done' }],
          stop_reason: 'end_turn',
          stop_sequence: null,
          usage: {
            input_tokens: 0,
            output_tokens: 0,
          },
        },
      }
    },
  }),
)

mock.module('@claude-code-best/builtin-tools/tools/AgentTool/UI.js', () => ({
  AgentPromptDisplay: () => null,
  AgentResponseDisplay: () => null,
  extractLastToolInfo: () => null,
  renderGroupedAgentToolUse: () => null,
  renderToolResultMessage: () => null,
  renderToolUseErrorMessage: () => null,
  renderToolUseMessage: () => null,
  renderToolUseProgressMessage: () => null,
  renderToolUseRejectedMessage: () => null,
  renderToolUseTag: () => null,
  userFacingName: () => 'Agent',
  userFacingNameBackgroundColor: () => 'gray',
}))

mock.module('../../messageQueueManager', createMessageQueueManagerMock)
mock.module('../../messageQueueManager.js', createMessageQueueManagerMock)

const { processSlashCommand } = await import('../processSlashCommand')

let tempDir = ''

function createScheduledTaskQueuedCommandForTest(task: {
  id: string
  prompt: string
}) {
  return createScheduledTaskQueuedCommand(task, {
    rootDir: tempDir,
    currentDir: tempDir,
  })
}

async function waitForRunStatus(
  runId: string,
  status: 'queued' | 'running' | 'completed' | 'failed' | 'cancelled',
): Promise<void> {
  for (let i = 0; i < 200; i++) {
    const run = await getAutonomyRunById(runId, tempDir)
    if (run?.status === status) {
      return
    }
    await new Promise(resolve => setTimeout(resolve, 10))
  }
  const run = await getAutonomyRunById(runId, tempDir)
  throw new Error(`Expected ${runId} to be ${status}, got ${run?.status}`)
}

async function waitForRunAgentStarts(expected: number): Promise<void> {
  for (let i = 0; i < 200; i++) {
    if (runAgentStartCount >= expected) {
      return
    }
    await new Promise(resolve => setTimeout(resolve, 10))
  }
  throw new Error(
    `Expected runAgent to start ${expected} time(s), got ${runAgentStartCount}`,
  )
}

async function waitForCommandQueueLength(expected: number): Promise<void> {
  for (let i = 0; i < 200; i++) {
    if (getCommandQueue().length === expected) {
      return
    }
    await new Promise(resolve => setTimeout(resolve, 10))
  }
  throw new Error(
    `Expected command queue length ${expected}, got ${getCommandQueue().length}`,
  )
}

beforeEach(async () => {
  tempDir = await createTempDir('process-slash-command-')
  originalNodeEnv = process.env.NODE_ENV
  originalAnthropicApiKey = process.env.ANTHROPIC_API_KEY
  process.env.NODE_ENV = 'test'
  process.env.ANTHROPIC_API_KEY = 'test-key'
  runAgentBlocker = null
  releaseRunAgentBlocker = null
  runAgentStartCount = 0
  resetStateForTests()
  resetAutonomyAuthorityForTests()
  resetCommandQueue()
  setOriginalCwd(tempDir)
  setProjectRoot(tempDir)
  setCwdState(tempDir)
})

afterEach(async () => {
  releaseRunAgent()
  if (originalNodeEnv === undefined) {
    delete process.env.NODE_ENV
  } else {
    process.env.NODE_ENV = originalNodeEnv
  }
  if (originalAnthropicApiKey === undefined) {
    delete process.env.ANTHROPIC_API_KEY
  } else {
    process.env.ANTHROPIC_API_KEY = originalAnthropicApiKey
  }
  resetStateForTests()
  resetAutonomyAuthorityForTests()
  resetCommandQueue()
  if (tempDir) {
    await cleanupTempDir(tempDir)
  }
  mock.restore()
})

describe('processSlashCommand', () => {
  const forkedCommand = {
    type: 'prompt',
    name: 'forked',
    description: 'test forked command',
    progressMessage: 'forking',
    contentLength: 0,
    source: 'builtin',
    context: 'fork',
    getPromptForCommand: async () => [
      { type: 'text', text: 'review from fork' },
    ],
  } as const

  function createContext() {
    return {
      getAppState: () => ({
        kairosEnabled: true,
        mcp: { clients: [] },
        toolPermissionContext: {
          mode: 'default',
          alwaysAllowRules: {},
        },
      }),
      options: {
        commands: [forkedCommand],
        allowBackgroundForkedSlashCommands: true,
        tools: [],
        refreshTools: () => [],
        agentDefinitions: {
          activeAgents: [{ agentType: 'general-purpose' }],
        },
      },
      setResponseLength: mock((_updater: (length: number) => number) => {}),
    } as any
  }

  test('defers autonomy completion until a KAIROS background forked command completes', async () => {
    const queued = await createAutonomyQueuedPrompt({
      basePrompt: '/forked review',
      trigger: 'scheduled-task',
      rootDir: tempDir,
      currentDir: tempDir,
      sourceId: 'cron-1',
    })
    expect(queued).not.toBeNull()
    const runId = queued!.autonomy!.runId
    await markAutonomyRunRunning(runId, tempDir, 100)

    const result = await processSlashCommand(
      '/forked review',
      [],
      [],
      [],
      createContext(),
      mock(() => {}),
      undefined,
      false,
      async () => ({ behavior: 'allow', updatedInput: {} }) as any,
      queued!.autonomy,
    )

    expect(result).toMatchObject({
      messages: [],
      shouldQuery: false,
      deferAutonomyCompletion: true,
    })

    await waitForRunStatus(runId, 'completed')
    await waitForCommandQueueLength(1)
    expect(getCommandQueue()).toEqual([
      expect.objectContaining({
        mode: 'prompt',
        isMeta: true,
        skipSlashCommands: true,
        value: expect.stringContaining(
          '<scheduled-task-result command="/forked">',
        ),
      }),
    ])
  })

  test('keeps repeated /loop scheduled fires bounded while a background fork is running', async () => {
    const task = {
      id: 'cron-loop',
      prompt: '/forked review',
    }
    const first = await createScheduledTaskQueuedCommandForTest(task)
    expect(first?.autonomy?.runId).toBeDefined()
    const runId = first!.autonomy!.runId
    await markAutonomyRunRunning(runId, tempDir, 100)

    holdRunAgent()
    const result = await processSlashCommand(
      '/forked review',
      [],
      [],
      [],
      createContext(),
      mock(() => {}),
      undefined,
      false,
      async () => ({ behavior: 'allow', updatedInput: {} }) as any,
      first!.autonomy,
    )

    expect(result.deferAutonomyCompletion).toBe(true)
    await waitForRunAgentStarts(1)

    const repeatedFires = await Promise.all(
      Array.from({ length: 200 }, () =>
        createScheduledTaskQueuedCommandForTest(task),
      ),
    )
    expect(repeatedFires.every(command => command === null)).toBe(true)
    expect(
      (await listAutonomyRuns(tempDir)).filter(
        run => run.sourceId === 'cron-loop',
      ),
    ).toHaveLength(1)
    expect(getCommandQueue()).toHaveLength(0)

    releaseRunAgent()
    await waitForRunStatus(runId, 'completed')
    await waitForCommandQueueLength(1)
    expect(getCommandQueue()).toHaveLength(1)

    const next = await createScheduledTaskQueuedCommandForTest(task)
    expect(next?.autonomy?.runId).toBeDefined()
    expect(
      (await listAutonomyRuns(tempDir)).filter(
        run => run.sourceId === 'cron-loop',
      ),
    ).toHaveLength(2)
  })

  test('rejects the background fork test override outside test runtime', async () => {
    process.env.NODE_ENV = 'production'

    const result = await processSlashCommand(
      '/forked review',
      [],
      [],
      [],
      createContext(),
      mock(() => {}),
      undefined,
      false,
      async () => ({ behavior: 'allow', updatedInput: {} }) as any,
    )

    expect(result.shouldQuery).toBe(false)
    expect(
      result.messages.some(message =>
        JSON.stringify(message).includes(
          'allowBackgroundForkedSlashCommands is test-only',
        ),
      ),
    ).toBe(true)
    expect(runAgentStartCount).toBe(0)
  })
})
File diff suppressed because it is too large
@@ -28,6 +28,7 @@ import type {
import type { PermissionMode } from '../../types/permissions.js'
import {
  isValidImagePaste,
  type QueuedCommand,
  type PromptInputMode,
} from '../../types/textInputTypes.js'
import {
@@ -80,6 +81,9 @@ export type ProcessUserInputBaseResult = {
  // Used by /discover to chain into the selected feature's command
  nextInput?: string
  submitNextInput?: boolean
  // When true, the command started detached work that will finalize its
  // autonomy run after the background work completes.
  deferAutonomyCompletion?: boolean
}

export async function processUserInput({
@@ -100,6 +104,7 @@ export async function processUserInput({
  bridgeOrigin,
  isMeta,
  skipAttachments,
  autonomy,
}: {
  input: string | Array<ContentBlockParam>
  /**
@@ -137,6 +142,7 @@ export async function processUserInput({
   */
  isMeta?: boolean
  skipAttachments?: boolean
  autonomy?: QueuedCommand['autonomy']
}): Promise<ProcessUserInputBaseResult> {
  const inputString = typeof input === 'string' ? input : null
  // Immediately show the user input prompt while we are still processing the input.
@@ -168,6 +174,7 @@ export async function processUserInput({
    isMeta,
    skipAttachments,
    preExpansionInput,
    autonomy,
  )
  queryCheckpoint('query_process_user_input_base_end')

@@ -296,6 +303,7 @@ async function processUserInputBase(
  isMeta?: boolean,
  skipAttachments?: boolean,
  preExpansionInput?: string,
  autonomy?: QueuedCommand['autonomy'],
): Promise<ProcessUserInputBaseResult> {
  let inputString: string | null = null
  let precedingInputBlocks: ContentBlockParam[] = []
@@ -491,6 +499,7 @@ async function processUserInputBase(
      uuid,
      isAlreadyProcessing,
      canUseTool,
      autonomy,
    )
    return addImageMetadataMessage(slashResult, imageMetadataTexts)
  }
@@ -549,6 +558,7 @@ async function processUserInputBase(
      uuid,
      isAlreadyProcessing,
      canUseTool,
      autonomy,
    )
    return addImageMetadataMessage(slashResult, imageMetadataTexts)
  }