feat: harden autonomy lifecycle, OOM bounds, and provider-boundary finalization

This PR consolidates a coordinated batch of fixes around autonomy run/flow lifecycle, scheduled task deduplication, provider-boundary state finalization, and matching memory-bound treatments for adjacent long-running subsystems (REPL fullscreen scrollback, skill-search/skill-learning runtime activation). All changes were developed and reviewed together because they touched the same lifecycle invariants and were uncovered by the same long-running session reproductions.

## Lifecycle correctness

- Queued autonomy prompts are not injected unless the persisted run was successfully claimed; queued run claiming is now terminal-safe so a once-consumed/cancelled/failed run cannot slip back into `queued`.
- Autonomy run/flow finalization happens on completion, provider error, generator close, and cancellation, not just the happy path. New `src/__tests__/queryAutonomyProviderBoundary.test.ts` covers these provider-boundary transitions.
- `requestManagedAutonomyFlowCancel` and `resumeManagedAutonomyFlowPrompt` carry `rootDir` and `currentDir` explicitly across detached async boundaries (proactive-tick, cron, daemon restart) instead of inferring from process state.
- Active runs/flows are protected from janitor pruning so a running step cannot be garbage-collected mid-flight (`src/utils/autonomyAuthority.ts`).
- Heartbeat parser now ignores fenced code blocks; the two-phase commit window for autonomy state transitions is documented in `docs/internals/autonomy-jira.md`.

## Ownership and dedup

- `src/utils/autonomyRuns.ts`: ownership stamping (run id + rootDir carried end-to-end), source-based dedup against active runs.
- `src/hooks/useScheduledTasks.ts`: scheduled ticks deduplicate against runs already active on the same source label.
- `src/utils/processUserInput/processSlashCommand.tsx`: forked slash commands now thread the autonomy `runId` so completion finalizers can find the originating run for deferred completion.
- New `src/utils/autonomyQueueLifecycle.ts` and tests collect the queue-side lifecycle invariants in one place.

## Memory bounds (related, same review pass)

- `src/screens/REPL.tsx`: caps fullscreen scrollback after the compact boundary and updates trailing progress rows in place. Long-running fullscreen sessions could otherwise retain thousands of post-compaction messages and duplicate progress rows, keeping Ink trees alive long after their useful context had moved on.
- `src/services/skillSearch/*` and `src/services/skillLearning/*`: runtime activation is strictly opt-in via existing env toggles; session caches are capped so long-running processes cannot grow them forever. Build presence is preserved so operators can still discover and opt into the slash commands.

## CI / test contract

- `tests/integration/dependency-overrides.test.ts`: smoke test no longer drives Mermaid's browser renderer; it validates the package-resolution contract directly so CI does not regress on unrelated browser timing.
- New `tests/integration/autonomy-lifecycle-user-flow.test.ts`: end-to-end CLI subprocess flow exercising `status --deep`, `flows`, `flow <id>`, `flow resume`, `flow cancel` against persisted state.
- `src/entrypoints/cli.tsx`: `claude autonomy …` routes through an entrypoint fast path that reuses the slash-command formatter without booting the full interactive CLI. Stdout is flushed before forced exit so coverage subprocesses do not terminate with empty stdout.
- `packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts`: stabilized to prevent audit flake under coverage.

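To illustrate the capped session caches mentioned under "Memory bounds", here is a minimal sketch of a size-bounded cache. It is not the code in `src/services/skillSearch/*` or `src/services/skillLearning/*`; the class name `BoundedCache` and the `maxEntries` parameter are hypothetical, and oldest-first eviction is an assumed policy rather than one stated in this PR.

```typescript
// Hypothetical size-bounded cache (illustrative only, not the PR's implementation).
// Relies on Map iterating keys in insertion order, so the first key is the oldest.
class BoundedCache<K, V> {
  private readonly entries = new Map<K, V>()
  private readonly maxEntries: number

  constructor(maxEntries: number) {
    if (!Number.isInteger(maxEntries) || maxEntries < 1) {
      throw new RangeError('maxEntries must be a positive integer')
    }
    this.maxEntries = maxEntries
  }

  get(key: K): V | undefined {
    return this.entries.get(key)
  }

  set(key: K, value: V): void {
    // Delete first so a re-inserted key moves to the newest position.
    this.entries.delete(key)
    this.entries.set(key, value)
    // Evict oldest entries until the cap holds again.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as K
      this.entries.delete(oldest)
    }
  }

  get size(): number {
    return this.entries.size
  }
}
```

With a cap in place, a long-running process can keep calling `set` indefinitely while the cache never holds more than `maxEntries` items.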
## Tests added

- `src/__tests__/queryAutonomyProviderBoundary.test.ts`
- `src/hooks/__tests__/useScheduledTasks.test.ts`
- `src/utils/__tests__/autonomyAuthority.test.ts`
- `src/utils/__tests__/autonomyFlows.test.ts` (extended)
- `src/utils/__tests__/autonomyPersistence.test.ts` (extended)
- `src/utils/__tests__/autonomyQueueLifecycle.test.ts`
- `src/utils/__tests__/autonomyRuns.test.ts` (extended)
- `src/utils/processUserInput/__tests__/processSlashCommand.test.ts`
- `tests/integration/autonomy-lifecycle-user-flow.test.ts`

## Docs

- `docs/agent/sur-loop-scheduled-oom.md`: System Understanding Report covering the scheduled/loop OOM problem, the call graphs investigated, and the lifecycle invariants this PR establishes.
- `docs/agent/sur-skill-overflow-bugs.md`: SUR for the related skill-overflow context.
- `docs/internals/autonomy-jira.md`: documents the two-phase commit window and ownership stamping invariants.
- `docs/memory-leak-audit.md`: audit notes covering the REPL/scrollback and skill-search bounds.

## Invariants this PR establishes

1. Queued autonomy prompts are not injected unless the persisted run was successfully claimed.
2. Terminal run/flow states are terminal: completion, failure, and cancellation all finalize state regardless of which provider/error path triggered them.
3. Autonomy run/flow `rootDir` is carried explicitly across detached async boundaries instead of inferred from a shared singleton.
4. State-only CLI subcommands (`autonomy status|runs|flows|flow …`) bypass full interactive bootstrap so they do not hold unrelated handles open.
5. REPL fullscreen scrollback and skill-search/skill-learning session caches are explicitly bounded.

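Invariants 1 and 2 can be sketched as a small pure state machine. This is an in-memory illustration under stated assumptions, not the persisted implementation in `src/utils/autonomyRuns.ts`: the names `RunRecord`, `transitionRun`, and `claimQueuedRun` are hypothetical, and the only behavior taken from the PR is that a lifecycle update against a terminal run returns `null` and leaves the record unchanged.

```typescript
type RunStatus = 'queued' | 'running' | 'completed' | 'failed' | 'cancelled'

// Hypothetical in-memory shape; the real persisted record carries more fields.
interface RunRecord {
  runId: string
  status: RunStatus
  startedAt?: number
  endedAt?: number
  error?: string
}

const TERMINAL_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  'completed',
  'failed',
  'cancelled',
])

// Invariant 2: a terminal run is never revived or overwritten.
// Returns the updated record, or null when the update is rejected.
function transitionRun(
  run: RunRecord,
  next: Exclude<RunStatus, 'queued'>,
  nowMs: number,
  error?: string,
): RunRecord | null {
  if (TERMINAL_STATUSES.has(run.status)) return null
  if (next === 'running') {
    return { ...run, status: next, startedAt: nowMs }
  }
  const finished: RunRecord = { ...run, status: next, endedAt: nowMs }
  if (next === 'failed' && error !== undefined) finished.error = error
  return finished
}

// Invariant 1: a prompt may be injected only if this claim returns non-null,
// which happens only for a run that is still queued.
function claimQueuedRun(run: RunRecord, nowMs: number): RunRecord | null {
  return run.status === 'queued' ? transitionRun(run, 'running', nowMs) : null
}
```

A caller would inject the queued prompt only when `claimQueuedRun` returns a record, and would treat a `null` from `transitionRun` as a stale update to drop.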
## Validation

```bash
bun run typecheck
CI=true GITHUB_ACTIONS=true bun test   # 3996 pass / 0 fail across 305 files
bun test src/__tests__/queryAutonomyProviderBoundary.test.ts \
  src/hooks/__tests__/useScheduledTasks.test.ts \
  src/utils/__tests__/autonomy{Runs,Flows,Authority,QueueLifecycle,Persistence}.test.ts \
  src/utils/processUserInput/__tests__/processSlashCommand.test.ts \
  tests/integration/autonomy-lifecycle-user-flow.test.ts
```

## Origin

This PR is the consolidated, upstream-targeted version of two fork-side review PRs (`fix/loop-scheduled-autonomy-oom` and `fix/autonomy-lifecycle`). The fork-side review history is preserved at https://github.com/amDosion/claude-code-bast/pull/7.

The fork's own internal `chore: keep fork current with upstream` sync commits and the `docs: update contributors` automation are intentionally not included in this PR.

The autonomy CLI handler `rootDir` threading that the fork added (78f64d8a, 98d04ddb) is intentionally omitted here because upstream `a2cfaf91` (fix: resolve full-suite run failures in the RemoteTriggerTool and autonomy tests) already performed the equivalent change with an additional `currentDir` option. Keeping the upstream version avoids regressing that improvement.
2026-06-18 14:25:51 +00:00 · 2026-04-29 14:04:27 +08:00
parent 4f1649e249
commit f2e9af4927
51 changed files with 4885 additions and 971 deletions
--- a/src/utils/__tests__/autonomyRuns.test.ts
+++ b/src/utils/__tests__/autonomyRuns.test.ts
@@ -1,6 +1,7 @@
 import { afterEach, beforeEach, describe, expect, test } from 'bun:test'
+import { existsSync, readFileSync } from 'fs'
 import { mkdir, writeFile } from 'fs/promises'
-import { join } from 'path'
+import { join, resolve as resolvePath } from 'path'
 import {
  resetStateForTests,
  setCwdState,
@@ -8,17 +9,23 @@ import {
  setProjectRoot,
 } from '../../bootstrap/state'
 import {
+  createAutonomyRun,
  formatAutonomyRunsList,
  formatAutonomyRunsStatus,
  listAutonomyRuns,
  createAutonomyQueuedPrompt,
+  createAutonomyQueuedPromptIfNoActiveSource,
  createProactiveAutonomyCommands,
  finalizeAutonomyRunCompleted,
+  getAutonomyRunById,
+  hasActiveAutonomyRunForSource,
  markAutonomyRunCompleted,
+  markAutonomyRunCancelled,
  markAutonomyRunFailed,
  markAutonomyRunRunning,
  recoverManagedAutonomyFlowPrompt,
  resolveAutonomyRunsPath,
+  STALE_ACTIVE_RUN_ERROR_PREFIX,
  startManagedAutonomyFlowFromHeartbeatTask,
 } from '../autonomyRuns'
 import {
@@ -95,7 +102,9 @@ describe('autonomyRuns', () => {
      ownerKey: 'main-thread',
      sourceId: 'cron-1',
      sourceLabel: 'nightly-report',
+      ownerProcessId: process.pid,
    })
+    expect(runs[0]?.ownerSessionId).toBeString()
    expect(flows).toHaveLength(0)
    expect(resolveAutonomyRunsPath(tempDir)).toContain('.claude')
  })
@@ -118,7 +127,7 @@ describe('autonomyRuns', () => {
    expect(command!.value).toContain('nested authority')
  })

-  test('markAutonomyRunRunning/completed/failed update persisted lifecycle state for plain runs', async () => {
+  test('markAutonomyRunRunning/completed update persisted lifecycle state for plain runs', async () => {
    const command = await createAutonomyQueuedPrompt({
      basePrompt: '<tick>12:00:00</tick>',
      trigger: 'proactive-tick',
@@ -134,7 +143,9 @@ describe('autonomyRuns', () => {
      runId,
      status: 'running',
      startedAt: 100,
+      ownerProcessId: process.pid,
    })
+    expect(runs[0]?.ownerSessionId).toBeString()

    await markAutonomyRunCompleted(runId, tempDir, 200)
    runs = await listAutonomyRuns(tempDir)
@@ -143,9 +154,22 @@ describe('autonomyRuns', () => {
      status: 'completed',
      endedAt: 200,
    })
+  })

+  test('markAutonomyRunFailed updates a non-terminal run', async () => {
+    const command = await createAutonomyQueuedPrompt({
+      basePrompt: '<tick>12:00:00</tick>',
+      trigger: 'proactive-tick',
+      rootDir: tempDir,
+      currentDir: tempDir,
+    })
+    expect(command).not.toBeNull()
+    const runId = command!.autonomy!.runId
+
+    await markAutonomyRunRunning(runId, tempDir, 100)
    await markAutonomyRunFailed(runId, 'boom', tempDir, 300)
-    runs = await listAutonomyRuns(tempDir)
+    const runs = await listAutonomyRuns(tempDir)
+
    expect(runs[0]).toMatchObject({
      runId,
      status: 'failed',
@@ -154,6 +178,348 @@ describe('autonomyRuns', () => {
    })
  })

+  test('terminal runs are not revived by stale lifecycle updates', async () => {
+    const command = await createAutonomyQueuedPrompt({
+      basePrompt: 'scheduled prompt',
+      trigger: 'scheduled-task',
+      rootDir: tempDir,
+      currentDir: tempDir,
+    })
+    expect(command).not.toBeNull()
+    const runId = command!.autonomy!.runId
+
+    await markAutonomyRunCancelled(runId, tempDir, 100)
+    const revived = await markAutonomyRunRunning(runId, tempDir, 200)
+    const completed = await markAutonomyRunCompleted(runId, tempDir, 300)
+    const failed = await markAutonomyRunFailed(
+      runId,
+      'late failure',
+      tempDir,
+      400,
+    )
+    const persisted = await getAutonomyRunById(runId, tempDir)
+
+    expect(revived).toBeNull()
+    expect(completed).toBeNull()
+    expect(failed).toBeNull()
+    expect(persisted).toMatchObject({
+      status: 'cancelled',
+      endedAt: 100,
+    })
+    expect(persisted!.error).toBeUndefined()
+  })
+
+  test('hasActiveAutonomyRunForSource only treats queued and running scheduled runs as active', async () => {
+    const command = await createAutonomyQueuedPrompt({
+      basePrompt: 'scheduled prompt',
+      trigger: 'scheduled-task',
+      rootDir: tempDir,
+      currentDir: tempDir,
+      sourceId: 'cron-1',
+      sourceLabel: 'nightly',
+    })
+    expect(command).not.toBeNull()
+    const runId = command!.autonomy!.runId
+
+    await expect(
+      hasActiveAutonomyRunForSource({
+        trigger: 'scheduled-task',
+        sourceId: 'cron-1',
+        rootDir: tempDir,
+      }),
+    ).resolves.toBe(true)
+
+    await markAutonomyRunRunning(runId, tempDir, 100)
+    await expect(
+      hasActiveAutonomyRunForSource({
+        trigger: 'scheduled-task',
+        sourceId: 'cron-1',
+        rootDir: tempDir,
+      }),
+    ).resolves.toBe(true)
+
+    await expect(
+      hasActiveAutonomyRunForSource({
+        trigger: 'scheduled-task',
+        sourceId: 'cron-2',
+        rootDir: tempDir,
+      }),
+    ).resolves.toBe(false)
+
+    await markAutonomyRunCompleted(runId, tempDir, 200)
+    await expect(
+      hasActiveAutonomyRunForSource({
+        trigger: 'scheduled-task',
+        sourceId: 'cron-1',
+        rootDir: tempDir,
+      }),
+    ).resolves.toBe(false)
+
+    const failedCommand = await createAutonomyQueuedPrompt({
+      basePrompt: 'scheduled prompt',
+      trigger: 'scheduled-task',
+      rootDir: tempDir,
+      currentDir: tempDir,
+      sourceId: 'cron-1',
+    })
+    expect(failedCommand).not.toBeNull()
+    await markAutonomyRunFailed(
+      failedCommand!.autonomy!.runId,
+      'boom',
+      tempDir,
+      300,
+    )
+    await expect(
+      hasActiveAutonomyRunForSource({
+        trigger: 'scheduled-task',
+        sourceId: 'cron-1',
+        rootDir: tempDir,
+      }),
+    ).resolves.toBe(false)
+  })
+
+  test('createAutonomyQueuedPromptIfNoActiveSource atomically skips duplicate active scheduled sources', async () => {
+    const [first, second] = await Promise.all([
+      createAutonomyQueuedPromptIfNoActiveSource({
+        basePrompt: 'scheduled prompt',
+        trigger: 'scheduled-task',
+        rootDir: tempDir,
+        currentDir: tempDir,
+        sourceId: 'cron-1',
+      }),
+      createAutonomyQueuedPromptIfNoActiveSource({
+        basePrompt: 'scheduled prompt',
+        trigger: 'scheduled-task',
+        rootDir: tempDir,
+        currentDir: tempDir,
+        sourceId: 'cron-1',
+      }),
+    ])
+
+    const created = [first, second].filter(command => command !== null)
+    const runs = await listAutonomyRuns(tempDir)
+
+    expect(created).toHaveLength(1)
+    expect(runs).toHaveLength(1)
+    expect(runs[0]).toMatchObject({
+      trigger: 'scheduled-task',
+      status: 'queued',
+      sourceId: 'cron-1',
+    })
+  })
+
+  test('createAutonomyQueuedPromptIfNoActiveSource scopes dedup by ownerKey', async () => {
+    const first = await createAutonomyQueuedPromptIfNoActiveSource({
+      basePrompt: 'scheduled prompt',
+      trigger: 'scheduled-task',
+      rootDir: tempDir,
+      currentDir: tempDir,
+      sourceId: 'cron-1',
+      ownerKey: 'owner-a',
+    })
+    const second = await createAutonomyQueuedPromptIfNoActiveSource({
+      basePrompt: 'scheduled prompt',
+      trigger: 'scheduled-task',
+      rootDir: tempDir,
+      currentDir: tempDir,
+      sourceId: 'cron-1',
+      ownerKey: 'owner-b',
+    })
+
+    const runs = await listAutonomyRuns(tempDir)
+
+    expect(first).not.toBeNull()
+    expect(second).not.toBeNull()
+    expect(runs).toHaveLength(2)
+    expect(new Set(runs.map(run => run.ownerKey))).toEqual(
+      new Set(['owner-a', 'owner-b']),
+    )
+  })
+
+  test('createAutonomyQueuedPromptIfNoActiveSource does not advance heartbeat last-run state on dedup skip (two-phase commit invariant)', async () => {
+    await writeTempFile(
+      tempDir,
+      HEARTBEAT_REL,
+      [
+        'tasks:',
+        '  - name: inbox',
+        '    interval: 30m',
+        '    prompt: "Check inbox"',
+      ].join('\n'),
+    )
+
+    // Seed an active queued run for cron-1 so the next dedup attempt skips.
+    await mkdir(join(tempDir, AUTONOMY_DIR), { recursive: true })
+    await writeFile(
+      resolveAutonomyRunsPath(tempDir),
+      `${JSON.stringify(
+        {
+          runs: [
+            {
+              runId: 'preexisting-active',
+              runtime: 'automatic',
+              trigger: 'scheduled-task',
+              status: 'queued',
+              rootDir: tempDir,
+              currentDir: tempDir,
+              sourceId: 'cron-1',
+              promptPreview: 'still queued',
+              createdAt: 100,
+              ownerProcessId: process.pid,
+              ownerSessionId: 'self',
+            },
+          ],
+        },
+        null,
+        2,
+      )}\n`,
+      'utf-8',
+    )
+
+    const skipped = await createAutonomyQueuedPromptIfNoActiveSource({
+      basePrompt: 'scheduled prompt',
+      trigger: 'scheduled-task',
+      rootDir: tempDir,
+      currentDir: tempDir,
+      sourceId: 'cron-1',
+    })
+    expect(skipped).toBeNull()
+
+    // If the dedup skip wrongly advanced heartbeat state, the next
+    // proactive-tick prompt would NOT include the inbox task. Verify it
+    // still does.
+    const followUp = await createAutonomyQueuedPrompt({
+      basePrompt: '<tick>12:00:00</tick>',
+      trigger: 'proactive-tick',
+      rootDir: tempDir,
+      currentDir: tempDir,
+    })
+    expect(followUp).not.toBeNull()
+    expect(followUp!.value).toContain('Due HEARTBEAT.md tasks:')
+    expect(followUp!.value).toContain('- inbox (30m): Check inbox')
+  })
+
+  test('createAutonomyQueuedPromptIfNoActiveSource recovers stale active runs from dead owner processes', async () => {
+    await mkdir(join(tempDir, AUTONOMY_DIR), { recursive: true })
+    await writeFile(
+      resolveAutonomyRunsPath(tempDir),
+      `${JSON.stringify(
+        {
+          runs: [
+            {
+              runId: 'stale-run',
+              runtime: 'automatic',
+              trigger: 'scheduled-task',
+              status: 'running',
+              rootDir: tempDir,
+              currentDir: tempDir,
+              sourceId: 'cron-1',
+              sourceLabel: 'nightly',
+              promptPreview: 'stale scheduled prompt',
+              createdAt: 100,
+              startedAt: 100,
+              ownerProcessId: 2_147_483_647,
+              ownerSessionId: 'dead-session',
+            },
+          ],
+        },
+        null,
+        2,
+      )}\n`,
+      'utf-8',
+    )
+
+    await expect(
+      hasActiveAutonomyRunForSource({
+        trigger: 'scheduled-task',
+        sourceId: 'cron-1',
+        rootDir: tempDir,
+      }),
+    ).resolves.toBe(false)
+
+    const command = await createAutonomyQueuedPromptIfNoActiveSource({
+      basePrompt: 'scheduled prompt',
+      trigger: 'scheduled-task',
+      rootDir: tempDir,
+      currentDir: tempDir,
+      sourceId: 'cron-1',
+    })
+    const runs = await listAutonomyRuns(tempDir)
+
+    expect(command).not.toBeNull()
+    expect(runs).toHaveLength(2)
+    expect(runs[0]).toMatchObject({
+      trigger: 'scheduled-task',
+      status: 'queued',
+      sourceId: 'cron-1',
+      ownerProcessId: process.pid,
+    })
+    expect(runs[1]).toMatchObject({
+      runId: 'stale-run',
+      status: 'failed',
+      endedAt: runs[0]?.createdAt,
+      error: expect.stringContaining('owner process 2147483647'),
+    })
+  })
+
+  test('stale managed-flow run recovery also marks the flow step failed', async () => {
+    const command = await startManagedAutonomyFlowFromHeartbeatTask({
+      task: {
+        name: 'weekly-report',
+        interval: '7d',
+        prompt: 'Ship the weekly report',
+        steps: [
+          {
+            name: 'gather',
+            prompt: 'Gather weekly inputs',
+          },
+        ],
+      },
+      rootDir: tempDir,
+      currentDir: tempDir,
+    })
+    expect(command).not.toBeNull()
+    const runId = command!.autonomy!.runId
+    await markAutonomyRunRunning(runId, tempDir, 100)
+
+    const runsPath = resolveAutonomyRunsPath(tempDir)
+    const file = JSON.parse(readFileSync(runsPath, 'utf-8')) as {
+      runs: Array<Record<string, unknown>>
+    }
+    file.runs = file.runs.map(run =>
+      run.runId === runId
+        ? { ...run, ownerProcessId: 2_147_483_647 }
+        : run,
+    )
+    await writeFile(runsPath, `${JSON.stringify(file, null, 2)}\n`, 'utf-8')
+
+    const replacement = await createAutonomyQueuedPromptIfNoActiveSource({
+      basePrompt: 'replacement prompt',
+      trigger: 'managed-flow-step',
+      rootDir: tempDir,
+      currentDir: tempDir,
+      sourceId: command!.autonomy!.sourceId!,
+      ownerKey: 'main-thread',
+    })
+    const [flow] = await listAutonomyFlows(tempDir)
+    const runs = await listAutonomyRuns(tempDir)
+
+    expect(replacement).not.toBeNull()
+    expect(runs.find(run => run.runId === runId)).toMatchObject({
+      status: 'failed',
+      error: expect.stringContaining(STALE_ACTIVE_RUN_ERROR_PREFIX),
+    })
+    expect(flow).toMatchObject({
+      status: 'failed',
+      blockedRunId: runId,
+    })
+    expect(flow?.stateJson?.steps[0]).toMatchObject({
+      status: 'failed',
+      runId,
+      error: expect.stringContaining(STALE_ACTIVE_RUN_ERROR_PREFIX),
+    })
+  })
+
  test('formatters produce readable status and run listings', async () => {
    const first = await createAutonomyQueuedPrompt({
      basePrompt: 'scheduled prompt',
@@ -223,6 +589,53 @@ describe('autonomyRuns', () => {
    )
  })

+  test('persistence pruning keeps active runs ahead of recent completed history', async () => {
+    const runs = [
+      {
+        runId: 'old-active',
+        runtime: 'automatic',
+        trigger: 'scheduled-task',
+        status: 'queued',
+        rootDir: tempDir,
+        currentDir: tempDir,
+        ownerKey: 'main-thread',
+        promptPreview: 'old active',
+        createdAt: 1,
+      },
+      ...Array.from({ length: 200 }, (_, index) => ({
+        runId: `history-${index}`,
+        runtime: 'automatic',
+        trigger: 'scheduled-task',
+        status: 'completed',
+        rootDir: tempDir,
+        currentDir: tempDir,
+        ownerKey: 'main-thread',
+        promptPreview: `history ${index}`,
+        createdAt: 1_000 + index,
+        endedAt: 2_000 + index,
+      })),
+    ]
+    await mkdir(join(tempDir, AUTONOMY_DIR), { recursive: true })
+    await writeFile(
+      resolveAutonomyRunsPath(tempDir),
+      `${JSON.stringify({ runs }, null, 2)}\n`,
+      'utf-8',
+    )
+
+    await createAutonomyRun({
+      trigger: 'scheduled-task',
+      prompt: 'fresh active',
+      rootDir: tempDir,
+      currentDir: tempDir,
+      nowMs: 9_999,
+    })
+
+    const persisted = await listAutonomyRuns(tempDir)
+    expect(persisted).toHaveLength(200)
+    expect(persisted.some(run => run.runId === 'old-active')).toBe(true)
+    expect(persisted.some(run => run.runId === 'history-0')).toBe(false)
+  })
+
  test('listAutonomyRuns keeps older persisted records by normalizing missing runtime and owner metadata', async () => {
    const runsPath = resolveAutonomyRunsPath(tempDir)
    await mkdir(join(tempDir, '.claude', 'autonomy'), { recursive: true })
@@ -418,4 +831,27 @@ describe('autonomyRuns', () => {
    expect(recovered!.autonomy?.runId).toBe(command!.autonomy?.runId)
    expect(recovered!.autonomy?.flowId).toBe(flow!.flowId)
  })
+
+  test('STALE_ACTIVE_RUN_ERROR_PREFIX stays in sync with HEARTBEAT.md stale-recovery-health task', () => {
+    // The HEARTBEAT.md stale-recovery-health task prompt embeds this prefix
+    // as a literal string. Changing the constant without updating the
+    // heartbeat prompt would silently break the monitor — this test fails
+    // first to force the simultaneous update.
+    const heartbeatPath = resolvePath(
+      import.meta.dir,
+      '..',
+      '..',
+      '..',
+      '.claude',
+      'autonomy',
+      'HEARTBEAT.md',
+    )
+    if (!existsSync(heartbeatPath)) {
+      // .claude/ may be absent in some checkout layouts (e.g., shallow clone
+      // for npm pack). Skip rather than fail in that case.
+      return
+    }
+    const content = readFileSync(heartbeatPath, 'utf8')
+    expect(content).toContain(STALE_ACTIVE_RUN_ERROR_PREFIX)
+  })
 })
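
The dedup and stale-owner tests in this diff suggest the following shape, shown here as an in-memory sketch. It is not the code in `src/utils/autonomyRuns.ts`: `enqueueIfNoActiveSource`, `StoredRun`, and the `STALE_PREFIX` value are hypothetical (the real value of `STALE_ACTIVE_RUN_ERROR_PREFIX` is not shown in the diff), and the liveness check is injected as a predicate. In Node or Bun that predicate could wrap `process.kill(pid, 0)`, which tests for process existence without delivering a signal.

```typescript
type Status = 'queued' | 'running' | 'completed' | 'failed' | 'cancelled'

// Hypothetical record shape, reduced to the fields the sketch needs.
interface StoredRun {
  runId: string
  status: Status
  trigger: string
  sourceId: string
  ownerKey: string
  ownerProcessId: number
  createdAt: number
  endedAt?: number
  error?: string
}

// Assumed placeholder; the real prefix constant is not visible in this diff.
const STALE_PREFIX = 'stale active run'

const isActive = (run: StoredRun): boolean =>
  run.status === 'queued' || run.status === 'running'

// Step 1: fail active runs whose owner process is gone.
// Step 2: skip creation if an active run remains for the same
//         trigger + sourceId + ownerKey; otherwise prepend the new run.
function enqueueIfNoActiveSource(
  runs: StoredRun[],
  candidate: Omit<StoredRun, 'status' | 'createdAt' | 'endedAt' | 'error'>,
  isAlive: (pid: number) => boolean,
  nowMs: number,
): { runs: StoredRun[]; created: StoredRun | null } {
  const recovered: StoredRun[] = runs.map((run): StoredRun => {
    if (!isActive(run) || isAlive(run.ownerProcessId)) return run
    return {
      ...run,
      status: 'failed',
      endedAt: nowMs,
      error: `${STALE_PREFIX}: owner process ${run.ownerProcessId} is gone`,
    }
  })

  const duplicate = recovered.some(
    run =>
      isActive(run) &&
      run.trigger === candidate.trigger &&
      run.sourceId === candidate.sourceId &&
      run.ownerKey === candidate.ownerKey,
  )
  if (duplicate) return { runs: recovered, created: null }

  const created: StoredRun = { ...candidate, status: 'queued', createdAt: nowMs }
  return { runs: [created, ...recovered], created }
}
```

Doing recovery and the duplicate check in one call mirrors what the tests expect: a dead owner no longer blocks its source, while a live owner with the same `ownerKey` still does. In the real module this would also need to run under whatever lock guards the persisted runs file, which the sketch leaves out.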