Root causes:
1. ThemeProvider was imported but never used in App.tsx and showSetupDialog
2. setThemeConfigCallbacks was never called to inject persistence callbacks
3. The preview/save/cancel theme lifecycle had no provider to coordinate it
Changes:
- Export setThemeConfigCallbacks from @anthropic/ink
- Wrap App.tsx children with ThemeProvider (initialState from config, onThemeSave persists)
- Wrap showSetupDialog with ThemeProvider for onboarding/trust dialogs
- Call setThemeConfigCallbacks in init.ts to register load/save callbacks
- Update SnapshotUpdateDialog test to account for new ThemeProvider wrapper
Fixes #theme-switching
* feat: harden autonomy lifecycle, OOM bounds, and provider-boundary finalization
This PR consolidates a coordinated batch of fixes around autonomy run/flow lifecycle, scheduled task deduplication, provider-boundary state finalization, and matching memory-bound treatments for adjacent long-running subsystems (REPL fullscreen scrollback, skill-search/skill-learning runtime activation). All changes were developed and reviewed together because they touched the same lifecycle invariants and were uncovered by the same long-running session reproductions.
## Lifecycle correctness
- Queued autonomy prompts are not injected unless the persisted run was successfully claimed; queued run claiming is now terminal-safe, so a run that was already consumed, cancelled, or failed cannot slip back into `queued`.
- Autonomy run/flow finalization happens on completion, provider error, generator close, and cancellation — not just the happy path. New `src/__tests__/queryAutonomyProviderBoundary.test.ts` covers these provider-boundary transitions.
- `requestManagedAutonomyFlowCancel` and `resumeManagedAutonomyFlowPrompt` carry `rootDir` and `currentDir` explicitly across detached async boundaries (proactive-tick, cron, daemon restart) instead of inferring from process state.
- Active runs/flows are protected from janitor pruning so a running step cannot be garbage-collected mid-flight (`src/utils/autonomyAuthority.ts`).
- Heartbeat parser now ignores fenced code blocks; the two-phase commit window for autonomy state transitions is documented in `docs/internals/autonomy-jira.md`.
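The claim rule in the first bullet can be pictured with a minimal sketch. The `AutonomyRun` shape, the status names, and both function names below are illustrative assumptions, not the actual types in `src/utils/autonomyRuns.ts`.

```typescript
// Hypothetical status set; the real run record likely carries more fields.
type RunStatus = 'queued' | 'running' | 'completed' | 'failed' | 'cancelled';

interface AutonomyRun {
  runId: string;
  status: RunStatus;
}

// Claiming succeeds only from `queued`. A run in any other state is left
// untouched, so a terminal run can never be moved back into the pipeline.
function claimQueuedRun(run: AutonomyRun): AutonomyRun | null {
  if (run.status !== 'queued') return null;
  return { ...run, status: 'running' };
}

// The prompt is handed to the model only when the claim succeeded.
function promptToInject(run: AutonomyRun, prompt: string): string | null {
  return claimQueuedRun(run) !== null ? prompt : null;
}
```

A caller that receives `null` simply skips the turn instead of injecting a prompt for a run it does not own.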
## Ownership and dedup
- `src/utils/autonomyRuns.ts`: ownership stamping (run id + rootDir carried end-to-end), source-based dedup against active runs.
- `src/hooks/useScheduledTasks.ts`: scheduled ticks deduplicate against runs already active on the same source label.
- `src/utils/processUserInput/processSlashCommand.tsx`: forked slash commands now thread the autonomy `runId` so completion finalizers can find the originating run for deferred completion.
- New `src/utils/autonomyQueueLifecycle.ts` and tests collect the queue-side lifecycle invariants in one place.
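As a rough illustration of source-based dedup, the sketch below refuses to enqueue a run while another run from the same source is still queued or running. `RunRecord`, `sourceId`, and `enqueueIfNoActiveSource` are hypothetical names chosen for the example; the real helpers may differ in shape and persistence.

```typescript
// Hypothetical record shape for illustration only.
interface RunRecord {
  runId: string;
  sourceId: string;
  status: 'queued' | 'running' | 'completed' | 'failed' | 'cancelled';
}

// "Active" here means queued or running.
const isActive = (r: RunRecord): boolean =>
  r.status === 'queued' || r.status === 'running';

// Enqueue a new run only if no active run already exists for the source.
// Returns the created record, or null when the tick was deduplicated.
function enqueueIfNoActiveSource(
  runs: RunRecord[],
  sourceId: string,
  runId: string,
): RunRecord | null {
  if (runs.some((r) => r.sourceId === sourceId && isActive(r))) return null;
  const created: RunRecord = { runId, sourceId, status: 'queued' };
  runs.push(created);
  return created;
}
```

Under this scheme a scheduled tick that fires while the previous run is still in flight is dropped rather than stacked.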
## Memory bounds (related, same review pass)
- `src/screens/REPL.tsx`: caps fullscreen scrollback after the compact boundary and updates trailing progress rows in place. Long-running fullscreen sessions could otherwise retain thousands of post-compaction messages and duplicate progress rows, keeping Ink trees alive long after their useful context had moved on.
- `src/services/skillSearch/*` and `src/services/skillLearning/*`: runtime activation is strictly opt-in via existing env toggles; session caches are capped so long-running processes cannot grow them forever. Build presence is preserved so operators can still discover and opt into the slash commands.
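One simple way to cap a session cache is an insertion-ordered map that evicts its oldest entry once a limit is exceeded. The `BoundedCache` class below is an assumed illustration of that idea, not the cache the skill services actually use.

```typescript
// A small capped cache: entries beyond `maxEntries` evict the oldest key.
class BoundedCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    return this.entries.get(key);
  }

  set(key: K, value: V): void {
    // Delete first so a re-inserted key moves to the newest position.
    this.entries.delete(key);
    this.entries.set(key, value);
    // Map iterates in insertion order, so the first key is the oldest.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With a cap in place, memory use is bounded by `maxEntries` regardless of how long the session runs.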
## CI / test contract
- `tests/integration/dependency-overrides.test.ts`: smoke test no longer drives Mermaid's browser renderer; it validates the package-resolution contract directly so CI does not regress on unrelated browser timing.
- New `tests/integration/autonomy-lifecycle-user-flow.test.ts`: end-to-end CLI subprocess flow exercising `status --deep`, `flows`, `flow <id>`, `flow resume`, `flow cancel` against persisted state.
- `src/entrypoints/cli.tsx`: `claude autonomy …` routes through an entrypoint fast path that reuses the slash-command formatter without booting the full interactive CLI. Stdout is flushed before forced exit so coverage subprocesses do not terminate with empty stdout.
- `packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts`: stabilized to prevent audit flake under coverage.
## Tests added
- `src/__tests__/queryAutonomyProviderBoundary.test.ts`
- `src/hooks/__tests__/useScheduledTasks.test.ts`
- `src/utils/__tests__/autonomyAuthority.test.ts`
- `src/utils/__tests__/autonomyFlows.test.ts` (extended)
- `src/utils/__tests__/autonomyPersistence.test.ts` (extended)
- `src/utils/__tests__/autonomyQueueLifecycle.test.ts`
- `src/utils/__tests__/autonomyRuns.test.ts` (extended)
- `src/utils/processUserInput/__tests__/processSlashCommand.test.ts`
- `tests/integration/autonomy-lifecycle-user-flow.test.ts`
## Docs
- `docs/agent/sur-loop-scheduled-oom.md`: System Understanding Report covering the scheduled/loop OOM problem, the call graphs investigated, and the lifecycle invariants this PR establishes.
- `docs/agent/sur-skill-overflow-bugs.md`: SUR for the related skill-overflow context.
- `docs/internals/autonomy-jira.md`: documents the two-phase commit window and ownership stamping invariants.
- `docs/memory-leak-audit.md`: audit notes covering the REPL/scrollback and skill-search bounds.
## Invariants this PR establishes
1. Queued autonomy prompts are not injected unless the persisted run was successfully claimed.
2. Terminal run/flow states are terminal — completion, failure, and cancellation all finalize state regardless of which provider/error path triggered them.
3. Autonomy run/flow `rootDir` is carried explicitly across detached async boundaries instead of inferred from a shared singleton.
4. State-only CLI subcommands (`autonomy status|runs|flows|flow …`) bypass full interactive bootstrap so they do not hold unrelated handles open.
5. REPL fullscreen scrollback and skill-search/skill-learning session caches are explicitly bounded.
## Validation
```bash
bun run typecheck
CI=true GITHUB_ACTIONS=true bun test # 3996 pass / 0 fail across 305 files
bun test src/__tests__/queryAutonomyProviderBoundary.test.ts \
src/hooks/__tests__/useScheduledTasks.test.ts \
src/utils/__tests__/autonomy{Runs,Flows,Authority,QueueLifecycle,Persistence}.test.ts \
src/utils/processUserInput/__tests__/processSlashCommand.test.ts \
tests/integration/autonomy-lifecycle-user-flow.test.ts
```
## Origin
This PR is the consolidated, upstream-targeted version of two fork-side review PRs (fix/loop-scheduled-autonomy-oom and fix/autonomy-lifecycle). The fork-side review history is preserved at https://github.com/amDosion/claude-code-bast/pull/7 . The fork's own internal `chore: keep fork current with upstream` sync commits and the `docs: update contributors` automation are intentionally not included in this PR.
The autonomy CLI handler `rootDir` threading that the fork added (78f64d8a, 98d04ddb) is intentionally omitted here because upstream `a2cfaf91` (fix: 修复 RemoteTriggerTool 和 autonomy 测试的全量运行失败) already performed the equivalent change with an additional `currentDir` option. Keeping the upstream version avoids regressing that improvement.
* fixup: address CodeRabbit review on PR #386
Twelve actionable items (7 Major + 5 Minor) from the CodeRabbit review on
claude-code-best/claude-code#386:
- docs/internals/autonomy-jira.md: typo "due input close" → "due to input close".
- src/utils/autonomyRuns.ts:
- selectPersistedAutonomyRuns no longer evicts active (queued/running) runs
when the combined list exceeds AUTONOMY_RUNS_MAX. Active runs are kept in
full and the inactive history is capped to the remaining budget so
persisted ownership for live work survives.
- isValidOwnerProcessId now allows pid <= 4_194_304 so a live run owned by
the maximum Linux PID is not treated as stale.
- src/utils/autonomyAuthority.ts: maskCodeFencedLines tracks the active fence
length and only closes the fence when a same-character run of equal-or-
greater length appears with no trailing content, so a nested ```yaml inside
an outer ```` block no longer leaks fake `tasks:` entries into the parser.
- src/cli/print.ts: late-shutdown branches in the cron and scheduled-task
paths now call cancelQueuedAutonomyCommands({ commands: [command] }) instead
of markAutonomyRunCancelled(...). Updating run state alone left the
queue-side record orphaned for resume/recovery.
- src/utils/processUserInput/processSlashCommand.tsx: scheduled-task-result
notification is enqueued before finalizeAutonomyRunCompleted (which queues
follow-up autonomy commands), so both land in order at priority: 'later' and
the next autonomy step cannot run before the worker's output is observed.
- src/screens/REPL.tsx + src/utils/handlePromptSubmit.ts:
- onQuery now returns Promise<boolean>: false from the concurrent-guard
skip path, true otherwise. Other call sites use `void onQuery(...)` and
are unaffected. handlePromptSubmit's onQuery prop type matches.
- The autonomy-prompt callsite captures the executed flag, finalizes
claim.claimedCommands as { type: 'completed' } only when onQuery actually
ran, and runs the completed-finalize in its own try/catch so a failure
there does not propagate into the outer catch and trigger a second
finalize as { type: 'failed' } for the same commands.
- Removed the unsafe `command.value as string` cast; createUserMessage
already accepts `string | ContentBlockParam[]`.
- createUserMessage mock in src/__tests__/handlePromptSubmit.test.ts now
matches the new Promise<boolean> shape.
- packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/
RemoteTriggerTool.test.ts:
- Inline auth mock replaced with the shared tests/mocks/auth (added).
- The full mock of src/constants/oauth.js is replaced by a narrow
side-effect-only mock that overrides the env-reading helpers
(getOauthConfig, fileSuffixForOauthConfig, MCP_CLIENT_METADATA_URL) and
delegates pure data exports to the real module.
- tests/integration/dependency-overrides.test.ts:
- mermaid does not export `./package.json` in its exports map, so
require.resolve('mermaid/package.json') throws
ERR_PACKAGE_PATH_NOT_EXPORTED in runtimes that honor exports semantics.
The test now resolves the package entry and walks up to the package
root via a small findPackageJson helper.
- readFileSync from node:fs is replaced with `await Bun.file(...).text()`
to match the project's Bun-API requirement.
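The fence-tracking rule described for maskCodeFencedLines can be sketched as follows. This is a simplified reading of the description above, not the code in src/utils/autonomyAuthority.ts: the function name, the choice to blank masked lines, and the indentation allowance are all assumptions.

```typescript
// Blank out every line that belongs to a fenced code block so a later
// line-based parser never sees it. A fence closes only on a run of the
// same character, at least as long as the opener, with nothing after it.
function maskFencedLines(text: string): string {
  const fenceRe = /^ {0,3}(`{3,}|~{3,})(.*)$/;
  let open: { char: string; length: number } | null = null;
  const out: string[] = [];

  for (const line of text.split('\n')) {
    const m = fenceRe.exec(line);
    const run = m?.[1] ?? '';
    const rest = m?.[2] ?? '';

    if (open === null) {
      if (run !== '') {
        // Opening fence: remember its character and length.
        open = { char: run.charAt(0), length: run.length };
        out.push('');
      } else {
        out.push(line);
      }
      continue;
    }

    // Inside a fence every line is masked, including the closing fence.
    out.push('');
    const closes =
      run !== '' &&
      run.charAt(0) === open.char &&
      run.length >= open.length &&
      rest.trim() === '';
    if (closes) open = null;
  }
  return out.join('\n');
}
```

Because a shorter inner fence never satisfies the length check, a three-backtick block nested inside a four-backtick block stays masked until the outer fence closes.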
Validation:
- bun run typecheck (clean).
- bun test → 3996 pass / 0 fail across 305 test files.
Target PRs:
- amDosion/claude-code-bast#8 (fork-internal review)
- claude-code-best/claude-code#386 (upstream review, same head branch)
* fixup: address CodeRabbit second-round review on PR #386
Four inline + one outside-diff actionable comment from the second CodeRabbit
review on claude-code-best/claude-code#386:
- tests/mocks/auth.ts: align mock return contracts with src/utils/auth.ts.
checkAndRefreshOAuthTokenIfNeeded resolves to a Promise<boolean> and
getClaudeAIOAuthTokens returns the full token shape (refreshToken, expiresAt,
scopes, subscriptionType, rateLimitTier) so tests that branch on these
values cannot silently drift away from production.
- src/utils/handlePromptSubmit.ts (461-468): clear the freshly-published
abortController before the early return when every claimed autonomy command
was skipped as non-consumable, so this turn's stale controller does not leak
into the next turn.
- src/utils/handlePromptSubmit.ts (621-649): separate execution failure from
finalizer failure. The turn body now writes to a `turnError` slot; a single
pass after the inner try decides whether to finalize claimed commands as
`completed` or `failed`, with each finalize call wrapped in its own
try/catch so a failure inside finalize does not flip a successful turn into
`failed` and double-finalize the same commands. The outer catch only
rethrows the original turn error.
- src/utils/processUserInput/processSlashCommand.tsx (228-276): wrap the
post-success `finalizeDeferredAutonomyRunCompleted()` call in its own
try/catch so a finalize failure no longer falls into the worker-failure
catch path and emits a contradictory `<scheduled-task-result status="failed">`
for a slash command that actually succeeded.
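The `turnError` pattern from the handlePromptSubmit item can be illustrated with a small sketch. It is synchronous for brevity, and `runTurn`, `Outcome`, and the `report` callback are hypothetical names; the production code is presumably asynchronous and finalizes real claimed commands.

```typescript
type Outcome = { type: 'completed' } | { type: 'failed'; error: unknown };

// Run the turn body, then finalize exactly once. A failure inside the
// finalizer is reported but never changes the outcome of the turn itself.
function runTurn(
  body: () => void,
  finalize: (outcome: Outcome) => void,
  report: (finalizeError: unknown) => void = () => {},
): void {
  let failed = false;
  let turnError: unknown;

  try {
    body();
  } catch (error) {
    // Record the failure instead of finalizing from inside the catch.
    failed = true;
    turnError = error;
  }

  // Single finalize pass, isolated in its own try/catch.
  try {
    finalize(failed ? { type: 'failed', error: turnError } : { type: 'completed' });
  } catch (finalizeError) {
    report(finalizeError);
  }

  // Only the original turn error is rethrown.
  if (failed) throw turnError;
}
```

The key property is that `finalize` runs once per turn, so a throwing finalizer cannot cause a second call with the opposite outcome.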
Outside scope (not changed) — the CodeRabbit suggestion to add a `.ts`
extension to the shared `tests/mocks/auth` import contradicts the project's
existing convention: every other test imports the shared mocks without the
extension (e.g. `tests/mocks/log`, `tests/mocks/debug`,
`tests/mocks/file-system`), and the project's tsconfig does not enable
`allowImportingTsExtensions`, so adding the extension fails typecheck. The
import is kept extension-less to match the rest of the suite.
Validation:
- bun run typecheck (clean).
- bun test → 3996 pass / 0 fail across 305 test files.
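* docs: add bash language tag to the code blocks in sur-skill-overflow-bugs
Applies the remaining nit from the PR #386 review. The pid_max bound, the REPL
cast, and the autonomy-jira typo are identical to the remote fixup (452a7e6)
and were deduplicated during rebase, so this commit contains only the code
fence language tag.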
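* fixup: address 4 items from the PR #386 review not yet covered
- src/cli/print.ts: cron onFire now uses createAutonomyQueuedPromptIfNoActiveSource
with the prompt text as the sourceId, so the same scheduled prompt is not
enqueued again and stacked while the previous run is still active; also
removes 4 dead imports that nothing references any more
(commitAutonomyQueuedPrompt / prepareAutonomyTurnPrompt /
markAutonomyRunCancelled / createAutonomyQueuedPrompt)
- src/services/compact/postCompactCleanup.ts: add a comment at the
void import().then() call clarifying that sweepFileContentCache is
intentionally fire-and-forget; the function keeps a synchronous signature
by design, not by oversight
- src/utils/autonomyFlows.ts: add a doc comment on the two-phase sort in
selectPersistedAutonomyFlows (select the top N by active+updatedAt, then
re-sort everything by updatedAt)
- tests/integration/autonomy-lifecycle-user-flow.test.ts: when the stderr
assertion fails, include the actual stderr content in the message so CI
failures are easier to locate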
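* refactor: simplify, reuse, defend; clean up PR #386 audit findings
Simplify (S1, S2):
- src/cli/print.ts: extract a local dispatchHeadlessCronCommand helper that
centralizes the "dedup-claim → input-close-recheck → onSuccess" pipeline
shared by the three cron entry points (onFire / onFireTask agent /
onFireTask non-agent), so the three branches cannot drift in how they handle
inputClosed occurring between claim and dispatch. enqueueAndRun is extracted
as well so the two non-agent branches share one onSuccess callback.
Removes about 55 lines of duplicated boilerplate.
- src/utils/autonomyPersistence.ts: add a generic retainActiveFirst<T>
helper: active records are kept unconditionally (they are not subject to
the cap) and inactive records fill the remaining budget in descending
timestamp order. This unifies the two-phase sort semantics of
selectPersistedAutonomyRuns / Flows.
- src/utils/autonomyRuns.ts, autonomyFlows.ts: switch to retainActiveFirst
and delete the duplicated inline two-phase sort logic.
Reuse (R1, review #8):
- tests/mocks/file-system.ts: add readTempFile / tempPathExists, two
Bun.file wrappers that give tests Bun-only equivalents of Node
fs.readFileSync / existsSync.
- src/utils/__tests__/autonomyRuns.test.ts: replace all Node fs/path imports
(existsSync, readFileSync, mkdir, writeFile, path.join/resolve) with the
shared helpers from tests/mocks/file-system plus node:path (with the node:
prefix). The 6 mkdir + writeFile boilerplate sites are gone; everything now
uses writeTempFile (which does its own mkdir -p).
Resolves the Bun-only runtime contract violation from review #8 (Major).
Defend (D1, early OOM signal):
- src/services/compact/postCompactCleanup.ts: append .catch(logError) to the
void import().then() chain. attributionHooks is currently a stub, but once
the real implementation is restored and sweepFileContentCache throws, this
.catch keeps it from becoming an unhandled rejection (the function returns
void, so callers cannot observe the async failure).
- src/utils/autonomyRuns.ts: add a soft cap of 100 on active runs plus a
one-time warning. selectPersistedAutonomyRuns still never evicts active
records, but it calls logError once when the threshold is crossed, as an
early finalize-leak signal, so unbounded growth of active runs cannot
silently defeat AUTONOMY_RUNS_MAX.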
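A minimal sketch of the retainActiveFirst<T> idea, assuming the caller supplies an activity predicate and a timestamp accessor. The parameter list and the final re-sort by timestamp are assumptions made for illustration; the actual helper in src/utils/autonomyPersistence.ts may be shaped differently.

```typescript
// Keep every active record, then fill whatever budget remains with the
// most recent inactive records. The result is ordered newest first.
function retainActiveFirst<T>(
  records: T[],
  max: number,
  isActive: (record: T) => boolean,
  timestamp: (record: T) => number,
): T[] {
  const active = records.filter(isActive);
  const inactive = records
    .filter((r) => !isActive(r))
    .sort((a, b) => timestamp(b) - timestamp(a));

  // Active records may exceed `max`; the inactive budget then drops to zero.
  const budget = Math.max(0, max - active.length);
  const kept = [...active, ...inactive.slice(0, budget)];

  // Second phase: one uniform ordering for the retained set.
  return kept.sort((a, b) => timestamp(b) - timestamp(a));
}
```

Note that the result can be longer than `max` when there are more active records than the cap allows, which is why the soft cap warning above is useful as a leak signal.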
---------
Co-authored-by: unraid <local@unraid.local>
Co-authored-by: Claude <noreply@anthropic.com>
* fix: keep UDS peer failures structured
CodeRabbit and Claude cross-review identified that timeout and raw peer connection failures should share one observable error contract. UDS peer failures now use UdsPeerConnectionError consistently, and connectToPeer hands the socket lifecycle back to the caller after a successful connection instead of retaining an internal timeout or error listener.
The tests cover the real socket paths with capability files, timeout behavior, connection failure structure, post-connect listener handoff, AgentSummary rescheduling observations, and platform-specific mailbox directory errno handling.
Constraint: Preserve the 5000ms production timeout default while allowing tests to exercise timeout paths quickly.
Rejected: Suppress CodeRabbit warnings in tests | would hide the real timeout/error contract gap.
Rejected: Keep connectToPeer post-connect error listener | it would silently swallow caller-owned socket errors.
Confidence: high
Scope-risk: narrow
Directive: Keep UDS send/connect timeout and socket-error paths on the same structured peer error contract.
Tested: bun test src/utils/__tests__/udsMessaging.test.ts src/services/AgentSummary/__tests__/agentSummary.test.ts src/utils/__tests__/teammateMailbox.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun run test:all
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run build
Tested: bun run build:vite
Tested: omx ask claude simplify review artifact .omx/artifacts/claude-review-only-cross-check-for-pr-374-on-branch-codex-codecov-r-2026-04-27T08-17-47-309Z.md
Tested: omx ask claude security review artifact .omx/artifacts/claude-security-review-cross-check-for-pr-374-current-working-tree--2026-04-27T08-26-54-079Z.md
Not-tested: GitHub-hosted CodeRabbit refresh until pushed.
* docs: clarify UDS peer socket ownership
CodeRabbit's #375 pass found that connectToPeer now correctly hands socket errors to the caller, but the JSDoc needed to spell out that contract. The lifecycle test also uses a less brittle post-connect timeout so slow CI does not turn the ownership check into a connection-speed race.
Constraint: The raw socket API intentionally detaches its internal listener after successful connect so caller-owned errors are not swallowed.
Rejected: Keep the test timeout at 50ms | it tests scheduler speed instead of socket lifecycle ownership.
Confidence: high
Scope-risk: narrow
Directive: connectToPeer callers must attach their own error listener immediately after awaiting the socket.
Tested: bun test src/utils/__tests__/udsMessaging.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: git diff --check
Tested: bun run test:all
Not-tested: GitHub-hosted CodeRabbit refresh until pushed.
* fix: close peer socket listener handoff window
CodeRabbit and Claude review found that documenting caller-owned raw socket errors still left a Promise handoff window and a stale timeout-listener risk. The peer connection API now requires a caller error handler and installs it before resolving, while cleanup removes internal error and timeout listeners on every path.
Constraint: Keep the fix precise to PR #375 review feedback and avoid warning suppression or fallback behavior.
Rejected: Leave the behavior documented only | still permits an unhandled socket error window between resolve and caller listener attachment.
Rejected: Keep a no-op internal error listener | would silently swallow caller-owned socket errors.
Confidence: high
Scope-risk: narrow
Directive: Do not add raw connectToPeer callers without providing a real onSocketError handler and capability handshake.
Tested: bun test src/utils/__tests__/udsMessaging.test.ts src/services/AgentSummary/__tests__/agentSummary.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun run test:all
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run build
Tested: bun run build:vite
Tested: bun audit
Not-tested: Manual external ACP peer runtime beyond repository tests.
* fix: use a deadline timer for peer connects
The raw socket handoff no longer needs Socket#setTimeout; an ordinary connection deadline keeps the timeout behavior while avoiding an internal socket timeout listener that has no reliable UDS integration path to exercise.
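The combined effect of the deadline timer and the caller-owned error handler might look roughly like the sketch below. `SocketLike` is a deliberately small structural stand-in rather than the Node `net.Socket` type, and `PeerConnectionError` and `connectWithDeadline` are hypothetical names; the real `connectToPeer` and `UdsPeerConnectionError` may differ.

```typescript
type Listener = (arg?: unknown) => void;

// Minimal surface of a socket that this sketch relies on (assumed).
interface SocketLike {
  on(event: string, listener: Listener): void;
  off(event: string, listener: Listener): void;
  destroy(): void;
}

// Hypothetical structured error shared by timeout and socket failures.
class PeerConnectionError extends Error {
  readonly reason: 'timeout' | 'socket-error';
  constructor(reason: 'timeout' | 'socket-error', message: string) {
    super(message);
    this.name = 'PeerConnectionError';
    this.reason = reason;
  }
}

function connectWithDeadline(
  socket: SocketLike,
  timeoutMs: number,
  onSocketError: Listener,
): Promise<SocketLike> {
  return new Promise((resolve, reject) => {
    // Every exit path removes the internal listeners and the timer.
    const cleanup = (): void => {
      clearTimeout(timer);
      socket.off('connect', onConnect);
      socket.off('error', onPreConnectError);
    };
    const onConnect: Listener = () => {
      cleanup();
      // Install the caller's handler before resolving, so there is no
      // window in which a socket error has no listener.
      socket.on('error', onSocketError);
      resolve(socket);
    };
    const onPreConnectError: Listener = (err) => {
      cleanup();
      reject(new PeerConnectionError('socket-error', String(err)));
    };
    // An ordinary timer acts as the connect deadline.
    const timer = setTimeout(() => {
      cleanup();
      socket.destroy();
      reject(new PeerConnectionError('timeout', `no connection within ${timeoutMs}ms`));
    }, timeoutMs);

    socket.on('connect', onConnect);
    socket.on('error', onPreConnectError);
  });
}
```

Using a plain timer keeps the deadline independent of any socket-level timeout listener, so after a successful connect the only listener left on the socket is the one the caller supplied.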
Constraint: Keep Codecov coverage honest without adding ignore pragmas, mocks, or fallback suppression.
Rejected: c8 ignore on the timeout listener | hides the uncovered branch instead of simplifying the lifecycle.
Rejected: keep Socket#setTimeout listener | leaves a socket listener lifecycle to manage for a connect-only deadline.
Confidence: high
Scope-risk: narrow
Directive: Keep connectToPeer errors caller-owned via onSocketError and reject pre-connect failures with UdsPeerConnectionError.
Tested: bun test src/utils/__tests__/udsMessaging.test.ts src/services/AgentSummary/__tests__/agentSummary.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun test src/utils/__tests__/udsMessaging.test.ts --coverage --coverage-reporter lcov --coverage-dir coverage-uds
Tested: bun run test:all
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run build
Tested: bun run build:vite
Tested: bun audit
Not-tested: Manual external ACP peer runtime beyond repository tests.
---------
Co-authored-by: unraid <local@unraid.local>
* test: keep Codecov coverage on real agent communication paths
PR #369 was merged before the final Codecov coverage fix landed, so this follow-up carries only the incremental real-path tests needed on top of main. The tests exercise AgentSummary lifecycle branches, mailbox fail-closed behavior, UDS client connection failure through a real capability file, and UDS response-reader framing without mock.module, warning suppression, feature fallback, or production-code churn.
Constraint: PR #369 is already merged; this branch must contain only the incremental Codecov repair on top of latest main
Rejected: Reopen or keep pushing the merged PR branch | merged PR refs do not update and would leave Codecov stale
Rejected: Mock bun:bundle or hide warnings | would reintroduce cross-test pollution and pseudo coverage
Rejected: Keep unrelated SendMessageTool production diff | it created avoidable patch-coverage debt without improving the runtime path
Confidence: high
Scope-risk: narrow
Directive: Keep these coverage tests on real paths; do not replace them with output suppression or feature-flag mocks
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun test src\utils\__tests__\teammateMailbox.test.ts
Tested: bun test src\services\AgentSummary\__tests__\agentSummary.test.ts src\services\AgentSummary\__tests__\summaryContext.test.ts src\utils\__tests__\teammateMailbox.test.ts src\utils\__tests__\udsMessaging.test.ts src\utils\__tests__\udsResponseReader.test.ts packages\builtin-tools\src\tools\SendMessageTool\__tests__\udsRecipientSanitization.test.ts
Tested: bun run test:all
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run build
Tested: bun run build:vite
Tested: bun audit
Tested: git diff --check
Tested: Claude simplify review GO (.omx/artifacts/claude-simplify-codecov-20260427-1521.md)
Tested: Claude security review GO (.omx/artifacts/claude-security-codecov-20260427-1522.md)
Not-tested: GitHub-hosted Codecov upload after this amended commit until PR checks rerun
* test: keep review assertions tied to real failure paths
CodeRabbit flagged three non-blocking but valid review gaps: platform-specific mailbox errno checks, brittle UDS connection-failure message assertions, and missing AgentSummary reschedule proof after fork errors. This keeps the fixes narrow by tightening the affected assertions and adding a structured UDS connection error so tests can assert on behavior instead of message prose.
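As a sketch of what asserting on structured error data can look like: the class below is a hypothetical stand-in, and its `socketPath` and `code` fields are assumptions rather than the actual shape of `UdsPeerConnectionError`.

```typescript
// Hypothetical structured error; field names are illustrative assumptions.
class PeerConnectionError extends Error {
  readonly socketPath: string;
  readonly code: string | undefined;

  constructor(socketPath: string, code: string | undefined, detail: string) {
    super(`Failed to connect to peer at ${socketPath}: ${detail}`);
    this.name = "PeerConnectionError";
    this.socketPath = socketPath;
    this.code = code;
  }
}

// Wrap a raw socket failure, carrying its errno-style code as data.
function wrapSocketFailure(
  socketPath: string,
  raw: Error & { code?: string },
): PeerConnectionError {
  return new PeerConnectionError(socketPath, raw.code, raw.message);
}

// A test can then check fields instead of matching the display message.
const raw = Object.assign(new Error("connect ENOENT /tmp/peer.sock"), { code: "ENOENT" });
const wrapped = wrapSocketFailure("/tmp/peer.sock", raw);
console.log(wrapped instanceof PeerConnectionError, wrapped.code, wrapped.socketPath);
```

Rewording the message would not break an assertion on `code` or `socketPath`, which is the decoupling the review asked for.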
Constraint: PR #374 is a review follow-up and must not hide warnings, skip tests, or merge the PR.
Rejected: Matching the UDS failure message literal | preserves the brittle coupling CodeRabbit flagged.
Rejected: Asserting only that mailbox writes throw | would allow unrelated pre-path failures to pass.
Confidence: high
Scope-risk: narrow
Directive: Keep UDS connection-failure tests on structured error data, not display wording.
Tested: bun test src/services/AgentSummary/__tests__/agentSummary.test.ts src/utils/__tests__/teammateMailbox.test.ts src/utils/__tests__/udsMessaging.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun run test:all
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run build
Tested: bun run build:vite
Not-tested: GitHub-hosted CodeRabbit refresh until pushed.
* test: remove brittle review follow-up assumptions
CodeRabbit's second pass found two valid brittleness issues and one suggested callback-reference assertion that would not match production behavior. This keeps the production behavior unchanged: timers still schedule the summarizer closure, tests now assert timer-handle identity, and UDS connection errors use native Error.cause instead of shadowing it.
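A small sketch of the timer-handle identity idea, under the assumption that timers are reachable through an injectable seam. `TimerApi` and `createSummaryLoop` are hypothetical names, and the real AgentSummary code may be structured differently. The loop passes the same `runSummary` closure on every schedule, so a test checks which handle was cleared rather than expecting a fresh callback or a particular numeric id.

```typescript
// Hypothetical timer seam; the real code may call global timers directly.
interface TimerApi<H> {
  setTimeout(callback: () => void, delayMs: number): H;
  clearTimeout(handle: H): void;
}

function createSummaryLoop<H>(
  timers: TimerApi<H>,
  intervalMs: number,
  summarize: () => void,
) {
  let pending: H | undefined;

  const runSummary = (): void => {
    pending = undefined;
    try {
      summarize();
    } catch {
      // A real implementation would log here; the loop must still reschedule.
    }
    scheduleNext();
  };

  // The same runSummary closure is scheduled every time.
  const scheduleNext = (): void => {
    pending = timers.setTimeout(runSummary, intervalMs);
  };

  // Ownership is tracked by the handle returned from setTimeout.
  const stop = (): void => {
    if (pending !== undefined) {
      timers.clearTimeout(pending);
      pending = undefined;
    }
  };

  return { start: scheduleNext, stop };
}
```

With a fake `TimerApi` that returns unique object handles, a test can assert that `stop()` clears exactly the handle returned by the most recent `setTimeout` call.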
Constraint: Do not manufacture behavior just to satisfy a review hint; assertions must match the real AgentSummary scheduling contract.
Rejected: Assert a fresh scheduled callback function | scheduleNext intentionally passes the same runSummary closure each time.
Rejected: Store a custom cause field on UdsPeerConnectionError | native Error.cause is available under ESNext/Bun.
Confidence: high
Scope-risk: narrow
Directive: Timer tests should assert returned handle identity for ownership, not incidental numeric values.
Tested: bun test src/services/AgentSummary/__tests__/agentSummary.test.ts src/utils/__tests__/udsMessaging.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun run test:all
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run build
Tested: bun run build:vite
Not-tested: GitHub-hosted CodeRabbit refresh until pushed.
* test: enforce structured UDS timeout failures
CodeRabbit's follow-up surfaced a real consistency gap: UDS send socket errors used UdsPeerConnectionError while response timeouts still rejected with a generic Error. Timeouts now use the same structured peer failure contract, and the test exercises that path through a short explicit timeout instead of waiting for the production default.
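One way the shared contract could look, as a sketch only: `PeerFailure`, its `kind` field, and `awaitPeerResponse` are hypothetical, and the 5000ms default mirrors the production default named in the constraint below. Both the timeout branch and the socket-error branch reject with the same class, and the underlying error travels on native `Error.cause`.

```typescript
// Hypothetical structured failure; "kind" is an illustrative discriminator.
class PeerFailure extends Error {
  readonly kind: "socket" | "timeout";

  constructor(message: string, kind: "socket" | "timeout", options?: { cause?: unknown }) {
    super(message, options); // native Error.cause (ES2022)
    this.name = "PeerFailure";
    this.kind = kind;
  }
}

// Wait for a peer response, rejecting with PeerFailure on either branch.
function awaitPeerResponse<T>(response: Promise<T>, timeoutMs = 5000): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new PeerFailure(`no peer response within ${timeoutMs}ms`, "timeout"));
    }, timeoutMs);

    response.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (cause) => {
        clearTimeout(timer);
        reject(new PeerFailure("socket error while awaiting peer response", "socket", { cause }));
      },
    );
  });
}
```

A test can pass a short explicit `timeoutMs` to reach the timeout branch quickly while production callers keep the default, and callers handle both branches with a single `instanceof` check.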
The AgentSummary unchanged-fingerprint test now also asserts that the second unchanged tick does not log errors, preserving the existing behavior checks without changing production scheduling semantics.
Constraint: Keep the production timeout default at 5000ms while allowing tests to exercise the timeout path quickly.
Rejected: Leave timeout failures as generic Error | callers would need separate handling for the same peer connection failure class.
Confidence: high
Scope-risk: narrow
Directive: Keep UDS send timeout and socket-error branches on the same structured error contract.
Tested: bun test src/services/AgentSummary/__tests__/agentSummary.test.ts src/utils/__tests__/udsMessaging.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun run test:all
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run build
Tested: bun run build:vite
Not-tested: GitHub-hosted CodeRabbit refresh until pushed.
---------
Co-authored-by: unraid <local@unraid.local>