OpenAI's prompt_tokens includes cached tokens, but Anthropic's
input_tokens semantic excludes them. The adapter was mapping
prompt_tokens → input_tokens verbatim, causing downstream code
(cache hit rate, cost, autocompact) to double-count.
Real-world impact: DeepSeek returns prompt_tokens=34097 with
cached_tokens=34048, displayed as 50% hit rate instead of 99.86%.
Co-Authored-By: glm-5.1 <zai-org@claude-code-best.win>
DeepSeek v4 in thinking mode sometimes returns reasoning_content: ""
when the model answers directly without internal reasoning. Two places
were filtering the empty string out, which dropped the thinking block
from the assistant turn entirely. The next request then omitted
reasoning_content for that prior turn, and DeepSeek rejected with
400 "reasoning_content ... must be passed back to the API".
Fix:
- openaiStreamAdapter: open a thinking block whenever reasoning_content
is present (including ""); skip the empty thinking_delta event since
the empty value is already conveyed by the block's initial state.
- openaiConvertMessages: preserve empty thinking blocks as
reasoning_content: "" when serializing assistant messages back to
the OpenAI/DeepSeek format.
Tests:
- New: empty reasoning_content opens a thinking block (adapter).
- Updated: empty thinking blocks now round-trip as reasoning_content: ""
instead of being dropped.
- New: assistant messages with no thinking block still omit
reasoning_content (regression guard for non-thinking models).
Root causes:
1. ThemeProvider was imported but never used in App.tsx and showSetupDialog
2. setThemeConfigCallbacks was never called to inject persistence callbacks
3. Preview/save/cancel theme lifecycle had no provider to coordinate
Changes:
- Export setThemeConfigCallbacks from @anthropic/ink
- Wrap App.tsx children with ThemeProvider (initialState from config, onThemeSave persists)
- Wrap showSetupDialog with ThemeProvider for onboarding/trust dialogs
- Call setThemeConfigCallbacks in init.ts to register load/save callbacks
- Update SnapshotUpdateDialog test to account for new ThemeProvider wrapper
Fixes #theme-switching
Twelve actionable items (7 Major + 5 Minor) from the CodeRabbit review on
claude-code-best/claude-code#386:
- docs/internals/autonomy-jira.md: typo "due input close" → "due to input close".
- src/utils/autonomyRuns.ts:
- selectPersistedAutonomyRuns no longer evicts active (queued/running) runs
when the combined list exceeds AUTONOMY_RUNS_MAX. Active runs are kept in
full and the inactive history is capped to the remaining budget so
persisted ownership for live work survives.
- isValidOwnerProcessId now allows pid <= 4_194_304 so a live run owned by
the maximum Linux PID is not treated as stale.
- src/utils/autonomyAuthority.ts: maskCodeFencedLines tracks the active fence
length and only closes the fence when a same-character run of equal-or-
greater length appears with no trailing content, so a nested ```yaml inside
an outer ```` block no longer leaks fake `tasks:` entries into the parser.
- src/cli/print.ts: late-shutdown branches in the cron and scheduled-task
paths now call cancelQueuedAutonomyCommands({ commands: [command] }) instead
of markAutonomyRunCancelled(...). Updating run state alone left the
queue-side record orphaned for resume/recovery.
- src/utils/processUserInput/processSlashCommand.tsx: scheduled-task-result
notification is enqueued before finalizeAutonomyRunCompleted (which queues
follow-up autonomy commands) so both at priority: 'later' land in order and
the next autonomy step can not run before the worker's output is observed.
- src/screens/REPL.tsx + src/utils/handlePromptSubmit.ts:
- onQuery now returns Promise<boolean>: false from the concurrent-guard
skip path, true otherwise. Other call sites use `void onQuery(...)` and
are unaffected. handlePromptSubmit's onQuery prop type matches.
- The autonomy-prompt callsite captures the executed flag, finalizes
claim.claimedCommands as { type: 'completed' } only when onQuery actually
ran, and runs the completed-finalize in its own try/catch so a failure
there does not propagate into the outer catch and trigger a second
finalize as { type: 'failed' } for the same commands.
- Removed the unsafe `command.value as string` cast; createUserMessage
already accepts `string | ContentBlockParam[]`.
- createUserMessage mock in src/__tests__/handlePromptSubmit.test.ts now
matches the new Promise<boolean> shape.
- packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/
RemoteTriggerTool.test.ts:
- Inline auth mock replaced with the shared tests/mocks/auth (added).
- The full mock of src/constants/oauth.js is replaced by a narrow
side-effect-only mock that overrides the env-reading helpers
(getOauthConfig, fileSuffixForOauthConfig, MCP_CLIENT_METADATA_URL) and
delegates pure data exports to the real module.
- tests/integration/dependency-overrides.test.ts:
- mermaid does not export `./package.json` in its exports map, so
require.resolve('mermaid/package.json') throws
ERR_PACKAGE_PATH_NOT_EXPORTED in runtimes that honor exports semantics.
The test now resolves the package entry and walks up to the package
root via a small findPackageJson helper.
- readFileSync from node:fs is replaced with `await Bun.file(...).text()`
to match the project's Bun-API requirement.
Validation:
- bun run typecheck (clean).
- bun test → 3996 pass / 0 fail across 305 test files.
Targets PRs:
- amDosion/claude-code-bast#8 (fork-internal review)
- claude-code-best/claude-code#386 (upstream review, same head branch)
This PR consolidates a coordinated batch of fixes around autonomy run/flow lifecycle, scheduled task deduplication, provider-boundary state finalization, and matching memory-bound treatments for adjacent long-running subsystems (REPL fullscreen scrollback, skill-search/skill-learning runtime activation). All changes were developed and reviewed together because they touched the same lifecycle invariants and were uncovered by the same long-running session reproductions.
## Lifecycle correctness
- Queued autonomy prompts are not injected unless the persisted run was successfully claimed; queued run claiming is now terminal-safe so a once-consumed/cancelled/failed run can not slip back into `queued`.
- Autonomy run/flow finalization happens on completion, provider error, generator close, and cancellation — not just the happy path. New `src/__tests__/queryAutonomyProviderBoundary.test.ts` covers these provider-boundary transitions.
- `requestManagedAutonomyFlowCancel` and `resumeManagedAutonomyFlowPrompt` carry `rootDir` and `currentDir` explicitly across detached async boundaries (proactive-tick, cron, daemon restart) instead of inferring from process state.
- Active runs/flows are protected from janitor pruning so a running step can not be garbage-collected mid-flight (`src/utils/autonomyAuthority.ts`).
- Heartbeat parser now ignores fenced code blocks; the two-phase commit window for autonomy state transitions is documented in `docs/internals/autonomy-jira.md`.
## Ownership and dedup
- `src/utils/autonomyRuns.ts`: ownership stamping (run id + rootDir carried end-to-end), source-based dedup against active runs.
- `src/hooks/useScheduledTasks.ts`: scheduled ticks deduplicate against runs already active on the same source label.
- `src/utils/processUserInput/processSlashCommand.tsx`: forked slash commands now thread the autonomy `runId` so completion finalizers can find the originating run for deferred completion.
- New `src/utils/autonomyQueueLifecycle.ts` and tests collect the queue-side lifecycle invariants in one place.
## Memory bounds (related, same review pass)
- `src/screens/REPL.tsx`: caps fullscreen scrollback after the compact boundary and updates trailing progress rows in place. Long-running fullscreen sessions could otherwise retain thousands of post-compaction messages and duplicate progress rows, keeping Ink trees alive long after their useful context had moved on.
- `src/services/skillSearch/*` and `src/services/skillLearning/*`: runtime activation is strictly opt-in via existing env toggles; session caches are capped so long-running processes can not grow them forever. Build presence is preserved so operators can still discover and opt into the slash commands.
## CI / test contract
- `tests/integration/dependency-overrides.test.ts`: smoke test no longer drives Mermaid's browser renderer; it validates the package-resolution contract directly so CI does not regress on unrelated browser timing.
- New `tests/integration/autonomy-lifecycle-user-flow.test.ts`: end-to-end CLI subprocess flow exercising `status --deep`, `flows`, `flow <id>`, `flow resume`, `flow cancel` against persisted state.
- `src/entrypoints/cli.tsx`: `claude autonomy …` routes through an entrypoint fast path that reuses the slash-command formatter without booting the full interactive CLI. Stdout is flushed before forced exit so coverage subprocesses do not terminate with empty stdout.
- `packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts`: stabilized to prevent audit flake under coverage.
## Tests added
- `src/__tests__/queryAutonomyProviderBoundary.test.ts`
- `src/hooks/__tests__/useScheduledTasks.test.ts`
- `src/utils/__tests__/autonomyAuthority.test.ts`
- `src/utils/__tests__/autonomyFlows.test.ts` (extended)
- `src/utils/__tests__/autonomyPersistence.test.ts` (extended)
- `src/utils/__tests__/autonomyQueueLifecycle.test.ts`
- `src/utils/__tests__/autonomyRuns.test.ts` (extended)
- `src/utils/processUserInput/__tests__/processSlashCommand.test.ts`
- `tests/integration/autonomy-lifecycle-user-flow.test.ts`
## Docs
- `docs/agent/sur-loop-scheduled-oom.md`: System Understanding Report covering the scheduled/loop OOM problem, the call graphs investigated, and the lifecycle invariants this PR establishes.
- `docs/agent/sur-skill-overflow-bugs.md`: SUR for the related skill-overflow context.
- `docs/internals/autonomy-jira.md`: documents the two-phase commit window and ownership stamping invariants.
- `docs/memory-leak-audit.md`: audit notes covering the REPL/scrollback and skill-search bounds.
## Invariants this PR establishes
1. Queued autonomy prompts are not injected unless the persisted run was successfully claimed.
2. Terminal run/flow states are terminal — completion, failure, and cancellation all finalize state regardless of which provider/error path triggered them.
3. Autonomy run/flow `rootDir` is carried explicitly across detached async boundaries instead of inferred from a shared singleton.
4. State-only CLI subcommands (`autonomy status|runs|flows|flow …`) bypass full interactive bootstrap so they do not hold unrelated handles open.
5. REPL fullscreen scrollback and skill-search/skill-learning session caches are explicitly bounded.
## Validation
```bash
bun run typecheck
CI=true GITHUB_ACTIONS=true bun test # 3996 pass / 0 fail across 305 files
bun test src/__tests__/queryAutonomyProviderBoundary.test.ts \
src/hooks/__tests__/useScheduledTasks.test.ts \
src/utils/__tests__/autonomy{Runs,Flows,Authority,QueueLifecycle,Persistence}.test.ts \
src/utils/processUserInput/__tests__/processSlashCommand.test.ts \
tests/integration/autonomy-lifecycle-user-flow.test.ts
```
## Origin
This PR is the consolidated, upstream-targeted version of two fork-side review PRs (fix/loop-scheduled-autonomy-oom and fix/autonomy-lifecycle). The fork-side review history is preserved at https://github.com/amDosion/claude-code-bast/pull/7 . The fork's own internal `chore: keep fork current with upstream` sync commits and the `docs: update contributors` automation are intentionally not included in this PR.
The autonomy CLI handler `rootDir` threading that the fork added (78f64d8a, 98d04ddb) is intentionally omitted here because upstream `a2cfaf91` (fix: 修复 RemoteTriggerTool 和 autonomy 测试的全量运行失败) already performed the equivalent change with an additional `currentDir` option. Keeping the upstream version avoids regressing that improvement.
* fix: bound agent communication memory growth
UDS messaging now uses private local capabilities instead of exposing auth tokens through SDK metadata, environment variables, session registry, peer listing, or tool output. The receive path bounds NDJSON frames, response buffers, active clients, and pending inbox bytes, and strips auth metadata before messages enter the prompt queue.
Teammate mailboxes now validate file and message sizes, fail closed on corrupt mutation inputs, compact by count and retained bytes, and use stable message identity for in-process acknowledgements. Agent summaries now fork only a bounded recent context using lazy size estimation and content fingerprints instead of retaining or serializing unbounded histories.
Constraint: PR #361 was already merged; this branch is based on upstream/main@c2ac9a74.
Rejected: Default-disabling COORDINATOR_MODE/TEAMMEM only | explicit feature enablement still hit unbounded paths.
Rejected: Persisting UDS auth in SDK/env/session registry | bridge/remote metadata can leak local capability secrets.
Rejected: Inline uds #token addresses | observable/tool/classifier paths can reflect raw addresses outside the UDS request frame.
Rejected: Positional mailbox marking after compaction | compaction can shift indices across the lock boundary.
Confidence: high
Scope-risk: moderate
Directive: Do not expose UDS capability tokens through SDK messages, environment variables, session registry, peer-list output, or SendMessage result/classifier surfaces.
Directive: Do not reintroduce positional mailbox acknowledgements unless compaction is removed or read+mark is atomic under one lock.
Tested: bun test src/utils/__tests__/ndjsonFramer.test.ts src/utils/__tests__/udsMessaging.test.ts packages/builtin-tools/src/tools/SendMessageTool/__tests__/udsRecipientSanitization.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bunx biome lint modified src/package files
Tested: bun run test:all (3704 pass, 0 fail, 6734 expects)
Tested: bun audit (No vulnerabilities found)
Tested: bun run build
Tested: bun run build:vite
Tested: git diff --check
Not-tested: End-to-end external UDS client driving a full production headless model turn.
* fix: harden bounded agent communication review fixes
CodeRabbit and Codecov surfaced real gaps in UDS framing, peer discovery, mailbox retention, and summary context coverage. This tightens those paths without suppressing review or coverage signals.
Constraint: PR #369 must address CodeRabbit and Codecov findings without warning suppression or fake fallbacks
Rejected: Suppress Codecov or CodeRabbit warnings | leaves real receive-path and test-isolation gaps
Rejected: Add unreachable feature-gated tests | bun:bundle keeps those branches compile-time gated in local tests
Confidence: high
Scope-risk: moderate
Directive: Keep UDS auth-token rejection outside feature flags; do not reintroduce inline token fallbacks
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage; bun run test:all; bun run lint; bun run build; bun run build:vite; bun audit; git diff --cached --check
Not-tested: Remote Codecov/CodeRabbit refreshed reports until pushed
* fix: prevent agent communication bounds from hiding CI regressions
Tighten the UDS auth, framing, and response-reader boundaries while keeping the AgentSummary lifecycle covered so Codecov and CI fail on real regressions instead of missing coverage. The poorMode settings mock mirrors unrelated real settings defaults to avoid Bun mock retention changing later permission tests.
Constraint: PR #369 must fix Codecov/CI precisely without warning suppression, fallback masking, or mock pollution
Rejected: Delete AgentSummary lifecycle coverage | would hide Codecov loss and stale-summary behavior
Rejected: Store inline UDS rejection in a hidden input sentinel | cloned observable inputs can drop it and bypass rejection
Rejected: Ignore malformed UDS frames until timeout | leaves client slots and SendMessage calls open to exhaustion
Confidence: high
Scope-risk: moderate
Directive: Keep empty #token= markers rejected; do not require a non-empty token value in hasInlineUdsToken
Tested: bun test packages/builtin-tools/src/tools/SendMessageTool/__tests__/udsRecipientSanitization.test.ts src/utils/__tests__/udsMessaging.test.ts src/utils/__tests__/udsResponseReader.test.ts src/utils/__tests__/ndjsonFramer.test.ts
Tested: bunx tsc --noEmit --pretty false
Tested: bun run lint
Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage
Tested: bun run test:all
Tested: bun audit
Tested: bun run build
Tested: bun run build:vite
Not-tested: GitHub-hosted Codecov upload until pushed PR checks rerun
---------
Co-authored-by: unraid <local@unraid.local>