Files
claude-code/docs/internals/autonomy-jira.md
unraid 452a7e6a15 fixup: address CodeRabbit review on PR #386
Twelve actionable items (7 Major + 5 Minor) from the CodeRabbit review on
claude-code-best/claude-code#386:

- docs/internals/autonomy-jira.md: typo "due input close" → "due to input close".
- src/utils/autonomyRuns.ts:
  - selectPersistedAutonomyRuns no longer evicts active (queued/running) runs
    when the combined list exceeds AUTONOMY_RUNS_MAX. Active runs are kept in
    full and the inactive history is capped to the remaining budget so
    persisted ownership for live work survives.
  - isValidOwnerProcessId now allows pid <= 4_194_304 so a live run owned by
    the maximum Linux PID is not treated as stale.
- src/utils/autonomyAuthority.ts: maskCodeFencedLines tracks the active fence
  length and only closes the fence when a same-character run of equal-or-
  greater length appears with no trailing content, so a nested ```yaml inside
  an outer ```` block no longer leaks fake `tasks:` entries into the parser.
- src/cli/print.ts: late-shutdown branches in the cron and scheduled-task
  paths now call cancelQueuedAutonomyCommands({ commands: [command] }) instead
  of markAutonomyRunCancelled(...). Updating run state alone left the
  queue-side record orphaned for resume/recovery.
- src/utils/processUserInput/processSlashCommand.tsx: scheduled-task-result
  notification is enqueued before finalizeAutonomyRunCompleted (which queues
  follow-up autonomy commands) so both at priority: 'later' land in order and
  the next autonomy step can not run before the worker's output is observed.
- src/screens/REPL.tsx + src/utils/handlePromptSubmit.ts:
  - onQuery now returns Promise<boolean>: false from the concurrent-guard
    skip path, true otherwise. Other call sites use `void onQuery(...)` and
    are unaffected. handlePromptSubmit's onQuery prop type matches.
  - The autonomy-prompt callsite captures the executed flag, finalizes
    claim.claimedCommands as { type: 'completed' } only when onQuery actually
    ran, and runs the completed-finalize in its own try/catch so a failure
    there does not propagate into the outer catch and trigger a second
    finalize as { type: 'failed' } for the same commands.
  - Removed the unsafe `command.value as string` cast; createUserMessage
    already accepts `string | ContentBlockParam[]`.
  - createUserMessage mock in src/__tests__/handlePromptSubmit.test.ts now
    matches the new Promise<boolean> shape.
- packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/
  RemoteTriggerTool.test.ts:
  - Inline auth mock replaced with the shared tests/mocks/auth (added).
  - The full mock of src/constants/oauth.js is replaced by a narrow
    side-effect-only mock that overrides the env-reading helpers
    (getOauthConfig, fileSuffixForOauthConfig, MCP_CLIENT_METADATA_URL) and
    delegates pure data exports to the real module.
- tests/integration/dependency-overrides.test.ts:
  - mermaid does not export `./package.json` in its exports map, so
    require.resolve('mermaid/package.json') throws
    ERR_PACKAGE_PATH_NOT_EXPORTED in runtimes that honor exports semantics.
    The test now resolves the package entry and walks up to the package
    root via a small findPackageJson helper.
  - readFileSync from node:fs is replaced with `await Bun.file(...).text()`
    to match the project's Bun-API requirement.

Validation:
- bun run typecheck (clean).
- bun test → 3996 pass / 0 fail across 305 test files.

Targets PRs:
- amDosion/claude-code-bast#8 (fork-internal review)
- claude-code-best/claude-code#386 (upstream review, same head branch)
2026-04-29 15:17:50 +08:00

315 lines
12 KiB
Markdown

# Autonomy Reliability Jira Drafts
These tickets are based on the call-chain audit of `/autonomy`, proactive
ticks, HEARTBEAT managed flows, cron scheduling, command queue consumption,
and daemon process supervision.
## AUT-001: Preserve autonomy lifecycle when queued commands are consumed mid-turn
Type: Bug
Priority: P0
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
`query.ts` can drain queued prompt/task-notification commands as attachments
during an active turn. Autonomy prompts consumed this way were removed from the
in-memory queue without marking the persisted run as running/completed/failed,
so managed flows could stay stuck in `queued` and never advance.
Evidence:
- `src/query.ts` drains queued commands via `getCommandsByMaxPriority()`.
- `src/query.ts` removes consumed commands from the queue.
- Lifecycle updates existed only in the normal queued-submit path
`src/utils/handlePromptSubmit.ts` and headless `src/cli/print.ts`.
Acceptance criteria:
- Mid-turn consumed autonomy commands mark runs `running`.
- Normal query completion finalizes consumed runs and queues next managed-flow
steps.
- Query errors or abort terminal reasons mark consumed runs failed.
- Stale/cancelled autonomy commands are removed from the in-memory queue
without being sent to the model.
- Regression tests cover stale command filtering and managed-flow advancement.
## AUT-002: Make autonomy run lifecycle transitions terminal-safe
Type: Bug
Priority: P0
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
Run lifecycle helpers rewrote status unconditionally. A stale in-memory command
could mark a cancelled/completed/failed run back to `running`, causing a
cancelled flow to execute or a terminal flow to be rewritten.
Evidence:
- `markAutonomyRunRunning`, `markAutonomyRunCompleted`,
`markAutonomyRunFailed`, and `markAutonomyRunCancelled` updated records
without checking current status.
- External CLI cancel cannot remove queued commands living inside another
process, so stale commands are a realistic input.
Acceptance criteria:
- `queued -> running/completed/failed/cancelled` remains allowed.
- `running -> completed/failed/cancelled` remains allowed.
- Any terminal status rejects later lifecycle updates.
- Rejected transitions do not update managed-flow step state.
- Regression tests cover stale lifecycle calls after cancellation.
## AUT-003: Prevent proactive and scheduled-task async fire failures from becoming invisible
Type: Bug
Priority: P1
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
Proactive tick and cron fire callbacks launch detached async work. Failures in
prompt preparation or queue insertion could surface as unhandled rejections or
be lost from diagnostics. In one-shot cron paths, the scheduler has already
decided the task fired.
Evidence:
- `src/proactive/useProactive.ts` used a detached async IIFE without catch.
- `src/cli/print.ts` proactive and cron paths also detached async work.
- `src/hooks/useScheduledTasks.ts` cron callbacks detached async work.
Acceptance criteria:
- Detached proactive/cron fire work has explicit error logging.
- REPL proactive tick generation is non-reentrant.
- Tick generation stops queueing after hook unmount.
## AUT-004: Bound long-running daemon restart timers during shutdown
Type: Bug
Priority: P1
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
The daemon supervisor scheduled worker restarts with `setTimeout()` but did
not store, clear, or `unref()` the timer. Shutdown during backoff could keep
the supervisor alive until the timer fired, forcing the stop path toward
SIGKILL.
Evidence:
- `src/daemon/main.ts` scheduled restart timers directly in the worker exit
handler.
- Shutdown only signaled child processes and did not clear restart timers.
Acceptance criteria:
- Worker restart timers are tracked per worker.
- Shutdown clears any pending restart timers.
- Restart and force-kill grace timers do not keep the supervisor alive alone.
## AUT-005: Release autonomy persistence lock bookkeeping after each chain
Type: Bug
Priority: P1
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
`withAutonomyPersistenceLock` stored a chained promise in its map but compared
the map value against the raw current promise during cleanup. That condition
never matched, so root-level lock bookkeeping could accumulate in long-lived
processes that touch many workspaces.
Evidence:
- `src/utils/autonomyPersistence.ts` stored `previous.then(() => current)`.
- Cleanup compared `persistenceLocks.get(key) === current`.
Acceptance criteria:
- The stored chained promise is the value used for cleanup comparison.
- Existing serialization behavior for same-root calls remains unchanged.
- Tests directly assert same-root lock bookkeeping returns to zero after both
success and failure.
## AUT-006: Add active-record protection before persistence truncation
Type: Reliability
Priority: P2
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
Autonomy runs and flows are capped by latest-created/updated order only.
Under high churn, active `queued` or `running` records can be truncated before
completion, which removes recovery evidence and can break managed-flow
advancement.
Evidence:
- `src/utils/autonomyRuns.ts` keeps the latest 200 runs by `createdAt`.
- `src/utils/autonomyFlows.ts` keeps the latest 100 flows by `updatedAt`.
Acceptance criteria:
- Active records are retained before completed historical records are trimmed.
- Tests cover trimming with more than the configured cap and active records
near the tail.
## AUT-007: Treat provider API-error responses as failed autonomy turns
Type: Bug
Priority: P0
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
Third-party provider adapters can convert provider failures into synthetic
assistant API-error messages instead of throwing. `query.ts` treated
`isApiErrorMessage` terminal responses as `completed`, so an autonomy command
that had already been consumed as a queued attachment could be marked
completed and advance its managed flow even though the provider call failed.
Evidence:
- `src/services/api/openai/index.ts`, `src/services/api/gemini/index.ts`, and
`src/services/api/grok/index.ts` yield `createAssistantAPIErrorMessage()` on
adapter errors.
- `src/query.ts` skipped stop hooks for API-error assistant messages but
returned `reason: 'completed'`.
- Top-level autonomy finalization used terminal completion to decide whether
to mark consumed runs completed or failed.
Acceptance criteria:
- Provider API-error assistant messages terminate the query with
`reason: 'model_error'`.
- Any consumed autonomy run is marked failed rather than completed.
- Managed flows do not advance to the next step after provider API errors.
- A regression test simulates provider error after a queued autonomy attachment
has been consumed.
## AUT-008: Finalize consumed autonomy runs on async-generator close
Type: Bug
Priority: P0
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
`query()` is an async generator. When its consumer calls `.return()` or breaks
out of iteration, JavaScript executes `finally` blocks and skips code after the
`try/finally`. The previous autonomy finalization ran after the `finally`, so
queued autonomy commands that had already been claimed as `running` could stay
persisted as `running` forever if the REPL/SDK consumer closed the generator.
Evidence:
- Claimed run IDs were collected during queued attachment injection.
- Completion/failure finalization happened only after `yield* queryLoop(...)`
returned normally or threw.
- Claude cross-validation flagged this as a durable run/flow leak.
Acceptance criteria:
- Consumed autonomy runs are finalized from a `finally` path.
- Normal completion marks consumed runs completed and enqueues next managed
flow steps.
- Provider/model errors mark consumed runs failed.
- Generator close and user abort terminals mark consumed runs cancelled.
- A regression test closes the generator after a queued autonomy attachment and
verifies the run/flow are cancelled, not left running.
## AUT-009: Claim queued autonomy runs before attachment injection
Type: Bug
Priority: P0
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
The query loop filtered stale queued autonomy commands before attachment
generation, but it did not claim runs as `running` until after attachments were
already yielded. A concurrent cancellation between those steps could still send
a cancelled prompt into the model context.
Evidence:
- `partitionConsumableQueuedAutonomyCommands()` only checked persisted status.
- `markAutonomyRunRunning()` previously ran after `getAttachmentMessages()`.
- Reviewer cross-validation identified the check-then-act race.
Acceptance criteria:
- Query claims queued autonomy runs before passing commands to attachment
generation.
- Only successfully claimed commands are injected as queued-command
attachments.
- Failed claims are treated as stale and removed from the in-memory queue.
- Claiming reads persisted run state once per turn rather than once per
command.
## AUT-010: Cancel proactive and cron runs dropped before enqueue
Type: Bug
Priority: P1
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
`/proactive` and scheduled-task producers persist autonomy runs before
returning queue commands. If the component is disposed or headless input closes
after persistence but before enqueue, the queued run is left on disk with no
in-memory command to consume it.
Evidence:
- `createProactiveAutonomyCommands()` commits runs before returning commands.
- `commitAutonomyQueuedPrompt()` persists scheduled-task runs before callers
enqueue them.
- Callers checked `disposed` / `inputClosed` after command creation and could
return without terminalizing the run.
Acceptance criteria:
- Proactive hook cancellation checks run both before commit and after command
creation.
- Headless proactive and cron paths cancel any already-created command that is
dropped due to input close.
- REPL scheduled-task cleanup cancels already-created commands when unmounted.
- A regression test verifies a proactive command created but dropped before
enqueue is marked cancelled.
## AUT-011: Replace query transition `any` stubs with typed contracts
Type: Test/Type Safety
Priority: P2
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
`src/query/transitions.ts` defined both `Terminal` and `Continue` as `any`.
That allowed new terminal reasons such as `model_error` and continuation
reasons such as `collapse_drain_retry` to drift without compiler checks.
Evidence:
- Claude cross-validation flagged the `Terminal = any` contract as a remaining
issue.
- Tightening the type immediately caught that
`collapse_drain_retry.committed` is a `number`, not a `boolean`.
Acceptance criteria:
- `Terminal` is a concrete union of query terminal reasons.
- `Continue` is a concrete union of continuation reasons and payloads.
- `bun run typecheck` validates all query return sites against that contract.
## AUT-012: Avoid provider test settings-module mock pollution
Type: Test Reliability
Priority: P2
Status: Draft
Patch status: Implemented in `fix/autonomy-lifecycle`.
Problem:
The provider tests previously mocked `settings.js`. A minimal mock broke other
tests that imported additional settings exports in the same Bun process; the
expanded mock avoided the failure but over-coupled the provider test to
unrelated settings internals.
Evidence:
- Full test runs observed cross-file settings mock pollution.
- `src/utils/model/providers.ts` only needs the real `getInitialSettings()`
behavior.
Acceptance criteria:
- Provider tests do not mock `settings.js`.
- `modelType` precedence is exercised through an injected settings snapshot,
leaving global bootstrap state untouched.
- Provider tests pass when run alongside permissions tests and the provider
matrix.