claude-code/src/services/skillSearch/prefetch.ts
unraid f2e9af4927 feat: harden autonomy lifecycle, OOM bounds, and provider-boundary finalization
This PR consolidates a coordinated batch of fixes around autonomy run/flow lifecycle, scheduled task deduplication, provider-boundary state finalization, and matching memory-bound treatments for adjacent long-running subsystems (REPL fullscreen scrollback, skill-search/skill-learning runtime activation). All changes were developed and reviewed together because they touched the same lifecycle invariants and were uncovered by the same long-running session reproductions.

## Lifecycle correctness

- Queued autonomy prompts are not injected unless the persisted run was successfully claimed. Queued-run claiming is now terminal-safe, so a run that was already consumed, cancelled, or failed cannot slip back into `queued`.
- Autonomy run/flow finalization happens on completion, provider error, generator close, and cancellation — not just the happy path. New `src/__tests__/queryAutonomyProviderBoundary.test.ts` covers these provider-boundary transitions.
- `requestManagedAutonomyFlowCancel` and `resumeManagedAutonomyFlowPrompt` carry `rootDir` and `currentDir` explicitly across detached async boundaries (proactive-tick, cron, daemon restart) instead of inferring from process state.
- Active runs/flows are protected from janitor pruning, so a running step cannot be garbage-collected mid-flight (`src/utils/autonomyAuthority.ts`).
- Heartbeat parser now ignores fenced code blocks; the two-phase commit window for autonomy state transitions is documented in `docs/internals/autonomy-jira.md`.
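
The heartbeat parser change can be illustrated with a small line scanner that tracks whether it is inside a fenced block. This is a hypothetical sketch. The `HEARTBEAT:` marker, `extractHeartbeatLines`, and the simplified fence rules are all assumptions, not the parser's real format. Under those rules, a fence is three or more backticks or tildes, and it is closed by a fence of the same character that is at least as long.

```typescript
// Hypothetical marker; the real heartbeat format is not specified here.
const HEARTBEAT_PREFIX = 'HEARTBEAT:'

// Collects heartbeat directives from markdown, skipping anything inside
// fenced code blocks so example snippets are never parsed as directives.
function extractHeartbeatLines(markdown: string): string[] {
  const found: string[] = []
  // Character and length of the fence that opened the current block, if any.
  let openFence: { char: string; length: number } | null = null

  for (const line of markdown.split(/\r?\n/)) {
    // Simplified fence rule: up to 3 spaces, then 3+ backticks or tildes.
    const fence = /^ {0,3}(`{3,}|~{3,})/.exec(line)
    if (fence) {
      const marker = fence[1] ?? ''
      if (openFence === null) {
        openFence = { char: marker.charAt(0), length: marker.length }
        continue
      }
      // A block closes only on the same character with at least equal length.
      if (
        marker.charAt(0) === openFence.char &&
        marker.length >= openFence.length
      ) {
        openFence = null
        continue
      }
    }
    if (openFence !== null) continue // inside a fenced block: ignore the line
    const trimmed = line.trim()
    if (trimmed.startsWith(HEARTBEAT_PREFIX)) {
      found.push(trimmed.slice(HEARTBEAT_PREFIX.length).trim())
    }
  }
  return found
}
```

An unclosed fence hides every line after it. That is the conservative choice for a parser that should never act on example text.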

## Ownership and dedup

- `src/utils/autonomyRuns.ts`: ownership stamping (run id + rootDir carried end-to-end), source-based dedup against active runs.
- `src/hooks/useScheduledTasks.ts`: scheduled ticks deduplicate against runs already active on the same source label.
- `src/utils/processUserInput/processSlashCommand.tsx`: forked slash commands now thread the autonomy `runId` so completion finalizers can find the originating run for deferred completion.
- New `src/utils/autonomyQueueLifecycle.ts` and tests collect the queue-side lifecycle invariants in one place.
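
Source-based dedup against active runs can be modelled as a check over non-terminal runs before a scheduled tick enqueues anything. In the sketch below, the `RunRecord` shape, the status names, and `shouldEnqueueScheduledTick` are hypothetical. The real logic lives in `autonomyRuns.ts` and `useScheduledTasks.ts` and may key on more than the source label.

```typescript
// Hypothetical status set; the real autonomy run states may differ.
type RunStatus = 'queued' | 'running' | 'completed' | 'failed' | 'cancelled'

interface RunRecord {
  runId: string
  sourceLabel: string // e.g. the scheduled task or cron entry that created the run
  status: RunStatus
}

// Statuses that still "own" their source label.
const ACTIVE_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  'queued',
  'running',
])

// Skips a scheduled tick when a run from the same source is still active,
// so a slow run cannot accumulate a backlog of duplicate queued runs.
function shouldEnqueueScheduledTick(
  runs: readonly RunRecord[],
  sourceLabel: string,
): boolean {
  return !runs.some(
    run => run.sourceLabel === sourceLabel && ACTIVE_STATUSES.has(run.status),
  )
}
```

Treating `queued` as active matters here. Otherwise, a tick that fires before the previous run is claimed would enqueue a second copy.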

## Memory bounds (related, same review pass)

- `src/screens/REPL.tsx`: caps fullscreen scrollback after the compact boundary and updates trailing progress rows in place. Long-running fullscreen sessions could otherwise retain thousands of post-compaction messages and duplicate progress rows, keeping Ink trees alive long after their useful context had moved on.
- `src/services/skillSearch/*` and `src/services/skillLearning/*`: runtime activation is strictly opt-in via existing env toggles, and session caches are capped so long-running processes cannot grow them without bound. Both modules stay in the build so operators can still discover and opt into the slash commands.
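
The two REPL behaviours can be sketched with a small pure function. The sketch makes two assumptions. First, the fullscreen view only needs the latest compact boundary plus a capped tail after it. Second, consecutive progress rows for the same tool call can replace one another. `ScrollbackItem`, `appendScrollbackItem`, and the cap of 500 are hypothetical; the real `REPL.tsx` works on Ink-rendered message lists.

```typescript
// Hypothetical row model; the real REPL renders Message objects through Ink.
type ScrollbackItem =
  | { kind: 'message'; id: string; text: string }
  | { kind: 'progress'; toolUseId: string; text: string }
  | { kind: 'compact_boundary'; id: string }

// Assumed cap for illustration only.
const MAX_POST_COMPACT_ROWS = 500

function appendScrollbackItem(
  items: readonly ScrollbackItem[],
  next: ScrollbackItem,
  maxPostCompact: number = MAX_POST_COMPACT_ROWS,
): ScrollbackItem[] {
  const last = items[items.length - 1]
  const result: ScrollbackItem[] =
    next.kind === 'progress' &&
    last !== undefined &&
    last.kind === 'progress' &&
    last.toolUseId === next.toolUseId
      ? // Same tool call: replace the trailing progress row in place.
        [...items.slice(0, -1), next]
      : [...items, next]

  // Locate the most recent compact boundary.
  let boundary = -1
  for (let i = result.length - 1; i >= 0; i--) {
    if (result[i]?.kind === 'compact_boundary') {
      boundary = i
      break
    }
  }
  if (boundary === -1) return result

  // Keep the boundary marker plus at most `maxPostCompact` rows after it.
  const tail = result.slice(boundary + 1)
  return [
    ...result.slice(boundary, boundary + 1),
    ...tail.slice(Math.max(0, tail.length - maxPostCompact)),
  ]
}
```

Returning a new array keeps the function pure, which suits React-style state updates. The real component may slice or update its state differently.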

## CI / test contract

- `tests/integration/dependency-overrides.test.ts`: the smoke test no longer drives Mermaid's browser renderer. It validates the package-resolution contract directly, so unrelated browser timing can no longer fail CI.
- New `tests/integration/autonomy-lifecycle-user-flow.test.ts`: end-to-end CLI subprocess flow exercising `status --deep`, `flows`, `flow <id>`, `flow resume`, `flow cancel` against persisted state.
- `src/entrypoints/cli.tsx`: `claude autonomy …` routes through an entrypoint fast path that reuses the slash-command formatter without booting the full interactive CLI. Stdout is flushed before forced exit so coverage subprocesses do not terminate with empty stdout.
- `packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts`: stabilized to prevent audit flake under coverage.
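
The entrypoint fast path needs a cheap argv check that runs before any interactive bootstrap. The sketch below assumes `args` is `process.argv.slice(2)`. `isStateOnlyAutonomyCommand` and `writeThenExit` are hypothetical names, and the subcommand set is taken from the `status|runs|flows|flow` list in this description. `writeThenExit` relies on Node's `writable.write(chunk, callback)` calling the callback once the chunk has been handled, so the process does not exit with output still buffered.

```typescript
// Subcommands treated as state-only (taken from the PR's invariant list).
const STATE_ONLY_AUTONOMY_SUBCOMMANDS: ReadonlySet<string> = new Set([
  'status',
  'runs',
  'flows',
  'flow',
])

// `args` is assumed to be process.argv.slice(2), e.g. ['autonomy', 'flows'].
function isStateOnlyAutonomyCommand(args: readonly string[]): boolean {
  const [command, subcommand] = args
  return (
    command === 'autonomy' &&
    subcommand !== undefined &&
    STATE_ONLY_AUTONOMY_SUBCOMMANDS.has(subcommand)
  )
}

// Writes the final output and exits only after the stream has handled the
// chunk, so a parent reading a pipe does not see empty stdout.
function writeThenExit(output: string, code: number): void {
  process.stdout.write(output, () => process.exit(code))
}
```

A dispatcher would check `isStateOnlyAutonomyCommand(process.argv.slice(2))` first. If the check passes, it would format the result and call `writeThenExit` without loading the interactive app.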

## Tests added

- `src/__tests__/queryAutonomyProviderBoundary.test.ts`
- `src/hooks/__tests__/useScheduledTasks.test.ts`
- `src/utils/__tests__/autonomyAuthority.test.ts`
- `src/utils/__tests__/autonomyFlows.test.ts` (extended)
- `src/utils/__tests__/autonomyPersistence.test.ts` (extended)
- `src/utils/__tests__/autonomyQueueLifecycle.test.ts`
- `src/utils/__tests__/autonomyRuns.test.ts` (extended)
- `src/utils/processUserInput/__tests__/processSlashCommand.test.ts`
- `tests/integration/autonomy-lifecycle-user-flow.test.ts`

## Docs

- `docs/agent/sur-loop-scheduled-oom.md`: System Understanding Report covering the scheduled/loop OOM problem, the call graphs investigated, and the lifecycle invariants this PR establishes.
- `docs/agent/sur-skill-overflow-bugs.md`: SUR for the related skill-overflow context.
- `docs/internals/autonomy-jira.md`: documents the two-phase commit window and ownership stamping invariants.
- `docs/memory-leak-audit.md`: audit notes covering the REPL/scrollback and skill-search bounds.

## Invariants this PR establishes

1. Queued autonomy prompts are not injected unless the persisted run was successfully claimed.
2. Terminal run/flow states are terminal — completion, failure, and cancellation all finalize state regardless of which provider/error path triggered them.
3. Autonomy run/flow `rootDir` is carried explicitly across detached async boundaries instead of inferred from a shared singleton.
4. State-only CLI subcommands (`autonomy status|runs|flows|flow …`) bypass full interactive bootstrap so they do not hold unrelated handles open.
5. REPL fullscreen scrollback and skill-search/skill-learning session caches are explicitly bounded.
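
Invariants 1 and 2 reduce to a guarded state transition and a finalizer that runs on every exit path. The sketch below models both in memory, with the following assumptions:

- `RunStore`, `claimQueuedRun`, `finalizeRun`, and `streamWithFinalization` are hypothetical names.
- Mapping an early generator close to `cancelled` is an assumption.
- A synchronous generator stands in for the async provider stream. An `async function*` runs its `finally` block on `return()` in the same way.

```typescript
type Outcome = 'completed' | 'failed' | 'cancelled'
type RunState = 'queued' | 'running' | Outcome

interface Run {
  id: string
  rootDir: string // stored with the run (invariant 3), not re-read at finalize time
  state: RunState
}

const TERMINAL_STATES: ReadonlySet<RunState> = new Set<RunState>([
  'completed',
  'failed',
  'cancelled',
])

// In-memory stand-in for the persisted run store (hypothetical).
class RunStore {
  private readonly runs = new Map<string, Run>()

  add(run: Run): void {
    this.runs.set(run.id, { ...run })
  }

  get(id: string): Run | undefined {
    return this.runs.get(id)
  }

  // Invariant 1: only a run that is still `queued` can be claimed. A repeat
  // claim, or a claim on a terminal run, returns false, and the caller must
  // not inject the queued prompt.
  claimQueuedRun(id: string): boolean {
    const run = this.runs.get(id)
    if (!run || run.state !== 'queued') return false
    run.state = 'running'
    return true
  }

  // Invariant 2: terminal states are sticky, so a second finalize is a no-op.
  finalizeRun(id: string, outcome: Outcome): void {
    const run = this.runs.get(id)
    if (!run || TERMINAL_STATES.has(run.state)) return
    run.state = outcome
  }
}

// Wraps a provider stream so the run is finalized on normal completion, on a
// thrown provider error, and when the consumer closes the generator early.
function* streamWithFinalization<T>(
  store: RunStore,
  runId: string,
  source: Iterable<T>,
): Generator<T> {
  // Stays 'cancelled' unless the try block completes or throws.
  let outcome: Outcome = 'cancelled'
  try {
    yield* source
    outcome = 'completed'
  } catch (error) {
    outcome = 'failed'
    throw error
  } finally {
    store.finalizeRun(runId, outcome)
  }
}
```

The generator-close case is covered only because `finalizeRun` sits in `finally` rather than after the loop. A consumer that stops iterating never reaches code placed after `yield*`.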

## Validation

```bash
bun run typecheck
CI=true GITHUB_ACTIONS=true bun test            # 3996 pass / 0 fail across 305 files
bun test src/__tests__/queryAutonomyProviderBoundary.test.ts \
         src/hooks/__tests__/useScheduledTasks.test.ts \
         src/utils/__tests__/autonomy{Runs,Flows,Authority,QueueLifecycle,Persistence}.test.ts \
         src/utils/processUserInput/__tests__/processSlashCommand.test.ts \
         tests/integration/autonomy-lifecycle-user-flow.test.ts
```

## Origin

This PR is the consolidated, upstream-targeted version of two fork-side review PRs (fix/loop-scheduled-autonomy-oom and fix/autonomy-lifecycle). The fork-side review history is preserved at https://github.com/amDosion/claude-code-bast/pull/7 . The fork's own internal `chore: keep fork current with upstream` sync commits and the `docs: update contributors` automation are intentionally not included in this PR.

The autonomy CLI handler `rootDir` threading that the fork added (78f64d8a, 98d04ddb) is intentionally omitted here. Upstream `a2cfaf91` (fix: resolve full-suite run failures in RemoteTriggerTool and autonomy tests) already made the equivalent change, with an additional `currentDir` option. Keeping the upstream version avoids regressing that improvement.
2026-04-29 14:04:27 +08:00

355 lines · 10 KiB · TypeScript

import type { Attachment } from '../../utils/attachments.js'
import type { Message } from '../../types/message.js'
import type { ToolUseContext } from '../../Tool.js'
import type { DiscoverySignal } from './signals.js'
import { isSkillSearchEnabled } from './featureCheck.js'
import {
  getSkillIndex,
  searchSkills,
  type SearchResult,
} from './localSearch.js'
import { normalizeQueryIntent } from './intentNormalize.js'
import { logForDebugging } from '../../utils/debug.js'
import { readFile } from 'node:fs/promises'
import { join } from 'node:path'
import { parseFrontmatter } from '../../utils/frontmatterParser.js'

/**
 * Per-session memoization to avoid re-emitting the same skill discovery /
 * gap signal twice. Each Set is bounded so that long-running sessions do not
 * accumulate skill names and signal keys forever (the original design was
 * session-scoped but unbounded).
 *
 * FIFO eviction by insertion order. Once the cap is hit, the oldest entries
 * roll off and may be re-recorded if rediscovered. That is the correct
 * degraded behaviour: at worst we re-emit a duplicate signal, and we never
 * silently drop a real one.
 */
const SESSION_TRACKING_MAX = 1000
const SESSION_TRACKING_TRIM_TO = 750
const discoveredThisSession = new Set<string>()
const recordedGapSignals = new Set<string>()

function addBoundedSessionEntry(set: Set<string>, value: string): void {
  set.add(value)
  if (set.size > SESSION_TRACKING_MAX) {
    const toDrop = set.size - SESSION_TRACKING_TRIM_TO
    const iter = set.values()
    for (let i = 0; i < toDrop; i++) {
      const next = iter.next()
      if (next.done) break
      set.delete(next.value)
    }
  }
}

const AUTO_LOAD_MIN_SCORE = Number(
  process.env.SKILL_SEARCH_AUTOLOAD_MIN_SCORE ?? '0.30',
)
const AUTO_LOAD_LIMIT = Number(process.env.SKILL_SEARCH_AUTOLOAD_LIMIT ?? '2')
const AUTO_LOAD_MAX_CHARS = Number(
  process.env.SKILL_SEARCH_AUTOLOAD_MAX_CHARS ?? '12000',
)

export function extractQueryFromMessages(
  input: string | null,
  messages: Message[],
): string {
  const parts: string[] = []
  if (input) parts.push(input)
  // Walk backward. In inter-turn prefetch the most recent 'user' message is
  // typically a tool_result (no text block), so we must keep walking until we
  // find a real user utterance with string content or a text block.
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i] as Record<string, unknown>
    if (msg.type !== 'user') continue
    const content = msg.content
    if (typeof content === 'string') {
      parts.push(content.slice(0, 500))
      break
    }
    if (Array.isArray(content)) {
      let foundText = false
      for (const block of content) {
        const entry = block as Record<string, unknown>
        // Skip tool_result and other non-text blocks: they carry no discovery
        // signal, and their `text` would be undefined here anyway.
        if (entry.type && entry.type !== 'text') continue
        const text = entry.text
        if (typeof text === 'string' && text.trim()) {
          parts.push(text.slice(0, 500))
          foundText = true
          break
        }
      }
      if (foundText) break
    }
  }
  return parts.join(' ')
}

function buildDiscoveryAttachment(
  skills: SkillDiscoveryResult[],
  signal: DiscoverySignal,
  gap?: SkillDiscoveryGap,
): Attachment {
  return {
    type: 'skill_discovery',
    skills,
    signal,
    source: 'native',
    gap,
  } as Attachment
}

type SkillDiscoveryResult = {
  name: string
  description: string
  shortId?: string
  score?: number
  autoLoaded?: boolean
  content?: string
  path?: string
}

type SkillDiscoveryGap = {
  key: string
  status: 'pending' | 'draft' | 'active'
  draftName?: string
  draftPath?: string
  activeName?: string
  activePath?: string
}

async function enrichResultsForAutoLoad(
  results: SearchResult[],
  context: ToolUseContext,
): Promise<SkillDiscoveryResult[]> {
  let loadedCount = 0
  const enriched: SkillDiscoveryResult[] = []
  for (const result of results) {
    const base: SkillDiscoveryResult = {
      name: result.name,
      description: result.description,
      score: result.score,
    }
    if (loadedCount >= AUTO_LOAD_LIMIT || result.score < AUTO_LOAD_MIN_SCORE) {
      enriched.push(base)
      continue
    }
    const loaded = await loadSkillContent(result)
    if (!loaded) {
      enriched.push(base)
      continue
    }
    loadedCount++
    await markAutoLoadedSkill(result.name, loaded.path, loaded.content, context)
    enriched.push({
      ...base,
      autoLoaded: true,
      content: loaded.content,
      path: loaded.path,
    })
  }
  return enriched
}

async function loadSkillContent(
  result: SearchResult,
): Promise<{ path: string; content: string } | null> {
  if (!result.skillRoot) return null
  const candidates = [
    join(result.skillRoot, 'SKILL.md'),
    join(result.skillRoot, 'skill.md'),
  ]
  for (const path of candidates) {
    try {
      const raw = await readFile(path, 'utf8')
      return {
        path,
        content: parseFrontmatter(raw).content.slice(0, AUTO_LOAD_MAX_CHARS),
      }
    } catch {
      // Try next candidate.
    }
  }
  return null
}

async function markAutoLoadedSkill(
  name: string,
  path: string,
  content: string,
  context: ToolUseContext,
): Promise<void> {
  try {
    const { addInvokedSkill } = await import('../../bootstrap/state.js')
    addInvokedSkill(name, path, content, context.agentId ?? null)
  } catch {
    // Best effort only.
  }
}

async function maybeRecordSkillGap(
  queryText: string,
  results: SearchResult[],
  context: ToolUseContext,
  trigger: DiscoverySignal['trigger'],
): Promise<SkillDiscoveryGap | undefined> {
  if (trigger !== 'user_input') return undefined
  if (!queryText.trim()) return undefined
  const gapSignalKey = `${trigger}:${queryText.trim().toLowerCase()}`
  if (recordedGapSignals.has(gapSignalKey)) return undefined
  addBoundedSessionEntry(recordedGapSignals, gapSignalKey)
  try {
    const [{ isSkillLearningEnabled }, { recordSkillGap }] = await Promise.all([
      import('../skillLearning/featureCheck.js'),
      import('../skillLearning/skillGapStore.js'),
    ])
    if (!isSkillLearningEnabled()) return undefined
    const gap = await recordSkillGap({
      prompt: queryText,
      cwd:
        ((context as Record<string, unknown>).cwd as string) ?? process.cwd(),
      sessionId:
        ((context as Record<string, unknown>).sessionId as string) ??
        'unknown-session',
      recommendations: results,
    })
    const status = gap.status
    if (status !== 'pending' && status !== 'draft' && status !== 'active') {
      return undefined
    }
    return {
      key: gap.key,
      status,
      draftName: gap.draft?.name,
      draftPath: gap.draft?.skillPath,
      activeName: gap.active?.name,
      activePath: gap.active?.skillPath,
    }
  } catch (error) {
    logForDebugging(`[skill-search] skill gap learning error: ${error}`)
    return undefined
  }
}

export async function startSkillDiscoveryPrefetch(
  input: string | null,
  messages: Message[],
  toolUseContext: ToolUseContext,
): Promise<Attachment[]> {
  if (!isSkillSearchEnabled()) return []
  const startedAt = Date.now()
  const queryText = extractQueryFromMessages(input, messages)
  if (!queryText.trim()) return []
  try {
    const cwd =
      ((toolUseContext as Record<string, unknown>).cwd as string) ??
      process.cwd()
    const index = await getSkillIndex(cwd)
    const results = searchSkills(queryText, index)
    const newResults = results.filter(r => !discoveredThisSession.has(r.name))
    if (newResults.length === 0) return []
    for (const r of newResults) addBoundedSessionEntry(discoveredThisSession, r.name)
    const signal: DiscoverySignal = {
      trigger: 'assistant_turn',
      queryText: queryText.slice(0, 200),
      startedAt,
      durationMs: Date.now() - startedAt,
      indexSize: index.length,
      method: 'tfidf',
    }
    logForDebugging(
      `[skill-search] prefetch found ${newResults.length} skills in ${signal.durationMs}ms`,
    )
    return [
      buildDiscoveryAttachment(
        await enrichResultsForAutoLoad(newResults, toolUseContext),
        signal,
      ),
    ]
  } catch (error) {
    logForDebugging(`[skill-search] prefetch error: ${error}`)
    return []
  }
}

export async function collectSkillDiscoveryPrefetch(
  pending: Promise<Attachment[]>,
): Promise<Attachment[]> {
  try {
    return await pending
  } catch {
    return []
  }
}

export async function getTurnZeroSkillDiscovery(
  input: string,
  messages: Message[],
  context: ToolUseContext,
): Promise<Attachment | null> {
  if (!isSkillSearchEnabled()) return null
  if (!input.trim()) return null
  const startedAt = Date.now()
  try {
    const cwd =
      ((context as Record<string, unknown>).cwd as string) ?? process.cwd()
    const index = await getSkillIndex(cwd)
    // Intent normalization (feature-flagged, ASCII-only fast path, graceful
    // fallback to the original query). Turn-zero is the one blocking entry
    // point, so an extra Haiku call is acceptable here: a bad match at this
    // point pollutes the LLM's context for the entire session.
    const searchQuery = await normalizeQueryIntent(input)
    const results = searchSkills(searchQuery, index)
    const enriched = await enrichResultsForAutoLoad(results, context)
    const gap = enriched.some(result => result.autoLoaded)
      ? undefined
      : await maybeRecordSkillGap(input, results, context, 'user_input')
    if (results.length === 0 && !gap) return null
    for (const r of results) addBoundedSessionEntry(discoveredThisSession, r.name)
    const signal: DiscoverySignal = {
      trigger: 'user_input',
      queryText: input.slice(0, 200),
      startedAt,
      durationMs: Date.now() - startedAt,
      indexSize: index.length,
      method: 'tfidf',
    }
    logForDebugging(
      `[skill-search] turn-zero found ${results.length} skills in ${signal.durationMs}ms`,
    )
    return buildDiscoveryAttachment(enriched, signal, gap)
  } catch (error) {
    logForDebugging(`[skill-search] turn-zero error: ${error}`)
    return null
  }
}