Mirror of https://github.com/claude-code-best/claude-code.git (synced 2026-06-17 05:45:51 +00:00)
This PR consolidates a coordinated batch of fixes around autonomy run/flow lifecycle, scheduled task deduplication, provider-boundary state finalization, and matching memory-bound treatments for adjacent long-running subsystems (REPL fullscreen scrollback, skill-search/skill-learning runtime activation). All changes were developed and reviewed together because they touched the same lifecycle invariants and were uncovered by the same long-running session reproductions.
## Lifecycle correctness
- Queued autonomy prompts are not injected unless the persisted run was successfully claimed; queued-run claiming is now terminal-safe, so a run that has already been consumed, cancelled, or failed cannot slip back into `queued`.
- Autonomy run/flow finalization happens on completion, provider error, generator close, and cancellation — not just the happy path. New `src/__tests__/queryAutonomyProviderBoundary.test.ts` covers these provider-boundary transitions.
- `requestManagedAutonomyFlowCancel` and `resumeManagedAutonomyFlowPrompt` carry `rootDir` and `currentDir` explicitly across detached async boundaries (proactive-tick, cron, daemon restart) instead of inferring them from process state.
- Active runs/flows are protected from janitor pruning so a running step cannot be garbage-collected mid-flight (`src/utils/autonomyAuthority.ts`).
- The heartbeat parser now ignores fenced code blocks; the two-phase commit window for autonomy state transitions is documented in `docs/internals/autonomy-jira.md`.
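The claim-then-inject rule and terminal-safe claiming described above can be sketched as a small state check. This is a minimal in-memory illustration, not the code in `src/utils/autonomyRuns.ts`. The `RunRecord` shape, the status names, and the helpers `claimQueuedRun`, `transition`, and `maybeInjectQueuedPrompt` are all hypothetical. A real implementation would perform the claim as an atomic update of persisted state rather than a field mutation.

```typescript
// Hypothetical run model; the persisted record in src/utils/autonomyRuns.ts
// is not shown in this PR and likely differs.
type RunStatus = 'queued' | 'running' | 'completed' | 'failed' | 'cancelled'

interface RunRecord {
  id: string
  status: RunStatus
}

const TERMINAL: ReadonlySet<RunStatus> = new Set<RunStatus>([
  'completed',
  'failed',
  'cancelled',
])

// A claim succeeds only while the run is still 'queued'. A running run is
// not claimed twice, and a terminal run is never moved back.
function claimQueuedRun(run: RunRecord): boolean {
  if (run.status !== 'queued') return false
  run.status = 'running'
  return true
}

// Any transition out of a terminal state is rejected.
function transition(run: RunRecord, next: RunStatus): boolean {
  if (TERMINAL.has(run.status)) return false
  run.status = next
  return true
}

// The queued prompt is injected only when this caller won the claim.
function maybeInjectQueuedPrompt(
  run: RunRecord,
  inject: (runId: string) => void,
): boolean {
  if (!claimQueuedRun(run)) return false
  inject(run.id)
  return true
}
```

Because the claim only ever moves `queued` to `running`, and `transition` refuses to leave a terminal state, a cancelled or failed run cannot be re-queued and then injected by a later tick.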
## Ownership and dedup
- `src/utils/autonomyRuns.ts`: ownership stamping (run id + rootDir carried end-to-end), source-based dedup against active runs.
- `src/hooks/useScheduledTasks.ts`: scheduled ticks deduplicate against runs already active on the same source label.
- `src/utils/processUserInput/processSlashCommand.tsx`: forked slash commands now thread the autonomy `runId` so completion finalizers can find the originating run for deferred completion.
- New `src/utils/autonomyQueueLifecycle.ts` and tests collect the queue-side lifecycle invariants in one place.
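Source-based dedup for scheduled ticks amounts to checking for an active run with the same source label before enqueueing another one. The sketch below assumes a hypothetical `OwnedRun` record that is stamped with `runId` and `rootDir` when it is created. `enqueueScheduledTick` and the `newRunId` callback are illustrative names. The real types in `src/utils/autonomyRuns.ts` and `src/hooks/useScheduledTasks.ts` are not shown in this PR and may differ.

```typescript
type RunStatus = 'queued' | 'running' | 'completed' | 'failed' | 'cancelled'

// Hypothetical run record, stamped with its owner (runId + rootDir) at creation.
interface OwnedRun {
  runId: string
  rootDir: string
  sourceLabel: string
  status: RunStatus
}

const ACTIVE: ReadonlySet<RunStatus> = new Set<RunStatus>(['queued', 'running'])

function isActive(run: OwnedRun): boolean {
  return ACTIVE.has(run.status)
}

// A tick for `sourceLabel` is skipped while an earlier run for the same label
// is still queued or running, so a slow run cannot pile up duplicate ticks.
function enqueueScheduledTick(
  runs: OwnedRun[],
  sourceLabel: string,
  rootDir: string,
  newRunId: () => string,
): OwnedRun | null {
  if (runs.some(run => isActive(run) && run.sourceLabel === sourceLabel)) {
    return null
  }
  const run: OwnedRun = { runId: newRunId(), rootDir, sourceLabel, status: 'queued' }
  runs.push(run)
  return run
}
```

Once the earlier run reaches a terminal status, the next tick for that label is accepted again. Dedup therefore suppresses overlapping runs rather than the schedule itself.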
## Memory bounds (related, same review pass)
- `src/screens/REPL.tsx`: caps fullscreen scrollback after the compact boundary and updates trailing progress rows in place. Long-running fullscreen sessions could otherwise retain thousands of post-compaction messages and duplicate progress rows, keeping Ink trees alive long after their useful context had moved on.
- `src/services/skillSearch/*` and `src/services/skillLearning/*`: runtime activation is strictly opt-in via existing env toggles; session caches are capped so long-running processes cannot grow them without bound. Build presence is preserved so operators can still discover and opt into the slash commands.
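The REPL change combines two ideas. A new progress update replaces the trailing progress row instead of being appended as a duplicate, and only the newest N items are kept after the compact boundary. A minimal sketch follows. The `ScrollbackItem` union, `appendToScrollback`, and the cap of 500 are hypothetical and are not taken from `src/screens/REPL.tsx`. The sketch assumes `items` already holds only post-compaction entries.

```typescript
// Hypothetical scrollback entries; REPL.tsx's real message types are not shown here.
type ScrollbackItem =
  | { kind: 'message'; id: string; text: string }
  | { kind: 'progress'; toolUseId: string; text: string }

// Assumed cap for illustration only, not the value REPL.tsx uses.
const MAX_POST_COMPACT_ITEMS = 500

function appendToScrollback(
  items: readonly ScrollbackItem[],
  next: ScrollbackItem,
  max = MAX_POST_COMPACT_ITEMS,
): ScrollbackItem[] {
  const last = items[items.length - 1]
  // A progress update for the same tool use replaces the trailing row in place.
  if (
    next.kind === 'progress' &&
    last?.kind === 'progress' &&
    last.toolUseId === next.toolUseId
  ) {
    return [...items.slice(0, -1), next]
  }
  const appended = [...items, next]
  // Keep only the newest `max` items; older ones are dropped from the front.
  return appended.length > max ? appended.slice(appended.length - max) : appended
}
```

Returning a new array on each call fits React/Ink state updates, where the previous array must not be mutated.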
## CI / test contract
- `tests/integration/dependency-overrides.test.ts`: smoke test no longer drives Mermaid's browser renderer; it validates the package-resolution contract directly so CI does not regress on unrelated browser timing.
- New `tests/integration/autonomy-lifecycle-user-flow.test.ts`: end-to-end CLI subprocess flow exercising `status --deep`, `flows`, `flow <id>`, `flow resume`, `flow cancel` against persisted state.
- `src/entrypoints/cli.tsx`: `claude autonomy …` routes through an entrypoint fast path that reuses the slash-command formatter without booting the full interactive CLI. Stdout is flushed before forced exit so coverage subprocesses do not terminate with empty stdout.
- `packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts`: stabilized to prevent audit flake under coverage.
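The stdout flush before a forced exit can be expressed with the callback form of `process.stdout.write`, which is invoked once the chunk has been handed off. Depending on platform and runtime, writes to a pipe (as in a coverage subprocess) may complete asynchronously, so calling `process.exit()` immediately after `write()` can drop buffered output. The helper names `writeStdout` and `printAndExit` are hypothetical. This is a sketch, not the code in `src/entrypoints/cli.tsx`.

```typescript
// Resolves once `text` has been flushed by process.stdout (or rejects on a
// write error), so a subsequent forced exit cannot truncate it.
function writeStdout(text: string): Promise<void> {
  return new Promise((resolve, reject) => {
    process.stdout.write(text, err => (err ? reject(err) : resolve()))
  })
}

// Print the formatted subcommand output, then exit without waiting for any
// unrelated handles that a full interactive bootstrap might have opened.
async function printAndExit(output: string, code = 0): Promise<void> {
  await writeStdout(output)
  process.exit(code)
}
```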
## Tests added
- `src/__tests__/queryAutonomyProviderBoundary.test.ts`
- `src/hooks/__tests__/useScheduledTasks.test.ts`
- `src/utils/__tests__/autonomyAuthority.test.ts`
- `src/utils/__tests__/autonomyFlows.test.ts` (extended)
- `src/utils/__tests__/autonomyPersistence.test.ts` (extended)
- `src/utils/__tests__/autonomyQueueLifecycle.test.ts`
- `src/utils/__tests__/autonomyRuns.test.ts` (extended)
- `src/utils/processUserInput/__tests__/processSlashCommand.test.ts`
- `tests/integration/autonomy-lifecycle-user-flow.test.ts`
## Docs
- `docs/agent/sur-loop-scheduled-oom.md`: System Understanding Report covering the scheduled/loop OOM problem, the call graphs investigated, and the lifecycle invariants this PR establishes.
- `docs/agent/sur-skill-overflow-bugs.md`: SUR for the related skill-overflow context.
- `docs/internals/autonomy-jira.md`: documents the two-phase commit window and ownership stamping invariants.
- `docs/memory-leak-audit.md`: audit notes covering the REPL/scrollback and skill-search bounds.
## Invariants this PR establishes
1. Queued autonomy prompts are not injected unless the persisted run was successfully claimed.
2. Terminal run/flow states are terminal — completion, failure, and cancellation all finalize state regardless of which provider/error path triggered them.
3. Autonomy run/flow `rootDir` is carried explicitly across detached async boundaries instead of being inferred from a shared singleton.
4. State-only CLI subcommands (`autonomy status|runs|flows|flow …`) bypass full interactive bootstrap so they do not hold unrelated handles open.
5. REPL fullscreen scrollback and skill-search/skill-learning session caches are explicitly bounded.
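Invariant 2 maps naturally onto `try`/`catch`/`finally` around the provider stream. The sketch below is hypothetical. `withRunFinalization`, the `Outcome` type, and the choice to record an early generator close as `cancelled` are assumptions for illustration, not the finalizers this PR adds. It relies on standard async-generator semantics: when a consumer calls `return()` on a suspended generator, the generator's `finally` block still runs.

```typescript
// Hypothetical outcome type; the real finalizers are not shown in this PR.
type Outcome = 'completed' | 'failed' | 'cancelled'

// Wraps a provider stream so `finalize` runs exactly once on every exit path:
// normal completion, a thrown provider error, the consumer closing the
// generator via return(), or an abort signal.
async function* withRunFinalization<T>(
  stream: AsyncIterable<T>,
  signal: AbortSignal,
  finalize: (outcome: Outcome) => void,
): AsyncGenerator<T, void, undefined> {
  // Default covers early exits: an abort, or the consumer closing the generator.
  let outcome: Outcome = 'cancelled'
  try {
    for await (const chunk of stream) {
      if (signal.aborted) return
      yield chunk
    }
    outcome = 'completed'
  } catch (error) {
    outcome = 'failed'
    throw error
  } finally {
    finalize(outcome)
  }
}
```

Every exit path, including an exception thrown by the provider, passes through the single `finally` block. As a result, there is exactly one `finalize` call per stream.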
## Validation
```bash
bun run typecheck
CI=true GITHUB_ACTIONS=true bun test # 3996 pass / 0 fail across 305 files
bun test src/__tests__/queryAutonomyProviderBoundary.test.ts \
src/hooks/__tests__/useScheduledTasks.test.ts \
src/utils/__tests__/autonomy{Runs,Flows,Authority,QueueLifecycle,Persistence}.test.ts \
src/utils/processUserInput/__tests__/processSlashCommand.test.ts \
tests/integration/autonomy-lifecycle-user-flow.test.ts
```
## Origin
This PR is the consolidated, upstream-targeted version of two fork-side review PRs (fix/loop-scheduled-autonomy-oom and fix/autonomy-lifecycle). The fork-side review history is preserved at https://github.com/amDosion/claude-code-bast/pull/7. The fork's own internal `chore: keep fork current with upstream` sync commits and the `docs: update contributors` automation are intentionally not included in this PR.
The autonomy CLI handler `rootDir` threading that the fork added (78f64d8a, 98d04ddb) is intentionally omitted here because upstream `a2cfaf91` ("fix: fix full-suite run failures in the RemoteTriggerTool and autonomy tests") already performed the equivalent change with an additional `currentDir` option. Keeping the upstream version avoids regressing that improvement.
355 lines
10 KiB
TypeScript
import type { Attachment } from '../../utils/attachments.js'
import type { Message } from '../../types/message.js'
import type { ToolUseContext } from '../../Tool.js'
import type { DiscoverySignal } from './signals.js'
import { isSkillSearchEnabled } from './featureCheck.js'
import {
  getSkillIndex,
  searchSkills,
  type SearchResult,
} from './localSearch.js'
import { normalizeQueryIntent } from './intentNormalize.js'
import { logForDebugging } from '../../utils/debug.js'
import { readFile } from 'node:fs/promises'
import { join } from 'node:path'
import { parseFrontmatter } from '../../utils/frontmatterParser.js'

/**
 * Per-session memoization to avoid re-emitting the same skill discovery /
 * gap signal twice. Each Set is bounded to keep long-running sessions from
 * monotonically accumulating skill names and signal keys forever (which
 * was the original session-scoped-but-unbounded design).
 *
 * FIFO eviction by insertion order — once the cap is hit, the oldest
 * entries roll off and may be re-recorded if rediscovered, which is the
 * correct degraded behaviour: at worst we re-emit a duplicate signal,
 * never silently drop a real one.
 */
const SESSION_TRACKING_MAX = 1000
const SESSION_TRACKING_TRIM_TO = 750
const discoveredThisSession = new Set<string>()
const recordedGapSignals = new Set<string>()

function addBoundedSessionEntry(set: Set<string>, value: string): void {
  set.add(value)
  if (set.size > SESSION_TRACKING_MAX) {
    const toDrop = set.size - SESSION_TRACKING_TRIM_TO
    const iter = set.values()
    for (let i = 0; i < toDrop; i++) {
      const next = iter.next()
      if (next.done) break
      set.delete(next.value)
    }
  }
}

const AUTO_LOAD_MIN_SCORE = Number(
  process.env.SKILL_SEARCH_AUTOLOAD_MIN_SCORE ?? '0.30',
)
const AUTO_LOAD_LIMIT = Number(process.env.SKILL_SEARCH_AUTOLOAD_LIMIT ?? '2')
const AUTO_LOAD_MAX_CHARS = Number(
  process.env.SKILL_SEARCH_AUTOLOAD_MAX_CHARS ?? '12000',
)

export function extractQueryFromMessages(
  input: string | null,
  messages: Message[],
): string {
  const parts: string[] = []

  if (input) parts.push(input)

  // Walk backward. In inter-turn prefetch the most recent 'user' message is
  // typically a tool_result (no text block), so we must keep walking until we
  // find a real user utterance with string content or a text block.
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i] as Record<string, unknown>
    if (msg.type !== 'user') continue
    const content = msg.content
    if (typeof content === 'string') {
      parts.push(content.slice(0, 500))
      break
    }
    if (Array.isArray(content)) {
      let foundText = false
      for (const block of content) {
        const entry = block as Record<string, unknown>
        // Skip tool_result and other non-text blocks — they carry no discovery
        // signal and would return undefined here regardless.
        if (entry.type && entry.type !== 'text') continue
        const text = entry.text
        if (typeof text === 'string' && text.trim()) {
          parts.push(text.slice(0, 500))
          foundText = true
          break
        }
      }
      if (foundText) break
    }
  }

  return parts.join(' ')
}

function buildDiscoveryAttachment(
  skills: SkillDiscoveryResult[],
  signal: DiscoverySignal,
  gap?: SkillDiscoveryGap,
): Attachment {
  return {
    type: 'skill_discovery',
    skills,
    signal,
    source: 'native',
    gap,
  } as Attachment
}

type SkillDiscoveryResult = {
  name: string
  description: string
  shortId?: string
  score?: number
  autoLoaded?: boolean
  content?: string
  path?: string
}

type SkillDiscoveryGap = {
  key: string
  status: 'pending' | 'draft' | 'active'
  draftName?: string
  draftPath?: string
  activeName?: string
  activePath?: string
}

async function enrichResultsForAutoLoad(
  results: SearchResult[],
  context: ToolUseContext,
): Promise<SkillDiscoveryResult[]> {
  let loadedCount = 0
  const enriched: SkillDiscoveryResult[] = []

  for (const result of results) {
    const base: SkillDiscoveryResult = {
      name: result.name,
      description: result.description,
      score: result.score,
    }

    if (loadedCount >= AUTO_LOAD_LIMIT || result.score < AUTO_LOAD_MIN_SCORE) {
      enriched.push(base)
      continue
    }

    const loaded = await loadSkillContent(result)
    if (!loaded) {
      enriched.push(base)
      continue
    }

    loadedCount++
    await markAutoLoadedSkill(result.name, loaded.path, loaded.content, context)
    enriched.push({
      ...base,
      autoLoaded: true,
      content: loaded.content,
      path: loaded.path,
    })
  }

  return enriched
}

async function loadSkillContent(
  result: SearchResult,
): Promise<{ path: string; content: string } | null> {
  if (!result.skillRoot) return null

  const candidates = [
    join(result.skillRoot, 'SKILL.md'),
    join(result.skillRoot, 'skill.md'),
  ]

  for (const path of candidates) {
    try {
      const raw = await readFile(path, 'utf8')
      return {
        path,
        content: parseFrontmatter(raw).content.slice(0, AUTO_LOAD_MAX_CHARS),
      }
    } catch {
      // Try next candidate.
    }
  }
  return null
}

async function markAutoLoadedSkill(
  name: string,
  path: string,
  content: string,
  context: ToolUseContext,
): Promise<void> {
  try {
    const { addInvokedSkill } = await import('../../bootstrap/state.js')
    addInvokedSkill(name, path, content, context.agentId ?? null)
  } catch {
    // Best effort only.
  }
}

async function maybeRecordSkillGap(
  queryText: string,
  results: SearchResult[],
  context: ToolUseContext,
  trigger: DiscoverySignal['trigger'],
): Promise<SkillDiscoveryGap | undefined> {
  if (trigger !== 'user_input') return undefined
  if (!queryText.trim()) return undefined

  const gapSignalKey = `${trigger}:${queryText.trim().toLowerCase()}`
  if (recordedGapSignals.has(gapSignalKey)) return undefined
  addBoundedSessionEntry(recordedGapSignals, gapSignalKey)

  try {
    const [{ isSkillLearningEnabled }, { recordSkillGap }] = await Promise.all([
      import('../skillLearning/featureCheck.js'),
      import('../skillLearning/skillGapStore.js'),
    ])
    if (!isSkillLearningEnabled()) return undefined
    const gap = await recordSkillGap({
      prompt: queryText,
      cwd:
        ((context as Record<string, unknown>).cwd as string) ?? process.cwd(),
      sessionId:
        ((context as Record<string, unknown>).sessionId as string) ??
        'unknown-session',
      recommendations: results,
    })
    const status = gap.status
    if (status !== 'pending' && status !== 'draft' && status !== 'active') {
      return undefined
    }
    return {
      key: gap.key,
      status,
      draftName: gap.draft?.name,
      draftPath: gap.draft?.skillPath,
      activeName: gap.active?.name,
      activePath: gap.active?.skillPath,
    }
  } catch (error) {
    logForDebugging(`[skill-search] skill gap learning error: ${error}`)
    return undefined
  }
}

export async function startSkillDiscoveryPrefetch(
  input: string | null,
  messages: Message[],
  toolUseContext: ToolUseContext,
): Promise<Attachment[]> {
  if (!isSkillSearchEnabled()) return []

  const startedAt = Date.now()
  const queryText = extractQueryFromMessages(input, messages)
  if (!queryText.trim()) return []

  try {
    const cwd =
      ((toolUseContext as Record<string, unknown>).cwd as string) ??
      process.cwd()
    const index = await getSkillIndex(cwd)
    const results = searchSkills(queryText, index)

    const newResults = results.filter(r => !discoveredThisSession.has(r.name))
    if (newResults.length === 0) return []

    for (const r of newResults) addBoundedSessionEntry(discoveredThisSession, r.name)

    const signal: DiscoverySignal = {
      trigger: 'assistant_turn',
      queryText: queryText.slice(0, 200),
      startedAt,
      durationMs: Date.now() - startedAt,
      indexSize: index.length,
      method: 'tfidf',
    }

    logForDebugging(
      `[skill-search] prefetch found ${newResults.length} skills in ${signal.durationMs}ms`,
    )

    return [
      buildDiscoveryAttachment(
        await enrichResultsForAutoLoad(newResults, toolUseContext),
        signal,
      ),
    ]
  } catch (error) {
    logForDebugging(`[skill-search] prefetch error: ${error}`)
    return []
  }
}

export async function collectSkillDiscoveryPrefetch(
  pending: Promise<Attachment[]>,
): Promise<Attachment[]> {
  try {
    return await pending
  } catch {
    return []
  }
}

export async function getTurnZeroSkillDiscovery(
  input: string,
  messages: Message[],
  context: ToolUseContext,
): Promise<Attachment | null> {
  if (!isSkillSearchEnabled()) return null
  if (!input.trim()) return null

  const startedAt = Date.now()

  try {
    const cwd =
      ((context as Record<string, unknown>).cwd as string) ?? process.cwd()
    const index = await getSkillIndex(cwd)
    // Intent normalization (feature-flagged, ASCII-only fast path, graceful
    // fallback to original). Turn-zero is the one blocking entry — acceptable
    // to add a Haiku call here since a bad match here pollutes the LLM's
    // context for the entire session.
    const searchQuery = await normalizeQueryIntent(input)
    const results = searchSkills(searchQuery, index)
    const enriched = await enrichResultsForAutoLoad(results, context)
    const gap = enriched.some(result => result.autoLoaded)
      ? undefined
      : await maybeRecordSkillGap(input, results, context, 'user_input')

    if (results.length === 0 && !gap) return null

    for (const r of results) addBoundedSessionEntry(discoveredThisSession, r.name)

    const signal: DiscoverySignal = {
      trigger: 'user_input',
      queryText: input.slice(0, 200),
      startedAt,
      durationMs: Date.now() - startedAt,
      indexSize: index.length,
      method: 'tfidf',
    }

    logForDebugging(
      `[skill-search] turn-zero found ${results.length} skills in ${signal.durationMs}ms`,
    )

    return buildDiscoveryAttachment(enriched, signal, gap)
  } catch (error) {
    logForDebugging(`[skill-search] turn-zero error: ${error}`)
    return null
  }
}
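The eviction loop in `addBoundedSessionEntry` depends on `Set` iterating in insertion order, which JavaScript guarantees. The standalone copy below turns the limits into parameters so the FIFO behaviour is easy to check by hand. The name `addBounded` and the small limits (max 4, trim to 2) are hypothetical and used only for illustration.

```typescript
// Standalone copy of the eviction loop above, with the cap and trim target
// passed in so they can be made small for illustration.
function addBounded(
  set: Set<string>,
  value: string,
  max: number,
  trimTo: number,
): void {
  set.add(value)
  if (set.size > max) {
    // Drop the oldest entries (Set iterates in insertion order) down to trimTo.
    const toDrop = set.size - trimTo
    const iter = set.values()
    for (let i = 0; i < toDrop; i++) {
      const next = iter.next()
      if (next.done) break
      set.delete(next.value)
    }
  }
}

const seen = new Set<string>()
for (const name of ['a', 'b', 'c', 'd', 'e']) addBounded(seen, name, 4, 2)
// Adding 'e' pushed the size to 5, so the three oldest ('a', 'b', 'c') were
// dropped and `seen` now holds only 'd' and 'e'.
console.log([...seen])
```

Re-adding a value that is already present does not move it to the end, so the policy is first-in-first-out rather than least-recently-used. A skill name that keeps being rediscovered is still evicted once it becomes one of the oldest entries, after which it can be recorded (and signalled) again.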