refactor: simplify/reuse/harden - clean up PR #386 audit findings

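Simplification (S1, S2):
- src/cli/print.ts: extract a local dispatchHeadlessCronCommand helper that
  centralizes the "dedup-claim → input-close-recheck → onSuccess" pipeline
  shared by the three cron entry points (onFire / onFireTask agent /
  onFireTask non-agent), so the three branches cannot drift in how they
  handle inputClosed firing between claim and dispatch.
  enqueueAndRun is also extracted so the two non-agent branches share a
  single onSuccess callback. Removes about 55 lines of duplicated boilerplate.
- src/utils/autonomyPersistence.ts: add a generic retainActiveFirst<T>
  helper. Active records are kept unconditionally (exempt from the cap);
  inactive records fill the remaining budget in descending timestamp order.
  This unifies the two-phase sort semantics of
  selectPersistedAutonomyRuns / Flows.
- src/utils/autonomyRuns.ts, autonomyFlows.ts: switch to retainActiveFirst
  and delete the duplicated inline two-phase sort logic.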
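
The shared pipeline described for dispatchHeadlessCronCommand can be sketched roughly as below. This is an illustration only: the real helper in src/cli/print.ts is not shown in this commit view, and DispatchDeps, DispatchOutcome, claim, and isInputClosed are hypothetical names chosen for the sketch.

```typescript
// Hypothetical shape of the collaborators; not taken from the real code.
type DispatchDeps = {
  claim: (key: string) => boolean // dedup-claim: true only for the first claimant
  isInputClosed: () => boolean // re-checked after the claim succeeds
  onSuccess: () => void // branch-specific work, run only if both checks pass
}

type DispatchOutcome = 'duplicate' | 'input-closed' | 'dispatched'

function dispatchHeadlessCronCommand(
  key: string,
  deps: DispatchDeps,
): DispatchOutcome {
  // Step 1: dedup-claim. A second fire for the same key is dropped.
  if (!deps.claim(key)) return 'duplicate'
  // Step 2: input may have closed between the claim and the dispatch.
  if (deps.isInputClosed()) return 'input-closed'
  // Step 3: only now run the caller's onSuccess.
  deps.onSuccess()
  return 'dispatched'
}

// Usage: one in-memory claim set shared by every entry point.
const claimedKeys = new Set<string>()
let inputClosed = false
const ranKeys: string[] = []

const makeDeps = (key: string): DispatchDeps => ({
  claim: k => {
    if (claimedKeys.has(k)) return false
    claimedKeys.add(k)
    return true
  },
  isInputClosed: () => inputClosed,
  onSuccess: () => {
    ranKeys.push(key)
  },
})

const firstOutcome = dispatchHeadlessCronCommand('job-1', makeDeps('job-1'))
const repeatOutcome = dispatchHeadlessCronCommand('job-1', makeDeps('job-1'))
inputClosed = true
const closedOutcome = dispatchHeadlessCronCommand('job-2', makeDeps('job-2'))
```

Keeping the three steps in one function means each entry point only supplies its own onSuccess, so the ordering of the checks cannot diverge between branches.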

Reuse (R1, review #8):
- tests/mocks/file-system.ts: add two Bun.file wrappers, readTempFile and
  tempPathExists, as the Bun-only equivalents of Node's fs.readFileSync /
  existsSync for use in tests.
- src/utils/__tests__/autonomyRuns.test.ts: replace all Node fs/path imports
  (existsSync, readFileSync, mkdir, writeFile, path.join/resolve) with the
  shared helpers from tests/mocks/file-system plus node:path (with the node:
  prefix). The 6 mkdir + writeFile boilerplate sites are gone; they all use
  writeTempFile now (which does its own mkdir -p).
  Resolves the Bun-only runtime contract violation from review #8 (Major).

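Defense (D1, early OOM signal):
- src/services/compact/postCompactCleanup.ts: append .catch(logError) to the
  end of the void import().then() chain. attributionHooks is currently a
  stub, but once the real implementation is restored and
  sweepFileContentCache throws, this .catch keeps the failure from becoming
  an unhandled rejection (the function returns void, so callers have no way
  to observe the async failure).
- src/utils/autonomyRuns.ts: add a soft limit of 100 on active runs plus a
  one-time warning. selectPersistedAutonomyRuns still never evicts active
  records, but calls logError once when the threshold is crossed, as an
  early finalize-leak signal, so that unbounded active growth cannot
  silently defeat AUTONOMY_RUNS_MAX.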
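
The fire-and-forget pattern from the postCompactCleanup.ts item can be sketched as follows. This is a minimal illustration, not the actual file: loadHooks stands in for the dynamic import(), and logError here is a local stub that records what it receives. Both are assumptions made so the sketch is self-contained.

```typescript
// Stub logger for the sketch; the real logError lives in the codebase.
const loggedErrors: unknown[] = []
const logError = (err: unknown): void => {
  loggedErrors.push(err)
}

type HooksModule = { sweepFileContentCache: () => void }

// Returns void, so the caller cannot await or catch anything.
function runPostCompactCleanup(loadHooks: () => Promise<HooksModule>): void {
  void loadHooks()
    .then(mod => mod.sweepFileContentCache())
    // Without this trailing .catch, a throw inside .then() would surface
    // as an unhandled rejection instead of a logged error.
    .catch(logError)
}

// Hypothetical loader whose sweep throws, simulating a restored
// implementation that fails.
const failingLoader = async (): Promise<HooksModule> => ({
  sweepFileContentCache: () => {
    throw new Error('sweep failed')
  },
})

runPostCompactCleanup(failingLoader)
```

Because the function's return type is void, the trailing .catch is the only place the failure can be observed.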
Claude
2026-04-29 13:23:41 +00:00
parent 6b7cfda9b1
commit 7a6e65caf7
7 changed files with 190 additions and 143 deletions


@@ -27,12 +27,22 @@ import {
type AutonomyFlowSyncMode,
type ManagedAutonomyFlowStepDefinition,
} from './autonomyFlows.js'
import { withAutonomyPersistenceLock } from './autonomyPersistence.js'
import {
retainActiveFirst,
withAutonomyPersistenceLock,
} from './autonomyPersistence.js'
import { getFsImplementation } from './fsOperations.js'
import { isProcessRunning } from './genericProcessUtils.js'
import { logError } from './log.js'
const AUTONOMY_RUNS_MAX = 200
// Diagnostic threshold for active (queued/running) runs. Active records are
// deliberately exempt from AUTONOMY_RUNS_MAX so a leak in finalization cannot
// silently evict in-flight work; that exemption only makes sense if a leak is
// loud when it appears. Crossing this threshold warns once per process so
// operators see the divergence in logs before runs.json grows pathologically.
const AUTONOMY_ACTIVE_RUNS_WARN_THRESHOLD = 100
let warnedActiveRunsThresholdCrossed = false
const AUTONOMY_RUNS_RELATIVE_PATH = join(AUTONOMY_DIR, 'runs.json')
// Sentinel string surfaced to operators via runs.json error fields and
// referenced literally by the HEARTBEAT.md `stale-recovery-health` task.
@@ -130,17 +140,24 @@ function isAutonomyRunActive(run: AutonomyRunRecord): boolean {
function selectPersistedAutonomyRuns(
runs: AutonomyRunRecord[],
): AutonomyRunRecord[] {
const cloned = runs.slice().map(cloneRunRecord)
const active = cloned
.filter(isAutonomyRunActive)
.sort((left, right) => right.createdAt - left.createdAt)
const history = cloned
.filter(run => !isAutonomyRunActive(run))
.sort((left, right) => right.createdAt - left.createdAt)
.slice(0, Math.max(0, AUTONOMY_RUNS_MAX - active.length))
return [...active, ...history].sort(
(left, right) => right.createdAt - left.createdAt,
const cloned = runs.map(cloneRunRecord)
const activeCount = cloned.filter(isAutonomyRunActive).length
if (
!warnedActiveRunsThresholdCrossed &&
activeCount >= AUTONOMY_ACTIVE_RUNS_WARN_THRESHOLD
) {
warnedActiveRunsThresholdCrossed = true
logError(
new Error(
`autonomy: ${activeCount} active runs exceed warn threshold ${AUTONOMY_ACTIVE_RUNS_WARN_THRESHOLD}; check for finalize leaks`,
),
)
}
return retainActiveFirst(
cloned,
isAutonomyRunActive,
run => run.createdAt,
AUTONOMY_RUNS_MAX,
)
}
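
The retainActiveFirst helper itself is not part of this file's diff. A minimal sketch of what it might look like follows, assuming it keeps the behavior of the inline logic removed above and takes its arguments in the order used at the call site. The parameter names and the Run sample type are hypothetical.

```typescript
// Hypothetical reconstruction; the real helper is in autonomyPersistence.ts.
function retainActiveFirst<T>(
  records: readonly T[],
  isActive: (record: T) => boolean,
  getTimestamp: (record: T) => number,
  max: number,
): T[] {
  const byNewest = (a: T, b: T): number => getTimestamp(b) - getTimestamp(a)
  // Phase 1: active records are always kept, even if they alone exceed max.
  const active = records.filter(isActive)
  // Phase 2: inactive records, newest first, fill whatever budget remains.
  const budget = Math.max(0, max - active.length)
  const history = records
    .filter(record => !isActive(record))
    .sort(byNewest)
    .slice(0, budget)
  // Merge both groups back into a single newest-first list.
  return [...active, ...history].sort(byNewest)
}

type Run = { id: string; status: 'running' | 'done'; createdAt: number }

const sampleRuns: Run[] = [
  { id: 'a', status: 'done', createdAt: 1 },
  { id: 'b', status: 'running', createdAt: 2 },
  { id: 'c', status: 'done', createdAt: 3 },
  { id: 'd', status: 'running', createdAt: 4 },
]

// Cap of 3: both running records stay, leaving room for one finished record.
const keptRuns = retainActiveFirst(
  sampleRuns,
  run => run.status === 'running',
  run => run.createdAt,
  3,
)
console.log(keptRuns.map(run => run.id)) // [ 'd', 'c', 'b' ]
```

With the cap at 3, the oldest finished record ('a') is the only one dropped. If every record were active, all of them would be returned regardless of the cap, which is the case the warn threshold is meant to surface.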