claude-code/docs/test-plans/openclaw-autonomy-baseline.md
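unraid 637c9081f6 feat: integrate 5 feature branches + daemon/job command hierarchy + cross-platform background engine + TypeScript error fixes
Squashed merge of:
1. fix/mcp-tsc-errors: fix tsc errors and test failures after the upstream MCP refactor
2. feat/pipe-mute-disconnect: Pipe IPC logical disconnect, /lang command, mute state machine
3. feat/stub-recovery-all: implement recovery of all stubs (tasks 001-012)
4. feat/kairos-activation: unblock KAIROS activation + tool implementations
5. codex/openclaw-autonomy-pr: autonomy permission system, run records, managed flows

Additional:
6. Restructure daemon/job commands into a hierarchy (subcommand architecture)
7. Cross-platform background engine abstraction (detached/tmux engines)
8. Fix 43 pre-existing TypeScript type errors in src/
9. Fix completeness of the langfuse isolated test mocks
10. Fix Critical/Major/Minor issues raised in the CodeRabbit review
11. remote-control-server logger abstraction (silences stderr in tests)
12. /simplify review fixes (code reuse, quality, efficiency)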
2026-04-14 19:53:36 +08:00

OpenClaw Autonomy Baseline Test Spec

Purpose

This test spec locks the current behavior of the existing trigger and context layers before any formal autonomy-subsystem implementation begins.

At this stage, production code is read-only. Only test files, fixtures, and planning documents may change.

Goal

Establish a stable baseline around the parts of Claude-code-bast that later autonomy work is most likely to touch:

  • proactive state handling
  • cron task storage semantics
  • cron scheduler helper semantics
  • user-context cache and CLAUDE.md injection behavior

Out of Scope for This Baseline Round

  • New authority behavior (AGENTS.md / HEARTBEAT.md)
  • New detached-run ledger behavior
  • New flow behavior
  • UI redesign

Files Under Baseline Protection

  • src/proactive/index.ts
  • src/utils/cronTasks.ts
  • src/utils/cronScheduler.ts
  • src/context.ts

Test Files Added in This Round

  • src/proactive/__tests__/state.baseline.test.ts
  • src/commands/__tests__/proactive.baseline.test.ts
  • src/utils/__tests__/cronTasks.baseline.test.ts
  • src/utils/__tests__/cronScheduler.baseline.test.ts
  • src/__tests__/context.baseline.test.ts

Baseline Assertions

Proactive state

  1. Activating proactive mode sets the active state and records the activation source.
  2. Pausing proactive mode suppresses shouldTick() and clears nextTickAt.
  3. Marking the context as blocked suppresses shouldTick() and clears nextTickAt.
  4. Subscribers are notified on state transitions.
  5. The /proactive command enables proactive mode and emits the expected hidden reminder.
  6. The /proactive command disables proactive mode on the second invocation.
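The proactive-state assertions above can be read against a small in-memory model. The sketch below is hypothetical: only shouldTick() and nextTickAt come from this spec, while ProactiveStateModel, activate(), setPaused(), setContextBlocked(), subscribe(), and the 30-second default tick interval are assumptions made for illustration, not the API of src/proactive/index.ts.

```typescript
// HYPOTHETICAL model of the proactive-state assertions; not src/proactive/index.ts.
// Only shouldTick() and nextTickAt are taken from the spec; every other name is assumed.
type ActivationSource = "command" | "flag";

interface ProactiveSnapshot {
  active: boolean;
  paused: boolean;
  contextBlocked: boolean;
  activationSource: ActivationSource | null;
  nextTickAt: number | null;
}

type Listener = (snapshot: ProactiveSnapshot) => void;

class ProactiveStateModel {
  private state: ProactiveSnapshot = {
    active: false,
    paused: false,
    contextBlocked: false,
    activationSource: null,
    nextTickAt: null,
  };
  private readonly listeners = new Set<Listener>();
  private readonly tickIntervalMs: number;

  constructor(tickIntervalMs = 30_000) {
    this.tickIntervalMs = tickIntervalMs;
  }

  // Returns an unsubscribe function.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Assertion 1: activation sets the active flag and records its source.
  activate(source: ActivationSource, now: number): void {
    this.update({ active: true, activationSource: source, nextTickAt: now + this.tickIntervalMs });
  }

  deactivate(): void {
    this.update({ active: false, activationSource: null, nextTickAt: null });
  }

  // Assertion 2: pausing clears the scheduled tick.
  setPaused(paused: boolean): void {
    this.update(paused ? { paused, nextTickAt: null } : { paused });
  }

  // Assertion 3: a blocked context also clears the scheduled tick.
  setContextBlocked(blocked: boolean): void {
    this.update(blocked ? { contextBlocked: blocked, nextTickAt: null } : { contextBlocked: blocked });
  }

  shouldTick(): boolean {
    const s = this.state;
    return s.active && !s.paused && !s.contextBlocked;
  }

  snapshot(): ProactiveSnapshot {
    return { ...this.state };
  }

  // Assertion 4: every transition notifies subscribers with a fresh snapshot.
  private update(patch: Partial<ProactiveSnapshot>): void {
    this.state = { ...this.state, ...patch };
    for (const listener of this.listeners) listener(this.snapshot());
  }
}
```

A baseline test against the real module would follow the same shape: drive one transition, then assert on shouldTick(), nextTickAt, and the notifications a subscriber received.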

Cron task storage

  1. Session-only cron tasks remain in memory only.
  2. Durable cron tasks are persisted to .claude/scheduled_tasks.json.
  3. Daemon-style, directory-scoped reads exclude session-only cron tasks.
  4. removeCronTasks() called without a dir argument can remove session-only tasks.
  5. removeCronTasks() called with a dir argument does not mutate session-only task storage.
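The storage rules above distinguish two stores (session memory and an on-disk file) and two read modes. The following sketch models them in memory. Apart from removeCronTasks() and the .claude/scheduled_tasks.json path, the names (CronStoreModel, addTask(), listTasks(), the CronTask shape) and the exact filtering rules are assumptions, and a Map stands in for the file system so the sketch performs no disk I/O.

```typescript
// HYPOTHETICAL in-memory model of the cron storage rules; not src/utils/cronTasks.ts.
// Only removeCronTasks() and the .claude/scheduled_tasks.json path come from the spec.
interface CronTask {
  id: string;
  cron: string;
  prompt: string;
  durable: boolean;
}

class CronStoreModel {
  // Session-only tasks live here and are never written to "disk".
  private session: CronTask[] = [];
  // Fake file system (path -> JSON text) standing in for real disk I/O.
  readonly files = new Map<string, string>();
  private readonly projectDir: string;

  constructor(projectDir: string) {
    this.projectDir = projectDir;
  }

  private pathFor(dir: string): string {
    return `${dir}/.claude/scheduled_tasks.json`;
  }

  private readFile(dir: string): CronTask[] {
    const text = this.files.get(this.pathFor(dir));
    return text === undefined ? [] : (JSON.parse(text) as CronTask[]);
  }

  private writeFile(dir: string, tasks: CronTask[]): void {
    this.files.set(this.pathFor(dir), JSON.stringify(tasks));
  }

  addTask(task: CronTask): void {
    if (task.durable) {
      this.writeFile(this.projectDir, [...this.readFile(this.projectDir), task]);
    } else {
      this.session.push(task);
    }
  }

  // With dir: daemon-style read of that directory's file only (no session tasks).
  // Without dir: session tasks plus the current project's durable tasks.
  listTasks(dir?: string): CronTask[] {
    if (dir !== undefined) return this.readFile(dir);
    return [...this.session, ...this.readFile(this.projectDir)];
  }

  // With dir: only that directory's file is rewritten; session storage is untouched.
  // Without dir: session storage and the current project's file are both filtered.
  removeCronTasks(ids: string[], dir?: string): void {
    const drop = new Set(ids);
    const target = dir ?? this.projectDir;
    this.writeFile(target, this.readFile(target).filter((t) => !drop.has(t.id)));
    if (dir === undefined) {
      this.session = this.session.filter((t) => !drop.has(t.id));
    }
  }
}
```

The key design point the baseline protects is asymmetry: passing a dir narrows both reads and removals to on-disk state, so a daemon can never see or delete another process's session-only tasks.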

Cron scheduler helpers

  1. isRecurringTaskAged() preserves the current aging semantics.
  2. buildMissedTaskNotification() preserves the current AskUserQuestion safety wording.
  3. buildMissedTaskNotification() preserves code-fence hardening for prompt bodies that contain backticks.
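One common way to harden a code fence against prompt bodies that contain backticks is to make the fence longer than any backtick run in the body. Under CommonMark, a closing fence must be at least as long as the opening one, so shorter runs inside the body cannot terminate it. The sketch below shows that technique only: fenceBody() and longestBacktickRun() are hypothetical helpers, and this spec does not state whether buildMissedTaskNotification() uses exactly this rule.

```typescript
// HYPOTHETICAL helpers illustrating code-fence hardening; not buildMissedTaskNotification().
// Rule: use a backtick fence one longer than the longest backtick run in the body
// (minimum 3), so no run inside the body can close the fence early.
function longestBacktickRun(text: string): number {
  let longest = 0;
  for (const match of text.matchAll(/`+/g)) {
    longest = Math.max(longest, match[0].length);
  }
  return longest;
}

function fenceBody(body: string): string {
  const fence = "`".repeat(Math.max(3, longestBacktickRun(body) + 1));
  return `${fence}\n${body}\n${fence}`;
}
```

A prompt with no backticks gets the ordinary three-backtick fence; a prompt that itself contains a three-backtick run is wrapped in a four-backtick fence, which is the property a baseline test would pin.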

User context caching

  1. getUserContext() includes currentDate.
  2. getUserContext() includes mocked claudeMd content when memory loading is enabled.
  3. Setting CLAUDE_CODE_DISABLE_CLAUDE_MDS suppresses claudeMd.
  4. setSystemPromptInjection() clears the memoized user-context cache.
  5. getSystemContext() reflects the injection after cache invalidation.
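The caching assertions above describe a memoized getUserContext() whose cache is invalidated by setSystemPromptInjection(). The sketch below models that contract with a tiny memoizer. The factory createContextModel(), its options, the injection field, the "any non-empty value disables" reading of CLAUDE_CODE_DISABLE_CLAUDE_MDS, and the choice to clear the system-context cache as well are all assumptions. The date and the CLAUDE.md loader are injected so a test can pin them, mirroring the mocked claudeMd content in assertion 2.

```typescript
// HYPOTHETICAL model of the user-context caching contract; not src/context.ts.
// getUserContext/getSystemContext/setSystemPromptInjection mirror the spec's names,
// but the factory, options, and field names below are assumptions.
type ContextMap = Record<string, string>;

// Minimal memoizer with an explicit invalidation hook.
function memoize<T>(compute: () => T): { get: () => T; clear: () => void } {
  let cached: { value: T } | undefined;
  return {
    get: () => {
      if (cached === undefined) cached = { value: compute() };
      return cached.value;
    },
    clear: () => {
      cached = undefined;
    },
  };
}

interface ContextModelOptions {
  env: Record<string, string | undefined>;
  today: string; // injected so tests are deterministic
  loadClaudeMd: () => string; // mocked memory loader
}

function createContextModel(opts: ContextModelOptions) {
  let injection: string | null = null;
  let userComputations = 0; // counts cache misses so tests can observe memoization

  const userContext = memoize<ContextMap>(() => {
    userComputations++;
    const ctx: ContextMap = { currentDate: opts.today };
    // Assumption: any non-empty value of the variable disables CLAUDE.md loading.
    if (!opts.env.CLAUDE_CODE_DISABLE_CLAUDE_MDS) ctx.claudeMd = opts.loadClaudeMd();
    return ctx;
  });

  const systemContext = memoize<ContextMap>(() => {
    const ctx: ContextMap = {};
    if (injection !== null) ctx.injection = injection;
    return ctx;
  });

  return {
    getUserContext: (): ContextMap => userContext.get(),
    getSystemContext: (): ContextMap => systemContext.get(),
    setSystemPromptInjection(value: string | null): void {
      injection = value;
      // Invalidate both caches so the next read recomputes with the new value.
      userContext.clear();
      systemContext.clear();
    },
    userComputations: (): number => userComputations,
  };
}
```

Counting cache misses, rather than comparing object identity alone, makes assertion 4 observable: a second read without an injection must not recompute, and a read after setSystemPromptInjection() must.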

Remaining Baseline Gaps

The following areas are intentionally deferred because they require more expensive test harnessing; any later coverage of them must still avoid production-code changes:

  1. useScheduledTasks.ts hook-level runtime behavior
  2. src/cli/print.ts full headless scheduler loop behavior
  3. useProactive.ts hook timer behavior
  4. end-to-end queue interaction between proactive ticks and SleepTool

Acceptance

This baseline round is complete when:

  1. The five new test files listed above pass.
  2. No production source files are modified.
  3. The tests are stable enough to serve as a pre-implementation guardrail.
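Acceptance criterion 2 can be checked mechanically from the list of changed paths, for example the output of git diff --name-only against the commit the baseline round started from. The sketch below is a hypothetical guard: productionChanges() and the allow-list patterns are assumptions derived from the file names in this spec (__tests__ directories, *.test.ts files, a docs/ tree), and the fixture pattern is a guess at naming, not a documented convention.

```typescript
// HYPOTHETICAL guard for "No production source files are modified".
// Input: changed paths, e.g. from `git diff --name-only <base>`.
// The allow-list below is an assumption based on the file names in this spec.
const ALLOWED_PATH_PATTERNS: RegExp[] = [
  /(^|\/)__tests__\//, // test directories such as src/utils/__tests__/
  /\.test\.tsx?$/, // *.test.ts and *.test.tsx files
  /(^|\/)fixtures?\//, // fixture directories (assumed naming)
  /(^|\/)docs\//, // planning documents such as docs/test-plans/
];

// Returns every changed path that falls outside the allow-list.
function productionChanges(changedPaths: string[]): string[] {
  return changedPaths
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .filter((p) => !ALLOWED_PATH_PATTERNS.some((re) => re.test(p)));
}
```

An empty result satisfies criterion 2; any returned path names a production file that must be reverted before the round can be accepted.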