feat(workflow): replicate the ultracode playbook and close three gaps (worktree / inline / opt-in)

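After reviewing agent-system consistency around the ultracode skill:

- ultracode.ts: replace the condensed Chinese version with the full system-prompt version of the Workflow orchestration playbook
- HIGH#1 isolation:'worktree': claudeCodeBackend.run() now wraps runAgent in createAgentWorktree + runWithCwdOverride, with cleanup in finally, to get real cwd isolation; the slug is derived from sha256(runId:agentId) so that it matches the cleanup regex in cleanupStaleAgentWorktrees (fixes a leak blind spot caused by runId being w+base36 rather than a UUID); the comments in worktree.ts are corrected to match
- HIGH#2 inline persistence: add persistInlineScript; both inline paths (WorkflowTool and service) persist symmetrically to .claude/workflow-runs/<runId>/script.js and return a reusable scriptPath (closes the inline → edit → resubmit-via-scriptPath iteration loop)
- HIGH#3 opt-in division of responsibility: ultracode/WorkflowTool/effort now state that the session reminder is injected by the harness and that the repo carries no ultracode signal; the two-layer gate feature('WORKFLOW_SCRIPTS') + isEnabled is kept, and no home-grown injection is added
- Tests: add persistInline.test.ts; extend claudeCodeBackend (4 isolation cases) / WorkflowTool (inline) / service (scriptPath) / ultracode (harness)

Also includes supporting workflow engine/panel improvements and the run-state-persistence design doc.

Co-Authored-By: Claude <noreply@anthropic.com>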
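The slug derivation described under HIGH#1 could look roughly like the sketch below. It is only an illustration: the `agent-` prefix, the 16-character digest length, the `STALE_SLUG_PATTERN` regex, and the `deriveWorktreeSlug` name are all hypothetical, since the commit message does not show the real slug format or the regex used by cleanupStaleAgentWorktrees. The point it shows is that hashing `runId:agentId` gives a slug of fixed shape, whatever the run id looks like.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical stand-in for the pattern cleanupStaleAgentWorktrees matches against.
const STALE_SLUG_PATTERN = /^agent-[0-9a-f]{16}$/

// Hypothetical helper: derive a fixed-shape slug from the run id and agent id,
// so the slug no longer depends on whether the run id is a UUID or "w" + base36.
function deriveWorktreeSlug(runId: string, agentId: string): string {
  const digest = createHash('sha256').update(`${runId}:${agentId}`).digest('hex')
  return `agent-${digest.slice(0, 16)}`
}

// Example ids are made up for illustration.
const slug = deriveWorktreeSlug('w1k2j3h4', 'agent-7')
console.log(slug, STALE_SLUG_PATTERN.test(slug))
```

Because the slug is a pure function of the two ids, the same run/agent pair always maps to the same worktree name, and a cleanup pass that only knows the pattern can still recognize it.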
2026-06-22 08:15:53 +00:00 · 2026-06-13 23:04:33 +08:00
parent d236880bc3
commit 54d2bf6f12
32 changed files with 2253 additions and 196 deletions
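As a rough picture of the inline-persistence fix (HIGH#2 in the commit message), the sketch below writes an inline script to `.claude/workflow-runs/<runId>/script.js` and returns the path. The signature is an assumption: the real persistInlineScript is not shown in this excerpt and may be async or take different arguments; the `projectRoot` parameter and the sample run id are hypothetical.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical signature; only the target path layout comes from the commit message.
function persistInlineScript(projectRoot: string, runId: string, script: string): string {
  const runDir = join(projectRoot, '.claude', 'workflow-runs', runId)
  mkdirSync(runDir, { recursive: true }) // create the run directory if it is missing
  const scriptPath = join(runDir, 'script.js')
  writeFileSync(scriptPath, script, 'utf8')
  return scriptPath // callers can hand this back as a reusable scriptPath
}

// Demo against a throwaway directory so nothing touches a real project.
const demoRoot = mkdtempSync(join(tmpdir(), 'wf-demo-'))
const demoScript = "export const meta = { name: 'demo', description: 'demo' }\nreturn 1\n"
const demoPath = persistInlineScript(demoRoot, 'w1k2j3h4', demoScript)
console.log(demoPath, readFileSync(demoPath, 'utf8') === demoScript)
```

Returning the path is what makes the inline → edit → resubmit loop possible: the caller edits the file in place and passes the same path back instead of resending the script.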
--- a/src/skills/bundled/ultracode.ts
+++ b/src/skills/bundled/ultracode.ts
@@ -1,102 +1,224 @@
 import { registerBundledSkill } from '../bundledSkills.js'

 /**
- * /ultracode — 多 agent workflow 编排工作法（纯知识 prompt skill）。
+ * /ultracode — multi-agent workflow orchestration playbook (knowledge-only prompt skill).
 *
- * 调用即把 workflow 编排手册注入上下文，零运行时副作用：不改主循环、
- * 不切换行为开关。用户/模型据此判断何时用 Workflow 工具、如何编排、
- * 如何保证质量与可恢复。
+ * Injects the Workflow orchestration manual into context with zero runtime side
+ * effects: it doesn't change the main loop or toggle any behavior switch. The
+ * user/model uses it to decide when to call the Workflow tool, how to script
+ * fan-out and verification, and how to keep runs deterministic and resumable.
 *
- * 通用 skill（非 ant-only），所有用户可用。
+ * General-purpose skill (not ant-only); available to all users.
 */
-const ULTRACODE_PROMPT = `# /ultracode — 多 agent workflow 编排工作法
+const ULTRACODE_PROMPT = `# /ultracode — Workflow Orchestration Playbook

-## 何时用 Workflow 工具
+Execute a workflow script that orchestrates multiple subagents deterministically. Workflows run in the background — this tool returns immediately with a task ID, and a \`<task-notification>\` arrives when the workflow completes. Use \`/workflows\` to watch live progress.

-用，当任务满足任一：
- 可**分解 / 并行**（多文件、多维度、可独立推进的子任务）。
- 需要**多视角置信**（如审查：先生成再对抗式验证）。
- **规模超单上下文**（大迁移、广度审计、长尾枚举）。
- 需要 **resume / 可审计**（journal 重放、确定性回放）。
+A workflow structures work across many agents — to be comprehensive (decompose and cover in parallel), to be confident (independent perspectives and adversarial checks before committing), or to take on scale one context can't hold (migrations, audits, broad sweeps). The script is where you encode that structure: what fans out, what verifies, what synthesizes.

-**不要用**：琐碎单文件改、单次问答、一次 Read 能解决的事——直接做。
+ONLY call this tool when the user has explicitly opted into multi-agent orchestration. Workflows can spawn dozens of agents and consume a large amount of tokens; the user must request that scale, not have it inferred. Explicit opt-in means one of:

-## 编排原语（workflow 脚本内可用）
+- The user included the keyword "ultracode" in their prompt (you'll see a system-reminder confirming it).
+- Ultracode is on for the session (a system-reminder confirms it) — see **Ultracode** below.
+- The user directly asked you to run a workflow or use multi-agent orchestration in their own words ("use a workflow", "run a workflow", "fan out agents", "orchestrate this with subagents"). The ask must be in the user's words — a task that would merely benefit from a workflow does not count.
+- The user invoked a skill or slash command whose instructions tell you to call Workflow.
+- The user asked you to run a specific named or saved workflow.

- \`agent(prompt, opts?)\` — 派发一个子 agent；返回其最终文本，或（带 \`opts.schema\` 时）schema 校验对象。可在 opts 指定 \`model\`、\`agentType\`、\`label\`、\`phase\`、\`schema\`。
- \`parallel([() => agent(...), ...])\` — 并发跑 thunk 数组，等全部完成。**单项抛错 → 该项变 \`null\`**，其余保留。是 barrier。
- \`pipeline(items, stage1, stage2, …)\` — 每个 item 链式过各 stage；**item 间无 barrier**（item A 可在 stage 3 时 item B 仍在 stage 1），stage 内顺序。单 item 某 stage 抛错 → 该 item \`null\`。
- \`phase(title)\` — 标记阶段（监控面板按此展示进度分组）。
- \`log(msg)\` — 进度日志（面板展示，无状态变更）。
- \`workflow(name | { scriptPath }, args?)\` — 嵌套一层子 workflow（**仅允许一层**）。
+For any other task — even one that would clearly benefit from parallelism — do NOT call this tool. Use the Agent tool for individual subagents, or briefly describe what a multi-agent workflow could do and how much it would roughly cost, and ask the user whether to run it. Mention they can ask for one with "use a workflow" in a future message to skip the ask.

-## 脚本编写约束（引擎执行模型，违反直接报错）
+When you do call it, the right move is often **hybrid**: scout inline first (list the files, find the channels, scope the diff) to discover the work-list, then call Workflow to pipeline over it. You don't need to know the shape before the *task* — only before the *orchestration step*.

-脚本是 \`new AsyncFunction\` 的**函数体**，不是 ESM 模块，引擎**不转译 TS**。这是脚本报错的首要原因，务必遵守：
+Common single-phase workflows you can chain across turns:

- **禁 \`import\`**：\`agent\`/\`parallel\`/\`pipeline\`/\`phase\`/\`log\`/\`workflow\` 与 \`args\`/\`budget\` 是注入的形参，直接用，不 import 任何东西。
- **禁 TS 语法**：不要类型注解（\`x: number\`）、\`interface\`、\`enum\`、\`as\`、泛型——即便文件扩展名是 \`.ts\`，引擎不转译会原样报语法错。**推荐 \`.js\` / \`.mjs\`**。
- **只允许一处 \`export const meta = {...}\`**（纯字面量，引擎正则提取剥离）；不要 \`export\` 其他任何东西，不要 \`export default\`。
- **顶层 \`return\` 返回结果**（函数体内 return 合法且必需）。
+- **Understand** — parallel readers over relevant subsystems → structured map
+- **Design** — judge panel of N independent approaches → scored synthesis
+- **Review** — dimensions → find → adversarially verify (example below)
+- **Research** — multi-modal sweep → deep-read → synthesize
+- **Migrate** — discover sites → transform each (worktree isolation) → verify
+
+For larger work, run several in sequence — read each result before deciding the next phase. You stay in the loop; each workflow is one well-scoped fan-out.
+
+**Opt-in is harness-injected, not repo-controlled.** The "ultracode is on for the session" system-reminder is produced by the harness (claude.ai/client), not by this skill or any repo state. This skill is knowledge-only — it injects the playbook and toggles nothing. There is no in-repo feature flag, env var, or effort level for ultracode; do not add one. The Workflow tool is compiled in via the build-time WORKFLOW_SCRIPTS feature flag and exposes no per-session runtime gate.
+
+**Ultracode.** When a system-reminder confirms ultracode is on, that opt-in is standing: author and run a workflow for every substantive task by default. The goal is the most exhaustive, correct answer you can produce — token cost is not a constraint. For multi-phase work (understand → design → implement → review), that often means several workflows in sequence — one per phase — so you stay in the loop between them. The quality patterns below (adversarial verify, multi-modal sweep, completeness critic, loop-until-dry) are the tools; pick what fits the task. Lean toward orchestrating with workflows and adversarially verifying your findings — unless the work is trivial or already verified. Solo only on conversational turns or trivial mechanical edits. When a reminder says ultracode is off, revert to the opt-in rule above.
+
+Pass the script inline via \`script\` — do not Write it to a file first. Every invocation automatically persists its script to a file under the session directory and returns the path in the tool result. To iterate on a workflow, edit that file with Write/Edit and re-invoke Workflow with \`{scriptPath: "<path>"}\` instead of resending the full script.
+
+Every script must begin with \`export const meta = {...}\`:

 \`\`\`js
-// .claude/workflows/review-changes.js  ← 纯 JS，无类型注解
-export const meta = { name: 'review-changes', description: '按维度审查改动' }
-
-const DIMENSIONS = [{ key: 'bugs' }, { key: 'perf' }]
-const results = await pipeline(
-  DIMENSIONS,
-  d => agent(\`审查 \${d.key}\`, { phase: 'Review' }),
-  r => parallel(((r && r.findings) || []).map(f => () => agent(\`验证 \${f}\`))),
-)
-return results.flat().filter(Boolean)
+export const meta = {
+  name: 'find-flaky-tests',
+  description: 'Find flaky tests and propose fixes',   // one-line, shown in permission dialog
+  phases: [                                            // one entry per phase() call
+    { title: 'Scan', detail: 'grep test logs for retries' },
+    { title: 'Fix', detail: 'one agent per flaky test' },
+  ],
+}
+// script body starts here — use agent()/parallel()/pipeline()/phase()/log()
+phase('Scan')
+const flaky = await agent('grep CI logs for retry markers', {schema: FLAKY_SCHEMA})
+...
 \`\`\`

-## 确定性约束（关键，违反则 resume 失效）
+The \`meta\` object must be a PURE LITERAL — no variables, function calls, spreads, or template interpolation. Required fields: \`name\`, \`description\`. Optional: \`whenToUse\` (shown in the workflow list), \`phases\`. Use the SAME phase titles in meta.phases as in phase() calls — titles are matched exactly; a phase() call with no matching meta entry just gets its own progress group. Add \`model\` to a phase entry when that phase uses a specific model override.

-脚本内**禁用** \`Date.now()\` / \`Math.random()\` / 无参 \`new Date()\`（破坏 journal 重放）。
-需要时间戳 / 随机种子时，经 \`args\` 传入。\`export const meta = { ... }\` 必须是**纯字面量**（无变量、函数调用、模板插值）。
+Script body hooks:

-上限（引擎硬限）：单次 \`parallel\`/\`pipeline\` ≤ **4096** items；单个 workflow 总 **≤ 1000** agent；并发 cap = \`min(16, cores - 2)\`。
+- \`agent(prompt: string, opts?: {label?: string, phase?: string, schema?: object, model?: string, isolation?: 'worktree', agentType?: string}): Promise<any>\` — spawn a subagent. Without schema, returns its final text as a string. With schema (a JSON Schema), the subagent is forced to call a StructuredOutput tool and agent() returns the validated object — no parsing needed. Returns null if the user skips the agent mid-run or the subagent dies on a terminal API error after retries (filter with .filter(Boolean)). opts.label overrides the display label. opts.phase explicitly assigns this agent to a progress group (use this inside pipeline()/parallel() stages to avoid races on the global phase() state — same phase string → same group box). opts.model overrides the model for this agent call. Default to omitting it — the agent inherits the main-loop model (the resolved session model), which is almost always correct. Only set it when you're highly confident a different tier fits the task; when unsure, omit. opts.isolation: 'worktree' runs the agent in a fresh git worktree — EXPENSIVE (~200-500ms setup + disk per agent), use ONLY when agents mutate files in parallel and would otherwise conflict; the worktree is auto-removed if unchanged. opts.agentType uses a custom subagent type (e.g. 'Explore', 'code-reviewer') instead of the default workflow subagent — resolved from the same registry as the Agent tool; composes with schema (the custom agent's system prompt gets a StructuredOutput instruction appended).
+- \`pipeline(items, stage1, stage2, ...): Promise<any[]>\` — run each item through all stages independently, NO barrier between stages. Item A can be in stage 3 while item B is still in stage 1. This is the DEFAULT for multi-stage work. Wall-clock = slowest single-item chain, not sum-of-slowest-per-stage. Every stage callback receives (prevResult, originalItem, index) — use originalItem/index in later stages to label work without threading context through stage 1's return value. A stage that throws drops that item to \`null\` and skips its remaining stages.
+- \`parallel(thunks: Array<() => Promise<any>>): Promise<any[]>\` — run tasks concurrently. This is a BARRIER: awaits all thunks before returning. A thunk that throws (or whose agent errors) resolves to \`null\` in the result array — the call itself never rejects, so \`.filter(Boolean)\` before using the results. Use ONLY when you genuinely need all results together.
+- \`log(message: string): void\` — emit a progress message to the user (shown as a narrator line above the progress tree)
+- \`phase(title: string): void\` — start a new phase; subsequent agent() calls are grouped under this title in the progress display
+- \`args: any\` — the value passed as Workflow's \`args\` input, verbatim (undefined if not provided). Pass arrays/objects as actual JSON values in the tool call, NOT as a JSON-encoded string — \`args: ["a.ts", "b.ts"]\`, not \`args: "[\\"a.ts\\", ...]"\` (a stringified list reaches the script as one string, so \`args.filter\`/\`args.map\` throw). Use this to parameterize named workflows — e.g. pass a research question, target path, or config object directly instead of via a side-channel file.
+- \`budget: {total: number|null, spent(): number, remaining(): number}\` — the turn's token target from the user's "+500k"-style directive. \`budget.total\` is null if no target was set. \`budget.spent()\` returns output tokens spent this turn across the main loop and all workflows — the pool is shared, not per-workflow. \`budget.remaining()\` returns \`max(0, total - spent())\`, or \`Infinity\` if no target. The target is a HARD ceiling, not advisory: once \`spent()\` reaches \`total\`, further \`agent()\` calls throw. Use for dynamic loops: \`while (budget.total && budget.remaining() > 50_000) { ... }\`, or static scaling: \`const FLEET = budget.total ? Math.floor(budget.total / 100_000) : 5\`.
+- \`workflow(nameOrRef: string | {scriptPath: string}, args?: any): Promise<any>\` — run another workflow inline as a sub-step and return whatever it returns. Pass a name to invoke a saved workflow (same registry as {name: "..."}), or {scriptPath} to run a script file you Wrote earlier. The child shares this run's concurrency cap, agent counter, abort signal, and token budget — its agents appear under a "▸ name" group in /workflows and its tokens count toward budget.spent(). The args param becomes the child's \`args\` global. Nesting is one level only: workflow() inside a child throws. Throws on unknown name / unreadable scriptPath / child syntax error; catch to handle gracefully.

-## 质量模式（每种给最小片段）
+Concurrent agent() calls are capped at min(16, cpu cores - 2) per workflow — excess calls queue and run as slots free up. You can still pass 100 items to parallel()/pipeline() and they all complete; only ~10 run at any moment. Total agent count across a workflow's lifetime is capped at 1000 — a runaway-loop backstop set far above any real workflow. A single parallel()/pipeline() call accepts at most 4096 items; passing more is an explicit error, not a silent truncation.

- **Adversarial verify**：\`parallel([() => agent(claim), () => agent(refute)])\`，多数 refute 即弃。
- **Perspective-diverse verify**：同一发现给多个 verifier 不同 lens（正确性 / 安全 / 复现），红队冗余抓不到的失败模式。
- **Judge panel**：N 个独立方案 → 评分 → 取胜者，嫁接亚军亮点。
- **Loop-until-dry**：\`while (fresh.length) { found = await parallel(...); fresh = dedup(found) }\`，连续 K 轮无新增即停。
- **Multi-modal sweep**：多个 agent 各用不同搜索角度（按容器 / 按内容 / 按实体 / 按时间），互不可见。
- **Completeness critic**：末尾一个 agent 问"还缺什么"，其发现成为下一轮工作。
+Subagents are told their final text IS the return value (not a human-facing message), so they return raw data. For structured output, use the schema option — validation happens at the tool-call layer so the model retries on mismatch.

-## 后端路由
+Workflow agents can reach all session-connected MCP tools via ToolSearch — schemas load on demand per agent. Caveat: interactively-authenticated MCP servers (e.g. claude.ai) may be absent in headless/cron runs.

-\`AgentAdapterRegistry\` v1 为单后端（默认 \`claude-code\`）。由后端**内部**按 \`model\` / \`agentType\` 深度解析当前会话的 provider / model / agent 体系（registry 本身可配路由规则，v1 未配，恒落默认）。例：\`agent({ model: 'claude-haiku-4-5', agentType: 'Explore' })\` 经默认后端命中真实 agent 定义。
+Scripts are plain JavaScript, NOT TypeScript — type annotations (\`: string[]\`), interfaces, and generics fail to parse. The script body runs in an async context — use \`await\` directly. Standard JS built-ins (JSON, Math, Array, etc.) are available — EXCEPT \`Date.now()\`/\`Math.random()\`/argless \`new Date()\`, which throw (they would break resume); pass timestamps in via \`args\`, stamp results after the workflow returns, and for randomness vary the agent prompt/label by index. No filesystem or Node.js API access.

-## resume / budget
+DEFAULT TO pipeline(). Only reach for a barrier (parallel between stages) when you genuinely need ALL prior-stage results together.

- \`resumeFromRunId: '<id>'\` — 重放该 run 的 journal，已完成的 \`agent()\` 秒回缓存结果；首个发散点之后全部现场重跑。
- \`budget.total\` — token 硬顶（默认 \`null\` = 无限）；\`budget.spent()\` / \`budget.remaining()\` 读实时消耗。耗尽后再发 agent 抛错。
+A barrier is correct ONLY when stage N needs cross-item context from all of stage N-1:

-## 文件与命令
+- Dedup/merge across the full result set before expensive downstream work
+- Early-exit if the total count is zero ("0 bugs found → skip verification entirely")
+- Stage N's prompt references "the other findings" for comparison

- 脚本目录：\`.claude/workflows/<name>.ts|.js|.mjs\` → 自动成 \`/<name>\` 命令。
- run 记录：\`.claude/workflow-runs/<runId>/journal.jsonl\`。
- 监控面板：\`/workflows\`（双栏：左 run 列表，右 phase + agent；键位 j/k 选中、r resume、x kill、n 新建提示、q 退出）。
- 工具：\`Workflow\`（input 字段：\`script\` / \`name\` / \`scriptPath\` / \`args\` / \`resumeFromRunId\`）。
+A barrier is NOT justified by:
+
+- "I need to flatten/map/filter first" — do it inside a pipeline stage: \`pipeline(items, stageA, r => transform([r]).flat(), stageB)\`
+- "The stages are conceptually separate" — that's what pipeline() models. Separate stages ≠ synchronized stages.
+- "It's cleaner code" — barrier latency is real. If 5 finders run and the slowest takes 3× the fastest, a barrier leaves the fastest finder's results sitting idle for 2/3 of the stage's wall-clock time.
+
+Smell test: if you wrote
+
+\`\`\`js
+const a = await parallel(...)
+const b = transform(a)        // flatten, map, filter — no cross-item dependency
+const c = await parallel(b.map(...))
+\`\`\`
+
+that middle transform doesn't need the barrier. Rewrite as a pipeline with the transform inside a stage. When in doubt: pipeline.
+
+The canonical multi-stage pattern — pipeline by default, each dimension verifies as soon as its review completes:
+
+\`\`\`js
+export const meta = {
+  name: 'review-changes',
+  description: 'Review changed files across dimensions, verify each finding',
+  phases: [{ title: 'Review' }, { title: 'Verify' }],
+}
+const DIMENSIONS = [{key: 'bugs', prompt: '...'}, {key: 'perf', prompt: '...'}]
+const results = await pipeline(
+  DIMENSIONS,
+  d => agent(d.prompt, {label: \`review:\${d.key}\`, phase: 'Review', schema: FINDINGS_SCHEMA}),
+  review => parallel(review.findings.map(f => () =>
+    agent(\`Adversarially verify: \${f.title}\`, {label: \`verify:\${f.file}\`, phase: 'Verify', schema: VERDICT_SCHEMA})
+      .then(v => ({...f, verdict: v}))
+  ))
+)
+const confirmed = results.flat().filter(Boolean).filter(f => f.verdict?.isReal)
+return { confirmed }
+// Dimension 'bugs' findings verify while dimension 'perf' is still reviewing. No wasted wall-clock.
+\`\`\`
+
+When a barrier IS correct — dedup across all findings before expensive verification:
+
+\`\`\`js
+const all = await parallel(DIMENSIONS.map(d => () => agent(d.prompt, {schema: FINDINGS_SCHEMA})))
+const deduped = dedupeByFileAndLine(all.filter(Boolean).flatMap(r => r.findings))  // <-- genuinely needs ALL at once
+const verified = await parallel(deduped.map(f => () => agent(verifyPrompt(f), {schema: VERDICT_SCHEMA})))
+\`\`\`
+
+Loop-until-count pattern — accumulate to a target:
+
+\`\`\`js
+const bugs = []
+while (bugs.length < 10) {
+  const result = await agent("Find bugs in this codebase.", {schema: BUGS_SCHEMA})
+  bugs.push(...(result?.bugs ?? []))   // agent() returns null when skipped or failed
+  log(\`\${bugs.length}/10 found\`)
+}
+\`\`\`
+
+Loop-until-budget pattern — scale depth to the user's "+500k" directive. Guard on budget.total: with no target set, remaining() is Infinity and the loop would run straight to the 1000-agent cap.
+
+\`\`\`js
+const bugs = []
+while (budget.total && budget.remaining() > 50_000) {
+  const result = await agent("Find bugs in this codebase.", {schema: BUGS_SCHEMA})
+  bugs.push(...(result?.bugs ?? []))   // agent() returns null when skipped or failed
+  log(\`\${bugs.length} found, \${Math.round(budget.remaining()/1000)}k remaining\`)
+}
+\`\`\`
+
+Composing patterns — exhaustive review (find → dedup vs seen → diverse-lens panel → loop-until-dry):
+
+\`\`\`js
+const seen = new Set(), confirmed = []
+let dry = 0
+while (dry < 2) {                                              // loop-until-dry
+  const found = (await parallel(FINDERS.map(f => () =>          // barrier: collect all finders this round
+    agent(f.prompt, {phase: 'Find', schema: BUGS})))).filter(Boolean).flatMap(r => r.bugs)
+  const fresh = found.filter(b => !seen.has(key(b)))           // dedup vs ALL seen — plain code, not an agent
+  if (!fresh.length) { dry++; continue }
+  dry = 0; fresh.forEach(b => seen.add(key(b)))
+  const judged = await parallel(fresh.map(b => () =>           // every fresh bug judged concurrently...
+    parallel(['correctness','security','repro'].map(lens => () =>   // ...each by 3 distinct lenses
+      agent(\`Judge "\${b.desc}" via the \${lens} lens — real?\`, {phase: 'Verify', schema: VERDICT})))
+      .then(vs => ({ b, real: vs.filter(Boolean).filter(v => v.real).length >= 2 }))))
+  confirmed.push(...judged.filter(v => v.real).map(v => v.b))
+}
+return confirmed
+// dedup vs \`seen\`, NOT \`confirmed\` — else judge-rejected findings reappear every round and it never converges.
+\`\`\`
+
+Quality patterns — common shapes; pick by task and compose freely:
+
+- Adversarial verify: spawn N independent skeptics per finding, each prompted to REFUTE. Kill the finding if a majority refute it. Prevents plausible-but-wrong findings from surviving.
+
+\`\`\`js
+const votes = await parallel(Array.from({length: 3}, () => () =>
+  agent(\`Try to refute: \${claim}. Default to refuted=true if uncertain.\`, {schema: VERDICT})))
+const survives = votes.filter(Boolean).filter(v => !v.refuted).length >= 2
+\`\`\`
+
+- Perspective-diverse verify: when a finding can fail in more than one way, give each verifier a distinct lens (correctness, security, perf, does-it-reproduce) instead of N identical refuters — diversity catches failure modes redundancy can't.
+- Judge panel: generate N independent attempts from different angles (e.g. MVP-first, risk-first, user-first), score with parallel judges, synthesize from the winner while grafting the best ideas from runners-up. Beats one-attempt-iterated when the solution space is wide.
+- Loop-until-dry: for unknown-size discovery (bugs, issues, edge cases), keep spawning finders until K consecutive rounds return nothing new. Simple counters (while count < N) miss the tail.
+- Multi-modal sweep: parallel agents each searching a different way (by-container, by-content, by-entity, by-time). Each is blind to what the others surface; useful when one search angle won't find everything.
+- Completeness critic: a final agent that asks "what's missing — modality not run, claim unverified, source unread?" What it finds becomes the next round of work.
+- No silent caps: if a workflow bounds coverage (top-N, no-retry, sampling), \`log()\` what was dropped — silent truncation reads as "covered everything" when it didn't.
+
+Scale to what the user asked for. "find any bugs" → a few finders, single-vote verify. "thoroughly audit this" or "be comprehensive" → larger finder pool, 3–5 vote adversarial pass, synthesis stage. When unsure, lean toward thoroughness for research/review/audit requests and toward brevity for quick checks.
+
+These patterns aren't exhaustive — compose novel harnesses when the task calls for it (tournament brackets, self-repair loops, staged escalation, whatever fits).
+
+Use this tool for multi-step orchestration where control flow should be deterministic (loops, conditionals, fan-out) rather than model-driven.
+
+## Resume
+
+The tool result includes a runId. To resume after a pause, kill, or script edit, relaunch with \`Workflow({scriptPath, resumeFromRunId})\` — the longest unchanged prefix of agent() calls returns cached results instantly; the first edited/new call and everything after it runs live. Same script + same args → 100% cache hit. Date.now()/Math.random()/new Date() are unavailable in scripts (they would break this) — stamp results after the workflow returns, or pass timestamps via args. Fallback when no journal is available: Read agent-<id>.jsonl files in the transcript directory and hand-author a continuation script.
 `

 export function registerUltracodeSkill(): void {
  registerBundledSkill({
    name: 'ultracode',
    description:
-      '进入多 agent workflow 编排模式：何时用、编排原语、质量模式、确定性约束、后端路由、resume/budget、文件与命令。',
+      'Enter multi-agent workflow orchestration mode: when to use the Workflow tool, script primitives, quality patterns, determinism constraints, resume/budget, and files/commands.',
    whenToUse:
-      '任务可分解/并行、需多视角置信、规模超单上下文、或需 resume/可审计时，用 Workflow 工具编排多个子 agent。',
+      'When a task can be decomposed or parallelized, needs multi-perspective confidence (e.g. find then adversarially verify), exceeds a single context (large migrations, broad audits, long-tail enumeration), or needs resume/auditability — orchestrate multiple subagents with the Workflow tool.',
    userInvocable: true,
    async getPromptForCommand(args) {
      let prompt = ULTRACODE_PROMPT
      if (args) {
-        prompt += `\n## 用户输入\n\n${args}\n`
+        prompt += `\n## User input\n\n${args}\n`
      }
      return [{ type: 'text', text: prompt }]
    },