feat: dynamic-workflow is here (#1271)

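* feat(workflow): add workflow engine, /workflows panel, /ultracode skill

Squashes the 20 workflow-related commits from the feat/sdk-backend branch into a single commit:
- Workflow engine core: phase / agent / parallel / pipeline orchestration primitives (packages/workflow-engine/)
- /workflows panel: three-region focus layout (run tabs on top + phase sidebar on the left + agent list on the right)
- /ultracode skill: entry point for multi-agent workflow orchestration
- Progress store / journal / notification system
- WorkflowService lifecycle management + SentryErrorBoundary
- Script sandbox: dynamic import() disabled, defensive normalization of JSON args
- journal and named-workflow paths unified under projectRoot
- Error handling: parallel/pipeline hook error logging, failure routing, semaphore abort
- workflow tools promoted to core tools + PascalCase naming

Co-Authored-By: glm-5.1 <zai-org@claude-code-best.win>

* feat(workflow): replicate the ultracode manual and fix three gaps (worktree/inline/opt-in)

After reviewing agent-system consistency around the ultracode skill:
- ultracode.ts: replace the condensed Chinese version with the full Workflow orchestration manual from the system prompt
- HIGH#1 isolation:'worktree': claudeCodeBackend.run() wraps runAgent in createAgentWorktree + runWithCwdOverride, with cleanup in finally, giving real cwd isolation; the slug is derived from sha256(runId:agentId) so that it matches the cleanupStaleAgentWorktrees cleanup regex (fixes a leak blind spot caused by runId being w+base36 rather than a UUID); worktree.ts comments corrected to match
- HIGH#2 inline persistence: add persistInlineScript; both inline paths (WorkflowTool and service) symmetrically persist to .claude/workflow-runs/<runId>/script.js and return a reusable scriptPath (closing the inline → edit → resubmit-via-scriptPath iteration loop)
- HIGH#3 opt-in division of responsibility: ultracode/WorkflowTool/effort now note that the session reminder is injected by the harness and that the repo carries no ultracode signal; keep the two-layer gate feature('WORKFLOW_SCRIPTS') + isEnabled and do not add our own injection
- Tests: add persistInline.test.ts; extend claudeCodeBackend (4 isolation cases) / WorkflowTool (inline) / service (scriptPath) / ultracode (harness)

Also includes the accompanying workflow engine/panel improvements and the run-state-persistence design doc.

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(workflow): persist terminal run state to state.json so it survives restarts

Terminal RunProgress (including returnValue/error) previously lived only in the in-memory ProgressStore and was lost when the process restarted. It is now written to .claude/workflow-runs/<runId>/state.json so that (a) after a restart the return value can be fetched by runId, and (b) the /workflows panel can show historical runs across restarts. Cross-process resume is explicitly out of scope.
- persistence.ts: getRunsDir/writeRunState/readRunState/listPersistedRuns + attachRunStatePersistence; atomic overwrite (tmp + rename); fault-tolerant reads (missing file / corrupt file / schemaVersion mismatch → null); best-effort writes (an IO failure only logs a warning)
- progress/store.ts: add hydrate(run) to inject a run loaded from disk directly (skipped if the runId already exists; memory wins)
- service.ts: getWorkflowService() wires up attachRunStatePersistence(bus, store), which subscribes to run_done (shared by all three states completed/failed/killed; shutdown-kill takes the same path, so no extra hook is needed); WorkflowService adds getRunAsync(id), which falls back to reading from disk on a memory miss (without injecting into memory), plus loadPersistedRuns(), which scans the disk and hydrates (idempotency guarded by a persistedLoaded flag)
- panel/WorkflowsPanel.tsx: call loadPersistedRuns once on mount (not repeated on re-mount)
- ports.ts: runsDir now uses getRunsDir(), removing duplicated path joining
- Tests: persistence.test.ts (11) / runStatePersistence.test.ts (5) / progressStore (2) / service (5) / WorkflowsPanel (1), 24 new tests in total; precheck 5629 pass / 0 fail

Design deviation: the plan originally called for monkey-patching getRunsDir to point at a tmpdir, which is not feasible because Bun ESM namespaces are immutable. Instead, an optional runsDirProvider parameter (defaulting to getRunsDir) is injected via DI into attachRunStatePersistence and makeService (as the 4th parameter, after cwdOverride), consistent with the existing cwdOverride pattern. makeService's cwdOverride is unchanged, so inline persistence is not broken.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(workflow): lower default concurrency to 3 and support per-run maxConcurrency
- DEFAULT_MAX_CONCURRENCY=3 replaces the old min(16, cores-2); MAX_CONCURRENCY_CAP=16 is kept as the absolute ceiling for user input
- Add clampMaxConcurrency() to handle the undefined / <1 / >CAP edge cases
- WorkflowInput schema adds maxConcurrency: number.int().min(1).max(16).optional()
- Threaded end to end through the engine's context/runWorkflow: semaphore capacity comes from the per-run input
- WorkflowTool prompt adds guidance: for fan-out scenarios, confirm the concurrency with the user via AskUserQuestion before starting
- Sync the concurrency wording in the ultracode skill + audit workflow spec (remove the cpu-cores formula)
- Update the old formula in docs/features/workflow-scripts.md

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(workflow): translate panel UI strings to English

Four user-facing Chinese strings in WorkflowsPanel (the onDone error message and the key-hint line) are now English; the other panel components (AgentList/TabsBar) were already English. Code comments stay in Chinese, consistent with the workflow module's convention.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(workflow): interrupt system (x kills a single agent / K kills the whole workflow, with Dialog confirmation)
- claudeCodeBackend bridges ctx.signal → runAgent.override.abortController (fixes the root cause of 'x' having no effect: the abort never reached the inner fetch)
- AbortError is recognized and rethrown as WorkflowAbortedError (no longer swallowed as dead, so the workflow can tell it was killed)
- ports.taskRegistrar adds registerAgentAbort/unregisterAgentAbort/killAgent; service.killAgent(runId, agentId) interrupts exactly one agent
- Panel keys: 'x' kills the current agent (when the agents column is focused) / 'K' kills the whole workflow; Dialog confirmation, and confirm mode swallows navigation keys to prevent accidental triggers
- 8 new tests (backend signal bridge / hooks inject / ports killAgent / service killAgent)

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* docs(workflow): add model tier selection guidance to the ultracode skill (matching haiku/sonnet/opus/best to scenarios)

Supplies the decision criteria missing from agent()'s existing model parameter: lists the cost/latency order of magnitude and typical scenarios for the 4 tiers, and states the rule of thumb "if you can't articulate why you're switching tiers, omit it".

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(workflow): maxConcurrency≠3 requires AskUserQuestion first (3 is the recommended default)

Changes "ask only for fan-out" to "always ask whenever maxConcurrency≠3". The only exception: the user has already stated a concurrency explicitly in the current session ("use 6" / "maxConcurrency 9"). Synced in three places: the prompt (WorkflowTool.ts), the skill (ultracode.ts), and the audit spec.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(workflow): automatically retry a failed agent once (dead or non-abort throw)
- hooks.agent wraps invokeBackend: on the first dead result or non-abort throw → retry once
- WorkflowAbortedError (kill) is not retried, because it reflects user intent
- registry.resolve configuration errors (AdapterNotFoundError etc.) are thrown outside the try and propagate without a retry: retrying a config problem is pointless and hides bugs
- If the retry still fails: dead stays dead; a throw is downgraded to dead (it does not take down the workflow, consistent with the parallel/pipeline null-on-error contract)
- The budget is not double-charged: dead does not call addOutputTokens, and only an ok result is charged, once
- 7 new hooks-level retry tests + 1 service-level downgrade test

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(workflow): keep the #number suffix when truncating panel labels (multiple findings in the same dim stay distinguishable)

The audit workflow names verify agents verify:${dim}#${findingIdx}. The old logic, slice(0, 18), cut from the right and swallowed the whole #idx, so multiple findings in the same dimension were visually indistinguishable. New logic: when a #number suffix is present, keep the suffix and truncate the prefix with a … ellipsis. Examples:
verify:correctness#0 → verify:correctn…#0
verify:architecture#15 → verify:archite…#15

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(workflow): return to the main chat immediately after killing the whole workflow

The run_done → store → notifications.ts notification path already existed, but after confirmYes the panel stayed open and hid the main chat, so the user never saw the "stopped" feedback. After a kill, onDone() is now called to exit the panel immediately, making the main chat's `Workflow "<name>" was stopped` notification visible right away.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(workflow): dead agents carry reason/detail + the prompt pushes harder for StructuredOutput

In a 12-agent audit workflow, 8 agents went dead and the journal recorded only {kind:"dead"} with no further information, so afterwards there was no way to tell "the agent produced no StructuredOutput" apart from "runAgent threw". The evidence pointed to the main cause: after a long tool chain, sonnet forgets to call StructuredOutput, extractStructuredOutput returns null, and the result is downgraded to dead.
- types.ts: AgentRunResult.dead gains optional reason/detail fields (no-structured-output / runagent-threw / worktree-failed / unknown), compatible with old journals (both are optional).
- claudeCodeBackend.ts: all three dead sites fill in reason + detail; no-structured-output uses the first 200 characters of the finalized text as detail, so logs and the panel immediately show what the agent last said.
- claudeCodeBackend.ts: the schema-mode prompt states the mandatory StructuredOutput requirement once at the start and once at the end, targeting sonnet forgetting to wrap up after a long tool chain.
- hooks.ts: retry logs include the reason; when the retry still throws, the downgraded dead result also gets reason=runagent-threw + detail.
- types.test.ts: add reason JSON round-trip + old-journal compatibility tests.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(workflow): schema mode drops the StructuredOutput tool contract in favor of robust JSON text parsing

The previous round (70a2f76) treated "the agent forgets to call StructuredOutput after a long tool chain" as the cause of death and added the double requirement at the start and end of the prompt. But in an actual run of 5 review agents, 4 went dead, and every detail read "StructuredOutput tool is not available as a deferred tool". The root cause is that the tool was never injected into the workflow sub-agent's tool set (the default assembleToolPool pool does not include it; only the stop_hook path, execAgentHook.ts, calls createStructuredOutputTool() explicitly). The prompt kept demanding a call to an unreachable tool, so the agent got confused, wrote long justifications, and never produced JSON.
- claudeCodeBackend.ts:
  - extractStructuredOutput rewritten: a bracket-stack scan replaces indexOf/lastIndexOf and handles nested objects, brackets inside strings, and escape characters; a new fenced-code-block path takes priority (```json / ```), and with multiple JSON blocks the first one that parses wins; only plain objects are returned (arrays/numbers/strings/null are rejected). No syntax repair is attempted (trailing commas / single quotes / comments), to avoid corrupting string contents (e.g. "http://" being eaten by a // comment regex).
  - schema-mode prompt simplified: removed the double STRUCTURED OUTPUT requirement at start and end (600+ tokens) and instead tell the agent to emit raw JSON in its final text block; state explicitly "StructuredOutput is not available in this environment" to eliminate hallucinated calls.
- hooks.ts: guard detail.slice with typeof === 'string'; the catch block uses e instanceof Error ? e.message : String(e) (old journals / third-party adapters may write a non-string detail, and calling .slice directly would throw a TypeError and break logging).
- claudeCodeBackend.test.ts: +9 tests covering fenced / nested / brackets inside strings / escaped quotes / first of multiple blocks / type guard / broken JSON.

precheck: 5663 pass / 0 fail.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* docs(effort): add design spec for the interactive /effort panel

Design points:
- /effort with no arguments → horizontal slider panel (low/medium/high/xhigh/max/ultracode)
- ←/→ move the cursor, Enter confirms, Esc cancels
- ultracode is only a visual placeholder; confirming it shows a hint to use /ultracode <context>
- With an env override: double marker + a warning at the top
- Panel disabled when the model does not support it
- Two-phase delivery: commit the basic panel first, then the ultracode ripple animation

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* docs(effort): add implementation plan for the basic EffortPanel (phase 1)

Split TDD-style into 6 tasks: pure-function state → keybinding registration → component → command mounting → branch tests → precheck. The ripple animation gets its own commit in phase 2.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* docs(effort): add q/ctrl+c cancel bindings to the plan, aligning with the spec §5 state machine

Gap caught by the verifier: spec §5 says Esc / Ctrl+C / q are all cancel events, but plan Task 2.3 bound only escape. Added q and ctrl+c → effortPanel:cancel. Also rewrote Step 2.2 directly as the 6-action version (home/end), removing the roundabout wording.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* docs(effort): revise the plan for 5 gaps found in the pre-execution review
- Task 3.3 EffortPanel.tsx draft: rewrote the garbled Faster/Smarter padEnd syntax; corrected the useKeybindings import path from @anthropic/ink to ../../keybindings/useKeybinding.js; removed the redundant renderSeparatorLine; kept renderPaddedLine
- Task 5.2 computeConfirmOutcome switched to an injected-ApplyFn pattern: avoids the effortPanelState → effort.tsx → EffortPanel circular dependency; tests can inject mockApply without mocking settings
- Step 5.3 test code aligned with the injected signature

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): add pure-function state module for EffortPanel (PanelPosition + cursor movement/initial cursor)

Contains only pure functions and types with no React/Ink dependency, which makes it easy to unit test.
- PANEL_POSITIONS: low → medium → high → xhigh → max → ultracode
- moveLeft/moveRight: clamped at the edges (low does not move further left, ultracode does not move further right)
- getInitialCursor: env override > displayed level

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(keybindings): register the EffortPanel context and 6 action bindings

Binds ←/→/h/l/home/end/enter/escape/q/ctrl+c to effortPanel:* actions. Follows the ModelPicker context pattern so that the left/right keys are not intercepted by global keybindings.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): implement the EffortPanel component body (rendering + keyboard interaction + confirm/cancel branches)
- Horizontal slider layout: Faster ↔ Smarter at the two ends, 6 tick positions
- useKeybindings registers the EffortPanel context (←/→/h/l/home/end/enter/escape/q/ctrl+c)
- Enter on one of the 5 levels → calls executeEffort to write settings + AppState
- Enter on ultracode → prints guidance text without writing any state
- Esc/q → "Effort unchanged."
- Yellow warning at the top when an env override is active
- computeConfirmOutcome takes an injected ApplyFn for testability (tests added in Task 5)

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): mount the interactive EffortPanel when /effort has no arguments
- No arguments → <EffortPanelWrapper>, passing through AppState.effortValue
- current/status → still shows text (unchanged)
- With an argument → goes straight to executeEffort (unchanged)
- help/-h/--help → unchanged

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* test(effort): add computeConfirmOutcome branch tests (with injected mockApply)
- ultracode → kind=ultracode-hint, applyFn not called
- low → kind=apply, message/effortUpdate come from applyFn
- When applyFn returns no effortUpdate, outcome.effortUpdate is undefined
- CANCEL_MESSAGE / ULTRACODE_HINT constants

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): cast cursor to EffortValue in tests to avoid a TS error from PanelPosition including ultracode

computeConfirmOutcome's ApplyFn contract requires an EffortValue, but the test mockApply receives a PanelPosition. At runtime, computeConfirmOutcome takes the hint branch at the ultracode position and never calls applyFn, so the cast is safe. Full precheck passes: 5688 tests / 0 fail.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): fix panel alignment and colors
- Alignment: use Box width={SEGMENT} + justifyContent="center" so ▲ and the level name are exactly centered, replacing the previous string padEnd(11), whose mismatch with SEGMENT=12 caused a 1-column offset
- Colors: all panel text now uses theme.claude (Claude Orange rgb(215,119,87)) instead of the terminal's default purple; separator/sub-label/footer use theme.subtle; the env warning uses theme.warning
- The level name under the cursor is also bold, to strengthen the visual focus

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): switch panel text to purple, translate ULTRACODE_HINT to English
- Color: theme.claude (orange) → theme.purple_FOR_SUBAGENTS_ONLY (Purple 600, rgb(147,51,234)), applied to the title, Faster/Smarter, ▲, and level names
- ULTRACODE_HINT: Chinese → English "ultracode is not an effort level. Use /ultracode <context> to start a multi-agent workflow."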
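Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): unified color scheme: selected uses suggestion (blue), unselected uses subtle (gray)

Drops purple_FOR_SUBAGENTS_ONLY (reserved for subagents) in favor of matching the project's other panels:
- Selected level + ▲: color="suggestion" (Medium blue rgb(87,105,247)) + bold
- Unselected levels + empty ▲ placeholder: color="subtle" (Light gray rgb(175,175,175))
- Title / Faster / Smarter: color="suggestion"
- Separator / sub-label / footer: color="subtle"

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(workflow): emit the missing phase_done before the terminal state; the panel auto-exits on the running→terminal transition

runWorkflow: when the script ends, hook.phase never fires phase_done for the last phase, so the UI's left column would show running forever. All three paths (completed/killed/failed) now call emitTerminalPhaseDone before run_done.
WorkflowsPanel: extracted the pure function isRunTerminatedTransition to detect running → terminal; the panel's useEffect exits focus automatically once it detects the transition.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): pure functions for the ripple animation pickChar/computeRippleLine/mergeLayers + 18 tests

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): useRippleFrame hook wraps useAnimationFrame and subscribes to the clock only when needed

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): integrate the ripple background into EffortPanel: switch to ripple mode when the cursor rests on ultracode

useRippleFrame is enabled only when cursor === 'ultracode', rendering a 5-row ripple background + overlay text (Faster/Smarter, separator, ▲, level names, sub-label). The other levels keep the original PlainContent rendering path unchanged.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* refactor(effort): switch the ripple animation from character density to a color gradient

Following the original style, the ripple background changes from INTENSITY_CHARS density characters ('·∙░▒▓') to a suggestion-family color gradient (transparent → dark deep violet-blue → suggestion → highlight):
rippleAnimation.ts:
- Remove pickChar / INTENSITY_CHARS / WAVE_PEAK_CHARS / mergeLayers
- Add intensityToColor(intensity) → 'transparent' | '#xxxxxx'
- Add computeRippleCells, returning Cell[] (char + color per position)
- Add applyOverlaysToCells(cells, overlays) to replace mergeLayers
- Add cellsToSegments(cells) to merge adjacent same-color runs (fewer Text nodes)
EffortPanel.tsx:
- RippleContent renders via cells → segments → tokens
- Space runs are colored with BaseText backgroundColor (solid color blocks)
- Text runs are colored with Text color (bright, to stand out)
- Tokens are split a second time into space/text runs to avoid ambiguous rendering of mixed runs
Tests: 29 rippleAnimation tests cover intensityToColor boundaries, computeRippleCells length/source/falloff, applyOverlaysToCells overwrite/truncation/defensive copy, and the cellsToSegments merging logic.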
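Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): tune ripple parameters: fill the left side + slow down + background color across the whole panel

User feedback raised three issues:
1. "The low troughs have no color change" → intensity ≤ 0.1 returned transparent, making troughs invisible. transparent is now never returned; the lowest level, #0a0d1a, serves as the panel's base color (a dark violet-black ocean) with the crests flowing over it.
2. "The waves move too fast" → time coefficient 0.012 → 0.004 (about 1/3 speed). Crest speed drops from 34 cells/s to 11 cells/s, and the per-frame color change from 45% to 36%.
3. "The waves only reach the middle and don't cover the left side" → falloff radius 40 → 90. With the source at x=65, the left edge is at dist=65 < 90, so the ripple reaches the far left (roughly 30-50% coverage).
Color scale changes:
- Remove the transparent level; add #0a0d1a as the darkest level (base color)
- Change the top level from #8aa0ff (highlight) to #5769F7 (suggestion) so it does not share a color with the text overlay and swallow it
- 7 color levels: #0a0d1a → #15182b → #1f2543 → #2a3360 → #3a4582 → #4a5bb0 → #5769F7
Tests: drop the transparent expectations in favor of concrete colors (#0a0d1a etc.). Add a "wider coverage radius" test verifying that dist=65 still gets a color other than the darkest.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): ripple v3: remove the black edge + drop the high-frequency center ripple + extend the y axis over the shortcut row

User feedback raised three issues:
1. "The black edge doesn't feel right": the darkest level #0a0d1a (rgb 10,13,26) was too close to pure black, so distant troughs looked like a hard black border. Changed to #1a1f3a (rgb 26,31,58), more violet-blue than pure black.
2. "The fast ripple in the center looks odd": removed the high-frequency ripple overlay near the source for dist<6 (time*0.02, 5× the main ripple frequency). It was meant to make the area near the source feel more like water, but in practice it looked like fast flickering and felt jarring. The main ripple is enough on its own.
3. "Cover the shortcuts in the y direction": RippleContent adds a y=2 row that renders the shortcut overlay ("←/→ adjust · Enter confirm · Esc cancel"). The PlainContent path keeps the original Box marginTop=1 + Text rendering.
Color scale changes (more violet-blue):
- #1a1f3a (was #0a0d1a): darkest level
- #1f2543 / #252c55 / #2e3870 / #3a4582 / #4a5bb0 / #5769F7 (middle levels slightly brightened to keep the transition smooth)
Tests: the source-point test is updated to "at time=0 the trough is darkest; as time advances, a crest sweeps through and it brightens", reflecting the pure main-ripple behavior after removing the high-frequency ripple.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* chore(workflow): translate all Chinese text in workflow-related code to English

Covers Chinese comments, user-visible error messages, and string literals in the source (src/workflow/ + packages/workflow-engine/src/), plus test titles and comments; 6 hard-coded assertions were updated to match the translated error messages.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): ripple v4: smooth wave + full hue-wheel rotation + fade in/out + adaptive width
- Wave function changed to (sin+1)/2: eliminates the flat dark band produced by max(0,sin) (about 6 rows wide)
- Continuous main hue rotation (0.03°/ms, one full hue wheel every 12 s): blue → purple → magenta → red → orange → yellow → green → cyan
- Text overlays rotate hue in sync (rotateHue applied to Faster/▲/level names/separator/sub-label)
- Fade-in/fade-out animation: fadeColor/fadeCells + a fade state machine with ~300 ms enter/exit transitions
- The sub-label is pinned below the ultracode segment and no longer follows the cursor
- One pure ripple row added at the top and one at the bottom for visual consistency
- Width adapts to the terminal column count: 72 when narrow, full width when wide (computeSegment/computeRippleSourceX)
- Shortcuts are now plain Text and no longer part of the ripple background rendering
- 18 new tests (fadeColor/fadeCells/rotateHue/getHueShiftAtTime)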
Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* refactor: remove CYBER_RISK_MITIGATION_REMINDER from FileReadTool

Co-Authored-By: deepseek-v4-pro <deepseek-ai@claude-code-best.win>

* fix: prevent ReDoS in extractMeta regex by anchoring to splice boundary

Co-Authored-By: deepseek-v4-pro <deepseek-ai@claude-code-best.win>

* chore: update scripts

---------

Co-authored-by: glm-5.1 <zai-org@claude-code-best.win>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: deepseek-v4-pro <deepseek-ai@claude-code-best.win>
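The schema-mode fix replaces indexOf/lastIndexOf with a bracket-stack scan and adds a fenced-block fast path. The sketch below shows that idea in TypeScript and is reconstructed only from the commit description. extractJsonObject, balancedObjectSpans, and asPlainObject are hypothetical names, and the real extractStructuredOutput in claudeCodeBackend.ts may differ. Because only objects are extracted, a single depth counter stands in for the bracket stack.

```typescript
// Hypothetical sketch: "fenced block first, then bracket-stack scan" JSON extraction.
// Not the actual extractStructuredOutput implementation.

type JsonObject = Record<string, unknown>

// Three backticks, built at runtime so this snippet can sit inside a Markdown fence.
const TICKS = '`'.repeat(3)
const FENCED_BLOCK = new RegExp(TICKS + '(?:json)?[ \\t]*\\r?\\n([\\s\\S]*?)' + TICKS, 'g')

// Parse text and accept only a plain object (reject arrays, numbers, strings, null).
// No syntax repair: trailing commas, single quotes, and comments simply fail to parse.
function asPlainObject(text: string): JsonObject | null {
  try {
    const value: unknown = JSON.parse(text)
    return value !== null && typeof value === 'object' && !Array.isArray(value)
      ? (value as JsonObject)
      : null
  } catch {
    return null
  }
}

// Yield each balanced top-level {...} span. The depth counter plays the role of the
// bracket stack; braces inside string literals (including escaped quotes) are ignored.
function* balancedObjectSpans(text: string): Generator<string> {
  let depth = 0
  let start = -1
  let inString = false
  let escaped = false
  for (let i = 0; i < text.length; i++) {
    const ch = text[i]
    if (inString) {
      if (escaped) escaped = false
      else if (ch === '\\') escaped = true
      else if (ch === '"') inString = false
      continue
    }
    if (ch === '"' && depth > 0) {
      inString = true // only track strings inside an object; quotes in prose are ignored
    } else if (ch === '{') {
      if (depth === 0) start = i
      depth++
    } else if (ch === '}' && depth > 0) {
      depth--
      if (depth === 0) yield text.slice(start, i + 1)
    }
  }
}

function extractJsonObject(text: string): JsonObject | null {
  // 1. Fenced blocks take priority; the first one that parses to an object wins.
  for (const match of text.matchAll(FENCED_BLOCK)) {
    const obj = asPlainObject((match[1] ?? '').trim())
    if (obj) return obj
  }
  // 2. Otherwise scan the raw text; the first balanced span that parses wins.
  for (const span of balancedObjectSpans(text)) {
    const obj = asPlainObject(span)
    if (obj) return obj
  }
  return null
}
```

Unparseable spans are skipped rather than repaired. A stray `{bad}` earlier in the prose therefore does not hide a valid object that appears later, and string contents such as "http://" are never rewritten.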
2026-06-21 15:55:50 +00:00 · 2026-06-14 18:13:49 +08:00
parent 3e3e1de81b
commit 58ee6419b1
130 changed files with 23347 additions and 885 deletions
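The "retry a failed agent once" entry describes a small wrapper around the backend call. The TypeScript sketch below illustrates that wrapper under a simplified result shape. AgentResult and invokeWithOneRetry are hypothetical, and WorkflowAbortedError is redeclared locally as a stand-in for the engine's class of the same name. This is not the actual hooks.agent code.

```typescript
// Hypothetical sketch of the retry-once policy; simplified stand-in types.
type AgentResult =
  | { kind: 'ok'; output: string; outputTokens: number }
  | { kind: 'dead'; reason?: string; detail?: string }

// Local stand-in for the engine's WorkflowAbortedError.
class WorkflowAbortedError extends Error {
  constructor(message = 'workflow aborted') {
    super(message)
    this.name = 'WorkflowAbortedError'
  }
}

// `invoke` is assumed to be an already-resolved backend call: adapter lookup
// (e.g. registry.resolve) happens before this wrapper, so config errors propagate.
async function invokeWithOneRetry(
  invoke: () => Promise<AgentResult>,
  addOutputTokens: (n: number) => void,
): Promise<AgentResult> {
  let last: AgentResult = { kind: 'dead', reason: 'unknown' }
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      const res = await invoke()
      if (res.kind === 'ok') {
        addOutputTokens(res.outputTokens) // budget is charged once, only on success
        return res
      }
      last = res // dead: fall through to the single retry
    } catch (e) {
      if (e instanceof WorkflowAbortedError) throw e // a kill is user intent: never retry
      // any other throw is downgraded to dead instead of failing the whole workflow
      last = {
        kind: 'dead',
        reason: 'runagent-threw',
        detail: e instanceof Error ? e.message : String(e),
      }
    }
  }
  return last
}
```

This structure separates the three failure classes named in the commit message. Kills propagate untouched, configuration errors never reach the wrapper, and ordinary agent failures get exactly one more attempt before the result settles on dead.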
--- a/src/workflow/tests/WorkflowsPanel.test.tsx
+++ b/src/workflow/tests/WorkflowsPanel.test.tsx
@@ -0,0 +1,197 @@
+import { expect, test } from 'bun:test';
+import { PassThrough } from 'node:stream';
+import React from 'react';
+import { wrappedRender as render } from '@anthropic/ink';
+import { SentryErrorBoundary } from '../../components/SentryErrorBoundary.js';
+import type { RunProgress } from '../progress/store.js';
+import { call as panelCall } from '../panel/panelCall.js';
+import { clampSelected, isRunTerminatedTransition, WorkflowsPanel } from '../panel/WorkflowsPanel.js';
+import { truncateLabel } from '../panel/AgentList.js';
+import { STATUS_DOT } from '../panel/status.js';
+import { __resetWorkflowServiceForTests, getWorkflowService } from '../service.js';
+
+// Pure function: clamp the selection index into the valid range (the same clampSelected the panel uses internally).
+test('clampSelected: empty list → 0; out of bounds → last; negative/NaN → 0; normal → original', () => {
+  expect(clampSelected(5, 0)).toBe(0);
+  expect(clampSelected(5, 3)).toBe(2);
+  expect(clampSelected(-3, 3)).toBe(0);
+  expect(clampSelected(1, 3)).toBe(1);
+  expect(clampSelected(0, 1)).toBe(0);
+  // NaN (e.g. uninitialized state) safely falls back to 0
+  expect(clampSelected(Number.NaN, 3)).toBe(0);
+});
+
+// truncateLabel: short labels pass through unchanged; labels with a `#number` suffix keep the suffix and truncate the prefix with an ellipsis;
+// labels without a suffix are cut on the right. This keeps multiple findings from the audit workflow's verify:${dim}#${idx} agents distinguishable.
+test('truncateLabel: short label as-is; with #number suffix keep suffix and truncate prefix; without suffix cut from right', () => {
+  // short label as-is
+  expect(truncateLabel('agent-1', 18)).toBe('agent-1');
+  expect(truncateLabel('review:bugs', 18)).toBe('review:bugs');
+  // exactly max length (boundary)
+  expect(truncateLabel('review:correctness', 18)).toBe('review:correctness');
+  // over max + with #number suffix: keep suffix, truncate prefix + ellipsis
+  expect(truncateLabel('verify:correctness#0', 18)).toBe('verify:correctn…#0');
+  expect(truncateLabel('verify:architecture#15', 18)).toBe('verify:archite…#15');
+  // a different single-digit #idx is still distinguishable from #0
+  expect(truncateLabel('verify:correctness#2', 18)).toBe('verify:correctn…#2');
+  // without #number suffix: cut from right (legacy behavior)
+  expect(truncateLabel('a-very-long-label-no-suffix', 18)).toBe('a-very-long-label-');
+});
+
+// STATUS_DOT covers all four states, each mapped to a visible, non-empty dot character.
+test('STATUS_DOT covers running/completed/failed/killed and is non-empty character', () => {
+  const statuses = ['running', 'completed', 'failed', 'killed'] as const;
+  for (const s of statuses) {
+    expect(STATUS_DOT[s]).toBeTruthy();
+    expect(STATUS_DOT[s].length).toBeGreaterThan(0);
+  }
+});
+
+// Progress data shape contract: fields read by the panel exist/are readable on a typical RunProgress,
+// preventing silent panel render breakage from store.ts structural drift.
+test('RunProgress field contract: keys read by panel all exist', () => {
+  const run: RunProgress = {
+    runId: 'r1',
+    workflowName: 'review',
+    status: 'running',
+    phases: [{ title: 'Find', status: 'done' }],
+    declaredPhases: ['Find', 'Review'],
+    currentPhase: 'Review',
+    agents: [{ id: 1, label: 'review:api', phase: 'Review', status: 'running' }],
+    agentCount: 1,
+    startedAt: 1,
+    updatedAt: 1,
+  };
+  // paths read by panel WorkflowList/Detail
+  expect(run.status).toBe('running');
+  expect(STATUS_DOT[run.status]).toBe('●');
+  expect(run.currentPhase).toBe('Review');
+  expect(run.agents.length).toBe(run.agentCount);
+  expect(run.phases[0]?.title).toBe('Find');
+  expect(run.phases[0]?.status).toBe('done');
+  expect(run.agents[0]?.label).toBe('review:api');
+});
+
+// Completed/failed shape: returnValue / error only shown when not running.
+test('RunProgress completed/failed shape: returnValue/error optional', () => {
+  const completed: RunProgress = {
+    runId: 'r2',
+    workflowName: 'w',
+    status: 'completed',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    returnValue: 'ok',
+    startedAt: 2,
+    updatedAt: 2,
+  };
+  const failed: RunProgress = {
+    runId: 'r3',
+    workflowName: 'w',
+    status: 'failed',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    error: 'boom',
+    startedAt: 3,
+    updatedAt: 3,
+  };
+  expect(completed.returnValue).toBe('ok');
+  expect(completed.error).toBeUndefined();
+  expect(failed.error).toBe('boom');
+  expect(failed.returnValue).toBeUndefined();
+  expect(STATUS_DOT['completed']).toBe('✓');
+  expect(STATUS_DOT['failed']).toBe('✗');
+});
+
+// Fix M: a throw from useSyncExternalStore / listNamed / a child component must not crash the REPL,
+// so panelCall must wrap WorkflowsPanel in SentryErrorBoundary.
+test('panelCall wraps WorkflowsPanel in SentryErrorBoundary (fix M regression)', async () => {
+  const element = (await (panelCall as unknown as (a: unknown, b: unknown, c: unknown) => Promise<React.ReactNode>)(
+    () => {},
+    { canUseTool: undefined },
+    '',
+  )) as React.ReactElement<{ name?: string; children: React.ReactNode }>;
+  expect(element.type).toBe(SentryErrorBoundary);
+  expect(element.props.name).toBe('WorkflowsPanel');
+  const child = element.props.children as React.ReactElement<{
+    onDone: () => void;
+  }>;
+  expect(child.type).toBe(WorkflowsPanel);
+  expect(React.isValidElement(child)).toBe(true);
+  expect(typeof child.props.onDone).toBe('function');
+});
+
+// ---- Task 6: panel mount triggers loadPersistedRuns once ----
+// Verify that WorkflowsPanel mount calls svc.loadPersistedRuns() exactly once.
+// The persistedLoaded flag inside service guards idempotency; re-render / re-mount does not repeat the call.
+// Use a spy to replace the singleton's loadPersistedRuns, render to a PassThrough stream, wait for useEffect to trigger.
+
+test('WorkflowsPanel mount triggers loadPersistedRuns once', async () => {
+  __resetWorkflowServiceForTests();
+  const svc = getWorkflowService();
+  let calls = 0;
+  const orig = svc.loadPersistedRuns.bind(svc);
+  svc.loadPersistedRuns = async () => {
+    calls++;
+  };
+
+  const stdout = new PassThrough();
+  // drain the stream so the frames that render writes don't back up in its buffer
+  stdout.on('data', () => {});
+  let instance: { unmount: () => void; waitUntilExit: () => Promise<void> } | undefined;
+  try {
+    instance = await render(
+      React.createElement(WorkflowsPanel, {
+        onDone: () => {},
+        context: { canUseTool: undefined } as never,
+      }),
+      { stdout: stdout as unknown as NodeJS.WriteStream, patchConsole: false },
+    );
+    // useEffect runs asynchronously after mount; wait ~30 ms for the React commit and the effect to complete
+    await new Promise(r => setTimeout(r, 30));
+
+    expect(calls).toBe(1);
+  } finally {
+    instance?.unmount();
+    svc.loadPersistedRuns = orig;
+    __resetWorkflowServiceForTests();
+  }
+});
+
+// When the focused run transitions from running to a terminal state, the panel automatically calls onDone() (after an 800ms delay so the user can see the terminal state).
+// Only a state transition of the same runId triggers this: switching to a completed tab does not exit, and neither does opening the panel on historical runs.
+// The transition check is extracted into the pure function isRunTerminatedTransition so it can be unit-tested in isolation (Ink test mode does not
+// automatically flush concurrent state updates, so integration tests are unreliable).
+test('isRunTerminatedTransition: same runId running → terminal triggers; other cases do not trigger', () => {
+  const running = { runId: 'r1', status: 'running' as const };
+  const completed = { runId: 'r1', status: 'completed' as const };
+  const failed = { runId: 'r1', status: 'failed' as const };
+  const killed = { runId: 'r1', status: 'killed' as const };
+
+  // same run running → terminal: all three terminal states trigger
+  expect(isRunTerminatedTransition(running, completed)).toBe(true);
+  expect(isRunTerminatedTransition(running, failed)).toBe(true);
+  expect(isRunTerminatedTransition(running, killed)).toBe(true);
+
+  // prev=null (panel opened on a historical run): does not trigger
+  expect(isRunTerminatedTransition(null, completed)).toBe(false);
+  // curr=null (runs cleared): does not trigger
+  expect(isRunTerminatedTransition(running, null)).toBe(false);
+
+  // different runId (switch tab): does not trigger
+  expect(isRunTerminatedTransition({ runId: 'r1', status: 'running' }, { runId: 'r2', status: 'completed' })).toBe(
+    false,
+  );
+
+  // same run but prev not running (already terminal and re-rendered): does not trigger
+  expect(isRunTerminatedTransition(completed, completed)).toBe(false);
+  expect(isRunTerminatedTransition(killed, completed)).toBe(false);
+
+  // same run running → running (no change): does not trigger
+  expect(isRunTerminatedTransition(running, running)).toBe(false);
+});
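The tests in WorkflowsPanel.test.tsx pin down the behavior of clampSelected, truncateLabel, and isRunTerminatedTransition precisely enough to reconstruct them. The sketch below is a hypothetical TypeScript reimplementation derived only from those expectations, and RunSnapshot is an assumed minimal type. The actual panel code may be structured differently.

```typescript
// Hypothetical reimplementations reconstructed from the test expectations only.

type RunStatus = 'running' | 'completed' | 'failed' | 'killed'
interface RunSnapshot {
  runId: string
  status: RunStatus
}

// Clamp a selection index into [0, count - 1]; empty lists and NaN map to 0.
function clampSelected(selected: number, count: number): number {
  if (count <= 0 || Number.isNaN(selected)) return 0
  return Math.min(Math.max(0, selected), count - 1)
}

// Keep a trailing `#<digits>` suffix visible when the label is too long.
function truncateLabel(label: string, max: number): string {
  if (label.length <= max) return label
  const suffix = /#\d+$/.exec(label)?.[0]
  const prefixLen = suffix ? max - suffix.length - 1 : 0 // reserve 1 character for the ellipsis
  if (suffix && prefixLen > 0) return label.slice(0, prefixLen) + '…' + suffix
  return label.slice(0, max) // no suffix (or no room for one): cut from the right
}

// True only when the *same* run goes from running to any terminal state.
function isRunTerminatedTransition(prev: RunSnapshot | null, curr: RunSnapshot | null): boolean {
  return (
    prev !== null &&
    curr !== null &&
    prev.runId === curr.runId &&
    prev.status === 'running' &&
    curr.status !== 'running'
  )
}
```

Treating every non-running status as terminal keeps isRunTerminatedTransition correct if another terminal status is added later, as long as running remains the only live state.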
--- a/src/workflow/tests/claudeCodeBackend.test.ts
+++ b/src/workflow/tests/claudeCodeBackend.test.ts
@@ -0,0 +1,398 @@
+import { expect, test, mock } from 'bun:test'
+
+// Note: the mock specifier must resolve to the same module the implementation actually imports (bun's mock.module
+// matches by resolved module). The implementation imports via '@claude-code-best/builtin-tools/...' and the 'src/*' path
+// alias, so the same specifiers are used here.
+mock.module(
+  '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js',
+  () => ({
+    runAgent: async function* () {
+      yield {
+        type: 'assistant',
+        message: { content: [{ type: 'text', text: 'agent-text' }] },
+      }
+    },
+  }),
+)
+mock.module(
+  '@claude-code-best/builtin-tools/tools/AgentTool/agentToolUtils.js',
+  () => ({
+    finalizeAgentTool: () => ({
+      content: [{ type: 'text', text: 'agent-text' }],
+      usage: { output_tokens: 42 },
+      totalTokens: 42,
+      totalToolUseCount: 3,
+    }),
+  }),
+)
+mock.module(
+  '@claude-code-best/builtin-tools/tools/AgentTool/loadAgentsDir.js',
+  () => ({
+    isBuiltInAgent: () => true,
+  }),
+)
+mock.module('src/tools.js', () => ({ assembleToolPool: () => ({ tools: [] }) }))
+mock.module('src/utils/messages.js', () => ({
+  // Return a shape that satisfies UserMessage consumers process-wide.
+  // Bun's mock.module is process-global (last-write-wins), so an incomplete
+  // mock here corrupts every later test that imports the real createUserMessage
+  // (e.g. bridgeMessaging.test.ts's `type !== 'user'` early-exit, or
+  // processSlashCommand.test.ts's `message.content` access). Mirror the real
+  // shape from src/utils/messages.ts: type + message envelope + passthrough.
+  createUserMessage: (
+    o: {
+      content: string
+    } & Record<string, unknown>,
+  ) => ({
+    type: 'user' as const,
+    message: { role: 'user', content: o.content },
+    ...o,
+  }),
+  extractTextContent: () => 'agent-text',
+}))
+mock.module('src/utils/uuid.js', () => ({ createAgentId: () => 'agent-1' }))
+mock.module('src/services/analytics/index.js', () => ({ logEvent: () => {} }))
+mock.module('src/utils/debug.js', () => ({ logForDebugging: () => {} }))
+
+// isolation:'worktree' tests: mock the three worktree helpers (to avoid actually running `git worktree add`).
+// mock.module is process-global, so worktreeState lives outside the factory where tests can reset it.
+// cwd.js is deliberately not mocked: the real runWithCwdOverride (AsyncLocalStorage) is harmless to the mocked runAgent,
+// and leaving it real avoids polluting other tests in the same process that depend on pwd/getCwd.
+const worktreeState = {
+  shouldThrow: false,
+  hasChanges: false,
+  created: [] as string[],
+  removed: [] as string[],
+  changesCalls: 0,
+}
+mock.module('src/utils/worktree.js', () => ({
+  createAgentWorktree: async (slug: string) => {
+    if (worktreeState.shouldThrow) throw new Error('wt boom')
+    worktreeState.created.push(slug)
+    return {
+      worktreePath: '/fake/wt',
+      worktreeBranch: 'wt-branch',
+      headCommit: 'abc123',
+      gitRoot: '/fake',
+      hookBased: false,
+    }
+  },
+  hasWorktreeChanges: async () => {
+    worktreeState.changesCalls++
+    return worktreeState.hasChanges
+  },
+  removeAgentWorktree: async (path: string) => {
+    worktreeState.removed.push(path)
+    return true
+  },
+}))
+
+import { WorkflowAbortedError } from '@claude-code-best/workflow-engine'
+import {
+  claudeCodeBackend,
+  resolveAgentDefinition,
+  mapWorkflowModel,
+  extractStructuredOutput,
+  WORKFLOW_AGENT,
+} from '../backends/claudeCodeBackend.js'
+import { makeHostHandle } from '../hostHandle.js'
+
+function ctx() {
+  return {
+    host: makeHostHandle({
+      toolUseContext: {
+        options: {
+          agentDefinitions: { activeAgents: [] },
+          querySource: 'workflow',
+          mainLoopModel: 'm',
+        },
+        getAppState: () => ({
+          toolPermissionContext: {
+            mode: 'acceptEdits',
+            alwaysAllowRules: {},
+          },
+          mcp: { tools: [] },
+        }),
+      } as never,
+      canUseTool: (() => Promise.resolve({ behavior: 'allow' })) as never,
+      // run() does not read parentMessage; use an empty object placeholder to satisfy the WorkflowHostBundle type.
+      parentMessage: {} as never,
+    }),
+    signal: new AbortController().signal,
+    runId: 'r1',
+    agentId: 1,
+  }
+}
+
+test('text agent → ok + token/tool/model accounting', async () => {
+  const res = await claudeCodeBackend.run({ prompt: 'do it' }, ctx())
+  expect(res.kind).toBe('ok')
+  if (res.kind === 'ok') {
+    expect(res.output).toBe('agent-text')
+    expect(res.usage.outputTokens).toBe(42)
+    // panel display fields: tokenCount(=totalTokens) / toolCount / model (fallback mainLoopModel 'm')
+    expect(res.tokenCount).toBe(42)
+    expect(res.toolCount).toBe(3)
+    expect(res.model).toBe('m')
+  }
+})
+
+test('isolation:worktree → create worktree + auto-cleanup on no changes; slug matches cleanup regex', async () => {
+  worktreeState.shouldThrow = false
+  worktreeState.hasChanges = false
+  worktreeState.created = []
+  worktreeState.removed = []
+  worktreeState.changesCalls = 0
+  const res = await claudeCodeBackend.run(
+    { prompt: 'do', isolation: 'worktree' },
+    ctx(),
+  )
+  expect(res.kind).toBe('ok')
+  expect(worktreeState.created).toHaveLength(1)
+  // slug must match cleanupStaleAgentWorktrees cleanup regex ^wf_[0-9a-f]{8}-[0-9a-f]{3}-\d+$
+  expect(worktreeState.created[0]).toMatch(/^wf_[0-9a-f]{8}-[0-9a-f]{3}-\d+$/)
+  expect(worktreeState.changesCalls).toBe(1)
+  expect(worktreeState.removed).toHaveLength(1) // no changes → auto-remove
+})
+
+test('isolation:worktree has changes → keep worktree (no remove)', async () => {
+  worktreeState.hasChanges = true
+  worktreeState.created = []
+  worktreeState.removed = []
+  worktreeState.changesCalls = 0
+  const res = await claudeCodeBackend.run(
+    { prompt: 'do', isolation: 'worktree' },
+    ctx(),
+  )
+  expect(res.kind).toBe('ok')
+  expect(worktreeState.removed).toHaveLength(0) // has changes → keep
+  expect(worktreeState.changesCalls).toBe(1)
+})
+
+test('isolation:worktree creation fails → fail-closed returns dead (does not silently degrade to shared cwd)', async () => {
+  worktreeState.shouldThrow = true
+  try {
+    const res = await claudeCodeBackend.run({ prompt: 'do', isolation: 'worktree' }, ctx())
+    expect(res.kind).toBe('dead')
+  } finally {
+    worktreeState.shouldThrow = false // reset even if the assertion fails, so later tests are unaffected
+  }
+})
+
+test('no isolation → no worktree created', async () => {
+  worktreeState.created = []
+  const res = await claudeCodeBackend.run({ prompt: 'do' }, ctx())
+  expect(res.kind).toBe('ok')
+  expect(worktreeState.created).toHaveLength(0)
+})
+
+test('runAgent throws → dead', async () => {
+  // override mock so runAgent throws (last-write-wins)
+  mock.module(
+    '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js',
+    () => ({
+      // biome-ignore lint/correctness/useYield: intentionally throws to test dead branch (no yield)
+      runAgent: async function* () {
+        throw new Error('boom')
+      },
+    }),
+  )
+  const res = await claudeCodeBackend.run({ prompt: 'fail' }, ctx())
+  expect(res.kind).toBe('dead')
+})
+
+// The next three tests cover the fix for the 'x' kill not taking effect: the backend must bridge ctx.signal into
+// runAgent's override.abortController and recognize AbortError as an abort (throw WorkflowAbortedError, not swallow it as dead).
+// They also verify registerAgentAbort injection, so service.kill(runId, agentId) can abort a single agent precisely.
+
+test('ctx.signal pre-abort → backend bridge: override.abortController.signal.aborted=true', async () => {
+  // capture the override.abortController the mock receives, i.e. the agentAbort controller the backend created
+  let capturedController: AbortController | undefined
+  mock.module(
+    '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js',
+    () => ({
+      runAgent: async function* (opts: {
+        override?: { abortController?: AbortController }
+      }) {
+        capturedController = opts.override?.abortController
+        yield {
+          type: 'assistant',
+          message: { content: [{ type: 'text', text: 'x' }] },
+        }
+      },
+    }),
+  )
+  const parentAbort = new AbortController()
+  parentAbort.abort()
+  // the mock does not throw, so the backend takes the normal return path; but the bridge `if (ctx.signal.aborted) agentAbort.abort()`
+  // has already fired synchronously, so capturedController.signal.aborted must be true (this bridge is what makes kill work)
+  await claudeCodeBackend.run(
+    { prompt: 'pre-aborted' },
+    { ...ctx(), signal: parentAbort.signal },
+  )
+  expect(capturedController?.signal.aborted).toBe(true)
+})
+
+test('runAgent throws AbortError → backend throws WorkflowAbortedError (not swallowed as dead)', async () => {
+  mock.module(
+    '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js',
+    () => ({
+      // biome-ignore lint/correctness/useYield: intentionally throws AbortError to test recognition branch
+      runAgent: async function* () {
+        const e = new Error('aborted by parent')
+        e.name = 'AbortError'
+        throw e
+      },
+    }),
+  )
+  await expect(
+    claudeCodeBackend.run({ prompt: 'abort' }, ctx()),
+  ).rejects.toBeInstanceOf(WorkflowAbortedError)
+})
+
+test('registerAgentAbort/unregisterAgentAbort injection: key=ctx.agentId (number), controller from bridge', async () => {
+  // restore default mock (previous test changed it to throw AbortError)
+  mock.module(
+    '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js',
+    () => ({
+      runAgent: async function* () {
+        yield {
+          type: 'assistant',
+          message: { content: [{ type: 'text', text: 'agent-text' }] },
+        }
+      },
+    }),
+  )
+  const registered: Array<{ id: number; controller: AbortController }> = []
+  const unregistered: number[] = []
+  await claudeCodeBackend.run(
+    { prompt: 'wiring' },
+    {
+      ...ctx(),
+      agentId: 42,
+      registerAgentAbort: (id, ac) => registered.push({ id, controller: ac }),
+      unregisterAgentAbort: id => unregistered.push(id),
+    },
+  )
+  expect(registered).toHaveLength(1)
+  expect(registered[0]?.id).toBe(42) // engine numeric agentId (not coreAgentId string)
+  expect(registered[0]?.controller).toBeInstanceOf(AbortController)
+  expect(unregistered).toEqual([42]) // unregistered exactly once, in the finally cleanup
+})
+
+test('id and capabilities shape', () => {
+  expect(claudeCodeBackend.id).toBe('claude-code')
+  expect(claudeCodeBackend.capabilities.structuredOutput).toBe(true)
+  expect(claudeCodeBackend.capabilities.tools).toBe(true)
+})
+
+test('resolveAgentDefinition: no agentType → WORKFLOW_AGENT fallback', () => {
+  const tuc = {
+    options: { agentDefinitions: { activeAgents: [] } },
+  } as never
+  expect(resolveAgentDefinition(undefined, tuc)).toBe(WORKFLOW_AGENT)
+})
+
+test('resolveAgentDefinition: hits activeAgents', () => {
+  const fake = { agentType: 'Explore', permissionMode: 'plan' } as never
+  const tuc = {
+    options: { agentDefinitions: { activeAgents: [fake] } },
+  } as never
+  expect(resolveAgentDefinition('Explore', tuc)).toBe(fake)
+  // miss still falls back
+  expect(resolveAgentDefinition('Nope', tuc)).toBe(WORKFLOW_AGENT)
+})
+
+test('mapWorkflowModel passthrough', () => {
+  expect(mapWorkflowModel(undefined)).toBeUndefined()
+  expect(mapWorkflowModel('claude-haiku-*')).toBe('claude-haiku-*')
+})
+
+test('extractStructuredOutput: valid JSON extracted; invalid returns null', () => {
+  expect(
+    extractStructuredOutput([
+      { type: 'text', text: 'prefix {"a":1,"b":2} suffix' },
+    ]),
+  ).toEqual({ a: 1, b: 2 })
+  expect(
+    extractStructuredOutput([{ type: 'text', text: 'no json here' }]),
+  ).toBeNull()
+  expect(extractStructuredOutput([])).toBeNull()
+})
+
+test('extractStructuredOutput: fenced code block (strip fence + strip language tag)', () => {
+  expect(
+    extractStructuredOutput([
+      {
+        type: 'text',
+        text: 'Here are the findings:\n```json\n{"findings":[{"title":"x"}]}\n```\nDone.',
+      },
+    ]),
+  ).toEqual({ findings: [{ title: 'x' }] })
+  // no language tag
+  expect(
+    extractStructuredOutput([{ type: 'text', text: '```\n{"a":1}\n```' }]),
+  ).toEqual({ a: 1 })
+})
+
+test('extractStructuredOutput: nested object (brace-balanced scan; the old indexOf/lastIndexOf approach could splice across blocks)', () => {
+  const text = 'Result: {"outer":{"inner":{"deep":true}},"n":3} trailing'
+  expect(extractStructuredOutput([{ type: 'text', text }])).toEqual({
+    outer: { inner: { deep: true } },
+    n: 3,
+  })
+})
+
+test('extractStructuredOutput: braces inside strings are ignored when matching', () => {
+  // a } inside a string must not bring depth to zero; the scan continues to the real closing }
+  const text = '{"note":"this } char is in a string","ok":true}'
+  expect(extractStructuredOutput([{ type: 'text', text }])).toEqual({
+    note: 'this } char is in a string',
+    ok: true,
+  })
+})
+
+test('extractStructuredOutput: escaped quotes do not break string boundary', () => {
+  const text = '{"escaped":"he said \\"hi\\"","n":1}'
+  expect(extractStructuredOutput([{ type: 'text', text }])).toEqual({
+    escaped: 'he said "hi"',
+    n: 1,
+  })
+})
+
+test('extractStructuredOutput: multiple JSON blocks → returns the first one that parses', () => {
+  // the first { is unbalanced (no matching }), so the scan moves on to the second
+  const text = 'broken { stuff\n{"real":1}\n{"ignored":2}'
+  expect(extractStructuredOutput([{ type: 'text', text }])).toEqual({ real: 1 })
+})
+
+test('extractStructuredOutput: array / number / string / null are not treated as objects', () => {
+  expect(
+    extractStructuredOutput([{ type: 'text', text: '[1,2,3]' }]),
+  ).toBeNull()
+  expect(extractStructuredOutput([{ type: 'text', text: '42' }])).toBeNull()
+  expect(
+    extractStructuredOutput([{ type: 'text', text: '"raw string"' }]),
+  ).toBeNull()
+  expect(extractStructuredOutput([{ type: 'text', text: 'null' }])).toBeNull()
+})
+
+test('extractStructuredOutput: multiple text blocks → searches across blocks and returns the first success', () => {
+  expect(
+    extractStructuredOutput([
+      { type: 'text', text: 'no json' },
+      { type: 'text', text: '```json\n{"k":"v"}\n```' },
+    ]),
+  ).toEqual({ k: 'v' })
+})
+
+test('extractStructuredOutput: broken JSON returns null (does not throw)', () => {
+  expect(
+    extractStructuredOutput([
+      { type: 'text', text: '{broken: missing quotes}' },
+    ]),
+  ).toBeNull()
+  expect(
+    extractStructuredOutput([{ type: 'text', text: '{"a":1,}' }]), // trailing comma — no syntax repair
+  ).toBeNull()
+})
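The extractStructuredOutput tests above pin down a brace-balanced scan: characters inside JSON strings (including escaped quotes) are skipped, unbalanced or unparsable candidates are passed over without repair, and only plain objects are accepted. A minimal sketch of a scanner that satisfies the same cases follows; `extractFirstJsonObject`, `scanBalancedObject`, and the `ContentBlock` shape are hypothetical names, not the backend's actual implementation.

```typescript
// Hypothetical content-block shape: only blocks with type 'text' are scanned.
type ContentBlock = { type: string; text?: string }

// Returns the substring from `start` (a '{') to its matching '}', or null if unbalanced.
function scanBalancedObject(text: string, start: number): string | null {
  let depth = 0
  let inString = false
  let escaped = false
  for (let i = start; i < text.length; i++) {
    const ch = text[i]
    if (inString) {
      // inside a JSON string: braces do not count, and \" does not end the string
      if (escaped) escaped = false
      else if (ch === '\\') escaped = true
      else if (ch === '"') inString = false
      continue
    }
    if (ch === '"') inString = true
    else if (ch === '{') depth++
    else if (ch === '}') {
      depth--
      if (depth === 0) return text.slice(start, i + 1)
    }
  }
  return null
}

// Returns the first candidate, across all text blocks, that parses to a plain object.
function extractFirstJsonObject(blocks: ContentBlock[]): Record<string, unknown> | null {
  for (const block of blocks) {
    if (block.type !== 'text' || typeof block.text !== 'string') continue
    const text = block.text
    for (let start = text.indexOf('{'); start !== -1; start = text.indexOf('{', start + 1)) {
      const candidate = scanBalancedObject(text, start)
      if (candidate === null) continue
      try {
        const value: unknown = JSON.parse(candidate)
        if (typeof value === 'object' && value !== null && !Array.isArray(value)) {
          return value as Record<string, unknown>
        }
      } catch {
        // invalid JSON (unquoted keys, trailing comma, ...): no repair, try the next '{'
      }
    }
  }
  return null
}
```

Because this sketch restarts the scan at each `{` in turn, fenced json blocks need no special stripping; the real function may still handle fences explicitly, as its test name suggests.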
--- a/src/workflow/tests/notifications.test.ts
+++ b/src/workflow/tests/notifications.test.ts
@@ -0,0 +1,176 @@
+import { describe, expect, test } from 'bun:test'
+import type { RunProgress } from '../progress/store.js'
+import type { WorkflowService } from '../service.js'
+
+function makeMockService(runs: RunProgress[]): {
+  service: WorkflowService
+  emit: () => void
+  setRuns: (runs: RunProgress[]) => void
+} {
+  let current = runs
+  const listeners = new Set<() => void>()
+  return {
+    service: {
+      ports: {},
+      launch: async () => ({ runId: 'x' }),
+      kill: () => {},
+      listRuns: () => current,
+      getRun: () => undefined,
+      subscribe: (fn: () => void) => {
+        listeners.add(fn)
+        return () => {
+          listeners.delete(fn)
+        }
+      },
+      listNamed: async () => [],
+    } as unknown as WorkflowService,
+    emit: () => {
+      for (const fn of listeners) fn()
+    },
+    setRuns: r => {
+      current = r
+    },
+  }
+}
+
+function makeRun(
+  runId: string,
+  status: RunProgress['status'],
+  overrides: Partial<RunProgress> = {},
+): RunProgress {
+  return {
+    runId,
+    workflowName: 'wf',
+    status,
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    startedAt: Date.now(),
+    updatedAt: Date.now(),
+    ...overrides,
+  }
+}
+
+describe('installWorkflowNotifications', () => {
+  test('running → completed triggers notification (incl. workflow name)', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    const unsubscribe = installWorkflowNotifications(service, msg =>
+      calls.push(msg),
+    )
+
+    // first emit: listener records initial running state, no notification
+    emit()
+    expect(calls.length).toBe(0)
+
+    setRuns([makeRun('r1', 'completed')])
+    emit()
+
+    expect(calls.length).toBe(1)
+    expect(calls[0]).toMatch(/task-notification/)
+    expect(calls[0]).toMatch(/completed successfully/)
+    expect(calls[0]).toMatch(/"wf"/)
+    unsubscribe()
+  })
+
+  test('running → failed triggers notification, includes error text', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    installWorkflowNotifications(service, msg => calls.push(msg))
+
+    emit() // record initial running
+    setRuns([makeRun('r1', 'failed', { error: 'agent X boom' })])
+    emit()
+
+    expect(calls.length).toBe(1)
+    expect(calls[0]).toMatch(/failed/)
+    expect(calls[0]).toMatch(/agent X boom/)
+  })
+
+  test('running → killed triggers notification', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    installWorkflowNotifications(service, msg => calls.push(msg))
+
+    emit() // record initial running
+    setRuns([makeRun('r1', 'killed')])
+    emit()
+
+    expect(calls.length).toBe(1)
+    expect(calls[0]).toMatch(/was stopped/)
+  })
+
+  test('a run seen for the first time (no previous state) does not notify (avoids notifying about historical runs on startup)', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([])
+    const calls: string[] = []
+    installWorkflowNotifications(service, msg => calls.push(msg))
+
+    // the first emit after startup sees r1 already completed; it must not notify (not a transition from running)
+    setRuns([makeRun('r1', 'completed')])
+    emit()
+
+    expect(calls.length).toBe(0)
+  })
+
+  test('running → running does not notify', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    installWorkflowNotifications(service, msg => calls.push(msg))
+
+    emit() // record initial running
+    setRuns([makeRun('r1', 'running', { agentCount: 1 })])
+    emit()
+
+    expect(calls.length).toBe(0)
+  })
+
+  test('repeated emits for an already-completed run do not re-notify', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    installWorkflowNotifications(service, msg => calls.push(msg))
+
+    emit() // record initial running
+    setRuns([makeRun('r1', 'completed')])
+    emit()
+    expect(calls.length).toBe(1)
+
+    emit()
+    expect(calls.length).toBe(1)
+  })
+
+  test('no notifications after unsubscribe', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    const unsubscribe = installWorkflowNotifications(service, msg =>
+      calls.push(msg),
+    )
+
+    emit() // record initial running
+    unsubscribe()
+    setRuns([makeRun('r1', 'completed')])
+    emit()
+
+    expect(calls.length).toBe(0)
+  })
+})
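The notification tests describe an edge-triggered rule: a message is sent only when a run moves from running to a terminal status, so first sightings, running → running updates, and repeated emits all stay silent. A minimal sketch under that reading follows; `installTransitionNotifier`, the simplified `RunLike`/`ServiceLike` types, and the exact message wording are assumptions (the wording is chosen only to match the regexes the tests use), not the contents of notifications.ts.

```typescript
// Simplified stand-ins for RunProgress / WorkflowService (assumption: only these fields matter here).
type RunStatus = 'running' | 'completed' | 'failed' | 'killed'
interface RunLike {
  runId: string
  workflowName: string
  status: RunStatus
  error?: string
}
interface ServiceLike {
  listRuns(): RunLike[]
  subscribe(listener: () => void): () => void
}

// Hypothetical wording, chosen only to match the regexes in the tests above.
function describeTerminal(run: RunLike): string | null {
  const name = JSON.stringify(run.workflowName)
  switch (run.status) {
    case 'completed':
      return `Workflow ${name} completed successfully`
    case 'failed':
      return `Workflow ${name} failed: ${run.error ?? 'unknown error'}`
    case 'killed':
      return `Workflow ${name} was stopped`
    default:
      return null
  }
}

function installTransitionNotifier(
  service: ServiceLike,
  notify: (message: string) => void,
): () => void {
  // last status observed per runId; a missing entry means "never seen"
  const lastSeen = new Map<string, RunStatus>()
  return service.subscribe(() => {
    for (const run of service.listRuns()) {
      const prev = lastSeen.get(run.runId)
      lastSeen.set(run.runId, run.status)
      // only a running → terminal edge notifies; first sightings and repeats stay silent
      if (prev !== 'running' || run.status === 'running') continue
      const body = describeTerminal(run)
      if (body !== null) notify(`<task-notification>${body}</task-notification>`)
    }
  })
}
```

Recording the status before checking the edge is what makes a second emit for the same terminal run a no-op: by then `prev` is already terminal.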
--- a/src/workflow/tests/persistence.test.ts
+++ b/src/workflow/tests/persistence.test.ts
@@ -0,0 +1,199 @@
+import { expect, test } from 'bun:test'
+import {
+  mkdir,
+  mkdtemp,
+  readFile,
+  readdir,
+  rm,
+  writeFile as fsWriteFile,
+} from 'node:fs/promises'
+import { tmpdir } from 'node:os'
+import { join } from 'node:path'
+import {
+  getRunsDir,
+  listPersistedRuns,
+  readRunState,
+  writeRunState,
+} from '../persistence.js'
+import type { RunProgress } from '../progress/store.js'
+
+function makeRun(over: Partial<RunProgress> = {}): RunProgress {
+  return {
+    runId: 'r1',
+    workflowName: 'w',
+    status: 'completed',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    startedAt: 1000,
+    updatedAt: 2000,
+    ...over,
+  } as RunProgress
+}
+
+test('writeRunState → readRunState round-trips consistently (object returnValue)', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    const run = makeRun({
+      returnValue: { confirmedCount: 2, items: ['a', 'b'] },
+    })
+    await writeRunState(dir, run)
+    const got = await readRunState(dir, 'r1')
+    expect(got).not.toBeNull()
+    expect(got!.runId).toBe('r1')
+    expect(got!.returnValue).toEqual({ confirmedCount: 2, items: ['a', 'b'] })
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('readRunState missing file → null', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    const got = await readRunState(dir, 'never-exists')
+    expect(got).toBeNull()
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('readRunState corrupt JSON → null', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await mkdir(join(dir, 'rX'), { recursive: true })
+    await fsWriteFile(join(dir, 'rX', 'state.json'), '{not valid json', 'utf-8')
+    const got = await readRunState(dir, 'rX')
+    expect(got).toBeNull()
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('readRunState schemaVersion mismatch → null', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await mkdir(join(dir, 'rX'), { recursive: true })
+    await fsWriteFile(
+      join(dir, 'rX', 'state.json'),
+      JSON.stringify({ schemaVersion: 999, run: makeRun({ runId: 'rX' }) }),
+      'utf-8',
+    )
+    const got = await readRunState(dir, 'rX')
+    expect(got).toBeNull()
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('writeRunState atomic write: leaves no tmp file behind after success', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await writeRunState(dir, makeRun({ runId: 'rAtom' }))
+    const sub = await readdir(join(dir, 'rAtom'))
+    expect(sub).toContain('state.json')
+    expect(sub).not.toContain('state.json.tmp')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('listPersistedRuns scans multiple subdirs, skips dirs without state.json, sorts by updatedAt desc', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    // three valid runs plus one half-broken dir without state.json (e.g. one holding only a journal)
+    await writeRunState(dir, makeRun({ runId: 'old', updatedAt: 1000 }))
+    await writeRunState(dir, makeRun({ runId: 'mid', updatedAt: 2000 }))
+    await writeRunState(dir, makeRun({ runId: 'new', updatedAt: 3000 }))
+    await mkdir(join(dir, 'half-broken'), { recursive: true })
+
+    const runs = await listPersistedRuns(dir)
+    expect(runs.map(r => r.runId)).toEqual(['new', 'mid', 'old'])
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('listPersistedRuns skips a single corrupt state.json and keeps scanning the rest', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await writeRunState(dir, makeRun({ runId: 'good' }))
+    await mkdir(join(dir, 'bad'), { recursive: true })
+    await fsWriteFile(join(dir, 'bad', 'state.json'), 'corrupt', 'utf-8')
+
+    const runs = await listPersistedRuns(dir)
+    expect(runs.map(r => r.runId)).toEqual(['good'])
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('writeRunState does not throw when returnValue is null/string/array', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await writeRunState(dir, makeRun({ runId: 'n', returnValue: null }))
+    await writeRunState(dir, makeRun({ runId: 's', returnValue: 'text' }))
+    await writeRunState(dir, makeRun({ runId: 'a', returnValue: [1, 2, 3] }))
+    expect((await readRunState(dir, 'n'))!.returnValue).toBeNull()
+    expect((await readRunState(dir, 's'))!.returnValue).toBe('text')
+    expect((await readRunState(dir, 'a'))!.returnValue).toEqual([1, 2, 3])
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('writeRunState overwrite: a second write for the same runId replaces the old content', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await writeRunState(dir, makeRun({ runId: 'rOV', status: 'running' }))
+    await writeRunState(dir, makeRun({ runId: 'rOV', status: 'completed' }))
+    const got = await readRunState(dir, 'rOV')
+    expect(got!.status).toBe('completed')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('writeRunState persists the full AgentProgress (label/phase/tokens etc.; no output content)', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    const run = makeRun({
+      runId: 'rAg',
+      agents: [
+        {
+          id: 1,
+          label: 'review:hooks',
+          phase: 'Review',
+          status: 'done',
+          outputShape: 'object',
+          tokenCount: 12345,
+          toolCount: 3,
+          model: 'claude-sonnet-4-6',
+        },
+      ],
+      agentCount: 1,
+    })
+    await writeRunState(dir, run)
+    const got = await readRunState(dir, 'rAg')
+    expect(got!.agents).toHaveLength(1)
+    expect(got!.agents[0]).toEqual({
+      id: 1,
+      label: 'review:hooks',
+      phase: 'Review',
+      status: 'done',
+      outputShape: 'object',
+      tokenCount: 12345,
+      toolCount: 3,
+      model: 'claude-sonnet-4-6',
+    })
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('getRunsDir returns <projectRoot>/.claude/workflow-runs shape', () => {
+  const dir = getRunsDir()
+  // do not hard-code projectRoot (differs across machines), only check suffix structure
+  expect(dir.endsWith(join('.claude', 'workflow-runs'))).toBe(true)
+})
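The persistence tests (no leftover tmp file, a missing, corrupt, or version-mismatched state.json reads as null) match the tmp+rename write and tolerant read described in the commit message. A minimal sketch of that pattern with Node's `fs/promises` follows; `writeRunStateSketch`, `readRunStateSketch`, `PersistedRun`, and the `SCHEMA_VERSION = 1` value are assumptions, not the contents of persistence.ts.

```typescript
import { mkdir, readFile, rename, writeFile } from 'node:fs/promises'
import { join } from 'node:path'

// Assumption: the real schema version constant is not shown in this diff.
const SCHEMA_VERSION = 1

// Simplified stand-in for RunProgress.
interface PersistedRun {
  runId: string
  updatedAt: number
  [key: string]: unknown
}

// Layout assumed from the tests: <runsDir>/<runId>/state.json, envelope { schemaVersion, run }
async function writeRunStateSketch(runsDir: string, run: PersistedRun): Promise<void> {
  const dir = join(runsDir, run.runId)
  await mkdir(dir, { recursive: true })
  const target = join(dir, 'state.json')
  const tmp = `${target}.tmp`
  await writeFile(tmp, JSON.stringify({ schemaVersion: SCHEMA_VERSION, run }), 'utf-8')
  // rename() swaps the finished file into place, so readers never see a half-written state.json
  await rename(tmp, target)
}

async function readRunStateSketch(runsDir: string, runId: string): Promise<PersistedRun | null> {
  try {
    const raw = await readFile(join(runsDir, runId, 'state.json'), 'utf-8')
    const parsed = JSON.parse(raw) as { schemaVersion?: unknown; run?: PersistedRun }
    // unknown schema versions are treated as absent rather than guessed at
    if (parsed.schemaVersion !== SCHEMA_VERSION || parsed.run === undefined) return null
    return parsed.run
  } catch {
    // missing file (ENOENT) or corrupt JSON: no usable persisted state
    return null
  }
}
```

On POSIX filesystems, `rename()` within one directory replaces the target atomically; that property, not the tmp-residue check itself, is what makes the overwrite safe against crashes mid-write.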
--- a/src/workflow/tests/ports.test.ts
+++ b/src/workflow/tests/ports.test.ts
@@ -0,0 +1,198 @@
+import { expect, test } from 'bun:test'
+// Note: this test does not mock bootstrap/state, utils/cwd, analytics, or debug.
+// Reason: mock.module is process-global (last-write-wins); mocking these shared modules would pollute
+// other tests in the same process (e.g. src/commands/__tests__/autonomy.test.ts imports the real
+// bootstrap/state through its dependency chain). ports resolves getProjectRoot/getCwd normally in the test env,
+// and logEvent/logForDebugging are silent no-ops when no sink is attached, so no mocks are needed.
+
+import { buildRegistry } from '../registry.js'
+import { createWorkflowPorts } from '../ports.js'
+import { createProgressBus } from '../progress/bus.js'
+import { createProgressStoreFromBus } from '../progress/store.js'
+import { getProjectRoot } from '../../bootstrap/state.js'
+import type { SetAppState } from '../../Task.js'
+import type { AppState } from '../../state/AppState.tsx'
+
+test('buildRegistry registers claude-code as the default and resolve() returns it', () => {
+  const reg = buildRegistry()
+  expect(reg.has('claude-code')).toBe(true)
+  expect(reg.resolve({ prompt: 'x' }).id).toBe('claude-code')
+  expect(reg.resolve({ prompt: 'x', agentType: 'whatever' }).id).toBe(
+    'claude-code',
+  )
+})
+
+test('createWorkflowPorts assembles full ports (incl. agentAdapterRegistry and progressEmitter→bus)', () => {
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const ports = createWorkflowPorts({ bus, store })
+
+  expect(ports.agentAdapterRegistry).toBeDefined()
+  expect(ports.agentAdapterRegistry!.resolve({ prompt: 'x' }).id).toBe(
+    'claude-code',
+  )
+  expect(typeof ports.taskRegistrar.register).toBe('function')
+  expect(typeof ports.taskRegistrar.kill).toBe('function')
+  expect(typeof ports.hostFactory).toBe('function')
+  // the agentRunner fallback is still present (required by WorkflowPorts)
+  expect(ports.agentRunner).toBeDefined()
+  expect(typeof ports.agentRunner.runAgentToResult).toBe('function')
+
+  // progressEmitter via bus → store: emit a run_started, store can see it
+  ports.progressEmitter.emit({
+    type: 'run_started',
+    runId: 't',
+    workflowName: 'w',
+    meta: null,
+  })
+  expect(store.get('t')?.workflowName).toBe('w')
+})
+
+test('taskRegistrar.register/complete/kill route via RunBinding (real setAppState, no mock)', () => {
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const ports = createWorkflowPorts({ bus, store })
+
+  // real setAppState: use a local AppState object to hold tasks, registerTask goes through the real code path.
+  const state = { tasks: {} } as unknown as AppState
+  const setAppState: SetAppState = f => {
+    Object.assign(state, f(state))
+  }
+
+  const hostCtx = ports.hostFactory({
+    context: {
+      agentId: 'a-1',
+      toolUseId: 'tu-1',
+      setAppState,
+    },
+    canUseTool: (() => Promise.resolve({ behavior: 'allow' })) as never,
+    parentMessage: {} as never,
+  })
+
+  const { runId, signal } = ports.taskRegistrar.register(
+    {
+      workflowName: 'wf',
+      summary: 'summary',
+      workflowFile: 'wf.ts',
+      toolUseId: 'tu-1',
+    },
+    hostCtx.handle,
+  )
+  expect(typeof runId).toBe('string')
+  expect(signal).toBeInstanceOf(AbortSignal)
+
+  // complete/kill do not throw (RunBinding found)
+  expect(() => ports.taskRegistrar.complete(runId, 'done')).not.toThrow()
+  expect(() => ports.taskRegistrar.kill(runId)).not.toThrow()
+  // unknown runId safe no-op
+  expect(() => ports.taskRegistrar.complete('nope')).not.toThrow()
+  expect(ports.taskRegistrar.pendingAction('nope')).toBeNull()
+
+  // once the run is terminal its binding is reclaimed: calling complete/kill on the same runId again must be a safe no-op (no throw, no repeated workflow task callback)
+  ports.taskRegistrar.complete(runId)
+  ports.taskRegistrar.kill(runId)
+})
+
+// agent-level kill bridge: register → killAgent precisely aborts; kill(runId) aborts all agents.
+test('taskRegistrar agentAbortControllers: register/killAgent precise abort; kill(runId) batch abort', () => {
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const ports = createWorkflowPorts({ bus, store })
+  // the impl always provides these; the cast turns the optional methods into required ones (avoids a ! on every call)
+  const tr = ports.taskRegistrar as Required<typeof ports.taskRegistrar>
+
+  const state = { tasks: {} } as unknown as AppState
+  const setAppState: SetAppState = f => {
+    Object.assign(state, f(state))
+  }
+  const hostCtx = ports.hostFactory({
+    context: { agentId: 'a-1', toolUseId: 'tu-1', setAppState },
+    canUseTool: (() => Promise.resolve({ behavior: 'allow' })) as never,
+    parentMessage: {} as never,
+  })
+  const { runId } = tr.register(
+    {
+      workflowName: 'wf',
+      summary: 'summary',
+      workflowFile: 'wf.ts',
+      toolUseId: 'tu-1',
+    },
+    hostCtx.handle,
+  )
+
+  // register AbortControllers for two agents (simulating what the backend does when it launches an agent)
+  const ac1 = new AbortController()
+  const ac2 = new AbortController()
+  tr.registerAgentAbort(runId, 1, ac1)
+  tr.registerAgentAbort(runId, 2, ac2)
+  expect(ac1.signal.aborted).toBe(false)
+  expect(ac2.signal.aborted).toBe(false)
+
+  // killAgent precisely aborts agent #1: only ac1 aborts, ac2 unaffected
+  expect(tr.killAgent(runId, 1)).toBe(true)
+  expect(ac1.signal.aborted).toBe(true)
+  expect(ac2.signal.aborted).toBe(false)
+  // repeat kill on same agent: controller already deleted, returns false (idempotent)
+  expect(tr.killAgent(runId, 1)).toBe(false)
+
+  // unknown agentId / unknown runId safely return false
+  expect(tr.killAgent(runId, 999)).toBe(false)
+  expect(tr.killAgent('nope', 1)).toBe(false)
+
+  // kill(runId) batch aborts remaining agent (ac2)
+  tr.kill(runId)
+  expect(ac2.signal.aborted).toBe(true)
+
+  // once the run is terminal its binding is reclaimed, so killAgent returns false
+  expect(tr.killAgent(runId, 2)).toBe(false)
+})
+
+test('unregisterAgentAbort deletes from the Map (idempotent for the backend finally cleanup)', () => {
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const ports = createWorkflowPorts({ bus, store })
+  const tr = ports.taskRegistrar as Required<typeof ports.taskRegistrar>
+
+  const state = { tasks: {} } as unknown as AppState
+  const setAppState: SetAppState = f => {
+    Object.assign(state, f(state))
+  }
+  const hostCtx = ports.hostFactory({
+    context: { agentId: 'a-1', toolUseId: 'tu-1', setAppState },
+    canUseTool: (() => Promise.resolve({ behavior: 'allow' })) as never,
+    parentMessage: {} as never,
+  })
+  const { runId } = tr.register(
+    {
+      workflowName: 'wf',
+      summary: 'summary',
+      workflowFile: 'wf.ts',
+      toolUseId: 'tu-1',
+    },
+    hostCtx.handle,
+  )
+  const ac = new AbortController()
+  tr.registerAgentAbort(runId, 5, ac)
+  // after unregister, killAgent has no target and returns false (does not throw)
+  tr.unregisterAgentAbort(runId, 5)
+  expect(tr.killAgent(runId, 5)).toBe(false)
+  // repeat unregister idempotent (backend finally does not throw)
+  expect(() => tr.unregisterAgentAbort(runId, 5)).not.toThrow()
+  // unknown runId safe no-op
+  expect(() => tr.unregisterAgentAbort('nope', 5)).not.toThrow()
+})
+
+test('hostFactory.cwd and journalStore share one root (getProjectRoot); fix K regression', () => {
+  // Historical bug: hostFactory.cwd used getCwd() while journalStore used getProjectRoot(); inside a worktree or
+  // subdirectory the two differ, so named-workflow resolution and journal persistence pointed at different roots.
+  // Both now use projectRoot; this test pins hostFactory.cwd to getProjectRoot() to prevent a regression.
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const ports = createWorkflowPorts({ bus, store })
+  const hostCtx = ports.hostFactory({
+    context: { agentId: 'a', toolUseId: 'tu' },
+    canUseTool: (() => Promise.resolve({ behavior: 'allow' })) as never,
+    parentMessage: {} as never,
+  })
+  expect(hostCtx.cwd).toBe(getProjectRoot())
+})
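The agent-level kill tests imply a two-level registry: one binding per run that maps numeric agentIds to AbortControllers. In that registry, killAgent aborts and deletes a single entry, kill(runId) aborts everything and drops the binding, and unknown ids are safe no-ops. A hypothetical sketch of that bookkeeping follows; `AgentAbortRegistry`, `RunEntry`, `openRun`, and the run-level controller are assumptions standing in for taskRegistrar.register and its RunBinding, not the code in ports.ts.

```typescript
// Hypothetical stand-in for the per-run RunBinding: a run-level controller plus per-agent controllers.
interface RunEntry {
  run: AbortController
  agents: Map<number, AbortController>
}

class AgentAbortRegistry {
  private readonly runs = new Map<string, RunEntry>()

  // stands in for taskRegistrar.register: creates the binding and returns the run-level signal
  openRun(runId: string): AbortSignal {
    const run = new AbortController()
    this.runs.set(runId, { run, agents: new Map() })
    return run.signal
  }

  // called by the backend when it launches an agent; an unknown runId is ignored
  registerAgentAbort(runId: string, agentId: number, controller: AbortController): void {
    this.runs.get(runId)?.agents.set(agentId, controller)
  }

  // called from the backend's finally block; safe to call repeatedly
  unregisterAgentAbort(runId: string, agentId: number): void {
    this.runs.get(runId)?.agents.delete(agentId)
  }

  // aborts exactly one agent; returns false when there is nothing to abort
  killAgent(runId: string, agentId: number): boolean {
    const agents = this.runs.get(runId)?.agents
    const controller = agents?.get(agentId)
    if (agents === undefined || controller === undefined) return false
    agents.delete(agentId)
    controller.abort()
    return true
  }

  // aborts every remaining agent and the run itself, then reclaims the binding
  kill(runId: string): void {
    const entry = this.runs.get(runId)
    if (entry === undefined) return
    for (const controller of entry.agents.values()) controller.abort()
    entry.run.abort()
    this.runs.delete(runId)
  }
}
```

Deleting the controller inside killAgent is what makes a second kill of the same agent return false, which is the idempotence the test above checks.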
--- a/src/workflow/tests/progressBus.test.ts
+++ b/src/workflow/tests/progressBus.test.ts
@@ -0,0 +1,23 @@
+import { expect, test, mock } from 'bun:test'
+import { createProgressBus } from '../progress/bus.js'
+
+test('emit broadcasts to all subscribers', () => {
+  const bus = createProgressBus()
+  const a = mock(() => {})
+  const b = mock(() => {})
+  bus.subscribe(a)
+  bus.subscribe(b)
+  const ev = { type: 'log' as const, runId: 'r', message: 'hi' }
+  bus.emit(ev)
+  expect(a).toHaveBeenCalledTimes(1)
+  expect(b).toHaveBeenCalledWith(ev)
+})
+
+test('subscribe returns unsubscribe', () => {
+  const bus = createProgressBus()
+  const fn = mock(() => {})
+  const unsub = bus.subscribe(fn)
+  unsub()
+  bus.emit({ type: 'log', runId: 'r', message: 'x' })
+  expect(fn).not.toHaveBeenCalled()
+})
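The progress bus tests only require two behaviors: emit broadcasts to every current subscriber, and subscribe returns a function that removes that subscriber. A minimal sketch of such a bus follows; `createBus`, `Bus`, and `Listener` are hypothetical names, and the real createProgressBus may differ, for example in how it types progress events.

```typescript
type Listener<E> = (event: E) => void

interface Bus<E> {
  emit(event: E): void
  subscribe(listener: Listener<E>): () => void
}

function createBus<E>(): Bus<E> {
  const listeners = new Set<Listener<E>>()
  return {
    emit(event) {
      // iterate over a copy so listeners added or removed during this emit do not affect this round
      for (const listener of [...listeners]) listener(event)
    },
    subscribe(listener) {
      listeners.add(listener)
      // the returned function removes exactly this listener
      return () => {
        listeners.delete(listener)
      }
    },
  }
}
```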
--- a/src/workflow/tests/progressStore.test.ts
+++ b/src/workflow/tests/progressStore.test.ts
@@ -0,0 +1,289 @@
+import { expect, test } from 'bun:test'
+import { createProgressBus, type ProgressBus } from '../progress/bus.js'
+import {
+  createProgressStoreFromBus,
+  type RunProgress,
+} from '../progress/store.js'
+import type { AgentRunResult } from '@claude-code-best/workflow-engine'
+
+const ok = (o: string): AgentRunResult => ({
+  kind: 'ok',
+  output: o,
+  usage: { outputTokens: 1 },
+})
+
+function newStore() {
+  const bus: ProgressBus = createProgressBus()
+  return { bus, store: createProgressStoreFromBus(bus) }
+}
+
+test('run_started creates entry; phase_started/done updates phases', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({ type: 'phase_started', runId: 'r1', phase: 'A' })
+  bus.emit({ type: 'phase_started', runId: 'r1', phase: 'B' })
+  bus.emit({ type: 'phase_done', runId: 'r1', phase: 'A' })
+  const r = store.get('r1')!
+  expect(r.phases.map(p => [p.title, p.status])).toEqual([
+    ['A', 'done'],
+    ['B', 'running'],
+  ])
+  expect(r.currentPhase).toBe('B')
+})
+
+test('concurrent agent_done events are correlated precisely by agentId (regression test for the old LIFO race)', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({
+    type: 'agent_started',
+    runId: 'r1',
+    agentId: 0,
+    label: 'a',
+    phase: 'A',
+  })
+  bus.emit({
+    type: 'agent_started',
+    runId: 'r1',
+    agentId: 1,
+    label: 'b',
+    phase: 'A',
+  })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 1,
+    label: 'b',
+    phase: 'A',
+    result: ok('b-out'),
+  })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 0,
+    label: 'a',
+    phase: 'A',
+    result: ok('a-out'),
+  })
+  const agents = store.get('r1')!.agents
+  expect(agents.find(x => x.id === 0)?.status).toBe('done')
+  expect(agents.find(x => x.id === 1)?.status).toBe('done')
+  expect(agents.find(x => x.id === 0)?.label).toBe('a')
+  expect(agents.find(x => x.id === 1)?.label).toBe('b')
+})
+
+test('journal hit (agent_done without agent_started) backfills a done entry by id', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 7,
+    label: 'c',
+    phase: 'A',
+    result: ok('c'),
+  })
+  const a = store.get('r1')!.agents.find(x => x.id === 7)!
+  expect(a.status).toBe('done')
+})
+
+test('run_done terminal state + list sort + subscribe notification', () => {
+  const { bus, store } = newStore()
+  let calls = 0
+  store.subscribe(() => calls++)
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({
+    type: 'run_done',
+    runId: 'r1',
+    status: 'completed',
+    returnValue: 42,
+  })
+  const r = store.get('r1')!
+  expect(r.status).toBe('completed')
+  expect(r.returnValue).toBe(42)
+  expect(store.list().map(x => x.runId)).toEqual(['r1'])
+  expect(calls).toBe(2)
+})
+
+test('run_done failed terminal state records error', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r2', workflowName: 'w', meta: null })
+  bus.emit({ type: 'run_done', runId: 'r2', status: 'failed', error: 'boom' })
+  const r = store.get('r2')!
+  expect(r.status).toBe('failed')
+  expect(r.error).toBe('boom')
+})
+
+test('log event does not trigger notify', () => {
+  const { bus, store } = newStore()
+  let calls = 0
+  store.subscribe(() => calls++)
+  bus.emit({ type: 'run_started', runId: 'r3', workflowName: 'w', meta: null })
+  const before = calls
+  bus.emit({ type: 'log', runId: 'r3', message: 'hi' })
+  expect(calls).toBe(before) // log should not trigger notify
+})
+
+test('run_started persists declaredPhases (from meta.phases, order preserved)', () => {
+  const { bus, store } = newStore()
+  bus.emit({
+    type: 'run_started',
+    runId: 'r1',
+    workflowName: 'w',
+    meta: {
+      name: 'w',
+      description: 'd',
+      phases: [{ title: 'Find' }, { title: 'Review' }, { title: 'Verify' }],
+    },
+  })
+  expect(store.get('r1')!.declaredPhases).toEqual(['Find', 'Review', 'Verify'])
+})
+
+test('run_started meta is null → declaredPhases = []', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  expect(store.get('r1')!.declaredPhases).toEqual([])
+})
+
+test('agent_done persists outputShape (ok·object / ok·text / dead none)', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({ type: 'agent_started', runId: 'r1', agentId: 0, phase: 'A' })
+  bus.emit({ type: 'agent_started', runId: 'r1', agentId: 1, phase: 'A' })
+  bus.emit({ type: 'agent_started', runId: 'r1', agentId: 2, phase: 'A' })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 0,
+    phase: 'A',
+    result: { kind: 'ok', output: { x: 1 }, usage: { outputTokens: 1 } },
+  })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 1,
+    phase: 'A',
+    result: { kind: 'ok', output: 'hi', usage: { outputTokens: 1 } },
+  })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 2,
+    phase: 'A',
+    result: { kind: 'dead' },
+  })
+  const agents = store.get('r1')!.agents
+  expect(agents.find(a => a.id === 0)?.outputShape).toBe('object')
+  expect(agents.find(a => a.id === 1)?.outputShape).toBe('text')
+  expect(agents.find(a => a.id === 2)?.outputShape).toBeUndefined()
+})
+
+test('agent_progress updates tokenCount/toolCount in real time (correlated by agentId)', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({
+    type: 'agent_started',
+    runId: 'r1',
+    agentId: 0,
+    label: 'a',
+    phase: 'A',
+  })
+  bus.emit({
+    type: 'agent_progress',
+    runId: 'r1',
+    agentId: 0,
+    tokenCount: 1200,
+    toolCount: 2,
+  })
+  let a = store.get('r1')!.agents.find(x => x.id === 0)!
+  expect(a.tokenCount).toBe(1200)
+  expect(a.toolCount).toBe(2)
+  bus.emit({
+    type: 'agent_progress',
+    runId: 'r1',
+    agentId: 0,
+    tokenCount: 2400,
+    toolCount: 3,
+  })
+  a = store.get('r1')!.agents.find(x => x.id === 0)!
+  expect(a.tokenCount).toBe(2400)
+  expect(a.toolCount).toBe(3)
+})
+
+test('agent_done persists model/tokenCount/toolCount (ok variant)', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({ type: 'agent_started', runId: 'r1', agentId: 0, phase: 'A' })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 0,
+    phase: 'A',
+    result: {
+      kind: 'ok',
+      output: 'x',
+      usage: { outputTokens: 5 },
+      model: 'glm-5.2',
+      tokenCount: 22900,
+      toolCount: 1,
+    },
+  })
+  const a = store.get('r1')!.agents.find(x => x.id === 0)!
+  expect(a.model).toBe('glm-5.2')
+  expect(a.tokenCount).toBe(22900)
+  expect(a.toolCount).toBe(1)
+})
+
+// ---- hydrate: inject historical run from disk (cross-restart recovery) ----
+
+test('hydrate injects new run → get hits + list includes it + notifies listener', () => {
+  const { store } = newStore()
+  let notified = 0
+  store.subscribe(() => notified++)
+
+  const historical: RunProgress = {
+    runId: 'hist-1',
+    workflowName: 'old-job',
+    status: 'completed',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 5,
+    returnValue: { summary: 'past' },
+    startedAt: 1,
+    updatedAt: 2,
+  }
+  store.hydrate(historical)
+
+  expect(store.get('hist-1')).toBe(historical)
+  expect(store.list().map(r => r.runId)).toContain('hist-1')
+  expect(notified).toBeGreaterThan(0)
+})
+
+test('hydrate with an existing runId → skipped (memory wins; disk does not overwrite)', () => {
+  const { bus, store } = newStore()
+  bus.emit({
+    type: 'run_started',
+    runId: 'r1',
+    workflowName: 'live',
+    meta: null,
+  })
+
+  const stale: RunProgress = {
+    runId: 'r1',
+    workflowName: 'STALE-SHOULD-NOT-WIN',
+    status: 'completed',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    startedAt: 1,
+    updatedAt: 2,
+  }
+  store.hydrate(stale)
+
+  const got = store.get('r1')!
+  expect(got.workflowName).toBe('live')
+  expect(got.status).toBe('running')
+})
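The store tests above pin down three reducer rules:

- `agent_done` is matched to its agent by `agentId`, not by arrival order.
- An `agent_done` with no prior `agent_started` (a journal hit) backfills a done entry.
- `hydrate` never overwrites a run that is already in memory.

Below is a minimal TypeScript sketch of those rules. The event and state shapes are heavily simplified, and `MiniStore`, `AgentEntry`, `RunEntry` and `StoreEvent` are hypothetical names. This is an illustration, not the actual `progress/store.ts` implementation.

```typescript
// Hypothetical, heavily simplified shapes modeled on the store tests.
interface AgentEntry {
  id: number
  label?: string
  phase: string
  status: 'running' | 'done'
}
interface RunEntry {
  runId: string
  workflowName: string
  status: string
  agents: AgentEntry[]
}
type StoreEvent =
  | { type: 'run_started'; runId: string; workflowName: string }
  | { type: 'agent_started'; runId: string; agentId: number; label?: string; phase: string }
  | { type: 'agent_done'; runId: string; agentId: number; label?: string; phase: string }

class MiniStore {
  private runs = new Map<string, RunEntry>()

  apply(e: StoreEvent): void {
    if (e.type === 'run_started') {
      this.runs.set(e.runId, {
        runId: e.runId,
        workflowName: e.workflowName,
        status: 'running',
        agents: [],
      })
      return
    }
    const run = this.runs.get(e.runId)
    if (!run) return // this sketch ignores events for unknown runs
    // Correlate by agentId, never by arrival order (order-based LIFO matching races).
    const id = e.agentId
    const existing = run.agents.find(a => a.id === id)
    if (e.type === 'agent_started') {
      if (!existing) run.agents.push({ id, label: e.label, phase: e.phase, status: 'running' })
      return
    }
    if (existing) {
      existing.status = 'done'
    } else {
      // Journal hit: agent_done replayed without a prior agent_started, so backfill a done entry.
      run.agents.push({ id, label: e.label, phase: e.phase, status: 'done' })
    }
  }

  // Memory wins: a run already in memory is never overwritten by an on-disk copy.
  hydrate(run: RunEntry): void {
    if (!this.runs.has(run.runId)) this.runs.set(run.runId, run)
  }

  get(runId: string): RunEntry | undefined {
    return this.runs.get(runId)
  }
}
```

Keying on `agentId` is what makes out-of-order completion safe. When two agents in the same phase finish in reverse order, each update still lands on its own entry.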
--- a/src/workflow/tests/runStatePersistence.test.ts
+++ b/src/workflow/tests/runStatePersistence.test.ts
@@ -0,0 +1,177 @@
+import { expect, test } from 'bun:test'
+import { mkdtemp, rm, writeFile } from 'node:fs/promises'
+import { tmpdir } from 'node:os'
+import { join } from 'node:path'
+import { attachRunStatePersistence, readRunState } from '../persistence.js'
+import { createProgressBus } from '../progress/bus.js'
+import { createProgressStoreFromBus } from '../progress/store.js'
+
+/**
+ * Contract test for attachRunStatePersistence (adjusted Task 4):
+ * exercises the bus + store pair directly, bypassing makeService (so makeService keeps its (ports, store, cwdOverride?) signature).
+ *
+ * runsDir is redirected to a tmpdir through attachRunStatePersistence's third parameter, runsDirProvider,
+ * so tests never write into the real project directory (Bun's ESM module namespace is read-only, so getRunsDir cannot be monkey-patched).
+ */
+
+test('run_done completed → writes state.json to disk, returnValue consistent', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-persist-'))
+  try {
+    const bus = createProgressBus()
+    const store = createProgressStoreFromBus(bus)
+    attachRunStatePersistence(bus, store, () => dir)
+
+    bus.emit({
+      type: 'run_started',
+      runId: 'rW',
+      workflowName: 'w',
+      meta: null,
+    })
+    bus.emit({
+      type: 'run_done',
+      runId: 'rW',
+      status: 'completed',
+      returnValue: { ok: true, n: 3 },
+    })
+
+    // writeRunState is fire-and-forget (void writeRunState(...) in the subscription); wait for the file write to finish
+    await new Promise(r => setTimeout(r, 50))
+
+    const got = await readRunState(dir, 'rW')
+    expect(got).not.toBeNull()
+    expect(got!.status).toBe('completed')
+    expect(got!.returnValue).toEqual({ ok: true, n: 3 })
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('run_done failed → writes status=failed + error field to disk', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-persist-'))
+  try {
+    const bus = createProgressBus()
+    const store = createProgressStoreFromBus(bus)
+    attachRunStatePersistence(bus, store, () => dir)
+
+    bus.emit({
+      type: 'run_started',
+      runId: 'rF',
+      workflowName: 'w',
+      meta: null,
+    })
+    bus.emit({
+      type: 'run_done',
+      runId: 'rF',
+      status: 'failed',
+      error: 'boom',
+    })
+    await new Promise(r => setTimeout(r, 50))
+
+    const got = await readRunState(dir, 'rF')
+    expect(got).not.toBeNull()
+    expect(got!.status).toBe('failed')
+    expect(got!.error).toBe('boom')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('run_done killed → writes status=killed to disk', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-persist-'))
+  try {
+    const bus = createProgressBus()
+    const store = createProgressStoreFromBus(bus)
+    attachRunStatePersistence(bus, store, () => dir)
+
+    bus.emit({
+      type: 'run_started',
+      runId: 'rK',
+      workflowName: 'w',
+      meta: null,
+    })
+    bus.emit({ type: 'run_done', runId: 'rK', status: 'killed' })
+    await new Promise(r => setTimeout(r, 50))
+
+    const got = await readRunState(dir, 'rK')
+    expect(got?.status).toBe('killed')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('writeRunState internal IO exception is swallowed: attachRunStatePersistence does not propagate, bus emit does not break', async () => {
+  const blockerDir = await mkdtemp(join(tmpdir(), 'wf-persist-'))
+  // Create a regular file to use as runsDir: mkdir of the per-run subdirectory then fails, and writeRunState's internal catch swallows the error
+  await writeFile(join(blockerDir, 'not-a-dir.txt'), 'blocker', 'utf-8')
+  try {
+    const bus = createProgressBus()
+    const store = createProgressStoreFromBus(bus)
+    // runsDir points at a regular file, so mkdir(<runsDir>/<runId>, { recursive: true }) fails
+    attachRunStatePersistence(bus, store, () =>
+      join(blockerDir, 'not-a-dir.txt'),
+    )
+
+    // An extra bus subscriber checks that emit keeps notifying listeners even when the persistence listener hits an IO error
+    let otherNotified = 0
+    bus.subscribe(() => otherNotified++)
+
+    // bus.emit should not throw — writeRunState swallows the exception internally
+    expect(() => {
+      bus.emit({
+        type: 'run_started',
+        runId: 'rErr',
+        workflowName: 'w',
+        meta: null,
+      })
+      bus.emit({
+        type: 'run_done',
+        runId: 'rErr',
+        status: 'completed',
+        returnValue: 'x',
+      })
+    }).not.toThrow()
+
+    // wait for the failing writeRunState to settle (its exception is swallowed internally)
+    await new Promise(r => setTimeout(r, 50))
+
+    // the extra bus subscriber received both run_started and run_done, and the store still reduced run_done
+    expect(otherNotified).toBeGreaterThanOrEqual(2)
+    expect(store.get('rErr')?.status).toBe('completed')
+  } finally {
+    await rm(blockerDir, { recursive: true, force: true })
+  }
+})
+
+test('attachRunStatePersistence returns unsubscribe; after calling it no more disk writes', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-persist-'))
+  try {
+    const bus = createProgressBus()
+    const store = createProgressStoreFromBus(bus)
+    const unsub = attachRunStatePersistence(bus, store, () => dir)
+
+    // before unsubscribing: run_done is written to disk
+    bus.emit({
+      type: 'run_started',
+      runId: 'r1',
+      workflowName: 'w',
+      meta: null,
+    })
+    bus.emit({ type: 'run_done', runId: 'r1', status: 'completed' })
+    await new Promise(r => setTimeout(r, 50))
+    expect(await readRunState(dir, 'r1')).not.toBeNull()
+
+    // after unsubscribing: a new run_done must not be written to disk
+    unsub()
+    bus.emit({
+      type: 'run_started',
+      runId: 'r2',
+      workflowName: 'w',
+      meta: null,
+    })
+    bus.emit({ type: 'run_done', runId: 'r2', status: 'completed' })
+    await new Promise(r => setTimeout(r, 50))
+    expect(await readRunState(dir, 'r2')).toBeNull()
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
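The persistence tests expect two behaviors:

- `writeRunState` is best-effort: IO errors are swallowed.
- `readRunState` is tolerant: a missing file yields `null`.

The commit message describes the write as an atomic overwrite (tmp file, then rename). Below is a sketch of that pattern using `node:fs/promises`. It assumes:

- a simplified `PersistedRun` shape;
- the `<runsDir>/<runId>/state.json` layout.

The `*Sketch` function names and the tmp-file naming are hypothetical.

```typescript
import { mkdir, readFile, rename, writeFile } from 'node:fs/promises'
import { join } from 'node:path'

// Hypothetical, minimal shape; the real RunProgress carries many more fields.
interface PersistedRun {
  runId: string
  status: string
  returnValue?: unknown
  error?: string
}

// Atomic overwrite under the assumed layout <runsDir>/<runId>/state.json:
// write a temp file in the same directory, then rename it over the target.
// On POSIX filesystems readers see the old file or the new one, never a partial write.
async function writeRunStateSketch(runsDir: string, run: PersistedRun): Promise<void> {
  try {
    const dir = join(runsDir, run.runId)
    await mkdir(dir, { recursive: true })
    const target = join(dir, 'state.json')
    const tmp = `${target}.${process.pid}.${Date.now()}.tmp`
    await writeFile(tmp, JSON.stringify(run, null, 2), 'utf-8')
    await rename(tmp, target)
  } catch {
    // Best-effort persistence: swallow IO errors so the caller (a bus listener) never throws.
  }
}

// Tolerant read: a missing or unparsable state.json yields null instead of throwing.
async function readRunStateSketch(runsDir: string, runId: string): Promise<PersistedRun | null> {
  try {
    const raw = await readFile(join(runsDir, runId, 'state.json'), 'utf-8')
    return JSON.parse(raw) as PersistedRun
  } catch {
    return null
  }
}
```

The temp file lives in the same directory as `state.json`, so the `rename` never crosses a filesystem boundary. That is what lets POSIX `rename` replace the target atomically.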
--- a/src/workflow/tests/selectors.test.ts
+++ b/src/workflow/tests/selectors.test.ts
@@ -0,0 +1,82 @@
+import { expect, test } from 'bun:test'
+import type { AgentProgress, RunProgress } from '../progress/store.js'
+import {
+  ALL_PHASE,
+  mergePhases,
+  filterAgentsByPhase,
+  tabLabel,
+} from '../panel/selectors.js'
+
+function run(partial: Partial<RunProgress>): RunProgress {
+  return {
+    runId: 'r1',
+    workflowName: 'w',
+    status: 'running',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    startedAt: 1,
+    updatedAt: 1,
+    ...partial,
+  }
+}
+
+test('mergePhases: declared order first, actual phases append undeclared ones, counts done/total', () => {
+  const r = run({
+    declaredPhases: ['Find', 'Review', 'Verify'],
+    phases: [
+      { title: 'Find', status: 'done' },
+      { title: 'Review', status: 'running' },
+    ],
+    agents: [
+      {
+        id: 1,
+        phase: 'Find',
+        status: 'done',
+        resultKind: 'ok',
+        outputShape: 'text',
+      },
+      { id: 2, phase: 'Find', status: 'done', resultKind: 'dead' },
+      { id: 3, phase: 'Review', status: 'running' },
+    ],
+  })
+  expect(mergePhases(r)).toEqual([
+    { title: 'Find', status: 'done', done: 2, total: 2 },
+    { title: 'Review', status: 'running', done: 0, total: 1 },
+    { title: 'Verify', status: 'pending', done: 0, total: 0 },
+  ])
+})
+
+test('mergePhases: actual but undeclared phase appended to the end', () => {
+  const r = run({
+    declaredPhases: ['Find'],
+    phases: [
+      { title: 'Find', status: 'done' },
+      { title: 'Adhoc', status: 'running' },
+    ],
+    agents: [],
+  })
+  expect(mergePhases(r).map(p => p.title)).toEqual(['Find', 'Adhoc'])
+})
+
+test('filterAgentsByPhase: All / undefined → all; specified → only that phase', () => {
+  const agents: AgentProgress[] = [
+    { id: 1, phase: 'A', status: 'running' },
+    {
+      id: 2,
+      phase: 'B',
+      status: 'done',
+      resultKind: 'ok',
+      outputShape: 'text',
+    },
+  ]
+  expect(filterAgentsByPhase(agents, undefined)).toHaveLength(2)
+  expect(filterAgentsByPhase(agents, ALL_PHASE)).toHaveLength(2)
+  expect(filterAgentsByPhase(agents, 'A')).toEqual([agents[0]])
+})
+
+test('tabLabel: workflow name + "#" + last 4 chars of runId as a short code', () => {
+  expect(tabLabel('review-changes', 'wf_abc123def')).toBe('review-changes#3def')
+})
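The selector tests fully determine the expected outputs of `mergePhases`, `filterAgentsByPhase` and `tabLabel`. One way to satisfy them is sketched below.

- The types are trimmed to the fields the tests touch.
- The `*Sketch` names are hypothetical.
- The `ALL` sentinel value is an assumption, because the real `ALL_PHASE` value does not appear in this diff.

```typescript
// Hypothetical shapes, trimmed to the fields the selector tests touch.
type PhaseStatus = 'pending' | 'running' | 'done'
interface AgentLite {
  id: number
  phase: string
  status: 'running' | 'done'
}
interface RunLite {
  declaredPhases: string[]
  phases: { title: string; status: PhaseStatus }[]
  agents: AgentLite[]
}
interface PhaseRow {
  title: string
  status: PhaseStatus
  done: number
  total: number
}

// Assumed sentinel; the real ALL_PHASE value is not shown in this diff.
const ALL = '__all__'

function mergePhasesSketch(run: RunLite): PhaseRow[] {
  // Declared order first, then phases that actually ran but were never declared.
  const titles = [...run.declaredPhases]
  for (const p of run.phases) {
    if (!titles.includes(p.title)) titles.push(p.title)
  }
  return titles.map((title): PhaseRow => {
    const inPhase = run.agents.filter(a => a.phase === title)
    return {
      title,
      // A declared phase that has not started yet is shown as pending.
      status: run.phases.find(p => p.title === title)?.status ?? 'pending',
      // Every finished agent counts as done, whether its result was ok or dead.
      done: inPhase.filter(a => a.status === 'done').length,
      total: inPhase.length,
    }
  })
}

function filterAgentsByPhaseSketch(agents: AgentLite[], phase?: string): AgentLite[] {
  return phase === undefined || phase === ALL
    ? agents
    : agents.filter(a => a.phase === phase)
}

// Workflow name plus a short code: the last four characters of the runId.
function tabLabelSketch(workflowName: string, runId: string): string {
  return `${workflowName}#${runId.slice(-4)}`
}
```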
--- a/src/workflow/tests/service.test.ts
+++ b/src/workflow/tests/service.test.ts
@@ -0,0 +1,594 @@
+import { expect, test } from 'bun:test'
+// DI pattern: do not use mock.module (process-global, last-write-wins, would pollute other tests in the same process such as
+// autonomy.test.ts). Instead hand-construct FAKE WorkflowPorts: registry.run returns a fixed ok
+// result, taskRegistrar maintains abort bindings, journalStore is an in-memory empty impl. The real runWorkflow
+// thus runs to completion without needing LLM or mocks.
+
+import { mkdtemp, rm, writeFile } from 'node:fs/promises'
+import { tmpdir } from 'node:os'
+import { join } from 'node:path'
+import { makeService, __resetWorkflowServiceForTests } from '../service.js'
+import { createProgressBus } from '../progress/bus.js'
+import {
+  createProgressStoreFromBus,
+  type RunProgress,
+} from '../progress/store.js'
+import type {
+  AgentRunResult,
+  ProgressEvent,
+  WorkflowPorts,
+} from '@claude-code-best/workflow-engine'
+
+// Construct FAKE ports: registry.run returns a fixed AgentRunResult, taskRegistrar has bindings,
+// journalStore is an in-memory empty impl. progressEmitter.emit → bus.emit (store subscribes to bus at construction).
+// Note: runWorkflow itself emits run_started/run_done; taskRegistrar only manages abort bindings,
+// does not re-emit events (avoids store reducer receiving duplicate run_done).
+type RegistrarCall =
+  | { kind: 'complete'; runId: string; summary?: string }
+  | { kind: 'fail'; runId: string; error?: string }
+  | { kind: 'kill'; runId: string }
+  | {
+      kind: 'registerAgentAbort'
+      runId: string
+      agentId: number
+      controller: AbortController
+    }
+  | { kind: 'unregisterAgentAbort'; runId: string; agentId: number }
+  | { kind: 'killAgent'; runId: string; agentId: number }
+
+function fakePorts(
+  opts: {
+    /** If set, adapter.run throws an Error with this message (simulates an agent backend crash). */
+    adapterThrow?: string
+    /** adapter.run return value (defaults to an ok result). */
+    adapterResult?: AgentRunResult
+    /** agentRunner.runAgentToResult return value (fallback path; throws when unset). */
+    runnerResult?: AgentRunResult
+  } = {},
+): {
+  ports: WorkflowPorts
+  store: ReturnType<typeof createProgressStoreFromBus>
+  killed: string[]
+  /** taskRegistrar call records (complete/fail/kill/registerAgentAbort/...). */
+  calls: RegistrarCall[]
+  /** runId → (agentId → AbortController). Used by tests to simulate backend registration. */
+  agentBindings: Map<string, Map<number, AbortController>>
+  /** adapter.run call count (incremented on every attempt, including retries). Holder object: tests read adapterCallsRef.value. */
+  adapterCallsRef: { value: number }
+} {
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const killed: string[] = []
+  const calls: RegistrarCall[] = []
+  const bindings = new Map<string, { abort: AbortController }>()
+  // agentId → AbortController (per runId). killAgent uses this to abort precisely.
+  const agentBindings = new Map<string, Map<number, AbortController>>()
+  // adapter.run call count (incremented on every attempt, including retries). A holder object is used because a plain
+  // number returned via a shorthand property is copied by value (0) at return time, so later increments of the
+  // outer variable would never show up in the returned field. The holder's reference stays stable.
+  const adapterCallsRef = { value: 0 }
+  let seq = 0
+  const ports = {
+    // hostFactory is not actually called by the service.launch path (service builds its own host handle),
+    // but the WorkflowPorts type requires it to exist; keep a minimal impl.
+    hostFactory: () => ({
+      handle: {} as never,
+      cwd: '/tmp',
+      budgetTotal: null,
+      toolUseId: 'tu',
+    }),
+    agentAdapterRegistry: {
+      resolve: () => ({
+        id: 'claude-code',
+        capabilities: { structuredOutput: true },
+        run:
+          opts.adapterThrow !== undefined
+            ? async (): Promise<AgentRunResult> => {
+                adapterCallsRef.value++
+                throw new Error(opts.adapterThrow)
+              }
+            : async (): Promise<AgentRunResult> => {
+                adapterCallsRef.value++
+                return (
+                  opts.adapterResult ?? {
+                    kind: 'ok',
+                    output: 'mock-out',
+                    usage: { outputTokens: 1 },
+                  }
+                )
+              },
+      }),
+    },
+    agentRunner: {
+      runAgentToResult:
+        opts.runnerResult !== undefined
+          ? async () => opts.runnerResult
+          : async () => {
+              throw new Error('should not reach')
+            },
+    },
+    progressEmitter: {
+      emit: (e: ProgressEvent) => bus.emit(e),
+    },
+    taskRegistrar: {
+      register: ({ workflowName }: { workflowName: string }) => {
+        const abort = new AbortController()
+        seq += 1
+        const runId = `run-${seq}`
+        bindings.set(runId, { abort })
+        agentBindings.set(runId, new Map())
+        return { runId, signal: abort.signal }
+      },
+      complete: (runId: string, summary?: string) => {
+        calls.push({ kind: 'complete', runId, summary })
+      },
+      fail: (runId: string, error?: string) => {
+        calls.push({ kind: 'fail', runId, error })
+      },
+      kill: (runId: string) => {
+        killed.push(runId)
+        calls.push({ kind: 'kill', runId })
+        bindings.get(runId)?.abort.abort()
+      },
+      registerAgentAbort: (
+        runId: string,
+        agentId: number,
+        controller: AbortController,
+      ) => {
+        calls.push({
+          kind: 'registerAgentAbort',
+          runId,
+          agentId,
+          controller,
+        })
+        agentBindings.get(runId)?.set(agentId, controller)
+      },
+      unregisterAgentAbort: (runId: string, agentId: number) => {
+        calls.push({ kind: 'unregisterAgentAbort', runId, agentId })
+        agentBindings.get(runId)?.delete(agentId)
+      },
+      killAgent: (runId: string, agentId: number) => {
+        calls.push({ kind: 'killAgent', runId, agentId })
+        const ac = agentBindings.get(runId)?.get(agentId)
+        if (!ac) return false
+        ac.abort()
+        agentBindings.get(runId)!.delete(agentId)
+        return true
+      },
+      pendingAction: () => null,
+    },
+    journalStore: {
+      read: async () => [],
+      append: async () => {},
+      truncate: async () => {},
+    },
+    permissionGate: { isAborted: () => false },
+    logger: {
+      debug: () => {},
+      event: () => {},
+      warn: () => {},
+    },
+  } as unknown as WorkflowPorts
+  return { ports, store, killed, calls, agentBindings, adapterCallsRef }
+}
+
+const stubTUC = { agentId: 'a1', toolUseId: 'tu' } as never
+const stubCanUseTool = (() => Promise.resolve({ behavior: 'allow' })) as never
+
+/** Wait for the detached runWorkflow call to settle (a 60 ms timer lets pending microtasks and macrotasks drain). */
+async function settle(): Promise<void> {
+  await new Promise(r => setTimeout(r, 60))
+}
+
+test('launch → completed; store shows this run', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  const { runId } = await svc.launch(
+    { script: `return agent('compute')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  await settle()
+  const r = svc.getRun(runId)
+  expect(r).toBeDefined()
+  // the detached run may still be running at the end of the settle window or may have completed; both are acceptable.
+  expect(['completed', 'running']).toContain(r!.status)
+  expect(r!.workflowName).toBe('workflow')
+})
+
+test('launch inline script → returns scriptPath (persisted to cwdOverride dir)', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    const { ports, store } = fakePorts()
+    const svc = makeService(ports, store, dir)
+    const result = await svc.launch(
+      { script: `return agent('x')` },
+      stubTUC,
+      stubCanUseTool,
+    )
+    expect(result.scriptPath).toBe(
+      join(dir, '.claude', 'workflow-runs', 'run-1', 'script.js'),
+    )
+    const { readFile } = await import('node:fs/promises')
+    expect(await readFile(result.scriptPath!, 'utf-8')).toBe(
+      `return agent('x')`,
+    )
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('kill goes through taskRegistrar.kill', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, killed } = fakePorts()
+  const svc = makeService(ports, store)
+  const { runId } = await svc.launch(
+    { script: `return agent('x')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  svc.kill(runId)
+  expect(killed).toContain(runId)
+})
+
+test('killAgent goes through taskRegistrar.killAgent: precisely aborts a single agent', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, calls, agentBindings } = fakePorts()
+  const svc = makeService(ports, store)
+  const { runId } = await svc.launch(
+    { script: `return agent('x')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  // simulate backend registering AbortController when launching agent
+  const ac = new AbortController()
+  agentBindings.get(runId)!.set(7, ac)
+  // service.killAgent routes to taskRegistrar.killAgent, which actually aborts the corresponding controller
+  expect(svc.killAgent(runId, 7)).toBe(true)
+  expect(ac.signal.aborted).toBe(true)
+  expect(
+    calls.some(
+      c => c.kind === 'killAgent' && c.runId === runId && c.agentId === 7,
+    ),
+  ).toBe(true)
+  // killAgent removed the controller from the map, so a second call for the same agent returns false (idempotent)
+  expect(svc.killAgent(runId, 7)).toBe(false)
+  // an unknown agentId or runId safely returns false
+  expect(svc.killAgent(runId, 999)).toBe(false)
+  expect(svc.killAgent('nope', 1)).toBe(false)
+})
+
+test('listRuns/subscribe come from store', () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  expect(svc.listRuns()).toEqual([])
+  let n = 0
+  const unsub = svc.subscribe(() => {
+    n++
+  })
+  expect(typeof unsub).toBe('function')
+  unsub()
+  expect(n).toBe(0)
+})
+
+test('listNamed delegates to namedWorkflows (empty dir → []; with files → lists)', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  // non-existent dir → []
+  const empty = await svc.listNamed(
+    join(tmpdir(), `wf-nope-${Math.random().toString(36).slice(2)}`),
+  )
+  expect(empty).toEqual([])
+  // dir with named files → lists names (extension stripped, sorted)
+  const dir = await mkdtemp(join(tmpdir(), 'wf-named-'))
+  try {
+    await writeFile(
+      join(dir, 'a.ts'),
+      'export const meta = { name: "a", description: "d" }\nreturn 1',
+    )
+    await writeFile(join(dir, 'b.js'), 'return 2')
+    const names = await svc.listNamed(dir)
+    expect(names).toEqual(['a', 'b'])
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('missing script/name/scriptPath → throws', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  await expect(svc.launch({}, stubTUC, stubCanUseTool)).rejects.toThrow(
+    /script|name|scriptPath/,
+  )
+})
+
+test('scriptPath reads file content and validates', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  const dir = await mkdtemp(join(tmpdir(), 'wf-path-'))
+  const file = join(dir, 's.ts')
+  try {
+    await writeFile(file, `return agent('from-file')`)
+    const { runId } = await svc.launch(
+      { scriptPath: file },
+      stubTUC,
+      stubCanUseTool,
+    )
+    await settle()
+    const r = svc.getRun(runId)
+    expect(r).toBeDefined()
+    expect(['completed', 'running']).toContain(r!.status)
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('parseScript validation failed → launch throws', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  // trigger ScriptError: meta literal missing description (validateMeta requires both name+description to be strings)
+  await expect(
+    svc.launch(
+      { script: `export const meta = { name: "x" }\nreturn 1` },
+      stubTUC,
+      stubCanUseTool,
+    ),
+  ).rejects.toThrow(/Script validation failed/i)
+})
+
+// ---- Service-layer failure routing coverage (review gap: .then/.catch → taskRegistrar path) ----
+
+test('script run throws → service routes to taskRegistrar.fail, with error text', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, calls } = fakePorts()
+  const svc = makeService(ports, store)
+  await svc.launch(
+    { script: `throw new Error('script boom')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  await settle()
+  const fail = calls.find(c => c.kind === 'fail')
+  expect(fail).toBeDefined()
+  expect(fail?.kind === 'fail' && fail.error).toMatch(/script boom/)
+})
+
+test('adapter throws → retry still throws → degrade to dead → workflow completed (not fail)', async () => {
+  __resetWorkflowServiceForTests()
+  // new semantics: agent non-abort throw → retry once → still throws → degrade to dead (agent returns null),
+  // workflow continues and completes. Retry tolerates transient failures (429/network), but a permanently
+  // broken agent does not break through the entire workflow (consistent with parallel/pipeline null-on-error contract).
+  const { ports, store, calls, adapterCallsRef } = fakePorts({
+    adapterThrow: 'adapter boom',
+  })
+  const svc = makeService(ports, store)
+  await svc.launch({ script: `return agent('x')` }, stubTUC, stubCanUseTool)
+  await settle()
+  // retry once → adapter called 2 times
+  expect(adapterCallsRef.value).toBe(2)
+  // the workflow completes normally instead of failing
+  const complete = calls.find(c => c.kind === 'complete')
+  expect(complete).toBeDefined()
+  const fail = calls.find(c => c.kind === 'fail')
+  expect(fail).toBeUndefined()
+})
+
+test('script completes normally → service routes to taskRegistrar.complete', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, calls } = fakePorts()
+  const svc = makeService(ports, store)
+  await svc.launch({ script: `return agent('x')` }, stubTUC, stubCanUseTool)
+  await settle()
+  expect(calls.some(c => c.kind === 'complete')).toBe(true)
+})
+
+// ---- Fix N: shutdown cleanup ----
+
+test('shutdown kills all running runs (taskRegistrar.kill called for each)', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, killed } = fakePorts()
+  // slow adapter (200 ms per call), so both runs are still running when shutdown() is called
+  const slowPorts = {
+    ...ports,
+    agentAdapterRegistry: {
+      resolve: () => ({
+        id: 'claude-code',
+        capabilities: { structuredOutput: true },
+        run: async (): Promise<AgentRunResult> => {
+          await new Promise(r => setTimeout(r, 200))
+          return { kind: 'ok', output: 'slow', usage: { outputTokens: 1 } }
+        },
+      }),
+    },
+  } as unknown as typeof ports
+  const slowSvc = makeService(slowPorts, store)
+  const { runId: a } = await slowSvc.launch(
+    { script: `return agent('a')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  const { runId: b } = await slowSvc.launch(
+    { script: `return agent('b')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  killed.length = 0
+  slowSvc.shutdown()
+  expect(killed).toContain(a)
+  expect(killed).toContain(b)
+})
+
+test('shutdown does not re-kill completed runs; idempotent (multiple calls safe)', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, killed } = fakePorts()
+  const svc = makeService(ports, store)
+  const { runId } = await svc.launch(
+    { script: `return agent('x')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  await settle() // let the run complete
+  killed.length = 0
+  svc.shutdown()
+  // a completed run must not be killed again
+  expect(killed).not.toContain(runId)
+  // idempotent
+  expect(() => svc.shutdown()).not.toThrow()
+})
+
+// ---- Task 5: loadPersistedRuns + getRunAsync fallback ----
+// runsDirProvider, makeService's optional fourth parameter, points runsDir at a tmpdir so tests never write into the real project dir
+// (Bun's ESM module namespace is read-only, so getRunsDir cannot be monkey-patched).
+
+test('loadPersistedRuns scans disk to hydrate historical runs; existing in-memory runs are not overwritten', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    // seed the disk with two historical runs
+    const { writeRunState } = await import('../persistence.js')
+    const historicalA = {
+      runId: 'hA',
+      workflowName: 'old-A',
+      status: 'completed',
+      phases: [],
+      declaredPhases: [],
+      currentPhase: null,
+      agents: [],
+      agentCount: 1,
+      returnValue: 'a',
+      startedAt: 10,
+      updatedAt: 20,
+    } as RunProgress
+    const historicalB = {
+      runId: 'hB',
+      workflowName: 'old-B',
+      status: 'failed',
+      phases: [],
+      declaredPhases: [],
+      currentPhase: null,
+      agents: [],
+      agentCount: 2,
+      error: 'x',
+      startedAt: 30,
+      updatedAt: 40,
+    } as RunProgress
+    await writeRunState(dir, historicalA)
+    await writeRunState(dir, historicalB)
+
+    const { ports, store } = fakePorts()
+    // seed memory with one current-session run (ports.progressEmitter.emit → bus → store)
+    ports.progressEmitter.emit({
+      type: 'run_started',
+      runId: 'live',
+      workflowName: 'live-w',
+      meta: null,
+    })
+    const svc = makeService(ports, store, undefined, () => dir)
+
+    await svc.loadPersistedRuns()
+
+    const ids = svc.listRuns().map(r => r.runId)
+    expect(ids).toContain('hA')
+    expect(ids).toContain('hB')
+    expect(ids).toContain('live')
+    // memory wins: 'live' is still running (not overwritten by disk; the disk has no 'live' entry, so no STALE status is injected)
+    expect(svc.getRun('live')!.status).toBe('running')
+    expect(svc.getRun('hA')!.returnValue).toBe('a')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('repeated loadPersistedRuns calls scan the disk only once (persistedLoaded flag)', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    const { ports, store } = fakePorts()
+    const svc = makeService(ports, store, undefined, () => dir)
+
+    await svc.loadPersistedRuns()
+    await svc.loadPersistedRuns()
+    await svc.loadPersistedRuns()
+
+    // repeated calls neither throw nor change the listRuns result (empty dir)
+    expect(svc.listRuns()).toEqual([])
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('getRunAsync memory hit → no disk read', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    const { ports, store } = fakePorts()
+    const svc = makeService(ports, store, undefined, () => dir)
+    ports.progressEmitter.emit({
+      type: 'run_started',
+      runId: 'live',
+      workflowName: 'w',
+      meta: null,
+    })
+
+    const got = await svc.getRunAsync('live')
+    expect(got?.runId).toBe('live')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('getRunAsync memory miss + disk hit → returns the disk value without injecting it into memory (subsequent gets still read disk)', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    const { writeRunState } = await import('../persistence.js')
+    const historical = {
+      runId: 'hist-only',
+      workflowName: 'old',
+      status: 'completed',
+      phases: [],
+      declaredPhases: [],
+      currentPhase: null,
+      agents: [],
+      agentCount: 0,
+      returnValue: { x: 1 },
+      startedAt: 1,
+      updatedAt: 2,
+    } as RunProgress
+    await writeRunState(dir, historical)
+
+    const { ports, store } = fakePorts()
+    const svc = makeService(ports, store, undefined, () => dir)
+
+    const got = await svc.getRunAsync('hist-only')
+    expect(got?.returnValue).toEqual({ x: 1 })
+    // not injected into memory: the in-memory list does not include it (not hydrated)
+    expect(svc.listRuns().map(r => r.runId)).not.toContain('hist-only')
+    // a subsequent get still returns it (every call falls back to readRunState)
+    const got2 = await svc.getRunAsync('hist-only')
+    expect(got2?.returnValue).toEqual({ x: 1 })
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('getRunAsync memory miss + disk miss → undefined', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    const { ports, store } = fakePorts()
+    const svc = makeService(ports, store, undefined, () => dir)
+
+    const got = await svc.getRunAsync('no-such-run')
+    expect(got).toBeUndefined()
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
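The memory-first lookup policy pinned down by the Task 5 tests above (in-memory runs win over disk, `loadPersistedRuns` scans only once, and `getRunAsync` falls back to disk without hydrating memory) can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual `WorkflowService` code: `RunRegistry`, `DiskPort`, `RunSnapshot` and `upsertLive` are hypothetical names, and the injected `DiskPort` merely stands in for `listPersistedRuns` / `readRunState`.

```typescript
// Hypothetical, trimmed-down run record; the real RunProgress has more fields.
interface RunSnapshot {
  runId: string
  status: 'running' | 'completed' | 'failed' | 'killed'
  returnValue?: unknown
}

// Hypothetical disk port standing in for listPersistedRuns / readRunState.
interface DiskPort {
  list(): Promise<RunSnapshot[]>
  read(runId: string): Promise<RunSnapshot | undefined>
}

class RunRegistry {
  private readonly memory = new Map<string, RunSnapshot>()
  private readonly disk: DiskPort
  private persistedLoaded = false

  constructor(disk: DiskPort) {
    this.disk = disk
  }

  // Records a current-session run (the real service does this via progress events).
  upsertLive(run: RunSnapshot): void {
    this.memory.set(run.runId, run)
  }

  listRuns(): RunSnapshot[] {
    return [...this.memory.values()]
  }

  // Hydrates historical runs at most once; in-memory entries always win.
  async loadPersistedRuns(): Promise<void> {
    if (this.persistedLoaded) return
    this.persistedLoaded = true // set before awaiting so concurrent calls skip the scan
    for (const run of await this.disk.list()) {
      if (!this.memory.has(run.runId)) this.memory.set(run.runId, run)
    }
  }

  // Memory first; on a miss, read from disk without caching the result.
  async getRunAsync(runId: string): Promise<RunSnapshot | undefined> {
    return this.memory.get(runId) ?? (await this.disk.read(runId))
  }
}
```

Setting the flag before the first `await` keeps concurrent callers from triggering a second scan; the trade-off in this sketch is that a failed scan is never retried.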
--- a/src/workflow/tests/status.test.ts
+++ b/src/workflow/tests/status.test.ts
@@ -0,0 +1,88 @@
+import { expect, test } from 'bun:test'
+import type { AgentProgress, RunProgress } from '../progress/store.js'
+import {
+  STATUS_DOT,
+  RUN_STATUS_COLOR,
+  RUN_STATUS_TEXT,
+  PHASE_MARK,
+  PHASE_COLOR,
+  agentVisual,
+  formatTokenCount,
+  agentMetaText,
+} from '../panel/status.js'
+
+test('STATUS_DOT / RUN_STATUS_COLOR / RUN_STATUS_TEXT cover all four run states', () => {
+  const statuses: RunProgress['status'][] = [
+    'running',
+    'completed',
+    'failed',
+    'killed',
+  ]
+  for (const s of statuses) {
+    expect(STATUS_DOT[s].length).toBeGreaterThan(0)
+    expect(RUN_STATUS_COLOR[s]).toBeTruthy()
+    expect(RUN_STATUS_TEXT[s].length).toBeGreaterThan(0)
+  }
+  expect(STATUS_DOT.running).toBe('●')
+  expect(STATUS_DOT.completed).toBe('✓')
+  expect(STATUS_DOT.failed).toBe('✗')
+  expect(STATUS_DOT.killed).toBe('■')
+  expect(RUN_STATUS_TEXT.completed).toBe('done')
+  expect(RUN_STATUS_TEXT.running).toBe('running')
+})
+
+test('PHASE_MARK / PHASE_COLOR cover running/done/pending', () => {
+  expect(PHASE_MARK.running).toBe('●')
+  expect(PHASE_MARK.done).toBe('✓')
+  expect(PHASE_MARK.pending).toBe('○')
+  expect(PHASE_COLOR.pending).toBe('subtle')
+})
+
+test('agentVisual: running → ● warning', () => {
+  const a: AgentProgress = { id: 1, status: 'running' }
+  expect(agentVisual(a)).toEqual({ mark: '●', color: 'warning' })
+})
+
+test('agentVisual: done·ok → ✓ success (no outputShape suffix appended any more)', () => {
+  const a: AgentProgress = {
+    id: 1,
+    status: 'done',
+    resultKind: 'ok',
+    outputShape: 'object',
+  }
+  expect(agentVisual(a)).toEqual({ mark: '✓', color: 'success' })
+})
+
+test('agentVisual: dead → ✗ error', () => {
+  const a: AgentProgress = { id: 1, status: 'done', resultKind: 'dead' }
+  expect(agentVisual(a)).toEqual({ mark: '✗', color: 'error' })
+})
+
+test('formatTokenCount: <1000 → raw value; ≥1000 → one decimal place + k suffix', () => {
+  expect(formatTokenCount(undefined)).toBe('0')
+  expect(formatTokenCount(0)).toBe('0')
+  expect(formatTokenCount(42)).toBe('42')
+  expect(formatTokenCount(1000)).toBe('1.0k')
+  expect(formatTokenCount(22900)).toBe('22.9k')
+})
+
+test('agentMetaText: model · Nk tok · N tool', () => {
+  const a: AgentProgress = {
+    id: 1,
+    status: 'done',
+    model: 'glm-5.2',
+    tokenCount: 22900,
+    toolCount: 1,
+  }
+  expect(agentMetaText(a)).toBe('glm-5.2 · 22.9k tok · 1 tool')
+})
+
+test('agentMetaText: omits the model prefix when no model is set', () => {
+  const a: AgentProgress = {
+    id: 1,
+    status: 'running',
+    tokenCount: 500,
+    toolCount: 2,
+  }
+  expect(agentMetaText(a)).toBe('500 tok · 2 tool')
+})
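The formatting contract asserted by the `formatTokenCount` and `agentMetaText` tests above can be met by something like the following minimal sketch. It is an illustration only, not the code in `../panel/status.js`: `AgentMeta` is a hypothetical subset of `AgentProgress`, and defaulting a missing `toolCount` to 0 is an assumption the tests do not cover.

```typescript
// Hypothetical subset of AgentProgress used by the formatter.
interface AgentMeta {
  model?: string
  tokenCount?: number
  toolCount?: number // assumed to default to 0 when absent
}

// Below 1000: the count as-is (undefined → '0'); from 1000 up: one decimal + 'k'.
function formatTokenCount(n: number | undefined): string {
  const value = n ?? 0
  return value < 1000 ? String(value) : `${(value / 1000).toFixed(1)}k`
}

// Joins the optional model with token and tool counts using ' · '.
function agentMetaText(a: AgentMeta): string {
  const parts = [
    `${formatTokenCount(a.tokenCount)} tok`,
    `${a.toolCount ?? 0} tool`,
  ]
  if (a.model) parts.unshift(a.model) // model prefix only when set
  return parts.join(' · ')
}
```

Building a parts array and joining it keeps the separator logic in one place, so omitting the model cannot leave a dangling ` · `.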
--- a/src/workflow/tests/useWorkflowKeyboard.test.ts
+++ b/src/workflow/tests/useWorkflowKeyboard.test.ts
@@ -0,0 +1,45 @@
+import { expect, test } from 'bun:test'
+import { routeWorkflowKey } from '../panel/useWorkflowKeyboard.js'
+
+test('Tab → nextTab; Shift+Tab → prevTab', () => {
+  expect(routeWorkflowKey('', { tab: true })).toBe('nextTab')
+  expect(routeWorkflowKey('', { tab: true, shift: true })).toBe('prevTab')
+})
+
+test('q / Esc → quit', () => {
+  expect(routeWorkflowKey('q', {})).toBe('quit')
+  expect(routeWorkflowKey('', { escape: true })).toBe('quit')
+})
+
+test('x → killAgent; K → killWorkflow; r → resume; n → newRun', () => {
+  expect(routeWorkflowKey('x', {})).toBe('killAgent')
+  expect(routeWorkflowKey('K', {})).toBe('killWorkflow')
+  expect(routeWorkflowKey('r', {})).toBe('resume')
+  expect(routeWorkflowKey('n', {})).toBe('newRun')
+})
+
+test('confirm mode: y/Enter → confirmYes; n/Esc/q → confirmNo; other keys → null', () => {
+  expect(routeWorkflowKey('y', {}, 'confirm')).toBe('confirmYes')
+  expect(routeWorkflowKey('Y', {}, 'confirm')).toBe('confirmYes')
+  expect(routeWorkflowKey('', { return: true }, 'confirm')).toBe('confirmYes')
+  expect(routeWorkflowKey('n', {}, 'confirm')).toBe('confirmNo')
+  expect(routeWorkflowKey('N', {}, 'confirm')).toBe('confirmNo')
+  expect(routeWorkflowKey('', { escape: true }, 'confirm')).toBe('confirmNo')
+  expect(routeWorkflowKey('q', {}, 'confirm')).toBe('confirmNo')
+  // confirm mode swallows navigation and action keys to prevent accidental triggers
+  expect(routeWorkflowKey('x', {}, 'confirm')).toBeNull()
+  expect(routeWorkflowKey('', { tab: true }, 'confirm')).toBeNull()
+  expect(routeWorkflowKey('', { upArrow: true }, 'confirm')).toBeNull()
+})
+
+test('←/→ switch focus column; ↑/↓ move within column', () => {
+  expect(routeWorkflowKey('', { leftArrow: true })).toBe('focusLeft')
+  expect(routeWorkflowKey('', { rightArrow: true })).toBe('focusRight')
+  expect(routeWorkflowKey('', { upArrow: true })).toBe('moveUp')
+  expect(routeWorkflowKey('', { downArrow: true })).toBe('moveDown')
+})
+
+test('unrelated input → null', () => {
+  expect(routeWorkflowKey('z', {})).toBeNull()
+  expect(routeWorkflowKey('', {})).toBeNull()
+})
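A minimal sketch of a key router that satisfies the table of cases in `useWorkflowKeyboard.test.ts` follows; it is not the actual `useWorkflowKeyboard` implementation. `KeyFlags` mirrors only the key flags the tests pass (assumed to follow the shape of Ink's `useInput` key object), and `WorkflowAction`, `KeyMode` and the `'normal'` default mode are hypothetical names.

```typescript
// Only the key flags the tests exercise (assumed Ink-style key object).
interface KeyFlags {
  tab?: boolean
  shift?: boolean
  escape?: boolean
  return?: boolean
  upArrow?: boolean
  downArrow?: boolean
  leftArrow?: boolean
  rightArrow?: boolean
}

type WorkflowAction =
  | 'nextTab' | 'prevTab' | 'quit' | 'killAgent' | 'killWorkflow'
  | 'resume' | 'newRun' | 'focusLeft' | 'focusRight' | 'moveUp'
  | 'moveDown' | 'confirmYes' | 'confirmNo'

type KeyMode = 'normal' | 'confirm' // 'normal' is a hypothetical name

// Single-character shortcuts in normal mode (a Map avoids prototype keys).
const CHAR_ACTIONS = new Map<string, WorkflowAction>([
  ['q', 'quit'],
  ['x', 'killAgent'],
  ['K', 'killWorkflow'],
  ['r', 'resume'],
  ['n', 'newRun'],
])

function routeWorkflowKey(
  input: string,
  key: KeyFlags,
  mode: KeyMode = 'normal',
): WorkflowAction | null {
  if (mode === 'confirm') {
    // Only answer keys mean anything here; everything else is swallowed.
    if (input === 'y' || input === 'Y' || key.return) return 'confirmYes'
    if (input === 'n' || input === 'N' || input === 'q' || key.escape) return 'confirmNo'
    return null
  }
  if (key.tab) return key.shift ? 'prevTab' : 'nextTab'
  if (key.escape) return 'quit'
  if (key.leftArrow) return 'focusLeft'
  if (key.rightArrow) return 'focusRight'
  if (key.upArrow) return 'moveUp'
  if (key.downArrow) return 'moveDown'
  return CHAR_ACTIONS.get(input) ?? null
}
```

Checking confirm mode before anything else is what makes it swallow Tab, arrows and shortcuts such as `x`, matching the `toBeNull()` cases.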