feat: dynamic-workflow has arrived (#1271)

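* feat(workflow): add workflow engine, /workflows panel, /ultracode skill

Squash the 20 workflow-related commits from the feat/sdk-backend branch into a single commit:

- Workflow engine core: phase / agent / parallel / pipeline orchestration primitives (packages/workflow-engine/)
- /workflows panel: three-zone focus layout (run tabs on top + phase sidebar on the left + agent list on the right)
- /ultracode skill: entry point for multi-agent workflow orchestration
- Progress store / journal / notification system
- WorkflowService lifecycle management + SentryErrorBoundary
- Script sandbox: dynamic import() disabled, defensive normalization of JSON args
- journal and named-workflow paths unified under projectRoot
- Error handling: error logging in parallel/pipeline hooks, failure routing, semaphore abort
- workflow tool promoted to a core tool + PascalCase naming

Co-Authored-By: glm-5.1 <zai-org@claude-code-best.win>

* feat(workflow): replicate the ultracode manual and close three gaps (worktree / inline / opt-in)

After reviewing agent-system consistency around the ultracode skill:

- ultracode.ts: replace the condensed Chinese version with the full system-prompt version of the Workflow orchestration manual
- HIGH#1 isolation:'worktree': claudeCodeBackend.run() wraps runAgent with createAgentWorktree + runWithCwdOverride and cleans up in finally, giving real cwd isolation; the slug is derived from sha256(runId:agentId) so that it matches the cleanup regex in cleanupStaleAgentWorktrees (fixes a leak blind spot caused by runId being w+base36 rather than a UUID); worktree.ts comments corrected to match
- HIGH#2 inline persistence: add persistInlineScript; both inline paths (WorkflowTool + service) persist symmetrically to .claude/workflow-runs/<runId>/script.js and return a reusable scriptPath (closing the inline → edit → resubmit-by-scriptPath iteration loop)
- HIGH#3 opt-in division of labor: ultracode/WorkflowTool/effort note that the session reminder is injected by the harness and that the repo contains no ultracode signal; keep the two-layer gate feature('WORKFLOW_SCRIPTS') + isEnabled and do not invent a separate injection
- Tests: add persistInline.test.ts; extend claudeCodeBackend (4 isolation cases) / WorkflowTool (inline) / service (scriptPath) / ultracode (harness)

Also includes the accompanying workflow engine/panel refinements and the run-state-persistence design doc.

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(workflow): persist terminal run state to state.json so it can be recovered across restarts

The terminal RunProgress (including returnValue/error) previously lived only in the in-memory ProgressStore and was lost when the process restarted. This change persists it to .claude/workflow-runs/<runId>/state.json so that (a) the return value can be fetched by runId after a restart and (b) the /workflows panel shows historical runs across restarts. Cross-process resume is explicitly out of scope.

- persistence.ts: getRunsDir/writeRunState/readRunState/listPersistedRuns + attachRunStatePersistence; atomic overwrite (tmp + rename), fault-tolerant reads (missing file / corrupt file / schemaVersion mismatch → null), best-effort writes (an IO failure only logs a warning)
- progress/store.ts: add hydrate(run) to inject a run from disk directly (skipped if the runId already exists; memory wins)
- service.ts: getWorkflowService() wires attachRunStatePersistence(bus, store) to subscribe to run_done (shared by the completed/failed/killed states; shutdown-kill takes the same path, so no extra hook is needed); WorkflowService gains getRunAsync(id), which falls back to reading from disk on a memory miss (without injecting into memory), and loadPersistedRuns(), which scans disk and hydrates (idempotency guarded by the persistedLoaded flag)
- panel/WorkflowsPanel.tsx: call loadPersistedRuns once on mount (not repeated on re-mount)
- ports.ts: runsDir now uses getRunsDir() to remove duplicated path concatenation
- Tests: persistence.test.ts(11)/runStatePersistence.test.ts(5)/progressStore(2)/service(5)/WorkflowsPanel(1), 24 new tests in total; precheck 5629 pass / 0 fail

Design deviation: the plan originally called for monkey-patching getRunsDir to point at a tmpdir, which is not feasible because Bun ESM namespaces are immutable. Instead, an optional runsDirProvider parameter (default getRunsDir) is injected via DI, added to attachRunStatePersistence and makeService (as the 4th parameter, after cwdOverride), consistent with the existing cwdOverride pattern. makeService's cwdOverride is unchanged, so the inline persistence feature is not broken.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(workflow): lower default concurrency to 3 and support per-run maxConcurrency injection

- DEFAULT_MAX_CONCURRENCY=3 replaces the old min(16, cores-2); MAX_CONCURRENCY_CAP=16 is kept as the absolute upper bound on user input
- Add clampMaxConcurrency() to handle the undefined / <1 / >CAP edge cases
- WorkflowInput schema adds maxConcurrency: number.int().min(1).max(16).optional()
- The engine layer passes it through the whole context/runWorkflow chain: semaphore capacity comes from the per-run input
- WorkflowTool prompt adds guidance: in fan-out scenarios, confirm the concurrency with the user via AskUserQuestion before starting
- Sync the concurrency wording in the ultracode skill + audit workflow spec (remove the cpu-cores formula)
- Sync the old formula in docs/features/workflow-scripts.md

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(workflow): translate panel UI strings to English

The 4 user-facing Chinese strings in WorkflowsPanel (the onDone error messages and the key-hint line) are now English; the other panel components (AgentList/TabsBar) were already English. Code comments stay in Chinese, consistent with the workflow module's convention.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(workflow): interrupt system (x kills a single agent / K kills the whole workflow, with a confirmation Dialog)

- claudeCodeBackend bridges ctx.signal → runAgent.override.abortController (fixes the root cause of 'x' having no effect: the abort never reached the inner fetch)
- AbortError is recognized and thrown as WorkflowAbortedError (no longer swallowed as dead, so the workflow can tell it was killed)
- ports.taskRegistrar adds registerAgentAbort/unregisterAgentAbort/killAgent; service.killAgent(runId, agentId) interrupts one specific agent
- Panel keys: 'x' kills the current agent (when the agents column is focused) / 'K' kills the whole workflow; a confirmation Dialog is shown, and confirm mode swallows navigation keys to prevent accidental input
- 8 new tests (backend signal bridge / hooks inject / ports killAgent / service killAgent)

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* docs(workflow): add model tier selection guidance to the ultracode skill (matching haiku/sonnet/opus/best to scenarios)

Fills in the decision criteria that were missing for agent()'s existing model parameter: lists the cost/latency magnitude and typical scenarios for the 4 tiers, and states the rule of thumb "if you cannot articulate why to switch tiers, omit it".

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(workflow): maxConcurrency≠3 requires AskUserQuestion first (3 is the recommended default)

Change "ask only on fan-out" to "ask whenever maxConcurrency≠3". The only exception: the user has already stated a concurrency explicitly in the current session ("use 6" / "maxConcurrency 9"). Synced in three places: prompt (WorkflowTool.ts) + skill (ultracode.ts) + audit spec.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(workflow): automatically retry a failed agent once (dead or non-abort throw)

- hooks.agent wraps invokeBackend: a first-attempt dead or non-abort throw → retry once
- WorkflowAbortedError (kill) is not retried, because it is the user's intent
- registry.resolve configuration errors (AdapterNotFoundError etc.) are thrown directly outside the try and skip the retry: retrying a configuration problem is pointless and masks bugs
- If the retry also fails: dead stays dead; a throw is downgraded to dead (it does not take down the workflow, consistent with the parallel/pipeline null-on-error contract)
- Budget is not charged twice: dead does not call addOutputTokens; the charge happens once, only when the retry is ok
- 7 new hooks-level retry tests + 1 service-level downgrade test

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(workflow): panel label truncation keeps the #number suffix (multiple findings in the same dim stay distinguishable)

The audit workflow names verify agents verify:\${dim}#\${findingIdx}. The old logic, slice(0, 18), cut from the right and swallowed the #idx entirely, so multiple findings in the same dimension could not be told apart by eye. New logic: when the label has a #number suffix, keep the suffix and truncate the prefix with a … ellipsis. Examples:

verify:correctness#0 → verify:correctn…#0
verify:architecture#15 → verify:archite…#15

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(workflow): return to the main chat immediately after killing the whole workflow

The run_done→store→notifications.ts notification path already exists, but after confirmYes the panel stayed up and covered the main chat, so the user never saw the "stopped" feedback. After a kill, call onDone() to exit the panel immediately so that the main chat's `Workflow "<name>" was stopped` notification is directly visible.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(workflow): agent dead carries reason/detail + prompt presses harder for StructuredOutput

In a 12-agent audit workflow, 8 agents ended dead, and the journal recorded only {kind:"dead"} with no further information, so afterwards it was impossible to distinguish "agent produced no StructuredOutput" from "runAgent threw". The evidence pointed to a main cause: after a long tool chain, sonnet forgets to call StructuredOutput, and extractStructuredOutput returning null downgrades the agent to dead.

- types.ts: AgentRunResult.dead gains optional reason/detail fields (no-structured-output / runagent-threw / worktree-failed / unknown), compatible with old journals (both optional).
- claudeCodeBackend.ts: the three dead sites fill in reason + detail; no-structured-output uses the first 200 characters of the finalized text as detail, so logs and the panel immediately show what the agent said last.
- claudeCodeBackend.ts: in schema mode, the prompt states the StructuredOutput requirement once at the start and once at the end, targeting sonnet forgetting to finish after a long tool chain.
- hooks.ts: the retry log includes reason; when the retry still throws, the downgrade to dead also fills in reason=runagent-threw + detail.
- types.test.ts: add a reason JSON round-trip test + an old-journal compatibility test.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(workflow): schema mode drops the StructuredOutput tool contract in favor of robust JSON text parsing

The previous round, 70a2f76, treated "agent forgets to call StructuredOutput after a long tool chain" as the cause of death and added the requirement at both the head and tail of the prompt. But in an actual run of 5 review agents, 4 ended dead, and every detail read "StructuredOutput tool is not available as a deferred tool". The root cause: the tool was never injected into the workflow sub-agent's tool set (the default assembleToolPool pool does not include it; only the stop_hook path, execAgentHook.ts, explicitly calls createStructuredOutputTool()). The prompt kept demanding a call to an unreachable tool, so agents got confused, argued at length, and in the end produced no JSON.

- claudeCodeBackend.ts:
  - extractStructuredOutput rewritten: a bracket-stack scan replaces indexOf/lastIndexOf and handles nested objects, brackets inside strings, and escape characters; adds a fenced-code-block priority path (```json / ```); with multiple JSON blocks, takes the first that parses successfully; returns only plain objects (rejects array/number/string/null). No syntax repair (trailing commas / single quotes / comments), to avoid mis-editing inside strings (e.g. the // in "http://" being eaten by a comment regex).
  - schema-mode prompt simplified: remove the double STRUCTURED OUTPUT requirement at head and tail (600+ tokens) and instead instruct the agent to emit raw JSON in its final text block; state explicitly that "StructuredOutput is not available in this environment" to eliminate hallucinated calls.
- hooks.ts: detail.slice is guarded with typeof === 'string'; the catch block uses e instanceof Error ? e.message : String(e) (old journals / third-party adapters may write a non-string detail, and calling .slice directly would throw a TypeError and break logging).
- claudeCodeBackend.test.ts: +9 tests covering fenced / nested / brackets inside strings / escaped quotes / first-of-multiple blocks / type guards / corrupt JSON.

precheck: 5663 pass / 0 fail.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* docs(effort): add design spec for the interactive /effort panel

Design highlights:

- /effort with no arguments → horizontal slider panel (low/medium/high/xhigh/max/ultracode)
- ←/→ move the cursor, Enter confirms, Esc cancels
- ultracode is a visual placeholder only; on confirm, the user is pointed to /ultracode <context>
- With an env override: double marker + warning at the top
- Panel disabled when the model does not support effort
- Two-stage delivery: commit the basic panel first, then do the ultracode ripple animation

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* docs(effort): add implementation plan for the basic EffortPanel (stage one)

Split into 6 TDD tasks: pure-function state → keybinding registration → component → command mounting → branch tests → precheck. The ripple animation is committed separately in stage two.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* docs(effort): plan adds q/ctrl+c cancel bindings to align with the spec §5 state machine

Gap caught by the verifier: spec §5 states that Esc / Ctrl+C / q are all cancel events, but plan Task 2.3 bound only escape. Add q and ctrl+c → effortPanel:cancel. Also write Step 2.2 directly as the 6-action version (home/end) and remove the roundabout wording.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* docs(effort): plan fixes 5 gaps found in the pre-execution review

- Task 3.3 EffortPanel.tsx draft: rewrite the garbled Faster/Smarter padEnd syntax; fix the useKeybindings import path from @anthropic/ink to ../../keybindings/useKeybinding.js; remove the redundant renderSeparatorLine; keep renderPaddedLine
- Task 5.2 computeConfirmOutcome switches to an injected-ApplyFn pattern: avoids the circular dependency effortPanelState → effort.tsx → EffortPanel; tests can inject mockApply without mocking settings
- Step 5.3 test code aligned with the injected signature

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): add EffortPanel pure-function state module (PanelPosition + movement / initial cursor)

Contains only pure functions and types, with no React/Ink dependency, for easy unit testing.

- PANEL_POSITIONS: low → medium → high → xhigh → max → ultracode
- moveLeft/moveRight: clamped at the boundaries (low does not move further left, ultracode does not move further right)
- getInitialCursor: env override > displayed level

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(keybindings): register the EffortPanel context and 6 actions

Bind ←/→/h/l/home/end/enter/escape/q/ctrl+c to effortPanel:* actions. This follows the ModelPicker context pattern and keeps the left/right keys from being intercepted by global keybindings.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): implement the EffortPanel component body (rendering + keyboard interaction + confirm/cancel branches)

- Horizontal slider layout: Faster ↔ Smarter poles, 6 tick positions
- useKeybindings registers the EffortPanel context (←/→/h/l/home/end/enter/escape/q/ctrl+c)
- Enter on one of the 5 levels → call executeEffort to write settings + AppState
- Enter on ultracode → print the guidance text, write no state
- Esc/q → "Effort unchanged."
- Yellow warning at the top when an env override is active
- computeConfirmOutcome takes an injected ApplyFn for testability (tests added in Task 5)

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): mount the interactive EffortPanel when /effort has no arguments

- No arguments → <EffortPanelWrapper> passes AppState.effortValue through
- current/status → still shows text (unchanged)
- With an argument → jumps straight to executeEffort (unchanged)
- help/-h/--help → unchanged

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* test(effort): add computeConfirmOutcome branch tests (injected mockApply)

- ultracode → kind=ultracode-hint, applyFn not called
- low → kind=apply, message/effortUpdate come from applyFn
- When applyFn returns no effortUpdate, outcome.effortUpdate is undefined
- CANCEL_MESSAGE / ULTRACODE_HINT constants

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): cast cursor to EffortValue in tests to avoid the TS error caused by PanelPosition including ultracode

computeConfirmOutcome's ApplyFn contract requires EffortValue, but the test mockApply receives PanelPosition. At runtime, computeConfirmOutcome takes the hint branch at the ultracode position and never calls applyFn, so the cast is safe. Full precheck passes: 5688 tests / 0 fail.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): fix panel alignment and colors

- Alignment: use Box width={SEGMENT} + justifyContent="center" so that ▲ and the level name are strictly centered, replacing the earlier 1-column offset caused by string padEnd(11) not matching SEGMENT=12
- Colors: all panel text now uses theme.claude (Claude Orange rgb(215,119,87)) instead of the terminal default purple; separator / sub-label / footer use theme.subtle; the env warning uses theme.warning
- The level name at the cursor is also bold, to strengthen the visual focus

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): panel text changed to purple, ULTRACODE_HINT translated to English

- Color: theme.claude (orange) → theme.purple_FOR_SUBAGENTS_ONLY (Purple 600, rgb(147,51,234)), covering the title, Faster/Smarter, ▲, and level names
- ULTRACODE_HINT: Chinese → English "ultracode is not an effort level. Use /ultracode <context> to start a multi-agent workflow."

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): unified color scheme: selected uses suggestion (blue), unselected uses subtle (gray)

Drop purple_FOR_SUBAGENTS_ONLY (reserved for subagents) and match the project's other panels instead:

- Selected level + ▲: color="suggestion" (Medium blue rgb(87,105,247)) + bold
- Unselected levels + empty ▲ placeholder: color="subtle" (Light gray rgb(175,175,175))
- Title / Faster / Smarter: color="suggestion"
- Separator / sub-label / footer: color="subtle"

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(workflow): emit the missing phase_done before the terminal state; panel exits automatically on the running→terminal transition

runWorkflow: when the script ends, hook.phase does not fire phase_done for the last phase, so the UI's left column would show running forever. All three paths (completed/killed/failed) now call emitTerminalPhaseDone before run_done.

WorkflowsPanel: extract the pure function isRunTerminatedTransition to detect running → terminal; the panel's useEffect exits focus automatically once it detects the transition.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): ripple animation pure functions pickChar/computeRippleLine/mergeLayers + 18 tests

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): useRippleFrame hook wraps useAnimationFrame and subscribes to the clock on demand

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): EffortPanel integrates the ripple background: switches to ripple mode while the cursor rests on ultracode

useRippleFrame is enabled only when cursor === 'ultracode', rendering a 5-row ripple background + overlay text (Faster/Smarter, separator, ▲, level names, sub-label). The other levels keep the original PlainContent render path unchanged.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* refactor(effort): ripple animation switches from character density to color gradient

Following the original style, change the ripple background from the INTENSITY_CHARS density characters ('·∙░▒▓') to a suggestion-family color gradient (transparent → dark deep violet-blue → suggestion → highlight):

rippleAnimation.ts:
- Remove pickChar / INTENSITY_CHARS / WAVE_PEAK_CHARS / mergeLayers
- Add intensityToColor(intensity) → 'transparent' | '#xxxxxx'
- Add computeRippleCells, which returns Cell[] (char + color per position)
- Add applyOverlaysToCells(cells, overlays) to replace mergeLayers
- Add cellsToSegments(cells) to merge adjacent same-color runs (fewer Text nodes)

EffortPanel.tsx:
- RippleContent renders via cells→segments→tokens
- Space segments are colored blocks via BaseText backgroundColor (solid-block look)
- Text segments are colored via Text color (bright colors stand out)
- tokens are split a second time by space/text to avoid ambiguity when rendering mixed segments

Tests: 29 rippleAnimation tests covering intensityToColor boundaries, computeRippleCells length/source/falloff, applyOverlaysToCells overwrite/truncation/defensive copy, and cellsToSegments merge logic.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): tune ripple parameters: fill the left side + slow down + base color across the whole panel

Three issues from user feedback:

1. "The troughs show no color change" → returning transparent for intensity ≤ 0.1 made the trough positions invisible. It now never returns transparent; the lowest step, #0a0d1a, serves as the panel base color (a dark violet-black ocean), and the crests flow over it.
2. "The waves move too fast" → time coefficient 0.012 → 0.004 (about 1/3 speed). Crest speed drops from 34 cell/s to 11 cell/s, and per-frame color change drops from 45% to 36%.
3. "The waves only reach the middle and do not cover the left side" → falloff coverage radius 40 → 90. With the source at x=65, the left edge is at dist=65 < 90, so the ripple reaches the far left (about 30-50% coverage).

Color-step adjustments:
- Remove the transparent step; add #0a0d1a as the darkest step (base color)
- Top step changed from #8aa0ff (highlight) to #5769F7 (suggestion), so that it no longer matches the text overlay color and the two do not swallow each other
- 7 color steps: #0a0d1a → #15182b → #1f2543 → #2a3360 → #3a4582 → #4a5bb0 → #5769F7

Tests: drop the transparent expectations in favor of concrete colors (#0a0d1a etc.). Add a "larger coverage radius" test verifying that dist=65 still gets a color other than the darkest.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* fix(effort): ripple v3: remove the black edge + drop the high-frequency center ripple + cover the shortcut row on the y axis

Three issues from user feedback:

1. "The black edge does not look right": the darkest step #0a0d1a (rgb 10,13,26) was too close to pure black, so far-end troughs looked like a hard black edge. Changed to #1a1f3a (rgb 26,31,58), which reads as violet-blue rather than pure black.
2. "The fast ripple at the center looks a bit odd": remove the high-frequency ripple layered on near the source at dist<6 (time*0.02, 5 times the main ripple frequency). It was meant to strengthen the "water" feel near the source, but in practice it looked like rapid flickering and stood out. The main ripple is enough on its own.
3. "Cover the shortcuts in the y direction": RippleContent adds a row at y=2 that renders the shortcut overlay ("←/→ adjust · Enter confirm · Esc cancel"). The PlainContent path keeps the original Box marginTop=1 + Text rendering.

Color-step adjustments (stronger violet-blue feel):
- #1a1f3a (was #0a0d1a): darkest step
- #1f2543 / #252c55 / #2e3870 / #3a4582 / #4a5bb0 / #5769F7 (middle steps slightly brightened, keeping the transition smooth)

Tests: the source-point test is updated to "darkest at the trough at time=0, brighter as time advances and the crest sweeps past", reflecting the pure main-ripple behavior after the high-frequency ripple was removed.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* chore(workflow): translate all Chinese text in workflow-related code to English

Chinese comments, user-visible error messages, and string literals in the source (src/workflow/ + packages/workflow-engine/src/); titles and comments in test files; 6 hard-coded assertions synced to the translated error messages.

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* feat(effort): ripple v4: smooth wave + full color-wheel rotation + fade in/out + adaptive width

- Wave function changed to (sin+1)/2: removes the flat dark band (about 6 rows wide) caused by max(0,sin)
- Primary hue rotates continuously (0.03°/ms, 12s per full color wheel): blue→violet→magenta→red→orange→yellow→green→cyan
- Text overlay hue rotates in sync (rotateHue applied to Faster/▲/level names/separator/sub-label)
- Fade in/out animation: fadeColor/fadeCells + a fade state machine, ~300ms transitions in and out
- Sub-label fixed below the ultracode segment; it does not follow the cursor
- One pure-ripple row added at the top and one at the bottom, for visual consistency
- Width adapts to the terminal column count: 72 when narrow, full width when wide (computeSegment/computeRippleSourceX)
- Shortcuts changed to plain Text and no longer take part in ripple background rendering
- 18 new tests (fadeColor/fadeCells/rotateHue/getHueShiftAtTime)

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>

* refactor: remove CYBER_RISK_MITIGATION_REMINDER from FileReadTool

Co-Authored-By: deepseek-v4-pro <deepseek-ai@claude-code-best.win>

* fix: prevent ReDoS in extractMeta regex by anchoring to splice boundary

Co-Authored-By: deepseek-v4-pro <deepseek-ai@claude-code-best.win>

* chore: update scripts

---------

Co-authored-by: glm-5.1 <zai-org@claude-code-best.win>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: deepseek-v4-pro <deepseek-ai@claude-code-best.win>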
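The extractStructuredOutput rewrite described in the commit message (fenced blocks first, then a bracket-stack scan that respects strings and escapes, first successful parse wins, plain objects only, no syntax repair) can be illustrated roughly as follows. This is a minimal sketch written from that description alone, not the code in claudeCodeBackend.ts: the helper names `tryParseObject` and `balancedObjectSpans`, the `PlainObject` type, and the exact fence regex are assumptions made for illustration.

```typescript
type PlainObject = Record<string, unknown>;

const FENCE = '\u0060\u0060\u0060'; // three backticks, spelled with escapes

// Hypothetical helper: parse `text` and accept only a plain object
// (arrays, numbers, strings and null are rejected).
function tryParseObject(text: string): PlainObject | null {
  try {
    const value: unknown = JSON.parse(text);
    const isPlain = typeof value === 'object' && value !== null && !Array.isArray(value);
    return isPlain ? (value as PlainObject) : null;
  } catch {
    return null; // no syntax repair: malformed JSON is simply rejected
  }
}

// Hypothetical helper: collect every balanced top-level {...} span.
// String and escape state are tracked so that braces inside JSON strings
// do not change the nesting depth.
function balancedObjectSpans(text: string): string[] {
  const spans: string[] = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"' && depth > 0) {
      inString = true; // only track strings once inside an object
    } else if (ch === '{') {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) spans.push(text.slice(start, i + 1));
    }
  }
  return spans;
}

function extractStructuredOutput(text: string): PlainObject | null {
  // Priority path: fenced blocks (json-tagged or bare), in document order.
  const fenceRe = new RegExp(FENCE + '(?:json)?[ \\t]*\\r?\\n([\\s\\S]*?)' + FENCE, 'g');
  let match: RegExpExecArray | null;
  while ((match = fenceRe.exec(text)) !== null) {
    const parsed = tryParseObject((match[1] ?? '').trim());
    if (parsed) return parsed;
  }
  // Fallback: the first balanced span anywhere in the text that parses.
  for (const span of balancedObjectSpans(text)) {
    const parsed = tryParseObject(span);
    if (parsed) return parsed;
  }
  return null;
}
```

Tracking string state only while inside an object is a choice made for this sketch: it keeps stray quotation marks in the surrounding prose from hiding a later JSON object.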
2026-06-17 22:05:50 +00:00 · 2026-06-14 18:13:49 +08:00
parent 3e3e1de81b
commit 58ee6419b1
130 changed files with 23347 additions and 885 deletions
--- a/src/workflow/WorkflowPermissionRequest.tsx
+++ b/src/workflow/WorkflowPermissionRequest.tsx
@@ -0,0 +1,145 @@
+import React, { useCallback, useMemo } from 'react';
+import { Box, Text, useTheme } from '@anthropic/ink';
+import { getTheme, type Theme } from 'src/utils/theme.js';
+import { env } from 'src/utils/env.js';
+import { shouldShowAlwaysAllowOptions } from 'src/utils/permissions/permissionsLoader.js';
+import { logUnaryEvent } from 'src/utils/unaryLogging.js';
+import { PermissionDialog } from 'src/components/permissions/PermissionDialog.js';
+import { PermissionPrompt, type PermissionPromptOption } from 'src/components/permissions/PermissionPrompt.js';
+import type { PermissionRequestProps } from 'src/components/permissions/PermissionRequest.js';
+import { PermissionRuleExplanation } from 'src/components/permissions/PermissionRuleExplanation.js';
+
+type OptionValue = 'yes' | 'yes-dont-ask-again' | 'no';
+
+/**
+ * Permission request UI for the WorkflowTool. Asks the user to confirm
+ * executing a workflow script.
+ * Follows the MonitorPermissionRequest / FallbackPermissionRequest pattern.
+ */
+export function WorkflowPermissionRequest({
+  toolUseConfirm,
+  onDone,
+  onReject,
+  workerBadge,
+}: PermissionRequestProps): React.ReactNode {
+  const [themeName] = useTheme();
+  const theme = getTheme(themeName);
+
+  const input = toolUseConfirm.input as {
+    workflow: string;
+    args?: string;
+  };
+
+  const showAlwaysAllowOptions = useMemo(() => shouldShowAlwaysAllowOptions(), []);
+
+  const options: PermissionPromptOption<OptionValue>[] = useMemo(() => {
+    const opts: PermissionPromptOption<OptionValue>[] = [
+      {
+        label: 'Yes',
+        value: 'yes',
+        feedbackConfig: { type: 'accept' as const },
+      },
+    ];
+    if (showAlwaysAllowOptions) {
+      opts.push({
+        label: (
+          <Text>
+            Yes, and don{'\u2019'}t ask again for <Text bold>{toolUseConfirm.tool.name}</Text> commands
+          </Text>
+        ),
+        value: 'yes-dont-ask-again',
+      });
+    }
+    opts.push({
+      label: 'No',
+      value: 'no',
+      feedbackConfig: { type: 'reject' as const },
+    });
+    return opts;
+  }, [showAlwaysAllowOptions, toolUseConfirm.tool.name]);
+
+  const handleSelect = useCallback(
+    (value: OptionValue, feedback?: string) => {
+      switch (value) {
+        case 'yes':
+          logUnaryEvent({
+            completion_type: 'tool_use_single',
+            event: 'accept',
+            metadata: {
+              language_name: 'none',
+              message_id: toolUseConfirm.assistantMessage.message.id ?? '',
+              platform: env.platform,
+            },
+          });
+          toolUseConfirm.onAllow(toolUseConfirm.input, [], feedback);
+          onDone();
+          break;
+        case 'yes-dont-ask-again':
+          logUnaryEvent({
+            completion_type: 'tool_use_single',
+            event: 'accept',
+            metadata: {
+              language_name: 'none',
+              message_id: toolUseConfirm.assistantMessage.message.id ?? '',
+              platform: env.platform,
+            },
+          });
+          toolUseConfirm.onAllow(toolUseConfirm.input, [
+            {
+              type: 'addRules',
+              rules: [{ toolName: toolUseConfirm.tool.name }],
+              behavior: 'allow',
+              destination: 'localSettings',
+            },
+          ]);
+          onDone();
+          break;
+        case 'no':
+          logUnaryEvent({
+            completion_type: 'tool_use_single',
+            event: 'reject',
+            metadata: {
+              language_name: 'none',
+              message_id: toolUseConfirm.assistantMessage.message.id ?? '',
+              platform: env.platform,
+            },
+          });
+          toolUseConfirm.onReject(feedback);
+          onReject();
+          onDone();
+          break;
+      }
+    },
+    [toolUseConfirm, onDone, onReject],
+  );
+
+  const handleCancel = useCallback(() => {
+    logUnaryEvent({
+      completion_type: 'tool_use_single',
+      event: 'reject',
+      metadata: {
+        language_name: 'none',
+        message_id: toolUseConfirm.assistantMessage.message.id ?? '',
+        platform: env.platform,
+      },
+    });
+    toolUseConfirm.onReject();
+    onReject();
+    onDone();
+  }, [toolUseConfirm, onDone, onReject]);
+
+  return (
+    <PermissionDialog title="Workflow" workerBadge={workerBadge}>
+      <Box flexDirection="column" gap={1}>
+        <Box flexDirection="column">
+          <Text bold color={theme.permission as keyof Theme}>
+            Execute workflow: {input.workflow}
+          </Text>
+          {input.args && <Text dimColor>Arguments: {input.args}</Text>}
+        </Box>
+        <PermissionRuleExplanation permissionResult={toolUseConfirm.permissionResult} toolType="command" />
+        <PermissionPrompt<OptionValue> options={options} onSelect={handleSelect} onCancel={handleCancel} />
+      </Box>
+    </PermissionDialog>
+  );
+}
--- a/src/workflow/tests/WorkflowsPanel.test.tsx
+++ b/src/workflow/tests/WorkflowsPanel.test.tsx
@@ -0,0 +1,197 @@
+import { expect, test } from 'bun:test';
+import { PassThrough } from 'node:stream';
+import React from 'react';
+import { wrappedRender as render } from '@anthropic/ink';
+import { SentryErrorBoundary } from '../../components/SentryErrorBoundary.js';
+import type { RunProgress } from '../progress/store.js';
+import { call as panelCall } from '../panel/panelCall.js';
+import { clampSelected, isRunTerminatedTransition, WorkflowsPanel } from '../panel/WorkflowsPanel.js';
+import { truncateLabel } from '../panel/AgentList.js';
+import { STATUS_DOT } from '../panel/status.js';
+import { __resetWorkflowServiceForTests, getWorkflowService } from '../service.js';
+
+// Pure function: clamp selection to valid range (same source as clampSelected inside the panel).
+test('clampSelected: empty list → 0; out of bounds → last; negative/NaN → 0; normal → original', () => {
+  expect(clampSelected(5, 0)).toBe(0);
+  expect(clampSelected(5, 3)).toBe(2);
+  expect(clampSelected(-3, 3)).toBe(0);
+  expect(clampSelected(1, 3)).toBe(1);
+  expect(clampSelected(0, 1)).toBe(0);
+  // NaN (e.g. uninitialized state) safely falls back to 0
+  expect(clampSelected(Number.NaN, 3)).toBe(0);
+});
+
+// truncateLabel: a short label is returned as-is; with a `#number` suffix, keep the suffix and truncate the prefix with an ellipsis;
+// without a suffix, cut from the right. This keeps the audit workflow's verify:${dim}#${idx} agents distinguishable when one dimension has several findings.
+test('truncateLabel: short label as-is; with #number suffix keep suffix and truncate prefix; without suffix cut from right', () => {
+  // short label as-is
+  expect(truncateLabel('agent-1', 18)).toBe('agent-1');
+  expect(truncateLabel('review:bugs', 18)).toBe('review:bugs');
+  // exactly max length (boundary)
+  expect(truncateLabel('review:correctness', 18)).toBe('review:correctness');
+  // over max + with #number suffix: keep suffix, truncate prefix + ellipsis
+  expect(truncateLabel('verify:correctness#0', 18)).toBe('verify:correctn…#0');
+  expect(truncateLabel('verify:architecture#15', 18)).toBe('verify:archite…#15');
+  // a different #idx on the same prefix stays distinguishable from #0
+  expect(truncateLabel('verify:correctness#2', 18)).toBe('verify:correctn…#2');
+  // without #number suffix: cut from right (legacy behavior)
+  expect(truncateLabel('a-very-long-label-no-suffix', 18)).toBe('a-very-long-label-');
+});
+
+// STATUS_DOT covers four states, all visible dot characters.
+test('STATUS_DOT covers running/completed/failed/killed and is non-empty character', () => {
+  const statuses = ['running', 'completed', 'failed', 'killed'] as const;
+  for (const s of statuses) {
+    expect(STATUS_DOT[s]).toBeTruthy();
+    expect(STATUS_DOT[s].length).toBeGreaterThan(0);
+  }
+});
+
+// Progress data shape contract: fields read by the panel exist/are readable on a typical RunProgress,
+// preventing silent panel render breakage from store.ts structural drift.
+test('RunProgress field contract: keys read by panel all exist', () => {
+  const run: RunProgress = {
+    runId: 'r1',
+    workflowName: 'review',
+    status: 'running',
+    phases: [{ title: 'Find', status: 'done' }],
+    declaredPhases: ['Find', 'Review'],
+    currentPhase: 'Review',
+    agents: [{ id: 1, label: 'review:api', phase: 'Review', status: 'running' }],
+    agentCount: 1,
+    startedAt: 1,
+    updatedAt: 1,
+  };
+  // paths read by panel WorkflowList/Detail
+  expect(run.status).toBe('running');
+  expect(STATUS_DOT[run.status]).toBe('●');
+  expect(run.currentPhase).toBe('Review');
+  expect(run.agents.length).toBe(run.agentCount);
+  expect(run.phases[0]?.title).toBe('Find');
+  expect(run.phases[0]?.status).toBe('done');
+  expect(run.agents[0]?.label).toBe('review:api');
+});
+
+// Completed/failed shape: returnValue / error only shown when not running.
+test('RunProgress completed/failed shape: returnValue/error optional', () => {
+  const completed: RunProgress = {
+    runId: 'r2',
+    workflowName: 'w',
+    status: 'completed',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    returnValue: 'ok',
+    startedAt: 2,
+    updatedAt: 2,
+  };
+  const failed: RunProgress = {
+    runId: 'r3',
+    workflowName: 'w',
+    status: 'failed',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    error: 'boom',
+    startedAt: 3,
+    updatedAt: 3,
+  };
+  expect(completed.returnValue).toBe('ok');
+  expect(completed.error).toBeUndefined();
+  expect(failed.error).toBe('boom');
+  expect(failed.returnValue).toBeUndefined();
+  expect(STATUS_DOT['completed']).toBe('✓');
+  expect(STATUS_DOT['failed']).toBe('✗');
+});
+
+// Fix M: a throw from useSyncExternalStore / listNamed / a child component must not propagate up and crash the REPL.
+// panelCall must wrap WorkflowsPanel in SentryErrorBoundary.
+test('panelCall wraps WorkflowsPanel in SentryErrorBoundary (fix M regression)', async () => {
+  const element = (await (panelCall as unknown as (a: unknown, b: unknown, c: unknown) => Promise<React.ReactNode>)(
+    () => {},
+    { canUseTool: undefined },
+    '',
+  )) as React.ReactElement<{ name?: string; children: React.ReactNode }>;
+  expect(element.type).toBe(SentryErrorBoundary);
+  expect(element.props.name).toBe('WorkflowsPanel');
+  const child = element.props.children as React.ReactElement<{
+    onDone: () => void;
+  }>;
+  expect(child.type).toBe(WorkflowsPanel);
+  expect(React.isValidElement(child)).toBe(true);
+  expect(typeof child.props.onDone).toBe('function');
+});
+
+// ---- Task 6: panel mount triggers loadPersistedRuns once ----
+// Verify that WorkflowsPanel mount calls svc.loadPersistedRuns() exactly once.
+// The persistedLoaded flag inside service guards idempotency; re-render / re-mount does not repeat the call.
+// Use a spy to replace the singleton's loadPersistedRuns, render to a PassThrough stream, wait for useEffect to trigger.
+
+test('WorkflowsPanel mount triggers loadPersistedRuns once', async () => {
+  __resetWorkflowServiceForTests();
+  const svc = getWorkflowService();
+  let calls = 0;
+  const orig = svc.loadPersistedRuns.bind(svc);
+  svc.loadPersistedRuns = async () => {
+    calls++;
+  };
+
+  const stdout = new PassThrough();
+  // consume data to avoid buffer overflow (render writes multiple frames)
+  stdout.on('data', () => {});
+  let instance: { unmount: () => void; waitUntilExit: () => Promise<void> } | undefined;
+  try {
+    instance = await render(
+      React.createElement(WorkflowsPanel, {
+        onDone: () => {},
+        context: { canUseTool: undefined } as never,
+      }),
+      { stdout: stdout as unknown as NodeJS.WriteStream, patchConsole: false },
+    );
+    // after mount useEffect triggers asynchronously; wait a tick for React commit + effect to complete
+    await new Promise(r => setTimeout(r, 30));
+
+    expect(calls).toBe(1);
+  } finally {
+    instance?.unmount();
+    svc.loadPersistedRuns = orig;
+    __resetWorkflowServiceForTests();
+  }
+});
+
+// When the focused run transitions from running to terminal, the panel automatically calls onDone() (an 800ms delay lets the user see the terminal state).
+// Only a state transition on the same runId triggers it: switching to a completed tab does not exit, and neither does opening the history panel.
+// The transition detection is extracted into the pure function isRunTerminatedTransition for offline unit testing (Ink test mode does not
+// auto-pump concurrent state updates, so integration tests are unreliable).
+test('isRunTerminatedTransition: same runId running → terminal triggers; other cases do not trigger', () => {
+  const running = { runId: 'r1', status: 'running' as const };
+  const completed = { runId: 'r1', status: 'completed' as const };
+  const failed = { runId: 'r1', status: 'failed' as const };
+  const killed = { runId: 'r1', status: 'killed' as const };
+
+  // same run running → terminal: all three terminal states trigger
+  expect(isRunTerminatedTransition(running, completed)).toBe(true);
+  expect(isRunTerminatedTransition(running, failed)).toBe(true);
+  expect(isRunTerminatedTransition(running, killed)).toBe(true);
+
+  // prev=null (open history panel): does not trigger
+  expect(isRunTerminatedTransition(null, completed)).toBe(false);
+  // curr=null (runs cleared): does not trigger
+  expect(isRunTerminatedTransition(running, null)).toBe(false);
+
+  // different runId (switch tab): does not trigger
+  expect(isRunTerminatedTransition({ runId: 'r1', status: 'running' }, { runId: 'r2', status: 'completed' })).toBe(
+    false,
+  );
+
+  // same run but prev not running (already terminal and re-rendered): does not trigger
+  expect(isRunTerminatedTransition(completed, completed)).toBe(false);
+  expect(isRunTerminatedTransition(killed, completed)).toBe(false);
+
+  // same run running → running (no change): does not trigger
+  expect(isRunTerminatedTransition(running, running)).toBe(false);
+});
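The assertions in WorkflowsPanel.test.tsx pin down the behavior of three pure helpers closely enough to sketch them. The versions below are reconstructions from those test expectations only; the real implementations live in panel/WorkflowsPanel.tsx and panel/AgentList.tsx, which are not shown here and may differ. The `RunRef` and `RunStatus` types are assumptions introduced for this sketch.

```typescript
// Assumed minimal shapes; the real panel works on the full RunProgress type.
type RunStatus = 'running' | 'completed' | 'failed' | 'killed';
interface RunRef {
  runId: string;
  status: RunStatus;
}

// Clamp a selection index into [0, length - 1]; empty lists and NaN fall back to 0.
function clampSelected(selected: number, length: number): number {
  if (length <= 0 || !Number.isFinite(selected)) return 0;
  return Math.min(Math.max(selected, 0), length - 1);
}

// Truncate a label to `max` characters. A trailing "#<digits>" suffix is kept
// and the prefix is shortened with an ellipsis; otherwise cut from the right.
function truncateLabel(label: string, max: number): string {
  if (label.length <= max) return label;
  const suffixMatch = /#\d+$/.exec(label);
  if (suffixMatch) {
    const suffix = suffixMatch[0];
    const keep = max - suffix.length - 1; // reserve one column for the ellipsis
    if (keep > 0) return label.slice(0, keep) + '…' + suffix;
  }
  return label.slice(0, max);
}

// True only when the same run goes from running to any terminal status.
function isRunTerminatedTransition(prev: RunRef | null, curr: RunRef | null): boolean {
  if (!prev || !curr) return false;
  if (prev.runId !== curr.runId) return false;
  return prev.status === 'running' && curr.status !== 'running';
}
```

Keeping these as pure functions is what lets the test file check them without rendering Ink, which matches the reasoning given in the test comments.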
--- a/src/workflow/tests/claudeCodeBackend.test.ts
+++ b/src/workflow/tests/claudeCodeBackend.test.ts
@@ -0,0 +1,398 @@
+import { expect, test, mock } from 'bun:test'
+
+// Note: mock specifier must resolve to the same module that impl actually imports (bun mock.module
+// matches by resolved module). impl uses '@claude-code-best/builtin-tools/...' and 'src/*' alias
+// path imports, so the same specifier is used here.
+mock.module(
+  '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js',
+  () => ({
+    runAgent: async function* () {
+      yield {
+        type: 'assistant',
+        message: { content: [{ type: 'text', text: 'agent-text' }] },
+      }
+    },
+  }),
+)
+mock.module(
+  '@claude-code-best/builtin-tools/tools/AgentTool/agentToolUtils.js',
+  () => ({
+    finalizeAgentTool: () => ({
+      content: [{ type: 'text', text: 'agent-text' }],
+      usage: { output_tokens: 42 },
+      totalTokens: 42,
+      totalToolUseCount: 3,
+    }),
+  }),
+)
+mock.module(
+  '@claude-code-best/builtin-tools/tools/AgentTool/loadAgentsDir.js',
+  () => ({
+    isBuiltInAgent: () => true,
+  }),
+)
+mock.module('src/tools.js', () => ({ assembleToolPool: () => ({ tools: [] }) }))
+mock.module('src/utils/messages.js', () => ({
+  // Return a shape that satisfies UserMessage consumers process-wide.
+  // Bun's mock.module is process-global (last-write-wins), so an incomplete
+  // mock here corrupts every later test that imports the real createUserMessage
+  // (e.g. bridgeMessaging.test.ts's `type !== 'user'` early-exit, or
+  // processSlashCommand.test.ts's `message.content` access). Mirror the real
+  // shape from src/utils/messages.ts: type + message envelope + passthrough.
+  createUserMessage: (
+    o: {
+      content: string
+    } & Record<string, unknown>,
+  ) => ({
+    type: 'user' as const,
+    message: { role: 'user', content: o.content },
+    ...o,
+  }),
+  extractTextContent: () => 'agent-text',
+}))
+mock.module('src/utils/uuid.js', () => ({ createAgentId: () => 'agent-1' }))
+mock.module('src/services/analytics/index.js', () => ({ logEvent: () => {} }))
+mock.module('src/utils/debug.js', () => ({ logForDebugging: () => {} }))
+
+// isolation:'worktree' tests: mock the three worktree helpers so no real `git worktree add` runs.
+// mock.module is process-global, so worktreeState lives outside the factory and tests can reset it.
+// cwd.js is deliberately not mocked: runWithCwdOverride's real AsyncLocalStorage is harmless to the mocked
+// runAgent, and leaving it alone avoids polluting other tests in this process that depend on pwd/getCwd.
+const worktreeState = {
+  shouldThrow: false,
+  hasChanges: false,
+  created: [] as string[],
+  removed: [] as string[],
+  changesCalls: 0,
+}
+mock.module('src/utils/worktree.js', () => ({
+  createAgentWorktree: async (slug: string) => {
+    if (worktreeState.shouldThrow) throw new Error('wt boom')
+    worktreeState.created.push(slug)
+    return {
+      worktreePath: '/fake/wt',
+      worktreeBranch: 'wt-branch',
+      headCommit: 'abc123',
+      gitRoot: '/fake',
+      hookBased: false,
+    }
+  },
+  hasWorktreeChanges: async () => {
+    worktreeState.changesCalls++
+    return worktreeState.hasChanges
+  },
+  removeAgentWorktree: async (path: string) => {
+    worktreeState.removed.push(path)
+    return true
+  },
+}))
+
+import { WorkflowAbortedError } from '@claude-code-best/workflow-engine'
+import {
+  claudeCodeBackend,
+  resolveAgentDefinition,
+  mapWorkflowModel,
+  extractStructuredOutput,
+  WORKFLOW_AGENT,
+} from '../backends/claudeCodeBackend.js'
+import { makeHostHandle } from '../hostHandle.js'
+
+function ctx() {
+  return {
+    host: makeHostHandle({
+      toolUseContext: {
+        options: {
+          agentDefinitions: { activeAgents: [] },
+          querySource: 'workflow',
+          mainLoopModel: 'm',
+        },
+        getAppState: () => ({
+          toolPermissionContext: {
+            mode: 'acceptEdits',
+            alwaysAllowRules: {},
+          },
+          mcp: { tools: [] },
+        }),
+      } as never,
+      canUseTool: (() => Promise.resolve({ behavior: 'allow' })) as never,
+      // run() does not read parentMessage; use an empty object placeholder to satisfy the WorkflowHostBundle type.
+      parentMessage: {} as never,
+    }),
+    signal: new AbortController().signal,
+    runId: 'r1',
+    agentId: 1,
+  }
+}
+
+test('text agent → ok + token/tool/model accounting', async () => {
+  const res = await claudeCodeBackend.run({ prompt: 'do it' }, ctx())
+  expect(res.kind).toBe('ok')
+  if (res.kind === 'ok') {
+    expect(res.output).toBe('agent-text')
+    expect(res.usage.outputTokens).toBe(42)
+    // panel display fields: tokenCount(=totalTokens) / toolCount / model (fallback mainLoopModel 'm')
+    expect(res.tokenCount).toBe(42)
+    expect(res.toolCount).toBe(3)
+    expect(res.model).toBe('m')
+  }
+})
+
+test('isolation:worktree → create worktree + auto-cleanup on no changes; slug matches cleanup regex', async () => {
+  worktreeState.shouldThrow = false
+  worktreeState.hasChanges = false
+  worktreeState.created = []
+  worktreeState.removed = []
+  worktreeState.changesCalls = 0
+  const res = await claudeCodeBackend.run(
+    { prompt: 'do', isolation: 'worktree' },
+    ctx(),
+  )
+  expect(res.kind).toBe('ok')
+  expect(worktreeState.created).toHaveLength(1)
+  // slug must match cleanupStaleAgentWorktrees cleanup regex ^wf_[0-9a-f]{8}-[0-9a-f]{3}-\d+$
+  expect(worktreeState.created[0]).toMatch(/^wf_[0-9a-f]{8}-[0-9a-f]{3}-\d+$/)
+  expect(worktreeState.changesCalls).toBe(1)
+  expect(worktreeState.removed).toHaveLength(1) // no changes → auto-remove
+})
+
+test('isolation:worktree has changes → keep worktree (no remove)', async () => {
+  worktreeState.hasChanges = true
+  worktreeState.created = []
+  worktreeState.removed = []
+  worktreeState.changesCalls = 0
+  const res = await claudeCodeBackend.run(
+    { prompt: 'do', isolation: 'worktree' },
+    ctx(),
+  )
+  expect(res.kind).toBe('ok')
+  expect(worktreeState.removed).toHaveLength(0) // has changes → keep
+  expect(worktreeState.changesCalls).toBe(1)
+})
+
+test('isolation:worktree creation fails → fail-closed returns dead (does not silently degrade to shared cwd)', async () => {
+  worktreeState.shouldThrow = true
+  const res = await claudeCodeBackend.run(
+    { prompt: 'do', isolation: 'worktree' },
+    ctx(),
+  )
+  expect(res.kind).toBe('dead')
+  worktreeState.shouldThrow = false
+})
+
+test('no isolation → no worktree created', async () => {
+  worktreeState.created = []
+  const res = await claudeCodeBackend.run({ prompt: 'do' }, ctx())
+  expect(res.kind).toBe('ok')
+  expect(worktreeState.created).toHaveLength(0)
+})
+
+test('runAgent throws → dead', async () => {
+  // override mock so runAgent throws (last-write-wins)
+  mock.module(
+    '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js',
+    () => ({
+      // biome-ignore lint/correctness/useYield: intentionally throws to test dead branch (no yield)
+      runAgent: async function* () {
+        throw new Error('boom')
+      },
+    }),
+  )
+  const res = await claudeCodeBackend.run({ prompt: 'fail' }, ctx())
+  expect(res.kind).toBe('dead')
+})
+
+// The next three tests cover the fix for the "'x' invalid" bug: the backend must bridge ctx.signal to
+// runAgent's override.abortController and treat AbortError as an abort (throw WorkflowAbortedError, not swallow it as dead).
+// They also verify registerAgentAbort injection, so service.kill(runId, agentId) can abort exactly one agent.
+
+test('ctx.signal pre-abort → backend bridge: override.abortController.signal.aborted=true', async () => {
+  // capturedController exposes the agentAbort created by the backend (the override.abortController the mock receives)
+  let capturedController: AbortController | undefined
+  mock.module(
+    '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js',
+    () => ({
+      runAgent: async function* (opts: {
+        override?: { abortController?: AbortController }
+      }) {
+        capturedController = opts.override?.abortController
+        yield {
+          type: 'assistant',
+          message: { content: [{ type: 'text', text: 'x' }] },
+        }
+      },
+    }),
+  )
+  const parentAbort = new AbortController()
+  parentAbort.abort()
+  // The mock does not throw, so the backend takes the normal return path. The bridge `if (ctx.signal.aborted) agentAbort.abort()`
+  // has already fired synchronously, so capturedController.signal.aborted must be true (the basis of the kill bridge)
+  await claudeCodeBackend.run(
+    { prompt: 'pre-aborted' },
+    { ...ctx(), signal: parentAbort.signal },
+  )
+  expect(capturedController?.signal.aborted).toBe(true)
+})
+
+test('runAgent throws AbortError → backend throws WorkflowAbortedError (not swallowed as dead)', async () => {
+  mock.module(
+    '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js',
+    () => ({
+      // biome-ignore lint/correctness/useYield: intentionally throws AbortError to test recognition branch
+      runAgent: async function* () {
+        const e = new Error('aborted by parent')
+        e.name = 'AbortError'
+        throw e
+      },
+    }),
+  )
+  await expect(
+    claudeCodeBackend.run({ prompt: 'abort' }, ctx()),
+  ).rejects.toBeInstanceOf(WorkflowAbortedError)
+})
+
+test('registerAgentAbort/unregisterAgentAbort injection: key=ctx.agentId (number), controller from bridge', async () => {
+  // restore default mock (previous test changed it to throw AbortError)
+  mock.module(
+    '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js',
+    () => ({
+      runAgent: async function* () {
+        yield {
+          type: 'assistant',
+          message: { content: [{ type: 'text', text: 'agent-text' }] },
+        }
+      },
+    }),
+  )
+  const registered: Array<{ id: number; controller: AbortController }> = []
+  const unregistered: number[] = []
+  await claudeCodeBackend.run(
+    { prompt: 'wiring' },
+    {
+      ...ctx(),
+      agentId: 42,
+      registerAgentAbort: (id, ac) => registered.push({ id, controller: ac }),
+      unregisterAgentAbort: id => unregistered.push(id),
+    },
+  )
+  expect(registered).toHaveLength(1)
+  expect(registered[0]?.id).toBe(42) // engine numeric agentId (not coreAgentId string)
+  expect(registered[0]?.controller).toBeInstanceOf(AbortController)
+  expect(unregistered).toEqual([42]) // finally cleanup idempotent
+})
+
+test('id and capabilities shape', () => {
+  expect(claudeCodeBackend.id).toBe('claude-code')
+  expect(claudeCodeBackend.capabilities.structuredOutput).toBe(true)
+  expect(claudeCodeBackend.capabilities.tools).toBe(true)
+})
+
+test('resolveAgentDefinition: no agentType → WORKFLOW_AGENT fallback', () => {
+  const tuc = {
+    options: { agentDefinitions: { activeAgents: [] } },
+  } as never
+  expect(resolveAgentDefinition(undefined, tuc)).toBe(WORKFLOW_AGENT)
+})
+
+test('resolveAgentDefinition: hits activeAgents', () => {
+  const fake = { agentType: 'Explore', permissionMode: 'plan' } as never
+  const tuc = {
+    options: { agentDefinitions: { activeAgents: [fake] } },
+  } as never
+  expect(resolveAgentDefinition('Explore', tuc)).toBe(fake)
+  // miss still falls back
+  expect(resolveAgentDefinition('Nope', tuc)).toBe(WORKFLOW_AGENT)
+})
+
+test('mapWorkflowModel passthrough', () => {
+  expect(mapWorkflowModel(undefined)).toBeUndefined()
+  expect(mapWorkflowModel('claude-haiku-*')).toBe('claude-haiku-*')
+})
+
+test('extractStructuredOutput: valid JSON extracted; invalid returns null', () => {
+  expect(
+    extractStructuredOutput([
+      { type: 'text', text: 'prefix {"a":1,"b":2} suffix' },
+    ]),
+  ).toEqual({ a: 1, b: 2 })
+  expect(
+    extractStructuredOutput([{ type: 'text', text: 'no json here' }]),
+  ).toBeNull()
+  expect(extractStructuredOutput([])).toBeNull()
+})
+
+test('extractStructuredOutput: fenced code block (strip fence + strip language tag)', () => {
+  expect(
+    extractStructuredOutput([
+      {
+        type: 'text',
+        text: 'Here are the findings:\n```json\n{"findings":[{"title":"x"}]}\n```\nDone.',
+      },
+    ]),
+  ).toEqual({ findings: [{ title: 'x' }] })
+  // no language tag
+  expect(
+    extractStructuredOutput([{ type: 'text', text: '```\n{"a":1}\n```' }]),
+  ).toEqual({ a: 1 })
+})
+
+test('extractStructuredOutput: nested object (bracket-balanced scan; legacy indexOf/lastIndexOf would cross-block concat)', () => {
+  const text = 'Result: {"outer":{"inner":{"deep":true}},"n":3} trailing'
+  expect(extractStructuredOutput([{ type: 'text', text }])).toEqual({
+    outer: { inner: { deep: true } },
+    n: 3,
+  })
+})
+
+test('extractStructuredOutput: brackets inside strings are not counted as pairing', () => {
+  // a } inside a string must not bring depth back to zero, so the scan continues to the real matching }
+  const text = '{"note":"this } char is in a string","ok":true}'
+  expect(extractStructuredOutput([{ type: 'text', text }])).toEqual({
+    note: 'this } char is in a string',
+    ok: true,
+  })
+})
+
+test('extractStructuredOutput: escaped quotes do not break string boundary', () => {
+  const text = '{"escaped":"he said \\"hi\\"","n":1}'
+  expect(extractStructuredOutput([{ type: 'text', text }])).toEqual({
+    escaped: 'he said "hi"',
+    n: 1,
+  })
+})
+
+test('extractStructuredOutput: multiple JSON blocks → return first parse success', () => {
+  // the first { is unbalanced (no matching }), so the scan skips ahead to the second
+  const text = 'broken { stuff\n{"real":1}\n{"ignored":2}'
+  expect(extractStructuredOutput([{ type: 'text', text }])).toEqual({ real: 1 })
+})
+
+test('extractStructuredOutput: array / number / string / null do not count as object', () => {
+  expect(
+    extractStructuredOutput([{ type: 'text', text: '[1,2,3]' }]),
+  ).toBeNull()
+  expect(extractStructuredOutput([{ type: 'text', text: '42' }])).toBeNull()
+  expect(
+    extractStructuredOutput([{ type: 'text', text: '"raw string"' }]),
+  ).toBeNull()
+  expect(extractStructuredOutput([{ type: 'text', text: 'null' }])).toBeNull()
+})
+
+test('extractStructuredOutput: multiple text blocks → cross-block find first success', () => {
+  expect(
+    extractStructuredOutput([
+      { type: 'text', text: 'no json' },
+      { type: 'text', text: '```json\n{"k":"v"}\n```' },
+    ]),
+  ).toEqual({ k: 'v' })
+})
+
+test('extractStructuredOutput: broken JSON returns null (does not throw)', () => {
+  expect(
+    extractStructuredOutput([
+      { type: 'text', text: '{broken: missing quotes}' },
+    ]),
+  ).toBeNull()
+  expect(
+    extractStructuredOutput([{ type: 'text', text: '{"a":1,}' }]), // trailing comma — no syntax repair
+  ).toBeNull()
+})
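The extractStructuredOutput tests above describe a bracket-balanced scan that ignores braces inside strings and moves on to the next candidate when a block is unbalanced or fails to parse. A minimal sketch of that idea follows. The names findFirstJsonObject and matchingBrace are hypothetical and this is not the repository's implementation; it handles a single string and needs no fence stripping because it simply starts at the first '{'.

```typescript
// Hypothetical sketch of a string-aware, bracket-balanced JSON object scan.
// Not the repository's extractStructuredOutput; names and shape are assumptions.

// Returns the index of the '}' that closes the '{' at `start`, or -1 if unbalanced.
function matchingBrace(text: string, start: number): number {
  let depth = 0
  let inString = false
  let escaped = false
  for (let i = start; i < text.length; i++) {
    const ch = text[i]
    if (inString) {
      // Inside a string: only track escapes and the closing quote.
      if (escaped) escaped = false
      else if (ch === '\\') escaped = true
      else if (ch === '"') inString = false
      continue
    }
    if (ch === '"') inString = true
    else if (ch === '{') depth++
    else if (ch === '}') {
      depth--
      if (depth === 0) return i
    }
  }
  return -1
}

// Tries every '{' in order and returns the first balanced span that parses to a plain object.
function findFirstJsonObject(text: string): Record<string, unknown> | null {
  for (
    let start = text.indexOf('{');
    start !== -1;
    start = text.indexOf('{', start + 1)
  ) {
    const end = matchingBrace(text, start)
    if (end === -1) continue // unbalanced candidate: try the next '{'
    try {
      const parsed: unknown = JSON.parse(text.slice(start, end + 1))
      if (parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed)) {
        return parsed as Record<string, unknown>
      }
    } catch {
      // Balanced but not valid JSON (e.g. trailing comma): no repair, try the next '{'.
    }
  }
  return null
}
```

Compared with slicing from indexOf('{') to lastIndexOf('}'), this approach does not glue separate objects together, which is the failure the nested-object test name refers to.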
--- a/src/workflow/tests/notifications.test.ts
+++ b/src/workflow/tests/notifications.test.ts
@@ -0,0 +1,176 @@
+import { describe, expect, test } from 'bun:test'
+import type { RunProgress } from '../progress/store.js'
+import type { WorkflowService } from '../service.js'
+
+function makeMockService(runs: RunProgress[]): {
+  service: WorkflowService
+  emit: () => void
+  setRuns: (runs: RunProgress[]) => void
+} {
+  let current = runs
+  const listeners = new Set<() => void>()
+  return {
+    service: {
+      ports: {},
+      launch: async () => ({ runId: 'x' }),
+      kill: () => {},
+      listRuns: () => current,
+      getRun: () => undefined,
+      subscribe: (fn: () => void) => {
+        listeners.add(fn)
+        return () => {
+          listeners.delete(fn)
+        }
+      },
+      listNamed: async () => [],
+    } as unknown as WorkflowService,
+    emit: () => {
+      for (const fn of listeners) fn()
+    },
+    setRuns: r => {
+      current = r
+    },
+  }
+}
+
+function makeRun(
+  runId: string,
+  status: RunProgress['status'],
+  overrides: Partial<RunProgress> = {},
+): RunProgress {
+  return {
+    runId,
+    workflowName: 'wf',
+    status,
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    startedAt: Date.now(),
+    updatedAt: Date.now(),
+    ...overrides,
+  }
+}
+
+describe('installWorkflowNotifications', () => {
+  test('running → completed triggers notification (incl. workflow name)', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    const unsubscribe = installWorkflowNotifications(service, msg =>
+      calls.push(msg),
+    )
+
+    // first emit: listener records initial running state, no notification
+    emit()
+    expect(calls.length).toBe(0)
+
+    setRuns([makeRun('r1', 'completed')])
+    emit()
+
+    expect(calls.length).toBe(1)
+    expect(calls[0]).toMatch(/task-notification/)
+    expect(calls[0]).toMatch(/completed successfully/)
+    expect(calls[0]).toMatch(/"wf"/)
+    unsubscribe()
+  })
+
+  test('running → failed triggers notification, includes error text', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    installWorkflowNotifications(service, msg => calls.push(msg))
+
+    emit() // record initial running
+    setRuns([makeRun('r1', 'failed', { error: 'agent X boom' })])
+    emit()
+
+    expect(calls.length).toBe(1)
+    expect(calls[0]).toMatch(/failed/)
+    expect(calls[0]).toMatch(/agent X boom/)
+  })
+
+  test('running → killed triggers notification', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    installWorkflowNotifications(service, msg => calls.push(msg))
+
+    emit() // record initial running
+    setRuns([makeRun('r1', 'killed')])
+    emit()
+
+    expect(calls.length).toBe(1)
+    expect(calls[0]).toMatch(/was stopped/)
+  })
+
+  test('first time seeing run (no prev) does not notify (avoid notifying historical runs on startup)', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([])
+    const calls: string[] = []
+    installWorkflowNotifications(service, msg => calls.push(msg))
+
+    // first emit after startup, sees r1 already completed — should not notify (not a transition from running)
+    setRuns([makeRun('r1', 'completed')])
+    emit()
+
+    expect(calls.length).toBe(0)
+  })
+
+  test('running → running does not notify', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    installWorkflowNotifications(service, msg => calls.push(msg))
+
+    emit() // record initial running
+    setRuns([makeRun('r1', 'running', { agentCount: 1 })])
+    emit()
+
+    expect(calls.length).toBe(0)
+  })
+
+  test('already completed run emitting again does not repeat notification', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    installWorkflowNotifications(service, msg => calls.push(msg))
+
+    emit() // record initial running
+    setRuns([makeRun('r1', 'completed')])
+    emit()
+    expect(calls.length).toBe(1)
+
+    emit()
+    expect(calls.length).toBe(1)
+  })
+
+  test('after unsubscribe no more notifications', async () => {
+    const { installWorkflowNotifications } = await import('../notifications.js')
+    const { service, emit, setRuns } = makeMockService([
+      makeRun('r1', 'running'),
+    ])
+    const calls: string[] = []
+    const unsubscribe = installWorkflowNotifications(service, msg =>
+      calls.push(msg),
+    )
+
+    emit() // record initial running
+    unsubscribe()
+    setRuns([makeRun('r1', 'completed')])
+    emit()
+
+    expect(calls.length).toBe(0)
+  })
+})
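The notification tests above pin down one rule: notify only when a run that was previously observed as running is now observed in a terminal state. A minimal sketch of that rule follows, assuming a reduced RunSnapshot type and a hypothetical createTerminalTransitionTracker helper. The real installWorkflowNotifications subscribes to a WorkflowService and formats a message, which this sketch leaves out.

```typescript
// Hypothetical sketch of "notify only on running -> terminal" tracking.
// RunSnapshot and createTerminalTransitionTracker are illustrative names, not the repository's API.
type RunStatus = 'running' | 'completed' | 'failed' | 'killed'

interface RunSnapshot {
  runId: string
  status: RunStatus
}

function createTerminalTransitionTracker(
  notify: (run: RunSnapshot) => void,
): (runs: readonly RunSnapshot[]) => void {
  // Last status seen per run; a run with no entry has never been observed.
  const lastStatus = new Map<string, RunStatus>()
  return runs => {
    for (const run of runs) {
      const prev = lastStatus.get(run.runId)
      lastStatus.set(run.runId, run.status)
      // First sighting (prev undefined) never notifies, so historical runs stay silent on startup.
      if (prev === 'running' && run.status !== 'running') notify(run)
    }
  }
}
```

Because the previous status is overwritten on every observation, a second emit for an already completed run sees prev === 'completed' and stays quiet, which matches the no-repeat case in the tests.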
--- a/src/workflow/tests/persistence.test.ts
+++ b/src/workflow/tests/persistence.test.ts
@@ -0,0 +1,199 @@
+import { expect, test } from 'bun:test'
+import {
+  mkdir,
+  mkdtemp,
+  readFile,
+  readdir,
+  rm,
+  writeFile as fsWriteFile,
+} from 'node:fs/promises'
+import { tmpdir } from 'node:os'
+import { join } from 'node:path'
+import {
+  getRunsDir,
+  listPersistedRuns,
+  readRunState,
+  writeRunState,
+} from '../persistence.js'
+import type { RunProgress } from '../progress/store.js'
+
+function makeRun(over: Partial<RunProgress> = {}): RunProgress {
+  return {
+    runId: 'r1',
+    workflowName: 'w',
+    status: 'completed',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    startedAt: 1000,
+    updatedAt: 2000,
+    ...over,
+  } as RunProgress
+}
+
+test('writeRunState → readRunState round-trip consistent (returnValue is object)', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    const run = makeRun({
+      returnValue: { confirmedCount: 2, items: ['a', 'b'] },
+    })
+    await writeRunState(dir, run)
+    const got = await readRunState(dir, 'r1')
+    expect(got).not.toBeNull()
+    expect(got!.runId).toBe('r1')
+    expect(got!.returnValue).toEqual({ confirmedCount: 2, items: ['a', 'b'] })
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('readRunState missing file → null', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    const got = await readRunState(dir, 'never-exists')
+    expect(got).toBeNull()
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('readRunState corrupt JSON → null', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await mkdir(join(dir, 'rX'), { recursive: true })
+    await fsWriteFile(join(dir, 'rX', 'state.json'), '{not valid json', 'utf-8')
+    const got = await readRunState(dir, 'rX')
+    expect(got).toBeNull()
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('readRunState schemaVersion mismatch → null', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await mkdir(join(dir, 'rX'), { recursive: true })
+    await fsWriteFile(
+      join(dir, 'rX', 'state.json'),
+      JSON.stringify({ schemaVersion: 999, run: makeRun({ runId: 'rX' }) }),
+      'utf-8',
+    )
+    const got = await readRunState(dir, 'rX')
+    expect(got).toBeNull()
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('writeRunState atomic write: no tmp residue after success', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await writeRunState(dir, makeRun({ runId: 'rAtom' }))
+    const sub = await readdir(join(dir, 'rAtom'))
+    expect(sub).toContain('state.json')
+    expect(sub).not.toContain('state.json.tmp')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('listPersistedRuns scans multiple subdirs, skips dirs without state.json, sorts by updatedAt desc', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    // three valid runs + one half-broken dir with only journal, no state.json
+    await writeRunState(dir, makeRun({ runId: 'old', updatedAt: 1000 }))
+    await writeRunState(dir, makeRun({ runId: 'mid', updatedAt: 2000 }))
+    await writeRunState(dir, makeRun({ runId: 'new', updatedAt: 3000 }))
+    await mkdir(join(dir, 'half-broken'), { recursive: true })
+
+    const runs = await listPersistedRuns(dir)
+    expect(runs.map(r => r.runId)).toEqual(['new', 'mid', 'old'])
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('listPersistedRuns scans a corrupt state.json → skip that single one, continue scanning the rest', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await writeRunState(dir, makeRun({ runId: 'good' }))
+    await mkdir(join(dir, 'bad'), { recursive: true })
+    await fsWriteFile(join(dir, 'bad', 'state.json'), 'corrupt', 'utf-8')
+
+    const runs = await listPersistedRuns(dir)
+    expect(runs.map(r => r.runId)).toEqual(['good'])
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('writeRunState does not throw when returnValue is null/string/array', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await writeRunState(dir, makeRun({ runId: 'n', returnValue: null }))
+    await writeRunState(dir, makeRun({ runId: 's', returnValue: 'text' }))
+    await writeRunState(dir, makeRun({ runId: 'a', returnValue: [1, 2, 3] }))
+    expect((await readRunState(dir, 'n'))!.returnValue).toBeNull()
+    expect((await readRunState(dir, 's'))!.returnValue).toBe('text')
+    expect((await readRunState(dir, 'a'))!.returnValue).toEqual([1, 2, 3])
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('writeRunState overwrite: same runId second write overwrites old content', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    await writeRunState(dir, makeRun({ runId: 'rOV', status: 'running' }))
+    await writeRunState(dir, makeRun({ runId: 'rOV', status: 'completed' }))
+    const got = await readRunState(dir, 'rOV')
+    expect(got!.status).toBe('completed')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('writeRunState writes full AgentProgress (no output content, includes label/phase/token etc.)', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-'))
+  try {
+    const run = makeRun({
+      runId: 'rAg',
+      agents: [
+        {
+          id: 1,
+          label: 'review:hooks',
+          phase: 'Review',
+          status: 'done',
+          outputShape: 'object',
+          tokenCount: 12345,
+          toolCount: 3,
+          model: 'claude-sonnet-4-6',
+        },
+      ],
+      agentCount: 1,
+    })
+    await writeRunState(dir, run)
+    const got = await readRunState(dir, 'rAg')
+    expect(got!.agents).toHaveLength(1)
+    expect(got!.agents[0]).toEqual({
+      id: 1,
+      label: 'review:hooks',
+      phase: 'Review',
+      status: 'done',
+      outputShape: 'object',
+      tokenCount: 12345,
+      toolCount: 3,
+      model: 'claude-sonnet-4-6',
+    })
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('getRunsDir returns <projectRoot>/.claude/workflow-runs shape', () => {
+  const dir = getRunsDir()
+  // do not hard-code projectRoot (differs across machines), only check suffix structure
+  expect(dir.endsWith(join('.claude', 'workflow-runs'))).toBe(true)
+})
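The commit message describes writeRunState as an atomic overwrite (tmp + rename), and the test above checks that no state.json.tmp is left behind. A minimal sketch of that pattern follows. It uses the synchronous node:fs API for brevity, whereas the tests suggest the real code is promise-based; writeJsonAtomically and demo are hypothetical names.

```typescript
// Hypothetical sketch of a tmp + rename overwrite; not the repository's writeRunState.
import {
  mkdirSync,
  mkdtempSync,
  readFileSync,
  readdirSync,
  renameSync,
  rmSync,
  writeFileSync,
} from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

function writeJsonAtomically(dir: string, fileName: string, value: unknown): string {
  mkdirSync(dir, { recursive: true })
  const finalPath = join(dir, fileName)
  const tmpPath = `${finalPath}.tmp`
  // Write the full payload to a sibling temp file first...
  writeFileSync(tmpPath, JSON.stringify(value), 'utf-8')
  // ...then swap it into place, so readers see either the old or the new file, never a partial one.
  renameSync(tmpPath, finalPath)
  return finalPath
}

// Usage: write the same file twice in a scratch directory and report what is left.
function demo(): { files: string[]; status: unknown } {
  const root = mkdtempSync(join(tmpdir(), 'wf-sketch-'))
  try {
    const runDir = join(root, 'r1')
    writeJsonAtomically(runDir, 'state.json', { status: 'running' })
    const finalPath = writeJsonAtomically(runDir, 'state.json', { status: 'completed' })
    const parsed = JSON.parse(readFileSync(finalPath, 'utf-8')) as { status?: unknown }
    return { files: readdirSync(runDir), status: parsed.status }
  } finally {
    rmSync(root, { recursive: true, force: true })
  }
}
```

The temp file sits next to the target so the rename stays on one filesystem. A crash between the write and the rename can leave a stale .tmp file, which this sketch does not clean up.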
--- a/src/workflow/tests/ports.test.ts
+++ b/src/workflow/tests/ports.test.ts
@@ -0,0 +1,198 @@
+import { expect, test } from 'bun:test'
+// Note: this test does not mock bootstrap/state, utils/cwd, analytics, or debug.
+// Reason: mock.module is process-global (last-write-wins), so mocking these common modules would pollute
+// other tests in the same process (e.g. src/commands/__tests__/autonomy.test.ts imports the real
+// bootstrap/state via its dependency chain). ports can resolve getProjectRoot/getCwd normally in the test env, and
+// logEvent/logForDebugging are silent no-ops when no sink is attached, so there is no need to mock them.
+
+import { buildRegistry } from '../registry.js'
+import { createWorkflowPorts } from '../ports.js'
+import { createProgressBus } from '../progress/bus.js'
+import { createProgressStoreFromBus } from '../progress/store.js'
+import { getProjectRoot } from '../../bootstrap/state.js'
+import type { SetAppState } from '../../Task.js'
+import type { AppState } from '../../state/AppState.tsx'
+
+test('buildRegistry registers claude-code as default and resolve hits', () => {
+  const reg = buildRegistry()
+  expect(reg.has('claude-code')).toBe(true)
+  expect(reg.resolve({ prompt: 'x' }).id).toBe('claude-code')
+  expect(reg.resolve({ prompt: 'x', agentType: 'whatever' }).id).toBe(
+    'claude-code',
+  )
+})
+
+test('createWorkflowPorts assembles full ports (incl. agentAdapterRegistry and progressEmitter→bus)', () => {
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const ports = createWorkflowPorts({ bus, store })
+
+  expect(ports.agentAdapterRegistry).toBeDefined()
+  expect(ports.agentAdapterRegistry!.resolve({ prompt: 'x' }).id).toBe(
+    'claude-code',
+  )
+  expect(typeof ports.taskRegistrar.register).toBe('function')
+  expect(typeof ports.taskRegistrar.kill).toBe('function')
+  expect(typeof ports.hostFactory).toBe('function')
+  // agentRunner fallback fields still exist (WorkflowPorts required)
+  expect(ports.agentRunner).toBeDefined()
+  expect(typeof ports.agentRunner.runAgentToResult).toBe('function')
+
+  // progressEmitter via bus → store: emit a run_started, store can see it
+  ports.progressEmitter.emit({
+    type: 'run_started',
+    runId: 't',
+    workflowName: 'w',
+    meta: null,
+  })
+  expect(store.get('t')?.workflowName).toBe('w')
+})
+
+test('taskRegistrar.register/complete/kill routes via RunBinding (real setAppState, no mock)', () => {
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const ports = createWorkflowPorts({ bus, store })
+
+  // real setAppState: use a local AppState object to hold tasks, registerTask goes through the real code path.
+  const state = { tasks: {} } as unknown as AppState
+  const setAppState: SetAppState = f => {
+    Object.assign(state, f(state))
+  }
+
+  const hostCtx = ports.hostFactory({
+    context: {
+      agentId: 'a-1',
+      toolUseId: 'tu-1',
+      setAppState,
+    },
+    canUseTool: (() => Promise.resolve({ behavior: 'allow' })) as never,
+    parentMessage: {} as never,
+  })
+
+  const { runId, signal } = ports.taskRegistrar.register(
+    {
+      workflowName: 'wf',
+      summary: 'summary',
+      workflowFile: 'wf.ts',
+      toolUseId: 'tu-1',
+    },
+    hostCtx.handle,
+  )
+  expect(typeof runId).toBe('string')
+  expect(signal).toBeInstanceOf(AbortSignal)
+
+  // complete/kill do not throw (RunBinding hit)
+  expect(() => ports.taskRegistrar.complete(runId, 'done')).not.toThrow()
+  expect(() => ports.taskRegistrar.kill(runId)).not.toThrow()
+  // unknown runId safe no-op
+  expect(() => ports.taskRegistrar.complete('nope')).not.toThrow()
+  expect(ports.taskRegistrar.pendingAction('nope')).toBeNull()
+
+  // once the run is terminal its binding is reclaimed: calling complete/kill again on the same runId must be a safe no-op (no throw, no repeated call into the workflow task fn)
+  ports.taskRegistrar.complete(runId)
+  ports.taskRegistrar.kill(runId)
+})
+
+// agent-level kill bridge: register → killAgent precisely aborts; kill(runId) aborts all agents.
+test('taskRegistrar agentAbortControllers: register/killAgent precise abort; kill(runId) batch abort', () => {
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const ports = createWorkflowPorts({ bus, store })
+  // impl always provides these — cast flattens optional to required (avoids per-line ! assertion)
+  const tr = ports.taskRegistrar as Required<typeof ports.taskRegistrar>
+
+  const state = { tasks: {} } as unknown as AppState
+  const setAppState: SetAppState = f => {
+    Object.assign(state, f(state))
+  }
+  const hostCtx = ports.hostFactory({
+    context: { agentId: 'a-1', toolUseId: 'tu-1', setAppState },
+    canUseTool: (() => Promise.resolve({ behavior: 'allow' })) as never,
+    parentMessage: {} as never,
+  })
+  const { runId } = tr.register(
+    {
+      workflowName: 'wf',
+      summary: 'summary',
+      workflowFile: 'wf.ts',
+      toolUseId: 'tu-1',
+    },
+    hostCtx.handle,
+  )
+
+  // register AbortController for two agents (simulating backend calling when launching agent)
+  const ac1 = new AbortController()
+  const ac2 = new AbortController()
+  tr.registerAgentAbort(runId, 1, ac1)
+  tr.registerAgentAbort(runId, 2, ac2)
+  expect(ac1.signal.aborted).toBe(false)
+  expect(ac2.signal.aborted).toBe(false)
+
+  // killAgent precisely aborts agent #1: only ac1 aborts, ac2 unaffected
+  expect(tr.killAgent(runId, 1)).toBe(true)
+  expect(ac1.signal.aborted).toBe(true)
+  expect(ac2.signal.aborted).toBe(false)
+  // repeat kill on same agent: controller already deleted, returns false (idempotent)
+  expect(tr.killAgent(runId, 1)).toBe(false)
+
+  // unknown agentId / unknown runId safe returns false
+  expect(tr.killAgent(runId, 999)).toBe(false)
+  expect(tr.killAgent('nope', 1)).toBe(false)
+
+  // kill(runId) batch aborts remaining agent (ac2)
+  tr.kill(runId)
+  expect(ac2.signal.aborted).toBe(true)
+
+  // after run terminal state binding is reclaimed: killAgent returns false
+  expect(tr.killAgent(runId, 2)).toBe(false)
+})
+
+test('unregisterAgentAbort deletes from Map (backend finally cleanup idempotent)', () => {
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const ports = createWorkflowPorts({ bus, store })
+  const tr = ports.taskRegistrar as Required<typeof ports.taskRegistrar>
+
+  const state = { tasks: {} } as unknown as AppState
+  const setAppState: SetAppState = f => {
+    Object.assign(state, f(state))
+  }
+  const hostCtx = ports.hostFactory({
+    context: { agentId: 'a-1', toolUseId: 'tu-1', setAppState },
+    canUseTool: (() => Promise.resolve({ behavior: 'allow' })) as never,
+    parentMessage: {} as never,
+  })
+  const { runId } = tr.register(
+    {
+      workflowName: 'wf',
+      summary: 'summary',
+      workflowFile: 'wf.ts',
+      toolUseId: 'tu-1',
+    },
+    hostCtx.handle,
+  )
+  const ac = new AbortController()
+  tr.registerAgentAbort(runId, 5, ac)
+  // after unregister killAgent has no target, returns false (does not throw)
+  tr.unregisterAgentAbort(runId, 5)
+  expect(tr.killAgent(runId, 5)).toBe(false)
+  // repeated unregister is idempotent (the backend's finally block does not throw)
+  expect(() => tr.unregisterAgentAbort(runId, 5)).not.toThrow()
+  // unknown runId is a safe no-op
+  expect(() => tr.unregisterAgentAbort('nope', 5)).not.toThrow()
+})
+
+test('hostFactory.cwd and journalStore share root (getProjectRoot) — fix K regression', () => {
+  // historical bug: hostFactory.cwd used getCwd(), journalStore used getProjectRoot(),
+  // when user enters worktree/subdirectory the two differ → named workflow resolution and journal persist out of sync.
+  // After fix both use projectRoot, this test locks-in that choice, preventing regression.
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const ports = createWorkflowPorts({ bus, store })
+  const hostCtx = ports.hostFactory({
+    context: { agentId: 'a', toolUseId: 'tu' },
+    canUseTool: (() => Promise.resolve({ behavior: 'allow' })) as never,
+    parentMessage: {} as never,
+  })
+  expect(hostCtx.cwd).toBe(getProjectRoot())
+})
--- a/src/workflow/tests/progressBus.test.ts
+++ b/src/workflow/tests/progressBus.test.ts
@@ -0,0 +1,23 @@
+import { expect, test, mock } from 'bun:test'
+import { createProgressBus } from '../progress/bus.js'
+
+test('emit broadcasts to all subscribers', () => {
+  const bus = createProgressBus()
+  const a = mock(() => {})
+  const b = mock(() => {})
+  bus.subscribe(a)
+  bus.subscribe(b)
+  const ev = { type: 'log' as const, runId: 'r', message: 'hi' }
+  bus.emit(ev)
+  expect(a).toHaveBeenCalledTimes(1)
+  expect(b).toHaveBeenCalledWith(ev)
+})
+
+test('subscribe returns unsubscribe', () => {
+  const bus = createProgressBus()
+  const fn = mock(() => {})
+  const unsub = bus.subscribe(fn)
+  unsub()
+  bus.emit({ type: 'log', runId: 'r', message: 'x' })
+  expect(fn).not.toHaveBeenCalled()
+})
--- a/src/workflow/tests/progressStore.test.ts
+++ b/src/workflow/tests/progressStore.test.ts
@@ -0,0 +1,289 @@
+import { expect, test } from 'bun:test'
+import { createProgressBus, type ProgressBus } from '../progress/bus.js'
+import {
+  createProgressStoreFromBus,
+  type RunProgress,
+} from '../progress/store.js'
+import type { AgentRunResult } from '@claude-code-best/workflow-engine'
+
+const ok = (o: string): AgentRunResult => ({
+  kind: 'ok',
+  output: o,
+  usage: { outputTokens: 1 },
+})
+
+function newStore() {
+  const bus: ProgressBus = createProgressBus()
+  return { bus, store: createProgressStoreFromBus(bus) }
+}
+
+test('run_started creates entry; phase_started/done updates phases', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({ type: 'phase_started', runId: 'r1', phase: 'A' })
+  bus.emit({ type: 'phase_started', runId: 'r1', phase: 'B' })
+  bus.emit({ type: 'phase_done', runId: 'r1', phase: 'A' })
+  const r = store.get('r1')!
+  expect(r.phases.map(p => [p.title, p.status])).toEqual([
+    ['A', 'done'],
+    ['B', 'running'],
+  ])
+  expect(r.currentPhase).toBe('B')
+})
+
+test('concurrent agent_done correlates by agentId precisely (regression of old LIFO race)', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({
+    type: 'agent_started',
+    runId: 'r1',
+    agentId: 0,
+    label: 'a',
+    phase: 'A',
+  })
+  bus.emit({
+    type: 'agent_started',
+    runId: 'r1',
+    agentId: 1,
+    label: 'b',
+    phase: 'A',
+  })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 1,
+    label: 'b',
+    phase: 'A',
+    result: ok('b-out'),
+  })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 0,
+    label: 'a',
+    phase: 'A',
+    result: ok('a-out'),
+  })
+  const agents = store.get('r1')!.agents
+  expect(agents.find(x => x.id === 0)?.status).toBe('done')
+  expect(agents.find(x => x.id === 1)?.status).toBe('done')
+  expect(agents.find(x => x.id === 0)?.label).toBe('a')
+  expect(agents.find(x => x.id === 1)?.label).toBe('b')
+})
+
+test('journal hit (agent_done without started) backfills done entry by id', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 7,
+    label: 'c',
+    phase: 'A',
+    result: ok('c'),
+  })
+  const a = store.get('r1')!.agents.find(x => x.id === 7)!
+  expect(a.status).toBe('done')
+})
+
+test('run_done terminal state + list sort + subscribe notification', () => {
+  const { bus, store } = newStore()
+  let calls = 0
+  store.subscribe(() => calls++)
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({
+    type: 'run_done',
+    runId: 'r1',
+    status: 'completed',
+    returnValue: 42,
+  })
+  const r = store.get('r1')!
+  expect(r.status).toBe('completed')
+  expect(r.returnValue).toBe(42)
+  expect(store.list().map(x => x.runId)).toEqual(['r1'])
+  expect(calls).toBe(2)
+})
+
+test('run_done failed terminal state records error', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r2', workflowName: 'w', meta: null })
+  bus.emit({ type: 'run_done', runId: 'r2', status: 'failed', error: 'boom' })
+  const r = store.get('r2')!
+  expect(r.status).toBe('failed')
+  expect(r.error).toBe('boom')
+})
+
+test('log event does not trigger notify', () => {
+  const { bus, store } = newStore()
+  let calls = 0
+  store.subscribe(() => calls++)
+  bus.emit({ type: 'run_started', runId: 'r3', workflowName: 'w', meta: null })
+  const before = calls
+  bus.emit({ type: 'log', runId: 'r3', message: 'hi' })
+  expect(calls).toBe(before) // log should not trigger notify
+})
+
+test('run_started persists declaredPhases (from meta.phases, order preserved)', () => {
+  const { bus, store } = newStore()
+  bus.emit({
+    type: 'run_started',
+    runId: 'r1',
+    workflowName: 'w',
+    meta: {
+      name: 'w',
+      description: 'd',
+      phases: [{ title: 'Find' }, { title: 'Review' }, { title: 'Verify' }],
+    },
+  })
+  expect(store.get('r1')!.declaredPhases).toEqual(['Find', 'Review', 'Verify'])
+})
+
+test('run_started meta is null → declaredPhases = []', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  expect(store.get('r1')!.declaredPhases).toEqual([])
+})
+
+test('agent_done persists outputShape (ok·object / ok·text / dead none)', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({ type: 'agent_started', runId: 'r1', agentId: 0, phase: 'A' })
+  bus.emit({ type: 'agent_started', runId: 'r1', agentId: 1, phase: 'A' })
+  bus.emit({ type: 'agent_started', runId: 'r1', agentId: 2, phase: 'A' })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 0,
+    phase: 'A',
+    result: { kind: 'ok', output: { x: 1 }, usage: { outputTokens: 1 } },
+  })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 1,
+    phase: 'A',
+    result: { kind: 'ok', output: 'hi', usage: { outputTokens: 1 } },
+  })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 2,
+    phase: 'A',
+    result: { kind: 'dead' },
+  })
+  const agents = store.get('r1')!.agents
+  expect(agents.find(a => a.id === 0)?.outputShape).toBe('object')
+  expect(agents.find(a => a.id === 1)?.outputShape).toBe('text')
+  expect(agents.find(a => a.id === 2)?.outputShape).toBeUndefined()
+})
+
+test('agent_progress real-time updates token/tool (correlated by agentId)', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({
+    type: 'agent_started',
+    runId: 'r1',
+    agentId: 0,
+    label: 'a',
+    phase: 'A',
+  })
+  bus.emit({
+    type: 'agent_progress',
+    runId: 'r1',
+    agentId: 0,
+    tokenCount: 1200,
+    toolCount: 2,
+  })
+  let a = store.get('r1')!.agents.find(x => x.id === 0)!
+  expect(a.tokenCount).toBe(1200)
+  expect(a.toolCount).toBe(2)
+  bus.emit({
+    type: 'agent_progress',
+    runId: 'r1',
+    agentId: 0,
+    tokenCount: 2400,
+    toolCount: 3,
+  })
+  a = store.get('r1')!.agents.find(x => x.id === 0)!
+  expect(a.tokenCount).toBe(2400)
+  expect(a.toolCount).toBe(3)
+})
+
+test('agent_done persists model/tokenCount/toolCount (ok variant)', () => {
+  const { bus, store } = newStore()
+  bus.emit({ type: 'run_started', runId: 'r1', workflowName: 'w', meta: null })
+  bus.emit({ type: 'agent_started', runId: 'r1', agentId: 0, phase: 'A' })
+  bus.emit({
+    type: 'agent_done',
+    runId: 'r1',
+    agentId: 0,
+    phase: 'A',
+    result: {
+      kind: 'ok',
+      output: 'x',
+      usage: { outputTokens: 5 },
+      model: 'glm-5.2',
+      tokenCount: 22900,
+      toolCount: 1,
+    },
+  })
+  const a = store.get('r1')!.agents.find(x => x.id === 0)!
+  expect(a.model).toBe('glm-5.2')
+  expect(a.tokenCount).toBe(22900)
+  expect(a.toolCount).toBe(1)
+})
+
+// ---- hydrate: inject historical run from disk (cross-restart recovery) ----
+
+test('hydrate injects new run → get hits + list includes it + notifies listener', () => {
+  const { store } = newStore()
+  let notified = 0
+  store.subscribe(() => notified++)
+
+  const historical: RunProgress = {
+    runId: 'hist-1',
+    workflowName: 'old-job',
+    status: 'completed',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 5,
+    returnValue: { summary: 'past' },
+    startedAt: 1,
+    updatedAt: 2,
+  }
+  store.hydrate(historical)
+
+  expect(store.get('hist-1')).toBe(historical)
+  expect(store.list().map(r => r.runId)).toContain('hist-1')
+  expect(notified).toBeGreaterThan(0)
+})
+
+test('hydrate existing runId → skip (memory first, not overwritten by disk)', () => {
+  const { bus, store } = newStore()
+  bus.emit({
+    type: 'run_started',
+    runId: 'r1',
+    workflowName: 'live',
+    meta: null,
+  })
+
+  const stale: RunProgress = {
+    runId: 'r1',
+    workflowName: 'STALE-SHOULD-NOT-WIN',
+    status: 'completed',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    startedAt: 1,
+    updatedAt: 2,
+  }
+  store.hydrate(stale)
+
+  const got = store.get('r1')!
+  expect(got.workflowName).toBe('live')
+  expect(got.status).toBe('running')
+})
--- a/src/workflow/tests/runStatePersistence.test.ts
+++ b/src/workflow/tests/runStatePersistence.test.ts
@@ -0,0 +1,177 @@
+import { expect, test } from 'bun:test'
+import { mkdtemp, rm, writeFile } from 'node:fs/promises'
+import { tmpdir } from 'node:os'
+import { join } from 'node:path'
+import { attachRunStatePersistence, readRunState } from '../persistence.js'
+import { createProgressBus } from '../progress/bus.js'
+import { createProgressStoreFromBus } from '../progress/store.js'
+
+/**
+ * Contract test for attachRunStatePersistence (adjusted Task 4):
+ * directly test the bus + store combination, bypassing makeService (keeps makeService signature (ports, store, cwdOverride?) unchanged).
+ *
+ * runsDir is injected as tmpdir via attachRunStatePersistence's third parameter runsDirProvider,
+ * to avoid writing to the real project directory (Bun ESM module namespace is read-only, cannot monkey-patch getRunsDir).
+ */
+
+test('run_done completed → writes state.json to disk, returnValue consistent', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-persist-'))
+  try {
+    const bus = createProgressBus()
+    const store = createProgressStoreFromBus(bus)
+    attachRunStatePersistence(bus, store, () => dir)
+
+    bus.emit({
+      type: 'run_started',
+      runId: 'rW',
+      workflowName: 'w',
+      meta: null,
+    })
+    bus.emit({
+      type: 'run_done',
+      runId: 'rW',
+      status: 'completed',
+      returnValue: { ok: true, n: 3 },
+    })
+
+    // writeRunState is async (void writeRunState(...) in the subscription); wait briefly for the pending write to finish
+    await new Promise(r => setTimeout(r, 50))
+
+    const got = await readRunState(dir, 'rW')
+    expect(got).not.toBeNull()
+    expect(got!.status).toBe('completed')
+    expect(got!.returnValue).toEqual({ ok: true, n: 3 })
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('run_done failed → writes status=failed + error field to disk', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-persist-'))
+  try {
+    const bus = createProgressBus()
+    const store = createProgressStoreFromBus(bus)
+    attachRunStatePersistence(bus, store, () => dir)
+
+    bus.emit({
+      type: 'run_started',
+      runId: 'rF',
+      workflowName: 'w',
+      meta: null,
+    })
+    bus.emit({
+      type: 'run_done',
+      runId: 'rF',
+      status: 'failed',
+      error: 'boom',
+    })
+    await new Promise(r => setTimeout(r, 50))
+
+    const got = await readRunState(dir, 'rF')
+    expect(got).not.toBeNull()
+    expect(got!.status).toBe('failed')
+    expect(got!.error).toBe('boom')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('run_done killed → writes status=killed to disk', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-persist-'))
+  try {
+    const bus = createProgressBus()
+    const store = createProgressStoreFromBus(bus)
+    attachRunStatePersistence(bus, store, () => dir)
+
+    bus.emit({
+      type: 'run_started',
+      runId: 'rK',
+      workflowName: 'w',
+      meta: null,
+    })
+    bus.emit({ type: 'run_done', runId: 'rK', status: 'killed' })
+    await new Promise(r => setTimeout(r, 50))
+
+    const got = await readRunState(dir, 'rK')
+    expect(got?.status).toBe('killed')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('writeRunState internal IO exception is swallowed: attachRunStatePersistence does not propagate, bus emit does not break', async () => {
+  const blockerDir = await mkdtemp(join(tmpdir(), 'wf-persist-'))
+  // first create a same-named file, so subdir mkdir fails → writeRunState internal catch swallows it
+  await writeFile(join(blockerDir, 'not-a-dir.txt'), 'blocker', 'utf-8')
+  try {
+    const bus = createProgressBus()
+    const store = createProgressStoreFromBus(bus)
+    // runsDir points at a regular file, so the recursive mkdir of the run subdirectory beneath it fails
+    attachRunStatePersistence(bus, store, () =>
+      join(blockerDir, 'not-a-dir.txt'),
+    )
+
+    // an extra subscriber to verify it still gets notified (bus emit should not break due to internal exception in persistence listener)
+    let otherNotified = 0
+    bus.subscribe(() => otherNotified++)
+
+    // bus.emit should not throw — writeRunState swallows the exception internally
+    expect(() => {
+      bus.emit({
+        type: 'run_started',
+        runId: 'rErr',
+        workflowName: 'w',
+        meta: null,
+      })
+      bus.emit({
+        type: 'run_done',
+        runId: 'rErr',
+        status: 'completed',
+        returnValue: 'x',
+      })
+    }).not.toThrow()
+
+    // wait briefly for writeRunState's pending async work to settle (exception swallowed internally)
+    await new Promise(r => setTimeout(r, 50))
+
+    // the extra bus subscriber still works (it received both run_started and run_done), and the store reached the terminal state
+    expect(otherNotified).toBeGreaterThanOrEqual(2)
+    expect(store.get('rErr')?.status).toBe('completed')
+  } finally {
+    await rm(blockerDir, { recursive: true, force: true })
+  }
+})
+
+test('attachRunStatePersistence returns unsubscribe; after calling it no more disk writes', async () => {
+  const dir = await mkdtemp(join(tmpdir(), 'wf-persist-'))
+  try {
+    const bus = createProgressBus()
+    const store = createProgressStoreFromBus(bus)
+    const unsub = attachRunStatePersistence(bus, store, () => dir)
+
+    // first emit a run_done, verify disk write takes effect
+    bus.emit({
+      type: 'run_started',
+      runId: 'r1',
+      workflowName: 'w',
+      meta: null,
+    })
+    bus.emit({ type: 'run_done', runId: 'r1', status: 'completed' })
+    await new Promise(r => setTimeout(r, 50))
+    expect(await readRunState(dir, 'r1')).not.toBeNull()
+
+    // after unsubscribe, emit run_done again, should not write to disk
+    unsub()
+    bus.emit({
+      type: 'run_started',
+      runId: 'r2',
+      workflowName: 'w',
+      meta: null,
+    })
+    bus.emit({ type: 'run_done', runId: 'r2', status: 'completed' })
+    await new Promise(r => setTimeout(r, 50))
+    expect(await readRunState(dir, 'r2')).toBeNull()
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
--- a/src/workflow/tests/selectors.test.ts
+++ b/src/workflow/tests/selectors.test.ts
@@ -0,0 +1,82 @@
+import { expect, test } from 'bun:test'
+import type { AgentProgress, RunProgress } from '../progress/store.js'
+import {
+  ALL_PHASE,
+  mergePhases,
+  filterAgentsByPhase,
+  tabLabel,
+} from '../panel/selectors.js'
+
+function run(partial: Partial<RunProgress>): RunProgress {
+  return {
+    runId: 'r1',
+    workflowName: 'w',
+    status: 'running',
+    phases: [],
+    declaredPhases: [],
+    currentPhase: null,
+    agents: [],
+    agentCount: 0,
+    startedAt: 1,
+    updatedAt: 1,
+    ...partial,
+  }
+}
+
+test('mergePhases: declared order first, actual phases append undeclared ones, counts done/total', () => {
+  const r = run({
+    declaredPhases: ['Find', 'Review', 'Verify'],
+    phases: [
+      { title: 'Find', status: 'done' },
+      { title: 'Review', status: 'running' },
+    ],
+    agents: [
+      {
+        id: 1,
+        phase: 'Find',
+        status: 'done',
+        resultKind: 'ok',
+        outputShape: 'text',
+      },
+      { id: 2, phase: 'Find', status: 'done', resultKind: 'dead' },
+      { id: 3, phase: 'Review', status: 'running' },
+    ],
+  })
+  expect(mergePhases(r)).toEqual([
+    { title: 'Find', status: 'done', done: 2, total: 2 },
+    { title: 'Review', status: 'running', done: 0, total: 1 },
+    { title: 'Verify', status: 'pending', done: 0, total: 0 },
+  ])
+})
+
+test('mergePhases: actual but undeclared phase appended to the end', () => {
+  const r = run({
+    declaredPhases: ['Find'],
+    phases: [
+      { title: 'Find', status: 'done' },
+      { title: 'Adhoc', status: 'running' },
+    ],
+    agents: [],
+  })
+  expect(mergePhases(r).map(p => p.title)).toEqual(['Find', 'Adhoc'])
+})
+
+test('filterAgentsByPhase: All / undefined → all; specified → only that phase', () => {
+  const agents: AgentProgress[] = [
+    { id: 1, phase: 'A', status: 'running' },
+    {
+      id: 2,
+      phase: 'B',
+      status: 'done',
+      resultKind: 'ok',
+      outputShape: 'text',
+    },
+  ]
+  expect(filterAgentsByPhase(agents, undefined)).toHaveLength(2)
+  expect(filterAgentsByPhase(agents, ALL_PHASE)).toHaveLength(2)
+  expect(filterAgentsByPhase(agents, 'A')).toEqual([agents[0]])
+})
+
+test('tabLabel: workflow name + last 4 chars short code of runId', () => {
+  expect(tabLabel('review-changes', 'wf_abc123def')).toBe('review-changes#3def')
+})
--- a/src/workflow/tests/service.test.ts
+++ b/src/workflow/tests/service.test.ts
@@ -0,0 +1,594 @@
+import { expect, test } from 'bun:test'
+// DI pattern: do not use mock.module (process-global, last-write-wins, would pollute other tests in the same process such as
+// autonomy.test.ts). Instead hand-construct FAKE WorkflowPorts: registry.run returns a fixed ok
+// result, taskRegistrar maintains abort bindings, journalStore is an in-memory empty impl. The real runWorkflow
+// thus runs to completion without needing LLM or mocks.
+
+import { mkdtemp, rm, writeFile } from 'node:fs/promises'
+import { tmpdir } from 'node:os'
+import { join } from 'node:path'
+import { makeService, __resetWorkflowServiceForTests } from '../service.js'
+import { createProgressBus } from '../progress/bus.js'
+import {
+  createProgressStoreFromBus,
+  type RunProgress,
+} from '../progress/store.js'
+import type {
+  AgentRunResult,
+  ProgressEvent,
+  WorkflowPorts,
+} from '@claude-code-best/workflow-engine'
+
+// Construct FAKE ports: registry.run returns a fixed AgentRunResult, taskRegistrar has bindings,
+// journalStore is an in-memory empty impl. progressEmitter.emit → bus.emit (store subscribes to bus at construction).
+// Note: runWorkflow itself emits run_started/run_done; taskRegistrar only manages abort bindings,
+// does not re-emit events (avoids store reducer receiving duplicate run_done).
+type RegistrarCall =
+  | { kind: 'complete'; runId: string; summary?: string }
+  | { kind: 'fail'; runId: string; error?: string }
+  | { kind: 'kill'; runId: string }
+  | {
+      kind: 'registerAgentAbort'
+      runId: string
+      agentId: number
+      controller: AbortController
+    }
+  | { kind: 'unregisterAgentAbort'; runId: string; agentId: number }
+  | { kind: 'killAgent'; runId: string; agentId: number }
+
+function fakePorts(
+  opts: {
+    /** adapter.run throws (simulates agent backend crash). */
+    adapterThrow?: string
+    /** adapter.run return value (default ok). */
+    adapterResult?: AgentRunResult
+    /** agentRunner.runAgentToResult return value (fallback path, default throws). */
+    runnerResult?: AgentRunResult
+  } = {},
+): {
+  ports: WorkflowPorts
+  store: ReturnType<typeof createProgressStoreFromBus>
+  killed: string[]
+  /** taskRegistrar call records (complete/fail/kill/registerAgentAbort/...). */
+  calls: RegistrarCall[]
+  /** runId → (agentId → AbortController). Used by tests to simulate backend registration. */
+  agentBindings: Map<string, Map<number, AbortController>>
+  /** adapter.run call count (accumulates on retry). Holder reference; tests read adapterCallsRef.value. */
+  adapterCallsRef: { value: number }
+} {
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const killed: string[] = []
+  const calls: RegistrarCall[] = []
+  const bindings = new Map<string, { abort: AbortController }>()
+  // agentId → AbortController (per runId). killAgent uses this to abort precisely.
+  const agentBindings = new Map<string, Map<number, AbortController>>()
+  // adapter.run call count (accumulates on retry). A holder object is used because returning a
+  // plain number via shorthand would copy its current value (0) into the returned object, so later
+  // increments of the outer variable would not be visible to the test. The holder reference is stable.
+  const adapterCallsRef = { value: 0 }
+  let seq = 0
+  const ports = {
+    // hostFactory is not actually called by the service.launch path (service builds its own host handle),
+    // but the WorkflowPorts type requires it to exist; keep a minimal impl.
+    hostFactory: () => ({
+      handle: {} as never,
+      cwd: '/tmp',
+      budgetTotal: null,
+      toolUseId: 'tu',
+    }),
+    agentAdapterRegistry: {
+      resolve: () => ({
+        id: 'claude-code',
+        capabilities: { structuredOutput: true },
+        run:
+          opts.adapterThrow !== undefined
+            ? async (): Promise<AgentRunResult> => {
+                adapterCallsRef.value++
+                throw new Error(opts.adapterThrow)
+              }
+            : async (): Promise<AgentRunResult> => {
+                adapterCallsRef.value++
+                return (
+                  opts.adapterResult ?? {
+                    kind: 'ok',
+                    output: 'mock-out',
+                    usage: { outputTokens: 1 },
+                  }
+                )
+              },
+      }),
+    },
+    agentRunner: {
+      runAgentToResult:
+        opts.runnerResult !== undefined
+          ? async () => opts.runnerResult
+          : async () => {
+              throw new Error('should not reach')
+            },
+    },
+    progressEmitter: {
+      emit: (e: ProgressEvent) => bus.emit(e),
+    },
+    taskRegistrar: {
+      register: ({ workflowName }: { workflowName: string }) => {
+        const abort = new AbortController()
+        seq += 1
+        const runId = `run-${seq}`
+        bindings.set(runId, { abort })
+        agentBindings.set(runId, new Map())
+        return { runId, signal: abort.signal }
+      },
+      complete: (runId: string, summary?: string) => {
+        calls.push({ kind: 'complete', runId, summary })
+      },
+      fail: (runId: string, error?: string) => {
+        calls.push({ kind: 'fail', runId, error })
+      },
+      kill: (runId: string) => {
+        killed.push(runId)
+        calls.push({ kind: 'kill', runId })
+        bindings.get(runId)?.abort.abort()
+      },
+      registerAgentAbort: (
+        runId: string,
+        agentId: number,
+        controller: AbortController,
+      ) => {
+        calls.push({
+          kind: 'registerAgentAbort',
+          runId,
+          agentId,
+          controller,
+        })
+        agentBindings.get(runId)?.set(agentId, controller)
+      },
+      unregisterAgentAbort: (runId: string, agentId: number) => {
+        calls.push({ kind: 'unregisterAgentAbort', runId, agentId })
+        agentBindings.get(runId)?.delete(agentId)
+      },
+      killAgent: (runId: string, agentId: number) => {
+        calls.push({ kind: 'killAgent', runId, agentId })
+        const ac = agentBindings.get(runId)?.get(agentId)
+        if (!ac) return false
+        ac.abort()
+        agentBindings.get(runId)!.delete(agentId)
+        return true
+      },
+      pendingAction: () => null,
+    },
+    journalStore: {
+      read: async () => [],
+      append: async () => {},
+      truncate: async () => {},
+    },
+    permissionGate: { isAborted: () => false },
+    logger: {
+      debug: () => {},
+      event: () => {},
+      warn: () => {},
+    },
+  } as unknown as WorkflowPorts
+  return { ports, store, killed, calls, agentBindings, adapterCallsRef }
+}
+
+const stubTUC = { agentId: 'a1', toolUseId: 'tu' } as never
+const stubCanUseTool = (() => Promise.resolve({ behavior: 'allow' })) as never
+
+/** Wait for detached runWorkflow to complete (detached call, need to drain microtasks/macrotasks). */
+async function settle(): Promise<void> {
+  await new Promise(r => setTimeout(r, 60))
+}
+
+test('launch → completed; store shows this run', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  const { runId } = await svc.launch(
+    { script: `return agent('compute')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  await settle()
+  const r = svc.getRun(runId)
+  expect(r).toBeDefined()
+  // detached execution may still be running within the settle window, or already completed — both are acceptable.
+  expect(['completed', 'running']).toContain(r!.status)
+  expect(r!.workflowName).toBe('workflow')
+})
+
+test('launch inline script → returns scriptPath (persisted to cwdOverride dir)', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    const { ports, store } = fakePorts()
+    const svc = makeService(ports, store, dir)
+    const result = await svc.launch(
+      { script: `return agent('x')` },
+      stubTUC,
+      stubCanUseTool,
+    )
+    expect(result.scriptPath).toBe(
+      join(dir, '.claude', 'workflow-runs', 'run-1', 'script.js'),
+    )
+    const { readFile } = await import('node:fs/promises')
+    expect(await readFile(result.scriptPath!, 'utf-8')).toBe(
+      `return agent('x')`,
+    )
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('kill goes through taskRegistrar.kill', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, killed } = fakePorts()
+  const svc = makeService(ports, store)
+  const { runId } = await svc.launch(
+    { script: `return agent('x')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  svc.kill(runId)
+  expect(killed).toContain(runId)
+})
+
+test('killAgent goes through taskRegistrar.killAgent: precisely aborts a single agent', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, calls, agentBindings } = fakePorts()
+  const svc = makeService(ports, store)
+  const { runId } = await svc.launch(
+    { script: `return agent('x')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  // simulate backend registering AbortController when launching agent
+  const ac = new AbortController()
+  agentBindings.get(runId)!.set(7, ac)
+  // service.killAgent routes to taskRegistrar.killAgent, which actually aborts the corresponding controller
+  expect(svc.killAgent(runId, 7)).toBe(true)
+  expect(ac.signal.aborted).toBe(true)
+  expect(
+    calls.some(
+      c => c.kind === 'killAgent' && c.runId === runId && c.agentId === 7,
+    ),
+  ).toBe(true)
+  // after the abort, the controller is deleted from the Map: calling killAgent on the same agent again returns false (idempotent)
+  expect(svc.killAgent(runId, 7)).toBe(false)
+  // unknown agentId / unknown runId safely return false
+  expect(svc.killAgent(runId, 999)).toBe(false)
+  expect(svc.killAgent('nope', 1)).toBe(false)
+})
+
+test('listRuns/subscribe come from store', () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  expect(svc.listRuns()).toEqual([])
+  let n = 0
+  const unsub = svc.subscribe(() => {
+    n++
+  })
+  expect(typeof unsub).toBe('function')
+  unsub()
+  expect(n).toBe(0)
+})
+
+test('listNamed delegates to namedWorkflows (empty dir → []; with files → lists)', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  // non-existent dir → []
+  const empty = await svc.listNamed(
+    join(tmpdir(), `wf-nope-${Math.random().toString(36).slice(2)}`),
+  )
+  expect(empty).toEqual([])
+  // dir with named files → lists names (extension stripped, sorted)
+  const dir = await mkdtemp(join(tmpdir(), 'wf-named-'))
+  try {
+    await writeFile(
+      join(dir, 'a.ts'),
+      'export const meta = { name: "a", description: "d" }\nreturn 1',
+    )
+    await writeFile(join(dir, 'b.js'), 'return 2')
+    const names = await svc.listNamed(dir)
+    expect(names).toEqual(['a', 'b'])
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('missing script/name/scriptPath → throws', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  await expect(svc.launch({}, stubTUC, stubCanUseTool)).rejects.toThrow(
+    /script|name|scriptPath/,
+  )
+})
+
+test('scriptPath reads file content and validates', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  const dir = await mkdtemp(join(tmpdir(), 'wf-path-'))
+  const file = join(dir, 's.ts')
+  try {
+    await writeFile(file, `return agent('from-file')`)
+    const { runId } = await svc.launch(
+      { scriptPath: file },
+      stubTUC,
+      stubCanUseTool,
+    )
+    await settle()
+    const r = svc.getRun(runId)
+    expect(r).toBeDefined()
+    expect(['completed', 'running']).toContain(r!.status)
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('parseScript validation failed → launch throws', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store } = fakePorts()
+  const svc = makeService(ports, store)
+  // trigger ScriptError: meta literal missing description (validateMeta requires both name+description to be strings)
+  await expect(
+    svc.launch(
+      { script: `export const meta = { name: "x" }\nreturn 1` },
+      stubTUC,
+      stubCanUseTool,
+    ),
+  ).rejects.toThrow(/Script validation failed/i)
+})
+
+// ---- Service-layer failure routing coverage (review gap: .then/.catch → taskRegistrar path) ----
+
+test('script run throws → service routes to taskRegistrar.fail, with error text', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, calls } = fakePorts()
+  const svc = makeService(ports, store)
+  await svc.launch(
+    { script: `throw new Error('script boom')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  await settle()
+  const fail = calls.find(c => c.kind === 'fail')
+  expect(fail).toBeDefined()
+  expect(fail?.kind === 'fail' && fail.error).toMatch(/script boom/)
+})
+
+test('adapter throws → retry still throws → degrade to dead → workflow completed (not fail)', async () => {
+  __resetWorkflowServiceForTests()
+  // new semantics: a non-abort throw from an agent is retried once; if the retry also throws, the agent degrades
+  // to dead (agent returns null) and the workflow continues to completion. The retry tolerates transient failures
+  // (429/network), while a permanently broken agent cannot bring down the whole workflow (consistent with the parallel/pipeline null-on-error contract).
+  const { ports, store, calls, adapterCallsRef } = fakePorts({
+    adapterThrow: 'adapter boom',
+  })
+  const svc = makeService(ports, store)
+  await svc.launch({ script: `return agent('x')` }, stubTUC, stubCanUseTool)
+  await settle()
+  // retry once → adapter called 2 times
+  expect(adapterCallsRef.value).toBe(2)
+  // workflow completes normally instead of failing
+  const complete = calls.find(c => c.kind === 'complete')
+  expect(complete).toBeDefined()
+  const fail = calls.find(c => c.kind === 'fail')
+  expect(fail).toBeUndefined()
+})
+
+test('script completes normally → service routes to taskRegistrar.complete', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, calls } = fakePorts()
+  const svc = makeService(ports, store)
+  await svc.launch({ script: `return agent('x')` }, stubTUC, stubCanUseTool)
+  await settle()
+  expect(calls.some(c => c.kind === 'complete')).toBe(true)
+})
+
+// ---- Fix N: shutdown cleanup ----
+
+test('shutdown kills all running runs (taskRegistrar.kill called for each)', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, killed } = fakePorts()
+  // slow the adapter down so both runs are still running when shutdown() is called
+  const slowPorts = {
+    ...ports,
+    agentAdapterRegistry: {
+      resolve: () => ({
+        id: 'claude-code',
+        capabilities: { structuredOutput: true },
+        run: async (): Promise<AgentRunResult> => {
+          await new Promise(r => setTimeout(r, 200))
+          return { kind: 'ok', output: 'slow', usage: { outputTokens: 1 } }
+        },
+      }),
+    },
+  } as unknown as typeof ports
+  const slowSvc = makeService(slowPorts, store)
+  const { runId: a } = await slowSvc.launch(
+    { script: `return agent('a')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  const { runId: b } = await slowSvc.launch(
+    { script: `return agent('b')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  killed.length = 0
+  slowSvc.shutdown()
+  expect(killed).toContain(a)
+  expect(killed).toContain(b)
+})
+
+test('shutdown does not re-kill completed runs; idempotent (multiple calls safe)', async () => {
+  __resetWorkflowServiceForTests()
+  const { ports, store, killed } = fakePorts()
+  const svc = makeService(ports, store)
+  const { runId } = await svc.launch(
+    { script: `return agent('x')` },
+    stubTUC,
+    stubCanUseTool,
+  )
+  await settle() // complete
+  killed.length = 0
+  svc.shutdown()
+  // a run that has already completed must not be killed again
+  expect(killed).not.toContain(runId)
+  // idempotent
+  expect(() => svc.shutdown()).not.toThrow()
+})
+
+// ---- Task 5: loadPersistedRuns + getRunAsync fallback ----
+// runsDirProvider is injected as makeService's fourth optional parameter with tmpdir, to avoid writing to the real project dir
+// (Bun ESM module namespace is read-only, cannot monkey-patch getRunsDir).
+
+test('loadPersistedRuns scans disk to hydrate historical runs; existing in-memory runs are not overwritten', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    // seed the disk with two historical runs
+    const { writeRunState } = await import('../persistence.js')
+    const historicalA = {
+      runId: 'hA',
+      workflowName: 'old-A',
+      status: 'completed',
+      phases: [],
+      declaredPhases: [],
+      currentPhase: null,
+      agents: [],
+      agentCount: 1,
+      returnValue: 'a',
+      startedAt: 10,
+      updatedAt: 20,
+    } as RunProgress
+    const historicalB = {
+      runId: 'hB',
+      workflowName: 'old-B',
+      status: 'failed',
+      phases: [],
+      declaredPhases: [],
+      currentPhase: null,
+      agents: [],
+      agentCount: 2,
+      error: 'x',
+      startedAt: 30,
+      updatedAt: 40,
+    } as RunProgress
+    await writeRunState(dir, historicalA)
+    await writeRunState(dir, historicalB)
+
+    const { ports, store } = fakePorts()
+    // seed memory with one current-session run (ports.progressEmitter.emit → bus → store)
+    ports.progressEmitter.emit({
+      type: 'run_started',
+      runId: 'live',
+      workflowName: 'live-w',
+      meta: null,
+    })
+    const svc = makeService(ports, store, undefined, () => dir)
+
+    await svc.loadPersistedRuns()
+
+    const ids = svc.listRuns().map(r => r.runId)
+    expect(ids).toContain('hA')
+    expect(ids).toContain('hB')
+    expect(ids).toContain('live')
+    // memory first: live is still running (not overwritten by disk; disk has no live so no STALE injected)
+    expect(svc.getRun('live')!.status).toBe('running')
+    expect(svc.getRun('hA')!.returnValue).toBe('a')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('loadPersistedRuns repeated calls scan disk only once (persistedLoaded flag)', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    const { ports, store } = fakePorts()
+    const svc = makeService(ports, store, undefined, () => dir)
+
+    await svc.loadPersistedRuns()
+    await svc.loadPersistedRuns()
+    await svc.loadPersistedRuns()
+
+    // repeated calls do not throw, do not change listRuns result (empty dir)
+    expect(svc.listRuns()).toEqual([])
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('getRunAsync memory hit → no disk read', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    const { ports, store } = fakePorts()
+    const svc = makeService(ports, store, undefined, () => dir)
+    ports.progressEmitter.emit({
+      type: 'run_started',
+      runId: 'live',
+      workflowName: 'w',
+      meta: null,
+    })
+
+    const got = await svc.getRunAsync('live')
+    expect(got?.runId).toBe('live')
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('getRunAsync memory miss + disk hit → returns disk value, and does not inject into memory (subsequent get still reads disk)', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    const { writeRunState } = await import('../persistence.js')
+    const historical = {
+      runId: 'hist-only',
+      workflowName: 'old',
+      status: 'completed',
+      phases: [],
+      declaredPhases: [],
+      currentPhase: null,
+      agents: [],
+      agentCount: 0,
+      returnValue: { x: 1 },
+      startedAt: 1,
+      updatedAt: 2,
+    } as RunProgress
+    await writeRunState(dir, historical)
+
+    const { ports, store } = fakePorts()
+    const svc = makeService(ports, store, undefined, () => dir)
+
+    const got = await svc.getRunAsync('hist-only')
+    expect(got?.returnValue).toEqual({ x: 1 })
+    // not injected into memory: the in-memory list does not contain it (not hydrated)
+    expect(svc.listRuns().map(r => r.runId)).not.toContain('hist-only')
+    // a subsequent get still returns it (each call goes through the readRunState fallback)
+    const got2 = await svc.getRunAsync('hist-only')
+    expect(got2?.returnValue).toEqual({ x: 1 })
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
+
+test('getRunAsync memory miss + disk miss → undefined', async () => {
+  __resetWorkflowServiceForTests()
+  const dir = await mkdtemp(join(tmpdir(), 'wf-svc-'))
+  try {
+    const { ports, store } = fakePorts()
+    const svc = makeService(ports, store, undefined, () => dir)
+
+    const got = await svc.getRunAsync('no-such-run')
+    expect(got).toBeUndefined()
+  } finally {
+    await rm(dir, { recursive: true, force: true })
+  }
+})
--- a/src/workflow/tests/status.test.ts
+++ b/src/workflow/tests/status.test.ts
@@ -0,0 +1,88 @@
+import { expect, test } from 'bun:test'
+import type { AgentProgress, RunProgress } from '../progress/store.js'
+import {
+  STATUS_DOT,
+  RUN_STATUS_COLOR,
+  RUN_STATUS_TEXT,
+  PHASE_MARK,
+  PHASE_COLOR,
+  agentVisual,
+  formatTokenCount,
+  agentMetaText,
+} from '../panel/status.js'
+
+test('STATUS_DOT / RUN_STATUS_COLOR / RUN_STATUS_TEXT cover four run states', () => {
+  const statuses: RunProgress['status'][] = [
+    'running',
+    'completed',
+    'failed',
+    'killed',
+  ]
+  for (const s of statuses) {
+    expect(STATUS_DOT[s].length).toBeGreaterThan(0)
+    expect(RUN_STATUS_COLOR[s]).toBeTruthy()
+    expect(RUN_STATUS_TEXT[s].length).toBeGreaterThan(0)
+  }
+  expect(STATUS_DOT.running).toBe('●')
+  expect(STATUS_DOT.completed).toBe('✓')
+  expect(STATUS_DOT.failed).toBe('✗')
+  expect(STATUS_DOT.killed).toBe('■')
+  expect(RUN_STATUS_TEXT.completed).toBe('done')
+  expect(RUN_STATUS_TEXT.running).toBe('running')
+})
+
+test('PHASE_MARK / PHASE_COLOR cover running/done/pending', () => {
+  expect(PHASE_MARK.running).toBe('●')
+  expect(PHASE_MARK.done).toBe('✓')
+  expect(PHASE_MARK.pending).toBe('○')
+  expect(PHASE_COLOR.pending).toBe('subtle')
+})
+
+test('agentVisual: running → ● warning', () => {
+  const a: AgentProgress = { id: 1, status: 'running' }
+  expect(agentVisual(a)).toEqual({ mark: '●', color: 'warning' })
+})
+
+test('agentVisual: done·ok → ✓ success (no longer carries outputShape suffix)', () => {
+  const a: AgentProgress = {
+    id: 1,
+    status: 'done',
+    resultKind: 'ok',
+    outputShape: 'object',
+  }
+  expect(agentVisual(a)).toEqual({ mark: '✓', color: 'success' })
+})
+
+test('agentVisual: dead → ✗ error', () => {
+  const a: AgentProgress = { id: 1, status: 'done', resultKind: 'dead' }
+  expect(agentVisual(a)).toEqual({ mark: '✗', color: 'error' })
+})
+
+test('formatTokenCount: <1000 original value, ≥1000 keeps 1 decimal + k', () => {
+  expect(formatTokenCount(undefined)).toBe('0')
+  expect(formatTokenCount(0)).toBe('0')
+  expect(formatTokenCount(42)).toBe('42')
+  expect(formatTokenCount(1000)).toBe('1.0k')
+  expect(formatTokenCount(22900)).toBe('22.9k')
+})
+
+test('agentMetaText: model · Nk tok · N tool', () => {
+  const a: AgentProgress = {
+    id: 1,
+    status: 'done',
+    model: 'glm-5.2',
+    tokenCount: 22900,
+    toolCount: 1,
+  }
+  expect(agentMetaText(a)).toBe('glm-5.2 · 22.9k tok · 1 tool')
+})
+
+test('agentMetaText: omits prefix when no model', () => {
+  const a: AgentProgress = {
+    id: 1,
+    status: 'running',
+    tokenCount: 500,
+    toolCount: 2,
+  }
+  expect(agentMetaText(a)).toBe('500 tok · 2 tool')
+})
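The formatting rules pinned down by status.test.ts can be sketched as below. This is a minimal illustration that satisfies the assertions above, not the actual contents of `../panel/status.js`; the `AgentMeta` type is a hypothetical stand-in for the fields of `AgentProgress` that the tests exercise.

```typescript
// Hypothetical subset of AgentProgress used by the meta line (assumption, not the real type).
type AgentMeta = { model?: string; tokenCount?: number; toolCount?: number }

// Below 1000 the raw value is shown; from 1000 up, one decimal place plus a "k" suffix.
function formatTokenCount(n: number | undefined): string {
  const v = n ?? 0
  return v < 1000 ? String(v) : `${(v / 1000).toFixed(1)}k`
}

// "model · Nk tok · N tool"; the model prefix is omitted when no model is known.
function agentMetaText(a: AgentMeta): string {
  const parts = [
    `${formatTokenCount(a.tokenCount)} tok`,
    `${a.toolCount ?? 0} tool`,
  ]
  if (a.model) parts.unshift(a.model)
  return parts.join(' · ')
}
```

Treating `undefined` as 0 keeps the panel from rendering an empty token column for agents that have not reported usage yet, which is the behavior the first two assertions of the formatTokenCount test require.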
--- a/src/workflow/tests/useWorkflowKeyboard.test.ts
+++ b/src/workflow/tests/useWorkflowKeyboard.test.ts
@@ -0,0 +1,45 @@
+import { expect, test } from 'bun:test'
+import { routeWorkflowKey } from '../panel/useWorkflowKeyboard.js'
+
+test('Tab → nextTab; Shift+Tab → prevTab', () => {
+  expect(routeWorkflowKey('', { tab: true })).toBe('nextTab')
+  expect(routeWorkflowKey('', { tab: true, shift: true })).toBe('prevTab')
+})
+
+test('q / Esc → quit', () => {
+  expect(routeWorkflowKey('q', {})).toBe('quit')
+  expect(routeWorkflowKey('', { escape: true })).toBe('quit')
+})
+
+test('x → killAgent; K → killWorkflow; r → resume; n → newRun', () => {
+  expect(routeWorkflowKey('x', {})).toBe('killAgent')
+  expect(routeWorkflowKey('K', {})).toBe('killWorkflow')
+  expect(routeWorkflowKey('r', {})).toBe('resume')
+  expect(routeWorkflowKey('n', {})).toBe('newRun')
+})
+
+test('confirm mode: y/Enter → confirmYes; n/Esc/q → confirmNo; other keys → null', () => {
+  expect(routeWorkflowKey('y', {}, 'confirm')).toBe('confirmYes')
+  expect(routeWorkflowKey('Y', {}, 'confirm')).toBe('confirmYes')
+  expect(routeWorkflowKey('', { return: true }, 'confirm')).toBe('confirmYes')
+  expect(routeWorkflowKey('n', {}, 'confirm')).toBe('confirmNo')
+  expect(routeWorkflowKey('N', {}, 'confirm')).toBe('confirmNo')
+  expect(routeWorkflowKey('', { escape: true }, 'confirm')).toBe('confirmNo')
+  expect(routeWorkflowKey('q', {}, 'confirm')).toBe('confirmNo')
+  // confirm mode swallows navigation/edit keys, preventing accidental triggers
+  expect(routeWorkflowKey('x', {}, 'confirm')).toBeNull()
+  expect(routeWorkflowKey('', { tab: true }, 'confirm')).toBeNull()
+  expect(routeWorkflowKey('', { upArrow: true }, 'confirm')).toBeNull()
+})
+
+test('←/→ switch focus column; ↑/↓ move within column', () => {
+  expect(routeWorkflowKey('', { leftArrow: true })).toBe('focusLeft')
+  expect(routeWorkflowKey('', { rightArrow: true })).toBe('focusRight')
+  expect(routeWorkflowKey('', { upArrow: true })).toBe('moveUp')
+  expect(routeWorkflowKey('', { downArrow: true })).toBe('moveDown')
+})
+
+test('unrelated input → null', () => {
+  expect(routeWorkflowKey('z', {})).toBeNull()
+  expect(routeWorkflowKey('', {})).toBeNull()
+})
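A pure key router that would satisfy the tests above might look like the following sketch. The real `routeWorkflowKey` lives in `../panel/useWorkflowKeyboard.js` and is not shown in this part of the diff, so the `KeyFlags` and `WorkflowAction` types and the `'normal'` default mode name are assumptions made for illustration.

```typescript
// Assumed shape of the key flags passed by the input hook (hypothetical).
type KeyFlags = {
  tab?: boolean
  shift?: boolean
  escape?: boolean
  return?: boolean
  leftArrow?: boolean
  rightArrow?: boolean
  upArrow?: boolean
  downArrow?: boolean
}

type WorkflowAction =
  | 'nextTab' | 'prevTab' | 'quit'
  | 'killAgent' | 'killWorkflow' | 'resume' | 'newRun'
  | 'confirmYes' | 'confirmNo'
  | 'focusLeft' | 'focusRight' | 'moveUp' | 'moveDown'

function routeWorkflowKey(
  input: string,
  key: KeyFlags,
  mode: 'normal' | 'confirm' = 'normal',
): WorkflowAction | null {
  // Confirm mode is checked first so that it swallows every other key.
  if (mode === 'confirm') {
    if (input === 'y' || input === 'Y' || key.return) return 'confirmYes'
    if (input === 'n' || input === 'N' || input === 'q' || key.escape) return 'confirmNo'
    return null
  }
  if (key.tab) return key.shift ? 'prevTab' : 'nextTab'
  if (input === 'q' || key.escape) return 'quit'
  if (key.leftArrow) return 'focusLeft'
  if (key.rightArrow) return 'focusRight'
  if (key.upArrow) return 'moveUp'
  if (key.downArrow) return 'moveDown'
  switch (input) {
    case 'x': return 'killAgent'
    case 'K': return 'killWorkflow'
    case 'r': return 'resume'
    case 'n': return 'newRun'
    default: return null
  }
}
```

Keeping the router a pure function of `(input, key, mode)` is what lets the tests call it directly without rendering the panel.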
--- a/src/workflow/backends/claudeCodeBackend.ts
+++ b/src/workflow/backends/claudeCodeBackend.ts
@@ -0,0 +1,409 @@
+// Deeply-integrated backend: parses agent/model/tools from the live session, delegates to the core runAgent.
+// Implements the AgentAdapter interface, registered and routed by the registry (U5).
+import {
+  type AgentAdapter,
+  type AgentAdapterContext,
+  type AgentRunParams,
+  type AgentRunResult,
+  WorkflowAbortedError,
+} from '@claude-code-best/workflow-engine'
+import { assembleToolPool } from '../../tools.js'
+import { finalizeAgentTool } from '@claude-code-best/builtin-tools/tools/AgentTool/agentToolUtils.js'
+import { runAgent } from '@claude-code-best/builtin-tools/tools/AgentTool/runAgent.js'
+import {
+  isBuiltInAgent,
+  type AgentDefinition,
+  type BuiltInAgentDefinition,
+} from '@claude-code-best/builtin-tools/tools/AgentTool/loadAgentsDir.js'
+import { createUserMessage, extractTextContent } from '../../utils/messages.js'
+import { getTokenCountFromUsage } from '../../utils/tokens.js'
+import { createHash } from 'node:crypto'
+import { createAgentId } from '../../utils/uuid.js'
+import { logForDebugging } from '../../utils/debug.js'
+import { runWithCwdOverride } from '../../utils/cwd.js'
+import {
+  createAgentWorktree,
+  hasWorktreeChanges,
+  removeAgentWorktree,
+} from '../../utils/worktree.js'
+import { logEvent } from '../../services/analytics/index.js'
+import type { ModelAlias } from '../../utils/model/aliases.js'
+import type { Message } from '../../types/message.js'
+import type { ToolUseContext } from '../../Tool.js'
+import { readHostBundle } from '../hostHandle.js'
+
+/** Fallback definition for workflow subagents (used when agentType does not match a real registry entry). */
+export const WORKFLOW_AGENT: BuiltInAgentDefinition = {
+  agentType: 'workflow-worker',
+  whenToUse: 'subtask dispatched by the agent() hook inside a workflow script',
+  tools: ['*'],
+  source: 'built-in',
+  baseDir: 'built-in',
+  getSystemPrompt: () =>
+    'You are a workflow sub-agent. Complete the task concisely; your final text is the return value relayed to the workflow.',
+}
+
+/** Resolve agentType against the real agent registry (use the activeAgents match if there is one, otherwise fall back to WORKFLOW_AGENT). Exported for unit test coverage. */
+export function resolveAgentDefinition(
+  agentType: string | undefined,
+  toolUseContext: ToolUseContext,
+): AgentDefinition {
+  if (!agentType) return WORKFLOW_AGENT
+  const found = toolUseContext.options.agentDefinitions.activeAgents.find(
+    a => a.agentType === agentType,
+  )
+  return found ?? WORKFLOW_AGENT
+}
+
+/** model alias -> the actual model id of the current provider. v1 passes it through directly (keeps a mapping extension point). Exported for unit test coverage. */
+export function mapWorkflowModel(
+  model: string | undefined,
+): string | undefined {
+  return model
+}
+
+/**
+ * Extract the JSON object produced under schema mode from the agent's final message; returns null on failure. Exported for unit test coverage.
+ *
+ * Robustness strategy (in priority order, returns the first that successfully parses):
+ * 1. fenced code block (```json ... ``` or ``` ... ```) - agents often spontaneously add fences
+ * 2. the first "brace-balanced" {...} fragment in the bare text - handles preceding/trailing narration / multi-segment output
+ *
+ * Uses a brace-stack scan instead of `indexOf('{')..lastIndexOf('}')`: correctly handles nested objects,
+ * `{}` inside string literals, and escape characters. Will not concatenate multiple unrelated JSON fragments (the original version did).
+ *
+ * Does not do syntax repair (trailing commas, single quotes -> double quotes, comment removal) - agents do not produce non-standard JSON,
+ * and repair could corrupt content inside strings (e.g. `"http://..."` getting eaten by a // comment regex).
+ * On parse failure it simply moves on to the next candidate.
+ *
+ * Only returns a plain object (typeof === 'object' && !null && !Array);
+ * the schema mode contract is object, array/number/string are all treated as the agent going off-track.
+ */
+export function extractStructuredOutput(
+  content: Array<{ type: string; text?: string }>,
+): unknown | null {
+  for (const block of content) {
+    if (block.type !== 'text' || !block.text) continue
+    const found = findFirstJsonObject(block.text)
+    if (found !== null) return found
+  }
+  return null
+}
+
+/** Find the first JSON fragment in text that can be parsed as a plain object. */
+function findFirstJsonObject(text: string): unknown | null {
+  // 1. fenced code blocks - priority (agents naturally tend to add them; strip the fence and parse the whole block)
+  for (const m of text.matchAll(
+    /```[\t ]*[a-zA-Z0-9_-]*\s*\n([\s\S]*?)\n?```/g,
+  )) {
+    const parsed = tryParseObject(m[1] ?? '')
+    if (parsed !== null) return parsed
+  }
+  // 2. bare text: scan each '{', find a balanced pair and try parse
+  for (let i = 0; i < text.length; i++) {
+    if (text[i] !== '{') continue
+    const end = findBalancedObjectEnd(text, i)
+    if (end < 0) continue
+    const parsed = tryParseObject(text.slice(i, end + 1))
+    if (parsed !== null) return parsed
+  }
+  return null
+}
+
+/**
+ * Find the matching `}` index starting from start (which must be `{`); returns -1 when unbalanced.
+ * Skips braces inside string literals and escape characters. Does not skip comments (the JSON standard does not allow comments and
+ * agents do not produce them; skipping them would be risky - see the extractStructuredOutput doc).
+ */
+function findBalancedObjectEnd(text: string, start: number): number {
+  let depth = 0
+  let inString = false
+  for (let i = start; i < text.length; i++) {
+    const c = text[i]
+    if (inString) {
+      if (c === '\\')
+        i++ // skip the escape char and the next character
+      else if (c === '"') inString = false
+      continue
+    }
+    if (c === '"') inString = true
+    else if (c === '{') depth++
+    else if (c === '}') {
+      depth--
+      if (depth === 0) return i
+    }
+  }
+  return -1
+}
+
+/** Try to parse the candidate; returns it only if it is a plain object, anything else (array/number/null) yields null. */
+function tryParseObject(candidate: string): unknown | null {
+  const trimmed = candidate.trim()
+  if (!trimmed.startsWith('{') || !trimmed.endsWith('}')) return null
+  try {
+    const v = JSON.parse(trimmed)
+    return typeof v === 'object' && v !== null && !Array.isArray(v) ? v : null
+  } catch {
+    return null
+  }
+}
+
+type WorkflowWorktreeInfo = Awaited<ReturnType<typeof createAgentWorktree>>
+
+/**
+ * Generate a slug for the worktree isolation of a workflow agent: derive hex segments from sha256(runId:agentId),
+ * matching the cleanup regex of cleanupStaleAgentWorktrees `^wf_[0-9a-f]{8}-[0-9a-f]{3}-\d+$`.
+ * taskId is `w`+base36 (not a UUID), so runId cannot be placed directly into the regex segment; sha256 is a deterministic mapping,
+ * and agentId ensures slug uniqueness for multiple agents under the same runId (no shared counter, no thread safety issues).
+ */
+function makeWorkflowWorktreeSlug(runId: string, agentId: string): string {
+  const h = createHash('sha256').update(`${runId}:${agentId}`).digest('hex')
+  return `wf_${h.slice(0, 8)}-${h.slice(8, 11)}-${parseInt(h.slice(11, 17), 16) % 100000}`
+}
+
+/**
+ * Clean up the worktree after the agent finishes: a hookBased worktree is kept (VCS changes cannot be detected); otherwise
+ * hasWorktreeChanges (fail-closed) decides: the worktree is removed when unchanged, and kept on change or detection failure,
+ * with its path logged (v1 uses logs rather than extending AgentRunResult, to avoid touching journal serialization).
+ */
+async function cleanupWorkflowWorktree(
+  info: WorkflowWorktreeInfo,
+  agentType: string,
+): Promise<void> {
+  if (info.hookBased || !info.headCommit) return
+  let changed = true
+  try {
+    changed = await hasWorktreeChanges(info.worktreePath, info.headCommit)
+  } catch (e) {
+    logForDebugging(
+      `workflow worktree change-detect failed (${agentType}): ${(e as Error).message}`,
+    )
+    changed = true
+  }
+  if (!changed) {
+    try {
+      await removeAgentWorktree(
+        info.worktreePath,
+        info.worktreeBranch,
+        info.gitRoot,
+      )
+    } catch (e) {
+      logForDebugging(
+        `workflow worktree remove failed (${agentType}): ${(e as Error).message}`,
+      )
+    }
+  } else {
+    logForDebugging(
+      `workflow worktree retained (has changes, ${agentType}): ${info.worktreePath}`,
+    )
+  }
+}
+
+/** Deeply-integrated backend: parses agent/model/tools from the live session, delegates to the core runAgent. */
+export const claudeCodeBackend: AgentAdapter = {
+  id: 'claude-code',
+  capabilities: { structuredOutput: true, tools: true },
+
+  async run(
+    params: AgentRunParams,
+    ctx: AgentAdapterContext,
+  ): Promise<AgentRunResult> {
+    const { toolUseContext, canUseTool } = readHostBundle(ctx.host)
+    const appState = toolUseContext.getAppState()
+    const agentDef = resolveAgentDefinition(params.agentType, toolUseContext)
+    const model = mapWorkflowModel(params.model)
+    // coreAgentId: the tracking ID for the core-layer subagent (a string, used inside runAgent).
+    // Different from ctx.agentId (the engine's number seq, used for panel / killAgent routing) - two distinct concepts, must not be mixed up.
+    const coreAgentId = createAgentId()
+
+    // isolation:'worktree' - run the agent inside an independent git worktree, so concurrent writes do not conflict.
+    let worktreeInfo: WorkflowWorktreeInfo | null = null
+    if (params.isolation === 'worktree') {
+      try {
+        worktreeInfo = await createAgentWorktree(
+          makeWorkflowWorktreeSlug(ctx.runId, coreAgentId),
+        )
+      } catch (e) {
+        // fail-closed: when isolation fails, do not silently fall back to a shared cwd (otherwise concurrent writes would race)
+        const detail = (e as Error).message
+        logForDebugging(
+          `workflow worktree creation failed (${agentDef.agentType}): ${detail}`,
+        )
+        return { kind: 'dead', reason: 'worktree-failed', detail }
+      }
+    }
+    // runWithCwdOverride makes tools such as Bash/Read inside the agent see the worktree path
+    // (AsyncLocalStorage is preserved across awaits); the worktreePath parameter of runAgent only writes metadata.
+    const runInCwd = worktreeInfo
+      ? <T>(fn: () => T): T =>
+          runWithCwdOverride(worktreeInfo!.worktreePath, fn)
+      : <T>(fn: () => T): T => fn()
+
+    // Bridge ctx.signal -> runAgent.override.abortController. Without the bridge, runAgent is unaware when the workflow is killed
+    // (root cause of 'x' being ineffective): the abort signal never reaches the internal fetch, and the agent runs to completion.
+    // Single-agent kill goes through service.kill(runId, agentId) -> ports.taskRegistrar.killAgent ->
+    // agentAbortControllers.get(agentId).abort(); the same controller serves both paths.
+    const agentAbort = new AbortController()
+    const onParentAbort = (): void => agentAbort.abort()
+    if (ctx.signal.aborted) {
+      agentAbort.abort()
+    } else {
+      ctx.signal.addEventListener('abort', onParentAbort, { once: true })
+    }
+    if (typeof ctx.registerAgentAbort === 'function') {
+      ctx.registerAgentAbort(ctx.agentId, agentAbort)
+    }
+
+    const workerPermissionContext = {
+      ...appState.toolPermissionContext,
+      mode: agentDef.permissionMode ?? 'acceptEdits',
+    }
+    const workerTools = assembleToolPool(
+      workerPermissionContext,
+      appState.mcp.tools,
+    )
+
+    // schema -> instruct the agent to emit JSON directly in the final text block.
+    // It must not be asked to call the StructuredOutput tool, which is not in the workflow subagent's tool set (only
+    // the stop_hook path explicitly injects it; workflow goes through assembleToolPool, whose default pool does not include it).
+    // Historically the prompt required "call StructuredOutput tool", causing 8/12 agents to refuse to wrap up or struggle to call it;
+    // empirically the main cause of dead results was the unreachable tool rather than "forgetting". The contract is now raw JSON text;
+    // extractStructuredOutput tolerates code fences, preceding/trailing narration, and multiple segments.
+    const promptText = params.schema
+      ? [
+          params.prompt,
+          '',
+          'After completing the task, emit your final answer as a single JSON object matching this JSON Schema:',
+          '```json',
+          JSON.stringify(params.schema, null, 2),
+          '```',
+          '',
+          'CRITICAL RULES:',
+          '- The JSON object must be the LAST text block in your response. Do not write any prose after it.',
+          '- Emit the JSON as plain text (markdown code fences optional).',
+          '- Do NOT call any "StructuredOutput" or "SyntheticOutput" tool — it is not available in this environment.',
+          '- Your turn must end with the JSON object. Anything after it (prose, tool calls) will be ignored or cause your answer to be discarded.',
+        ].join('\n')
+      : params.prompt
+
+    const promptMessages = [createUserMessage({ content: promptText })]
+    const messages: Message[] = []
+    const startTime = Date.now()
+    // Accumulate running progress (onProgress push -> agent_progress event -> panel refreshes token/tool in real time).
+    let tokenCount = 0
+    let toolCount = 0
+
+    try {
+      await runInCwd(async () => {
+        for await (const msg of runAgent({
+          agentDefinition: agentDef,
+          promptMessages,
+          toolUseContext,
+          canUseTool,
+          isAsync: true,
+          querySource: toolUseContext.options.querySource ?? 'workflow',
+          availableTools: workerTools,
+          // override the same object: coreAgentId (core subagent tracking) + abortController (kill bridge).
+          // runAgent's model is the top-level ModelAlias; workflow's model is an arbitrary alias string,
+          // the types are incompatible and resolved by the provider layer at runtime. Passes through via double assertion (better than as any/never).
+          override: { agentId: coreAgentId, abortController: agentAbort },
+          ...(model ? { model: model as unknown as ModelAlias } : {}),
+          ...(worktreeInfo ? { worktreePath: worktreeInfo.worktreePath } : {}),
+        })) {
+          messages.push(msg as Message)
+          // Accumulate running progress: assistant message carries usage (cumulative value -> overwrite), tool_use inside content (incremental).
+          if (msg.type === 'assistant' && msg.message) {
+            const usage = msg.message.usage as
+              | Parameters<typeof getTokenCountFromUsage>[0]
+              | undefined
+            if (usage) tokenCount = getTokenCountFromUsage(usage)
+            const content = msg.message.content as
+              | Array<{ type: string }>
+              | undefined
+            if (content)
+              toolCount += content.filter(b => b.type === 'tool_use').length
+          }
+          ctx.onProgress?.({ tokenCount, toolCount })
+        }
+      })
+    } catch (e) {
+      // abort (kill workflow / kill agent): once detected, it must be rethrown as WorkflowAbortedError;
+      // otherwise hooks.agent would swallow the abort as an ordinary failure (dead), and the workflow would never learn it was killed
+      // (the other half of the ineffective 'x' kill path: the signal did arrive, but the result was disguised as a normal completion).
+      if (agentAbort.signal.aborted || (e as Error)?.name === 'AbortError') {
+        throw new WorkflowAbortedError()
+      }
+      const detail = (e as Error).message
+      logForDebugging(
+        `workflow sub-agent error (${agentDef.agentType}): ${detail}`,
+      )
+      logEvent('tengu_workflow_agent', { ok: 0 })
+      return { kind: 'dead', reason: 'runagent-threw', detail }
+    } finally {
+      // cleanup (idempotent): listener removeEventListener / Map.delete are safe to call repeatedly.
+      if (typeof ctx.unregisterAgentAbort === 'function') {
+        ctx.unregisterAgentAbort(ctx.agentId)
+      }
+      ctx.signal.removeEventListener('abort', onParentAbort)
+      if (worktreeInfo) {
+        const info = worktreeInfo
+        worktreeInfo = null
+        await cleanupWorkflowWorktree(info, agentDef.agentType)
+      }
+    }
+
+    const finalized = finalizeAgentTool(messages, coreAgentId, {
+      prompt: params.prompt,
+      resolvedAgentModel: toolUseContext.options.mainLoopModel,
+      isBuiltInAgent: isBuiltInAgent(agentDef),
+      startTime,
+      agentType: agentDef.agentType,
+      isAsync: true,
+    })
+    const outputTokens =
+      finalized.usage?.output_tokens ?? finalized.totalTokens ?? 0
+    // For panel display: total context tokens, tool-call count, and the resolved model id at completion.
+    const finalTokenCount = finalized.totalTokens ?? 0
+    const finalToolCount = finalized.totalToolUseCount ?? 0
+    const resolvedModel = model ?? toolUseContext.options.mainLoopModel
+    logEvent('tengu_workflow_agent', { ok: 1, outputTokens })
+
+    if (params.schema) {
+      const structured = extractStructuredOutput(finalized.content)
+      if (structured === null) {
+        // The agent finished all tool calls but no plain-object JSON was found in the final text block.
+        // Typical scenarios: forgot to emit JSON after a long tool chain, unbalanced JSON nesting, parse failure.
+        // Put a preview of the last text into detail so the hooks retry log and the panel can immediately see what the agent actually said.
+        const preview = extractTextContent(finalized.content, '\n').slice(
+          0,
+          200,
+        )
+        logForDebugging(
+          `workflow sub-agent produced no JSON object (${agentDef.agentType}); preview: ${preview}`,
+        )
+        return {
+          kind: 'dead',
+          reason: 'no-structured-output',
+          detail: preview,
+        }
+      }
+      return {
+        kind: 'ok',
+        output: structured as object,
+        usage: { outputTokens },
+        model: resolvedModel,
+        toolCount: finalToolCount,
+        tokenCount: finalTokenCount,
+      }
+    }
+    const text = extractTextContent(finalized.content, '\n')
+    return {
+      kind: 'ok',
+      output: text,
+      usage: { outputTokens },
+      model: resolvedModel,
+      toolCount: finalToolCount,
+      tokenCount: finalTokenCount,
+    }
+  },
+}
--- a/src/workflow/hostHandle.ts
+++ b/src/workflow/hostHandle.ts
@@ -0,0 +1,42 @@
+import {
+  createHostHandle,
+  unwrapHostHandle,
+  type HostHandle,
+} from '@claude-code-best/workflow-engine'
+import type { CanUseToolFn } from '../hooks/useCanUseTool.js'
+import type { AssistantMessage } from '../types/message.js'
+import type { AgentId } from '../types/ids.js'
+import type { ToolUseContext } from '../Tool.js'
+
+/** Opaque bundle held inside HostHandle (unpacked on the core side). */
+export type WorkflowHostBundle = {
+  toolUseContext: ToolUseContext
+  canUseTool: CanUseToolFn
+  parentMessage?: AssistantMessage
+  agentId?: AgentId
+}
+
+/**
+ * Shared: builds the host bundle from toolUseContext/canUseTool.
+ * parentMessage is optional: it is absent on the panel launch path, and claudeCodeBackend never reads it.
+ */
+export function buildHostBundle(
+  toolUseContext: WorkflowHostBundle['toolUseContext'],
+  canUseTool: WorkflowHostBundle['canUseTool'],
+  parentMessage?: AssistantMessage,
+): WorkflowHostBundle {
+  return {
+    toolUseContext,
+    canUseTool,
+    ...(parentMessage !== undefined ? { parentMessage } : {}),
+    agentId: toolUseContext.agentId,
+  }
+}
+
+export function makeHostHandle(bundle: WorkflowHostBundle): HostHandle {
+  return createHostHandle(bundle)
+}
+
+export function readHostBundle(handle: HostHandle): WorkflowHostBundle {
+  return unwrapHostHandle(handle) as WorkflowHostBundle
+}
--- a/src/workflow/namedWorkflowCommands.ts
+++ b/src/workflow/namedWorkflowCommands.ts
@@ -0,0 +1,34 @@
+import { join } from 'node:path'
+import {
+  listNamedWorkflows,
+  WORKFLOW_DIR_NAME,
+} from '@claude-code-best/workflow-engine'
+import type { Command } from '../types/command.js'
+import { getProjectRoot } from '../bootstrap/state.js'
+
+/** Scan *.ts|*.js|*.mjs under .claude/workflows/ and generate a /<name> command for each. */
+export async function getWorkflowCommands(
+  cwd: string = getProjectRoot(),
+): Promise<Command[]> {
+  const dir = join(cwd, WORKFLOW_DIR_NAME)
+  const names = await listNamedWorkflows(dir)
+  return names.map(name => ({
+    type: 'prompt',
+    name,
+    description: `Run workflow: ${name}`,
+    kind: 'workflow',
+    source: 'builtin',
+    progressMessage: `Running workflow ${name}...`,
+    contentLength: 0,
+    async getPromptForCommand(args, _context) {
+      const argText =
+        typeof args === 'string' && args ? `\n\nArguments: ${args}` : ''
+      return [
+        {
+          type: 'text',
+          text: `Run the "${name}" workflow now by calling the Workflow tool with name="${name}".${argText}`,
+        },
+      ]
+    },
+  }))
+}
--- a/src/workflow/notifications.ts
+++ b/src/workflow/notifications.ts
@@ -0,0 +1,88 @@
+/**
+ * Bridge for workflow status-change notifications.
+ *
+ * The engine emits events via progressEmitter.emit({ type: 'run_done', ... }),
+ * and the progress/store reducer records the status into RunProgress. But the
+ * old implementation had no code bridging status transitions to the host
+ * notification mechanism — the "notifies automatically on completion" promise
+ * in WorkflowTool's return text went unfulfilled.
+ *
+ * This module subscribes to WorkflowService.subscribe, watches status transitions
+ * from running → completed/failed/killed, and emits a host notification via the
+ * injected notifier callback (by default, enqueuePendingNotification in task-notification mode).
+ */
+import {
+  STATUS_TAG,
+  SUMMARY_TAG,
+  TASK_ID_TAG,
+  TASK_NOTIFICATION_TAG,
+  TASK_TYPE_TAG,
+} from '../constants/xml.js'
+import { enqueuePendingNotification } from '../utils/messageQueueManager.js'
+import type { RunProgress } from './progress/store.js'
+import type { WorkflowService } from './service.js'
+
+const WORKFLOW_TASK_TYPE = 'local_workflow'
+
+/** Notifier abstraction (lets tests inject a spy). */
+export type WorkflowNotifier = (message: string) => void
+
+const TERMINAL_STATUSES: ReadonlySet<RunProgress['status']> = new Set([
+  'completed',
+  'failed',
+  'killed',
+])
+
+/** Default notifier: uses the host message queue's task-notification mode. */
+const defaultNotifier: WorkflowNotifier = message => {
+  enqueuePendingNotification({ value: message, mode: 'task-notification' })
+}
+
+export function installWorkflowNotifications(
+  service: WorkflowService,
+  notify: WorkflowNotifier = defaultNotifier,
+): () => void {
+  const prevStatus = new Map<string, RunProgress['status'] | undefined>()
+
+  const unsubscribe = service.subscribe(() => {
+    const runs = service.listRuns()
+    for (const run of runs) {
+      const prev = prevStatus.get(run.runId)
+      // First time seeing this run: just record the current status without notifying
+      // (avoids treating existing historical runs as new notifications on install)
+      if (prev === undefined) {
+        prevStatus.set(run.runId, run.status)
+        continue
+      }
+      // Status changed + entered terminal state → emit notification
+      if (prev !== run.status && TERMINAL_STATUSES.has(run.status)) {
+        notify(buildMessage(run))
+      }
+      prevStatus.set(run.runId, run.status)
+    }
+  })
+
+  return () => {
+    unsubscribe()
+    prevStatus.clear()
+  }
+}
+
+function buildMessage(run: RunProgress): string {
+  const statusText =
+    run.status === 'completed'
+      ? 'completed successfully'
+      : run.status === 'failed'
+        ? 'failed'
+        : 'was stopped'
+  const errorSuffix =
+    run.status === 'failed' && run.error ? `: ${run.error}` : ''
+  const summary = `Workflow "${run.workflowName}" ${statusText}${errorSuffix}`
+
+  return `<${TASK_NOTIFICATION_TAG}>
+<${TASK_ID_TAG}>${run.runId}</${TASK_ID_TAG}>
+<${TASK_TYPE_TAG}>${WORKFLOW_TASK_TYPE}</${TASK_TYPE_TAG}>
+<${STATUS_TAG}>${run.status}</${STATUS_TAG}>
+<${SUMMARY_TAG}>${summary}</${SUMMARY_TAG}>
+</${TASK_NOTIFICATION_TAG}>`
+}
--- a/src/workflow/panel/AgentList.tsx
+++ b/src/workflow/panel/AgentList.tsx
@@ -0,0 +1,71 @@
+import React from 'react';
+import { Box, Text, useAnimationFrame } from '@anthropic/ink';
+import type { Theme } from '@anthropic/ink';
+import type { AgentProgress } from '../progress/store.js';
+import { agentMetaText, agentVisual } from './status.js';
+
+const SPINNER_FRAMES = ['·', '✢', '✱', '✶', '✻', '✽'];
+const FRAME_MS = 120;
+const LABEL_MAX = 18;
+
+/**
+ * Truncate the label to at most max characters. A trailing `#number` suffix (the audit workflow's
+ * `verify:${dim}#${findingIdx}` format) is preserved and the prefix is elided with `…`, so verify agent labels for
+ * multiple findings under the same dimension stay distinguishable. Labels without such a suffix are cut from the right (legacy behavior).
+ * Exported for unit test coverage.
+ */
+export function truncateLabel(raw: string, max: number): string {
+  if (raw.length <= max) return raw;
+  const m = raw.match(/#\d+$/);
+  if (!m) return raw.slice(0, max);
+  const suffix = m[0]; // includes the # sign
+  const prefix = raw.slice(0, raw.length - suffix.length);
+  const available = Math.max(0, max - suffix.length - 1); // -1 reserved for …; clamped so a negative end index never slices from the end
+  return `${prefix.slice(0, available)}…${suffix}`;
+}
+
+/**
+ * Right-side agent list (already filtered by the selected phase).
+ * Selected row: painted with a selectionBg background only while this column has focus (focused=true); the foreground is kept, not inverted.
+ * Without focus the background is not painted, to avoid a "fake focus".
+ * The status mark of a running agent is a spinner driven by useAnimationFrame (one shared clock, so all rows stay in sync);
+ * the right-side `model · Nk tok · N tool` text is refreshed in real time by agent_progress / agent_done.
+ */
+export function AgentList({
+  agents,
+  selectedIndex,
+  focused,
+}: {
+  agents: AgentProgress[];
+  selectedIndex: number;
+  focused: boolean;
+}): React.ReactNode {
+  // Subscribe once to the animation frame at the top level: all running agents share the same frame (synchronized animation, avoids a per-row hook).
+  const [ref, time] = useAnimationFrame(FRAME_MS);
+  const frame = SPINNER_FRAMES[Math.floor(time / FRAME_MS) % SPINNER_FRAMES.length];
+
+  if (agents.length === 0) {
+    return <Text color="subtle">(no agents in this phase)</Text>;
+  }
+  return (
+    <Box ref={ref} flexDirection="column">
+      {agents.map((a, i) => {
+        const v = agentVisual(a);
+        const selected = i === selectedIndex;
+        const highlighted = selected && focused;
+        const running = a.status === 'running';
+        const mark = running ? frame : v.mark;
+        const label = truncateLabel(a.label ?? `agent-${a.id}`, LABEL_MAX);
+        return (
+          <Box key={a.id} backgroundColor={highlighted ? 'selectionBg' : undefined} justifyContent="space-between">
+            <Box>
+              <Text color={v.color as keyof Theme}>{mark}</Text>
+              <Text> {label}</Text>
+            </Box>
+            <Text color="subtle">{agentMetaText(a)}</Text>
+          </Box>
+        );
+      })}
+    </Box>
+  );
+}
--- a/src/workflow/panel/PhaseSidebar.tsx
+++ b/src/workflow/panel/PhaseSidebar.tsx
@@ -0,0 +1,65 @@
+import React from 'react';
+import { Box, Text, useAnimationFrame } from '@anthropic/ink';
+import type { Theme } from '@anthropic/ink';
+import type { AgentProgress } from '../progress/store.js';
+import { PHASE_COLOR, PHASE_MARK, type PhaseStatus } from './status.js';
+import { ALL_PHASE, type MergedPhase } from './selectors.js';
+
+const SPINNER_FRAMES = ['·', '✢', '✱', '✶', '✻', '✽'];
+const FRAME_MS = 120;
+
+type PhaseRow = {
+  title: string;
+  status?: PhaseStatus;
+  done: number;
+  total: number;
+};
+
+/**
+ * Left phase sidebar: the first row is All (aggregating done/total), followed by the merged phases (including pending ○).
+ * Selected row: painted with a selectionBg background (foreground kept, not inverted) plus a `>` marker only while this column has focus (focused=true);
+ * without focus neither is painted, to avoid a "fake focus". The status mark of a running phase is a spinner driven by useAnimationFrame.
+ * Style follows the reference image: `> ✓ Scan  3/3`.
+ */
+export function PhaseSidebar({
+  phases,
+  agents,
+  selectedIndex,
+  focused,
+}: {
+  phases: MergedPhase[];
+  agents: AgentProgress[];
+  selectedIndex: number;
+  focused: boolean;
+}): React.ReactNode {
+  const [ref, time] = useAnimationFrame(FRAME_MS);
+  const frame = SPINNER_FRAMES[Math.floor(time / FRAME_MS) % SPINNER_FRAMES.length];
+  const totalAgents = agents.length;
+  const doneAgents = agents.filter(a => a.status === 'done').length;
+  const rows: PhaseRow[] = [{ title: ALL_PHASE, done: doneAgents, total: totalAgents }, ...phases];
+
+  return (
+    <Box ref={ref} flexDirection="column">
+      {rows.map((row, i) => {
+        const selected = i === selectedIndex;
+        const highlighted = selected && focused;
+        const running = row.status === 'running';
+        const mark = running ? frame : row.status ? PHASE_MARK[row.status] : ' ';
+        const color = (row.status ? PHASE_COLOR[row.status] : 'subtle') as keyof Theme;
+        return (
+          <Box key={row.title} backgroundColor={highlighted ? 'selectionBg' : undefined} justifyContent="space-between">
+            <Box>
+              <Text color={selected ? 'claude' : undefined}>{highlighted ? '>' : ' '}</Text>
+              <Text> </Text>
+              <Text color={color}>{mark}</Text>
+              <Text> {row.title}</Text>
+            </Box>
+            <Text color="subtle">
+              {row.done}/{row.total}
+            </Text>
+          </Box>
+        );
+      })}
+    </Box>
+  );
+}
--- a/src/workflow/panel/TabsBar.tsx
+++ b/src/workflow/panel/TabsBar.tsx
@@ -0,0 +1,37 @@
+import React from 'react';
+import { Box, Text } from '@anthropic/ink';
+import type { Theme } from '@anthropic/ink';
+import type { RunProgress } from '../progress/store.js';
+import { RUN_STATUS_COLOR, STATUS_DOT } from './status.js';
+import { tabLabel } from './selectors.js';
+
+/**
+ * Top run tab row: one tab per run (status dot + name + #short code).
+ * The current tab is highlighted with an orange ═ underline.
+ */
+export function TabsBar({ runs, activeRunId }: { runs: RunProgress[]; activeRunId: string | null }): React.ReactNode {
+  if (runs.length === 0) {
+    return <Text color="subtle">(no runs)</Text>;
+  }
+  return (
+    <Box>
+      {runs.map(r => {
+        const active = r.runId === activeRunId;
+        const label = tabLabel(r.workflowName, r.runId);
+        const underline = '═'.repeat(label.length + 2);
+        return (
+          <Box key={r.runId} flexDirection="column" marginRight={2}>
+            <Box>
+              <Text color={RUN_STATUS_COLOR[r.status] as keyof Theme}>{STATUS_DOT[r.status]}</Text>
+              <Text> </Text>
+              <Text color={active ? 'claude' : undefined} bold={active}>
+                {label}
+              </Text>
+            </Box>
+            <Text color={active ? 'claude' : undefined}>{active ? underline : ''}</Text>
+          </Box>
+        );
+      })}
+    </Box>
+  );
+}
--- a/src/workflow/panel/WorkflowsPanel.tsx
+++ b/src/workflow/panel/WorkflowsPanel.tsx
@@ -0,0 +1,283 @@
+import React, { useEffect, useRef, useState, useSyncExternalStore } from 'react';
+import { Box, Dialog, Text, useAnimationFrame } from '@anthropic/ink';
+import type { Theme } from '@anthropic/ink';
+import type { LocalJSXCommandContext, LocalJSXCommandOnDone } from '../../types/command.js';
+import { getWorkflowService } from '../service.js';
+import type { RunProgress } from '../progress/store.js';
+import { AgentList } from './AgentList.js';
+import { PhaseSidebar } from './PhaseSidebar.js';
+import { TabsBar } from './TabsBar.js';
+import { RUN_STATUS_COLOR, RUN_STATUS_TEXT } from './status.js';
+import { type FocusColumn, type WorkflowKeyboardHandlers, useWorkflowKeyboard } from './useWorkflowKeyboard.js';
+import { ALL_PHASE, filterAgentsByPhase, formatDuration, mergePhases } from './selectors.js';
+
+/**
+ * Clamp the selected index to a valid range (empty list -> 0; out of range -> last position; negative/NaN -> 0).
+ * Module-level pure function shared by the panel and its unit tests, so the tested logic is exactly what the panel runs.
+ */
+export function clampSelected(selected: number, len: number): number {
+  if (len === 0) return 0;
+  const n = Math.trunc(selected);
+  if (Number.isNaN(n) || n < 0) return 0;
+  return Math.min(n, len - 1);
+}
+
+/**
+ * Determine whether the focused run completed the running -> terminal state transition (used for panel auto-exit).
+ * Extracted into a pure function for easy unit testing; called directly inside the panel's useEffect.
+ *
+ * Trigger condition: prev and curr are the same runId, prev is running, curr is completed/failed/killed.
+ * - Opening the history panel (prev=null): does not trigger
+ * - Switching to an already completed tab (different runId): does not trigger
+ * - Same run running -> terminal: triggers
+ */
+export function isRunTerminatedTransition(
+  prev: { runId: string; status: RunProgress['status'] } | null,
+  curr: { runId: string; status: RunProgress['status'] } | null,
+): boolean {
+  if (!prev || !curr) return false;
+  if (prev.runId !== curr.runId) return false;
+  if (prev.status !== 'running') return false;
+  return curr.status === 'completed' || curr.status === 'failed' || curr.status === 'killed';
+}
+
+/**
+ * /workflows main panel: three-region focus model (top tab + left phase sidebar + right agent list).
+ *
+ * - useSyncExternalStore subscribes to WorkflowService (the store returns stable snapshots, no re-render without change).
+ * - Focus state: activeRunId / focusColumn('phases'|'agents') / selectedPhaseIndex(0=All) / selectedAgentIndex.
+ * - Keybindings: Tab switch run · Left/Right switch focus column · Up/Down move within column · x kill agent · K kill workflow · r resume · q/Esc quit.
+ */
+export function WorkflowsPanel({
+  onDone,
+  context,
+}: {
+  onDone: LocalJSXCommandOnDone;
+  context: LocalJSXCommandContext;
+}): React.ReactNode {
+  const svc = getWorkflowService();
+  const runs = useSyncExternalStore(
+    svc.subscribe,
+    () => svc.listRuns(),
+    () => [],
+  );
+
+  const [activeRunId, setActiveRunId] = useState<string | null>(null);
+  const [focusColumn, setFocusColumn] = useState<FocusColumn>('phases');
+  const [selectedPhaseIndex, setSelectedPhaseIndex] = useState(0);
+  const [selectedAgentIndex, setSelectedAgentIndex] = useState(0);
+  // kill secondary confirmation. null = no dialog; 'workflow' = kill the whole run; 'agent' = kill the currently selected agent.
+  // When non-null the keyboard enters confirm mode (only y/Enter/n/Esc/q respond).
+  const [confirmKill, setConfirmKill] = useState<null | 'agent' | 'workflow'>(null);
+
+  // On mount, trigger a single disk scan to hydrate historical runs (the service's internal persistedLoaded flag guards idempotency).
+  // Re-mount / re-render does not scan again (guarded by the process-singleton flag). The svc reference is stable (getWorkflowService singleton).
+  useEffect(() => {
+    void svc.loadPersistedRuns();
+  }, [svc]);
+
+  // When runs change: if activeRunId matches no run (first time, or the run is gone), fall back to the first run.
+  useEffect(() => {
+    if (runs.length === 0) {
+      if (activeRunId !== null) setActiveRunId(null);
+      return;
+    }
+    if (!runs.some(r => r.runId === activeRunId)) {
+      setActiveRunId(runs[0]!.runId);
+    }
+  }, [runs, activeRunId]);
+
+  const focused: RunProgress | undefined = runs.find(r => r.runId === activeRunId);
+  const phases = focused ? mergePhases(focused) : [];
+  // The sidebar includes the All row: prepend one item to the phases array -> total rows = phases.length + 1
+  const phaseRowCount = phases.length + 1;
+  const clampedPhase = clampSelected(selectedPhaseIndex, phaseRowCount);
+
+  // Auto-exit the panel when the focused run transitions from running to terminal (800ms delay so the user sees the ✓/✗ terminal state).
+  // Only a status transition on the same runId triggers it: switching to an already completed tab (prev was a different run) and opening
+  // the history panel (prev=null) do not exit. Without the auto-exit, the agent waiting on the Workflow tool result stays blocked until the user presses q.
+  const prevFocusedRef = useRef<{ runId: string; status: RunProgress['status'] } | null>(null);
+  useEffect(() => {
+    const curr = focused ? { runId: focused.runId, status: focused.status } : null;
+    const prev = prevFocusedRef.current;
+    prevFocusedRef.current = curr;
+    if (!isRunTerminatedTransition(prev, curr)) return;
+    const timer = setTimeout(() => onDone(), 800);
+    return (): void => {
+      clearTimeout(timer);
+    };
+  }, [focused?.runId, focused?.status, onDone]);
+
+  // Selected phase title (0 = All = undefined)
+  const selectedPhaseTitle = clampedPhase === 0 ? undefined : phases[clampedPhase - 1]?.title;
+
+  const visibleAgents = focused ? filterAgentsByPhase(focused.agents, selectedPhaseTitle) : [];
+  const clampedAgent = clampSelected(selectedAgentIndex, visibleAgents.length);
+
+  const switchTab = (runId: string): void => {
+    setActiveRunId(runId);
+    setFocusColumn('phases');
+    setSelectedPhaseIndex(0);
+    setSelectedAgentIndex(0);
+  };
+
+  const nextTab = (): void => {
+    if (runs.length === 0) return;
+    const idx = runs.findIndex(r => r.runId === activeRunId);
+    const next = runs[(idx + 1) % runs.length]!;
+    switchTab(next.runId);
+  };
+  const prevTab = (): void => {
+    if (runs.length === 0) return;
+    const idx = runs.findIndex(r => r.runId === activeRunId);
+    const next = runs[(idx - 1 + runs.length) % runs.length]!;
+    switchTab(next.runId);
+  };
+
+  const handlers: WorkflowKeyboardHandlers = {
+    nextTab,
+    prevTab,
+    focusLeft: () => setFocusColumn('phases'),
+    focusRight: () => setFocusColumn('agents'),
+    moveUp: () => {
+      if (focusColumn === 'phases') setSelectedPhaseIndex(s => clampSelected(s - 1, phaseRowCount));
+      else setSelectedAgentIndex(s => clampSelected(s - 1, visibleAgents.length));
+    },
+    moveDown: () => {
+      if (focusColumn === 'phases') setSelectedPhaseIndex(s => clampSelected(s + 1, phaseRowCount));
+      else setSelectedAgentIndex(s => clampSelected(s + 1, visibleAgents.length));
+    },
+    killAgent: () => {
+      // Only open the agent confirmation when the agents column is focused (x in the phases column has no target, so it is a no-op).
+      // confirmKill stores only the kind ('agent'), not the agent id: confirmYes re-resolves visibleAgents[clampedAgent]
+      // when the user confirms, so the kill targets whichever agent is selected at that moment.
+      if (focusColumn !== 'agents' || !focused) return;
+      const agent = visibleAgents[clampedAgent];
+      if (!agent) return;
+      setConfirmKill('agent');
+    },
+    killWorkflow: () => {
+      if (!focused) return;
+      setConfirmKill('workflow');
+    },
+    resumeFocused: () => {
+      if (!focused) return;
+      const canUseTool = context.canUseTool;
+      if (!canUseTool) {
+        onDone('resume needs canUseTool context; run /<name> resume from the main session.');
+        return;
+      }
+      void svc
+        .launch({ resumeFromRunId: focused.runId, name: focused.workflowName }, context, canUseTool)
+        .catch(e => onDone(`resume failed: ${e instanceof Error ? e.message : String(e)}`));
+    },
+    newRun: () => onDone('Tip: start a named workflow with /<name>, or pass name via the Workflow tool.'),
+    quit: () => {
+      // In confirm mode q = cancel confirmation (routeWorkflowKey already routed to confirmNo);
+      // only in non-confirm mode does it really exit the panel.
+      if (confirmKill !== null) {
+        setConfirmKill(null);
+        return;
+      }
+      onDone();
+    },
+    confirmYes: () => {
+      if (confirmKill === 'workflow' && focused) {
+        svc.kill(focused.runId);
+        // After killing the entire workflow, immediately return to the main chat: the run_done event -> the store reducer changes the status to
+        // killed -> notifications.ts bridges enqueuePendingNotification, and the main chat shows
+        // `Workflow "<name>" was stopped`. Staying on the panel would instead make the user miss the "stopped" feedback.
+        setConfirmKill(null);
+        onDone();
+        return;
+      } else if (confirmKill === 'agent' && focused) {
+        const agent = visibleAgents[clampedAgent];
+        if (agent) svc.killAgent(focused.runId, agent.id);
+      }
+      setConfirmKill(null);
+    },
+    confirmNo: () => setConfirmKill(null),
+  };
+  useWorkflowKeyboard(handlers, confirmKill !== null ? 'confirm' : 'normal');
+
+  const running = runs.filter(r => r.status === 'running').length;
+  const done = runs.length - running;
+  const phaseHeader = selectedPhaseTitle ?? ALL_PHASE;
+  const agentDone = focused ? focused.agents.filter(a => a.status === 'done').length : 0;
+  // Refresh the header duration every second (shared clock; subscribing triggers re-render, duration follows wall clock).
+  const [clockRef] = useAnimationFrame(1000);
+  const elapsed = focused ? Date.now() - focused.startedAt : 0;
+
+  return (
+    <Box ref={clockRef} flexDirection="column" borderStyle="round" borderColor="claude" paddingX={1}>
+      <Box justifyContent="space-between">
+        <Text bold>{focused?.workflowName ?? 'Workflows'}</Text>
+        {focused ? (
+          <Text color="subtle">
+            {agentDone}/{focused.agentCount} agents · {formatDuration(elapsed)} ·{' '}
+            <Text color={RUN_STATUS_COLOR[focused.status] as keyof Theme}>{RUN_STATUS_TEXT[focused.status]}</Text>
+          </Text>
+        ) : (
+          <Text color="subtle">
+            {running} running · {done} done
+          </Text>
+        )}
+      </Box>
+      {focused?.description ? <Text color="subtle">{focused.description}</Text> : null}
+
+      {runs.length > 1 ? (
+        <Box marginTop={1}>
+          <TabsBar runs={runs} activeRunId={activeRunId} />
+        </Box>
+      ) : null}
+
+      <Box flexDirection="row" marginTop={1}>
+        <Box width="25%" flexDirection="column">
+          <Text color={focusColumn === 'phases' ? 'claude' : 'subtle'} bold>
+            Phases
+          </Text>
+          <PhaseSidebar
+            phases={phases}
+            agents={focused?.agents ?? []}
+            selectedIndex={clampedPhase}
+            focused={focusColumn === 'phases'}
+          />
+        </Box>
+        <Text color="subtle">│</Text>
+        <Box flexGrow={1} flexDirection="column">
+          <Text color={focusColumn === 'agents' ? 'claude' : 'subtle'} bold>
+            {phaseHeader} · {visibleAgents.length} agents
+          </Text>
+          <AgentList agents={visibleAgents} selectedIndex={clampedAgent} focused={focusColumn === 'agents'} />
+        </Box>
+      </Box>
+
+      <Box marginTop={1}>
+        <Text color="subtle">
+          {confirmKill !== null
+            ? 'Confirm: y kill · n/Esc cancel'
+            : 'Tab switch run · ←/→ focus · ↑/↓ move · x kill agent · K kill workflow · r resume · q quit'}
+        </Text>
+      </Box>
+
+      {confirmKill !== null ? (
+        <Dialog
+          title={
+            confirmKill === 'workflow'
+              ? `Kill workflow "${focused?.workflowName ?? ''}"?`
+              : `Kill agent "${visibleAgents[clampedAgent]?.label ?? ''}"?`
+          }
+          subtitle={
+            confirmKill === 'workflow'
+              ? 'All in-flight agents will be aborted. Resume will replay from journal.'
+              : 'Only this agent aborts; other agents in the workflow keep running.'
+          }
+          onCancel={() => setConfirmKill(null)}
+          color="warning"
+        >
+          <Text color="subtle">Press y to confirm, or n/Esc to cancel.</Text>
+        </Dialog>
+      ) : null}
+    </Box>
+  );
+}
--- a/src/workflow/panel/panelCall.tsx
+++ b/src/workflow/panel/panelCall.tsx
@@ -0,0 +1,16 @@
+import type { LocalJSXCommandCall } from '../../types/command.js';
+import { SentryErrorBoundary } from '../../components/SentryErrorBoundary.js';
+import { WorkflowsPanel } from './WorkflowsPanel.js';
+
+/**
+ * local-jsx call for /workflows: builds the panel element and returns it for Ink to render.
+ *
+ * Wrapped in SentryErrorBoundary: when useSyncExternalStore / listNamed / child components
+ * throw, the exception must not break through to the REPL top level and crash the whole session; the boundary falls back to a local error card.
+ * onDone/context are injected by the command runtime; args is unused (the panel has no parameterized behavior).
+ */
+export const call: LocalJSXCommandCall = async (onDone, context, _args) => (
+  <SentryErrorBoundary name="WorkflowsPanel">
+    <WorkflowsPanel onDone={onDone} context={context} />
+  </SentryErrorBoundary>
+);
--- a/src/workflow/panel/selectors.ts
+++ b/src/workflow/panel/selectors.ts
@@ -0,0 +1,71 @@
+import type { AgentProgress, RunProgress } from '../progress/store.js'
+import type { PhaseStatus } from './status.js'
+
+/** Title of the fixed "no filter" item (first row of the sidebar). */
+export const ALL_PHASE = 'All'
+
+/** Merged phase (including pending), with done/total counts of agents under that phase. */
+export type MergedPhase = {
+  title: string
+  status: PhaseStatus
+  done: number
+  total: number
+}
+
+/**
+ * Merge declaredPhases (declared by meta) and run.phases (actually running/done):
+ * - Declared order takes priority; phases present in actual but not declared are appended at the end.
+ * - No actual record -> pending; otherwise take the actual status.
+ * - done/total = done under that phase / total agents under that phase.
+ */
+export function mergePhases(
+  run: Pick<RunProgress, 'declaredPhases' | 'phases' | 'agents'>,
+): MergedPhase[] {
+  const actualByTitle = new Map(run.phases.map(p => [p.title, p]))
+  const seen = new Set<string>()
+  const out: MergedPhase[] = []
+  const push = (title: string): void => {
+    if (seen.has(title)) return
+    seen.add(title)
+    const actual = actualByTitle.get(title)
+    const status: PhaseStatus = !actual ? 'pending' : actual.status
+    const inPhase = run.agents.filter(a => a.phase === title)
+    out.push({
+      title,
+      status,
+      done: inPhase.filter(a => a.status === 'done').length,
+      total: inPhase.length,
+    })
+  }
+  for (const t of run.declaredPhases) push(t)
+  for (const p of run.phases) push(p.title)
+  return out
+}
+
+/**
+ * Filter agents by the selected phase.
+ * selectedPhase undefined or ALL_PHASE -> all.
+ */
+export function filterAgentsByPhase(
+  agents: AgentProgress[],
+  selectedPhase: string | undefined,
+): AgentProgress[] {
+  if (selectedPhase === undefined || selectedPhase === ALL_PHASE) return agents
+  return agents.filter(a => a.phase === selectedPhase)
+}
+
+/** tab label: workflow name + `#` + last 4 chars of runId (disambiguates same-name runs). */
+export function tabLabel(workflowName: string, runId: string): string {
+  return `${workflowName}#${runId.slice(-4)}`
+}
+
+/** milliseconds -> compact duration (<60s -> `Ns`; <60m -> `MmSSs`; otherwise `HhMMm`). Used by the panel header. */
+export function formatDuration(ms: number): string {
+  const s = Math.floor(ms / 1000)
+  if (s < 60) return `${s}s`
+  const m = Math.floor(s / 60)
+  const ss = s % 60
+  if (m < 60) return `${m}m${String(ss).padStart(2, '0')}s`
+  const h = Math.floor(m / 60)
+  return `${h}h${String(m % 60).padStart(2, '0')}m`
+}
--- a/src/workflow/panel/status.ts
+++ b/src/workflow/panel/status.ts
@@ -0,0 +1,73 @@
+import type { AgentProgress, RunProgress } from '../progress/store.js'
+
+/** run status -> dot character (used by top tab). */
+export const STATUS_DOT: Record<RunProgress['status'], string> = {
+  running: '●',
+  completed: '✓',
+  failed: '✗',
+  killed: '■',
+}
+
+/** run status -> ink theme color token (follows existing WorkflowList palette). */
+export const RUN_STATUS_COLOR: Record<RunProgress['status'], string> = {
+  running: 'warning',
+  completed: 'success',
+  failed: 'error',
+  killed: 'subtle',
+}
+
+/** run status -> display text (used by header; aligns with reference image done/running). */
+export const RUN_STATUS_TEXT: Record<RunProgress['status'], string> = {
+  running: 'running',
+  completed: 'done',
+  failed: 'failed',
+  killed: 'killed',
+}
+
+/** merged phase status in the sidebar (includes pending: declared by meta but not started). */
+export type PhaseStatus = 'running' | 'done' | 'pending'
+
+export const PHASE_MARK: Record<PhaseStatus, string> = {
+  running: '●',
+  done: '✓',
+  pending: '○',
+}
+
+export const PHASE_COLOR: Record<PhaseStatus, string> = {
+  running: 'warning',
+  done: 'success',
+  pending: 'subtle',
+}
+
+/** visual for an agent row: mark character + color (running has the mark overridden by a spinner animation in UI). */
+export type AgentVisual = { mark: string; color: string }
+
+/**
+ * agent status -> visual.
+ * - running -> ● warning (UI overrides mark with spinner animation)
+ * - done·dead -> ✗ error
+ * - done·ok -> ✓ success
+ */
+export function agentVisual(a: AgentProgress): AgentVisual {
+  if (a.status === 'running') return { mark: '●', color: 'warning' }
+  if (a.resultKind === 'dead') return { mark: '✗', color: 'error' }
+  return { mark: '✓', color: 'success' }
+}
+
+/** token count -> display string (<1000 keeps the raw value; otherwise keeps 1 decimal + k). */
+export function formatTokenCount(n: number | undefined): string {
+  if (!n) return '0'
+  return n >= 1000 ? `${(n / 1000).toFixed(1)}k` : String(n)
+}
+
+/**
+ * right-side stats text for an agent row: `model · Nk tok · N tool`.
+ * Omits the prefix when there is no model; token/tool refresh in real time via agent_progress while running.
+ */
+export function agentMetaText(a: AgentProgress): string {
+  const parts: string[] = []
+  if (a.model) parts.push(a.model)
+  parts.push(`${formatTokenCount(a.tokenCount)} tok`)
+  parts.push(`${a.toolCount ?? 0} tool`)
+  return parts.join(' · ')
+}
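Review note: a short sketch of what `agentMetaText` renders for a running agent (no model yet) versus a finished one. `AgentStats` is a trimmed, hypothetical stand-in for `AgentProgress` that keeps only the fields read here, and `example-model` is a placeholder name.

```typescript
// Trimmed stand-in for AgentProgress (assumption: only these fields matter here).
type AgentStats = { model?: string; tokenCount?: number; toolCount?: number }

// Restated from the hunk above.
function formatTokenCount(n: number | undefined): string {
  if (!n) return '0'
  return n >= 1000 ? `${(n / 1000).toFixed(1)}k` : String(n)
}

function agentMetaText(a: AgentStats): string {
  const parts: string[] = []
  if (a.model) parts.push(a.model)
  parts.push(`${formatTokenCount(a.tokenCount)} tok`)
  parts.push(`${a.toolCount ?? 0} tool`)
  return parts.join(' · ')
}

// While running, the model is not known yet, so the prefix is omitted.
const runningText = agentMetaText({ tokenCount: 999 })
// After agent_done, the model and final counters are present.
const finishedText = agentMetaText({
  model: 'example-model',
  tokenCount: 12_345,
  toolCount: 7,
})

console.log(runningText)
console.log(finishedText)
```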
--- a/src/workflow/panel/useWorkflowKeyboard.ts
+++ b/src/workflow/panel/useWorkflowKeyboard.ts
@@ -0,0 +1,145 @@
+import { useInput } from '@anthropic/ink'
+
+/** The column that currently has focus. */
+export type FocusColumn = 'phases' | 'agents'
+
+/** Keyboard mode: normal = regular navigation; confirm = a Dialog is open, waiting for the user's y/n confirmation. */
+export type WorkflowKeyboardMode = 'normal' | 'confirm'
+
+/** Subset of the useInput key object (only declares the fields we use, to avoid coupling to the ink Key type). */
+type KeyEvent = {
+  tab?: boolean
+  shift?: boolean
+  escape?: boolean
+  return?: boolean
+  leftArrow?: boolean
+  rightArrow?: boolean
+  upArrow?: boolean
+  downArrow?: boolean
+}
+
+/** key -> action (pure function, easy to unit test; no rendering dependencies). */
+export type WorkflowKeyAction =
+  | 'nextTab'
+  | 'prevTab'
+  | 'focusLeft'
+  | 'focusRight'
+  | 'moveUp'
+  | 'moveDown'
+  | 'killAgent'
+  | 'killWorkflow'
+  | 'resume'
+  | 'newRun'
+  | 'quit'
+  | 'confirmYes'
+  | 'confirmNo'
+
+export function routeWorkflowKey(
+  input: string,
+  key: KeyEvent,
+  mode: WorkflowKeyboardMode = 'normal',
+): WorkflowKeyAction | null {
+  // confirm mode: y/Enter confirms, n/Esc/q cancels, and every other key is swallowed (prevents accidental input)
+  if (mode === 'confirm') {
+    if (input === 'y' || input === 'Y' || key.return) return 'confirmYes'
+    if (input === 'n' || input === 'N' || key.escape || input === 'q') {
+      return 'confirmNo'
+    }
+    return null
+  }
+  // @anthropic/ink sets key.tab to true for the Tab key; some environments fall back to '\t'
+  if (key.tab || input === '\t') return key.shift ? 'prevTab' : 'nextTab'
+  if (key.escape || input === 'q') return 'quit'
+  // Capital K = kill the entire workflow; lowercase x = kill the currently selected agent (agents column only).
+  // Separate keys keep x from accidentally triggering a workflow kill; K requires Shift, signalling a "heavy operation".
+  if (input === 'K') return 'killWorkflow'
+  if (input === 'x') return 'killAgent'
+  if (input === 'r') return 'resume'
+  if (input === 'n') return 'newRun'
+  if (key.leftArrow) return 'focusLeft'
+  if (key.rightArrow) return 'focusRight'
+  if (key.upArrow) return 'moveUp'
+  if (key.downArrow) return 'moveDown'
+  return null
+}
+
+/** Focus model callbacks (injected by WorkflowsPanel). */
+export type WorkflowKeyboardHandlers = {
+  nextTab: () => void
+  prevTab: () => void
+  focusLeft: () => void
+  focusRight: () => void
+  moveUp: () => void
+  moveDown: () => void
+  /** Request killing the currently selected agent (panel pops a Dialog for secondary confirmation). */
+  killAgent: () => void
+  /** Request killing the entire workflow (panel pops a Dialog for secondary confirmation). */
+  killWorkflow: () => void
+  resumeFocused: () => void
+  newRun: () => void
+  quit: () => void
+  /** User confirms in confirm mode (y/Enter). */
+  confirmYes: () => void
+  /** User cancels in confirm mode (n/Esc/q). */
+  confirmNo: () => void
+}
+
+/**
+ * /workflows panel keybindings (focus rotation model):
+ * - Tab / Shift+Tab: switch the top run tab
+ * - Left / Right: switch focus between phases and agents
+ * - Up / Down: move within the currently focused column
+ * - x kill single agent · K kill the entire workflow (with Dialog secondary confirmation) · r resume · n new · q / Esc quit
+ *
+ * @param mode In confirm mode only y/Enter (confirm) and n/Esc/q (cancel) are accepted; all other keys are swallowed to avoid accidental navigation inside the confirmation dialog.
+ */
+export function useWorkflowKeyboard(
+  h: WorkflowKeyboardHandlers,
+  mode: WorkflowKeyboardMode = 'normal',
+): void {
+  useInput((input, key) => {
+    const action = routeWorkflowKey(input, key as KeyEvent, mode)
+    if (action === null) return
+    switch (action) {
+      case 'nextTab':
+        h.nextTab()
+        break
+      case 'prevTab':
+        h.prevTab()
+        break
+      case 'focusLeft':
+        h.focusLeft()
+        break
+      case 'focusRight':
+        h.focusRight()
+        break
+      case 'moveUp':
+        h.moveUp()
+        break
+      case 'moveDown':
+        h.moveDown()
+        break
+      case 'killAgent':
+        h.killAgent()
+        break
+      case 'killWorkflow':
+        h.killWorkflow()
+        break
+      case 'resume':
+        h.resumeFocused()
+        break
+      case 'newRun':
+        h.newRun()
+        break
+      case 'quit':
+        h.quit()
+        break
+      case 'confirmYes':
+        h.confirmYes()
+        break
+      case 'confirmNo':
+        h.confirmNo()
+        break
+    }
+  })
+}
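Review note: the handler comments say that `killAgent` and `killWorkflow` only request a kill and that the panel then opens a Dialog. One way the panel side could track that is a small reducer that flips between the two keyboard modes. This is a hypothetical sketch: `PanelState`, `PendingKill`, and `reduceConfirm` are invented names, and the real WorkflowsPanel may be organized differently.

```typescript
type Mode = 'normal' | 'confirm'
type PendingKill = { kind: 'agent' | 'workflow' } | null
type PanelState = { mode: Mode; pending: PendingKill; executed: string[] }
type ConfirmAction = 'killAgent' | 'killWorkflow' | 'confirmYes' | 'confirmNo'

function reduceConfirm(state: PanelState, action: ConfirmAction): PanelState {
  switch (action) {
    case 'killAgent':
    case 'killWorkflow':
      // A kill request never executes directly; it only opens the dialog.
      if (state.mode !== 'normal') return state
      return {
        ...state,
        mode: 'confirm',
        pending: { kind: action === 'killAgent' ? 'agent' : 'workflow' },
      }
    case 'confirmYes':
      // Execute the pending kill (recorded here instead of calling a service).
      if (state.pending === null) return state
      return {
        mode: 'normal',
        pending: null,
        executed: [...state.executed, state.pending.kind],
      }
    case 'confirmNo':
      // Cancel: drop the pending kill and return to normal navigation.
      return { ...state, mode: 'normal', pending: null }
  }
}

let panel: PanelState = { mode: 'normal', pending: null, executed: [] }
panel = reduceConfirm(panel, 'killWorkflow') // dialog opens
panel = reduceConfirm(panel, 'confirmNo') // cancelled, nothing executed
panel = reduceConfirm(panel, 'killAgent') // dialog opens again
panel = reduceConfirm(panel, 'confirmYes') // the agent kill is executed
console.log(panel)
```

The `mode` value from such a state is what would be passed as the second argument of `useWorkflowKeyboard`.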
--- a/src/workflow/persistence.ts
+++ b/src/workflow/persistence.ts
@@ -0,0 +1,131 @@
+import { mkdir, readFile, readdir, rename, writeFile } from 'node:fs/promises'
+import { join } from 'node:path'
+import { getProjectRoot } from '../bootstrap/state.js'
+import { logForDebugging } from '../utils/debug.js'
+import type { ProgressBus } from './progress/bus.js'
+import type { ProgressStore, RunProgress } from './progress/store.js'
+
+/** Current schema version of state.json; a migration chain is to be introduced when this version is bumped. */
+const SCHEMA_VERSION = 1
+const STATE_FILE = 'state.json'
+const STATE_TMP = 'state.json.tmp'
+
+/**
+ * Single source of truth for runsDir: shares the same root as the journalStore in ports.ts (${projectRoot}/.claude/workflow-runs).
+ * Extracted as a function to avoid duplicating the path concatenation between ports.ts and the persistence logic, and to keep the same root after entering a worktree/subdirectory.
+ * Tests do not monkey-patch this function (the Bun ESM module namespace is read-only); they inject a runsDirProvider instead.
+ */
+export function getRunsDir(): string {
+  return join(getProjectRoot(), '.claude', 'workflow-runs')
+}
+
+type StateFile = {
+  schemaVersion: number
+  run: RunProgress
+}
+
+/**
+ * Atomically overwrite the terminal RunProgress to <runsDir>/<runId>/state.json.
+ * Atomicity: writeFile(tmp) → rename(tmp, target); the rename is atomic, so readers never see a partial state.json. Worst case a stale tmp is left behind and the next write overwrites it.
+ * Failure is best-effort: IO exceptions are only logged as a warning and never thrown (the workflow has already finished; a persistence failure only means the run cannot be retrieved after a restart).
+ */
+export async function writeRunState(
+  runsDir: string,
+  run: RunProgress,
+): Promise<void> {
+  const dir = join(runsDir, run.runId)
+  const target = join(dir, STATE_FILE)
+  const tmp = join(dir, STATE_TMP)
+  const payload: StateFile = { schemaVersion: SCHEMA_VERSION, run }
+  try {
+    await mkdir(dir, { recursive: true })
+    await writeFile(tmp, JSON.stringify(payload), 'utf-8')
+    await rename(tmp, target)
+  } catch (e) {
+    logForDebugging(
+      `[workflow warn] writeRunState failed for ${run.runId}: ${(e as Error).message}`,
+    )
+  }
+}
+
+/**
+ * Read <runsDir>/<runId>/state.json with fault tolerance:
+ * - File missing or unreadable → null (caller treats it as a miss)
+ * - JSON parse failure → null (logs a warning, does not crash); schema structure or schemaVersion mismatch → null (silently)
+ */
+export async function readRunState(
+  runsDir: string,
+  runId: string,
+): Promise<RunProgress | null> {
+  const target = join(runsDir, runId, STATE_FILE)
+  let raw: string
+  try {
+    raw = await readFile(target, 'utf-8')
+  } catch {
+    return null
+  }
+  try {
+    const parsed = JSON.parse(raw) as Partial<StateFile>
+    if (parsed.schemaVersion !== SCHEMA_VERSION) return null
+    const run = parsed.run
+    if (!run || typeof run !== 'object') return null
+    if (typeof run.runId !== 'string') return null
+    if (typeof run.status !== 'string') return null
+    return run as RunProgress
+  } catch (e) {
+    logForDebugging(
+      `[workflow warn] readRunState parse failed for ${runId}: ${(e as Error).message}`,
+    )
+    return null
+  }
+}
+
+/**
+ * Scan all subdirectories under runsDir, read each state.json, return a list of non-null RunProgress.
+ * - runsDir does not exist → empty array
+ * - A subdirectory without state.json (half-written run) → skip
+ * - A subdirectory whose state.json is corrupted → skip that single one, keep scanning the rest
+ * - Sort by updatedAt descending (consistent with store.list() ordering)
+ */
+export async function listPersistedRuns(
+  runsDir: string,
+): Promise<RunProgress[]> {
+  let entries: string[]
+  try {
+    entries = await readdir(runsDir)
+  } catch {
+    return []
+  }
+  const runs: RunProgress[] = []
+  for (const name of entries) {
+    const run = await readRunState(runsDir, name)
+    if (run) runs.push(run)
+  }
+  return runs.sort((a, b) => b.updatedAt - a.updatedAt)
+}
+
+/**
+ * Subscribe to the bus's run_done event and write the terminal RunProgress to state.json on disk.
+ * Covers all three terminal states (completed/failed/killed; shutdown-kill also routes to run_done killed).
+ * The store subscribes to the bus before this listener, so store.get(runId) is already terminal when the listener runs.
+ * Returns an unsubscribe function (for test cleanup).
+ *
+ * Disk write is best-effort: writeRunState swallows IO exceptions and only logs, does not propagate —
+ * so other bus subscribers (store, etc.) are not affected by persistence failures.
+ *
+ * @param runsDirProvider Optional runsDir resolver (defaults to getRunsDir).
+ *   Production path uses the default; tests inject a tmpdir to avoid writing to the real project directory (Bun ESM module namespace is read-only,
+ *   cannot monkey-patch getRunsDir itself).
+ */
+export function attachRunStatePersistence(
+  bus: ProgressBus,
+  store: ProgressStore,
+  runsDirProvider: () => string = getRunsDir,
+): () => void {
+  return bus.subscribe(event => {
+    if (event.type !== 'run_done') return
+    const run = store.get(event.runId)
+    if (!run) return
+    void writeRunState(runsDirProvider(), run)
+  })
+}
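Review note: the tmp-then-rename write and the tolerant read can be exercised in isolation. The sketch below is a synchronous variant written for brevity (the code in this diff uses `node:fs/promises`); `writeStateAtomic`, `readState`, and the trimmed `StateFile` shape are hypothetical names for illustration. It assumes tmp and target live in the same directory, so the rename stays on one filesystem.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Trimmed stand-in for the persisted shape (assumption).
type StateFile = { schemaVersion: number; run: { runId: string; status: string } }

// Write to a sibling tmp file first, then rename it over the target.
function writeStateAtomic(runsDir: string, payload: StateFile): string {
  const dir = join(runsDir, payload.run.runId)
  const target = join(dir, 'state.json')
  const tmp = join(dir, 'state.json.tmp')
  mkdirSync(dir, { recursive: true })
  writeFileSync(tmp, JSON.stringify(payload), 'utf-8')
  renameSync(tmp, target)
  return target
}

// Any read error, parse error, or version mismatch is reported as null.
function readState(target: string, expectedVersion: number): StateFile['run'] | null {
  let raw: string
  try {
    raw = readFileSync(target, 'utf-8')
  } catch {
    return null
  }
  try {
    const parsed = JSON.parse(raw) as Partial<StateFile>
    if (parsed.schemaVersion !== expectedVersion) return null
    return parsed.run ?? null
  } catch {
    return null
  }
}

const demoRunsDir = mkdtempSync(join(tmpdir(), 'workflow-runs-'))
const demoTarget = writeStateAtomic(demoRunsDir, {
  schemaVersion: 1,
  run: { runId: 'w-demo', status: 'completed' },
})
const roundTrip = readState(demoTarget, 1) // the run written above
const wrongVersion = readState(demoTarget, 2) // version mismatch
const missingFile = readState(join(demoRunsDir, 'nope', 'state.json'), 1)
console.log(roundTrip, wrongVersion, missingFile)
```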
--- a/src/workflow/ports.ts
+++ b/src/workflow/ports.ts
@@ -0,0 +1,202 @@
+import {
+  createFileJournalStore,
+  type ProgressEvent,
+  type WorkflowPorts,
+} from '@claude-code-best/workflow-engine'
+import { logForDebugging } from '../utils/debug.js'
+import { getProjectRoot } from '../bootstrap/state.js'
+import { getRunsDir } from './persistence.js'
+import {
+  type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
+  logEvent,
+} from '../services/analytics/index.js'
+import {
+  completeWorkflowTask,
+  failWorkflowTask,
+  killWorkflowTask,
+  registerLocalWorkflowTask,
+} from '../tasks/LocalWorkflowTask/LocalWorkflowTask.js'
+import {
+  buildHostBundle,
+  makeHostHandle,
+  readHostBundle,
+  type WorkflowHostBundle,
+} from './hostHandle.js'
+import { buildRegistry } from './registry.js'
+import type { ProgressBus } from './progress/bus.js'
+import type { ProgressStore } from './progress/store.js'
+import type { SetAppState } from '../Task.js'
+import type { AssistantMessage } from '../types/message.js'
+
+type RunBinding = {
+  runId: string
+  taskId: string
+  setAppState: SetAppState
+  abortController: AbortController
+  workflowName: string
+  /** agentId → AbortController. Registered when backend starts an agent; killAgent uses it for precise abort. */
+  agentAbortControllers: Map<number, AbortController>
+}
+
+/** Constructs a WorkflowHostContext from toolUseContext on each tool invocation. */
+function makeHostFactory(): WorkflowPorts['hostFactory'] {
+  return ({ context, canUseTool, parentMessage }) => {
+    const ctx = context as WorkflowHostBundle['toolUseContext'] & {
+      agentId?: string
+    }
+    return {
+      handle: makeHostHandle(
+        buildHostBundle(
+          ctx,
+          canUseTool as WorkflowHostBundle['canUseTool'],
+          parentMessage as AssistantMessage | undefined,
+        ),
+      ),
+      // Use projectRoot rather than getCwd(): shares the same root as journalStore's runsDir,
+      // otherwise named workflow resolution and journal persistence diverge when the user
+      // enters a worktree/sub-directory. The engine's internal ctx.cwd is only used for
+      // resolution (scriptPath/name) and does not affect the agent's execution cwd
+      // (the agent gets its own cwd via the toolUseContext inside the host bundle).
+      cwd: getProjectRoot(),
+      budgetTotal: null, // turn-level budget injection point (read from settings in the future)
+      ...(ctx.toolUseId ? { toolUseId: ctx.toolUseId } : {}),
+    }
+  }
+}
+
+/**
+ * Assembles the complete WorkflowPorts. bus/store are passed in by the caller (shared via the service singleton).
+ * taskRegistrar maintains runId → RunBinding for kill routing.
+ */
+export function createWorkflowPorts(opts: {
+  bus: ProgressBus
+  store: ProgressStore
+}): WorkflowPorts {
+  const bindings = new Map<string, RunBinding>()
+  const runsDir = getRunsDir()
+  const registry = buildRegistry()
+
+  // Telemetry subscription (independent of store). LogEventMetadata only accepts boolean/number/undefined,
+  // and runId is a string — use the brand cast provided by the analytics module (verified non-code/path) to pass it through.
+  opts.bus.subscribe((e: ProgressEvent) => {
+    if (e.type === 'run_done') {
+      logEvent('tengu_workflow_done', {
+        status: e.status === 'completed' ? 0 : e.status === 'failed' ? 1 : 2,
+        runId:
+          e.runId as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
+      })
+    }
+  })
+
+  const taskRegistrar: WorkflowPorts['taskRegistrar'] = {
+    register(regOpts, host) {
+      const bundle = readHostBundle(host)
+      const setAppState =
+        bundle.toolUseContext.setAppStateForTasks ??
+        bundle.toolUseContext.setAppState
+      const abortController = new AbortController()
+      const taskId = registerLocalWorkflowTask(setAppState, {
+        description: regOpts.summary ?? regOpts.workflowName,
+        workflowName: regOpts.workflowName,
+        workflowFile: regOpts.workflowFile ?? '',
+        summary: regOpts.summary,
+        ...(regOpts.toolUseId ? { toolUseId: regOpts.toolUseId } : {}),
+        abortController,
+      })
+      const runId = regOpts.runId ?? taskId
+      bindings.set(runId, {
+        runId,
+        taskId,
+        setAppState,
+        abortController,
+        workflowName: regOpts.workflowName,
+        agentAbortControllers: new Map(),
+      })
+      logForDebugging(
+        `workflow task registered: ${runId} (${regOpts.workflowName})`,
+      )
+      return { runId, signal: abortController.signal }
+    },
+    complete(runId, summary) {
+      const b = bindings.get(runId)
+      if (!b) return
+      completeWorkflowTask(b.taskId, b.setAppState)
+      logForDebugging(`workflow ${runId} completed: ${summary ?? ''}`)
+      bindings.delete(runId)
+    },
+    fail(runId, error) {
+      const b = bindings.get(runId)
+      if (!b) return
+      failWorkflowTask(b.taskId, b.setAppState, error)
+      logForDebugging(`workflow ${runId} failed: ${error}`)
+      bindings.delete(runId)
+    },
+    kill(runId) {
+      const b = bindings.get(runId)
+      if (!b) return
+      killWorkflowTask(b.taskId, b.setAppState) // internal abort controller
+      // Killing the run also aborts all in-flight agents (guards against the edge timing where the backend misses the task abort)
+      for (const ac of b.agentAbortControllers.values()) {
+        try {
+          ac.abort()
+        } catch {
+          // no-op: abort() is not expected to throw; swallow defensively so the remaining agents are still aborted
+        }
+      }
+      b.agentAbortControllers.clear()
+      bindings.delete(runId)
+    },
+    registerAgentAbort(runId, agentId, ac) {
+      const b = bindings.get(runId)
+      if (!b) return
+      b.agentAbortControllers.set(agentId, ac)
+    },
+    unregisterAgentAbort(runId, agentId) {
+      const b = bindings.get(runId)
+      if (!b) return
+      b.agentAbortControllers.delete(agentId)
+    },
+    killAgent(runId, agentId) {
+      const b = bindings.get(runId)
+      if (!b) return false
+      const ac = b.agentAbortControllers.get(agentId)
+      if (!ac) return false
+      try {
+        ac.abort()
+      } catch {
+        // no-op
+      }
+      b.agentAbortControllers.delete(agentId)
+      return true
+    },
+    pendingAction() {
+      return null // v1: skip/retry not wired (seam retained)
+    },
+  }
+
+  return {
+    hostFactory: makeHostFactory(),
+    agentAdapterRegistry: registry,
+    agentRunner: {
+      // Dead-code fallback: hooks always go through agentAdapterRegistry (required on ports). Reaching here means the registry was not registered — fail-fast.
+      async runAgentToResult() {
+        throw new Error(
+          'workflow agentRunner fallback reached — agentAdapterRegistry must be set on ports',
+        )
+      },
+    },
+    progressEmitter: {
+      emit(event) {
+        opts.bus.emit(event) // → store reducer + telemetry
+      },
+    },
+    taskRegistrar,
+    journalStore: createFileJournalStore(runsDir),
+    permissionGate: { isAborted: () => false }, // engine uses ctx.signal to check abort
+    logger: {
+      debug: msg => logForDebugging(msg),
+      warn: msg => logForDebugging(`[workflow warn] ${msg}`),
+      event: name => logForDebugging(`workflow event: ${name}`),
+    },
+  }
+}
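Review note: the kill routing in `taskRegistrar` boils down to a two-level map (runId to binding, agentId to AbortController). The following is a reduced sketch under that assumption; `Binding`, `registerAgentAbort`, `killAgent`, and `killRun` mirror the names above but are standalone, simplified versions without the task bookkeeping.

```typescript
type Binding = { agentAbortControllers: Map<number, AbortController> }

const bindings = new Map<string, Binding>()

function registerAgentAbort(runId: string, agentId: number, ac: AbortController): void {
  let b = bindings.get(runId)
  if (!b) {
    // Simplification: the real code only registers agents on an existing binding.
    b = { agentAbortControllers: new Map() }
    bindings.set(runId, b)
  }
  b.agentAbortControllers.set(agentId, ac)
}

// Abort exactly one agent; returns false when the run or agent is unknown.
function killAgent(runId: string, agentId: number): boolean {
  const ac = bindings.get(runId)?.agentAbortControllers.get(agentId)
  if (!ac) return false
  ac.abort()
  bindings.get(runId)?.agentAbortControllers.delete(agentId)
  return true
}

// Abort every in-flight agent of a run and drop the binding.
function killRun(runId: string): void {
  const b = bindings.get(runId)
  if (!b) return
  for (const ac of b.agentAbortControllers.values()) ac.abort()
  bindings.delete(runId)
}

const first = new AbortController()
const second = new AbortController()
registerAgentAbort('run-1', 1, first)
registerAgentAbort('run-1', 2, second)

const hit = killAgent('run-1', 1) // only agent 1 is aborted
const secondStillAlive = !second.signal.aborted
const repeated = killAgent('run-1', 1) // already removed
killRun('run-1') // aborts the remaining agent
console.log(hit, secondStillAlive, repeated, second.signal.aborted)
```

Deleting the controller after a successful abort is what makes a repeated `killAgent` report `false`, matching the "agent already finished/does not exist" contract.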
--- a/src/workflow/progress/bus.ts
+++ b/src/workflow/progress/bus.ts
@@ -0,0 +1,20 @@
+import type { ProgressEvent } from '@claude-code-best/workflow-engine'
+
+/** Typed progress event bus. engine progressEmitter.emit -> broadcasts to all subscribers (store / telemetry). */
+export type ProgressBus = {
+  emit(event: ProgressEvent): void
+  subscribe(listener: (event: ProgressEvent) => void): () => void
+}
+
+export function createProgressBus(): ProgressBus {
+  const listeners = new Set<(event: ProgressEvent) => void>()
+  return {
+    emit(event) {
+      for (const fn of listeners) fn(event)
+    },
+    subscribe(listener) {
+      listeners.add(listener)
+      return () => listeners.delete(listener)
+    },
+  }
+}
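Review note: the persistence listener relies on the store having reduced the event first. Because a JavaScript `Set` iterates in insertion order, listeners run in subscription order. The sketch below demonstrates that with a generic copy of the bus; `createBus`, `DemoEvent`, and the two stand-in listeners are hypothetical.

```typescript
type Listener<E> = (event: E) => void

// Generic copy of the bus above: a Set of listeners, called in insertion order.
function createBus<E>() {
  const listeners = new Set<Listener<E>>()
  return {
    emit(event: E): void {
      for (const fn of listeners) fn(event)
    },
    subscribe(listener: Listener<E>): () => void {
      listeners.add(listener)
      return () => {
        listeners.delete(listener)
      }
    },
  }
}

type DemoEvent = { type: 'run_done'; runId: string }

const demoBus = createBus<DemoEvent>()
const statusById = new Map<string, string>()
const persisted: string[] = []

// Subscribed first: plays the role of the store reducer.
demoBus.subscribe(e => {
  statusById.set(e.runId, 'completed')
})
// Subscribed second: plays the role of the persistence listener.
const unsubscribe = demoBus.subscribe(e => {
  persisted.push(statusById.get(e.runId) ?? 'missing')
})

demoBus.emit({ type: 'run_done', runId: 'r1' })
unsubscribe()
demoBus.emit({ type: 'run_done', runId: 'r2' }) // only the first listener runs now
console.log(persisted, [...statusById.keys()])
```

Note that `emit` has no error isolation: a listener that throws would stop the listeners after it from seeing the event, which is why the persistence path swallows its own IO errors.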
--- a/src/workflow/progress/store.ts
+++ b/src/workflow/progress/store.ts
@@ -0,0 +1,200 @@
+import type { ProgressEvent } from '@claude-code-best/workflow-engine'
+import type { ProgressBus } from './bus.js'
+
+export type AgentProgress = {
+  /** Unique id stamped by the engine, precisely correlates started/done (fixes the old LIFO race condition). */
+  id: number
+  label?: string
+  phase?: string
+  status: 'running' | 'done'
+  resultKind?: string
+  /** Only meaningful when done·ok: output is an object -> 'object', otherwise -> 'text'. Unset for dead/skipped. */
+  outputShape?: 'text' | 'object'
+  /** Resolved model id (delivered by agent_done; unset while running). */
+  model?: string
+  /** Cumulative context tokens (live via agent_progress / final value settled by agent_done). */
+  tokenCount?: number
+  /** Cumulative tool-call count (live via agent_progress / final value settled by agent_done). */
+  toolCount?: number
+}
+
+export type RunProgress = {
+  runId: string
+  workflowName: string
+  status: 'running' | 'completed' | 'failed' | 'killed'
+  phases: Array<{ title: string; status: 'running' | 'done' }>
+  /** From run_started.meta.phases[].title; the panel uses this to show pending(○) phases. [] when no meta. */
+  declaredPhases: string[]
+  currentPhase: string | null
+  agents: AgentProgress[]
+  agentCount: number
+  returnValue?: unknown
+  error?: string
+  /** run_started timestamp (used by the panel to compute run duration). */
+  startedAt: number
+  /** workflow description (from run_started.meta.description). */
+  description?: string
+  updatedAt: number
+}
+
+export type ProgressStore = {
+  apply(event: ProgressEvent): void
+  list(): RunProgress[]
+  get(runId: string): RunProgress | undefined
+  /** Directly inject a run read from disk (bypassing bus); skips existing runId - in-memory takes priority. */
+  hydrate(run: RunProgress): void
+  /** For useSyncExternalStore: returns a stable reference, the same array when no change. */
+  subscribe(listener: () => void): () => void
+  getSnapshot(): RunProgress[]
+}
+
+/** Build a reactive store from the bus: subscribe to the bus, reduce events, notify React subscribers. */
+export function createProgressStoreFromBus(bus: ProgressBus): ProgressStore {
+  const byId = new Map<string, RunProgress>()
+  let snapshot: RunProgress[] = []
+  const listeners = new Set<() => void>()
+
+  const notify = (): void => {
+    snapshot = [...byId.values()].sort((a, b) => b.updatedAt - a.updatedAt)
+    for (const fn of listeners) fn()
+  }
+
+  const ensure = (runId: string, workflowName: string): RunProgress => {
+    let p = byId.get(runId)
+    if (!p) {
+      p = {
+        runId,
+        workflowName,
+        status: 'running',
+        phases: [],
+        declaredPhases: [],
+        currentPhase: null,
+        agents: [],
+        agentCount: 0,
+        startedAt: Date.now(),
+        updatedAt: Date.now(),
+      }
+      byId.set(runId, p)
+    }
+    return p
+  }
+
+  const apply = (event: ProgressEvent): void => {
+    // log produces no visible state change (panel has no log view): early exit to avoid pointless snapshot rebuild and React re-render
+    if (event.type === 'log') return
+    const runId = event.runId
+    const p = ensure(
+      runId,
+      'workflowName' in event ? event.workflowName : 'workflow',
+    )
+    p.updatedAt = Date.now()
+    switch (event.type) {
+      case 'run_started':
+        p.workflowName = event.workflowName
+        p.status = 'running'
+        p.declaredPhases = event.meta?.phases?.map(ph => ph.title) ?? []
+        p.description = event.meta?.description ?? undefined
+        break
+      case 'phase_started':
+        if (!p.phases.some(ph => ph.title === event.phase)) {
+          p.phases.push({ title: event.phase, status: 'running' })
+        }
+        p.currentPhase = event.phase
+        break
+      case 'phase_done':
+        for (const ph of p.phases)
+          if (ph.title === event.phase) ph.status = 'done'
+        if (p.currentPhase === event.phase) p.currentPhase = null
+        break
+      case 'agent_started': {
+        let a = p.agents.find(x => x.id === event.agentId)
+        if (!a) {
+          a = {
+            id: event.agentId,
+            label: event.label,
+            phase: event.phase,
+            status: 'running',
+          }
+          p.agents.push(a)
+          p.agentCount = p.agents.length
+        } else {
+          a.status = 'running'
+          a.label = event.label
+          a.phase = event.phase
+        }
+        break
+      }
+      case 'agent_progress': {
+        // live progress: only update token/tool counts (emitted once per agent message, so the update rate stays bounded).
+        const ap = p.agents.find(x => x.id === event.agentId)
+        if (ap) {
+          ap.tokenCount = event.tokenCount
+          ap.toolCount = event.toolCount
+        }
+        break
+      }
+      case 'agent_done': {
+        let a = p.agents.find(x => x.id === event.agentId)
+        if (!a) {
+          a = {
+            id: event.agentId,
+            label: event.label,
+            phase: event.phase,
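+            status: 'done',
+            // Record the result kind here too; otherwise a dead agent with no prior agent_started renders as ok.
+            resultKind: event.result.kind,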
+            ...(event.result.kind === 'ok'
+              ? {
+                  outputShape:
+                    typeof event.result.output === 'object' &&
+                    event.result.output !== null
+                      ? ('object' as const)
+                      : ('text' as const),
+                  tokenCount: event.result.tokenCount,
+                  toolCount: event.result.toolCount,
+                  model: event.result.model,
+                }
+              : {}),
+          }
+          p.agents.push(a)
+          p.agentCount = p.agents.length
+        } else {
+          a.status = 'done'
+          a.resultKind = event.result.kind
+          if (event.result.kind === 'ok') {
+            a.outputShape =
+              typeof event.result.output === 'object' &&
+              event.result.output !== null
+                ? 'object'
+                : 'text'
+            a.tokenCount = event.result.tokenCount
+            a.toolCount = event.result.toolCount
+            a.model = event.result.model
+          }
+        }
+        break
+      }
+      case 'run_done':
+        p.status = event.status
+        if (event.returnValue !== undefined) p.returnValue = event.returnValue
+        if (event.error !== undefined) p.error = event.error
+        break
+    }
+    notify()
+  }
+
+  bus.subscribe(apply)
+  return {
+    apply,
+    list: () => snapshot,
+    get: id => byId.get(id),
+    hydrate(run) {
+      if (byId.has(run.runId)) return
+      byId.set(run.runId, run)
+      notify()
+    },
+    subscribe: fn => {
+      listeners.add(fn)
+      return () => listeners.delete(fn)
+    },
+    getSnapshot: () => snapshot,
+  }
+}
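Review note: `getSnapshot` is meant for `useSyncExternalStore`, which expects the same reference back as long as nothing changed. The sketch below isolates that property together with the `hydrate` rule (in-memory wins). `createRunStore`, `Run`, and `upsert` are simplified, hypothetical stand-ins for the store above.

```typescript
type Run = { runId: string; updatedAt: number }

function createRunStore() {
  const byId = new Map<string, Run>()
  let snapshot: Run[] = []
  const listeners = new Set<() => void>()

  // Rebuild the sorted snapshot only when something actually changed.
  const notify = (): void => {
    snapshot = [...byId.values()].sort((a, b) => b.updatedAt - a.updatedAt)
    for (const fn of listeners) fn()
  }

  return {
    upsert(run: Run): void {
      byId.set(run.runId, run)
      notify()
    },
    // Inject a run loaded from disk; an existing in-memory run takes priority.
    hydrate(run: Run): void {
      if (byId.has(run.runId)) return
      byId.set(run.runId, run)
      notify()
    },
    subscribe(fn: () => void): () => void {
      listeners.add(fn)
      return () => {
        listeners.delete(fn)
      }
    },
    getSnapshot: (): Run[] => snapshot,
  }
}

const runStore = createRunStore()
let notifications = 0
runStore.subscribe(() => {
  notifications++
})

runStore.upsert({ runId: 'a', updatedAt: 1 })
const firstSnapshot = runStore.getSnapshot()
const stableWithoutChange = firstSnapshot === runStore.getSnapshot()
runStore.hydrate({ runId: 'a', updatedAt: 99 }) // ignored: 'a' is already in memory
const stableAfterSkippedHydrate = firstSnapshot === runStore.getSnapshot()
runStore.hydrate({ runId: 'b', updatedAt: 2 }) // new run: snapshot is rebuilt
const order = runStore.getSnapshot().map(r => r.runId)
console.log(stableWithoutChange, stableAfterSkippedHydrate, order, notifications)
```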
--- a/src/workflow/registry.ts
+++ b/src/workflow/registry.ts
@@ -0,0 +1,13 @@
+import { AgentAdapterRegistry } from '@claude-code-best/workflow-engine'
+import { claudeCodeBackend } from './backends/claudeCodeBackend.js'
+
+/**
+ * Build a multi-backend registry. v1 (depth B) only registers a single
+ * claude-code adapter as default, without prefilling routing rules — add
+ * .route(...) when extending with a second provider adapter.
+ */
+export function buildRegistry(): AgentAdapterRegistry {
+  const reg = new AgentAdapterRegistry()
+  reg.register(claudeCodeBackend).default('claude-code')
+  return reg
+}
--- a/src/workflow/service.ts
+++ b/src/workflow/service.ts
@@ -0,0 +1,314 @@
+import {
+  listNamedWorkflows,
+  parseScript,
+  persistInlineScript,
+  resolveNamedWorkflow,
+  runWorkflow,
+  WORKFLOW_DIR_NAME,
+  type WorkflowHostContext,
+  type WorkflowInput,
+  type WorkflowPorts,
+} from '@claude-code-best/workflow-engine'
+import { readFile } from 'node:fs/promises'
+import { join } from 'node:path'
+import { getProjectRoot } from '../bootstrap/state.js'
+import { logForDebugging } from '../utils/debug.js'
+import { buildHostBundle, makeHostHandle } from './hostHandle.js'
+import { installWorkflowNotifications } from './notifications.js'
+import {
+  attachRunStatePersistence,
+  getRunsDir,
+  listPersistedRuns,
+  readRunState,
+} from './persistence.js'
+import { createProgressBus } from './progress/bus.js'
+import {
+  createProgressStoreFromBus,
+  type ProgressStore,
+  type RunProgress,
+} from './progress/store.js'
+import { createWorkflowPorts } from './ports.js'
+import type { CanUseToolFn } from '../hooks/useCanUseTool.js'
+import type { ToolUseContext } from '../Tool.js'
+
+/**
+ * WorkflowService: the single entry shared by the tool (U7) and panel (U9).
+ *
+ * - `ports`: shared WorkflowPorts; tool descriptors are passed through to the engine.
+ * - `launch`: resolve the script source → parseScript quick validation → taskRegistrar.register (gets runId+signal)
+ *   → detached runWorkflow → on completion routes to complete/fail/kill.
+ * - `kill/listRuns/getRun/subscribe/listNamed`: auxiliary queries for panel and tool.
+ */
+export type WorkflowService = {
+  /** Shared ports (used by tool descriptors). */
+  ports: WorkflowPorts
+  /** Panel/tool launches a workflow: resolve script source → validate → register → detached runWorkflow. */
+  launch(
+    input: Pick<
+      WorkflowInput,
+      | 'script'
+      | 'name'
+      | 'scriptPath'
+      | 'args'
+      | 'description'
+      | 'resumeFromRunId'
+      | 'title'
+      | 'maxConcurrency'
+    >,
+    toolUseContext: ToolUseContext,
+    canUseTool: CanUseToolFn,
+  ): Promise<{ runId: string; scriptPath?: string }>
+  kill(runId: string): void
+  /**
+   * Aborts a single agent (does not affect other agents in the same run; workflow keeps running).
+   * Returns whether an agent was actually aborted (false = the agent has already finished or does not exist). An aborted agent resolves as dead → null.
+   */
+  killAgent(runId: string, agentId: number): boolean
+  /**
+   * Cleanup on process exit / config unload: kill all running runs to avoid orphan tasks.
+   * Completed/failed runs are unaffected. Idempotent — safe to call multiple times.
+   */
+  shutdown(): void
+  listRuns(): RunProgress[]
+  getRun(runId: string): RunProgress | undefined
+  /**
+   * Async lookup by runId: return on memory hit; on miss read state.json from disk (not injected into memory).
+   * Used by the "get historical return by runId" scenario; for panel display use loadPersistedRuns + listRuns.
+   */
+  getRunAsync(runId: string): Promise<RunProgress | undefined>
+  /**
+   * Scans the disk and hydrates state.json of all historical runs into the store (skips existing runIds).
+   * The process singleton only scans the disk once (persistedLoaded flag); repeated calls return immediately.
+   */
+  loadPersistedRuns(): Promise<void>
+  subscribe(listener: () => void): () => void
+  listNamed(workflowDir?: string): Promise<string[]>
+}
+
+let cached: WorkflowService | null = null
+
+/** Process singleton. Tool and panel share the same ports/registry/store. */
+export function getWorkflowService(): WorkflowService {
+  if (cached) return cached
+  const bus = createProgressBus()
+  const store = createProgressStoreFromBus(bus)
+  const ports = createWorkflowPorts({ bus, store })
+  const service = makeService(ports, store)
+  // Subscribe to run_done to write the terminal snapshot to disk (shared entry for completed/failed/killed; shutdown-kill also routes here).
+  // The store registers to the bus before this subscription, so when the listener runs store.get(runId) is already terminal.
+  attachRunStatePersistence(bus, store)
+  // Install the state-change notification bridge (commit 0768d4dc promised "auto-notify on completion" but the old implementation left it unfulfilled)
+  installWorkflowNotifications(service)
+  cached = service
+  return cached
+}
+
+/**
+ * Construct the service (inject ports + store).
+ *
+ * Production path uses {@link getWorkflowService}; tests use this function to inject fake ports directly,
+ * avoiding touching real getProjectRoot/getCwd/analytics and other module-level side effects.
+ *
+ * @param cwdOverride For tests only: inject a temp directory (avoids inline persistence writing to the real project directory).
+ * @param runsDirProvider For tests only: inject a tmpdir (Bun ESM module namespace is read-only, cannot monkey-patch getRunsDir).
+ */
+export function makeService(
+  ports: WorkflowPorts,
+  store: ProgressStore,
+  cwdOverride?: string,
+  runsDirProvider: () => string = getRunsDir,
+): WorkflowService {
+  const buildHost = (
+    toolUseContext: ToolUseContext,
+    canUseTool: CanUseToolFn,
+  ): WorkflowHostContext => ({
+    handle: makeHostHandle(buildHostBundle(toolUseContext, canUseTool)),
+    // Use projectRoot to stay in sync with ports.ts hostFactory / journalStore;
+    // entering a worktree/subdirectory will not desync named workflow resolution from journal persistence.
+    // cwdOverride is for tests only: inject a temp directory (avoids inline persistence writing to the real project directory).
+    cwd: cwdOverride ?? getProjectRoot(),
+    budgetTotal: null, // turn-level budget injection point (to be read from settings in the future)
+    toolUseId: toolUseContext.toolUseId,
+  })
+
+  async function resolveSource(input: {
+    script?: string
+    name?: string
+    scriptPath?: string
+  }): Promise<{
+    script: string
+    workflowFile?: string
+    workflowName: string
+  }> {
+    if (input.script) {
+      return { script: input.script, workflowName: 'workflow' }
+    }
+    if (input.scriptPath) {
+      return {
+        script: await readFile(input.scriptPath, 'utf-8'),
+        workflowFile: input.scriptPath,
+        workflowName: 'workflow',
+      }
+    }
+    if (input.name) {
+      const dir = join(getProjectRoot(), WORKFLOW_DIR_NAME)
+      const found = await resolveNamedWorkflow(dir, input.name)
+      if (!found) {
+        throw new Error(
+          `Named workflow "${input.name}" not found (looked in ${WORKFLOW_DIR_NAME}/)`,
+        )
+      }
+      return {
+        script: found.content,
+        workflowFile: found.path,
+        workflowName: input.name,
+      }
+    }
+    throw new Error('One of script, name, or scriptPath must be provided')
+  }
+
+  // Once-flag for loadPersistedRuns: set to true on the first call so that later calls return immediately.
+  // Reset on scan failure so the next call retries. The flag lives in the makeService closure: process-wide in production (via the singleton), fresh for every service a test builds.
+  let persistedLoaded = false
+
+  return {
+    ports,
+
+    async launch(input, toolUseContext, canUseTool) {
+      const { script, workflowFile, workflowName } = await resolveSource(input)
+      try {
+        parseScript(script)
+      } catch (e) {
+        throw new Error(`Script validation failed: ${(e as Error).message}`)
+      }
+
+      const host = buildHost(toolUseContext, canUseTool)
+      const { runId, signal } = ports.taskRegistrar.register(
+        {
+          workflowName,
+          ...(workflowFile ? { workflowFile } : {}),
+          ...(input.description ? { summary: input.description } : {}),
+          ...(host.toolUseId ? { toolUseId: host.toolUseId } : {}),
+          ...(input.resumeFromRunId ? { runId: input.resumeFromRunId } : {}),
+        },
+        host.handle,
+      )
+
+      // Inline entry: persist the script to the run directory (symmetric with WorkflowTool) and return a reusable path.
+      // A write failure only degrades (it is logged); it does not block the run, since the script is already in memory.
+      let persistedScriptPath: string | undefined
+      if (!workflowFile && input.script) {
+        try {
+          persistedScriptPath = await persistInlineScript(
+            input.script,
+            runId,
+            host.cwd,
+          )
+        } catch (e) {
+          logForDebugging(
+            `workflow inline script persist failed: ${(e as Error).message}`,
+          )
+        }
+      }
+
+      // detached: do not await, let the caller get runId immediately; on completion route to the registrar.
+      void runWorkflow({
+        script,
+        ...(input.args !== undefined ? { args: input.args } : {}),
+        runId,
+        workflowName,
+        ports,
+        host: host.handle,
+        signal,
+        cwd: host.cwd,
+        budgetTotal: host.budgetTotal,
+        ...(input.maxConcurrency !== undefined
+          ? { maxConcurrency: input.maxConcurrency }
+          : {}),
+        ...(input.resumeFromRunId ? { resume: true } : {}),
+      })
+        .then(result => {
+          if (result.status === 'completed') {
+            ports.taskRegistrar.complete(runId)
+          } else if (result.status === 'failed') {
+            ports.taskRegistrar.fail(runId, result.error ?? 'failed')
+          } else {
+            ports.taskRegistrar.kill(runId)
+          }
+        })
+        .catch(e => ports.taskRegistrar.fail(runId, (e as Error).message))
+
+      logForDebugging(`workflow launched: ${runId} (${workflowName})`)
+      return {
+        runId,
+        ...(persistedScriptPath ? { scriptPath: persistedScriptPath } : {}),
+      }
+    },
+
+    kill(runId) {
+      ports.taskRegistrar.kill(runId)
+    },
+    killAgent(runId, agentId) {
+      return ports.taskRegistrar.killAgent?.(runId, agentId) ?? false
+    },
+
+    shutdown() {
+      // Only kill runs that are still running: for completed/failed runs the taskRegistrar has already reclaimed the binding, so kill would be a no-op.
+      // taskRegistrar.kill is also a safe no-op for unknown runIds, so shutdown is idempotent and repeated calls do not throw.
+      // Each kill is wrapped in its own try/catch: kill routes through setAppState, and during process exit that can trigger a React re-render
+      // that throws (e.g. the renderer is already unmounted); a single failure should not block cleanup of the other runs.
+      for (const run of store.list()) {
+        if (run.status !== 'running') continue
+        try {
+          ports.taskRegistrar.kill(run.runId)
+        } catch (e) {
+          logForDebugging(
+            `workflow shutdown: kill ${run.runId} failed: ${(e as Error).message}`,
+          )
+        }
+      }
+    },
+
+    listRuns: () => store.list(),
+    getRun: id => store.get(id),
+    async getRunAsync(id) {
+      const mem = store.get(id)
+      if (mem) return mem
+      return (await readRunState(runsDirProvider(), id)) ?? undefined
+    },
+    async loadPersistedRuns() {
+      if (persistedLoaded) return
+      persistedLoaded = true
+      try {
+        const runs = await listPersistedRuns(runsDirProvider())
+        for (const run of runs) store.hydrate(run)
+      } catch (e) {
+        // Scan failure does not block the panel: log + reset flag to allow next retry
+        logForDebugging(
+          `[workflow warn] loadPersistedRuns failed: ${(e as Error).message}`,
+        )
+        persistedLoaded = false
+      }
+    },
+    subscribe: fn => store.subscribe(fn),
+
+    async listNamed(workflowDir) {
+      return listNamedWorkflows(
+        workflowDir ?? join(getProjectRoot(), WORKFLOW_DIR_NAME),
+      )
+    },
+  }
+}
+
+/** For tests: reset the singleton (avoid cross-case contamination). */
+export function __resetWorkflowServiceForTests(): void {
+  cached = null
+}
+
+/**
+ * Returns the already-instantiated service (does not create one). Used on process exit / config unload to peek;
+ * if workflow was never used, cached is still null — avoids side-effecting bus/ports creation in the exit hook.
+ */
+export function peekWorkflowService(): WorkflowService | null {
+  return cached
+}
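Review note on `resolveSource`: the three inputs are checked in a fixed order (`script`, then `scriptPath`, then `name`), and each check is a truthiness test, so an empty string counts as "not provided". The sketch below mirrors only that branch order. `SourceInput`, `SourceKind`, and `pickSourceKind` are hypothetical names introduced for illustration; they are not part of this PR.

```typescript
// Hypothetical, simplified mirror of the branch order in resolveSource.
type SourceInput = { script?: string; scriptPath?: string; name?: string }
type SourceKind = 'inline' | 'file' | 'named'

function pickSourceKind(input: SourceInput): SourceKind {
  // Truthiness checks, as in the diff: '' falls through to the next branch.
  if (input.script) return 'inline'
  if (input.scriptPath) return 'file'
  if (input.name) return 'named'
  throw new Error('One of script, name, or scriptPath must be provided')
}

// An inline script wins even when a name is also supplied.
console.log(pickSourceKind({ script: 'return 1', name: 'deploy' }))
// An empty inline script is treated as absent, so the file path is used.
console.log(pickSourceKind({ script: '', scriptPath: './wf.js' }))
```

If callers are ever expected to pass more than one of these fields, it may be worth documenting the precedence on the input schema; whether that is needed here is a judgment call for the author.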
--- a/src/workflow/wiring.ts
+++ b/src/workflow/wiring.ts
@@ -0,0 +1,65 @@
+import {
+  createWorkflowTool,
+  workflowInputSchema,
+  WORKFLOW_TOOL_NAME,
+  type WorkflowToolDescriptor,
+} from '@claude-code-best/workflow-engine'
+import { buildTool, type Tool } from '../Tool.js'
+import { getWorkflowService } from './service.js'
+
+/**
+ * Adapts the engine's self-contained descriptor into a buildTool-compatible Tool.
+ * The descriptor routes through the service singleton (sharing ports/registry/store).
+ *
+ * ports resolution is deferred to the first real method call (lazy): tools.ts calls
+ * createWorkflowToolCore() during module-load (feature-gated), and resolving ports
+ * immediately would trigger service instantiation, which in turn calls module-level
+ * side effects like getProjectRoot — yielding wrong paths before bootstrap completes.
+ * The Tool object itself is a singleton through the module-level `cached` in createWorkflowToolCore
+ * (PermissionRequest matches by reference), and the ports singleton is guaranteed by getWorkflowService.
+ */
+function buildWorkflowTool(): Tool {
+  let cachedDescriptor: WorkflowToolDescriptor | null = null
+  const descriptor = (): WorkflowToolDescriptor => {
+    if (!cachedDescriptor) {
+      const { ports } = getWorkflowService()
+      cachedDescriptor = createWorkflowTool(ports)
+    }
+    return cachedDescriptor
+  }
+  return buildTool({
+    name: WORKFLOW_TOOL_NAME,
+    maxResultSizeChars: 50_000,
+    inputSchema: workflowInputSchema,
+    isEnabled: () => descriptor().isEnabled(),
+    isReadOnly: input => descriptor().isReadOnly(input),
+    isConcurrencySafe: () => true,
+    async description() {
+      return descriptor().description()
+    },
+    async prompt() {
+      return descriptor().prompt()
+    },
+    async call(input, context, canUseTool, parentMessage, onProgress) {
+      const result = await descriptor().call(
+        input,
+        context,
+        canUseTool,
+        parentMessage,
+        onProgress,
+      )
+      return { data: result.data }
+    },
+    renderToolUseMessage: input => descriptor().renderToolUseMessage(input),
+    mapToolResultToToolResultBlockParam: (data, toolUseId) =>
+      descriptor().mapToolResultToToolResultBlockParam(data, toolUseId),
+  })
+}
+
+// Singleton: tools.ts registration and PermissionRequest must reference the same instance (switch matches by reference).
+let cached: Tool | null = null
+
+export function createWorkflowToolCore(): Tool {
+  if (!cached) cached = buildWorkflowTool()
+  return cached
+}
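Review note on the lazy descriptor in `buildWorkflowTool`: the wrapper object is created at module load, but the expensive factory runs only on the first method call and is reused afterwards. The following is a minimal sketch of that shape under simplified assumptions. The `lazy` helper, `factoryCalls`, and the one-method `tool` are hypothetical stand-ins; the diff inlines the same idea with `cachedDescriptor`.

```typescript
// Hypothetical helper: defer a factory until first use, then reuse the result.
function lazy<T extends object>(factory: () => T): () => T {
  let value: T | null = null
  return () => {
    if (value === null) value = factory()
    return value
  }
}

let factoryCalls = 0
const descriptor = lazy(() => {
  // Stands in for getWorkflowService() + createWorkflowTool(ports).
  factoryCalls += 1
  return { isEnabled: () => true }
})

// Building the wrapper does not run the factory; only calling a method does.
const tool = {
  isEnabled: () => descriptor().isEnabled(),
}
```

With this shape, constructing `tool` during module load stays free of side effects, which is the property the comment above `buildWorkflowTool` relies on to avoid touching `getProjectRoot` before bootstrap completes.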