feat(autofix-pr): full completion-feedback loop (latent bug fix + completionChecker + content feedback) (#1240)

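* fix(autofix-pr): fix dangling monitor lock caused by taskId mismatch

createAutofixTeammate generates a teammate UUID and uses it as the monitor lock key, but registerRemoteAgentTask internally generates a different UUID as the framework taskId. When the CCR session completes naturally, the framework calls clearActiveMonitor(frameworkTaskId), the guard fails, and the lock is never released, so every subsequent /autofix-pr reports "already monitoring".

Fix (Phase 1 of remote-agent completion loop):
- monitorState: add updateActiveMonitor(partial) for atomic partial updates
- callAutofixPr: after register, swap the lock's taskId to the framework-assigned id
- RemoteAgentTask: introduce a registration-style registerCompletionHook API (following the existing registerCompletionChecker pattern) and call runCompletionHook on all 5 completion paths
- the autofix-pr command module registers its own cleanup hook, so the framework never depends back on the command module

Tests:
- monitorState: 4 new tests (updateActiveMonitor behavior + bug reproduction/fix)
- launchAutofixPr: 3 new end-to-end regression tests (taskId swap + hook firing + a subsequent launch no longer reports "already monitoring")

Full analysis and the Phase 2/3 plan are in docs/features/remote-agent-completion-analysis.md.

* feat(autofix-pr): register a completionChecker that detects PR completion via gh CLI

Phase 2 of remote-agent completion loop. Phase 1 fixed the dangling monitor lock, but the completion signal still had to wait for the CCR session to be archived naturally (unpredictable timing, and no way to know whether the PR was actually fixed). Phase 2 adds active completion detection.

Implementation:
- new prOutcomeCheck.ts (pure decision matrix): summariseAutofixOutcome takes a PR snapshot + baseline SHA and returns completed/summary. 8 decision-branch unit tests.
- new prFetch.ts (spawn layer): runGhPrView calls the gh CLI, fetchPrHeadSha captures the baseline SHA at launch, and checkPrAutofixOutcome combines the two.
- AutofixPrRemoteTaskMetadata gains an initialHeadSha?: string field, which survives --resume.
- launchAutofixPr.ts calls registerCompletionChecker('autofix-pr', ...) at module top level, with a 5s throttle so gh CLI calls don't pile up. callAutofixPr calls fetchPrHeadSha at launch and passes the result into metadata.

Decision matrix:
  MERGED → done (merged)
  CLOSED, not merged → done (closed without fix)
  OPEN, no baseline → keep polling
  OPEN, head unchanged → keep polling (agent has not pushed yet)
  OPEN, head changed + CI pending → keep polling
  OPEN, head changed + CI failure → done (surface the red; the user decides whether to retry)
  OPEN, head changed + CI success → done (clean fix)

Design:
- gh CLI rather than Octokit: reuses the user's existing auth, no token management
- decision and spawn logic in separate files: prOutcomeCheck is a pure function and easy to test; prFetch is mocked on its own to avoid Bun mock.module process-wide pollution (explained in a comment in launchAutofixPr.test)
- 5s throttle: the framework polls every 1s, and a gh CLI subprocess is too heavy to keep up
- failure fallback: fetchPrHeadSha/checkPrAutofixOutcome never throw on failure; they return null/false and the framework continues down its original path

Tests:
- prOutcomeCheck: 9 unit tests covering the decision matrix
- launchAutofixPr: 5 new tests: checker registration / fetchPrHeadSha call / initialHeadSha passed into metadata / launch still succeeds when SHA capture fails / SHA null handling

Full plan in docs/features/remote-agent-completion-analysis.md.

* feat(autofix-pr): content feedback lets the local model read the PR fix result

Phase 3 of remote-agent completion loop. Phase 2 registered a completionChecker so the framework can proactively complete the task when the PR is merged/closed or has a push with green CI, but the task-notification still carried only generic text ("${owner}/${repo}#42 merged"). Phase 3 lets the local model read the structured result produced by the remote agent itself (list of commits, list of files, CI status, human-readable summary).

Implementation:
- new extractAutofixResultFromLog (src/commands/autofix-pr/extractAutofixResult.ts): scans SDKMessage[] for the <autofix-result> tag, trying hook stdout first and falling back to assistant text, latest wins. 10 unit tests.
- RemoteAgentTask: add a registration-style registerContentExtractor API plus a private enqueueRichRemoteNotification (modeled on enqueueRemoteReviewNotification). On the 3 generic completion paths (archived / completionChecker / result-driven) it first tries tryExtractRichContent: if content is found it uses the rich variant, otherwise it falls back to the generic one. The isRemoteReview path is unchanged (it goes through its own enqueueRemoteReviewNotification).
- launchAutofixPr.ts calls registerContentExtractor('autofix-pr', extractAutofixResultFromLog) at module top level. initialMessage adds an instruction to output <autofix-result> (pr-number / commits-pushed / files-changed / ci-status / summary).

Design:
- registration-style API (same as the Phase 1 hook and Phase 2 checker): the framework does not depend back on command modules; all PR-specific logic lives in autofix-pr/
- latest wins: when the agent retries, only the newest tag is taken, so older tags don't contaminate the result
- truncated tag → null: an open tag without a matching close tag is treated as incomplete and falls back to generic
- no cross-message stitching: an open tag and a close tag in different messages are treated as incomplete (avoids splicing unrelated strings together)
- string content is not parsed: the rare path where assistant.message.content is a string (not a block array) is skipped outright instead of crashing

Tests:
- extractAutofixResultFromLog: 10 unit tests (empty log / no tag / hook stdout / assistant text / hook_response subtype / multiple tags latest-wins / truncation / priority when hook output comes after assistant text / no cross-message stitching / graceful string content)
- launchAutofixPr: 3 new tests (extractor registration / initialMessage contains the tag schema / extractor real behavior)

Full plan in section 5.3 of docs/features/remote-agent-completion-analysis.md.

* fix(autofix-pr): extractBetween falls back to an earlier complete pair when the latest tag is truncated

If the remote agent, while retrying, writes a complete <autofix-result> and then opens a second tag that gets truncated, the old implementation only looked at lastIndexOf(open), failed to find a close, and returned null, discarding the earlier complete result. It now walks all open tags from the end toward the start and returns the first open/close pair that matches.

Also:
- docs/features/remote-agent-completion-analysis.md: add language tags (text/http) to 9 bare fenced blocks, fixing markdownlint MD040 warnings
- same file: two instances of "三选项" → "三个选项" ("three options", adding the Chinese measure word)

* test(autofix-pr): fill coverage gaps in completionChecker / edge-case CI checks

Addressing the codecov patch coverage gap, this covers three previously unexercised code paths:

prOutcomeCheck.ts (previously 96.92%, 2 lines missing):
- the statusCheckRollup === undefined path (distinct from the empty-array branch; GitHub omits the field entirely on PRs with no checks configured)
- in-flight checks with COMPLETED status but a null/empty conclusion are classified as pending

launchAutofixPr.ts (previously 58.33%, 15 lines missing):
- registerCompletionChecker arrow body: early return when metadata is missing / returns null inside the throttle window / returns null when completed=false / returns the summary when completed=true / initialHeadSha passed through to checkPrAutofixOutcome
- both sides of the if (meta) short-circuit in registerCompletionHook: clears the throttle entry when metadata is present, and still releases the active monitor lock when it is absent

All new tests follow the existing pattern of mock.module plus pulling registered callbacks from registerXxxMock.mock.calls; no new dependencies. prOutcomeCheck 11/11 passing locally.

* style: run biome check --fix to format the new sections of launchAutofixPr.test

---------

Co-authored-by: unraid <local@unraid.local>
Co-authored-by: Claude <noreply@anthropic.com>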
2026-06-23 08:45:50 +00:00 · 2026-05-22 21:06:26 +08:00
parent f2b751f659
commit 9d17597e58
10 changed files with 1346 additions and 7 deletions
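The registration-style API (Phase 1 hook, Phase 2 checker) and the 5s throttle described in the commit message can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the code in RemoteAgentTask or launchAutofixPr.ts: the checker and hook signatures mirror the mock types declared in launchAutofixPr.test.ts, while the Map-based registry, `makeThrottledChecker`, `throttleKey`, `Probe`, and the injectable clock are hypothetical. The real autofix-pr hook also releases the active monitor lock, which is left out here.

```typescript
// Sketch only: registry + throttled checker. Signatures mirror the test mocks;
// makeThrottledChecker, throttleKey, Probe and the clock parameter are hypothetical.
type CompletionChecker = (metadata?: unknown) => Promise<string | null>
type CompletionHook = (taskId: string, metadata?: unknown) => void

// Framework side: keyed by task type, so it never imports command modules.
const checkers = new Map<string, CompletionChecker>()
const hooks = new Map<string, CompletionHook>()
function registerCompletionChecker(taskType: string, checker: CompletionChecker): void {
  checkers.set(taskType, checker)
}
function registerCompletionHook(taskType: string, hook: CompletionHook): void {
  hooks.set(taskType, hook)
}
function runCompletionHook(taskType: string, taskId: string, metadata?: unknown): void {
  hooks.get(taskType)?.(taskId, metadata)
}

// Command-module side (autofix-pr).
interface PrMeta {
  owner: string
  repo: string
  prNumber: number
  initialHeadSha?: string
}
type Probe = (meta: PrMeta) => Promise<{ completed: boolean; summary?: string }>

const CHECK_INTERVAL_MS = 5_000
const lastChecked = new Map<string, number>()
const throttleKey = (m: PrMeta): string => `${m.owner}/${m.repo}#${m.prNumber}`

function makeThrottledChecker(probe: Probe, now: () => number = () => Date.now()): CompletionChecker {
  return async metadata => {
    const meta = metadata as PrMeta | undefined
    if (!meta) return null // no metadata: nothing to probe
    const key = throttleKey(meta)
    const last = lastChecked.get(key)
    // The framework polls every 1s; skip the expensive probe inside the window.
    if (last !== undefined && now() - last < CHECK_INTERVAL_MS) return null
    lastChecked.set(key, now())
    const outcome = await probe(meta)
    return outcome.completed ? (outcome.summary ?? null) : null
  }
}

// Clearing the throttle entry on completion lets a re-launch check immediately.
registerCompletionHook('autofix-pr', (_taskId, metadata) => {
  const meta = metadata as PrMeta | undefined
  if (meta) lastChecked.delete(throttleKey(meta))
  // The autofix-pr hook also releases the active monitor lock (omitted here).
})
```

Keying both registries by task type is what lets the framework call `runCompletionHook` without importing anything from src/commands/autofix-pr/.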
--- a/src/commands/autofix-pr/tests/extractAutofixResult.test.ts
+++ b/src/commands/autofix-pr/tests/extractAutofixResult.test.ts
@@ -0,0 +1,133 @@
+import { describe, expect, test } from 'bun:test'
+import type { SDKMessage } from '../../../entrypoints/agentSdkTypes.js'
+import {
+  AUTOFIX_RESULT_TAG,
+  extractAutofixResultFromLog,
+} from '../extractAutofixResult.js'
+
+function hookProgressMessage(stdout: string): SDKMessage {
+  return {
+    type: 'system',
+    subtype: 'hook_progress',
+    stdout,
+  } as unknown as SDKMessage
+}
+
+function assistantTextMessage(text: string): SDKMessage {
+  return {
+    type: 'assistant',
+    message: {
+      content: [{ type: 'text', text }],
+    },
+  } as unknown as SDKMessage
+}
+
+const sampleTag = (summary: string): string =>
+  `<${AUTOFIX_RESULT_TAG}>
+  <pr-number>42</pr-number>
+  <commits-pushed>
+    <commit sha="abc123">${summary}</commit>
+  </commits-pushed>
+  <ci-status>green</ci-status>
+  <summary>${summary}</summary>
+</${AUTOFIX_RESULT_TAG}>`
+
+describe('extractAutofixResultFromLog', () => {
+  test('returns null on empty log', () => {
+    expect(extractAutofixResultFromLog([])).toBeNull()
+  })
+
+  test('returns null when no tag present', () => {
+    const log = [
+      assistantTextMessage('just some normal text without the tag'),
+      hookProgressMessage('hook output without tag'),
+    ]
+    expect(extractAutofixResultFromLog(log)).toBeNull()
+  })
+
+  test('extracts from hook stdout', () => {
+    const tag = sampleTag('fixed lint error')
+    const log = [hookProgressMessage(`prefix\n${tag}\nsuffix`)]
+    const result = extractAutofixResultFromLog(log)
+    expect(result).toBe(tag)
+  })
+
+  test('extracts from assistant text', () => {
+    const tag = sampleTag('typecheck fixed')
+    const log = [assistantTextMessage(`Done!\n${tag}`)]
+    expect(extractAutofixResultFromLog(log)).toBe(tag)
+  })
+
+  test('extracts from hook_response subtype too', () => {
+    const tag = sampleTag('via hook_response')
+    const log = [
+      {
+        type: 'system',
+        subtype: 'hook_response',
+        stdout: tag,
+      } as unknown as SDKMessage,
+    ]
+    expect(extractAutofixResultFromLog(log)).toBe(tag)
+  })
+
+  test('returns the latest tag when multiple appear in different messages', () => {
+    const older = sampleTag('older attempt')
+    const newer = sampleTag('newer attempt')
+    const log = [
+      assistantTextMessage(`first try\n${older}`),
+      assistantTextMessage(`retry\n${newer}`),
+    ]
+    expect(extractAutofixResultFromLog(log)).toBe(newer)
+  })
+
+  test('returns null when open tag exists but close tag is missing (truncated)', () => {
+    const log = [
+      assistantTextMessage(
+        `<${AUTOFIX_RESULT_TAG}>\n<summary>got cut off mid-write...`,
+      ),
+    ]
+    expect(extractAutofixResultFromLog(log)).toBeNull()
+  })
+
+  test('returns earlier complete tag when latest open tag is truncated within the same block', () => {
+    // Retry scenario: a full result was emitted, then a second result tag
+    // started but got cut off. We should surface the earlier complete pair
+    // rather than dropping the whole block.
+    const complete = sampleTag('earlier complete result')
+    const truncated = `<${AUTOFIX_RESULT_TAG}>\n<summary>truncated retry...`
+    const log = [assistantTextMessage(`${complete}\n${truncated}`)]
+    expect(extractAutofixResultFromLog(log)).toBe(complete)
+  })
+
+  test('walks backwards so hook stdout from later in log wins over earlier assistant text', () => {
+    const earlier = sampleTag('via assistant first')
+    const later = sampleTag('via hook later')
+    const log = [
+      assistantTextMessage(`some output\n${earlier}`),
+      hookProgressMessage(later),
+    ]
+    expect(extractAutofixResultFromLog(log)).toBe(later)
+  })
+
+  test('ignores tag-shaped strings that span across messages (no concatenation)', () => {
+    // Open tag in one message, close tag in another — should NOT be stitched.
+    const log = [
+      assistantTextMessage(`<${AUTOFIX_RESULT_TAG}>\n<summary>part 1`),
+      assistantTextMessage(`part 2</summary>\n</${AUTOFIX_RESULT_TAG}>`),
+    ]
+    expect(extractAutofixResultFromLog(log)).toBeNull()
+  })
+
+  test('skips assistant content that is a string (not block array)', () => {
+    // Some SDK paths emit assistant content as a raw string instead of
+    // a content-block array. Current implementation skips those — verify
+    // graceful no-op rather than crash.
+    const log = [
+      {
+        type: 'assistant',
+        message: { content: sampleTag('string content') },
+      } as unknown as SDKMessage,
+    ]
+    expect(extractAutofixResultFromLog(log)).toBeNull()
+  })
+})
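The backward walk described in the extractBetween fix, together with the per-message scan these tests pin down, can be sketched as follows. This is a minimal sketch, not the code in extractAutofixResult.ts: `LogMessage` and `TextBlock` are hypothetical, simplified stand-ins for SDKMessage, `extractLatestResult` is an illustrative name, and searching each text block on its own is an assumption consistent with the no-stitching test.

```typescript
// Sketch only: simplified stand-in for extractAutofixResultFromLog.
// LogMessage, TextBlock and extractLatestResult are hypothetical names.
type TextBlock = { type: string; text?: string }
type LogMessage =
  | { type: 'system'; subtype: string; stdout?: string }
  | { type: 'assistant'; message: { content: string | TextBlock[] } }

const AUTOFIX_RESULT_TAG = 'autofix-result'

// Return the latest complete <tag>...</tag> pair in `text`. Open tags are
// visited from the end, so a truncated trailing tag does not hide an
// earlier complete pair.
function extractBetween(text: string, tag: string): string | null {
  const open = `<${tag}>`
  const close = `</${tag}>`
  let start = text.lastIndexOf(open)
  while (start !== -1) {
    const end = text.indexOf(close, start + open.length)
    if (end !== -1) return text.slice(start, end + close.length)
    if (start === 0) break // lastIndexOf(open, -1) would find index 0 again
    start = text.lastIndexOf(open, start - 1)
  }
  return null
}

// Collect the text a message can carry: hook stdout or assistant text blocks.
// String-valued assistant content is skipped rather than parsed.
function textsOf(msg: LogMessage): string[] {
  if (msg.type === 'system') {
    const isHook = msg.subtype === 'hook_progress' || msg.subtype === 'hook_response'
    return isHook && typeof msg.stdout === 'string' ? [msg.stdout] : []
  }
  const { content } = msg.message
  if (!Array.isArray(content)) return []
  return content.flatMap(b => (b.type === 'text' && typeof b.text === 'string' ? [b.text] : []))
}

// Walk messages newest-first; each text is searched on its own, so an open
// tag in one message is never stitched to a close tag in another.
function extractLatestResult(log: LogMessage[]): string | null {
  for (let i = log.length - 1; i >= 0; i--) {
    const texts = textsOf(log[i]!)
    for (let j = texts.length - 1; j >= 0; j--) {
      const hit = extractBetween(texts[j]!, AUTOFIX_RESULT_TAG)
      if (hit !== null) return hit
    }
  }
  return null
}
```

Walking open tags from the end keeps the "latest wins" rule while still recovering an earlier complete result when the newest attempt was cut off.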
--- a/src/commands/autofix-pr/tests/launchAutofixPr.test.ts
+++ b/src/commands/autofix-pr/tests/launchAutofixPr.test.ts
@@ -46,7 +46,7 @@ mock.module('src/utils/teleport.js', () => ({
 }))

 const registerMock = mock(() => ({
-  taskId: 'task-abc',
+  taskId: 'framework-task-id',
  sessionId: 'session-123',
  cleanup: () => {},
 }))
@@ -56,14 +56,41 @@ const checkEligibilityMock = mock(() =>
 const getSessionUrlMock = mock(
  (id: string) => `https://claude.ai/session/${id}`,
 )
+const registerCompletionHookMock = mock<
+  (taskType: string, hook: (taskId: string, metadata?: unknown) => void) => void
+>(() => {})
+const registerCompletionCheckerMock = mock<
+  (
+    taskType: string,
+    checker: (metadata?: unknown) => Promise<string | null>,
+  ) => void
+>(() => {})
+const registerContentExtractorMock = mock<
+  (taskType: string, extractor: (log: unknown[]) => string | null) => void
+>(() => {})

 mock.module('src/tasks/RemoteAgentTask/RemoteAgentTask.js', () => ({
  checkRemoteAgentEligibility: checkEligibilityMock,
  registerRemoteAgentTask: registerMock,
+  registerCompletionHook: registerCompletionHookMock,
+  registerCompletionChecker: registerCompletionCheckerMock,
+  registerContentExtractor: registerContentExtractorMock,
  getRemoteTaskSessionUrl: getSessionUrlMock,
  formatPreconditionError: (e: { type: string }) => e.type,
 }))

+const fetchPrHeadShaMock = mock<
+  (owner: string, repo: string, prNumber: number) => Promise<string | null>
+>(() => Promise.resolve('sha-baseline-abc123'))
+
+// Mock prFetch.ts (gh CLI spawn layer) — keeping the pure decision matrix
+// in prOutcomeCheck.ts unmocked so its tests are unaffected by this file's
+// process-global mock.module pollution.
+mock.module('src/commands/autofix-pr/prFetch.js', () => ({
+  fetchPrHeadSha: fetchPrHeadShaMock,
+  checkPrAutofixOutcome: mock(() => Promise.resolve({ completed: false })),
+}))
+
 const detectRepoMock = mock(() =>
  Promise.resolve({ host: 'github.com', owner: 'acme', name: 'myrepo' }),
 )
@@ -375,6 +402,326 @@ describe('callAutofixPr', () => {
  })
 })

+// Regression suite for the taskId-mismatch latent bug + completion hook wiring.
+// Before this fix, createAutofixTeammate generated a teammate UUID, that UUID
+// was used to acquire the singleton monitor lock, and registerRemoteAgentTask
+// generated a *different* framework taskId. When the framework eventually
+// called clearActiveMonitor(frameworkTaskId) on natural completion, the guard
+// failed (active.taskId !== frameworkTaskId) and the lock stayed acquired,
+// blocking any subsequent /autofix-pr invocations in the same process.
+describe('callAutofixPr · completion hook wiring (taskId mismatch regression)', () => {
+  test('updateActiveMonitor swaps lock taskId to framework-assigned id after register', async () => {
+    await callAutofixPr(onDone, makeContext(), '42')
+    const monitor = getActiveMonitor() as { taskId: string } | null
+    expect(monitor).not.toBeNull()
+    // registerMock returns 'framework-task-id'; before the fix this would be
+    // a teammate-generated random UUID instead.
+    expect(monitor?.taskId).toBe('framework-task-id')
+  })
+
+  test('framework hook → clearActiveMonitor releases lock on natural completion', async () => {
+    await callAutofixPr(onDone, makeContext(), '42')
+    expect(getActiveMonitor()).not.toBeNull()
+
+    // Find the hook the module registered at import time. We grab the last
+    // call so re-imports across tests don't break this — only the most recent
+    // registration is what the framework would invoke now.
+    const calls = registerCompletionHookMock.mock.calls
+    expect(calls.length).toBeGreaterThan(0)
+    const lastCall = calls[calls.length - 1]
+    expect(lastCall?.[0]).toBe('autofix-pr')
+    const hook = lastCall?.[1] as (id: string, metadata?: unknown) => void
+    expect(typeof hook).toBe('function')
+
+    // Simulate the framework invoking the hook with the framework taskId
+    // after a terminal transition. Before the fix this would no-op against
+    // a lock keyed by the teammate UUID.
+    hook('framework-task-id', { owner: 'acme', repo: 'myrepo', prNumber: 42 })
+    expect(getActiveMonitor()).toBeNull()
+  })
+
+  test('subsequent /autofix-pr succeeds after framework hook clears the lock', async () => {
+    await callAutofixPr(onDone, makeContext(), '42')
+    // Simulate natural completion via the registered hook
+    const calls = registerCompletionHookMock.mock.calls
+    const hook = calls[calls.length - 1]?.[1] as (
+      id: string,
+      metadata?: unknown,
+    ) => void
+    hook('framework-task-id', { owner: 'acme', repo: 'myrepo', prNumber: 42 })
+
+    onDone.mockClear()
+    await callAutofixPr(onDone, makeContext(), '99')
+    const firstArg = onDone.mock.calls[0]?.[0] as string
+    // Should be the success path, not "already monitoring"
+    expect(firstArg).not.toMatch(/already monitoring/i)
+    expect(firstArg).toMatch(/Autofix launched/)
+  })
+})
+
+// Phase 2: completionChecker wiring + initialHeadSha capture
+describe('callAutofixPr · Phase 2 completionChecker integration', () => {
+  test('completionChecker is registered at module load with autofix-pr type', () => {
+    // The registration happens during the beforeAll dynamic import; just
+    // verify the mock recorded a call. Filter by task type so any future
+    // additional registrations elsewhere don't break this assertion.
+    const calls = registerCompletionCheckerMock.mock.calls.filter(
+      c => c[0] === 'autofix-pr',
+    )
+    expect(calls.length).toBeGreaterThan(0)
+    const hook = calls[calls.length - 1]?.[1]
+    expect(typeof hook).toBe('function')
+  })
+
+  test('callAutofixPr captures initialHeadSha via fetchPrHeadSha', async () => {
+    fetchPrHeadShaMock.mockClear()
+    await callAutofixPr(onDone, makeContext(), '42')
+    expect(fetchPrHeadShaMock).toHaveBeenCalledWith('acme', 'myrepo', 42)
+  })
+
+  test('initialHeadSha is passed into remoteTaskMetadata on register', async () => {
+    fetchPrHeadShaMock.mockImplementationOnce(() =>
+      Promise.resolve('sha-from-launch'),
+    )
+    await callAutofixPr(onDone, makeContext(), '42')
+    expect(registerMock).toHaveBeenCalledWith(
+      expect.objectContaining({
+        remoteTaskMetadata: expect.objectContaining({
+          owner: 'acme',
+          repo: 'myrepo',
+          prNumber: 42,
+          initialHeadSha: 'sha-from-launch',
+        }),
+      }),
+    )
+  })
+
+  test('fetchPrHeadSha failure → metadata initialHeadSha undefined, launch still succeeds', async () => {
+    fetchPrHeadShaMock.mockImplementationOnce(() =>
+      Promise.reject(new Error('gh not installed')),
+    )
+    await callAutofixPr(onDone, makeContext(), '42')
+    expect(registerMock).toHaveBeenCalledWith(
+      expect.objectContaining({
+        remoteTaskMetadata: expect.objectContaining({
+          owner: 'acme',
+          repo: 'myrepo',
+          prNumber: 42,
+          initialHeadSha: undefined,
+        }),
+      }),
+    )
+    // Launch must NOT fail just because SHA capture failed
+    const firstArg = onDone.mock.calls[0]?.[0] as string
+    expect(firstArg).toMatch(/Autofix launched/)
+  })
+
+  test('fetchPrHeadSha returning null → metadata initialHeadSha undefined', async () => {
+    fetchPrHeadShaMock.mockImplementationOnce(() => Promise.resolve(null))
+    await callAutofixPr(onDone, makeContext(), '42')
+    expect(registerMock).toHaveBeenCalledWith(
+      expect.objectContaining({
+        remoteTaskMetadata: expect.objectContaining({
+          initialHeadSha: undefined,
+        }),
+      }),
+    )
+  })
+})
+
+// Phase 2 (cont.): exercise the registered completionChecker arrow body
+// directly. The earlier suite verifies it was registered but never invokes
+// the arrow itself, leaving the throttle / metadata-guard / gh-CLI dispatch
+// branches uncovered.
+describe('callAutofixPr · Phase 2 completionChecker arrow body', () => {
+  // Pull the most recent registered checker — beforeAll registers once at
+  // module load; nothing else re-registers across this file's tests.
+  function getChecker(): (metadata?: unknown) => Promise<string | null> {
+    const calls = registerCompletionCheckerMock.mock.calls.filter(
+      c => c[0] === 'autofix-pr',
+    )
+    const fn = calls[calls.length - 1]?.[1]
+    if (typeof fn !== 'function') {
+      throw new Error('completionChecker not registered')
+    }
+    return fn
+  }
+
+  test('returns null when metadata is undefined (early guard)', async () => {
+    const checker = getChecker()
+    expect(await checker(undefined)).toBeNull()
+  })
+
+  test('returns null when checkPrAutofixOutcome reports not completed', async () => {
+    const { checkPrAutofixOutcome } = await import('../prFetch.js')
+    ;(checkPrAutofixOutcome as ReturnType<typeof mock>).mockImplementationOnce(
+      () => Promise.resolve({ completed: false }),
+    )
+    const checker = getChecker()
+    // Distinct PR number to dodge the in-process throttle map carried over
+    // from earlier tests.
+    const result = await checker({
+      owner: 'acme',
+      repo: 'myrepo',
+      prNumber: 1001,
+    })
+    expect(result).toBeNull()
+  })
+
+  test('returns the summary string when checkPrAutofixOutcome reports completed', async () => {
+    const { checkPrAutofixOutcome } = await import('../prFetch.js')
+    ;(checkPrAutofixOutcome as ReturnType<typeof mock>).mockImplementationOnce(
+      () =>
+        Promise.resolve({
+          completed: true,
+          summary: 'acme/myrepo#1002 merged. Autofix monitoring complete.',
+        }),
+    )
+    const checker = getChecker()
+    const result = await checker({
+      owner: 'acme',
+      repo: 'myrepo',
+      prNumber: 1002,
+    })
+    expect(result).toBe('acme/myrepo#1002 merged. Autofix monitoring complete.')
+  })
+
+  test('passes initialHeadSha through to checkPrAutofixOutcome', async () => {
+    const { checkPrAutofixOutcome } = await import('../prFetch.js')
+    const checkMock = checkPrAutofixOutcome as ReturnType<typeof mock>
+    checkMock.mockClear()
+    checkMock.mockImplementationOnce(() =>
+      Promise.resolve({ completed: false }),
+    )
+    const checker = getChecker()
+    await checker({
+      owner: 'acme',
+      repo: 'myrepo',
+      prNumber: 1003,
+      initialHeadSha: 'sha-baseline-xyz',
+    })
+    expect(checkMock).toHaveBeenCalledWith({
+      owner: 'acme',
+      repo: 'myrepo',
+      prNumber: 1003,
+      initialHeadSha: 'sha-baseline-xyz',
+    })
+  })
+
+  test('throttles back-to-back calls for the same PR within CHECK_INTERVAL_MS', async () => {
+    const { checkPrAutofixOutcome } = await import('../prFetch.js')
+    const checkMock = checkPrAutofixOutcome as ReturnType<typeof mock>
+    checkMock.mockClear()
+    checkMock.mockImplementation(() => Promise.resolve({ completed: false }))
+    const checker = getChecker()
+    const meta = { owner: 'acme', repo: 'myrepo', prNumber: 1004 }
+    await checker(meta)
+    // Second call within the 5s throttle window must short-circuit to null
+    // without invoking the gh CLI layer again.
+    const callCountAfterFirst = checkMock.mock.calls.length
+    const result = await checker(meta)
+    expect(result).toBeNull()
+    expect(checkMock.mock.calls.length).toBe(callCountAfterFirst)
+  })
+
+  test('completionHook with metadata clears the throttle entry (re-launch can re-check immediately)', async () => {
+    const { checkPrAutofixOutcome } = await import('../prFetch.js')
+    const checkMock = checkPrAutofixOutcome as ReturnType<typeof mock>
+    checkMock.mockClear()
+    checkMock.mockImplementation(() => Promise.resolve({ completed: false }))
+    const checker = getChecker()
+    const meta = { owner: 'acme', repo: 'myrepo', prNumber: 1005 }
+    await checker(meta) // populate throttle map
+
+    // Invoke the registered completion hook with the same metadata so the
+    // throttle entry is wiped, then verify the next checker call dispatches
+    // gh CLI again instead of short-circuiting.
+    const hookCalls = registerCompletionHookMock.mock.calls.filter(
+      c => c[0] === 'autofix-pr',
+    )
+    const hook = hookCalls[hookCalls.length - 1]?.[1] as (
+      id: string,
+      metadata?: unknown,
+    ) => void
+    hook('any-task-id', meta)
+
+    const callCountBefore = checkMock.mock.calls.length
+    await checker(meta)
+    expect(checkMock.mock.calls.length).toBe(callCountBefore + 1)
+  })
+
+  test('completionHook without metadata still clears the active monitor lock', async () => {
+    // Lock is set via callAutofixPr; hook then invoked with undefined metadata
+    // to exercise the `if (meta)` short-circuit branch (the lock-clear half
+    // still has to run regardless of metadata presence).
+    await callAutofixPr(onDone, makeContext(), '42')
+    expect(getActiveMonitor()).not.toBeNull()
+    const hookCalls = registerCompletionHookMock.mock.calls.filter(
+      c => c[0] === 'autofix-pr',
+    )
+    const hook = hookCalls[hookCalls.length - 1]?.[1] as (
+      id: string,
+      metadata?: unknown,
+    ) => void
+    hook('framework-task-id', undefined)
+    expect(getActiveMonitor()).toBeNull()
+  })
+})
+
+// Phase 3: content extractor wiring + initialMessage tag instruction
+describe('callAutofixPr · Phase 3 content extractor integration', () => {
+  test('registerContentExtractor is called at module load with autofix-pr type', () => {
+    const calls = registerContentExtractorMock.mock.calls.filter(
+      c => c[0] === 'autofix-pr',
+    )
+    expect(calls.length).toBeGreaterThan(0)
+    const extractor = calls[calls.length - 1]?.[1]
+    expect(typeof extractor).toBe('function')
+  })
+
+  test('initialMessage instructs the remote agent to emit an <autofix-result> tag', async () => {
+    await callAutofixPr(onDone, makeContext(), '42')
+    // teleportMock's typed signature has no args, so calls[0] is a
+    // zero-length tuple. We know teleportToRemote is invoked with one
+    // options object, so double-cast through unknown to read the args.
+    const calls = teleportMock.mock.calls as unknown as Array<
+      [{ initialMessage?: string }]
+    >
+    const teleportArgs = calls[0]?.[0]
+    expect(teleportArgs?.initialMessage).toContain('<autofix-result>')
+    expect(teleportArgs?.initialMessage).toContain('</autofix-result>')
+    expect(teleportArgs?.initialMessage).toContain('<ci-status>')
+    expect(teleportArgs?.initialMessage).toContain('<summary>')
+  })
+
+  test('registered extractor returns string for valid log and null for empty', () => {
+    const calls = registerContentExtractorMock.mock.calls.filter(
+      c => c[0] === 'autofix-pr',
+    )
+    const extractor = calls[calls.length - 1]?.[1] as
+      | ((log: unknown[]) => string | null)
+      | undefined
+    expect(extractor).toBeDefined()
+    // Empty log → null
+    expect(extractor?.([])).toBeNull()
+    // Log with assistant text containing tag → returns it
+    const logWithTag = [
+      {
+        type: 'assistant',
+        message: {
+          content: [
+            {
+              type: 'text',
+              text: 'done\n<autofix-result><summary>x</summary></autofix-result>',
+            },
+          ],
+        },
+      },
+    ]
+    expect(extractor?.(logWithTag)).toContain('<autofix-result>')
+  })
+})
+
 // Cover ../index.ts load() — placed in this test file so all the heavy mocks
 // (teleport / detectRepository / RemoteAgentTask / bootstrap-state / analytics /
 // skillDetect) are already registered when load() dynamically imports
--- a/src/commands/autofix-pr/tests/monitorState.test.ts
+++ b/src/commands/autofix-pr/tests/monitorState.test.ts
@@ -5,6 +5,7 @@ import {
  isMonitoring,
  setActiveMonitor,
  trySetActiveMonitor,
+  updateActiveMonitor,
 } from '../monitorState.js'

 function makeState(
@@ -76,4 +77,41 @@ describe('monitorState', () => {
    // First state remains
    expect(getActiveMonitor()?.prNumber).toBe(1)
  })
+
+  test('updateActiveMonitor returns false when no active monitor', () => {
+    expect(updateActiveMonitor({ taskId: 'task-x' })).toBe(false)
+    expect(getActiveMonitor()).toBeNull()
+  })
+
+  test('updateActiveMonitor merges partial fields into the active monitor', () => {
+    setActiveMonitor(makeState({ taskId: 'tentative-uuid' }))
+    expect(updateActiveMonitor({ taskId: 'framework-task-id' })).toBe(true)
+    const after = getActiveMonitor()
+    expect(after?.taskId).toBe('framework-task-id')
+    // Other fields untouched
+    expect(after?.owner).toBe('acme')
+    expect(after?.repo).toBe('myrepo')
+    expect(after?.prNumber).toBe(42)
+  })
+
+  test('updateActiveMonitor with new taskId makes clearActiveMonitor recognise framework taskId', () => {
+    // Reproduce the latent bug scenario: lock acquired with one taskId,
+    // framework assigns a different one. Before the fix, the framework's
+    // clearActiveMonitor(frameworkTaskId) would no-op because guard fails.
+    setActiveMonitor(makeState({ taskId: 'teammate-uuid' }))
+    // Framework cleanup using its own taskId — would fail guard before the fix
+    clearActiveMonitor('framework-uuid')
+    expect(getActiveMonitor()).not.toBeNull()
+    // After updateActiveMonitor swaps the taskId, framework cleanup works
+    updateActiveMonitor({ taskId: 'framework-uuid' })
+    clearActiveMonitor('framework-uuid')
+    expect(getActiveMonitor()).toBeNull()
+  })
+
+  test('updateActiveMonitor does not change abortController identity', () => {
+    const ac = new AbortController()
+    setActiveMonitor(makeState({ abortController: ac, taskId: 'tentative' }))
+    updateActiveMonitor({ taskId: 'updated' })
+    expect(getActiveMonitor()?.abortController).toBe(ac)
+  })
 })
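The lock semantics these tests check (guarded release by taskId, and a partial update that re-keys the lock without replacing its AbortController) can be sketched as below. This is a minimal sketch, not monitorState.ts: the field names follow the makeState fixtures, while the exact `MonitorState` shape and the boolean return of `trySetActiveMonitor` are assumptions.

```typescript
// Sketch only: singleton monitor lock. MonitorState's exact shape and the
// boolean return of trySetActiveMonitor are assumptions.
interface MonitorState {
  taskId: string
  owner: string
  repo: string
  prNumber: number
  abortController: AbortController
}

let active: MonitorState | null = null

function getActiveMonitor(): MonitorState | null {
  return active
}

// Acquire the singleton lock; refuse if another monitor already holds it.
function trySetActiveMonitor(state: MonitorState): boolean {
  if (active !== null) return false
  active = state
  return true
}

// Merge fields into the held lock in one synchronous assignment, so no caller
// observes the lock missing or half-updated. Fields not in `partial`,
// including abortController, keep their identity.
function updateActiveMonitor(partial: Partial<MonitorState>): boolean {
  if (active === null) return false
  active = { ...active, ...partial }
  return true
}

// Guarded release: only the taskId that currently owns the lock may clear it.
function clearActiveMonitor(taskId: string): void {
  if (active?.taskId === taskId) active = null
}
```

In the launch flow the lock is taken with the teammate UUID before the remote task exists; once registerRemoteAgentTask returns, `updateActiveMonitor({ taskId })` re-keys it so the framework's later `clearActiveMonitor(frameworkTaskId)` passes the guard.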
--- a/src/commands/autofix-pr/tests/prOutcomeCheck.test.ts
+++ b/src/commands/autofix-pr/tests/prOutcomeCheck.test.ts
@@ -0,0 +1,193 @@
+import { describe, expect, test } from 'bun:test'
+import {
+  type PrViewPayload,
+  summariseAutofixOutcome,
+} from '../prOutcomeCheck.js'
+
+function basePayload(overrides: Partial<PrViewPayload> = {}): PrViewPayload {
+  return {
+    headRefOid: 'sha-baseline',
+    state: 'OPEN',
+    statusCheckRollup: [],
+    ...overrides,
+  }
+}
+
+const identity = (overrides: Partial<{ initialHeadSha: string }> = {}) => ({
+  owner: 'acme',
+  repo: 'myrepo',
+  prNumber: 42,
+  initialHeadSha: 'sha-baseline',
+  ...overrides,
+})
+
+describe('summariseAutofixOutcome · terminal PR states', () => {
+  test('MERGED → completed regardless of head SHA / CI', () => {
+    const result = summariseAutofixOutcome(
+      basePayload({ state: 'MERGED', headRefOid: 'sha-baseline' }),
+      identity(),
+    )
+    expect(result).toEqual({
+      completed: true,
+      summary: 'acme/myrepo#42 merged. Autofix monitoring complete.',
+    })
+  })
+
+  test('CLOSED → completed regardless of head SHA / CI', () => {
+    const result = summariseAutofixOutcome(
+      basePayload({ state: 'CLOSED' }),
+      identity(),
+    )
+    expect(result).toEqual({
+      completed: true,
+      summary:
+        'acme/myrepo#42 closed without merge. Autofix monitoring complete.',
+    })
+  })
+})
+
+describe('summariseAutofixOutcome · OPEN PR without push', () => {
+  test('no initialHeadSha baseline → not completed (cannot detect push)', () => {
+    const result = summariseAutofixOutcome(
+      basePayload({ state: 'OPEN' }),
+      identity({ initialHeadSha: undefined as unknown as string }),
+    )
+    expect(result).toEqual({ completed: false })
+  })
+
+  test('headRefOid unchanged → not completed (autofix has not pushed yet)', () => {
+    const result = summariseAutofixOutcome(
+      basePayload({ state: 'OPEN', headRefOid: 'sha-baseline' }),
+      identity(),
+    )
+    expect(result).toEqual({ completed: false })
+  })
+})
+
+describe('summariseAutofixOutcome · OPEN PR with push, CI variations', () => {
+  test('push detected + no checks configured → completed (success)', () => {
+    const result = summariseAutofixOutcome(
+      basePayload({
+        state: 'OPEN',
+        headRefOid: 'sha-new',
+        statusCheckRollup: [],
+      }),
+      identity(),
+    )
+    expect(result).toEqual({
+      completed: true,
+      summary: 'Autofix pushed commits to acme/myrepo#42, CI green.',
+    })
+  })
+
+  test('push detected + CI pending → not completed (wait for CI)', () => {
+    const result = summariseAutofixOutcome(
+      basePayload({
+        state: 'OPEN',
+        headRefOid: 'sha-new',
+        statusCheckRollup: [
+          { status: 'IN_PROGRESS', conclusion: null, name: 'ci' },
+          { status: 'COMPLETED', conclusion: 'SUCCESS', name: 'lint' },
+        ],
+      }),
+      identity(),
+    )
+    expect(result).toEqual({ completed: false })
+  })
+
+  test('push detected + CI all green → completed (success summary)', () => {
+    const result = summariseAutofixOutcome(
+      basePayload({
+        state: 'OPEN',
+        headRefOid: 'sha-new',
+        statusCheckRollup: [
+          { status: 'COMPLETED', conclusion: 'SUCCESS', name: 'ci' },
+          { status: 'COMPLETED', conclusion: 'SUCCESS', name: 'lint' },
+        ],
+      }),
+      identity(),
+    )
+    expect(result.completed).toBe(true)
+    if (result.completed) {
+      expect(result.summary).toContain('CI green')
+      expect(result.summary).toContain('acme/myrepo#42')
+    }
+  })
+
+  test('push detected + CI red → completed (failure summary surfaces the red)', () => {
+    const result = summariseAutofixOutcome(
+      basePayload({
+        state: 'OPEN',
+        headRefOid: 'sha-new',
+        statusCheckRollup: [
+          { status: 'COMPLETED', conclusion: 'FAILURE', name: 'ci' },
+          { status: 'COMPLETED', conclusion: 'SUCCESS', name: 'lint' },
+        ],
+      }),
+      identity(),
+    )
+    expect(result.completed).toBe(true)
+    if (result.completed) {
+      expect(result.summary).toContain('CI is failing')
+      expect(result.summary).toContain('1/2 checks failing')
+    }
+  })
+
+  test('statusCheckRollup undefined → treated as no checks configured (success)', () => {
+    // Distinct from empty-array: GitHub omits the field entirely on PRs
+    // without any configured checks. The !rollup branch covers undefined.
+    const result = summariseAutofixOutcome(
+      basePayload({
+        state: 'OPEN',
+        headRefOid: 'sha-new',
+        statusCheckRollup: undefined,
+      }),
+      identity(),
+    )
+    expect(result.completed).toBe(true)
+    if (result.completed) {
+      expect(result.summary).toContain('CI green')
+    }
+  })
+
+  test('check with COMPLETED status but empty conclusion → counted as pending', () => {
+    // Edge case: GitHub sometimes reports a check as COMPLETED with a null/
+    // missing conclusion (in-flight result mid-write). The defensive branch
+    // treats a COMPLETED check with an empty conclusion as pending.
+    const result = summariseAutofixOutcome(
+      basePayload({
+        state: 'OPEN',
+        headRefOid: 'sha-new',
+        statusCheckRollup: [
+          { status: 'COMPLETED', conclusion: null, name: 'ci-in-flight' },
+          { status: 'COMPLETED', conclusion: 'SUCCESS', name: 'lint' },
+        ],
+      }),
+      identity(),
+    )
+    expect(result).toEqual({ completed: false })
+  })
+
+  test('neutral / skipped conclusions count as success (not failure)', () => {
+    const result = summariseAutofixOutcome(
+      basePayload({
+        state: 'OPEN',
+        headRefOid: 'sha-new',
+        statusCheckRollup: [
+          {
+            status: 'COMPLETED',
+            conclusion: 'NEUTRAL',
+            name: 'optional-check',
+          },
+          { status: 'COMPLETED', conclusion: 'SKIPPED', name: 'docs-check' },
+          { status: 'COMPLETED', conclusion: 'SUCCESS', name: 'ci' },
+        ],
+      }),
+      identity(),
+    )
+    expect(result.completed).toBe(true)
+    if (result.completed) {
+      expect(result.summary).toContain('CI green')
+    }
+  })
+})
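The Phase 2 decision matrix these tests cover can be written as a short pure function. This is a minimal sketch, not summariseAutofixOutcome itself: the payload and identity shapes follow the test fixtures, while `summariseOutcomeSketch`, the `PASSING` set (only SUCCESS, NEUTRAL and SKIPPED pass; any other completed conclusion fails), checking pending before failing, and the full failure-summary wording (the tests only pin "CI is failing" and "1/2 checks failing") are assumptions.

```typescript
// Sketch only: Phase 2 decision matrix. summariseOutcomeSketch, PASSING,
// the pending-before-failing order and the failure wording are assumptions.
type CheckRun = { status: string; conclusion: string | null; name: string }
interface PrSnapshot {
  headRefOid: string
  state: 'OPEN' | 'MERGED' | 'CLOSED'
  statusCheckRollup?: CheckRun[]
}
interface PrIdentity {
  owner: string
  repo: string
  prNumber: number
  initialHeadSha?: string
}
type Outcome = { completed: false } | { completed: true; summary: string }

const PASSING = new Set(['SUCCESS', 'NEUTRAL', 'SKIPPED'])

function summariseOutcomeSketch(pr: PrSnapshot, id: PrIdentity): Outcome {
  const ref = `${id.owner}/${id.repo}#${id.prNumber}`
  // Terminal PR states end monitoring regardless of head SHA or CI.
  if (pr.state === 'MERGED') {
    return { completed: true, summary: `${ref} merged. Autofix monitoring complete.` }
  }
  if (pr.state === 'CLOSED') {
    return { completed: true, summary: `${ref} closed without merge. Autofix monitoring complete.` }
  }
  // OPEN: a push is only detectable against a baseline SHA.
  if (!id.initialHeadSha || pr.headRefOid === id.initialHeadSha) {
    return { completed: false }
  }
  // A missing rollup is treated like an empty one: no checks configured.
  const checks = pr.statusCheckRollup ?? []
  // Not COMPLETED yet, or COMPLETED without a conclusion: keep polling.
  if (checks.some(c => c.status !== 'COMPLETED' || !c.conclusion)) {
    return { completed: false }
  }
  const failing = checks.filter(c => !PASSING.has(c.conclusion ?? ''))
  if (failing.length > 0) {
    return {
      completed: true,
      summary: `Autofix pushed commits to ${ref}, but CI is failing (${failing.length}/${checks.length} checks failing).`,
    }
  }
  return { completed: true, summary: `Autofix pushed commits to ${ref}, CI green.` }
}
```

Keeping this function free of I/O is what allows the tests above to drive every branch from plain fixtures, while the gh CLI call stays in prFetch.ts.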