docs: documentation polish complete

2026-06-15 12:55:51 +00:00 · 2026-04-01 17:18:48 +08:00
parent 221fb6eb05
commit c57e6ee384
7 changed files with 1077 additions and 242 deletions
--- a/docs/context/system-prompt.mdx
+++ b/docs/context/system-prompt.mdx
@@ -1,36 +1,225 @@
 ---
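 title: "System Prompt Dynamic Assembly - Building the AI's Working Memory"
-description: "An in-depth look at how Claude Code dynamically assembles its System Prompt: how CLAUDE.md, project context, tool definitions, and user preferences are combined into the AI's working memory."
-keywords: ["System Prompt", "system prompts", "dynamic assembly", "CLAUDE.md", "context building"]
+description: "An in-depth look at how Claude Code dynamically assembles its System Prompt: caching strategy, the boundary marker, the Section registry, multi-level CLAUDE.md merging, and how scattered context is combined into a cache-friendly structure the API can consume."
+keywords: ["System Prompt", "system prompts", "dynamic assembly", "CLAUDE.md", "Prompt Cache", "caching strategy"]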
 ---

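-{/* Chapter goal: explain how the System Prompt is assembled and the design thinking behind it */}
+## From array to API call: the full System Prompt pipeline

-## What is a System Prompt
+In Claude Code the System Prompt is not a hard-coded block of text but a **`string[]` array** (the branded type `SystemPrompt`, defined at `src/utils/systemPromptType.ts:8`) that is assembled, split into blocks, and tagged for caching before being sent to the API.

-Every call to the AI API must send a System Prompt. It is the AI's "persona manual", telling the AI:
+### Three-stage pipeline

- Who you are (Claude Code, a programming assistant)
- What you can do (the list of available tools)
- What environment you are in (operating system, current directory, git status)
- What rules you must follow (security guidelines, output format)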
+```
+getSystemPrompt()            →  string[]          (assemble content)
+  ↓
+buildEffectiveSystemPrompt() →  SystemPrompt      (select by priority)
+  ↓
+buildSystemPromptBlocks()    →  TextBlockParam[]  (split into blocks + cache_control tags)
+```

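-## Not a static template, but dynamic assembly
+1. **`getSystemPrompt()`** (`src/constants/prompts.ts:444`): collects the static and dynamic sections and inserts the `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` marker between them
+2. **`buildEffectiveSystemPrompt()`** (`src/utils/systemPrompt.ts:41`): selects a prompt by priority, Override > Coordinator > Agent > Custom > Default
+3. **`buildSystemPromptBlocks()`** (`src/services/api/claude.ts:3214`): calls `splitSysPromptPrefix()` to split the prompt into blocks and attaches `cache_control` markers to the resulting blocks

-Claude Code's System Prompt is not a hard-coded block of text; based on the current environment, it is **assembled in real time**:
+## The SystemPrompt branded type

-<Frame caption="The 6 major components of the System Prompt">
-  <img src="/docs/images/system-prompt-assembly.png" alt="System Prompt dynamic assembly diagram" />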
-</Frame>
+```typescript
+// src/utils/systemPromptType.ts:8
+export type SystemPrompt = readonly string[] & {
+  readonly __brand: 'SystemPrompt'
+}
+export function asSystemPrompt(value: readonly string[]): SystemPrompt {
+  return value as SystemPrompt  // zero-cost type assertion
+}
+```

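-| Component | Content | Source |
-|----------|------|------|
-| Base persona | Role definition, behavioral guidelines | Built-in template |
-| Environment info | Operating system, shell type, current date | Runtime detection |
-| Git status | Current branch, recent commits, working tree status | `git` command output |
-| Project knowledge | CLAUDE.md file contents | Scan of the project directory hierarchy |
-| Memory files | User preferences, project conventions | Persistent memory system |
-| Tool descriptions | Description and parameters of each available tool | Tool registry |
+The branded type prevents a plain `string[]` from being passed to an API call by accident: a value only acquires the `SystemPrompt` type through an explicit `asSystemPrompt()` conversion.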
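A minimal sketch of how the brand rejects an unconverted array at compile time. The type and `asSystemPrompt()` are repeated from the snippet above so the block compiles on its own; `countPromptChars` is a hypothetical consumer standing in for an API call and is not taken from the source.

```typescript
// Same branded-type shape as in the diff, repeated so this block is self-contained.
type SystemPrompt = readonly string[] & { readonly __brand: 'SystemPrompt' }

function asSystemPrompt(value: readonly string[]): SystemPrompt {
  return value as SystemPrompt
}

// Hypothetical consumer: only accepts the branded type.
function countPromptChars(prompt: SystemPrompt): number {
  return prompt.reduce((total, part) => total + part.length, 0)
}

const raw: string[] = ['You are a CLI.', 'Be brief.']

// countPromptChars(raw)  // would not compile: string[] lacks the __brand property
const branded = asSystemPrompt(raw)
const totalChars = countPromptChars(branded)
```

The brand exists only in the type system; at runtime `branded` is the same array object as `raw`.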
+
+## getSystemPrompt(): the full picture of content assembly

+`getSystemPrompt()` (`src/constants/prompts.ts:444`) is the core factory function for the System Prompt and returns an ordered array:

+| Stage | Content | Caching strategy |
+|------|------|----------|
+| **Static zone** | Intro Section, System Rules, Doing Tasks, Actions, Using Tools, Tone & Style, Output Efficiency | Cacheable across organizations (`scope: 'global'`) |
+| **BOUNDARY** | `SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'` | Boundary marker (not sent to the API) |
+| **Dynamic zone** | Session Guidance, Memory, Model Override, Env Info, Language, Output Style, MCP Instructions, Scratchpad, FRC, Summarize Tool Results, Token Budget, Brief | Differs per session (`scope: 'org'` or uncached) |
+
+### The Section registry for the dynamic zone

+Dynamic-zone sections are registered through `systemPromptSection()` / `DANGEROUS_uncachedSystemPromptSection()`; both factory functions are defined in `src/constants/systemPromptSections.ts`:
+
+```typescript
+// Cached Section: computed once, recomputed only after /clear or /compact
+systemPromptSection('memory', () => loadMemoryPrompt())
+
+// Dangerous: recomputed on every turn, which breaks the Prompt Cache
+DANGEROUS_uncachedSystemPromptSection(
+  'mcp_instructions',
+  () => isMcpInstructionsDeltaEnabled() ? null : getMcpInstructionsSection(mcpClients),
+  'MCP servers connect/disconnect between turns'  // a reason for breaking the cache is required
+)
+```
+
+`resolveSystemPromptSections()` resolves all Sections on every query turn; for Sections with `cacheBreak: false` it prefers the cached value from `getSystemPromptSectionCache()`. Only genuinely dynamic content, such as MCP instructions, uses `DANGEROUS_uncachedSystemPromptSection`.
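The cached-versus-uncached behavior can be sketched as follows. This is a simplified, synchronous illustration: the `Section` shape, `sectionCache`, and `resolveSections` are hypothetical names, and the real resolver in `src/constants/systemPromptSections.ts` may differ (for example by being asynchronous).

```typescript
// Hypothetical section shape; cacheBreak = true means "recompute on every turn".
type Section = {
  name: string
  compute: () => string | null
  cacheBreak: boolean
}

// Stand-in for the per-session section cache.
const sectionCache: Record<string, string | null> = {}

function resolveSections(sections: Section[]): (string | null)[] {
  return sections.map(section => {
    // Cached sections reuse the stored value when one exists.
    if (!section.cacheBreak && section.name in sectionCache) {
      return sectionCache[section.name]
    }
    const value = section.compute()
    if (!section.cacheBreak) sectionCache[section.name] = value
    return value
  })
}

let memoryLoads = 0
let mcpLoads = 0
const sections: Section[] = [
  { name: 'memory', compute: () => `memory v${++memoryLoads}`, cacheBreak: false },
  { name: 'mcp_instructions', compute: () => `mcp v${++mcpLoads}`, cacheBreak: true },
]

const firstTurn = resolveSections(sections)
const secondTurn = resolveSections(sections)
```

On the second turn the `memory` text is byte-identical to the first, while `mcp_instructions` is recomputed; this is why uncached sections are marked as dangerous for cache hits.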
+
+### The `CLAUDE_CODE_SIMPLE` fast path

+When the environment variable `CLAUDE_CODE_SIMPLE` is truthy, the entire System Prompt shrinks to a single short string:
+
+```typescript
+`You are Claude Code, Anthropic's official CLI for Claude.\n\nCWD: ${getCwd()}\nDate: ${getSessionStartDate()}`
+```
+
+This skips all Section registration, cache splitting, and dynamic assembly; it is intended for test scenarios that minimize token consumption.
+
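+## buildEffectiveSystemPrompt(): five priority levels

+`buildEffectiveSystemPrompt()` (`src/utils/systemPrompt.ts:41`) decides which System Prompt is ultimately used:

+| Priority | Condition | Behavior |
+|--------|------|------|
+| **0. Override** | `overrideSystemPrompt` is non-empty | Full replacement; returns `[override]` |
+| **1. Coordinator** | `COORDINATOR_MODE` feature + environment variable | Uses the coordinator-specific prompt |
+| **2. Agent** | `mainThreadAgentDefinition` exists | Proactive mode: appended to the end of the default prompt; otherwise: replaces the default prompt |
+| **3. Custom** | Specified via the `--system-prompt` flag | Replaces the default prompt |
+| **4. Default** | No special condition | Uses the full output of `getSystemPrompt()` |

+`appendSystemPrompt` is always appended at the end (except with Override).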
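The priority table can be read as a chain of early exits. The sketch below is an illustration under assumptions: `PromptOptions` and `pickSystemPrompt` are hypothetical, coordinator mode is reduced to "a coordinator prompt was supplied", and the real signature of `buildEffectiveSystemPrompt()` is not shown in this diff.

```typescript
// Hypothetical option bag; field names follow the table, not the real signature.
type PromptOptions = {
  overrideSystemPrompt?: string
  coordinatorPrompt?: string // assumed to be set only when coordinator mode is active
  agentPrompt?: string
  agentIsProactive?: boolean
  customSystemPrompt?: string
  appendSystemPrompt?: string
  defaultSystemPrompt: string[]
}

function pickSystemPrompt(o: PromptOptions): string[] {
  // 0. Override replaces everything, and nothing is appended.
  if (o.overrideSystemPrompt) return [o.overrideSystemPrompt]

  let base: string[]
  if (o.coordinatorPrompt) {
    base = [o.coordinatorPrompt] // 1. Coordinator
  } else if (o.agentPrompt) {
    // 2. Agent: proactive agents extend the default prompt, others replace it.
    base = o.agentIsProactive ? [...o.defaultSystemPrompt, o.agentPrompt] : [o.agentPrompt]
  } else if (o.customSystemPrompt) {
    base = [o.customSystemPrompt] // 3. Custom
  } else {
    base = o.defaultSystemPrompt // 4. Default
  }
  return o.appendSystemPrompt ? [...base, o.appendSystemPrompt] : base
}

const plain = pickSystemPrompt({ defaultSystemPrompt: ['d'], appendSystemPrompt: 'a' })
const overridden = pickSystemPrompt({
  overrideSystemPrompt: 'o',
  appendSystemPrompt: 'a',
  defaultSystemPrompt: ['d'],
})
const proactive = pickSystemPrompt({
  agentPrompt: 'agent',
  agentIsProactive: true,
  defaultSystemPrompt: ['d'],
})
```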
+
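+## Caching strategy: splitting, tagging, hitting

+This is the most intricate part of the System Prompt design.

+### Anthropic Prompt Cache basics

+The Anthropic API's Prompt Cache lets requests reuse an identical System Prompt prefix, with cache hits billed at a rate far below the full input price. The cache key is determined by a Blake2b hash of the content, so changing any character invalidates the cache.

+### `splitSysPromptPrefix()`: three splitting modes

+`splitSysPromptPrefix()` (`src/utils/api.ts:321`) is the core of the caching strategy and picks one of three splitting modes depending on conditions: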
+
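+#### Mode 1: MCP tools present (`skipGlobalCacheForSystemPrompt=true`)

+```
+[attribution header]    → cacheScope: null     (not cached)
+[system prompt prefix]  → cacheScope: 'org'    (org-level cache)
+[everything else]       → cacheScope: 'org'    (org-level cache)
+```

+The MCP tool list can change during a session (servers connect or disconnect), which undermines the basis for cross-organization caching, so the scope is downgraded to org level.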
+
+#### Mode 2: Global Cache with Boundary present (1P only)

+```
+[attribution header]    → cacheScope: null     (not cached)
+[system prompt prefix]  → cacheScope: null     (not cached)
+[static content]        → cacheScope: 'global' (global cache, shared across organizations)
+[dynamic content]       → cacheScope: null     (not cached)
+```

+This is the most cache-efficient mode. The static content before `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` (Intro, Rules, Tone & Style, and so on) is identical for all users and can be cached across organizations.
+
+#### Mode 3: default (3P providers, or Boundary missing)

+```
+[attribution header]    → cacheScope: null     (not cached)
+[system prompt prefix]  → cacheScope: 'org'    (org-level cache)
+[everything else]       → cacheScope: 'org'    (org-level cache)
+```
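A simplified sketch of the boundary split behind these three modes. It omits the attribution header and prefix blocks shown in the diagrams, and joining parts with a blank line is an assumption; `splitAtBoundary` and `PromptBlock` are hypothetical names rather than the real `splitSysPromptPrefix()` API.

```typescript
type CacheScope = 'global' | 'org' | null
type PromptBlock = { text: string; cacheScope: CacheScope }

const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'

function splitAtBoundary(parts: string[], useGlobalCache: boolean): PromptBlock[] {
  const boundaryIndex = parts.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY)

  if (!useGlobalCache || boundaryIndex === -1) {
    // Modes 1 and 3: a single org-scoped block; the marker itself is never sent.
    const text = parts.filter(p => p !== SYSTEM_PROMPT_DYNAMIC_BOUNDARY).join('\n\n')
    return [{ text, cacheScope: 'org' }]
  }

  // Mode 2: everything before the marker is globally cacheable, the rest is not cached.
  return [
    { text: parts.slice(0, boundaryIndex).join('\n\n'), cacheScope: 'global' },
    { text: parts.slice(boundaryIndex + 1).join('\n\n'), cacheScope: null },
  ]
}

const blocks = splitAtBoundary(
  ['intro', 'rules', SYSTEM_PROMPT_DYNAMIC_BOUNDARY, 'env info'],
  true,
)
const fallback = splitAtBoundary(['intro', 'env info'], true)
```

When the marker is missing, the sketch falls back to the org-scoped layout even if global caching was requested, mirroring the "Boundary missing" case of Mode 3.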
+
+### `getCacheControl()`: the TTL decision

+The `cache_control` object produced by `getCacheControl()` (`src/services/api/claude.ts:359`):
+
+```typescript
+{
+  type: 'ephemeral',
+  ttl?: '1h',         // only when the querySource qualifies
+  scope?: 'global',   // static zone only
+}
+```
+
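+How eligibility for the 1-hour TTL is decided (`should1hCacheTTL()`, line 394):
+- **Bedrock users**: enabled via the `ENABLE_PROMPT_CACHING_1H_BEDROCK` environment variable
+- **1P users**: `querySource` is matched against an `allowlist` array configured in GrowthBook, with prefix wildcards supported (e.g. `"repl_main_thread*"`)
+- **Session-level latching**: the eligibility result is cached in bootstrap state so that a mid-session GrowthBook config change cannot produce inconsistent TTLs within one session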
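The allowlist match and the session latch could look roughly like this. Both helpers are hypothetical, the query source string is invented for illustration, and the single module-level latch is a simplification of whatever the bootstrap state actually stores.

```typescript
// Hypothetical helper: a trailing "*" is a prefix wildcard, anything else must match exactly.
function matchesAllowlist(querySource: string, allowlist: string[]): boolean {
  return allowlist.some(pattern => {
    if (pattern.charAt(pattern.length - 1) === '*') {
      const prefix = pattern.slice(0, -1)
      return querySource.indexOf(prefix) === 0
    }
    return querySource === pattern
  })
}

// Latch the first decision so later allowlist changes cannot flip it mid-session.
let latchedEligibility: boolean | undefined

function is1hTtlEligible(querySource: string, allowlist: string[]): boolean {
  if (latchedEligibility === undefined) {
    latchedEligibility = matchesAllowlist(querySource, allowlist)
  }
  return latchedEligibility
}

const atStart = is1hTtlEligible('repl_main_thread:example', ['repl_main_thread*'])
// Simulates the remote config changing mid-session: the latched answer is reused.
const afterConfigChange = is1hTtlEligible('repl_main_thread:example', [])
```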
+
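+### Cache busting: where Session-Specific Guidance goes

+The content of `getSessionSpecificGuidanceSection()` (`src/constants/prompts.ts:352`) must be placed **after** `SYSTEM_PROMPT_DYNAMIC_BOUNDARY`, because it depends on:
+- the set of enabledTools for the current session
+- the runtime result of `isForkSubagentEnabled()`
+- the result of `getIsNonInteractiveSession()`

+If these runtime bits lived in the static zone, they would produce 2^N Blake2b hash variants (N = number of runtime conditions) and wreck the cache hit rate. A comment in the source warns explicitly: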
+
+> Each conditional here is a runtime bit that would otherwise multiply the Blake2b prefix hash variants (2^N). See PR #24490, #24171 for the same bug class.
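To make the 2^N figure concrete, the sketch below enumerates every on/off combination of three conditional paragraphs and counts the distinct prefix texts that result. The paragraph strings and `enumeratePrefixes` are invented for illustration.

```typescript
// Builds the prefix text for every subset of conditional paragraphs and returns the distinct results.
function enumeratePrefixes(conditionalParagraphs: string[]): string[] {
  const seen: Record<string, true> = {}
  const combinations = 1 << conditionalParagraphs.length
  for (let mask = 0; mask < combinations; mask++) {
    const enabled = conditionalParagraphs.filter((_, i) => (mask & (1 << i)) !== 0)
    seen[['static intro', ...enabled].join('\n')] = true
  }
  return Object.keys(seen)
}

const variants = enumeratePrefixes([
  'fork subagent note',
  'non-interactive note',
  'tool-specific note',
])
```

Three conditions already yield eight distinct prefixes, each of which would hash to a different cache entry if placed before the boundary.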
+
+### `CLAUDE_CODE_SIMPLE` mode

+When the `CLAUDE_CODE_SIMPLE` environment variable is set, the whole system prompt is drastically reduced:
+
+```typescript
+return [`You are Claude Code, Anthropic's official CLI for Claude.\n\nCWD: ${getCwd()}\nDate: ${getSessionStartDate()}`]
+```
+
+## Context injection: System Context and User Context

+The System Prompt array itself does not contain runtime context (git status, CLAUDE.md contents). Context is injected through two separate pipelines:

+### System Context (`src/context.ts:116`)
+
+```typescript
+export const getSystemContext = memoize(async () => {
+  return {
+    gitStatus,           // git branch, status, recent commits (truncated to MAX_STATUS_CHARS=2000)
+    cacheBreaker,        // cache breaker, ant users only
+  }
+})
+```
+
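+- Cached with `lodash.memoize`: **computed only once for the whole session**
+- The git status snapshot runs 5 `git` commands in parallel (branch, defaultBranch, status, log, userName)
+- When `status` exceeds 2000 characters it is truncated, with a note appended suggesting BashTool for more detail
+- When `systemPromptInjection` changes, all context caches are cleared via `getUserContext.cache.clear?.()`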
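The truncation rule from the list above, as a minimal sketch. `MAX_STATUS_CHARS` comes from the diff; `truncateStatus` and the wording of the appended note are illustrative and not copied from the source.

```typescript
const MAX_STATUS_CHARS = 2000

// Keeps short status output intact and cuts long output at the limit, appending a hint.
function truncateStatus(status: string): string {
  if (status.length <= MAX_STATUS_CHARS) return status
  return (
    status.slice(0, MAX_STATUS_CHARS) +
    '\n... (truncated; run `git status` with BashTool for the full output)'
  )
}

const shortStatus = truncateStatus('M src/index.ts')
const longStatus = truncateStatus(new Array(2501).join('x')) // 2500 characters of input
```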
+
+### User Context (`src/context.ts:155`)
+
+```typescript
+export const getUserContext = memoize(async () => {
+  return {
+    claudeMd,            // merged CLAUDE.md contents
+    currentDate,         // "Today's date is YYYY-MM-DD."
+  }
+})
+```
+
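+- **When CLAUDE.md is disabled**: the `CLAUDE_CODE_DISABLE_CLAUDE_MDS` environment variable, or `--bare` mode (unless directories are specified explicitly via `--add-dir`)
+- The semantics of `--bare` mode are "skip what I didn't ask for", not "ignore everything"

+### Injection point

+At `src/query.ts:449`: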
+
+```typescript
+// System Context is appended to the end of the System Prompt
+const fullSystemPrompt = asSystemPrompt(
+  appendSystemContext(systemPrompt, systemContext)  // plain concatenation
+)
+```
+
+User Context is injected by `prependUserContext()` (`src/utils/api.ts:449`) as the first user message, wrapped in a `<system-reminder>` tag and placed before all conversation messages.
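A rough sketch of that injection step. The `Message` shape, the `# key` layout inside the tag, and the function name `prependUserContextSketch` are all assumptions; only the `<system-reminder>` wrapper and the "first user message" placement come from the text above.

```typescript
// Hypothetical message shape; the real code works with the SDK's message types.
type Message = { role: 'user' | 'assistant'; content: string }

function prependUserContextSketch(
  messages: Message[],
  context: Record<string, string>,
): Message[] {
  const keys = Object.keys(context)
  if (keys.length === 0) return messages

  // Assumed layout: one "# key" heading per context entry.
  const body = keys.map(key => `# ${key}\n${context[key]}`).join('\n\n')
  const reminder: Message = {
    role: 'user',
    content: `<system-reminder>\n${body}\n</system-reminder>`,
  }
  return [reminder, ...messages]
}

const withContext = prependUserContextSketch(
  [{ role: 'user', content: 'Fix the failing test.' }],
  { claudeMd: 'Use pnpm.', currentDate: "Today's date is 2026-04-01." },
)
```

Keeping this content in a message rather than in the System Prompt means that changes to CLAUDE.md or the date do not alter the cached system blocks.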
+
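+## Attribution Header: billing and security

+The first block of the System Prompt in every API request is the Attribution Header (`src/constants/system.ts:30`), which contains:
+- **`cc_version`**: Claude Code version + fingerprint
+- **`cc_entrypoint`**: entry point identifier (REPL / SDK / pipe, etc.)
+- **`cch=00000`** (when NATIVE_CLIENT_ATTESTATION is enabled): Bun's native HTTP layer replaces the zeros with a computed hash before sending, and the server validates this token to confirm the request comes from a genuine Claude Code client

+The header always has `cacheScope: null`: it varies by version and fingerprint and is not suitable for caching.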

 ## CLAUDE.md: project-level knowledge injection

@@ -49,10 +238,15 @@ Claude Code's System Prompt is not a hard-coded block of text; based on the current
        └── /project/src/CLAUDE.md  ← subdirectory (module-specific)
 ```

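-## Caching strategy
+The loading logic is implemented by `getClaudeMds()` and `getMemoryFiles()` in `src/utils/claudemd.ts`: it walks up the directory tree from the CWD and merges the contents of every matching CLAUDE.md file.

-The System Prompt consumes a significant number of tokens (possibly 30%+ of the total). To reduce cost, the system uses a caching mechanism:
+## Design insight: why `string[]` rather than a single `string`

- The unchanging parts (base persona, tool descriptions) can be reused across requests
- The changing parts (git status, memory files) are regenerated each time
- Cache breakpoint positions are carefully designed to maximize the cache hit rate
+The System Prompt is designed as an array rather than a single piece of text to enable **cache splitting**:
+
+1. The Anthropic Prompt Cache uses the **content block** (TextBlock) as its caching unit
+2. Splitting the System Prompt into several blocks lets the unchanging parts (Intro, Rules) get cache hits independently
+3. With a single `string`, changing any one character (such as a date update) would invalidate the cache for the entire System Prompt
+4. The `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` marker lets `splitSysPromptPrefix()` tag precisely the static zone as `scope: 'global'`, leaving the dynamic zone untagged or tagged `scope: 'org'`
+
+This is Claude Code's core design for token cost optimization: a typical System Prompt is about 20K+ tokens, and cache splitting can save 30-50% of input token cost.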