claude-code/src/utils/tokens.ts
claude-code-best 2e9aaf4993 feat: ACP protocol version of remote control (#293)
* fix: add defensive guards for a missing usage field

Third-party APIs (such as Zhipu GLM) omit the usage field in some streaming responses,
so reading usage.input_tokens on undefined crashes, and the failure cascades into every later request.

- claude.ts: fall back to EMPTY_USAGE when content_block_stop creates the message
- LocalAgentTask.tsx: return early when usage is undefined
- tokens.ts: add a null guard and ?? 0 to getTokenCountFromUsage
- cost-tracker.ts: add ?? 0 to input_tokens/output_tokens

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

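* feat: ACP Plan display: visualize session/update messages of type plan

Complete the PlanUpdate type definitions (PlanEntry/Priority/Status) and add a PlanView component
that renders a progress bar, status icons, and priority labels; handle plan updates in ChatInterface.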

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: skip the verification agent in "poor man's mode" (穷鬼模式) to save tokens

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add RCS backend + frontend test coverage (+116 tests)

Backend: 3 new test files (70 tests):
- automationState: normalize/snapshot/equals pure functions
- client-payload: toClientPayload protocol conversion
- transport-normalize: normalizePayload + extractContent

Frontend: 2 new test files (46 tests):
- utils: formatTime/statusClass/truncate/extractEventText, etc.
- api-client: getUuid/setUuid/api GET/POST error handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

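* feat: add a permission mode selector to the RCS ACP page + fix permission responses

- Add a permission mode selector UI (6 modes: default / auto-accept edits / bypass permissions / plan / don't ask / auto-decide)
- Pass the permission mode through ACP _meta along the whole chain: web → acp-link → agent
- Fix a bug where clicking "Allow" in PermissionPanel sent cancelled instead of selected
- Persist the permission mode and model selection to localStorage
- The acp-link direct-connection path also passes permissionMode through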

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

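* feat: RCS Web UI refactor + QR fix + automatic redirect after ACP scan

- Full refactor of the RCS Web UI components: Dialog migrated to Radix UI, lazy loading,
  theme system improvements, component style polish
- IdentityPanel QR code display fix: defer drawing with requestAnimationFrame
  to work around the Radix Dialog Portal mount-timing issue
- ACP QR scan auto-redirect: after IdentityPanel scans the ACP format { url, token },
  it stores the value in sessionStorage and redirects to /code/?acp=1
- New ACPDirectView component: the ACP direct-connection view, which connects with ACPClient and
  renders the ACPMain chat interface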

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

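* feat: ACP permission pipeline improvements: mode sync + bypass detection + unified permission pipeline

- agent.ts: applySessionMode syncs appState.toolPermissionContext.mode
- agent.ts: bypassPermissions availability check (non-root or sandbox environment)
- permissions.ts: createAcpCanUseTool now goes through hasPermissionsToUseTool,
  the unified permission pipeline, replacing the previously scattered handling logic
- permissions.ts: support an onModeChange callback so mode changes sync in real time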

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: acp-link passes a default permissionMode to the agent

A new_session from a client (Zed, VS Code, etc.) does not always carry permissionMode,
so the agent receives _meta: undefined and permission falls back to default.

Fix: handleNewSession uses a fallback chain:
  client-supplied value > config.permissionMode > ACP_PERMISSION_MODE environment variable

Usage: ACP_PERMISSION_MODE=auto acp-link claude

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update documentation and notes

* fix: fix type errors

* chore: commit scripts

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-18 21:54:22 +08:00

269 lines
9.9 KiB
TypeScript

import type { BetaUsage as Usage } from '@anthropic-ai/sdk/resources/beta/messages/messages.mjs'
import { roughTokenCountEstimationForMessages } from '../services/tokenEstimation.js'
import type { AssistantMessage, ContentItem, Message } from '../types/message.js'
import { SYNTHETIC_MESSAGES, SYNTHETIC_MODEL } from './messages.js'
import { jsonStringify } from './slowOperations.js'

/**
 * Return the usage reported on an assistant message, or undefined when the
 * message is not an assistant message, carries no usage, or is synthetic
 * (synthetic text content or the synthetic model).
 */
export function getTokenUsage(message: Message): Usage | undefined {
  if (
    message?.type === 'assistant' &&
    message.message &&
    'usage' in message.message &&
    !(
      Array.isArray(message.message.content) &&
      (message.message.content as ContentItem[])[0]?.type === 'text' &&
      SYNTHETIC_MESSAGES.has(
        (message.message.content as Array<ContentItem & { text: string }>)[0]!
          .text,
      )
    ) &&
    message.message.model !== SYNTHETIC_MODEL
  ) {
    return message.message.usage as Usage
  }
  return undefined
}

/**
 * Get the API response id for an assistant message with real (non-synthetic) usage.
 * Used to identify split assistant records that came from the same API response:
 * when parallel tool calls are streamed, each content block becomes a separate
 * AssistantMessage record, but they all share the same message.id.
 */
function getAssistantMessageId(message: Message): string | undefined {
  if (
    message?.type === 'assistant' &&
    // Guard before the `in` check: `'id' in undefined` throws a TypeError.
    message.message &&
    'id' in message.message &&
    message.message.model !== SYNTHETIC_MODEL
  ) {
    return message.message.id
  }
  return undefined
}

/**
 * Calculate total context window tokens from an API response's usage data.
 * Includes input_tokens + cache tokens + output_tokens.
 *
 * This represents the full context size at the time of that API call.
 * Use tokenCountWithEstimation() when you need context size from messages.
 *
 * Returns 0 when usage is missing, and treats any missing field as 0, because
 * some third-party APIs omit usage data in streaming responses.
 */
export function getTokenCountFromUsage(
  usage: Usage | null | undefined,
): number {
  if (!usage) {
    return 0
  }
  return (
    (usage.input_tokens ?? 0) +
    (usage.cache_creation_input_tokens ?? 0) +
    (usage.cache_read_input_tokens ?? 0) +
    (usage.output_tokens ?? 0)
  )
}
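
The null guard and the `?? 0` defaults above can be exercised in isolation. The following is a minimal sketch, not the project's code: `UsageLike` and `totalContextTokens` are hypothetical stand-ins for the SDK's `BetaUsage` type and for `getTokenCountFromUsage`, and the token numbers are made up for illustration.

```typescript
// Hypothetical minimal shape standing in for the SDK's BetaUsage type.
// Every field is optional here to model third-party APIs that omit them.
interface UsageLike {
  input_tokens?: number | null
  output_tokens?: number | null
  cache_creation_input_tokens?: number | null
  cache_read_input_tokens?: number | null
}

// Sum input, cache, and output tokens, treating anything missing as 0.
function totalContextTokens(usage: UsageLike | null | undefined): number {
  if (!usage) return 0
  return (
    (usage.input_tokens ?? 0) +
    (usage.cache_creation_input_tokens ?? 0) +
    (usage.cache_read_input_tokens ?? 0) +
    (usage.output_tokens ?? 0)
  )
}

// A complete usage object: 1200 + 300 + 8000 + 450.
const full = totalContextTokens({
  input_tokens: 1200,
  cache_creation_input_tokens: 300,
  cache_read_input_tokens: 8000,
  output_tokens: 450,
})

// A partial usage object: only output_tokens is present.
const partial = totalContextTokens({ output_tokens: 450 })

// No usage object at all.
const missing = totalContextTokens(undefined)

console.log(full, partial, missing)
```

Defaulting to 0 trades accuracy for robustness: a response without usage contributes nothing to the total instead of producing `NaN` or throwing.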

export function tokenCountFromLastAPIResponse(messages: Message[]): number {
  let i = messages.length - 1
  while (i >= 0) {
    const message = messages[i]
    const usage = message ? getTokenUsage(message) : undefined
    if (usage) {
      return getTokenCountFromUsage(usage)
    }
    i--
  }
  return 0
}

/**
 * Final context window size from the last API response's usage.iterations[-1].
 * Used for task_budget.remaining computation across compaction boundaries:
 * the server's budget countdown is context-based, so remaining decrements by
 * the pre-compact final window, not billing spend. See monorepo
 * api/api/sampling/prompt/renderer.py:292 for the server-side computation.
 *
 * Falls back to top-level input_tokens + output_tokens when iterations is
 * absent (no server-side tool loops, so top-level usage IS the final window).
 * Both paths exclude cache tokens to match #304930's formula.
 */
export function finalContextTokensFromLastResponse(
  messages: Message[],
): number {
  let i = messages.length - 1
  while (i >= 0) {
    const message = messages[i]
    const usage = message ? getTokenUsage(message) : undefined
    if (usage) {
      // Stainless types don't include iterations yet; cast like advisor.ts:43
      const iterations = (
        usage as {
          iterations?: Array<{
            input_tokens: number
            output_tokens: number
          }> | null
        }
      ).iterations
      if (iterations && iterations.length > 0) {
        const last = iterations.at(-1)!
        return last.input_tokens + last.output_tokens
      }
      // No iterations → no server tool loop → top-level usage IS the final
      // window. Match the iterations path's formula (input + output, no cache)
      // rather than getTokenCountFromUsage: #304930 defines final window as
      // non-cache input + output. Whether the server's budget countdown
      // (renderer.py:292 calculate_context_tokens) counts cache the same way
      // is an open question; aligning with the iterations path keeps the two
      // branches consistent until that's resolved.
      // `?? 0` guards against third-party APIs that omit these fields.
      return (usage.input_tokens ?? 0) + (usage.output_tokens ?? 0)
    }
    i--
  }
  return 0
}
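
The two branches described above (last iteration when present, top-level usage otherwise) can be illustrated on plain objects. This is a sketch under stated assumptions: `UsageWithIterations` and `finalWindowTokens` are hypothetical names, the `iterations` shape is copied from the cast in the function above rather than from any published SDK type, and the numbers are invented.

```typescript
// Shape assumed from the cast used in finalContextTokensFromLastResponse.
interface IterationUsage {
  input_tokens: number
  output_tokens: number
}

interface UsageWithIterations {
  input_tokens: number
  output_tokens: number
  iterations?: IterationUsage[] | null
}

// Non-cache input + output of the final iteration, or of the top-level
// usage when there were no server-side tool loops.
function finalWindowTokens(usage: UsageWithIterations): number {
  const iterations = usage.iterations
  if (iterations && iterations.length > 0) {
    const last = iterations[iterations.length - 1]!
    return last.input_tokens + last.output_tokens
  }
  return usage.input_tokens + usage.output_tokens
}

// Two server-side iterations: only the last one (1400 + 80) counts.
const withLoop = finalWindowTokens({
  input_tokens: 100,
  output_tokens: 20,
  iterations: [
    { input_tokens: 1000, output_tokens: 50 },
    { input_tokens: 1400, output_tokens: 80 },
  ],
})

// No iterations: top-level usage (900 + 60) is the final window.
const noLoop = finalWindowTokens({
  input_tokens: 900,
  output_tokens: 60,
  iterations: null,
})

console.log(withLoop, noLoop)
```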

/**
 * Get only the output_tokens from the last API response.
 * This excludes input context (system prompt, tools, prior messages).
 *
 * WARNING: Do NOT use this for threshold comparisons (autocompact, session memory).
 * Use tokenCountWithEstimation() instead, which measures full context size.
 * This function is only useful for measuring how many tokens Claude generated
 * in a single response, not how full the context window is.
 */
export function messageTokenCountFromLastAPIResponse(
  messages: Message[],
): number {
  let i = messages.length - 1
  while (i >= 0) {
    const message = messages[i]
    const usage = message ? getTokenUsage(message) : undefined
    if (usage) {
      // `?? 0` guards against third-party APIs that omit output_tokens.
      return usage.output_tokens ?? 0
    }
    i--
  }
  return 0
}

export function getCurrentUsage(messages: Message[]): {
  input_tokens: number
  output_tokens: number
  cache_creation_input_tokens: number
  cache_read_input_tokens: number
} | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i]
    const usage = message ? getTokenUsage(message) : undefined
    if (usage) {
      // Default every field to 0 so the declared number types hold even when
      // a third-party API omits part of the usage object.
      return {
        input_tokens: usage.input_tokens ?? 0,
        output_tokens: usage.output_tokens ?? 0,
        cache_creation_input_tokens: usage.cache_creation_input_tokens ?? 0,
        cache_read_input_tokens: usage.cache_read_input_tokens ?? 0,
      }
    }
  }
  return null
}

export function doesMostRecentAssistantMessageExceed200k(
  messages: Message[],
): boolean {
  const THRESHOLD = 200_000
  const lastAsst = messages.findLast(m => m.type === 'assistant')
  if (!lastAsst) return false
  const usage = getTokenUsage(lastAsst)
  return usage ? getTokenCountFromUsage(usage) > THRESHOLD : false
}

/**
 * Calculate the character content length of an assistant message.
 * Used for spinner token estimation (characters / 4 ≈ tokens).
 * This is used when subagent streaming events are filtered out and we
 * need to count content from completed messages instead.
 *
 * Counts the same content that handleMessageFromStream would count via deltas:
 * - text (text_delta)
 * - thinking (thinking_delta)
 * - redacted_thinking data
 * - tool_use input (input_json_delta)
 * Note: signature_delta is excluded from streaming counts (not model output).
 */
export function getAssistantMessageContentLength(
  message: AssistantMessage,
): number {
  let contentLength = 0
  const content = message.message?.content
  if (!Array.isArray(content)) return contentLength
  for (const block of content as ContentItem[]) {
    if (block.type === 'text') {
      contentLength += (block as ContentItem & { text: string }).text.length
    } else if (block.type === 'thinking') {
      contentLength += (block as ContentItem & { thinking: string }).thinking
        .length
    } else if (block.type === 'redacted_thinking') {
      contentLength += (block as ContentItem & { data: string }).data.length
    } else if (block.type === 'tool_use') {
      contentLength += jsonStringify(
        (block as ContentItem & { input: unknown }).input,
      ).length
    }
  }
  return contentLength
}
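
As a rough illustration of the characters / 4 heuristic mentioned in the comment above, the sketch below counts characters over a simplified block union and converts the total to an estimated token count. `Block`, `contentLength`, and `estimateTokens` are hypothetical names, `JSON.stringify` stands in for the project's `jsonStringify` helper, and rounding with `Math.round` is an assumption, since the source only states the approximate ratio.

```typescript
// Simplified stand-in for the content block types counted above.
type Block =
  | { type: 'text'; text: string }
  | { type: 'thinking'; thinking: string }
  | { type: 'redacted_thinking'; data: string }
  | { type: 'tool_use'; input: unknown }

// Total character length across all counted block kinds.
function contentLength(blocks: Block[]): number {
  let total = 0
  for (const block of blocks) {
    switch (block.type) {
      case 'text':
        total += block.text.length
        break
      case 'thinking':
        total += block.thinking.length
        break
      case 'redacted_thinking':
        total += block.data.length
        break
      case 'tool_use':
        // Tool input is counted by its serialized JSON length.
        total += JSON.stringify(block.input).length
        break
    }
  }
  return total
}

// characters / 4 ≈ tokens; the rounding mode is an assumption.
function estimateTokens(chars: number): number {
  return Math.round(chars / 4)
}

const blocks: Block[] = [
  { type: 'text', text: 'Hello, world' }, // 12 characters
  { type: 'tool_use', input: { path: 'a.ts' } }, // {"path":"a.ts"} is 15 characters
]

const chars = contentLength(blocks)
const tokens = estimateTokens(chars)

console.log(chars, tokens)
```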

/**
 * Get the current context window size in tokens.
 *
 * This is the CANONICAL function for measuring context size when checking
 * thresholds (autocompact, session memory init, etc.). Uses the last API
 * response's token count (input + output + cache) plus estimates for any
 * messages added since.
 *
 * Always use this instead of:
 * - Cumulative token counting (which double-counts as context grows)
 * - messageTokenCountFromLastAPIResponse (which only counts output_tokens)
 * - tokenCountFromLastAPIResponse (which doesn't estimate new messages)
 *
 * Implementation note on parallel tool calls: when the model makes multiple
 * tool calls in one response, the streaming code emits a SEPARATE assistant
 * record per content block (all sharing the same message.id and usage), and
 * the query loop interleaves each tool_result immediately after its tool_use.
 * So the messages array looks like:
 *   [..., assistant(id=A), user(result), assistant(id=A), user(result), ...]
 * If we stop at the LAST assistant record, we only estimate the one tool_result
 * after it and miss all the earlier interleaved tool_results, which will ALL
 * be in the next API request. To avoid undercounting, after finding a
 * usage-bearing record we walk back to the FIRST sibling with the same
 * message.id so every interleaved tool_result is included in the rough estimate.
 */
export function tokenCountWithEstimation(messages: readonly Message[]): number {
  let i = messages.length - 1
  while (i >= 0) {
    const message = messages[i]
    const usage = message ? getTokenUsage(message) : undefined
    if (message && usage) {
      // Walk back past any earlier sibling records split from the same API
      // response (same message.id) so interleaved tool_results between them
      // are included in the estimation slice.
      const responseId = getAssistantMessageId(message)
      if (responseId) {
        let j = i - 1
        while (j >= 0) {
          const prior = messages[j]
          const priorId = prior ? getAssistantMessageId(prior) : undefined
          if (priorId === responseId) {
            // Earlier split of the same API response: anchor here instead.
            i = j
          } else if (priorId !== undefined) {
            // Hit a different API response: stop walking.
            break
          }
          // priorId === undefined: a user/tool_result/attachment message,
          // possibly interleaved between splits; keep walking.
          j--
        }
      }
      return (
        getTokenCountFromUsage(usage) +
        roughTokenCountEstimationForMessages(
          messages.slice(i + 1) as Parameters<
            typeof roughTokenCountEstimationForMessages
          >[0],
        )
      )
    }
    i--
  }
  return roughTokenCountEstimationForMessages(
    messages as Parameters<typeof roughTokenCountEstimationForMessages>[0],
  )
}
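
The walk-back to the first sibling record can be sketched on a reduced record type. This is an illustration only: `Rec` and `firstSiblingIndex` are hypothetical, records carry just an optional `responseId` in place of a real message, and the rough token estimation step is left out.

```typescript
// Reduced record: assistant records carry the API response id they were
// split from; user/tool_result records carry none.
interface Rec {
  kind: 'assistant' | 'user'
  responseId?: string
}

// Index of the earliest record sharing the response id of records[index],
// skipping interleaved id-less records and stopping at a different response.
function firstSiblingIndex(records: readonly Rec[], index: number): number {
  const responseId = records[index]?.responseId
  if (responseId === undefined) return index
  let anchor = index
  for (let j = index - 1; j >= 0; j--) {
    const priorId = records[j]?.responseId
    if (priorId === responseId) {
      anchor = j // earlier split of the same response
    } else if (priorId !== undefined) {
      break // a different API response: stop walking
    }
    // priorId === undefined: interleaved user/tool_result, keep walking
  }
  return anchor
}

const records: Rec[] = [
  { kind: 'user' }, // 0
  { kind: 'assistant', responseId: 'Z' }, // 1: earlier response
  { kind: 'user' }, // 2
  { kind: 'assistant', responseId: 'A' }, // 3: first split of response A
  { kind: 'user' }, // 4: tool_result
  { kind: 'assistant', responseId: 'A' }, // 5: last split of response A
  { kind: 'user' }, // 6: tool_result
]

// Start from the last usage-bearing record (index 5) and walk back.
const anchor = firstSiblingIndex(records, 5)

// Everything after the anchor is what would be passed to the rough estimator.
const toEstimate = records.slice(anchor + 1)

console.log(anchor, toEstimate.length)
```

Anchoring at index 3 rather than index 5 means the tool_result at index 4 is part of the estimated slice, which is the undercounting case the implementation note describes.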