claude-code/src/services/tokenEstimation.ts
Dosion c2ac9a74c1 fix: resolve dependency audit findings precisely (#361)
* fix: harden ACP communication boundaries

Harden ACP communication boundaries

Remote ACP sessions now cannot widen permission mode through untrusted
metadata or client payloads. WebSocket ACP ingress measures payloads by bytes
before binary decode, and prompt queue handoff keeps exactly one prompt active
while queued prompts are drained FIFO.

Constraint: ACP remote clients must not be able to open bypassPermissions without local launch intent
Constraint: WebSocket payload limits must be byte-based and checked before binary decode
Rejected: Keep promptToQueryContent wrapper | no production consumers remained after prompt conversion single-sourcing
Confidence: high
Scope-risk: moderate
Directive: Do not re-enable remote bypassPermissions from _meta unless a local launch gate is verified in both acp-link and agent
Tested: targeted ACP/RCS/acp-link prompt queue, bridge, permission, payload, and prompt conversion tests; bun run typecheck; bun run build
Not-tested: Manual live ACP/RCS session against an external client
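
The byte-based payload check described in this commit can be sketched as follows. This is a minimal illustration, not the acp-link implementation: the names `payloadByteLength`, `acceptPayload`, and the 1 MiB limit are assumptions, since the commit message does not state the real identifiers or limit.

```typescript
// Hypothetical limit; the commit message does not state the real value.
const MAX_PAYLOAD_BYTES = 1024 * 1024

type RawPayload = string | ArrayBuffer | Uint8Array

// Measure the wire size in bytes without decoding binary frames to text.
function payloadByteLength(payload: RawPayload): number {
  if (typeof payload === 'string') {
    // String length counts UTF-16 code units, so encode to count UTF-8 bytes.
    return new TextEncoder().encode(payload).byteLength
  }
  return payload.byteLength
}

// Reject oversized frames before any binary decode or JSON.parse happens.
function acceptPayload(
  payload: RawPayload,
  maxBytes: number = MAX_PAYLOAD_BYTES,
): boolean {
  return payloadByteLength(payload) <= maxBytes
}
```

A character-count check would undercount multi-byte text: 'héllo' has 5 characters but occupies 6 bytes in UTF-8, which is why the limit is applied to bytes.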

* fix: restore repository verification gates

Keep the full repository test, typecheck, build, and Biome lint gates usable
after the ACP fix pass. This commit is intentionally separate from the ACP
behavior change: it fixes Windows-safe Langfuse home redaction, removes stale
lint suppressions, resolves Biome warning/info diagnostics, and keeps env
expansion tests explicit without template-placeholder lint noise.

Constraint: The project completion contract requires full typecheck, lint, test, and build evidence
Rejected: Leave warning/info diagnostics as historical noise | they obscure future gate regressions and weaken flow-impact claims
Confidence: high
Scope-risk: narrow
Directive: Keep repository gate cleanup separate from feature fixes when it is not part of the same runtime path
Tested: bunx biome lint src/; bunx tsc --noEmit; bun test src/services/mcp/__tests__/envExpansion.test.ts src/utils/__tests__/sliceAnsi.test.ts src/utils/__tests__/stringUtils.test.ts; bun test; bun run build
Not-tested: Manual Langfuse export against a real external Langfuse service

* fix: harden ACP failure boundaries after review

Deep review found several paths that made ACP communication failures look normal: prompt errors could finish as end_turn, permission pipeline exceptions could fall through to client approval, tool rawInput was deep-copied with JSON, and acp-link accepted unbounded or unvalidated WebSocket payloads. This keeps the behavior fail-closed, validates WS payloads before dispatch, caps payload size before JSON parse, and preserves cancellation intent with a generation counter.

Constraint: User explicitly rejected pseudo-fixes, fallback behavior, and unbounded payload handling

Rejected: Keep JSON stringify/parse rawInput copy | duplicates large payloads and silently drops non-JSON inputs

Rejected: Delegate permission pipeline errors to client approval | allows a broken local permission check to be bypassed

Confidence: high

Scope-risk: moderate

Directive: Do not convert ACP errors into normal end_turn responses without a protocol-level reason and regression tests

Tested: bun test src/services/acp/__tests__/agent.test.ts src/services/acp/__tests__/bridge.test.ts src/services/acp/__tests__/permissions.test.ts

Tested: bun test packages/acp-link/src/__tests__/server.test.ts

Tested: bunx tsc --noEmit

Tested: bunx biome lint src/ packages/acp-link/src/

Tested: bun run test:all

Tested: bun run build

Not-tested: Manual end-to-end ACP client session over a real editor WebSocket
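
One common reading of "preserves cancellation intent with a generation counter" is sketched below. The class and method names are hypothetical and the real agent code may differ; the point is only that a cancel issued while a prompt is in flight remains observable after the awaited work resolves.

```typescript
// Hypothetical helper: each cancel bumps the generation, and work started
// under an older generation is treated as cancelled when it finishes.
class CancellationTracker {
  private generation = 0

  // Capture the generation at the moment a prompt starts.
  begin(): number {
    return this.generation
  }

  // Record cancellation intent, even if nothing is listening right now.
  cancel(): void {
    this.generation += 1
  }

  // True if a cancel happened after the given token was captured.
  isCancelled(token: number): boolean {
    return token !== this.generation
  }
}

// Usage sketch: report 'cancelled' instead of a normal end_turn when the
// generation moved while the prompt was running.
async function runPrompt(
  tracker: CancellationTracker,
  work: () => Promise<void>,
): Promise<'end_turn' | 'cancelled'> {
  const token = tracker.begin()
  await work()
  return tracker.isCancelled(token) ? 'cancelled' : 'end_turn'
}
```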

* fix: prevent ACP coverage runs from seeing partial mocks

GitHub Actions failed under bun test --coverage because permissions.test.ts replaced ../bridge.js with a partial mock that omitted forwardSessionUpdates. Coverage worker ordering on Linux let sibling tests observe that incomplete module.

This isolates ACP test mocks by snapshotting real exports, overriding only requested symbols, and restoring mocks in LIFO order. The shared helper also keeps the same behavior in agent.test.ts without duplicating mock infrastructure.

Constraint: bun:test mock.module is process-global inside a worker.

Rejected: Add fallback exports or production guards | the bridge export exists; the failure was test mock pollution.

Rejected: Keep per-file helper copies | duplication would let restore semantics drift again.

Confidence: high

Scope-risk: narrow

Directive: Prefer safeMockModule for partial mocks of real modules in ACP tests; plain mock.module is only appropriate for fully synthetic modules or isolated tests.

Tested: bun test src/services/acp/__tests__/agent.test.ts src/services/acp/__tests__/bridge.test.ts src/services/acp/__tests__/permissions.test.ts

Tested: bun test --coverage --coverage-reporter=lcov

Tested: bunx tsc --noEmit

Tested: bun run lint

Tested: git diff --check

Not-tested: Linux runner directly before push
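
The snapshot, override, and LIFO-restore idea from this commit can be illustrated without bun:test. The sketch below works on plain objects standing in for module exports; `MockScope` is a hypothetical name and is not the repository's safeMockModule helper.

```typescript
type ModuleExports = Record<string, unknown>

// Hypothetical scope object: remembers how to undo each override.
class MockScope {
  private restorers: Array<() => void> = []

  // Snapshot the current exports, then override only the requested symbols.
  override(target: ModuleExports, overrides: ModuleExports): void {
    const snapshot = { ...target }
    Object.assign(target, overrides)
    this.restorers.push(() => {
      // Drop keys the override added, then put the snapshot back.
      for (const key of Object.keys(target)) {
        if (!(key in snapshot)) delete target[key]
      }
      Object.assign(target, snapshot)
    })
  }

  // Undo overrides newest-first so stacked overrides unwind correctly.
  restoreAll(): void {
    while (this.restorers.length > 0) {
      const restore = this.restorers.pop()
      if (restore) restore()
    }
  }
}
```

Restore order matters when the same symbol is overridden twice: the second snapshot contains the first mock, so undoing in FIFO order would leave the first mock installed.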

* fix: normalize ACP bypass requests without warning noise

The previous CI repair removed the failing partial bridge mock, but it also added a shared safeMockModule helper and left the acp-link bypass normalization warning in the real new_session path.

This tightens the fix: acp-link now treats an unauthorized client bypass request as normal permission-mode normalization without emitting a warning, and the ACP permission test explicitly preserves the real bridge and permission exports instead of using a shared helper. The agent test keeps its local mock preservation but names it by behavior and restores mocks in LIFO order.

Constraint: CI output should not contain expected warning noise for covered policy branches.

Rejected: Silence the test only | the normal new_session path would still warn for an expected normalization branch.

Rejected: Keep the shared safeMockModule helper | the failing module was specific and should be fixed by preserving real exports at the mocking site.

Confidence: high

Scope-risk: narrow

Directive: Treat client-requested bypassPermissions as data to normalize unless the local default explicitly enables bypass.

Tested: bun test packages/acp-link/src/__tests__/server.test.ts

Tested: bun test src/services/acp/__tests__/agent.test.ts src/services/acp/__tests__/bridge.test.ts src/services/acp/__tests__/permissions.test.ts

Tested: bun test --coverage --coverage-reporter=lcov with UPPER_WARN_COUNT=0

Tested: bun run test:all

Tested: bun run lint

Tested: bunx tsc --noEmit

Tested: git diff --check

* fix: harden ACP bypass and CI warning gates

ACP clients must not be able to enter bypassPermissions unless the local ACP gate and process environment both allow it. The same gate now controls session creation, explicit mode changes, and the ExitPlanMode option list, while session setup restores process.cwd so coverage and later work do not inherit ACP session state.

Constraint: CI must stay warning-clean without hiding real ACP permission failures

Rejected: Logging rejected bypass requests on the normal new_session path | it preserves audit text but reintroduces warning noise the runtime should not emit

Rejected: Broad CI=true postinstall skip | it hides explicit Chrome MCP setup checks outside the install path

Confidence: high

Scope-risk: moderate

Directive: Keep bypassPermissions gated through one ACP availability decision before exposing it to clients

Tested: bun test src/services/acp/__tests__/permissions.test.ts src/services/acp/__tests__/agent.test.ts packages/acp-link/src/__tests__/server.test.ts

Tested: bun run test:all

Tested: bun run lint

Tested: bun run build:vite with zero warning matches

Tested: bun test --coverage --coverage-reporter lcov --coverage-dir coverage produced non-empty lcov with SF records and zero filtered warning matches

Not-tested: GitHub Actions result after this push

* fix: remove remaining CI warning noise

The CI log still had three non-failing warnings after the ACP hardening commit: git init default-branch advice from checkout, a Node 20 action-runtime deprecation, and one additional known Vite dynamic-import diagnostic that only surfaced on Linux. The workflow now provides explicit git config and opts actions into Node 24, while Vite keeps a narrow allowlist for acknowledged optimizer diagnostics.

Constraint: Do not use shell log filtering to hide warnings after they happen

Rejected: Grep warning lines out of CI output | it would make future diagnostics harder to find

Confidence: high

Scope-risk: narrow

Directive: Add new Vite warning allowlist entries only after checking that they are existing optimizer diagnostics, not new application defects

Tested: bunx tsc --noEmit --pretty false

Tested: bunx biome lint .github/workflows/ci.yml vite.config.ts

Tested: bun run build:vite with zero warning matches

Not-tested: GitHub Actions result after this push

* fix: reject unauthorized ACP bypass and harden CI actions

ACP clients now fail closed when permissionMode is malformed, unknown, or requests bypass without a local bypass opt-in. acp-link validates new_session input before forwarding to the agent and returns client error frames for expected unauthorized requests without logging create-failed noise. The direct AcpAgent path independently rejects invalid _meta.permissionMode and unauthorized bypass instead of falling back to settings.

CI workflows and generated GitHub App templates now use Node 24-compatible actions pinned to immutable commit SHAs, and acp-link startup output no longer prints the auth token.

Constraint: Must not hide warnings with test isolation or log filtering

Rejected: Silent fallback to local permission mode | accepts invalid client intent and masks boundary behavior

Rejected: Broad dependency churn from bun update | audit remained failing while package and lockfile churn expanded scope

Confidence: high

Scope-risk: moderate

Directive: Client-provided permissionMode must stay fail-closed before reaching AcpAgent; only local settings.defaultMode may fall back to default on invalid local config

Tested: bun test packages/acp-link/src/__tests__/server.test.ts src/services/acp/__tests__/agent.test.ts src/services/acp/__tests__/permissions.test.ts src/services/skillLearning/__tests__/skillLifecycle.test.ts src/utils/settings/__tests__/config.test.ts

Tested: bunx tsc -p packages/acp-link/tsconfig.json --noEmit --pretty false

Tested: bunx tsc --noEmit --pretty false

Tested: bun run lint

Tested: bun run test:all

Tested: local CI equivalent install/typecheck/coverage/build with warning_scan=0

Not-tested: Pre-existing bun audit vulnerabilities require a separate dependency-hardening PR
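
The fail-closed rule for client-provided permissionMode can be sketched like this. It is an illustration under assumptions: only bypassPermissions is named in the commit message, so the other mode names, the function name `decideClientMode`, and the result shape are hypothetical.

```typescript
// Mode names other than bypassPermissions are assumptions for illustration.
const KNOWN_MODES = ['default', 'acceptEdits', 'plan', 'bypassPermissions'] as const
type PermissionMode = (typeof KNOWN_MODES)[number]

type ModeDecision =
  | { ok: true; mode: PermissionMode }
  | { ok: false; reason: string }

function isKnownMode(value: string): value is PermissionMode {
  return (KNOWN_MODES as readonly string[]).includes(value)
}

// Reject malformed, unknown, or unauthorized requests instead of silently
// falling back to a local mode.
function decideClientMode(
  requested: unknown,
  localDefault: PermissionMode,
  localBypassOptIn: boolean,
): ModeDecision {
  // No client request at all: the local default applies.
  if (requested === undefined) return { ok: true, mode: localDefault }
  if (typeof requested !== 'string') {
    return { ok: false, reason: 'permissionMode must be a string' }
  }
  if (!isKnownMode(requested)) {
    return { ok: false, reason: `unknown permissionMode: ${requested}` }
  }
  if (requested === 'bypassPermissions' && !localBypassOptIn) {
    return { ok: false, reason: 'bypassPermissions requires a local opt-in' }
  }
  return { ok: true, mode: requested }
}
```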

* fix: resolve dependency audit findings precisely

Use dependency-native upgrades and lockfile resolution to close the audit findings without suppressions. Keep the chrome MCP setup aligned with the new dependency graph and add real integration coverage so the override behavior stays verified.

Constraint: no audit ignores or warning suppression
Rejected: broad google-auth/protobuf overrides | replaced with upstream-compatible resolution
Confidence: high
Scope-risk: moderate
Directive: keep dependency fixes upstream-compatible; do not reintroduce blanket overrides unless the audit surface changes materially
Tested: bun audit; bun audit --json; bun install --frozen-lockfile with CLAUDE_CODE_SKIP_CHROME_MCP_SETUP=1; bunx tsc --noEmit --pretty false; bun run lint; targeted tests; bun run test:all; bun test --coverage --coverage-reporter lcov --coverage-dir coverage; bun run build:vite
Not-tested: unrelated pre-existing ACP/CORS/token fallback residual risks

* fix: keep ACP auth tokens out of URLs

Replace the ad hoc URL-token flow with crypto UUID-backed transport identifiers so the bearer token stays in structured request data instead of query strings. Keep the server, web client, and transport helpers aligned so the ACP/RCS handshake remains compatible after the API shape change.

Constraint: token must not be embedded in the URL
Rejected: token-as-uuid query fallback | leaked bearer tokens in URLs
Confidence: high
Scope-risk: moderate
Directive: preserve the structured auth path; do not reintroduce query-token fallback when adjusting ACP transport code
Tested: targeted ACP/RCS transport tests
Not-tested: unrelated pre-existing ACP/CORS/token fallback residual risks

* fix: normalize WebFetch request headers

Normalize WebFetch headers before dispatch so canonicalization preserves auth semantics and duplicate forms do not slip through. Keep the behavior locked with a focused header test instead of broadening the request pipeline.

Constraint: preserve header semantics without widening the fetch surface
Rejected: ad hoc caller-side normalization | too easy to bypass in future call sites
Confidence: high
Scope-risk: narrow
Directive: keep header normalization close to the WebFetch utility so future callers inherit the same behavior automatically
Tested: targeted WebFetch header tests
Not-tested: unrelated fetch backend behavior beyond header normalization

* fix: harden ACP remote auth surfaces

Tighten the remaining Claude security artifact items by requiring API keys on ACP global reads and relay upgrades, moving WebSocket tokens out of URLs, and replacing open web CORS with an explicit allowlist.

Constraint: Browser WebSocket clients cannot set arbitrary Authorization headers, so the token is carried in a selected subprotocol instead of a query string.
Rejected: Keep UUID auth for ACP channel groups | any caller can mint a UUID and read global ACP data.
Rejected: Preserve ?token= compatibility | secrets leak into logs, history, referrers, and intermediaries.
Confidence: high
Scope-risk: moderate
Directive: Do not reintroduce query-string bearer tokens; use Authorization or rcs.auth.<base64url-token>.
Tested: bunx tsc --noEmit --pretty false
Tested: bun run typecheck in packages/remote-control-server
Tested: bun run build in packages/acp-link
Tested: bun run lint
Tested: bun audit
Tested: focused RCS/acp-link/web tests, 160 pass
Tested: Edge headless browser WebSocket subprotocol handshake
Tested: bun run test:all, 3669 pass
Tested: bun run build:vite
Tested: bun run build
Not-tested: Manual end-to-end relay with a live external ACP agent
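
Carrying the token in a WebSocket subprotocol of the form rcs.auth.<base64url-token> might look like the sketch below. Only the `rcs.auth.` prefix comes from the commit message; the helper names are hypothetical, and encoding the raw token as base64url is an assumption about what the placeholder means.

```typescript
const AUTH_PREFIX = 'rcs.auth.'

// Encode text as unpadded base64url so it is a valid subprotocol token.
function toBase64Url(text: string): string {
  const bytes = new TextEncoder().encode(text)
  let binary = ''
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i])
  return btoa(binary).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '')
}

function fromBase64Url(encoded: string): string {
  const base64 = encoded.replace(/-/g, '+').replace(/_/g, '/')
  const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4)
  const binary = atob(padded)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i)
  return new TextDecoder().decode(bytes)
}

// Client side: build the subprotocol value to offer during the handshake.
function authSubprotocol(token: string): string {
  return AUTH_PREFIX + toBase64Url(token)
}

// Server side: recover the token from the offered subprotocols, if present.
function tokenFromSubprotocols(offered: string[]): string | null {
  const match = offered.find(p => p.startsWith(AUTH_PREFIX))
  if (!match) return null
  try {
    return fromBase64Url(match.slice(AUTH_PREFIX.length))
  } catch {
    return null
  }
}
```

In a browser the value would be passed as the second argument of the WebSocket constructor, for example `new WebSocket(url, [authSubprotocol(token)])`, so the secret never appears in the URL.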

* fix: resolve CI dependency override lookup

The CI runner does not expose @grpc/proto-loader as a root-resolvable package, and the test was relying on local hoisting rather than the real dependency owner. Resolve proto-loader through @opentelemetry/exporter-trace-otlp-grpc and @grpc/grpc-js so the smoke test follows the package graph it is validating.

Constraint: Do not add a new root dependency for a transitive smoke test.

Rejected: Skip or weaken the test | the test protects the protobuf 7 override path and should keep exercising loadSync.

Rejected: Add @grpc/proto-loader directly to root package.json | that hides the owning-package resolution issue and broadens dependency surface.

Confidence: high

Scope-risk: narrow

Directive: Dependency override smoke tests should resolve from the package that actually owns the dependency, not from incidental root hoisting.

Tested: bun test tests/integration/dependency-overrides.test.ts; bunx tsc --noEmit --pretty false; bun run lint; bun audit; bun run test:all; git diff --check

---------

Co-authored-by: unraid <local@unraid.local>
2026-04-26 19:49:54 +08:00


import type { Anthropic } from '@anthropic-ai/sdk'
import type { BetaMessageParam as MessageParam } from '@anthropic-ai/sdk/resources/beta/messages/messages.mjs'
// @aws-sdk/client-bedrock-runtime is imported dynamically in countTokensWithBedrock()
// to defer ~279KB of AWS SDK code until a Bedrock call is actually made
import type { CountTokensCommandInput } from '@aws-sdk/client-bedrock-runtime'
import { getAPIProvider } from 'src/utils/model/providers.js'
import { VERTEX_COUNT_TOKENS_ALLOWED_BETAS } from '../constants/betas.js'
import type { Attachment } from '../utils/attachments.js'
import { getModelBetas } from '../utils/betas.js'
import { getVertexRegionForModel, isEnvTruthy } from '../utils/envUtils.js'
import { logError } from '../utils/log.js'
import { normalizeAttachmentForAPI } from '../utils/messages.js'
import {
createBedrockRuntimeClient,
getInferenceProfileBackingModel,
isFoundationModel,
} from '../utils/model/bedrock.js'
import {
getDefaultSonnetModel,
getMainLoopModel,
getSmallFastModel,
normalizeModelStringForAPI,
} from '../utils/model/model.js'
import { jsonStringify } from '../utils/slowOperations.js'
import { isToolReferenceBlock } from '../utils/toolSearch.js'
import { getAPIMetadata, getExtraBodyParams } from './api/claude.js'
import { getAnthropicClient } from './api/client.js'
import { createTrace, endTrace, isLangfuseEnabled, recordLLMObservation } from './langfuse/index.js'
import { getSessionId } from '../bootstrap/state.js'
import { withTokenCountVCR } from './vcr.js'
// Minimal values for token counting with thinking enabled
// API constraint: max_tokens must be greater than thinking.budget_tokens
const TOKEN_COUNT_THINKING_BUDGET = 1024
const TOKEN_COUNT_MAX_TOKENS = 2048
/**
* Check if messages contain thinking blocks
*/
function hasThinkingBlocks(
messages: Anthropic.Beta.Messages.BetaMessageParam[],
): boolean {
for (const message of messages) {
if (message.role === 'assistant' && Array.isArray(message.content)) {
for (const block of message.content) {
if (
typeof block === 'object' &&
block !== null &&
'type' in block &&
(block.type === 'thinking' || block.type === 'redacted_thinking')
) {
return true
}
}
}
}
return false
}
/**
* Strip tool search-specific fields from messages before sending for token counting.
* This removes 'caller' from tool_use blocks and 'tool_reference' from tool_result content.
* These fields are only valid with the tool search beta and will cause errors otherwise.
*
* Note: We use 'as unknown as' casts because the SDK types don't include tool search beta fields,
* but at runtime these fields may exist from API responses when tool search was enabled.
*/
function stripToolSearchFieldsFromMessages(
messages: Anthropic.Beta.Messages.BetaMessageParam[],
): Anthropic.Beta.Messages.BetaMessageParam[] {
return messages.map(message => {
if (!Array.isArray(message.content)) {
return message
}
const normalizedContent = message.content.map(block => {
// Strip 'caller' from tool_use blocks (assistant messages)
if (block.type === 'tool_use') {
// Destructure to exclude any extra fields like 'caller'
const toolUse =
block as Anthropic.Beta.Messages.BetaToolUseBlockParam & {
caller?: unknown
}
return {
type: 'tool_use' as const,
id: toolUse.id,
name: toolUse.name,
input: toolUse.input,
}
}
// Strip tool_reference blocks from tool_result content (user messages)
if (block.type === 'tool_result') {
const toolResult =
block as Anthropic.Beta.Messages.BetaToolResultBlockParam
if (Array.isArray(toolResult.content)) {
const filteredContent = (toolResult.content as unknown[]).filter(
c => !isToolReferenceBlock(c),
) as typeof toolResult.content
if (filteredContent.length === 0) {
return {
...toolResult,
content: [{ type: 'text' as const, text: '[tool references]' }],
}
}
if (filteredContent.length !== toolResult.content.length) {
return {
...toolResult,
content: filteredContent,
}
}
}
}
return block
})
return {
...message,
content: normalizedContent,
}
})
}
export async function countTokensWithAPI(
content: string,
): Promise<number | null> {
// Special case for empty content - API doesn't accept empty messages
if (!content) {
return 0
}
const message: Anthropic.Beta.Messages.BetaMessageParam = {
role: 'user',
content: content,
}
return countMessagesTokensWithAPI([message], [])
}
export async function countMessagesTokensWithAPI(
messages: Anthropic.Beta.Messages.BetaMessageParam[],
tools: Anthropic.Beta.Messages.BetaToolUnion[],
): Promise<number | null> {
return withTokenCountVCR(messages, tools, async () => {
try {
const provider = getAPIProvider()
if (provider === 'gemini') {
return roughTokenCountEstimationForAPIRequest(messages, tools)
}
const model = getMainLoopModel()
const betas = getModelBetas(model)
const containsThinking = hasThinkingBlocks(messages)
if (provider === 'bedrock') {
// @anthropic-ai/bedrock-sdk doesn't support countTokens currently
return countTokensWithBedrock({
model: normalizeModelStringForAPI(model),
messages,
tools,
betas,
containsThinking,
})
}
const anthropic = await getAnthropicClient({
maxRetries: 1,
model,
source: 'count_tokens',
})
const filteredBetas =
getAPIProvider() === 'vertex'
? betas.filter(b => VERTEX_COUNT_TOKENS_ALLOWED_BETAS.has(b))
: betas
const response = await anthropic.beta.messages.countTokens({
model: normalizeModelStringForAPI(model),
messages:
// When we pass tools and no messages, we need to pass a dummy message
// to get an accurate tool token count.
messages.length > 0 ? messages : [{ role: 'user', content: 'foo' }],
tools,
...(filteredBetas.length > 0 && { betas: filteredBetas }),
// Enable thinking if messages contain thinking blocks
...(containsThinking && {
thinking: {
type: 'enabled',
budget_tokens: TOKEN_COUNT_THINKING_BUDGET,
},
}),
})
if (typeof response.input_tokens !== 'number') {
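// No numeric input_tokens means the provider did not really count tokens:
// the Vertex client throws instead of reaching this point, while the
// Bedrock client succeeds with { Output: { __type: 'com.amazon.coral.service#UnknownOperationException' }, Version: '1.0' }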
return null
}
return response.input_tokens
} catch (error) {
logError(error)
return null
}
})
}
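/**
* Rough token estimate: content length divided by an assumed per-token ratio.
* Despite the parameter name, content.length counts UTF-16 code units, not
* bytes, so the two only coincide for ASCII text.
*/
export function roughTokenCountEstimation(
content: string,
bytesPerToken: number = 4,
): number {
return Math.round(content.length / bytesPerToken)
}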
/**
* Returns an estimated bytes-per-token ratio for a given file extension.
* Dense JSON has many single-character tokens (`{`, `}`, `:`, `,`, `"`)
* which makes the real ratio closer to 2 rather than the default 4.
*/
export function bytesPerTokenForFileType(fileExtension: string): number {
switch (fileExtension) {
case 'json':
case 'jsonl':
case 'jsonc':
return 2
default:
return 4
}
}
/**
* Like {@link roughTokenCountEstimation} but uses a more accurate
* bytes-per-token ratio when the file type is known.
*
* This matters when the API-based token count is unavailable (e.g. on
* Bedrock) and we fall back to the rough estimate — an underestimate can
* let an oversized tool result slip into the conversation.
*/
export function roughTokenCountEstimationForFileType(
content: string,
fileExtension: string,
): number {
return roughTokenCountEstimation(
content,
bytesPerTokenForFileType(fileExtension),
)
}
/**
* Counts tokens by sending a minimal request (max_tokens of 1, or
* TOKEN_COUNT_MAX_TOKENS when thinking is enabled) and reading the usage the
* API reports: input tokens plus cache creation and cache read tokens.
* This provides a more reliable estimate than getTokenUsage for messages that may have been compacted.
* Uses Haiku for token counting (Haiku 4.5 supports thinking blocks), except:
* - Vertex global region: uses Sonnet (Haiku not available)
* - Bedrock or Vertex with thinking blocks: uses Sonnet (Haiku 3.5 doesn't support thinking)
*/
export async function countTokensViaHaikuFallback(
messages: Anthropic.Beta.Messages.BetaMessageParam[],
tools: Anthropic.Beta.Messages.BetaToolUnion[],
): Promise<number | null> {
const provider = getAPIProvider()
if (provider === 'gemini') {
return roughTokenCountEstimationForAPIRequest(messages, tools)
}
// Check if messages contain thinking blocks
const containsThinking = hasThinkingBlocks(messages)
// If we're on Vertex and using global region, always use Sonnet since Haiku is not available there.
const isVertexGlobalEndpoint =
isEnvTruthy(process.env.CLAUDE_CODE_USE_VERTEX) &&
getVertexRegionForModel(getSmallFastModel()) === 'global'
// If we're on Bedrock with thinking blocks, use Sonnet since Haiku 3.5 doesn't support thinking
const isBedrockWithThinking =
isEnvTruthy(process.env.CLAUDE_CODE_USE_BEDROCK) && containsThinking
// If we're on Vertex with thinking blocks, use Sonnet since Haiku 3.5 doesn't support thinking
const isVertexWithThinking =
isEnvTruthy(process.env.CLAUDE_CODE_USE_VERTEX) && containsThinking
// Otherwise always use Haiku - Haiku 4.5 supports thinking blocks.
// WARNING: if you change this to use a non-Haiku model, this request will fail in 1P unless it uses getCLISyspromptPrefix.
// Note: We don't need Sonnet for tool_reference blocks because we strip them via
// stripToolSearchFieldsFromMessages() before sending.
// Use getSmallFastModel() to respect ANTHROPIC_SMALL_FAST_MODEL env var for Bedrock users
// with global inference profiles (see issue #10883).
const model =
isVertexGlobalEndpoint || isBedrockWithThinking || isVertexWithThinking
? getDefaultSonnetModel()
: getSmallFastModel()
const anthropic = await getAnthropicClient({
maxRetries: 1,
model,
source: 'count_tokens',
})
// Strip tool search-specific fields (caller, tool_reference) before sending
// These fields are only valid with the tool search beta header
const normalizedMessages = stripToolSearchFieldsFromMessages(messages)
const messagesToSend: MessageParam[] =
normalizedMessages.length > 0
? (normalizedMessages as MessageParam[])
: [{ role: 'user', content: 'count' }]
const betas = getModelBetas(model)
// Filter betas for Vertex - some betas (like web-search) cause 400 errors
// on certain Vertex endpoints. See issue #10789.
const filteredBetas =
getAPIProvider() === 'vertex'
? betas.filter(b => VERTEX_COUNT_TOKENS_ALLOWED_BETAS.has(b))
: betas
const apiStart = Date.now()
const langfuseTrace = isLangfuseEnabled()
? createTrace({
sessionId: getSessionId(),
model: normalizeModelStringForAPI(model),
provider: getAPIProvider(),
name: 'token-estimation',
})
: null
const response = await anthropic.beta.messages.create({
model: normalizeModelStringForAPI(model),
max_tokens: containsThinking ? TOKEN_COUNT_MAX_TOKENS : 1,
messages: messagesToSend,
tools: tools.length > 0 ? tools : undefined,
...(filteredBetas.length > 0 && { betas: filteredBetas }),
metadata: getAPIMetadata(),
...getExtraBodyParams(),
// Enable thinking if messages contain thinking blocks
...(containsThinking && {
thinking: {
type: 'enabled',
budget_tokens: TOKEN_COUNT_THINKING_BUDGET,
},
}),
})
const usage = response.usage
const inputTokens = usage.input_tokens
const cacheCreationTokens = usage.cache_creation_input_tokens || 0
const cacheReadTokens = usage.cache_read_input_tokens || 0
recordLLMObservation(langfuseTrace, {
model: normalizeModelStringForAPI(model),
provider: getAPIProvider(),
input: messagesToSend,
output: response.content,
usage: {
input_tokens: inputTokens,
output_tokens: usage.output_tokens,
cache_creation_input_tokens: cacheCreationTokens || undefined,
cache_read_input_tokens: cacheReadTokens || undefined,
},
startTime: new Date(apiStart),
endTime: new Date(),
})
endTrace(langfuseTrace)
return inputTokens + cacheCreationTokens + cacheReadTokens
}
export function roughTokenCountEstimationForMessages(
messages: readonly {
type: string
message?: { content?: unknown }
attachment?: Attachment
}[],
): number {
let totalTokens = 0
for (const message of messages) {
totalTokens += roughTokenCountEstimationForMessage(message)
}
return totalTokens
}
export function roughTokenCountEstimationForMessage(message: {
type: string
message?: { content?: unknown }
attachment?: Attachment
}): number {
if (
(message.type === 'assistant' || message.type === 'user') &&
message.message?.content
) {
return roughTokenCountEstimationForContent(
message.message?.content as
| string
| Array<Anthropic.ContentBlock>
| Array<Anthropic.ContentBlockParam>
| undefined,
)
}
if (message.type === 'attachment' && message.attachment) {
const userMessages = normalizeAttachmentForAPI(message.attachment)
let total = 0
for (const userMsg of userMessages) {
total += roughTokenCountEstimationForContent(userMsg.message.content)
}
return total
}
return 0
}
function roughTokenCountEstimationForContent(
content:
| string
| Array<Anthropic.ContentBlock>
| Array<Anthropic.ContentBlockParam>
| undefined,
): number {
if (!content) {
return 0
}
if (typeof content === 'string') {
return roughTokenCountEstimation(content)
}
let totalTokens = 0
for (const block of content) {
totalTokens += roughTokenCountEstimationForBlock(block)
}
return totalTokens
}
function roughTokenCountEstimationForAPIRequest(
messages: Anthropic.Beta.Messages.BetaMessageParam[],
tools: Anthropic.Beta.Messages.BetaToolUnion[],
): number {
let totalTokens = 0
for (const message of messages) {
totalTokens += roughTokenCountEstimationForContent(
message.content as
| string
| Array<Anthropic.ContentBlock>
| Array<Anthropic.ContentBlockParam>
| undefined,
)
}
if (tools.length > 0) {
totalTokens += roughTokenCountEstimation(jsonStringify(tools))
}
return totalTokens
}
function roughTokenCountEstimationForBlock(
block: string | Anthropic.ContentBlock | Anthropic.ContentBlockParam,
): number {
if (typeof block === 'string') {
return roughTokenCountEstimation(block)
}
if (block.type === 'text') {
return roughTokenCountEstimation(block.text)
}
if (block.type === 'image' || block.type === 'document') {
// https://platform.claude.com/docs/en/build-with-claude/vision#calculate-image-costs
// tokens = (width px * height px)/750
// Images are resized to max 2000x2000 (5333 tokens). Use a conservative
// estimate that matches microCompact's IMAGE_MAX_TOKEN_SIZE to avoid
// underestimating and triggering auto-compact too late.
//
// document: base64 PDF in source.data. Must NOT reach the
// jsonStringify catch-all — a 1MB PDF is ~1.33M base64 chars →
// ~325k estimated tokens, vs the ~2000 the API actually charges.
// Same constant as microCompact's calculateToolResultTokens.
return 2000
}
if (block.type === 'tool_result') {
return roughTokenCountEstimationForContent(block.content as any)
}
if (block.type === 'tool_use') {
// input is the JSON the model generated — arbitrarily large (bash
// commands, Edit diffs, file contents). Stringify once for the
// char count; the API re-serializes anyway so this is what it sees.
return roughTokenCountEstimation(
block.name + jsonStringify(block.input ?? {}),
)
}
if (block.type === 'thinking') {
return roughTokenCountEstimation(block.thinking)
}
if (block.type === 'redacted_thinking') {
return roughTokenCountEstimation(block.data)
}
// server_tool_use, web_search_tool_result, mcp_tool_use, etc. —
// text-like payloads (tool inputs, search results, no base64).
// Stringify-length tracks the serialized form the API sees; the
// key/bracket overhead is single-digit percent on real blocks.
return roughTokenCountEstimation(jsonStringify(block))
}
async function countTokensWithBedrock({
model,
messages,
tools,
betas,
containsThinking,
}: {
model: string
messages: Anthropic.Beta.Messages.BetaMessageParam[]
tools: Anthropic.Beta.Messages.BetaToolUnion[]
betas: string[]
containsThinking: boolean
}): Promise<number | null> {
try {
const client = await createBedrockRuntimeClient()
// Bedrock CountTokens requires a model ID, not an inference profile / ARN
const modelId = isFoundationModel(model)
? model
: await getInferenceProfileBackingModel(model)
if (!modelId) {
return null
}
const requestBody = {
anthropic_version: 'bedrock-2023-05-31',
// When we pass tools and no messages, we need to pass a dummy message
// to get an accurate tool token count.
messages:
messages.length > 0 ? messages : [{ role: 'user', content: 'foo' }],
max_tokens: containsThinking ? TOKEN_COUNT_MAX_TOKENS : 1,
...(tools.length > 0 && { tools }),
...(betas.length > 0 && { anthropic_beta: betas }),
...(containsThinking && {
thinking: {
type: 'enabled',
budget_tokens: TOKEN_COUNT_THINKING_BUDGET,
},
}),
}
const { CountTokensCommand } = await import(
'@aws-sdk/client-bedrock-runtime'
)
const input: CountTokensCommandInput = {
modelId,
input: {
invokeModel: {
body: new TextEncoder().encode(jsonStringify(requestBody)),
},
},
}
const response = await client.send(new CountTokensCommand(input))
const tokenCount = response.inputTokens ?? null
return tokenCount
} catch (error) {
logError(error)
return null
}
}
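
As a worked example of the rough estimators in this file, the sketch below restates the same arithmetic in standalone form so it can be run on its own. The function bodies mirror roughTokenCountEstimation and bytesPerTokenForFileType; `estimateImageTokensByArea` is a hypothetical helper that only illustrates the width * height / 750 formula quoted in the image comment, whereas the file itself returns a flat 2000 for image and document blocks.

```typescript
// Same arithmetic as roughTokenCountEstimation: length divided by a ratio.
function roughTokenCountEstimation(
  content: string,
  bytesPerToken: number = 4,
): number {
  return Math.round(content.length / bytesPerToken)
}

// Same mapping as bytesPerTokenForFileType: dense JSON uses a ratio of 2.
function bytesPerTokenForFileType(fileExtension: string): number {
  switch (fileExtension) {
    case 'json':
    case 'jsonl':
    case 'jsonc':
      return 2
    default:
      return 4
  }
}

// Hypothetical helper for the formula in the image comment.
function estimateImageTokensByArea(widthPx: number, heightPx: number): number {
  return Math.round((widthPx * heightPx) / 750)
}

const prose = 'a'.repeat(400)
const json = '{"a":1}'

// 400 characters at 4 per token gives 100 tokens.
const proseTokens = roughTokenCountEstimation(prose)
// 7 characters at 2 per token gives 3.5, which Math.round takes to 4.
const jsonTokens = roughTokenCountEstimation(json, bytesPerTokenForFileType('json'))
// The same JSON at the default ratio gives 1.75, rounded to 2.
const jsonTokensDefault = roughTokenCountEstimation(json)
// A 2000x2000 image gives 4,000,000 / 750, about 5333 tokens.
const imageTokens = estimateImageTokensByArea(2000, 2000)
```

The JSON case shows why the file-type ratio matters: the default ratio yields half the estimate, and an underestimate is the direction that lets an oversized tool result through.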