OpenAI's prompt_tokens includes cached tokens, but Anthropic's
input_tokens excludes them. The adapter was mapping
prompt_tokens → input_tokens verbatim, so downstream code
(cache hit rate, cost, autocompact) counted cached tokens twice.
Real-world impact: DeepSeek returns prompt_tokens=34097 with
cached_tokens=34048, which was displayed as a 50% hit rate
(34048 / (34097 + 34048)) instead of 99.86% (34048 / 34097).
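A minimal sketch of a mapping that avoids the double count, assuming the
cached count arrives in prompt_tokens_details.cached_tokens and is surfaced
as cache_read_input_tokens. The names mapUsage and cacheHitRate are
hypothetical and are not taken from the adapter itself.

```typescript
// Simplified usage shapes; only the fields needed for the sketch are listed.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens: number;
}

// Hypothetical helper: subtract cached tokens so input_tokens excludes them.
function mapUsage(u: OpenAIUsage): AnthropicUsage {
  const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
  return {
    // Clamp at 0 in case a provider reports cached > prompt_tokens.
    input_tokens: Math.max(0, u.prompt_tokens - cached),
    output_tokens: u.completion_tokens,
    cache_read_input_tokens: cached,
  };
}

// Hypothetical downstream metric: cached share of all prompt-side tokens.
function cacheHitRate(u: AnthropicUsage): number {
  const total = u.input_tokens + u.cache_read_input_tokens;
  return total === 0 ? 0 : u.cache_read_input_tokens / total;
}
```

With the DeepSeek numbers above, input_tokens becomes 49 and the hit rate
is computed over 34097 tokens rather than 68145.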
Co-Authored-By: glm-5.1 <zai-org@claude-code-best.win>
DeepSeek v4 in thinking mode sometimes returns reasoning_content: ""
when the model answers directly without internal reasoning. Two places
were filtering the empty string out, which dropped the thinking block
from the assistant turn entirely. The next request then omitted
reasoning_content for that prior turn, and DeepSeek rejected it with
HTTP 400 "reasoning_content ... must be passed back to the API".
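The failure mode above comes down to a truthiness check versus a presence
check. A small sketch, with a hypothetical Delta shape and helper names, and
assuming null should count as absent:

```typescript
// Hypothetical delta shape for an OpenAI-compatible streaming chunk.
interface Delta {
  content?: string;
  reasoning_content?: string | null;
}

// Truthiness check: "" is falsy, so an empty reasoning_content looks absent.
function hasReasoningTruthy(d: Delta): boolean {
  return Boolean(d.reasoning_content);
}

// Presence check: only undefined and null are treated as absent
// (treating null as absent is an assumption of this sketch).
function hasReasoningPresent(d: Delta): boolean {
  return d.reasoning_content !== undefined && d.reasoning_content !== null;
}
```

Only the presence check distinguishes "the model reasoned about nothing"
from "the model is not a thinking model".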
Fix:
- openaiStreamAdapter: open a thinking block whenever reasoning_content
  is present (including ""); skip the empty thinking_delta event since
  the empty value is already conveyed by the block's initial state.
- openaiConvertMessages: preserve empty thinking blocks as
  reasoning_content: "" when serializing assistant messages back to
  the OpenAI/DeepSeek format.
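A rough sketch of both changes under simplified, hypothetical types. The
functions thinkingEvents and toOpenAIAssistant are illustrative stand-ins and
do not reproduce the real openaiStreamAdapter or openaiConvertMessages code:

```typescript
// Hypothetical, simplified content blocks and message shape.
type Block =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string };

interface OpenAIAssistantMessage {
  role: "assistant";
  content: string;
  reasoning_content?: string;
}

type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: { type: "thinking"; thinking: string } }
  | { type: "content_block_delta"; index: number; delta: { type: "thinking_delta"; thinking: string } };

// Adapter side: open the block on presence, emit a delta only for non-empty text.
function thinkingEvents(
  reasoning: string | null | undefined,
  index: number,
  blockOpen: boolean,
): StreamEvent[] {
  if (reasoning === undefined || reasoning === null) return [];
  const events: StreamEvent[] = [];
  if (!blockOpen) {
    events.push({ type: "content_block_start", index, content_block: { type: "thinking", thinking: "" } });
  }
  if (reasoning !== "") {
    events.push({ type: "content_block_delta", index, delta: { type: "thinking_delta", thinking: reasoning } });
  }
  return events;
}

// Serialization side: an empty thinking block still yields reasoning_content: "",
// while a message with no thinking block leaves the field unset.
function toOpenAIAssistant(blocks: Block[]): OpenAIAssistantMessage {
  const msg: OpenAIAssistantMessage = { role: "assistant", content: "" };
  for (const b of blocks) {
    if (b.type === "thinking") {
      msg.reasoning_content = (msg.reasoning_content ?? "") + b.thinking;
    } else {
      msg.content += b.text;
    }
  }
  return msg;
}
```

Leaving reasoning_content unset when there is no thinking block keeps
requests to non-thinking models unchanged.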
Tests:
- New: empty reasoning_content opens a thinking block (adapter).
- Updated: empty thinking blocks now round-trip as reasoning_content: ""
  instead of being dropped.
- New: assistant messages with no thinking block still omit
  reasoning_content (regression guard for non-thinking models).