versun/claude-code

mirror of https://github.com/claude-code-best/claude-code.git synced 2026-06-15 21:05:51 +00:00


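claude-code-best 047c85fcbf fix: fix the 400 error caused by passing DeepSeek V4 reasoning_content back to the API

- Broaden model-name detection to match all deepseek models (V4, R1, etc.)
- Always keep thinking blocks and pass them back to the API as reasoning_content
- Remove the buggy turn boundary stripping logic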

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-04-24 20:33:43 +08:00


Session: vLLM Inference Optimization

Level: Beginner (Target: Inference Optimization)
Started: 2026-04-24
Status: Mastered

Concepts

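✅ The two phases of LLM inference (Prefill vs Decode)
✅ KV Cache
✅ GPU memory bottleneck and fragmentation
✅ PagedAttention
✅ vLLM architecture (Scheduler, Worker)
✅ Hands-on deployment (--dtype, OpenAI API)
✅ Quantization (AWQ/GPTQ vs brute-force dtype casting)
✅ Tensor Parallel (TP, NCCL)
✅ Performance parameters (--gpu-memory-utilization)
✅ Chunked Prefill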
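
The KV Cache, fragmentation, and PagedAttention items above can be made concrete with a little arithmetic. The following is a minimal sketch, not vLLM code: the model shape (32 layers, 32 KV heads, head dimension 128, 2-byte dtype), the 2048-token reservation, and the block size of 16 are all assumed values chosen for illustration. It compares the token slots wasted when each request reserves one contiguous maximum-length region against block-based allocation, where only the last block of each sequence is partly empty.

```python
import math

def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    # Factor 2: one key vector and one value vector are cached per layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

def contiguous_waste(seq_lens, max_len):
    # Each request reserves max_len slots up front; unused slots are wasted.
    return sum(max_len - n for n in seq_lens)

def paged_waste(seq_lens, block_size=16):
    # Blocks are allocated on demand; only the final block can be partly empty.
    return sum(math.ceil(n / block_size) * block_size - n for n in seq_lens)

# Hypothetical model shape and request lengths, for illustration only.
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=32, head_dim=128)
seq_lens = [100, 900, 37]

print("KV bytes per token:", per_token)
print("wasted slots, contiguous:", contiguous_waste(seq_lens, max_len=2048))
print("wasted slots, paged:", paged_waste(seq_lens))
```

Under these assumptions, each cached token costs 512 KiB, so the gap between the two waste figures translates directly into GPU memory that could have held additional requests.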

Misconceptions

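- Correction: it (presumably Chunked Prefill, the last concept covered) does reduce peak activation memory, but its core purpose is to reduce latency (the perceived stutter during generation).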
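
To illustrate why the latency framing matters, here is a toy model, assuming the correction refers to Chunked Prefill. It is a sketch under strong simplifications, not vLLM's scheduler: step cost is taken to be proportional to the number of tokens processed, one decode request is already running, and the function name `step_token_counts` and the sizes 4096 and 512 are hypothetical.

```python
def step_token_counts(prompt_len, chunk_size=None):
    """Tokens processed per scheduler step while one long prompt is prefilled.

    Without chunking, the whole prompt runs as a single step and the running
    decode request has to wait for it. With chunking, each step handles one
    prompt chunk plus one token for the running decode request.
    """
    if chunk_size is None:
        return [prompt_len]
    steps = []
    remaining = prompt_len
    while remaining > 0:
        take = min(chunk_size, remaining)
        steps.append(take + 1)  # prompt chunk + 1 piggybacked decode token
        remaining -= take
    return steps

unchunked = step_token_counts(4096)
chunked = step_token_counts(4096, chunk_size=512)

# The largest step approximates the longest pause a decoding user perceives.
print("longest step, unchunked:", max(unchunked))
print("longest step, chunked:", max(chunked), "over", len(chunked), "steps")
```

In this model the total work barely changes, but the longest single step shrinks from the full prompt to roughly one chunk, which is the stall a user watching streamed tokens would notice.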

Log

Diagnosed: Beginner
Mastery: Intuitive understanding of memory constraints and fragmentation is strong.
Final Quiz: 3/3 correct (with minor clarification needed on TP params).