mirror of
https://github.com/claude-code-best/claude-code.git
synced 2026-06-22 16:25:51 +00:00
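fix: resolve the 400 error caused by sending DeepSeek V4 reasoning_content back to the API
- Broaden model-name detection to match all deepseek models (V4, R1, etc.)
- Always keep thinking blocks and send them back to the API as reasoning_content
- Remove the buggy turn-boundary stripping logic

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>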
25
teach-me/vllm/session.md
Normal file
@@ -0,0 +1,25 @@
# Session: vLLM Inference Optimization
- Level: Beginner (Target: Inference Optimization)
- Started: 2026-04-24
- Status: Mastered

## Concepts
1. ✅ The two phases of LLM inference (Prefill vs Decode)
2. ✅ KV Cache
3. ✅ GPU memory bottlenecks and fragmentation
4. ✅ PagedAttention
5. ✅ vLLM architecture (Scheduler, Worker)
6. ✅ Hands-on deployment (--dtype, OpenAI API)
7. ✅ Quantization (AWQ/GPTQ vs brute-force dtype casting)
8. ✅ Tensor Parallel (TP, NCCL)
9. ✅ Performance parameters (--gpu-memory-utilization)
10. ✅ Chunked Prefill
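
Concepts 2 through 4 can be made concrete with a little arithmetic. The sketch below is a toy model, not vLLM code: it compares reserving one contiguous KV-cache slot sized for the maximum sequence length against allocating fixed-size blocks on demand, which is the idea behind PagedAttention. The function names, the 2048-token maximum, the 16-token block size, and the request lengths are all assumptions chosen for illustration.

```python
import math


def contiguous_waste(seq_lens, max_len):
    """Token slots reserved but unused when every request gets a max_len slot."""
    return sum(max_len - n for n in seq_lens)


def paged_waste(seq_lens, block_size):
    """Token slots reserved but unused when every request gets
    ceil(n / block_size) fixed-size blocks."""
    return sum(math.ceil(n / block_size) * block_size - n for n in seq_lens)


# Hypothetical actual sequence lengths of three concurrent requests.
seq_lens = [100, 37, 512]

print(contiguous_waste(seq_lens, max_len=2048))  # 5495 wasted slots
print(paged_waste(seq_lens, block_size=16))      # 23 wasted slots
```

With block-based allocation the waste per request is bounded by one partially filled block, which is why fragmentation stops being the limiting factor on batch size in this model.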
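
## Misconceptions
- [Chunked Prefill]: Originally thought the main purpose was to reduce GPU memory usage.
  - Correction: it does reduce **peak activation memory**, but the core purpose is to reduce **latency (perceived stutter)**.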
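
The correction above can be sketched with a toy cost model. This is not how vLLM schedules work internally; it assumes that the time of one iteration grows linearly with the number of tokens processed and that a running decode request contributes one token per iteration. The function name `iteration_profile`, the 100 µs per-token cost, and the prompt and chunk sizes are hypothetical.

```python
def iteration_profile(prompt_len, chunk_size=None, us_per_token=100):
    """Toy model of prefilling one new prompt while one decode request is running.

    Returns the number of prefill iterations, the largest token count in a
    single iteration (a proxy for peak activation memory), and the longest
    time the decode request waits for its next token, in microseconds.
    """
    # Without chunking, the whole prompt is processed in a single iteration.
    tokens_per_step = prompt_len if chunk_size is None else min(chunk_size, prompt_len)
    steps = -(-prompt_len // tokens_per_step)  # ceiling division
    peak_tokens = tokens_per_step + 1          # prompt chunk + 1 decode token
    return {
        "prefill_steps": steps,
        "peak_tokens_per_step": peak_tokens,
        "worst_decode_gap_us": peak_tokens * us_per_token,
    }


print(iteration_profile(8192))                  # one long iteration
print(iteration_profile(8192, chunk_size=512))  # sixteen short iterations
```

In this model, chunking shrinks both the peak tokens per iteration and the worst gap between decode tokens by roughly the same factor. The second effect is the one a user notices as smoother streaming.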

## Log
- Diagnosed: Beginner
- Mastery: Intuitive understanding of memory constraints and fragmentation is strong.
- Final Quiz: 3/3 correct (with minor clarification needed on TP params).