fix: 修复 DeepSeek V4 reasoning_content 回传导致的 400 错误

- 扩大模型名称检测范围，匹配所有 deepseek 模型（V4、R1 等） - 始终保留 thinking blocks 为 reasoning_content 回传给 API - 移除有 bug 的 turn boundary 剥离逻辑 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-22 16:25:51 +00:00 · 2026-04-24 19:49:26 +08:00
parent da6d06365d
commit 047c85fcbf
3 changed files with 120 additions and 0 deletions
--- a/teach-me/vllm/session.md
+++ b/teach-me/vllm/session.md
@@ -0,0 +1,25 @@
+# Session: vLLM Inference Optimization
+- Level: Beginner (Target: Inference Optimization)
+- Started: 2026-04-24
+- Status: Mastered
+
+## Concepts
+1. ✅ LLM 推理的两个阶段 (Prefill vs Decode)
+2. ✅ KV Cache
+3. ✅ 显存瓶颈与碎片化
+4. ✅ PagedAttention
+5. ✅ vLLM 架构 (Scheduler, Worker)
+6. ✅ 实战部署 (--dtype, openai api)
+7. ✅ 量化 (AWQ/GPTQ vs 暴力 dtype)
+8. ✅ Tensor Parallel (TP, NCCL)
+9. ✅ 性能参数 (--gpu-memory-utilization)
+10. ✅ Chunked Prefill
+
+## Misconceptions
+- [Chunked Prefill]: 原以为主要目的是降低显存。
+  - 纠正：确实降低了**峰值激活显存**，但核心目的是降低**Latency (卡顿感)**。
+
+## Log
+- Diagnosed: Beginner
+- Mastery: Intuitive understanding of memory constraints and fragmentation is strong.
+- Final Quiz: 3/3 correct (with minor clarification needed on TP params).