feat(workflow): 默认并发降为 3 并支持 per-run maxConcurrency 注入

- DEFAULT_MAX_CONCURRENCY=3 替代旧的 min(16, cores-2);MAX_CONCURRENCY_CAP=16 保留为用户输入的绝对上限
- 新增 clampMaxConcurrency() 处理 undefined/<1/>CAP 边界
- WorkflowInput schema 新增 maxConcurrency: number.int().min(1).max(16).optional()
- 引擎层 context/runWorkflow 全链路透传:semaphore 容量来自 per-run 入参
- WorkflowTool prompt 增加指引:fan-out 场景先用 AskUserQuestion 与用户确认并发再启动
- 同步 ultracode skill + audit workflow spec 的并发文字(删 cpu-cores 公式)
- 同步 docs/features/workflow-scripts.md 旧公式

Co-Authored-By: glm-5.2 <zai-org@claude-code-best.win>
This commit is contained in:
claude-code-best
2026-06-14 10:16:29 +08:00
parent b5ead59e72
commit 3edc370aa1
15 changed files with 219 additions and 32 deletions

View File

@@ -74,7 +74,7 @@ Script body hooks:
- \`budget: {total: number|null, spent(): number, remaining(): number}\` — the turn's token target from the user's "+500k"-style directive. \`budget.total\` is null if no target was set. \`budget.spent()\` returns output tokens spent this turn across the main loop and all workflows — the pool is shared, not per-workflow. \`budget.remaining()\` returns \`max(0, total - spent())\`, or \`Infinity\` if no target. The target is a HARD ceiling, not advisory: once \`spent()\` reaches \`total\`, further \`agent()\` calls throw. Use for dynamic loops: \`while (budget.total && budget.remaining() > 50_000) { ... }\`, or static scaling: \`const FLEET = budget.total ? Math.floor(budget.total / 100_000) : 5\`.
- \`workflow(nameOrRef: string | {scriptPath: string}, args?: any): Promise<any>\` — run another workflow inline as a sub-step and return whatever it returns. Pass a name to invoke a saved workflow (same registry as {name: "..."}), or {scriptPath} to run a script file you Wrote earlier. The child shares this run's concurrency cap, agent counter, abort signal, and token budget — its agents appear under a "▸ name" group in /workflows and its tokens count toward budget.spent(). The args param becomes the child's \`args\` global. Nesting is one level only: workflow() inside a child throws. Throws on unknown name / unreadable scriptPath / child syntax error; catch to handle gracefully.
Concurrent agent() calls are capped at min(16, cpu cores - 2) per workflow — excess calls queue and run as slots free up. You can still pass 100 items to parallel()/pipeline() and they all complete; only ~10 run at any moment. Total agent count across a workflow's lifetime is capped at 1000 — a runaway-loop backstop set far above any real workflow. A single parallel()/pipeline() call accepts at most 4096 items; passing more is an explicit error, not a silent truncation.
Concurrent agent() calls are capped at 3 by default per workflow — excess calls queue and run as slots free up. The Workflow tool accepts an optional \`maxConcurrency\` input (116) to override per-run; if the user hasn't specified and the workflow fans out (large parallel/pipeline, multi-dimensional review), ask them via AskUserQuestion before launching — e.g. offer 3 / 6 / 9 as choices. You can still pass 100 items to parallel()/pipeline() and they all complete; only the configured number run at any moment. Total agent count across a workflow's lifetime is capped at 1000 — a runaway-loop backstop set far above any real workflow. A single parallel()/pipeline() call accepts at most 4096 items; passing more is an explicit error, not a silent truncation.
Subagents are told their final text IS the return value (not a human-facing message), so they return raw data. For structured output, use the schema option — validation happens at the tool-call layer so the model retries on mismatch.