feat: Full headless mode for autonomous SpecFlow pipelines #7
base: main
Previously, all four verify.md sections (Pre-Verification Checklist, Smoke Test Results, Browser Verification, API Verification) required substantive content, forcing users to bypass validation with --force for CLI-only features that have no browser or API surface. Now sections containing "N/A", "Not applicable", "Not required", or "CLI only" are accepted as valid. Section headings must still exist.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
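The relaxed rule described in this commit could look roughly like the sketch below. It is an illustration only, not the validator in this PR: `isSectionSatisfied` and `MIN_SUBSTANTIVE_LENGTH` are hypothetical names, the length threshold is a stand-in for whatever "substantive content" means in the real check, and case-insensitive matching is an assumption. Only the four marker phrases come from the commit message.

```typescript
// Marker phrases named in the commit message; matched case-insensitively (an assumption).
const NOT_APPLICABLE_MARKERS = ["n/a", "not applicable", "not required", "cli only"];

// Hypothetical stand-in for whatever "substantive content" means in the real validator.
const MIN_SUBSTANTIVE_LENGTH = 20;

// Returns true when a section body either opts out explicitly or has enough content.
// The caller is still expected to verify that the section heading exists.
function isSectionSatisfied(body: string): boolean {
  const text = body.trim();
  const lower = text.toLowerCase();
  if (NOT_APPLICABLE_MARKERS.some((marker) => lower.includes(marker))) {
    return true;
  }
  return text.length >= MIN_SUBSTANTIVE_LENGTH;
}
```

With this shape, a Browser Verification section containing only "N/A" passes, while an empty section or a bare "TODO" still fails.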
Adds automatic AI evaluation of Doctorow Gate checks when running in non-TTY environments (CI/CD, agent pipelines). Uses claude -p with Haiku for fast, cheap evaluation. Falls back to pass-by-default on AI failure to avoid blocking pipelines.

Closes #5

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
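The pass-by-default fallback mentioned in this commit can be sketched as a thin wrapper around the evaluation call. This is a hedged illustration: `GateVerdict` and `evaluateOrPass` are hypothetical names, and the real doctorow.ts code may structure its verdicts differently.

```typescript
// Hypothetical verdict shape; the real Doctorow Gate result type may differ.
interface GateVerdict {
  pass: boolean;
  reason: string;
}

// Runs the AI evaluation and converts any failure into a passing verdict,
// so a broken or unreachable model never blocks a non-interactive pipeline.
async function evaluateOrPass(evaluate: () => Promise<GateVerdict>): Promise<GateVerdict> {
  try {
    return await evaluate();
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { pass: true, reason: `AI evaluation unavailable (${detail}); passing by default` };
  }
}
```

The trade-off is that an outage silently weakens the gate, so recording the fallback reason in the output (as above) keeps it auditable.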
Default to Sonnet (claude-sonnet-4-20250514) for better reasoning on quality checks. Override via SPECFLOW_DOCTOROW_MODEL env var.

Supported models:
- claude-haiku-4-5-20251001 (fast/cheap)
- claude-sonnet-4-20250514 (balanced, default)
- claude-opus-4-5-20251101 (deep reasoning)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --output-format json to claude -p invocation to ensure parseable output in environments with CLAUDE.md hooks/skills configured
- Change default model from Sonnet to Opus for deeper quality reasoning
- Model remains configurable via SPECFLOW_DOCTOROW_MODEL env var

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable CI/headless execution of specify, plan, tasks, and implement commands by detecting non-TTY/SPECFLOW_HEADLESS and routing through `claude -p --output-format json` instead of interactive sessions.

- F-1: Shared headless runner (lib/headless.ts) with isHeadlessMode() and runClaudeHeadless() using Bun.spawn + JSON envelope extraction
- F-2: Headless plan command with system prompt for file writing
- F-3: Headless tasks command with forced autoChain=always
- F-4: Headless specify command with auto batch mode detection
- F-5: Headless executor for executeFeature/executeFeatureStreaming
- F-6: Pipeline command (specflow pipeline <id>) running full sequence

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jcfischer
left a comment
Council Review: PR #7 — Full headless mode for autonomous SpecFlow pipelines
Verdict: MERGE WITH CHANGES | Confidence: HIGH
Council: Engineer, Architect, Security, Operations (4 agents, 1 round each)
Must-Fix (blocking merge)
1. Verify headless mode can actually write files
Critical observation: interactive mode uses --dangerously-skip-permissions but headless mode does NOT:
// Interactive (existing):
spawn("claude", ["--print", "--dangerously-skip-permissions", prompt], ...)

// Headless (new):
Bun.spawn(["claude", "-p", "--output-format", "json", "--model", model, ...])

Headless mode is more restricted than interactive (good for security), but phases need to write spec.md, plan.md, tasks.md to disk. Without --dangerously-skip-permissions, Claude will hit permission prompts that can't be answered in non-TTY mode.
Options:
- A. Add the flag (reduces security but enables functionality)
- B. Restructure prompts so Claude returns content via stdout and calling code handles file writes (architecturally superior)
This needs to be tested before merge — the feature may not work as described.
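A minimal sketch of option B, to make the trade-off concrete. It assumes the JSON envelope printed by `claude -p --output-format json` exposes the model's text in a `result` field and flags failures with `is_error`; that shape should be confirmed against the CLI version in use. `extractResult`, `persistPhaseOutput`, and the injected `FileWriter` are hypothetical names, not code from this PR.

```typescript
// Assumed envelope shape; only the fields used below are declared.
interface ClaudeEnvelope {
  result?: unknown;
  is_error?: boolean;
}

// Injected so the calling code, not Claude, owns the filesystem side effect.
type FileWriter = (path: string, content: string) => void;

// Pulls the generated document text out of the raw stdout of a headless run.
function extractResult(stdout: string): string {
  const envelope = JSON.parse(stdout) as ClaudeEnvelope;
  if (envelope.is_error || typeof envelope.result !== "string" || envelope.result.trim() === "") {
    throw new Error("Claude returned no usable result");
  }
  return envelope.result;
}

// Writes one phase artifact (e.g. spec.md) from the captured stdout.
function persistPhaseOutput(stdout: string, targetPath: string, write: FileWriter): void {
  write(targetPath, extractResult(stdout));
}
```

Because Claude never touches the disk in this design, the headless invocation can stay without --dangerously-skip-permissions. In a Bun codebase the writer could be a small wrapper around `Bun.write`.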
2. Default model → Sonnet, not Opus
Cost per pipeline run with Opus (9 API calls): **$5-8**. With Sonnet: ~$1.50-2.50.
If a CI pipeline triggers on every push, this is a cost amplification vector. A contributor pushing frequently generates significant API spend. The system prompts are too generic for Opus to add value:
"You are a technical planning agent. Follow the instructions exactly."
Default Sonnet, Opus opt-in via SPECFLOW_MODEL.
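The requested default can be expressed in a few lines. This is a sketch, with `resolveModel` as a hypothetical helper name; the model IDs and the SPECFLOW_MODEL variable are the ones already named in this PR. The environment is passed in rather than read from process.env so the function is easy to test.

```typescript
// Sonnet as the default, per the recommendation above; Opus stays available as an opt-in.
const DEFAULT_MODEL = "claude-sonnet-4-20250514";

// Returns the SPECFLOW_MODEL override when set and non-blank, else the default.
function resolveModel(env: Record<string, string | undefined>): string {
  const override = env.SPECFLOW_MODEL?.trim();
  return override ? override : DEFAULT_MODEL;
}
```

A CI job that wants Opus would then set `SPECFLOW_MODEL=claude-opus-4-5-20251101` explicitly, which makes the higher spend a visible, deliberate choice.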
3. Replace process.env mutation with explicit parameter
// Current (pipeline.ts line 85):
process.env.SPECFLOW_HEADLESS = "true"; // global side effect

This mutates the process environment permanently. It works today because the pipeline is a top-level CLI command, but it's architecturally fragile.
Required change: Pass { headless: true } through each command's options type:
// pipeline.ts:
await specifyCommand(featureId, { batch: true, headless: true });
await planCommand(featureId, { headless: true });
// isHeadlessMode becomes:
function isHeadlessMode(options?: { headless?: boolean }): boolean {
  return options?.headless || !process.stdin.isTTY || process.env.SPECFLOW_HEADLESS === "true";
}

Small change, large architectural benefit.
Should-Fix (recommended)
4. DRY the spawn+timeout pattern
doctorow.ts has evaluateCheckWithAI() with a nearly identical Bun.spawn + timeout + Promise.race pattern to lib/headless.ts. Also, extractJsonFromResponse() lives in doctorow.ts but is imported by headless.ts. Extract to shared location; have doctorow use runClaudeHeadless() internally.
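One possible shape for the shared piece is a generic timeout wrapper built on Promise.race, which both doctorow.ts and lib/headless.ts could call. This is a sketch under assumptions: `withTimeout` and its `onTimeout` hook are hypothetical, and the actual duplicated code in the PR may differ in detail.

```typescript
// Races `work` against a timer. On timeout, runs the optional cleanup hook
// (for example, killing a spawned subprocess) and rejects with an Error.
async function withTimeout<T>(work: Promise<T>, ms: number, onTimeout?: () => void): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      onTimeout?.();
      reject(new Error(`Timed out after ${ms} ms`));
    }, ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so a fast result does not keep the process alive.
    clearTimeout(timer);
  }
}
```

With this in place, runClaudeHeadless() would own the spawn call and pass a kill callback as `onTimeout`, and evaluateCheckWithAI() would delegate to runClaudeHeadless() instead of repeating the pattern.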
5. Separate the migration commit
embedded.ts includes a new contrib_prep_state table unrelated to headless mode. Mixing schema migrations with feature changes makes rollback harder.
6. Add unit tests for pipeline and headless runner
400 lines of tests is good coverage for verify.md and doctorow headless, but there are zero tests for runClaudeHeadless(), the modified runClaude() functions, or the pipeline command. The test plan in the PR body is all unchecked manual checkboxes.
7. Add single-retry on transient failures
No retry logic anywhere. A transient API error causes process.exit(1). One retry in runClaudeHeadless() for exit code != 0 with no output would be a 5-line improvement.
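The retry rule suggested here (retry once when the exit code is non-zero and nothing was printed) is small enough to sketch directly. `RunResult` and `runWithSingleRetry` are hypothetical names; the real runClaudeHeadless() return type is not shown in this review.

```typescript
// Hypothetical result of one headless invocation.
interface RunResult {
  exitCode: number;
  stdout: string;
}

// Runs once; retries exactly once if the failure looks transient
// (non-zero exit and no output). Any other outcome is returned as-is.
async function runWithSingleRetry(run: () => Promise<RunResult>): Promise<RunResult> {
  const first = await run();
  const looksTransient = first.exitCode !== 0 && first.stdout.trim() === "";
  return looksTransient ? run() : first;
}
```

Failures that do produce output are deliberately not retried, since they more likely reflect a real error that a second attempt would only repeat at extra cost.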
What's Solid
- Pipeline command is the right abstraction for v1: simple, sequential, --stop-after for partial runs
- No rollback is correct: SpecFlow operates on files, so "rollback" is re-running the phase
- Headless mode is additive: interactive mode is completely untouched
- lib/headless.ts is a clean shared module with good interfaces
- Headless mode is MORE restricted than interactive (no --dangerously-skip-permissions), which is the correct security posture
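For reference, the "simple, sequential, --stop-after" design praised above can be sketched as a single loop. This is an illustration rather than the code in commands/pipeline.ts: `runPipeline` and the injected `PhaseRunner` are hypothetical, and only the phase order is taken from the PR description.

```typescript
// Phase order as listed in the PR description.
const PHASES = ["specify", "plan", "tasks", "implement", "complete"] as const;
type Phase = (typeof PHASES)[number];

// Injected so each phase can be a real command in production and a stub in tests.
type PhaseRunner = (phase: Phase, featureId: string) => Promise<void>;

// Runs phases in order and stops after `stopAfter` if given.
// A thrown error aborts the run; there is no rollback, a phase is simply re-run.
async function runPipeline(featureId: string, runPhase: PhaseRunner, stopAfter?: Phase): Promise<Phase[]> {
  const completed: Phase[] = [];
  for (const phase of PHASES) {
    await runPhase(phase, featureId);
    completed.push(phase);
    if (phase === stopAfter) break;
  }
  return completed;
}
```

Injecting the phase runner also gives the missing pipeline unit tests an easy seam: a stub runner can record calls without spawning Claude.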
Cost Summary
| Model | Per-pipeline cost | Monthly (5/day) |
|---|---|---|
| Opus (current default) | $5-8 | $750-1200 |
| Sonnet (recommended) | $1.50-2.50 | $225-375 |
| Haiku | $0.30-0.50 | $45-75 |
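The monthly column is consistent with 5 runs per day over a 30-day month; the 30-day figure is inferred from the numbers, not stated in the review. A short sketch of that arithmetic, using cents to avoid floating-point rounding (`monthlyCostRange` is a hypothetical helper):

```typescript
const RUNS_PER_DAY = 5;
const DAYS_PER_MONTH = 30; // assumption: this value reproduces the table's monthly column

// Takes a per-pipeline cost range in cents and returns the monthly range in dollars.
function monthlyCostRange(lowCents: number, highCents: number): [number, number] {
  const runs = RUNS_PER_DAY * DAYS_PER_MONTH;
  return [(lowCents * runs) / 100, (highCents * runs) / 100];
}

// Opus at $5-8 per run:          monthlyCostRange(500, 800) gives [750, 1200]
// Sonnet at $1.50-2.50 per run:  monthlyCostRange(150, 250) gives [225, 375]
// Haiku at $0.30-0.50 per run:   monthlyCostRange(30, 50)   gives [45, 75]
```

A pipeline that triggers on every push would exceed 5 runs per day quickly, so these figures are better read as a floor than a forecast.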
Summary
- specflow pipeline <feature-id> command runs the full lifecycle autonomously

What Changed

New: lib/headless.ts (shared headless Claude runner)
- isHeadlessMode() detects !process.stdin.isTTY || SPECFLOW_HEADLESS=true
- runClaudeHeadless() uses claude -p --output-format json for reliable output
- SPECFLOW_MODEL env var (default: Opus)
- Bun.spawn

New: commands/pipeline.ts (full lifecycle command)
- specflow pipeline F-3 runs specify → plan → tasks → implement → complete
- --stop-after <phase> for partial runs

Modified: specify.ts, plan.ts, tasks.ts, executor.ts
- runClaude() now checks isHeadlessMode()
- runClaudeHeadless() with phase-specific system prompts
- autoChain: always in headless mode

Environment Variables

| Variable | Default |
|---|---|
| SPECFLOW_HEADLESS | false |
| SPECFLOW_MODEL | claude-opus-4-5-20251101 |
| SPECFLOW_DOCTOROW_MODEL | |

Test plan
- [ ] SPECFLOW_HEADLESS=true specflow specify F-X generates spec.md without interaction
- [ ] SPECFLOW_HEADLESS=true specflow plan F-X generates plan.md
- [ ] SPECFLOW_HEADLESS=true specflow tasks F-X generates tasks.md
- [ ] specflow pipeline F-X runs full lifecycle
- [ ] specflow pipeline F-X --stop-after plan stops after plan phase

🤖 Generated with Claude Code