feat: Full headless mode for autonomous SpecFlow pipelines #7

mellanon · 2026-02-02T02:46:42Z

Summary

Adds headless mode to all interactive SpecFlow commands (specify, plan, tasks, executor)
New specflow pipeline <feature-id> command runs the full lifecycle autonomously
Builds on the headless Doctorow Gate from PR #6 (feat: AI-powered headless Doctorow Gate)

What Changed

New: lib/headless.ts — Shared headless Claude runner

isHeadlessMode() detects !process.stdin.isTTY || SPECFLOW_HEADLESS=true
runClaudeHeadless() uses claude -p --output-format json for reliable output
Configurable model via SPECFLOW_MODEL env var (default: Opus)
Timeout handling with Bun.spawn
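As a rough illustration of the JSON handling described above, the sketch below pulls the model's text out of the output of claude -p --output-format json. The helper name `extractResultText` is hypothetical, and the shape of the envelope (a single JSON object with a string `result` field) is an assumption rather than something this PR states.

```typescript
// Hypothetical helper, not the PR's actual code.
// ASSUMPTION: stdout holds one JSON object with a string `result` field.
function extractResultText(stdout: string): string | null {
  try {
    const parsed: unknown = JSON.parse(stdout.trim());
    if (typeof parsed === "object" && parsed !== null && "result" in parsed) {
      const result = (parsed as { result: unknown }).result;
      return typeof result === "string" ? result : null;
    }
    return null; // valid JSON, but not the expected envelope
  } catch {
    return null; // not JSON at all, e.g. a hook printed plain text
  }
}
```

Returning `null` instead of throwing leaves the caller free to decide whether unparseable output is fatal for a given phase.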

New: commands/pipeline.ts — Full lifecycle command

specflow pipeline F-3 runs specify → plan → tasks → implement → complete
--stop-after <phase> for partial runs
Forces headless mode for all phases
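The --stop-after behaviour can be pictured as a slice over an ordered phase list. This is a minimal sketch under that assumption; `PHASES` and `phasesToRun` are illustrative names, not identifiers taken from commands/pipeline.ts.

```typescript
// Phase order as listed in the PR description.
const PHASES = ["specify", "plan", "tasks", "implement", "complete"] as const;
type Phase = (typeof PHASES)[number];

// Hypothetical helper: returns the phases to execute, inclusive of `stopAfter`.
function phasesToRun(stopAfter?: Phase): Phase[] {
  if (stopAfter === undefined) return [...PHASES];
  const last = PHASES.indexOf(stopAfter);
  return PHASES.slice(0, last + 1);
}

console.log(phasesToRun("plan")); // [ 'specify', 'plan' ]
```

With no argument the helper returns all five phases, which corresponds to a plain specflow pipeline F-3 run.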

Modified: specify.ts, plan.ts, tasks.ts, executor.ts

Each command's runClaude() now checks isHeadlessMode()
Headless branch delegates to runClaudeHeadless() with phase-specific system prompts
Interactive mode is completely unchanged
tasks.ts auto-sets autoChain: always in headless mode

Environment Variables

| Variable | Purpose | Default |
| --- | --- | --- |
| `SPECFLOW_HEADLESS` | Force headless mode | `false` |
| `SPECFLOW_MODEL` | Model for headless calls | `claude-opus-4-5-20251101` |
| `SPECFLOW_DOCTOROW_MODEL` | Model for Doctorow Gate | Uses `SPECFLOW_MODEL` |
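The fallback chain implied by these defaults could look roughly like the following. The function names are hypothetical, and the environment is passed in as a plain object so the sketch stays independent of any runtime; in practice one would pass `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Default taken from the table in this PR description.
const DEFAULT_MODEL = "claude-opus-4-5-20251101";

// Hypothetical helper: model used for headless phase calls.
function resolveModel(env: Env): string {
  return env.SPECFLOW_MODEL || DEFAULT_MODEL;
}

// Hypothetical helper: the Doctorow Gate model falls back to SPECFLOW_MODEL.
function resolveDoctorowModel(env: Env): string {
  return env.SPECFLOW_DOCTOROW_MODEL || resolveModel(env);
}
```

Using `||` rather than `??` treats an empty string as unset, which is one reasonable choice for environment variables.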

Test plan

SPECFLOW_HEADLESS=true specflow specify F-X generates spec.md without interaction
SPECFLOW_HEADLESS=true specflow plan F-X generates plan.md
SPECFLOW_HEADLESS=true specflow tasks F-X generates tasks.md
specflow pipeline F-X runs full lifecycle
Interactive mode still works unchanged (TTY detected)
specflow pipeline F-X --stop-after plan stops after plan phase

🤖 Generated with Claude Code

Previously, all four verify.md sections (Pre-Verification Checklist, Smoke Test Results, Browser Verification, API Verification) required substantive content, which forced users to bypass the check with --force for CLI-only features that have no browser or API surface. Now sections containing "N/A", "Not applicable", "Not required", or "CLI only" are accepted as valid. Section headings must still exist.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
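The relaxed check for verify.md sections might be sketched as below. The marker strings come from the commit message; case-insensitive substring matching and the name `hasNotApplicableMarker` are assumptions made for illustration.

```typescript
// Markers listed in the commit message, lowercased for comparison.
const NA_MARKERS = ["n/a", "not applicable", "not required", "cli only"];

// Hypothetical helper: true when a section body declares itself not applicable.
// ASSUMPTION: matching is a case-insensitive substring test.
function hasNotApplicableMarker(sectionBody: string): boolean {
  const text = sectionBody.toLowerCase();
  return NA_MARKERS.some((marker) => text.includes(marker));
}
```

A validator would presumably still confirm that the section heading exists before consulting a helper like this.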

Adds automatic AI evaluation of Doctorow Gate checks when running in non-TTY environments (CI/CD, agent pipelines). Uses claude -p with Haiku for fast, cheap evaluation. Falls back to pass-by-default on AI failure to avoid blocking pipelines.

Closes #5

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
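The pass-by-default fallback can be expressed as a small wrapper around whatever performs the AI evaluation. `GateResult` and `evaluateOrPass` are hypothetical names; the sketch only shows the control flow, not the actual code in doctorow.ts.

```typescript
// Hypothetical result shape for a single Doctorow Gate check.
interface GateResult {
  passed: boolean;
  reason: string;
}

// Runs the evaluator; if it throws, the check passes so the pipeline is not blocked.
async function evaluateOrPass(evaluate: () => Promise<GateResult>): Promise<GateResult> {
  try {
    return await evaluate();
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { passed: true, reason: `AI evaluation failed, passing by default: ${detail}` };
  }
}
```

Recording the failure in `reason` keeps the fallback visible in logs even though the gate itself passes.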

Default to Sonnet (claude-sonnet-4-20250514) for better reasoning on quality checks. Override via SPECFLOW_DOCTOROW_MODEL env var.

Supported models:
- claude-haiku-4-5-20251001 (fast/cheap)
- claude-sonnet-4-20250514 (balanced, default)
- claude-opus-4-5-20251101 (deep reasoning)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add --output-format json to claude -p invocation to ensure parseable output in environments with CLAUDE.md hooks/skills configured
- Change default model from Sonnet to Opus for deeper quality reasoning
- Model remains configurable via SPECFLOW_DOCTOROW_MODEL env var

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Enable CI/headless execution of specify, plan, tasks, and implement commands by detecting non-TTY/SPECFLOW_HEADLESS and routing through `claude -p --output-format json` instead of interactive sessions.

- F-1: Shared headless runner (lib/headless.ts) with isHeadlessMode() and runClaudeHeadless() using Bun.spawn + JSON envelope extraction
- F-2: Headless plan command with system prompt for file writing
- F-3: Headless tasks command with forced autoChain=always
- F-4: Headless specify command with auto batch mode detection
- F-5: Headless executor for executeFeature/executeFeatureStreaming
- F-6: Pipeline command (specflow pipeline <id>) running full sequence

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

jcfischer

Council Review: PR #7 — Full headless mode for autonomous SpecFlow pipelines

Verdict: MERGE WITH CHANGES | Confidence: HIGH
Council: Engineer, Architect, Security, Operations (4 agents, 1 round each)

Must-Fix (blocking merge)

1. Verify headless mode can actually write files

Critical observation: interactive mode uses --dangerously-skip-permissions but headless mode does NOT:

// Interactive (existing):
spawn("claude", ["--print", "--dangerously-skip-permissions", prompt], ...)

// Headless (new):
Bun.spawn(["claude", "-p", "--output-format", "json", "--model", model, ...])

Headless mode is more restricted than interactive (good for security), but phases need to write spec.md, plan.md, tasks.md to disk. Without --dangerously-skip-permissions, Claude will hit permission prompts that can't be answered in non-TTY mode.

Options:

A. Add the flag (reduces security but enables functionality)
B. Restructure prompts so Claude returns content via stdout and calling code handles file writes (architecturally superior)

This needs to be tested before merge — the feature may not work as described.
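To make option B concrete, here is a minimal sketch in which the calling code, not Claude, performs the write. The artifact names (spec.md, plan.md, tasks.md) come from the PR's test plan; `persistPhaseOutput`, the `WriteFile` callback, and the directory layout are assumptions for illustration.

```typescript
type ArtifactPhase = "specify" | "plan" | "tasks";

// Artifact produced by each phase, per the PR's test plan.
const ARTIFACT_FILE: Record<ArtifactPhase, string> = {
  specify: "spec.md",
  plan: "plan.md",
  tasks: "tasks.md",
};

// Injected writer; in real use this could be fs.writeFileSync.
type WriteFile = (path: string, data: string) => void;

// Hypothetical helper: writes content returned on stdout to the phase's artifact.
function persistPhaseOutput(
  featureDir: string,
  phase: ArtifactPhase,
  content: string,
  write: WriteFile,
): string {
  if (content.trim().length === 0) {
    throw new Error(`${phase}: model returned no content`);
  }
  const path = `${featureDir}/${ARTIFACT_FILE[phase]}`;
  write(path, content);
  return path;
}
```

Because the write happens in the caller, the headless invocation would not need file-write permission at all, which is the property option B is after.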

2. Default model → Sonnet, not Opus

Cost per pipeline run with Opus (9 API calls): **$5-8**. With Sonnet: ~$1.50-2.50.

If a CI pipeline triggers on every push, this is a cost amplification vector. A contributor pushing frequently generates significant API spend. The system prompts are too generic for Opus to add value:

"You are a technical planning agent. Follow the instructions exactly."

Default Sonnet, Opus opt-in via SPECFLOW_MODEL.

3. Replace process.env mutation with explicit parameter

// Current (pipeline.ts line 85):
process.env.SPECFLOW_HEADLESS = "true";  // global side effect

This mutates the process environment permanently. It works today because the pipeline is a top-level CLI command, but it's architecturally fragile.

Required change: Pass { headless: true } through each command's options type:

// pipeline.ts:
await specifyCommand(featureId, { batch: true, headless: true });
await planCommand(featureId, { headless: true });

// isHeadlessMode becomes:
function isHeadlessMode(options?: { headless?: boolean }): boolean {
  return options?.headless || !process.stdin.isTTY || process.env.SPECFLOW_HEADLESS === "true";
}

Small change, large architectural benefit.

Should-Fix (recommended)

4. DRY the spawn+timeout pattern

doctorow.ts has evaluateCheckWithAI() with a nearly identical Bun.spawn + timeout + Promise.race pattern to lib/headless.ts. Also, extractJsonFromResponse() lives in doctorow.ts but is imported by headless.ts. Extract to shared location; have doctorow use runClaudeHeadless() internally.
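One possible shape for the shared piece is a generic timeout wrapper built on Promise.race, which both doctorow.ts and lib/headless.ts could call. `withTimeout` is a hypothetical name, and this sketch deliberately leaves out killing the child process on timeout.

```typescript
// Hypothetical shared helper: rejects if `work` does not settle within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

A real runner would also need to terminate the spawned process when the timeout wins the race, since rejecting the promise alone does not stop it.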

5. Separate the migration commit

embedded.ts includes a new contrib_prep_state table unrelated to headless mode. Mixing schema migrations with feature changes makes rollback harder.

6. Add unit tests for pipeline and headless runner

400 lines of tests is good coverage for verify.md and doctorow headless, but there are zero tests for runClaudeHeadless(), the modified runClaude() functions, or the pipeline command. The test plan in the PR body is all unchecked manual checkboxes.

7. Add single-retry on transient failures

No retry logic anywhere. A transient API error causes process.exit(1). One retry in runClaudeHeadless() for exit code != 0 with no output would be a 5-line improvement.
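The suggested retry is small enough to sketch in full. `RunResult` and `runWithSingleRetry` are hypothetical names, and the condition (non-zero exit code with empty output) follows the wording of this review item.

```typescript
// Hypothetical result shape for one headless invocation.
interface RunResult {
  exitCode: number;
  stdout: string;
}

// Retries exactly once when the first attempt fails without producing output.
async function runWithSingleRetry(run: () => Promise<RunResult>): Promise<RunResult> {
  const first = await run();
  const looksTransient = first.exitCode !== 0 && first.stdout.trim() === "";
  return looksTransient ? run() : first;
}
```

A failure that does produce output is returned as-is, on the assumption that it carries a real error message worth surfacing rather than retrying.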

What's Solid

Pipeline command is the right abstraction for v1: simple, sequential, --stop-after for partial runs
No rollback is correct — SpecFlow operates on files, "rollback" is re-run the phase
Headless mode is additive — interactive mode completely untouched
lib/headless.ts is a clean shared module with good interfaces
Headless mode is MORE restricted than interactive (no --dangerously-skip-permissions) — correct security posture

Cost Summary

| Model | Per-pipeline cost | Monthly (5/day) |
| --- | --- | --- |
| Opus (current default) | $5-8 | $750-1200 |
| Sonnet (recommended) | $1.50-2.50 | $225-375 |
| Haiku | $0.30-0.50 | $45-75 |
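The monthly column follows from the per-pipeline range multiplied by runs per day and days per month. The sketch below assumes a 30-day month, which is what the table's figures imply; `monthlyCost` is an illustrative name.

```typescript
// [low, high] cost range in US dollars.
type Range = [number, number];

// Scales a per-run cost range to a monthly range (ASSUMPTION: 30-day month).
function monthlyCost(perRun: Range, runsPerDay: number, days = 30): Range {
  const runs = runsPerDay * days;
  return [perRun[0] * runs, perRun[1] * runs];
}

console.log(monthlyCost([5, 8], 5)); // [ 750, 1200 ]
```

At 5 runs per day this is 150 runs per month, so the Sonnet range of $1.50-2.50 scales to $225-375.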

mellanon and others added 5 commits February 2, 2026 12:44

jcfischer requested changes Feb 2, 2026


2 participants
