Contributor

@mellanon mellanon commented Feb 2, 2026

Summary

  • Adds headless mode to all interactive SpecFlow commands (specify, plan, tasks, executor)
  • New specflow pipeline <feature-id> command runs the full lifecycle autonomously
  • Builds on the headless Doctorow Gate from PR #6 (feat: AI-powered headless Doctorow Gate)

What Changed

New: lib/headless.ts — Shared headless Claude runner

  • isHeadlessMode() returns true when stdin is not a TTY (!process.stdin.isTTY) or when SPECFLOW_HEADLESS=true
  • runClaudeHeadless() uses claude -p --output-format json for reliable output
  • Configurable model via SPECFLOW_MODEL env var (default: Opus)
  • Timeout handling with Bun.spawn

New: commands/pipeline.ts — Full lifecycle command

  • specflow pipeline F-3 runs specify → plan → tasks → implement → complete
  • --stop-after <phase> for partial runs
  • Forces headless mode for all phases
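The --stop-after behaviour can be sketched as a slice over the ordered phase list. This is a minimal illustration, not the code in commands/pipeline.ts: the phase names come from the description above, while `PHASES` and `phasesToRun` are hypothetical names.

```typescript
// Phase order as listed in the PR description. The helper is hypothetical.
const PHASES = ["specify", "plan", "tasks", "implement", "complete"] as const;
type Phase = (typeof PHASES)[number];

// Returns the phases to run, in order, up to and including `stopAfter`.
// With no `stopAfter`, the full lifecycle is returned.
function phasesToRun(stopAfter?: string): Phase[] {
  if (stopAfter === undefined) return [...PHASES];
  const idx = PHASES.indexOf(stopAfter as Phase);
  if (idx === -1) {
    throw new Error(
      `Unknown phase "${stopAfter}". Expected one of: ${PHASES.join(", ")}`
    );
  }
  return PHASES.slice(0, idx + 1);
}
```

Rejecting an unknown phase name up front avoids silently running the full lifecycle when the flag value is mistyped.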

Modified: specify.ts, plan.ts, tasks.ts, executor.ts

  • Each command's runClaude() now checks isHeadlessMode()
  • Headless branch delegates to runClaudeHeadless() with phase-specific system prompts
  • Interactive mode is completely unchanged
  • tasks.ts auto-sets autoChain: always in headless mode

Environment Variables

| Variable | Purpose | Default |
|---|---|---|
| SPECFLOW_HEADLESS | Force headless mode | false |
| SPECFLOW_MODEL | Model for headless calls | claude-opus-4-5-20251101 |
| SPECFLOW_DOCTOROW_MODEL | Model for Doctorow Gate | Uses SPECFLOW_MODEL |

Test plan

  • SPECFLOW_HEADLESS=true specflow specify F-X generates spec.md without interaction
  • SPECFLOW_HEADLESS=true specflow plan F-X generates plan.md
  • SPECFLOW_HEADLESS=true specflow tasks F-X generates tasks.md
  • specflow pipeline F-X runs full lifecycle
  • Interactive mode still works unchanged (TTY detected)
  • specflow pipeline F-X --stop-after plan stops after plan phase

🤖 Generated with Claude Code

mellanon and others added 5 commits February 2, 2026 12:44
Previously, all four verify.md sections (Pre-Verification Checklist,
Smoke Test Results, Browser Verification, API Verification) required
substantive content, forcing users to --force bypass for CLI-only
features with no browser or API. Now sections containing "N/A",
"Not applicable", "Not required", or "CLI only" are accepted as valid.
Section headings must still exist.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds automatic AI evaluation of Doctorow Gate checks when running in
non-TTY environments (CI/CD, agent pipelines). Uses claude -p with
Haiku for fast, cheap evaluation. Falls back to pass-by-default on
AI failure to avoid blocking pipelines.

Closes #5

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Default to Sonnet (claude-sonnet-4-20250514) for better reasoning on
quality checks. Override via SPECFLOW_DOCTOROW_MODEL env var.

Supported models:
- claude-haiku-4-5-20251001 (fast/cheap)
- claude-sonnet-4-20250514 (balanced, default)
- claude-opus-4-5-20251101 (deep reasoning)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --output-format json to claude -p invocation to ensure parseable
  output in environments with CLAUDE.md hooks/skills configured
- Change default model from Sonnet to Opus for deeper quality reasoning
- Model remains configurable via SPECFLOW_DOCTOROW_MODEL env var

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable CI/headless execution of specify, plan, tasks, and implement
commands by detecting non-TTY/SPECFLOW_HEADLESS and routing through
`claude -p --output-format json` instead of interactive sessions.

- F-1: Shared headless runner (lib/headless.ts) with isHeadlessMode()
  and runClaudeHeadless() using Bun.spawn + JSON envelope extraction
- F-2: Headless plan command with system prompt for file writing
- F-3: Headless tasks command with forced autoChain=always
- F-4: Headless specify command with auto batch mode detection
- F-5: Headless executor for executeFeature/executeFeatureStreaming
- F-6: Pipeline command (specflow pipeline <id>) running full sequence

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Owner

@jcfischer jcfischer left a comment

Council Review: PR #7 — Full headless mode for autonomous SpecFlow pipelines

Verdict: MERGE WITH CHANGES | Confidence: HIGH
Council: Engineer, Architect, Security, Operations (4 agents, 1 round each)

Must-Fix (blocking merge)

1. Verify headless mode can actually write files

Critical observation: interactive mode uses --dangerously-skip-permissions but headless mode does NOT:

// Interactive (existing):
spawn("claude", ["--print", "--dangerously-skip-permissions", prompt], ...)

// Headless (new):
Bun.spawn(["claude", "-p", "--output-format", "json", "--model", model, ...])

Headless mode is more restricted than interactive (good for security), but phases need to write spec.md, plan.md, tasks.md to disk. Without --dangerously-skip-permissions, Claude will hit permission prompts that can't be answered in non-TTY mode.

Options:

  • A. Add the flag (reduces security but enables functionality)
  • B. Restructure prompts so Claude returns content via stdout and calling code handles file writes (architecturally superior)

This needs to be tested before merge — the feature may not work as described.
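Option B could look roughly like the sketch below: the caller parses the JSON envelope from stdout and performs the file write itself, so Claude never needs write permission. The `result` field name is an assumption about the `--output-format json` envelope and should be verified against the CLI; `extractResultText`, `persistPhaseOutput`, and the injected `WriteFile` type are hypothetical.

```typescript
// Assumed envelope shape: only `result` is read, and that field name is an
// assumption to check against the real `claude -p --output-format json` output.
interface ClaudeEnvelope {
  result?: unknown;
}

// Pulls the generated document text out of the CLI's stdout.
function extractResultText(stdout: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    throw new Error("Headless output was not valid JSON");
  }
  const result = (parsed as ClaudeEnvelope | null)?.result;
  if (typeof result !== "string" || result.trim() === "") {
    throw new Error("Headless output contained no result text");
  }
  return result;
}

// The write is injected so the calling code, not Claude, touches the disk.
type WriteFile = (path: string, content: string) => void;

function persistPhaseOutput(
  stdout: string,
  targetPath: string,
  writeFile: WriteFile
): void {
  writeFile(targetPath, extractResultText(stdout));
}
```

In the real commands the injected writer could simply be `fs.writeFileSync`; keeping it a parameter also makes the headless branch unit-testable without a filesystem.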

2. Default model → Sonnet, not Opus

Cost per pipeline run with Opus (9 API calls): $5-8. With Sonnet: ~$1.50-2.50.

If a CI pipeline triggers on every push, this is a cost amplification vector. A contributor pushing frequently generates significant API spend. The system prompts are too generic for Opus to add value:

"You are a technical planning agent. Follow the instructions exactly."

Default Sonnet, Opus opt-in via SPECFLOW_MODEL.
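A sketch of the recommended resolution order, assuming the environment is passed in as a plain record. The model IDs and variable names are the ones quoted in this PR; `resolveModel` and `resolveDoctorowModel` are hypothetical helper names, not existing functions.

```typescript
// Recommended default per this review (the PR currently defaults to Opus).
const DEFAULT_MODEL = "claude-sonnet-4-20250514";

type Env = Record<string, string | undefined>;

// SPECFLOW_MODEL wins when set and non-empty, otherwise the Sonnet default.
function resolveModel(env: Env): string {
  const override = env.SPECFLOW_MODEL?.trim();
  return override ? override : DEFAULT_MODEL;
}

// The Doctorow Gate uses its own override first, then falls back to
// whatever resolveModel() picks, matching the "Uses SPECFLOW_MODEL" default.
function resolveDoctorowModel(env: Env): string {
  const override = env.SPECFLOW_DOCTOROW_MODEL?.trim();
  return override ? override : resolveModel(env);
}
```

Opus then becomes a deliberate opt-in, for example `SPECFLOW_MODEL=claude-opus-4-5-20251101 specflow pipeline F-3`.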

3. Replace process.env mutation with explicit parameter

// Current (pipeline.ts line 85):
process.env.SPECFLOW_HEADLESS = "true";  // global side effect

This mutates the process environment permanently. It works today because the pipeline is a top-level CLI command, but it's architecturally fragile.

Required change: Pass { headless: true } through each command's options type:

// pipeline.ts:
await specifyCommand(featureId, { batch: true, headless: true });
await planCommand(featureId, { headless: true });

// isHeadlessMode becomes:
function isHeadlessMode(options?: { headless?: boolean }): boolean {
  return options?.headless || !process.stdin.isTTY || process.env.SPECFLOW_HEADLESS === "true";
}

Small change, large architectural benefit.

Should-Fix (recommended)

4. DRY the spawn+timeout pattern

doctorow.ts has evaluateCheckWithAI() with a nearly identical Bun.spawn + timeout + Promise.race pattern to lib/headless.ts. Also, extractJsonFromResponse() lives in doctorow.ts but is imported by headless.ts. Extract to shared location; have doctorow use runClaudeHeadless() internally.
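One possible shape for the shared piece is a small generic timeout wrapper that both doctorow.ts and lib/headless.ts call. This is a sketch under assumptions: `withTimeout` is a hypothetical name, and the `onTimeout` hook stands in for killing the spawned child process.

```typescript
// Hypothetical shared helper: rejects if `work` has not settled within `ms`.
function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  onTimeout?: () => void
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      onTimeout?.(); // e.g. kill the child process
      reject(new Error(`Timed out after ${ms}ms`));
    }, ms);
  });
  // Clear the timer on either outcome so it cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

With this in place, each caller keeps only its own spawn arguments and response handling, and the race and cleanup logic lives in one location.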

5. Separate the migration commit

embedded.ts includes a new contrib_prep_state table unrelated to headless mode. Mixing schema migrations with feature changes makes rollback harder.

6. Add unit tests for pipeline and headless runner

400 lines of tests is good coverage for verify.md and doctorow headless, but there are zero tests for runClaudeHeadless(), the modified runClaude() functions, or the pipeline command. The test plan in the PR body is all unchecked manual checkboxes.

7. Add single-retry on transient failures

No retry logic anywhere. A transient API error causes process.exit(1). One retry in runClaudeHeadless() for exit code != 0 with no output would be a 5-line improvement.
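The retry could be as small as the sketch below. `RunResult` and `runWithSingleRetry` are hypothetical names, and the actual return type of runClaudeHeadless() may differ; the condition is the one proposed above, non-zero exit with no output.

```typescript
// Hypothetical result shape for one headless invocation.
interface RunResult {
  exitCode: number;
  stdout: string;
}

// Runs `attempt` once, and once more only when the first run exited
// non-zero without producing any output (treated as a transient failure).
async function runWithSingleRetry(
  attempt: () => Promise<RunResult>
): Promise<RunResult> {
  const first = await attempt();
  const transient = first.exitCode !== 0 && first.stdout.trim() === "";
  return transient ? attempt() : first;
}
```

A failure that produced output is returned as-is, so genuine errors reported by the CLI are not masked by a second attempt.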

What's Solid

  • Pipeline command is the right abstraction for v1: simple, sequential, --stop-after for partial runs
  • No rollback is correct — SpecFlow operates on files, "rollback" is re-run the phase
  • Headless mode is additive — interactive mode completely untouched
  • lib/headless.ts is a clean shared module with good interfaces
  • Headless mode is MORE restricted than interactive (no --dangerously-skip-permissions) — correct security posture

Cost Summary

| Model | Per-pipeline cost | Monthly (5/day) |
|---|---|---|
| Opus (current default) | $5-8 | $750-1200 |
| Sonnet (recommended) | $1.50-2.50 | $225-375 |
| Haiku | $0.30-0.50 | $45-75 |
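The monthly column follows from per-run cost times runs per day times days per month. The 30-day month in this sketch is an assumption, chosen because it reproduces the figures above; `monthlyCost` is a hypothetical helper.

```typescript
// Monthly cost = per-run cost x runs per day x days per month.
// The 30-day default is an assumption that matches the table's figures.
function monthlyCost(
  perRunUsd: number,
  runsPerDay: number,
  daysPerMonth = 30
): number {
  return perRunUsd * runsPerDay * daysPerMonth;
}

// Opus at 5 runs/day: low and high ends of the $5-8 per-run range.
const opusLow = monthlyCost(5, 5);
const opusHigh = monthlyCost(8, 5);
```

A pipeline triggered on every push would replace the 5 runs/day figure with the actual push rate, which is where the cost amplification concern comes from.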
