███████╗██╗ ██╗██╗██╗ ██╗ ██╗███████╗███████╗██╗ ██╗███████╗███████╗
██╔════╝██║ ██╔╝██║██║ ██║ ██║██╔════╝██╔════╝██║ ██║██╔════╝██╔════╝
███████╗█████╔╝ ██║██║ ██║ ██║███████╗███████╗██║ ██║█████╗ ███████╗
╚════██║██╔═██╗ ██║██║ ██║ ██║╚════██║╚════██║██║ ██║██╔══╝ ╚════██║
███████║██║ ██╗██║███████╗███████╗ ██║███████║███████║╚██████╔╝███████╗███████║
╚══════╝╚═╝ ╚═╝╚═╝╚══════╝╚══════╝ ╚═╝╚══════╝╚══════╝ ╚═════╝ ╚══════╝╚══════╝
DSL AI Code Generation Evaluation Framework
This repository contains two main components:
- Skills - Reusable knowledge packages that improve AI code generation quality
- Eval Harness - A multi-stage evaluation system for measuring and improving AI-generated code
The step-loop is our primary evaluation tool. It breaks complex coding tasks into incremental steps, validates each step, and produces production-quality code.
```bash
./skill-issues run cairo-trapping-rain-water-01
```

That's it. The CLI infers paths, applies default skills, and runs the full evaluation.
With options:
```bash
./skill-issues run cairo-trapping-rain-water-01 \
  -m claude-opus-4-20250514 \
  --clean \
  -v
```

Other commands:
```bash
./skill-issues list                                 # Show available prompts
./skill-issues status cairo-trapping-rain-water-01  # Check run status
./skill-issues clean cairo-trapping-rain-water-01   # Remove generated files
```

The `run` command:
- Reads a 6-step prompt (brute force → DP → two-pointer optimization)
- Generates code incrementally, validating each step with `scarb build`
- Runs tests with `snforge test` at completion
- Produces a modular multi-file project structure
- Applies the `cairo-quirks` and `cairo-quality` skills for better output
```
eval/work/cairo-trapping-rain-water-01/
├── Scarb.toml
├── src/
│   ├── lib.cairo        # Module exports
│   └── solution.cairo   # Implementation (3 algorithms)
└── tests/
    └── test_lib.cairo   # 17+ integration tests
```
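Per the tree comment, `lib.cairo` holds only the module exports; a minimal sketch (hypothetical, assuming the layout above):

```cairo
// src/lib.cairo -- hypothetical module exports matching the tree above
pub mod solution;
```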
The generated `solution.cairo` includes:

- `trap_brute_force()` - O(n²) time, O(1) space
- `trap_dp()` - O(n) time, O(n) space
- `trap()` - O(n) time, O(1) space (optimal two-pointer; sketched below)
- Full documentation with complexity analysis
- Comprehensive test coverage
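For illustration, the two-pointer variant might look like this (a sketch assuming Cairo 2.x with `while` loops, not a verbatim excerpt of generated output):

```cairo
/// Computes trapped rain water with two converging pointers.
/// Time: O(n), Space: O(1).
fn trap(height: @Array<u32>) -> u32 {
    if height.len() == 0 {
        return 0;
    }
    let mut left: u32 = 0;
    let mut right: u32 = height.len() - 1;
    let mut left_max: u32 = 0;
    let mut right_max: u32 = 0;
    let mut water: u32 = 0;
    while left < right {
        if *height.at(left) < *height.at(right) {
            // Water over `left` is bounded by the tallest bar seen so far on the left.
            let h = *height.at(left);
            if h >= left_max {
                left_max = h;
            } else {
                water += left_max - h;
            }
            left += 1;
        } else {
            // Symmetric case: bounded by the tallest bar seen so far on the right.
            let h = *height.at(right);
            if h >= right_max {
                right_max = h;
            } else {
                water += right_max - h;
            }
            right -= 1;
        }
    }
    water
}
```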
AI code generators often produce code that:
- Compiles but has subtle bugs
- Uses suboptimal algorithms
- Has poor structure (everything in one file)
- Lacks documentation and tests
- Contains unused imports and lint warnings
This system addresses these issues through:
- Incremental validation - Each step must compile before proceeding
- Skills - Domain knowledge injected into prompts
- Multi-file structure - Proper separation of concerns
- Quality skills - Guidelines for DRY, complexity, documentation
```
skill-issues/
├── skills/                # Reusable skill packages
│   ├── cairo-quirks/      # Cairo language patterns
│   └── cairo-quality/     # Code quality guidelines
├── eval/
│   ├── prompts/           # Task definitions (one per file)
│   ├── rubrics/           # Pass/fail criteria
│   ├── work/              # Generated projects (gitignored)
│   └── ralph/
│       ├── step-loop.sh   # Main evaluation runner
│       └── .state/        # Execution state (gitignored)
└── dist/                  # Packaged .skill files
```
Option A — User-scoped (available in all repos)

```bash
mkdir -p ~/.codex/skills
cp -R ./skills/cairo-* ~/.codex/skills/
```

Option B — Repo-scoped (checked into this repo)

```bash
mkdir -p ./.codex/skills
cp -R ./skills/cairo-* ./.codex/skills/
```

Using packaged .skill files:

```bash
mkdir -p ~/.codex/skills
unzip ./dist/cairo-*.skill -d ~/.codex/skills
```

Prerequisites:

- Scarb - Cairo package manager
- snforge - Starknet testing framework
- `claude` CLI or `codex` CLI for AI backends
Documentation:

- Eval Harness Overview - Full evaluation system docs
- Step Loop Guide - Detailed step-loop documentation
- Prompts Guide - How to write prompts
- Rubrics Guide - How to write rubrics (also see `eval/rubrics/`)
How the step-loop works:

```
┌──────────────────────────────────────────────────────────┐
│                       step-loop.sh                       │
├──────────────────────────────────────────────────────────┤
│ 1. Parse prompt into steps                               │
│ 2. Scaffold project (scarb new)                          │
│ 3. For each step:                                        │
│    a. Build prompt with accumulated code + skills        │
│    b. Call LLM backend (claude/codex)                    │
│    c. Extract code from response                         │
│    d. Write to project files                             │
│    e. Validate (scarb check → scarb build)               │
│    f. On failure: retry with error feedback (up to 3x)   │
│    g. Record metrics                                     │
│ 4. Run tests (snforge test)                              │
│ 5. Run linter (scarb lint)                               │
│ 6. Output final metrics                                  │
└──────────────────────────────────────────────────────────┘
```
Skills are markdown files that provide domain-specific knowledge to improve code generation.
The `cairo-quirks` skill covers Cairo language patterns and common pitfalls:
- Array immutability and ownership
- Felt252 vs u256 usage
- Storage patterns for Starknet
- Common compiler errors and fixes
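One such pitfall, as an illustrative example (assumed content, not a verbatim excerpt from the skill): `felt252` arithmetic wraps modulo the field prime, while the fixed-width unsigned types panic on overflow.

```cairo
// felt252 addition wraps modulo the field prime P = 2^251 + 17*2^192 + 1,
// so overflow is silent.
fn wraps_silently(a: felt252, b: felt252) -> felt252 {
    a + b // never panics: the result is reduced mod P
}

// u256 addition panics on overflow instead, which is usually what you want
// for arithmetic on user-supplied values.
fn checked(a: u256, b: u256) -> u256 {
    a + b // panics if the sum exceeds 2^256 - 1
}
```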
The `cairo-quality` skill covers code quality guidelines:
- Algorithm documentation (time/space complexity)
- DRY principles
- Unused import prevention
- Naming conventions
- Test quality standards
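A hypothetical sample meeting these standards (assuming snforge-style `#[test]` functions and the `assert_eq!` macro):

```cairo
/// Sums an array of u32 values.
///
/// Time: O(n), Space: O(1).
fn sum(values: @Array<u32>) -> u32 {
    let mut total: u32 = 0;
    let mut i: u32 = 0;
    while i < values.len() {
        total += *values.at(i);
        i += 1;
    }
    total
}

#[test]
fn test_sum_handles_empty_array() {
    // Edge case gets its own descriptively named test.
    let values: Array<u32> = array![];
    assert_eq!(sum(@values), 0);
}
```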
Each run produces metrics at `.state/<project>/metrics.json`:

```json
{
  "prompt_id": "cairo-trapping-rain-water-01",
  "total_steps": 6,
  "steps_completed": 6,
  "total_iterations": 6,
  "lint_warnings": 0,
  "tests_passed": 17,
  "tests_failed": 0,
  "status": "completed"
}
```

To extend the framework:

- Add a prompt: Create `eval/prompts/<id>.md` with step-by-step tasks
- Add a rubric: Create `eval/rubrics/<id>.md` with pass/fail criteria
- Run an evaluation: Use the step-loop to test generation quality
- Improve skills: Add patterns that fix common failures
MIT