M2Sim: Cycle-Accurate Apple M2 CPU Simulator

M2Sim is a cycle-accurate simulator for the Apple M2 CPU that achieves 16.9% average timing error across 18 benchmarks. Built on the Akita simulation framework, M2Sim enables detailed performance analysis of ARM64 workloads on Apple Silicon architectures.

🎯 Project Status: COMPLETED ✅

Final Achievement: 16.9% average timing error across 18 benchmarks, meeting all success criteria.

Success Criterion	Target	Achieved	Status
Functional Emulation	ARM64 user-space execution	✅ Complete	✅
Timing Accuracy	<20% average error	16.9% average error	✅
Modular Design	Separate functional/timing	✅ Implemented	✅
Benchmark Coverage	μs to ms range	18 benchmarks validated	✅

🚀 Quick Start

Prerequisites

Go 1.21 or later
ARM64 cross-compiler (aarch64-linux-musl-gcc)
Python 3.8+ (for analysis tools)
Ginkgo CLI (for running tests) and golangci-lint (for lint checks)

Installation

# Clone the repository
git clone https://github.com/sarchlab/m2sim.git
cd m2sim

# Build the simulator
go build ./...

# Run tests
ginkgo -r

# Build main binary
go build -o m2sim ./cmd/m2sim

Basic Usage

# Functional emulation only
./m2sim -elf benchmarks/arithmetic.elf

# Cycle-accurate timing simulation
./m2sim -elf benchmarks/arithmetic.elf -timing

# Fast timing approximation
./m2sim -elf benchmarks/arithmetic.elf -fasttiming

Reproduce Paper Results

# Run complete experimental validation
python3 reproduce_experiments.py

# Generate figures for paper
python3 paper/generate_figures.py

# Compile LaTeX paper
cd paper && pdflatex m2sim_micro2026.tex

📊 Performance Results

Timing Accuracy Summary

Benchmark Category	Count	Average Error	Range
Microbenchmarks	11	14.4%	1.3% - 47.4%
PolyBench	7	20.8%	11.1% - 33.6%
Overall	18	16.9%	1.3% - 47.4%
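The Overall row is consistent with a count-weighted mean of the two category averages, which equals the mean over all 18 benchmarks up to rounding. It is not the simple mean of the two categories, which would be 17.6%. A minimal Python check using only the table's figures:

```python
# Category averages and benchmark counts copied from the table above.
categories = {
    "Microbenchmarks": (11, 14.4),  # (count, average error in %)
    "PolyBench": (7, 20.8),
}

def weighted_average_error(cats):
    """Count-weighted mean of per-category average errors."""
    total_count = sum(count for count, _ in cats.values())
    weighted_sum = sum(count * avg for count, avg in cats.values())
    return weighted_sum / total_count

overall = weighted_average_error(categories)
print(f"Overall: {overall:.1f}%")  # prints: Overall: 16.9%
```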

Key Architectural Insights

Branch Prediction: 1.3% error, the lowest in the suite, indicating the predictor model closely matches the M2's measured branch behavior
Cache Hierarchy: 3-11% error, indicating the L1I/L1D/L2 model captures the hierarchy's timing well
Memory Bandwidth: High bandwidth utilization confirmed through concurrent operations
SIMD Performance: 24-30% error, the largest of these categories; vector unit timing is not yet modeled in detail and remains an improvement area

🏗️ Architecture Overview

Simulator Components

M2Sim Architecture
├── Functional Emulator (emu/)     # ARM64 instruction execution
│   ├── Decoder                    # 200+ ARM64 instructions
│   ├── Register File              # ARM64 register state
│   └── Syscall Interface          # Linux syscall emulation
├── Timing Model (timing/)         # Cycle-accurate performance
│   ├── Pipeline                   # 8-wide superscalar, 5-stage
│   ├── Cache Hierarchy            # L1I/L1D (32KB), L2 (256KB)
│   └── Branch Prediction          # Two-level adaptive predictor
└── Integration Layer              # ELF loading, measurement framework

Pipeline Configuration

Architecture: 8-wide superscalar, in-order execution
Stages: Fetch → Decode → Execute → Memory → Writeback
Branch Predictor: Two-level adaptive with 12-cycle misprediction penalty
Cache Hierarchy: L1I/L1D (32KB each, 1-cycle), L2 (256KB, 10-cycle)
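The configuration above names a two-level adaptive predictor but does not say which variant M2Sim uses. As an illustration only, the sketch below implements one common variant. A single global history register indexes a pattern table of 2-bit saturating counters, and each misprediction is charged the stated 12-cycle penalty. The class name, 8-bit history length, and initial counter state are assumptions, not M2Sim's code.

```python
MISPREDICT_PENALTY = 12  # cycles, from the pipeline configuration above

class TwoLevelPredictor:
    """Global-history two-level predictor with 2-bit counters (illustrative, hypothetical)."""

    def __init__(self, history_bits=8):  # assumed history length
        self.history_bits = history_bits
        self.history = 0  # level 1: global branch history register
        # Level 2: pattern history table, counters start weakly not-taken (1).
        self.pht = [1] * (1 << history_bits)

    def predict(self):
        # Counter values 2 and 3 mean "predict taken".
        return self.pht[self.history] >= 2

    def update(self, taken):
        # Train the counter selected by the current history, then shift in the outcome.
        c = self.pht[self.history]
        self.pht[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask

def penalty_cycles(predictor, outcomes):
    """Run a trace of branch outcomes and return the cycles lost to mispredictions."""
    mispredicts = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            mispredicts += 1
        predictor.update(taken)
    return mispredicts * MISPREDICT_PENALTY
```

On a periodic loop pattern such as taken-taken-taken-not-taken, mispredictions occur only while the table warms up. Replaying the same trace on the trained predictor costs no penalty cycles.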

📁 Project Structure

m2sim/
├── cmd/m2sim/                 # Main simulator binary
├── emu/                       # Functional ARM64 emulator
├── timing/                    # Cycle-accurate timing model
│   ├── core/                  # CPU core timing
│   ├── cache/                 # Cache hierarchy
│   ├── pipeline/              # Pipeline implementation
│   └── latency/               # Instruction latencies
├── benchmarks/                # Validation benchmark suite
│   ├── microbenchmarks/       # Targeted stress tests
│   └── polybench/            # Linear algebra kernels
├── docs/                      # Documentation
│   ├── reference/             # Core technical references
│   ├── development/           # Historical development docs
│   └── archive/               # Archived analysis
├── results/                   # Experimental results
│   ├── final/                 # Completion reports
│   └── baselines/             # Hardware measurement data
├── paper/                     # Research paper and figures
└── reproduce_experiments.py   # Complete reproducibility script

🔬 Research Usage

Adding New Benchmarks

Compile to ARM64 ELF:

aarch64-linux-musl-gcc -static -O2 -o benchmark.elf benchmark.c

Collect Hardware Baseline:

# Use multi-scale regression methodology
# Measure at multiple input sizes: 100, 500, 1K, 5K, 10K instructions
# Apply linear regression: y = mx + b (m = per-instruction latency, b = fixed overhead)

Run Simulation:

./m2sim -elf benchmark.elf -timing -limit 100000

Calculate Error:

error = |t_sim - t_real| / min(t_sim, t_real)
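Dividing by the smaller of the two times makes the metric symmetric. A simulation that runs 2× too slow and one that runs 2× too fast both score 100%. Dividing by t_real alone would score them 100% and 50%. A minimal sketch of the formula follows. The averaging helper assumes a plain arithmetic mean over benchmarks, which the README implies but does not state, and both function names are hypothetical.

```python
def symmetric_error(t_sim, t_real):
    """|t_sim - t_real| / min(t_sim, t_real), as a fraction (0.1 == 10%)."""
    if t_sim <= 0 or t_real <= 0:
        raise ValueError("times must be positive")
    return abs(t_sim - t_real) / min(t_sim, t_real)

def average_error(pairs):
    """Arithmetic mean of per-benchmark symmetric errors (assumed aggregation)."""
    errors = [symmetric_error(t_sim, t_real) for t_sim, t_real in pairs]
    return sum(errors) / len(errors)

print(symmetric_error(2.0, 1.0), symmetric_error(0.5, 1.0))  # prints: 1.0 1.0
```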

Extending the Simulator

Multi-Core Support: Framework ready for cache coherence and shared memory
SIMD Enhancement: Detailed vector pipeline for improved accuracy
Out-of-Order: Register renaming for arithmetic co-issue
Power Modeling: Leverage M2's efficiency characteristics

📋 Validation Methodology

Hardware Baseline Collection

Platform: Apple M2 MacBook Air (2022)
Measurement: 15 runs per data point, trimmed mean
Regression: Multi-scale linear fitting (R² > 0.999 required)
Validation: Statistical confidence intervals
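The baseline procedure can be sketched in three steps. First, reduce each input size's repeated runs to a trimmed mean. Second, fit time against instruction count by ordinary least squares. Third, reject the fit unless R² exceeds 0.999. The README does not give the trim amount. The sketch assumes one run is dropped from each end, and all function names are hypothetical.

```python
from statistics import mean

def trimmed_mean(samples, trim=1):
    """Mean after dropping the `trim` smallest and largest samples (trim amount assumed)."""
    s = sorted(samples)
    if len(s) <= 2 * trim:
        raise ValueError("not enough samples to trim")
    return mean(s[trim:len(s) - trim])

def fit_line(xs, ys):
    """Ordinary least squares y = m*x + b; returns (m, b, r_squared)."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct input sizes")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return m, b, 1 - ss_res / ss_tot

def per_instruction_latency(runs_by_size, min_r2=0.999):
    """runs_by_size maps instruction count -> repeated timings (e.g. 15 runs each)."""
    sizes = sorted(runs_by_size)
    times = [trimmed_mean(runs_by_size[n]) for n in sizes]
    m, b, r2 = fit_line(sizes, times)
    if r2 <= min_r2:
        raise ValueError(f"fit rejected: R^2 = {r2:.5f}")
    return m, b  # m: time per instruction, b: fixed overhead
```

Only the slope m is kept as the per-instruction latency. Fixed costs that do not scale with input size fall into the intercept b.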

Benchmark Suite Design

Microbenchmarks: Target individual architectural features
PolyBench: Intermediate-complexity linear algebra kernels
Coverage: Arithmetic, memory, branches, SIMD, dependencies

Error Analysis

Formula: Symmetric relative error, |t_sim - t_real| / min(t_sim, t_real)
Target: <20% average error across benchmark suite
Categories: Excellent (<10%), Good (10-20%), Acceptable (20-30%)
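The rating bands map directly to a small lookup. The README does not say which band an error exactly on a boundary (10%, 20%, 30%) belongs to, and it does not name a band above 30%. This sketch assigns boundary values to the higher band and uses a hypothetical "Needs improvement" label above 30%.

```python
def error_category(error_pct):
    """Map an error percentage to the README's rating bands."""
    if error_pct < 10:
        return "Excellent"
    if error_pct < 20:
        return "Good"
    if error_pct < 30:
        return "Acceptable"
    return "Needs improvement"  # hypothetical label; not defined in the README
```

Applied to the reported figures, the 1.3% best case rates Excellent and the 16.9% overall average rates Good. The 47.4% worst case falls outside the three named bands.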

📖 Documentation

Core References

Architecture Guide - M2 microarchitecture research
Timing Guide - Performance modeling details
Build Setup - Cross-compilation and environment
Calibration Reference - Parameter tuning guide

Research Papers

MICRO 2026 Paper - Complete technical description
Project Report - Comprehensive completion analysis
Accuracy Validation - Detailed experimental results

Development History

Development Docs - Research and analysis from development
Historical Reports - Evolution of accuracy and methodology

🏆 Achievements

Technical Milestones

✅ H1: Core simulator with pipeline timing and cache hierarchy
✅ H2: SPEC benchmark enablement with syscall coverage
✅ H3: Microbenchmark calibration achieving 14.1% average error
✅ H4: Multi-core analysis framework (statistical foundation complete)
✅ H5: 15+ intermediate benchmarks with 16.9% average error

Research Contributions

First Open-Source M2 Simulator: Enables reproducible Apple Silicon research
Validated Methodology: Multi-scale regression baseline collection
Architectural Insights: Quantified M2 performance characteristics
Production Accuracy: 16.9% average error, below the <20% target

🔧 Development

Building from Source

# Development build with all checks
go build ./...
golangci-lint run ./...
ginkgo -r

# Performance profiling
go build -o profile ./cmd/profile
./profile -elf benchmark.elf -cpuprofile cpu.prof

Contributing

Read: CLAUDE.md for development guidelines
Test: Ensure all tests pass and lint checks succeed
Document: Update relevant documentation for changes
Validate: Verify accuracy on affected benchmarks

📄 Citation

If you use M2Sim in your research, please cite:

@inproceedings{m2sim2026,
  title={M2Sim: Cycle-Accurate Apple M2 CPU Simulation with 16.9\% Average Timing Error},
  author={M2Sim Team},
  booktitle={Proceedings of the 59th IEEE/ACM International Symposium on Microarchitecture},
  year={2026},
  organization={IEEE/ACM}
}

🤝 Related Projects

Akita - Underlying simulation framework
MGPUSim - GPU simulator using Akita
SARCH Lab - Computer architecture research

📞 Support

Issues: GitHub Issues
Documentation: Project Wiki
Research: Contact SARCH Lab

📜 License

This project is developed by the SARCH Lab at [University/Institution].

M2Sim - Enabling Apple Silicon research through cycle-accurate simulation.

Generated: February 12, 2026 | Status: Project Complete ✅
