M2Sim is a cycle-accurate simulator for the Apple M2 CPU that achieves 16.9% average timing error across 18 benchmarks. Built on the Akita simulation framework, M2Sim enables detailed performance analysis of ARM64 workloads on Apple Silicon architectures.
Final Achievement: 16.9% average timing error across 18 benchmarks, meeting all success criteria.
| Success Criterion | Target | Achieved | Status |
|---|---|---|---|
| Functional Emulation | ARM64 user-space execution | ✅ Complete | ✅ |
| Timing Accuracy | <20% average error | 16.9% achieved | ✅ |
| Modular Design | Separate functional/timing | ✅ Implemented | ✅ |
| Benchmark Coverage | μs to ms range | 18 benchmarks validated | ✅ |
- Go 1.21 or later
- ARM64 cross-compiler (aarch64-linux-musl-gcc)
- Python 3.8+ (for analysis tools)
# Clone the repository
git clone https://github.com/sarchlab/m2sim.git
cd m2sim
# Build the simulator
go build ./...
# Run tests
ginkgo -r
# Build main binary
go build -o m2sim ./cmd/m2sim
# Functional emulation only
./m2sim -elf benchmarks/arithmetic.elf
# Cycle-accurate timing simulation
./m2sim -elf benchmarks/arithmetic.elf -timing
# Fast timing approximation
./m2sim -elf benchmarks/arithmetic.elf -fasttiming
# Run complete experimental validation
python3 reproduce_experiments.py
# Generate figures for paper
python3 paper/generate_figures.py
# Compile LaTeX paper
cd paper && pdflatex m2sim_micro2026.tex
| Benchmark Category | Count | Average Error | Range |
|---|---|---|---|
| Microbenchmarks | 11 | 14.4% | 1.3% - 47.4% |
| PolyBench | 7 | 20.8% | 11.1% - 33.6% |
| Overall | 18 | 16.9% | 1.3% - 47.4% |
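The overall figure is consistent with a benchmark-count-weighted mean of the two category averages. A minimal check, using only the counts and averages from the table above:

```python
# Benchmark counts and category averages (percent), copied from the table above.
categories = {
    "Microbenchmarks": (11, 14.4),
    "PolyBench": (7, 20.8),
}

total_count = sum(count for count, _ in categories.values())
weighted_sum = sum(count * avg for count, avg in categories.values())
overall = weighted_sum / total_count

print(f"{total_count} benchmarks, overall average error {overall:.1f}%")
```

This prints 18 benchmarks and an overall average of 16.9%, matching the Overall row.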
- Branch Prediction: 1.3% error - validates M2's exceptional prediction accuracy
- Cache Hierarchy: 3-11% error range - efficient L1I/L1D/L2 hierarchy modeling
- Memory Bandwidth: High bandwidth utilization confirmed through concurrent operations
- SIMD Performance: 24-30% error indicates complex vector unit timing (improvement area)
M2Sim Architecture
├── Functional Emulator (emu/)     # ARM64 instruction execution
│   ├── Decoder                    # 200+ ARM64 instructions
│   ├── Register File              # ARM64 register state
│   └── Syscall Interface          # Linux syscall emulation
├── Timing Model (timing/)         # Cycle-accurate performance
│   ├── Pipeline                   # 8-wide superscalar, 5-stage
│   ├── Cache Hierarchy            # L1I/L1D (32KB), L2 (256KB)
│   └── Branch Prediction          # Two-level adaptive predictor
└── Integration Layer              # ELF loading, measurement framework
- Architecture: 8-wide superscalar, in-order execution
- Stages: Fetch → Decode → Execute → Memory → Writeback
- Branch Predictor: Two-level adaptive with 12-cycle misprediction penalty
- Cache Hierarchy: L1I/L1D (32KB each, 1-cycle), L2 (256KB, 10-cycle)
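To get a feel for how these parameters interact, the following back-of-envelope sketch combines the issue width and the misprediction penalty listed above. It is not the simulator's timing model: the function name `rough_cycle_estimate` is hypothetical, and the estimate assumes perfect 8-wide issue and ignores cache misses, data dependencies, and pipeline fill.

```python
import math

ISSUE_WIDTH = 8          # from the list above: 8-wide superscalar
MISPREDICT_PENALTY = 12  # from the list above: cycles lost per misprediction


def rough_cycle_estimate(instructions: int, mispredictions: int) -> int:
    """Optimistic cycle estimate (hypothetical helper, not M2Sim code).

    Assumes every cycle issues ISSUE_WIDTH instructions and that each
    branch misprediction adds a fixed MISPREDICT_PENALTY cycles.
    """
    if instructions < 0 or mispredictions < 0:
        raise ValueError("counts must be non-negative")
    issue_cycles = math.ceil(instructions / ISSUE_WIDTH)
    return issue_cycles + mispredictions * MISPREDICT_PENALTY


# 1000 instructions with 5 mispredictions: 125 issue cycles + 60 penalty cycles.
print(rough_cycle_estimate(1000, 5))  # 185
```

Even in this idealized view, five mispredictions cost about half as many cycles as issuing all 1000 instructions, which suggests why branch predictor accuracy matters for timing fidelity.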
m2sim/
├── cmd/m2sim/                 # Main simulator binary
├── emu/                       # Functional ARM64 emulator
├── timing/                    # Cycle-accurate timing model
│   ├── core/                  # CPU core timing
│   ├── cache/                 # Cache hierarchy
│   ├── pipeline/              # Pipeline implementation
│   └── latency/               # Instruction latencies
├── benchmarks/                # Validation benchmark suite
│   ├── microbenchmarks/       # Targeted stress tests
│   └── polybench/             # Linear algebra kernels
├── docs/                      # Documentation
│   ├── reference/             # Core technical references
│   ├── development/           # Historical development docs
│   └── archive/               # Archived analysis
├── results/                   # Experimental results
│   ├── final/                 # Completion reports
│   └── baselines/             # Hardware measurement data
├── paper/                     # Research paper and figures
└── reproduce_experiments.py   # Complete reproducibility script
- Compile to ARM64 ELF:
    aarch64-linux-musl-gcc -static -O2 -o benchmark.elf benchmark.c
- Collect Hardware Baseline:
    # Use multi-scale regression methodology
    # Measure at multiple input sizes: 100, 500, 1K, 5K, 10K instructions
    # Apply linear regression: y = mx + b (m = per-instruction latency)
- Run Simulation:
    ./m2sim -elf benchmark.elf -timing -limit 100000
- Calculate Error:
    error = |t_sim - t_real| / min(t_sim, t_real)
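The error formula in the last step can be sketched in a few lines of Python. The function name `symmetric_error` and the sample timings are illustrative assumptions, not part of the M2Sim tooling.

```python
def symmetric_error(t_sim: float, t_real: float) -> float:
    """Relative error normalized by the smaller of the two timings.

    Implements error = |t_sim - t_real| / min(t_sim, t_real).
    """
    if t_sim <= 0 or t_real <= 0:
        raise ValueError("timings must be positive")
    return abs(t_sim - t_real) / min(t_sim, t_real)


# Illustrative values only: overestimating and underestimating by the
# same ratio produce the same error.
print(f"{symmetric_error(1.2, 1.0):.1%}")  # simulator slower than hardware
print(f"{symmetric_error(1.0, 1.2):.1%}")  # simulator faster than hardware
```

Dividing by the smaller timing means a simulator that is twice as fast as the hardware and one that is twice as slow both score 100%, whereas dividing by `t_real` alone would score them 50% and 100%.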
- Multi-Core Support: Framework ready for cache coherence and shared memory
- SIMD Enhancement: Detailed vector pipeline for improved accuracy
- Out-of-Order: Register renaming for arithmetic co-issue
- Power Modeling: Leverage M2's efficiency characteristics
- Platform: Apple M2 MacBook Air (2022)
- Measurement: 15 runs per data point, trimmed mean
- Regression: Multi-scale linear fitting (R² > 0.999 required)
- Validation: Statistical confidence intervals
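The baseline procedure above (trimmed mean per data point, then a linear fit with an R² check) might look roughly like the sketch below. The helper names, the 20% trim fraction, and the timing values are assumptions for illustration; only the input sizes, the 15 runs, and the 0.999 threshold come from this README.

```python
from statistics import mean


def trimmed_mean(samples, trim_fraction=0.2):
    """Mean after dropping the lowest and highest trim_fraction of samples.

    The 20% default is an assumption; the README does not state the fraction.
    """
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(samples)
    k = int(len(ordered) * trim_fraction)
    return mean(ordered[k:len(ordered) - k])


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b, r_squared)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return m, b, 1.0 - ss_res / ss_tot


# One data point: 15 synthetic runs with a single outlier.
runs = [10.0] * 14 + [100.0]
print(trimmed_mean(runs))

# Synthetic timings (arbitrary units) at the input sizes named in this README.
sizes = [100, 500, 1000, 5000, 10000]
times = [65.1, 164.8, 290.3, 1289.9, 2540.2]
m, b, r2 = fit_line(sizes, times)
print(f"per-instruction latency m = {m:.4f}, overhead b = {b:.1f}")
print("fit accepted" if r2 > 0.999 else "fit rejected")
```

Fitting across several input sizes separates the fixed startup overhead (`b`) from the per-instruction cost (`m`), which a single-size measurement cannot do.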
- Microbenchmarks: Target individual architectural features
- PolyBench: Intermediate-complexity linear algebra kernels
- Coverage: Arithmetic, memory, branches, SIMD, dependencies
- Formula: Symmetric relative error measurement
- Target: <20% average error across benchmark suite
- Categories: Excellent (<10%), Good (10-20%), Acceptable (20-30%)
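The category thresholds above can be expressed as a small classifier. This is a sketch under two assumptions: boundary values (exactly 10% or 20%) fall into the higher-error category, and errors of 30% or more get a hypothetical "Uncategorized" label, since the README does not name a category for them.

```python
def accuracy_category(error_percent: float) -> str:
    """Map an error percentage to the README's accuracy categories."""
    if error_percent < 0:
        raise ValueError("error must be non-negative")
    if error_percent < 10:
        return "Excellent"
    if error_percent < 20:
        return "Good"
    if error_percent < 30:
        return "Acceptable"
    return "Uncategorized"  # hypothetical label; not defined in the README


# Values taken from the results table in this README.
for value in (1.3, 16.9, 20.8, 47.4):
    print(f"{value:5.1f}% -> {accuracy_category(value)}")
```

With these thresholds, the 16.9% overall average lands in "Good", while the 47.4% worst case falls outside the listed categories.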
- Architecture Guide - M2 microarchitecture research
- Timing Guide - Performance modeling details
- Build Setup - Cross-compilation and environment
- Calibration Reference - Parameter tuning guide
- MICRO 2026 Paper - Complete technical description
- Project Report - Comprehensive completion analysis
- Accuracy Validation - Detailed experimental results
- Development Docs - Research and analysis from development
- Historical Reports - Evolution of accuracy and methodology
- ✅ H1: Core simulator with pipeline timing and cache hierarchy
- ✅ H2: SPEC benchmark enablement with syscall coverage
- ✅ H3: Microbenchmark calibration achieving 14.1% average error
- ✅ H4: Multi-core analysis framework (statistical foundation complete)
- ✅ H5: 15+ intermediate benchmarks with 16.9% average error
- First Open-Source M2 Simulator: Enables reproducible Apple Silicon research
- Validated Methodology: Multi-scale regression baseline collection
- Architectural Insights: Quantified M2 performance characteristics
- Production Accuracy: 16.9% error suitable for research conclusions
# Development build with all checks
go build ./...
golangci-lint run ./...
ginkgo -r
# Performance profiling
go build -o profile ./cmd/profile
./profile -elf benchmark.elf -cpuprofile cpu.prof
- Read: CLAUDE.md for development guidelines
- Test: Ensure all tests pass and lint checks succeed
- Document: Update relevant documentation for changes
- Validate: Verify accuracy on affected benchmarks
If you use M2Sim in your research, please cite:
@inproceedings{m2sim2026,
title={M2Sim: Cycle-Accurate Apple M2 CPU Simulation with 16.9\% Average Timing Error},
author={M2Sim Team},
booktitle={Proceedings of the 59th IEEE/ACM International Symposium on Microarchitecture},
year={2026},
organization={IEEE/ACM}
}
- Akita - Underlying simulation framework
- MGPUSim - GPU simulator using Akita
- SARCH Lab - Computer architecture research
- Issues: GitHub Issues
- Documentation: Project Wiki
- Research: Contact SARCH Lab
This project is developed by the SARCH Lab at [University/Institution].
M2Sim - Enabling Apple Silicon research through cycle-accurate simulation.
Generated: February 12, 2026 | Status: Project Complete ✅