perf: CUDA async to host copy #6085
Merged
CodSpeed HQ / CodSpeed Performance Analysis
failed
Jan 22, 2026 in 0s
Performance Regression: -31.44%
⚡ 4 improved benchmarks
❌ 3 regressed benchmarks
✅ 1247 untouched benchmarks
⏩ 1254 skipped benchmarks[^1]
⚠️ Please fix the performance issues or acknowledge them on CodSpeed.
Performance Changes
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | canonical_into_non_nullable[(10000, 10, 0.01)] | 219.7 µs | 308.6 µs | -28.82% |
| ⚡ | Simulation | canonical_into_non_nullable[(10000, 1, 0.01)] | 44.4 µs | 37.1 µs | +19.5% |
| ⚡ | Simulation | canonical_into_non_nullable[(10000, 1, 0.1)] | 59.4 µs | 53.2 µs | +11.61% |
| ❌ | Simulation | canonical_into_non_nullable[(10000, 10, 0.0)] | 192.9 µs | 281.4 µs | -31.44% |
| ⚡ | Simulation | canonical_into_non_nullable[(10000, 1, 0.0)] | 38.9 µs | 31.8 µs | +22.45% |
| ❌ | Simulation | canonical_into_non_nullable[(10000, 10, 0.1)] | 375.4 µs | 468 µs | -19.78% |
| ⚡ | Simulation | canonical_into_nullable[(10000, 100, 0.0)] | 5.1 ms | 4.1 ms | +24.85% |
Comparing ad/async-copy (afbc61c) with develop (1a0d672)
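
The report does not state how the Efficiency column is derived. The figures look consistent with `BASE / HEAD - 1`, presumably computed on unrounded timings, so recomputing from the displayed values only matches approximately. A minimal sketch under that assumption (the helper name `relative_change` is hypothetical, not part of CodSpeed):

```python
def relative_change(base: float, head: float) -> float:
    """Assumed formula: (base / head - 1) as a percentage.

    Negative means HEAD is slower than BASE (a regression).
    Both timings must be in the same unit.
    """
    return (base / head - 1.0) * 100.0


# Displayed (rounded) timings in microseconds, copied from the table above.
rows = {
    "canonical_into_non_nullable[(10000, 10, 0.0)]": (192.9, 281.4),
    "canonical_into_non_nullable[(10000, 1, 0.0)]": (38.9, 31.8),
}

for name, (base, head) in rows.items():
    print(f"{name}: {relative_change(base, head):+.2f}%")
```

Under this reading, the headline figure of -31.44% corresponds to the worst regressed benchmark, `canonical_into_non_nullable[(10000, 10, 0.0)]`, rather than to an aggregate over all benchmarks.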
Footnotes
[^1]: 1254 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.