@jonahsamost jonahsamost commented Jan 9, 2026

| Config | Elements | Original | New | Speedup |
| --- | ---: | ---: | ---: | ---: |
| 128x64 (small batch) | 8,192 | 0.0383ms | 0.0081ms | 4.71x |
| 512x64 (medium batch) | 32,768 | 0.0756ms | 0.0081ms | 9.28x |
| 1024x64 (large batch) | 65,536 | 0.0759ms | 0.0082ms | 9.28x |
| 2048x64 (xlarge batch) | 131,072 | 0.0766ms | 0.0086ms | 8.95x |
| 512x16 (short horizon) | 8,192 | 0.0095ms | 0.0061ms | 1.56x |
| 512x32 (medium horizon) | 16,384 | 0.0379ms | 0.0062ms | 6.15x |
| 512x128 (long horizon) | 65,536 | 0.1524ms | 0.0145ms | 10.54x |
| 512x256 (very long horizon) | 131,072 | 0.3030ms | 0.0270ms | 11.22x |
| 4096x64 (many rows) | 262,144 | 0.0767ms | 0.0126ms | 6.07x |
| 256x512 (very long horizon) | 131,072 | 0.5980ms | 0.0576ms | 10.39x |
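
As a quick sanity check on the Speedup column, the ratio Original / New can be recomputed from the timings in the table above. This is only a sketch: the names `rows` and `recomputed_speedups` are illustrative and not part of the PR, and the small differences from the reported values are assumed to come from the timings being rounded to four decimal places.

```python
# (config, original_ms, new_ms, reported_speedup), copied from the table above.
rows = [
    ("128x64", 0.0383, 0.0081, 4.71),
    ("512x64", 0.0756, 0.0081, 9.28),
    ("1024x64", 0.0759, 0.0082, 9.28),
    ("2048x64", 0.0766, 0.0086, 8.95),
    ("512x16", 0.0095, 0.0061, 1.56),
    ("512x32", 0.0379, 0.0062, 6.15),
    ("512x128", 0.1524, 0.0145, 10.54),
    ("512x256", 0.3030, 0.0270, 11.22),
    ("4096x64", 0.0767, 0.0126, 6.07),
    ("256x512", 0.5980, 0.0576, 10.39),
]


def recomputed_speedups(rows):
    """Return (config, original / new, reported) for each benchmark row."""
    return [(cfg, orig / new, rep) for cfg, orig, new, rep in rows]


for cfg, ratio, rep in recomputed_speedups(rows):
    print(f"{cfg:>8}: recomputed {ratio:5.2f}x, reported {rep:5.2f}x")
```

Each recomputed ratio lands within about 1% of the reported speedup.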

To verify, build and run the Python and CUDA code in tests/:

nvcc -O3 -arch=sm_86 -shared -Xcompiler -fPIC puff_advantage_standalone.cu -o libpuff_advantage.so
python test_puff_advantage_standalone.py
