@sergeytimoshin commented Jan 22, 2026

  • Reuse a per-hasher scratch buffer for the MDS multiplication, avoiding a heap allocation and `collect()` overhead in every round.
  • Cache width/base in tight loops and switch to indexed loops to reduce iterator overhead.
  • Add a fast-path `pow5` for `alpha == 5` (the x5 S-box).
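A minimal sketch of the scratch-buffer and `pow5` changes, assuming a toy 61-bit prime field in place of the BN254 scalar field the benchmarks use. `ToyHasher`, `apply_mds`, `sbox`, and `scratch` are hypothetical names, not the crate's internals: the hasher owns one `scratch` vector sized to the state width, so the MDS step writes into it instead of collecting a fresh `Vec` every round, and the S-box takes a three-multiplication path when `alpha == 5`.

```rust
// Toy prime field (2^61 - 1) standing in for the BN254 scalar field.
// Assumption for illustration only; this is not light-poseidon's field type.
const P: u64 = (1u64 << 61) - 1;

fn add(a: u64, b: u64) -> u64 {
    let s = a + b; // both inputs < 2^61, so this cannot overflow u64
    if s >= P { s - P } else { s }
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Generic square-and-multiply exponentiation (the slow path).
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Fast path for alpha == 5: x^2, x^4 = (x^2)^2, then x^5 = x^4 * x.
fn pow5(x: u64) -> u64 {
    let x2 = mul(x, x);
    let x4 = mul(x2, x2);
    mul(x4, x)
}

/// Hypothetical hasher: `scratch` is allocated once in `new` and reused by
/// every MDS multiplication instead of collecting a new Vec per round.
struct ToyHasher {
    mds: Vec<Vec<u64>>,
    alpha: u64,
    scratch: Vec<u64>,
}

impl ToyHasher {
    fn new(mds: Vec<Vec<u64>>, alpha: u64) -> Self {
        let width = mds.len();
        ToyHasher { mds, alpha, scratch: vec![0; width] }
    }

    /// S-box: dispatch to pow5 when alpha == 5, otherwise generic pow.
    fn sbox(&self, x: u64) -> u64 {
        if self.alpha == 5 {
            pow5(x)
        } else {
            pow(x, self.alpha)
        }
    }

    /// state <- MDS * state, writing into the preallocated scratch buffer.
    fn apply_mds(&mut self, state: &mut [u64]) {
        let width = state.len();
        for i in 0..width {
            let mut acc = 0;
            for j in 0..width {
                acc = add(acc, mul(self.mds[i][j], state[j]));
            }
            self.scratch[i] = acc;
        }
        state.copy_from_slice(&self.scratch[..width]);
    }
}
```

Because `apply_mds` overwrites every entry of `scratch` before copying it back, the buffer needs no clearing between rounds; the cost is that the hasher must be borrowed mutably, which is why the sketch takes `&mut self`.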

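Bench (Criterion, compared against the saved `committed` baseline). Every case improves significantly; the time reduction is largest for `poseidon_bn254_x5_1` (about 45%) and shrinks to about 6 to 7% for `poseidon_bn254_x5_10` through `poseidon_bn254_x5_12`: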

cargo bench -p light-poseidon --bench bn254_x5 -- --baseline committed 
   Compiling light-poseidon v0.4.0 (/Users/tsv/src/light-poseidon/light-poseidon)
    Finished `bench` profile [optimized] target(s) in 7.01s
     Running benches/bn254_x5.rs (target/release/deps/bn254_x5-b345d9a579f3f719)
Gnuplot not found, using plotters backend
poseidon_bn254_x5_1     time:   [5.3874 µs 5.3947 µs 5.4025 µs]
                        change: [-45.132% -44.990% -44.839%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

poseidon_bn254_x5_2     time:   [8.9322 µs 8.9634 µs 8.9949 µs]
                        change: [-36.247% -36.016% -35.785%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild

poseidon_bn254_x5_3     time:   [13.616 µs 13.667 µs 13.723 µs]
                        change: [-27.183% -26.882% -26.552%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild

poseidon_bn254_x5_4     time:   [20.578 µs 20.629 µs 20.682 µs]
                        change: [-20.783% -20.378% -19.971%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

poseidon_bn254_x5_5     time:   [28.326 µs 28.410 µs 28.500 µs]
                        change: [-16.608% -16.272% -15.928%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

poseidon_bn254_x5_6     time:   [38.968 µs 39.091 µs 39.236 µs]
                        change: [-13.448% -13.124% -12.810%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild

poseidon_bn254_x5_7     time:   [50.393 µs 50.560 µs 50.754 µs]
                        change: [-10.629% -10.243% -9.8770%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

poseidon_bn254_x5_8     time:   [61.839 µs 61.986 µs 62.139 µs]
                        change: [-9.3874% -9.0479% -8.7104%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

poseidon_bn254_x5_9     time:   [72.613 µs 72.821 µs 73.034 µs]
                        change: [-8.0375% -7.6361% -7.2427%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

poseidon_bn254_x5_10    time:   [94.341 µs 94.606 µs 94.877 µs]
                        change: [-6.2655% -5.8588% -5.4217%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

poseidon_bn254_x5_11    time:   [102.48 µs 102.72 µs 102.96 µs]
                        change: [-7.4800% -7.0893% -6.6792%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

poseidon_bn254_x5_12    time:   [127.98 µs 128.31 µs 128.65 µs]
                        change: [-6.4339% -6.0649% -5.7056%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
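Criterion's `change` is the relative difference in estimated time against the baseline, so a change of c percent means the new time is roughly (1 + c/100) times the old one. The helper below (hypothetical, not part of the crate) turns the point estimates above into approximate speedup factors and implied baseline times; for `poseidon_bn254_x5_1`, a change of -44.990% corresponds to roughly a 1.82× speedup from about 9.81 µs. The figures are approximate because Criterion estimates `time` and `change` separately.

```rust
/// Point estimates copied from the Criterion output above:
/// (benchmark, new time in µs, change in percent vs. `committed`).
const RESULTS: [(&str, f64, f64); 12] = [
    ("poseidon_bn254_x5_1", 5.3947, -44.990),
    ("poseidon_bn254_x5_2", 8.9634, -36.016),
    ("poseidon_bn254_x5_3", 13.667, -26.882),
    ("poseidon_bn254_x5_4", 20.629, -20.378),
    ("poseidon_bn254_x5_5", 28.410, -16.272),
    ("poseidon_bn254_x5_6", 39.091, -13.124),
    ("poseidon_bn254_x5_7", 50.560, -10.243),
    ("poseidon_bn254_x5_8", 61.986, -9.0479),
    ("poseidon_bn254_x5_9", 72.821, -7.6361),
    ("poseidon_bn254_x5_10", 94.606, -5.8588),
    ("poseidon_bn254_x5_11", 102.72, -7.0893),
    ("poseidon_bn254_x5_12", 128.31, -6.0649),
];

/// Speedup factor implied by a change percentage
/// (e.g. -44.990 means the new time is about 55.01% of the old one).
fn speedup(change_pct: f64) -> f64 {
    1.0 / (1.0 + change_pct / 100.0)
}

/// Approximate baseline time implied by a new time and a change percentage.
fn implied_baseline(new_time_us: f64, change_pct: f64) -> f64 {
    new_time_us / (1.0 + change_pct / 100.0)
}

/// Prints one line per benchmark: implied old time, new time, speedup.
fn print_summary() {
    for &(name, new_us, change) in RESULTS.iter() {
        println!(
            "{}: ~{:.2} µs -> {} µs ({:.2}x)",
            name,
            implied_baseline(new_us, change),
            new_us,
            speedup(change)
        );
    }
}
```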

- Reuse a per-hasher scratch buffer for MDS to avoid per-round
  allocations and `collect()` overhead.
- Cache width/base in tight loops and switch to indexed loops to reduce
  iterator cost.
- Add a fast-path `pow5` for `alpha == 5` (x5 S-box).
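The "cache width/base" item can be illustrated on the round-constant addition step. This sketch assumes the constants live in one flat slice with `width` entries per round, so `base = round * width` is the current round's offset; the PR does not show the actual layout, and `ark_iter` / `ark_indexed` are hypothetical names. Both functions compute the same result; the indexed one computes `width` and `base` once and replaces the `zip`/`skip` iterator chain with a plain index loop.

```rust
// Toy 61-bit prime field standing in for BN254 (illustration only).
const P: u64 = (1u64 << 61) - 1;

fn add(a: u64, b: u64) -> u64 {
    let s = a + b; // inputs < 2^61, so no u64 overflow
    if s >= P { s - P } else { s }
}

/// Iterator-chain version: builds zip/skip adapters every round.
fn ark_iter(state: &mut [u64], round_keys: &[u64], round: usize) {
    let width = state.len();
    for (s, k) in state.iter_mut().zip(round_keys.iter().skip(round * width)) {
        *s = add(*s, *k);
    }
}

/// Indexed version: width and the round's base offset are computed once,
/// then a plain index loop performs the additions.
#[allow(clippy::needless_range_loop)]
fn ark_indexed(state: &mut [u64], round_keys: &[u64], round: usize) {
    let width = state.len();
    let base = round * width;
    for i in 0..width {
        state[i] = add(state[i], round_keys[base + i]);
    }
}
```

Whether the indexed form is actually faster depends on the optimizer removing bounds checks; the benchmark numbers in this PR are the evidence for the change, not this sketch. Note also that the indexed version panics on a short `round_keys` slice, whereas `zip` silently stops early.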