
400Ping (Contributor) commented Jan 5, 2026

Purpose of PR

Update throughput benchmark to batch encoding
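The PR body does not include the code, so this is only a rough illustration of what "batch encoding" changes. It contrasts per-sample encoding (one encoder call per vector) with batch encoding (one call per batch of 64) across the benchmark's 200 batches. `CountingEncoder`, `run_per_sample` and `run_batched` are hypothetical names. The L2 normalization is an assumed amplitude encoding, and the call counter only stands in for whatever fixed per-call cost (kernel launch, host-to-device copy) batching amortizes. None of this is the Mahout QDP API.

```python
import math
from typing import List

Vector = List[float]


class CountingEncoder:
    """Hypothetical stand-in for an encoder with a fixed cost per call
    (for example a kernel launch or host-to-device copy). Not the Mahout QDP API."""

    def __init__(self) -> None:
        self.calls = 0

    def encode(self, batch: List[Vector]) -> List[Vector]:
        self.calls += 1  # the fixed overhead is paid once per call, whatever the batch size
        out = []
        for vec in batch:
            norm = math.sqrt(sum(x * x for x in vec))
            out.append([x / norm for x in vec])  # assumed amplitude encoding: unit L2 norm
        return out


def run_per_sample(enc: CountingEncoder, batches: List[List[Vector]]) -> List[Vector]:
    # Per-sample style: wrap every vector in a batch of one.
    return [enc.encode([vec])[0] for batch in batches for vec in batch]


def run_batched(enc: CountingEncoder, batches: List[List[Vector]]) -> List[Vector]:
    # Batch style: hand the whole batch to the encoder at once.
    return [state for batch in batches for state in enc.encode(batch)]


# Toy data with the benchmark's shape (200 batches x 64 samples), but with
# 2-element vectors instead of 65536-element ones so it runs instantly.
batches = [[[float(i + 1), float(j + 1)] for j in range(64)] for i in range(200)]

per_sample, batched = CountingEncoder(), CountingEncoder()
a = run_per_sample(per_sample, batches)
b = run_batched(batched, batches)
print(per_sample.calls, batched.calls)  # 12800 200
```

Both paths produce identical states. Only the number of encoder calls differs, and that per-call overhead is what batching is generally expected to reduce.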

Related Issues or PRs

Closes #795

Changes Made

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other

Breaking Changes

  • Yes
  • No

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes
  • Successfully built and ran all unit tests or manual tests locally
  • PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
  • Code follows ASF guidelines

Signed-off-by: 400Ping <fourhundredping@gmail.com>
400Ping changed the title from "[Core] Update throughput benchmark to batch encoding" to "[QDP] Update throughput benchmark to batch encoding" Jan 5, 2026
400Ping changed the title from "[QDP] Update throughput benchmark to batch encoding" to "[QDP] Update benchmark_throughput to batch encoding" Jan 5, 2026
400Ping marked this pull request as draft January 5, 2026 11:32
Signed-off-by: 400Ping <fourhundredping@gmail.com>
400Ping marked this pull request as ready for review January 5, 2026 11:39
400Ping marked this pull request as draft January 5, 2026 12:10
400Ping (Contributor, Author) commented Jan 5, 2026

Before (dev-qdp):

$ python ./qdp-python/benchmark/benchmark_throughput.py
Generating 12800 samples of 16 qubits...
  Batch size   : 64
  Vector length: 65536
  Batches      : 200
  Prefetch     : 16
  Frameworks   : pennylane, qiskit, mahout
  Generated 12800 samples
  PennyLane/Qiskit format: 6400.00 MB
  Mahout format: 6400.00 MB

======================================================================
DATALOADER THROUGHPUT BENCHMARK: 16 Qubits, 12800 Samples
======================================================================

[PennyLane] Full Pipeline (DataLoader -> GPU)...
/home/jay/work/mahout/qdp/./qdp-python/benchmark/benchmark_throughput.py:170: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /pytorch/aten/src/ATen/native/Copy.cpp:309.)
  state_gpu = state_cpu.to("cuda", dtype=torch.float32)
  Total Time: 6.7562 s (1894.6 vectors/sec)

[Qiskit] Full Pipeline (DataLoader -> GPU)...

  Total Time: 848.2128 s (15.1 vectors/sec)

[Mahout] Full Pipeline (DataLoader -> GPU)...
  IO + Encode Time: 9.7979 s
  Total Time: 9.7979 s (1306.4 vectors/sec)

======================================================================
THROUGHPUT (Higher is Better)
Samples: 12800, Qubits: 16
======================================================================
PennyLane        1894.6 vectors/sec
Mahout           1306.4 vectors/sec
Qiskit             15.1 vectors/sec
----------------------------------------------------------------------
Speedup vs PennyLane:       0.69x
Speedup vs Qiskit:         86.57x
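The dataset figures in the log header are consistent with simple arithmetic: 2^16 amplitudes per 16-qubit sample, 64 x 200 = 12800 samples, and 8 bytes per value. This sketch assumes float64 storage and MB = 2^20 bytes. Both are inferences from the printed 6400.00 MB, not something the log states.

```python
# Reproduce the dataset figures printed in the benchmark header.
# Assumptions: each amplitude is an 8-byte float64 and "MB" means 2**20 bytes.
qubits = 16
batch_size = 64
num_batches = 200
bytes_per_value = 8  # assumed float64

vector_length = 2 ** qubits          # amplitudes per sample
samples = batch_size * num_batches   # total samples generated
total_mb = samples * vector_length * bytes_per_value / 2 ** 20

print(vector_length, samples, f"{total_mb:.2f} MB")  # 65536 12800 6400.00 MB
```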

After:

$ python ./qdp-python/benchmark/benchmark_throughput.py
Generating 12800 samples of 16 qubits...
  Batch size   : 64
  Vector length: 65536
  Batches      : 200
  Prefetch     : 16
  Frameworks   : pennylane, qiskit, mahout
  Generated 12800 samples
  PennyLane/Qiskit format: 6400.00 MB
  Mahout format: 6400.00 MB

======================================================================
DATALOADER THROUGHPUT BENCHMARK: 16 Qubits, 12800 Samples
======================================================================

[PennyLane] Full Pipeline (DataLoader -> GPU)...
/home/jay/work/mahout/qdp/./qdp-python/benchmark/benchmark_throughput.py:169: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /pytorch/aten/src/ATen/native/Copy.cpp:309.)
  state_gpu = state_cpu.to("cuda", dtype=torch.float32)
  Total Time: 6.7298 s (1902.0 vectors/sec)

[Qiskit] Full Pipeline (DataLoader -> GPU)...

  Total Time: 854.9839 s (15.0 vectors/sec)

[Mahout] Full Pipeline (DataLoader -> GPU)...
  IO + Encode Time: 3.6776 s
  Total Time: 3.6776 s (3480.5 vectors/sec)

======================================================================
THROUGHPUT (Higher is Better)
Samples: 12800, Qubits: 16
======================================================================
Mahout           3480.5 vectors/sec
PennyLane        1902.0 vectors/sec
Qiskit             15.0 vectors/sec
----------------------------------------------------------------------
Speedup vs PennyLane:       1.83x
Speedup vs Qiskit:        232.48x
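The summary lines of both runs can be recomputed from the "Total Time" values alone, assuming throughput is samples divided by total time. That assumption matches every printed vectors/sec figure. Because every framework processes the same 12800 samples, each speedup is simply the inverse ratio of total times. The dictionary and helper names are illustrative and are not taken from `benchmark_throughput.py`.

```python
# Recompute throughput and speedup from the logged total times.
SAMPLES = 12800

total_time_s = {
    "before": {"PennyLane": 6.7562, "Qiskit": 848.2128, "Mahout": 9.7979},
    "after": {"PennyLane": 6.7298, "Qiskit": 854.9839, "Mahout": 3.6776},
}


def throughput(seconds: float) -> float:
    """Vectors per second for one full pass over the dataset."""
    return SAMPLES / seconds


def speedup(mahout_s: float, other_s: float) -> float:
    """Mahout throughput relative to another framework (equals other_s / mahout_s)."""
    return throughput(mahout_s) / throughput(other_s)


for run, t in total_time_s.items():
    print(
        f"{run:6s} Mahout {throughput(t['Mahout']):7.1f} vectors/sec, "
        f"vs PennyLane {speedup(t['Mahout'], t['PennyLane']):.2f}x, "
        f"vs Qiskit {speedup(t['Mahout'], t['Qiskit']):.2f}x"
    )
```

This reports 0.69x and 86.57x for the first run and 1.83x and 232.48x for the second, matching the logs. The Mahout gain therefore comes entirely from its total time dropping from 9.7979 s to 3.6776 s, while the PennyLane and Qiskit times stay roughly unchanged.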

400Ping marked this pull request as ready for review January 5, 2026 12:27
400Ping (Contributor, Author) commented Jan 5, 2026

ryankert01 (Contributor) left a comment

lg

guan404ming merged commit 581f5ee into apache:dev-qdp Jan 5, 2026
2 checks passed
guan404ming pushed a commit that referenced this pull request Jan 6, 2026
* [Core] Update throughput benchmark to batch encoding

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* fix conflict

Signed-off-by: 400Ping <fourhundredping@gmail.com>

---------

Signed-off-by: 400Ping <fourhundredping@gmail.com>
