
Conversation

@wangshangsam (Contributor) commented Dec 24, 2025

For launching the VLM benchmark, we currently have:

  • mlperf-inf-mm-q3vl benchmark endpoint: Benchmarks against a generic endpoint that follows the OpenAI API spec. This lets the submitter benchmark an arbitrary inference system, but it requires more manual (or bash-scripting) effort to set up.
  • mlperf-inf-mm-q3vl benchmark vllm: Deploys and launches vLLM, waits for it to become healthy, then runs the same benchmarking routine. For a submitter who only wants to benchmark vLLM, this is a very convenient command that does everything for them.

But what if the submitter wants to benchmark an inference system other than out-of-the-box vLLM, while still getting the same convenience that mlperf-inf-mm-q3vl benchmark vllm provides? This PR introduces a plugin system that lets the submitter implement their own subcommand of mlperf-inf-mm-q3vl benchmark from a third-party Python package (i.e., without directly modifying the mlperf-inf-mm-q3vl source code).
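
To make the idea concrete, here is a minimal sketch of what such a plugin module might look like. All names (the mlperf_inf_mm_q3vl_foo package, register_foo_benchmark, the foo subcommand) are hypothetical, taken from the example discussed in the review below; the body only outlines the intended behavior.

# plugin.py in a hypothetical third-party package such as mlperf_inf_mm_q3vl_foo.
from typing import Callable


def register_foo_benchmark() -> Callable:
    """Return the CLI command function to be mounted as
    `mlperf-inf-mm-q3vl benchmark foo`."""

    def foo() -> None:
        # 1. Deploy the Foo inference backend.
        # 2. Wait for its endpoint to become healthy.
        # 3. Run the shared benchmarking routine against it, just like
        #    `mlperf-inf-mm-q3vl benchmark vllm` does for vLLM.
        ...

    return foo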

wangshangsam requested a review from a team as a code owner on December 24, 2025 at 23:56
github-actions bot commented Dec 24, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@wangshangsam (Contributor, Author) commented

@soodoshll @johncalesp Could you help to review this PR?

@soodoshll left a comment

LGTM. Thanks!


from .schema import FooEndpoint

def register_foo_benchmark() -> Callable[[Settings, Dataset, FooEndpoint, int, int, Verbosity], None]:

@soodoshll: This return type annotation seems a little verbose. Is it a must-have?

@wangshangsam (Contributor, Author) replied:

It's not a must-have. I just wanted to highlight that the return value should be the CLI command function.
I'll reduce it to just a Callable.
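
For reference, the reduced form would be roughly the following (a sketch; only the annotation changes, and the body is elided here):

from typing import Callable


def register_foo_benchmark() -> Callable:
    ...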

from mlperf_inf_mm_q3vl.schema import Settings, Dataset, Endpoint, Verbosity
from mlperf_inf_mm_q3vl.log import setup_loguru_for_benchmark

from .schema import FooEndpoint

Commenting on the diff excerpt above, a contributor wrote:

Seems like the user would need to follow a similar structure to the Endpoint from mlperf_inf_mm_q3vl.schema. Should we put the schema.py file into the package structure? For example:

mlperf-inf-mm-q3vl-foo/
├── pyproject.toml
└── src/
    └── mlperf_inf_mm_q3vl_foo/
        ├── __init__.py
        ├── schema.py
        └── plugin.py

@wangshangsam (Contributor, Author) replied:

For the Endpoint pydantic BaseModel, yes, the user would likely need to follow a similar class structure. However, in terms of the package structure, not necessarily; the user can put everything in __init__.py if they want (even though that's generally bad software engineering practice).
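
As an illustration of that class structure, a third-party endpoint model could look roughly like this (the field names are hypothetical; only the pydantic BaseModel shape is implied by the discussion):

# Hypothetical FooEndpoint mirroring the shape of mlperf_inf_mm_q3vl.schema.Endpoint.
from pydantic import BaseModel


class FooEndpoint(BaseModel):
    # Illustrative fields only; the real Endpoint schema defines the required fields.
    base_url: str = "http://localhost:8000/v1"
    api_key: str | None = None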

This command deploys a model using the Foo backend
and runs the MLPerf benchmark against it.
"""
from .deploy import FooDeployer

Commenting on the diff excerpt above, a contributor wrote:

Same as with schema.py: should we put the deploy.py file into the package structure? For example:

mlperf-inf-mm-q3vl-foo/
├── pyproject.toml
└── src/
    └── mlperf_inf_mm_q3vl_foo/
        ├── __init__.py
        ├── deploy.py
        └── plugin.py
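
For completeness, a rough sketch of what could live in such a deploy.py, given only the from .deploy import FooDeployer line shown in the diff excerpt above (the class body and the foo-server command are entirely hypothetical):

# Hypothetical deploy.py; only the FooDeployer name appears in the PR snippet above.
import subprocess
import time
import urllib.request


class FooDeployer:
    """Launch the Foo inference server and wait for it to become healthy."""

    def __init__(self, health_url: str = "http://localhost:8000/health") -> None:
        self.health_url = health_url
        self.proc: subprocess.Popen | None = None

    def deploy(self) -> None:
        # Start the backend process (the command is illustrative).
        self.proc = subprocess.Popen(["foo-server", "--port", "8000"])

    def wait_until_healthy(self, timeout_s: float = 300.0) -> None:
        # Poll the health endpoint until it responds or the timeout expires.
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            try:
                with urllib.request.urlopen(self.health_url, timeout=5):
                    return
            except OSError:
                time.sleep(2)
        raise TimeoutError("Foo backend did not become healthy in time")

    def teardown(self) -> None:
        # Stop the backend process once the benchmark run is finished.
        if self.proc is not None:
            self.proc.terminate()
            self.proc.wait()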
