
Conversation

@allkoow (Contributor) commented Feb 12, 2026

Summary

  1. New report generation strategy for OSU: parses OSU stdout into CSV, with three output types detected from the table header (see the sketch below):
     • Latency – headers containing Avg Latency(us) (full and short forms),
     • Bandwidth – headers containing Bandwidth (MB/s) or MB/s,
     • Multiple bandwidth – headers containing both MB/s and Messages/s.
  2. Skip benchmarks that produce no table data (osu_init, osu_hello).
  3. Add a comparison report with tables and charts.
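
A minimal sketch of how the header-based detection described above might look; the function name and regex patterns here are illustrative assumptions, not the PR's actual implementation:

import re
from enum import Enum


class BenchmarkType(Enum):
    LATENCY = "latency"
    BANDWIDTH = "bandwidth"
    MULTIPLE_BANDWIDTH = "multiple_bandwidth"


def detect_benchmark_type(header: str) -> BenchmarkType | None:
    # Check the combined-metric header first, since it also contains "MB/s".
    if re.search(r"Messages/s", header):
        return BenchmarkType.MULTIPLE_BANDWIDTH
    if re.search(r"Avg Latency", header):
        return BenchmarkType.LATENCY
    if re.search(r"Bandwidth \(MB/s\)|MB/s", header):
        return BenchmarkType.BANDWIDTH
    return None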

Test Plan

Tested in an internal environment:

  • Report generation: verified CSV report generation for osu_latency, osu_bw, osu_get_bw, osu_put_bw, osu_bibw, osu_mbw_mr, and osu_multi_lat.
  • Comparison report: verified the comparison report for scenarios with two runs: osu_bw and osu_latency.

coderabbitai bot commented Feb 12, 2026

Warning

Rate limit exceeded

@allkoow has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 15 minutes and 31 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

This pull request introduces OSU Benchmark report generation and comparison functionality. A new report generation strategy extracts benchmark data from stdout into CSV format, while a specialized comparison report generates HTML tables and charts aggregating metrics across multiple benchmark runs. Supporting registration, module exports, and test infrastructure are also added.

Changes

  • Report Generation Infrastructure (src/cloudai/workloads/osu_bench/report_generation_strategy.py, src/cloudai/workloads/osu_bench/osu_comparison_report.py): New modules implementing OSU benchmark data extraction from stdout (with BenchmarkType detection, row parsing, and CSV generation) and comparison report generation with metrics-driven table and chart creation for latency, bandwidth, and message rate.
  • Module Registration & Exports (src/cloudai/registration.py, src/cloudai/workloads/osu_bench/__init__.py): Imported and registered OSUBenchComparisonReport and OSUBenchReportGenerationStrategy; added scenario report registration with grouping by benchmark type.
  • Test Definition Updates (src/cloudai/workloads/osu_bench/osu_bench.py): Added an early success path for the osu_hello and osu_init benchmarks, bypassing standard output validation.
  • Test Coverage & Infrastructure (tests/report_generation_strategy/test_osu_bench_report_generation_strategy.py, tests/test_init.py, tests/test_reporter.py): New test module validating data extraction across multiple benchmark types; updated existing tests to include the new report class and to snapshot/restore report_configs state.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hops through the benchmark with glee,
OSU data flows wild and free,
Comparison charts bloom so bright,
Latency, bandwidth—what a sight!
Let the reports hop their way through the night!

🚥 Pre-merge checks: ✅ 2 passed
  • Title check: ✅ Passed. The pull request title clearly summarizes the main change: adding report generation functionality for OSU Benchmark, which is the primary focus of all changes in the changeset.
  • Description check: ✅ Passed. The pull request description is directly related to the changeset, providing a summary of the new report generation strategy, benchmark types, and comparison report features, along with test verification.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


greptile-apps bot commented Feb 12, 2026

Greptile Overview

Greptile Summary

This PR adds report generation functionality for OSU Benchmark tests. The implementation parses OSU stdout output into CSV format with support for three benchmark types: latency (with Avg Latency(us) column), bandwidth (with Bandwidth (MB/s) or MB/s column), and multiple bandwidth (with both MB/s and Messages/s columns).
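
For reference, the generated CSV uses the column names asserted in the new tests; for example, an osu_bw run produces a size/mb_sec table (the values below are taken from the test assertions):

size,mb_sec
1,2.32
4194304,27161.33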

Key changes:

  • New OSUBenchReportGenerationStrategy class that extracts benchmark data from stdout and generates CSV reports
  • New OSUBenchComparisonReport class that creates comparison reports with tables and charts for multiple test runs
  • Updated OSUBenchTestDefinition.was_run_successful() to skip table validation for osu_hello and osu_init benchmarks that only produce summary output
  • Registered both strategies in the central Registry
  • Comprehensive test coverage for all three benchmark types with various header formats

The implementation uses regex patterns to detect benchmark types, parses data rows, and handles missing files gracefully by returning empty DataFrames. The comparison report integrates with the existing ComparisonReport base class to generate HTML reports with Bokeh charts and Rich tables.
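
A minimal sketch of the missing-file handling described above, assuming the signature used in the tests (Path in, DataFrame out); the real header detection and row parsing are elided:

from pathlib import Path

import pandas as pd


def extract_osu_bench_data(stdout_path: Path) -> pd.DataFrame:
    # Missing or empty stdout yields an empty DataFrame rather than raising.
    if not stdout_path.is_file():
        return pd.DataFrame()
    lines = stdout_path.read_text().splitlines()
    if not lines:
        return pd.DataFrame()
    # ... detect the benchmark type from the header and parse data rows here ...
    return pd.DataFrame()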

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is well-structured, follows existing patterns in the codebase, includes comprehensive test coverage for all three benchmark types, and properly integrates with the existing Registry system. The code handles edge cases like missing files and empty data gracefully.
  • No files require special attention

Important Files Changed

  • src/cloudai/workloads/osu_bench/report_generation_strategy.py: New file implementing OSU Benchmark report generation with CSV parsing for three benchmark types (latency, bandwidth, multiple bandwidth).
  • src/cloudai/workloads/osu_bench/osu_comparison_report.py: New comparison report implementation for OSU benchmarks with support for tables and charts for different metric types.
  • src/cloudai/workloads/osu_bench/osu_bench.py: Added special-case handling for osu_hello and osu_init benchmarks that don't output table data.
  • src/cloudai/registration.py: Registered the new OSU report generation strategy and comparison report in the registry.
  • tests/report_generation_strategy/test_osu_bench_report_generation_strategy.py: Comprehensive test coverage for all three benchmark types with different header formats.

Sequence Diagram

sequenceDiagram
    participant TestRun
    participant OSUBenchReportGenerationStrategy
    participant extract_osu_bench_data
    participant OSUBenchComparisonReport
    participant ComparisonReport
    
    TestRun->>OSUBenchReportGenerationStrategy: generate_report()
    OSUBenchReportGenerationStrategy->>OSUBenchReportGenerationStrategy: can_handle_directory()
    OSUBenchReportGenerationStrategy->>extract_osu_bench_data: extract_osu_bench_data(stdout.txt)
    extract_osu_bench_data->>extract_osu_bench_data: detect benchmark type (regex)
    extract_osu_bench_data->>extract_osu_bench_data: parse data rows
    extract_osu_bench_data-->>OSUBenchReportGenerationStrategy: DataFrame
    OSUBenchReportGenerationStrategy->>OSUBenchReportGenerationStrategy: df.to_csv()
    
    Note over OSUBenchComparisonReport: Comparison Report Generation
    OSUBenchComparisonReport->>OSUBenchComparisonReport: load_test_runs()
    OSUBenchComparisonReport->>OSUBenchComparisonReport: extract_data_as_df() for each run
    OSUBenchComparisonReport->>ComparisonReport: create_tables(grouped_runs)
    ComparisonReport-->>OSUBenchComparisonReport: Rich tables
    OSUBenchComparisonReport->>ComparisonReport: create_charts(grouped_runs)
    ComparisonReport-->>OSUBenchComparisonReport: Bokeh figures
    OSUBenchComparisonReport->>OSUBenchComparisonReport: render HTML report

greptile-apps bot left a comment

8 files reviewed, no comments

coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/cloudai/workloads/osu_bench/osu_comparison_report.py`:
- Around line 65-142: The code reads each group's CSVs twice because
create_tables and create_charts each call extract_data_as_df independently; add
a simple per-report cache to avoid duplicate reads by implementing a helper
(e.g., _get_group_dfs) that checks/sets a dict on self (e.g., self._dfs_cache)
keyed by a stable identifier for the group (like tuple(item.tr.path) or
id(group)), and returns the list produced by [self.extract_data_as_df(item.tr)
for item in group.items]; replace the direct list comprehensions in
create_tables and create_charts with calls to _get_group_dfs and clear/init
self._dfs_cache appropriately (e.g., in report constructor or before processing)
so extract_data_as_df is invoked only once per group.

In `@src/cloudai/workloads/osu_bench/report_generation_strategy.py`:
- Around line 64-85: The two branches in _parse_data_row that handle
BenchmarkType.BANDWIDTH and the LATENCY fallback both return [parts[0],
parts[1]]; simplify by removing the explicit BANDWIDTH branch and collapsing
them into a single fallback that returns [parts[0], parts[1]] for any
non-MULTIPLE_BANDWIDTH case (after validating parts and the message-size int),
keeping the MULTIPLE_BANDWIDTH branch unchanged; this reduces duplication while
preserving behavior tied to _columns_for_type.

In
`@tests/report_generation_strategy/test_osu_bench_report_generation_strategy.py`:
- Around line 63-112: Add three edge-case tests for extract_osu_bench_data to
cover its early-return branches: (1) "file not found" — call
extract_osu_bench_data with a non-existent Path and assert it returns an empty
DataFrame (no columns, shape (0,0) or however empty is represented in your
project), (2) "empty file" — create an empty stdout file and assert
extract_osu_bench_data returns an empty DataFrame, and (3) "unrecognized header"
— write a stdout file containing non-OSU output (e.g., "osu_hello" text) and
assert extract_osu_bench_data returns an empty DataFrame; reference the function
extract_osu_bench_data and the module osu_bench.py so tests exercise the
special-case handling added there.

Comment on lines +65 to +142
def create_tables(self, cmp_groups: list[GroupedTestRuns]) -> list[Table]:
    tables: list[Table] = []
    for group in cmp_groups:
        dfs = [self.extract_data_as_df(item.tr) for item in group.items]

        if self._has_metric(dfs, "avg_lat"):
            tables.append(
                self.create_table(
                    group,
                    dfs=dfs,
                    title="Latency",
                    info_columns=list(self.INFO_COLUMNS),
                    data_columns=["avg_lat"],
                )
            )
        if self._has_metric(dfs, "mb_sec"):
            tables.append(
                self.create_table(
                    group,
                    dfs=dfs,
                    title="Bandwidth",
                    info_columns=list(self.INFO_COLUMNS),
                    data_columns=["mb_sec"],
                )
            )
        if self._has_metric(dfs, "messages_sec"):
            tables.append(
                self.create_table(
                    group,
                    dfs=dfs,
                    title="Message Rate",
                    info_columns=list(self.INFO_COLUMNS),
                    data_columns=["messages_sec"],
                )
            )

    return tables

def create_charts(self, cmp_groups: list[GroupedTestRuns]) -> list[bk.figure]:
    charts: list[bk.figure] = []
    for group in cmp_groups:
        dfs = [self.extract_data_as_df(item.tr) for item in group.items]

        if self._has_metric(dfs, "avg_lat"):
            charts.append(
                self.create_chart(
                    group,
                    dfs,
                    "Latency",
                    list(self.INFO_COLUMNS),
                    ["avg_lat"],
                    "Time (us)",
                )
            )
        if self._has_metric(dfs, "mb_sec"):
            charts.append(
                self.create_chart(
                    group,
                    dfs,
                    "Bandwidth",
                    list(self.INFO_COLUMNS),
                    ["mb_sec"],
                    "Bandwidth (MB/s)",
                )
            )
        if self._has_metric(dfs, "messages_sec"):
            charts.append(
                self.create_chart(
                    group,
                    dfs,
                    "Message Rate",
                    list(self.INFO_COLUMNS),
                    ["messages_sec"],
                    "Messages/s",
                )
            )

    return charts

🧹 Nitpick | 🔵 Trivial

Repeated extract_data_as_df calls for the same group items across create_tables and create_charts.

Each group's CSVs are read from disk twice — once for tables, once for charts. This follows the existing ComparisonReport interface pattern (each method receives cmp_groups independently), so changing it would require a broader refactor. Acceptable as-is for small CSV files.

🤖 Prompt for AI Agents
In `@src/cloudai/workloads/osu_bench/osu_comparison_report.py` around lines 65 -
142, The code reads each group's CSVs twice because create_tables and
create_charts each call extract_data_as_df independently; add a simple
per-report cache to avoid duplicate reads by implementing a helper (e.g.,
_get_group_dfs) that checks/sets a dict on self (e.g., self._dfs_cache) keyed by
a stable identifier for the group (like tuple(item.tr.path) or id(group)), and
returns the list produced by [self.extract_data_as_df(item.tr) for item in
group.items]; replace the direct list comprehensions in create_tables and
create_charts with calls to _get_group_dfs and clear/init self._dfs_cache
appropriately (e.g., in report constructor or before processing) so
extract_data_as_df is invoked only once per group.
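
A minimal sketch of the suggested cache, assuming the helper lives on the report class; the method name, cache attribute, and key choice follow the reviewer's suggestion and are not existing code:

import pandas as pd


class OSUBenchComparisonReport:  # sketch only; the real class defines much more
    def _get_group_dfs(self, group) -> list[pd.DataFrame]:
        # Read each group's CSVs once and reuse the result in create_tables and create_charts.
        cache = getattr(self, "_dfs_cache", None)
        if cache is None:
            cache = self._dfs_cache = {}
        key = id(group)  # a tuple of per-run output paths would be a more stable key
        if key not in cache:
            cache[key] = [self.extract_data_as_df(item.tr) for item in group.items]
        return cache[key]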

Comment on lines +64 to +85
def _parse_data_row(parts: list[str], benchmark_type: BenchmarkType) -> list[str] | None:
    if len(parts) < 2:
        return None

    try:
        int(parts[0])  # message size
    except ValueError:
        return None

    # Append row data based on benchmark type.
    if benchmark_type == BenchmarkType.MULTIPLE_BANDWIDTH:
        if len(parts) >= 3:
            # size, MB/s, Messages/s
            return [parts[0], parts[1], parts[2]]
        return None

    if benchmark_type == BenchmarkType.BANDWIDTH:
        # size, MB/s
        return [parts[0], parts[1]]

    # LATENCY
    return [parts[0], parts[1]]

🧹 Nitpick | 🔵 Trivial

_parse_data_row for BANDWIDTH and LATENCY are identical — both return [parts[0], parts[1]].

The two branches at lines 80-82 and 84-85 produce the same result. This is not a bug (the column naming differs in _columns_for_type), but you could collapse them into a single fallback for clarity.

♻️ Minor simplification
     if benchmark_type == BenchmarkType.MULTIPLE_BANDWIDTH:
         if len(parts) >= 3:
             return [parts[0], parts[1], parts[2]]
         return None
 
-    if benchmark_type == BenchmarkType.BANDWIDTH:
-        # size, MB/s
-        return [parts[0], parts[1]]
-
-    # LATENCY
+    # BANDWIDTH or LATENCY: size + one metric
     return [parts[0], parts[1]]
🤖 Prompt for AI Agents
In `@src/cloudai/workloads/osu_bench/report_generation_strategy.py` around lines
64 - 85, The two branches in _parse_data_row that handle BenchmarkType.BANDWIDTH
and the LATENCY fallback both return [parts[0], parts[1]]; simplify by removing
the explicit BANDWIDTH branch and collapsing them into a single fallback that
returns [parts[0], parts[1]] for any non-MULTIPLE_BANDWIDTH case (after
validating parts and the message-size int), keeping the MULTIPLE_BANDWIDTH
branch unchanged; this reduces duplication while preserving behavior tied to
_columns_for_type.

Comment on lines +63 to +112
def test_osu_multiple_bandwidth_message_rate_parsing(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text(OSU_MULTIPLE_BW)

    df = extract_osu_bench_data(stdout)
    assert list(df.columns) == ["size", "mb_sec", "messages_sec"]
    assert df.shape == (2, 3)
    assert df["size"].iloc[0] == 1
    assert df["mb_sec"].iloc[0] == pytest.approx(2.36)
    assert df["messages_sec"].iloc[0] == pytest.approx(2356892.44)
    assert df["size"].iloc[-1] == 4194304
    assert df["mb_sec"].iloc[-1] == pytest.approx(26903.29)
    assert df["messages_sec"].iloc[-1] == pytest.approx(6414.24)


def test_osu_latency_parsing(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text(OSU_ALLGATHER_LAT)

    df = extract_osu_bench_data(stdout)
    assert list(df.columns) == ["size", "avg_lat"]
    assert df.shape == (2, 2)
    assert df["size"].iloc[0] == 1
    assert df["avg_lat"].iloc[0] == pytest.approx(2.81)
    assert df["size"].iloc[-1] == 1048576
    assert df["avg_lat"].iloc[-1] == pytest.approx(104.30)


def test_osu_bandwidth_parsing(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text(OSU_BW)

    df = extract_osu_bench_data(stdout)
    assert list(df.columns) == ["size", "mb_sec"]
    assert df.shape == (2, 2)
    assert df["size"].iloc[0] == 1
    assert df["mb_sec"].iloc[0] == pytest.approx(2.32)
    assert df["size"].iloc[-1] == 4194304
    assert df["mb_sec"].iloc[-1] == pytest.approx(27161.33)


def test_osu_multi_latency_short_header_parsing(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text(OSU_MULTI_LAT)

    df = extract_osu_bench_data(stdout)
    assert list(df.columns) == ["size", "avg_lat"]
    assert df.shape == (6, 2)
    assert df["size"].tolist() == [1, 2, 4, 8, 16, 32]
    assert df["avg_lat"].tolist() == pytest.approx([1.88, 1.84, 1.88, 1.91, 1.87, 2.01])

🧹 Nitpick | 🔵 Trivial

Good coverage of the happy-path parsing for all three benchmark types.

Consider adding edge-case tests for extract_osu_bench_data: file not found (returns empty DataFrame), empty file, and stdout with no recognizable header (e.g., osu_hello output). These would guard the early-return paths and align with the special-case handling added in osu_bench.py.

Would you like me to generate those edge-case tests?

🤖 Prompt for AI Agents
In
`@tests/report_generation_strategy/test_osu_bench_report_generation_strategy.py`
around lines 63 - 112, Add three edge-case tests for extract_osu_bench_data to
cover its early-return branches: (1) "file not found" — call
extract_osu_bench_data with a non-existent Path and assert it returns an empty
DataFrame (no columns, shape (0,0) or however empty is represented in your
project), (2) "empty file" — create an empty stdout file and assert
extract_osu_bench_data returns an empty DataFrame, and (3) "unrecognized header"
— write a stdout file containing non-OSU output (e.g., "osu_hello" text) and
assert extract_osu_bench_data returns an empty DataFrame; reference the function
extract_osu_bench_data and the module osu_bench.py so tests exercise the
special-case handling added there.
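
A minimal sketch of the suggested edge-case tests; the import path is inferred from the file layout above, and the osu_hello text and empty-DataFrame expectation are assumptions based on the review comment:

from pathlib import Path

from cloudai.workloads.osu_bench.report_generation_strategy import extract_osu_bench_data


def test_osu_missing_file_returns_empty_df(tmp_path: Path) -> None:
    df = extract_osu_bench_data(tmp_path / "does_not_exist.txt")
    assert df.empty


def test_osu_empty_file_returns_empty_df(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text("")
    assert extract_osu_bench_data(stdout).empty


def test_osu_unrecognized_header_returns_empty_df(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text("# OSU MPI Hello World Test\n")  # no table header to detect
    assert extract_osu_bench_data(stdout).empty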

greptile-apps bot left a comment

9 files reviewed, no comments

greptile-apps bot left a comment

9 files reviewed, no comments
