
Conversation

@allkoow (Contributor) commented Feb 12, 2026

Summary

  1. New report generation strategy for OSU: parses OSU stdout into CSV, with three output types detected from the table header (see the sketch below):
     • Latency – headers containing Avg Latency(us) (full and short forms),
     • Bandwidth – headers containing Bandwidth (MB/s) or MB/s,
     • Multiple bandwidth – headers containing both MB/s and Messages/s.
  2. Skip benchmarks that produce no table data (osu_init, osu_hello).
  3. Add a comparison report with tables and charts.
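
A minimal sketch of how the header-based detection described above might look; the function name and regex patterns here are illustrative assumptions, not the PR's actual implementation:

import re
from enum import Enum


class BenchmarkType(Enum):
    LATENCY = "latency"
    BANDWIDTH = "bandwidth"
    MULTIPLE_BANDWIDTH = "multiple_bandwidth"


def detect_benchmark_type(header: str) -> BenchmarkType | None:
    # Check the combined-metric header first, since it also contains "MB/s".
    if re.search(r"Messages/s", header):
        return BenchmarkType.MULTIPLE_BANDWIDTH
    if re.search(r"Avg Latency", header):
        return BenchmarkType.LATENCY
    if re.search(r"Bandwidth \(MB/s\)|MB/s", header):
        return BenchmarkType.BANDWIDTH
    return None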

Test Plan

Tested in an internal environment:

  • Report generation: verified CSV report generation for osu_latency, osu_bw, osu_get_bw, osu_put_bw, osu_bibw, osu_mbw_mr, and osu_multi_lat.
  • Comparison report: verified the comparison report for scenarios with two runs: osu_bw and osu_latency.

coderabbitai bot commented Feb 12, 2026

Warning

Rate limit exceeded

@allkoow has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 15 minutes and 31 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

This pull request introduces OSU Benchmark report generation and comparison functionality. A new report generation strategy extracts benchmark data from stdout into CSV format, while a specialized comparison report generates HTML tables and charts aggregating metrics across multiple benchmark runs. Supporting registration, module exports, and test infrastructure are also added.

Changes

  • Report Generation Infrastructure (src/cloudai/workloads/osu_bench/report_generation_strategy.py, src/cloudai/workloads/osu_bench/osu_comparison_report.py): New modules implementing OSU benchmark data extraction from stdout (with BenchmarkType detection, row parsing, and CSV generation) and comparison report generation with metrics-driven table and chart creation for latency, bandwidth, and message rate.
  • Module Registration & Exports (src/cloudai/registration.py, src/cloudai/workloads/osu_bench/__init__.py): Imported and registered OSUBenchComparisonReport and OSUBenchReportGenerationStrategy; added scenario report registration with grouping by benchmark type.
  • Test Definition Updates (src/cloudai/workloads/osu_bench/osu_bench.py): Added an early success path for the osu_hello and osu_init benchmarks, bypassing standard output validation.
  • Test Coverage & Infrastructure (tests/report_generation_strategy/test_osu_bench_report_generation_strategy.py, tests/test_init.py, tests/test_reporter.py): New test module validating data extraction across multiple benchmark types; updated existing tests to include the new report class and to snapshot/restore report_configs state.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hops through the benchmark with glee,
OSU data flows wild and free,
Comparison charts bloom so bright,
Latency, bandwidth—what a sight!
Let the reports hop their way through the night!

🚥 Pre-merge checks: ✅ 2 passed
  • Title check: ✅ Passed. The pull request title clearly summarizes the main change: adding report generation functionality for OSU Benchmark, which is the primary focus of all changes in the changeset.
  • Description check: ✅ Passed. The pull request description is directly related to the changeset, providing a summary of the new report generation strategy, benchmark types, and comparison report features, along with test verification.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


greptile-apps bot commented Feb 12, 2026

Greptile Overview

Greptile Summary

This PR adds report generation functionality for OSU Benchmark tests. The implementation parses OSU stdout output into CSV format with support for three benchmark types: latency (with Avg Latency(us) column), bandwidth (with Bandwidth (MB/s) or MB/s column), and multiple bandwidth (with both MB/s and Messages/s columns).
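
For reference, the generated CSV uses the column names asserted in the new tests; for example, an osu_bw run produces a size/mb_sec table (the values below are taken from the test assertions):

size,mb_sec
1,2.32
4194304,27161.33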

Key changes:

  • New OSUBenchReportGenerationStrategy class that extracts benchmark data from stdout and generates CSV reports
  • New OSUBenchComparisonReport class that creates comparison reports with tables and charts for multiple test runs
  • Updated OSUBenchTestDefinition.was_run_successful() to skip table validation for osu_hello and osu_init benchmarks that only produce summary output
  • Registered both strategies in the central Registry
  • Comprehensive test coverage for all three benchmark types with various header formats

The implementation uses regex patterns to detect benchmark types, parses data rows, and handles missing files gracefully by returning empty DataFrames. The comparison report integrates with the existing ComparisonReport base class to generate HTML reports with Bokeh charts and Rich tables.
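
A minimal sketch of the missing-file handling described above, assuming the signature used in the tests (Path in, DataFrame out); the real header detection and row parsing are elided:

from pathlib import Path

import pandas as pd


def extract_osu_bench_data(stdout_path: Path) -> pd.DataFrame:
    # Missing or empty stdout yields an empty DataFrame rather than raising.
    if not stdout_path.is_file():
        return pd.DataFrame()
    lines = stdout_path.read_text().splitlines()
    if not lines:
        return pd.DataFrame()
    # ... detect the benchmark type from the header and parse data rows here ...
    return pd.DataFrame()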

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is well-structured, follows existing patterns in the codebase, includes comprehensive test coverage for all three benchmark types, and properly integrates with the existing Registry system. The code handles edge cases like missing files and empty data gracefully.
  • No files require special attention

Important Files Changed

  • src/cloudai/workloads/osu_bench/report_generation_strategy.py: New file implementing OSU Benchmark report generation with CSV parsing for three benchmark types (latency, bandwidth, multiple bandwidth).
  • src/cloudai/workloads/osu_bench/osu_comparison_report.py: New comparison report implementation for OSU benchmarks with support for tables and charts for different metric types.
  • src/cloudai/workloads/osu_bench/osu_bench.py: Added special-case handling for osu_hello and osu_init benchmarks that don't output table data.
  • src/cloudai/registration.py: Registered the new OSU report generation strategy and comparison report in the registry.
  • tests/report_generation_strategy/test_osu_bench_report_generation_strategy.py: Comprehensive test coverage for all three benchmark types with different header formats.

Sequence Diagram

sequenceDiagram
    participant TestRun
    participant OSUBenchReportGenerationStrategy
    participant extract_osu_bench_data
    participant OSUBenchComparisonReport
    participant ComparisonReport
    
    TestRun->>OSUBenchReportGenerationStrategy: generate_report()
    OSUBenchReportGenerationStrategy->>OSUBenchReportGenerationStrategy: can_handle_directory()
    OSUBenchReportGenerationStrategy->>extract_osu_bench_data: extract_osu_bench_data(stdout.txt)
    extract_osu_bench_data->>extract_osu_bench_data: detect benchmark type (regex)
    extract_osu_bench_data->>extract_osu_bench_data: parse data rows
    extract_osu_bench_data-->>OSUBenchReportGenerationStrategy: DataFrame
    OSUBenchReportGenerationStrategy->>OSUBenchReportGenerationStrategy: df.to_csv()
    
    Note over OSUBenchComparisonReport: Comparison Report Generation
    OSUBenchComparisonReport->>OSUBenchComparisonReport: load_test_runs()
    OSUBenchComparisonReport->>OSUBenchComparisonReport: extract_data_as_df() for each run
    OSUBenchComparisonReport->>ComparisonReport: create_tables(grouped_runs)
    ComparisonReport-->>OSUBenchComparisonReport: Rich tables
    OSUBenchComparisonReport->>ComparisonReport: create_charts(grouped_runs)
    ComparisonReport-->>OSUBenchComparisonReport: Bokeh figures
    OSUBenchComparisonReport->>OSUBenchComparisonReport: render HTML report

greptile-apps bot left a comment

8 files reviewed, no comments

coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/cloudai/workloads/osu_bench/osu_comparison_report.py`:
- Around line 65-142: The code reads each group's CSVs twice because
create_tables and create_charts each call extract_data_as_df independently; add
a simple per-report cache to avoid duplicate reads by implementing a helper
(e.g., _get_group_dfs) that checks/sets a dict on self (e.g., self._dfs_cache)
keyed by a stable identifier for the group (like tuple(item.tr.path) or
id(group)), and returns the list produced by [self.extract_data_as_df(item.tr)
for item in group.items]; replace the direct list comprehensions in
create_tables and create_charts with calls to _get_group_dfs and clear/init
self._dfs_cache appropriately (e.g., in report constructor or before processing)
so extract_data_as_df is invoked only once per group.

In `@src/cloudai/workloads/osu_bench/report_generation_strategy.py`:
- Around line 64-85: The two branches in _parse_data_row that handle
BenchmarkType.BANDWIDTH and the LATENCY fallback both return [parts[0],
parts[1]]; simplify by removing the explicit BANDWIDTH branch and collapsing
them into a single fallback that returns [parts[0], parts[1]] for any
non-MULTIPLE_BANDWIDTH case (after validating parts and the message-size int),
keeping the MULTIPLE_BANDWIDTH branch unchanged; this reduces duplication while
preserving behavior tied to _columns_for_type.

In
`@tests/report_generation_strategy/test_osu_bench_report_generation_strategy.py`:
- Around line 63-112: Add three edge-case tests for extract_osu_bench_data to
cover its early-return branches: (1) "file not found" — call
extract_osu_bench_data with a non-existent Path and assert it returns an empty
DataFrame (no columns, shape (0,0) or however empty is represented in your
project), (2) "empty file" — create an empty stdout file and assert
extract_osu_bench_data returns an empty DataFrame, and (3) "unrecognized header"
— write a stdout file containing non-OSU output (e.g., "osu_hello" text) and
assert extract_osu_bench_data returns an empty DataFrame; reference the function
extract_osu_bench_data and the module osu_bench.py so tests exercise the
special-case handling added there.

Comment on lines +65 to +142
def create_tables(self, cmp_groups: list[GroupedTestRuns]) -> list[Table]:
    tables: list[Table] = []
    for group in cmp_groups:
        dfs = [self.extract_data_as_df(item.tr) for item in group.items]

        if self._has_metric(dfs, "avg_lat"):
            tables.append(
                self.create_table(
                    group,
                    dfs=dfs,
                    title="Latency",
                    info_columns=list(self.INFO_COLUMNS),
                    data_columns=["avg_lat"],
                )
            )
        if self._has_metric(dfs, "mb_sec"):
            tables.append(
                self.create_table(
                    group,
                    dfs=dfs,
                    title="Bandwidth",
                    info_columns=list(self.INFO_COLUMNS),
                    data_columns=["mb_sec"],
                )
            )
        if self._has_metric(dfs, "messages_sec"):
            tables.append(
                self.create_table(
                    group,
                    dfs=dfs,
                    title="Message Rate",
                    info_columns=list(self.INFO_COLUMNS),
                    data_columns=["messages_sec"],
                )
            )

    return tables

def create_charts(self, cmp_groups: list[GroupedTestRuns]) -> list[bk.figure]:
    charts: list[bk.figure] = []
    for group in cmp_groups:
        dfs = [self.extract_data_as_df(item.tr) for item in group.items]

        if self._has_metric(dfs, "avg_lat"):
            charts.append(
                self.create_chart(
                    group,
                    dfs,
                    "Latency",
                    list(self.INFO_COLUMNS),
                    ["avg_lat"],
                    "Time (us)",
                )
            )
        if self._has_metric(dfs, "mb_sec"):
            charts.append(
                self.create_chart(
                    group,
                    dfs,
                    "Bandwidth",
                    list(self.INFO_COLUMNS),
                    ["mb_sec"],
                    "Bandwidth (MB/s)",
                )
            )
        if self._has_metric(dfs, "messages_sec"):
            charts.append(
                self.create_chart(
                    group,
                    dfs,
                    "Message Rate",
                    list(self.INFO_COLUMNS),
                    ["messages_sec"],
                    "Messages/s",
                )
            )

    return charts

🧹 Nitpick | 🔵 Trivial

Repeated extract_data_as_df calls for the same group items across create_tables and create_charts.

Each group's CSVs are read from disk twice — once for tables, once for charts. This follows the existing ComparisonReport interface pattern (each method receives cmp_groups independently), so changing it would require a broader refactor. Acceptable as-is for small CSV files.

🤖 Prompt for AI Agents
In `@src/cloudai/workloads/osu_bench/osu_comparison_report.py` around lines 65 -
142, The code reads each group's CSVs twice because create_tables and
create_charts each call extract_data_as_df independently; add a simple
per-report cache to avoid duplicate reads by implementing a helper (e.g.,
_get_group_dfs) that checks/sets a dict on self (e.g., self._dfs_cache) keyed by
a stable identifier for the group (like tuple(item.tr.path) or id(group)), and
returns the list produced by [self.extract_data_as_df(item.tr) for item in
group.items]; replace the direct list comprehensions in create_tables and
create_charts with calls to _get_group_dfs and clear/init self._dfs_cache
appropriately (e.g., in report constructor or before processing) so
extract_data_as_df is invoked only once per group.
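
A minimal sketch of the suggested cache, assuming the helper lives on the report class; the method name, cache attribute, and key choice follow the reviewer's suggestion and are not existing code:

import pandas as pd


class OSUBenchComparisonReport:  # sketch only; the real class defines much more
    def _get_group_dfs(self, group) -> list[pd.DataFrame]:
        # Read each group's CSVs once and reuse the result in create_tables and create_charts.
        cache = getattr(self, "_dfs_cache", None)
        if cache is None:
            cache = self._dfs_cache = {}
        key = id(group)  # a tuple of per-run output paths would be a more stable key
        if key not in cache:
            cache[key] = [self.extract_data_as_df(item.tr) for item in group.items]
        return cache[key]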

Comment on lines +64 to +85
def _parse_data_row(parts: list[str], benchmark_type: BenchmarkType) -> list[str] | None:
    if len(parts) < 2:
        return None

    try:
        int(parts[0])  # message size
    except ValueError:
        return None

    # Append row data based on benchmark type.
    if benchmark_type == BenchmarkType.MULTIPLE_BANDWIDTH:
        if len(parts) >= 3:
            # size, MB/s, Messages/s
            return [parts[0], parts[1], parts[2]]
        return None

    if benchmark_type == BenchmarkType.BANDWIDTH:
        # size, MB/s
        return [parts[0], parts[1]]

    # LATENCY
    return [parts[0], parts[1]]

🧹 Nitpick | 🔵 Trivial

_parse_data_row for BANDWIDTH and LATENCY are identical — both return [parts[0], parts[1]].

The two branches at lines 80-82 and 84-85 produce the same result. This is not a bug (the column naming differs in _columns_for_type), but you could collapse them into a single fallback for clarity.

♻️ Minor simplification
     if benchmark_type == BenchmarkType.MULTIPLE_BANDWIDTH:
         if len(parts) >= 3:
             return [parts[0], parts[1], parts[2]]
         return None
 
-    if benchmark_type == BenchmarkType.BANDWIDTH:
-        # size, MB/s
-        return [parts[0], parts[1]]
-
-    # LATENCY
+    # BANDWIDTH or LATENCY: size + one metric
     return [parts[0], parts[1]]
🤖 Prompt for AI Agents
In `@src/cloudai/workloads/osu_bench/report_generation_strategy.py` around lines
64 - 85, The two branches in _parse_data_row that handle BenchmarkType.BANDWIDTH
and the LATENCY fallback both return [parts[0], parts[1]]; simplify by removing
the explicit BANDWIDTH branch and collapsing them into a single fallback that
returns [parts[0], parts[1]] for any non-MULTIPLE_BANDWIDTH case (after
validating parts and the message-size int), keeping the MULTIPLE_BANDWIDTH
branch unchanged; this reduces duplication while preserving behavior tied to
_columns_for_type.

Comment on lines +63 to +112
def test_osu_multiple_bandwidth_message_rate_parsing(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text(OSU_MULTIPLE_BW)

    df = extract_osu_bench_data(stdout)
    assert list(df.columns) == ["size", "mb_sec", "messages_sec"]
    assert df.shape == (2, 3)
    assert df["size"].iloc[0] == 1
    assert df["mb_sec"].iloc[0] == pytest.approx(2.36)
    assert df["messages_sec"].iloc[0] == pytest.approx(2356892.44)
    assert df["size"].iloc[-1] == 4194304
    assert df["mb_sec"].iloc[-1] == pytest.approx(26903.29)
    assert df["messages_sec"].iloc[-1] == pytest.approx(6414.24)


def test_osu_latency_parsing(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text(OSU_ALLGATHER_LAT)

    df = extract_osu_bench_data(stdout)
    assert list(df.columns) == ["size", "avg_lat"]
    assert df.shape == (2, 2)
    assert df["size"].iloc[0] == 1
    assert df["avg_lat"].iloc[0] == pytest.approx(2.81)
    assert df["size"].iloc[-1] == 1048576
    assert df["avg_lat"].iloc[-1] == pytest.approx(104.30)


def test_osu_bandwidth_parsing(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text(OSU_BW)

    df = extract_osu_bench_data(stdout)
    assert list(df.columns) == ["size", "mb_sec"]
    assert df.shape == (2, 2)
    assert df["size"].iloc[0] == 1
    assert df["mb_sec"].iloc[0] == pytest.approx(2.32)
    assert df["size"].iloc[-1] == 4194304
    assert df["mb_sec"].iloc[-1] == pytest.approx(27161.33)


def test_osu_multi_latency_short_header_parsing(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text(OSU_MULTI_LAT)

    df = extract_osu_bench_data(stdout)
    assert list(df.columns) == ["size", "avg_lat"]
    assert df.shape == (6, 2)
    assert df["size"].tolist() == [1, 2, 4, 8, 16, 32]
    assert df["avg_lat"].tolist() == pytest.approx([1.88, 1.84, 1.88, 1.91, 1.87, 2.01])

🧹 Nitpick | 🔵 Trivial

Good coverage of the happy-path parsing for all three benchmark types.

Consider adding edge-case tests for extract_osu_bench_data: file not found (returns empty DataFrame), empty file, and stdout with no recognizable header (e.g., osu_hello output). These would guard the early-return paths and align with the special-case handling added in osu_bench.py.

Would you like me to generate those edge-case tests?

🤖 Prompt for AI Agents
In
`@tests/report_generation_strategy/test_osu_bench_report_generation_strategy.py`
around lines 63 - 112, Add three edge-case tests for extract_osu_bench_data to
cover its early-return branches: (1) "file not found" — call
extract_osu_bench_data with a non-existent Path and assert it returns an empty
DataFrame (no columns, shape (0,0) or however empty is represented in your
project), (2) "empty file" — create an empty stdout file and assert
extract_osu_bench_data returns an empty DataFrame, and (3) "unrecognized header"
— write a stdout file containing non-OSU output (e.g., "osu_hello" text) and
assert extract_osu_bench_data returns an empty DataFrame; reference the function
extract_osu_bench_data and the module osu_bench.py so tests exercise the
special-case handling added there.
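
A minimal sketch of the suggested edge-case tests; the import path is inferred from the file layout above, and the osu_hello text and empty-DataFrame expectation are assumptions based on the review comment:

from pathlib import Path

from cloudai.workloads.osu_bench.report_generation_strategy import extract_osu_bench_data


def test_osu_missing_file_returns_empty_df(tmp_path: Path) -> None:
    df = extract_osu_bench_data(tmp_path / "does_not_exist.txt")
    assert df.empty


def test_osu_empty_file_returns_empty_df(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text("")
    assert extract_osu_bench_data(stdout).empty


def test_osu_unrecognized_header_returns_empty_df(tmp_path: Path) -> None:
    stdout = tmp_path / "stdout.txt"
    stdout.write_text("# OSU MPI Hello World Test\n")  # no table header to detect
    assert extract_osu_bench_data(stdout).empty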

greptile-apps bot left a comment

9 files reviewed, no comments

greptile-apps bot left a comment

9 files reviewed, no comments
