[ENH] Reduce complexity of run_flow_on_task func #1596
Omswastik-11 wants to merge 19 commits into openml:main
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1596      +/-   ##
==========================================
- Coverage   53.09%   51.95%   -1.15%
==========================================
  Files          37       37
  Lines        4362     4383      +21
==========================================
- Hits         2316     2277      -39
- Misses       2046     2106      +60
geetu040 left a comment
Looks really nice, I have left a few comments with only minor changes requested.
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
geetu040 left a comment
Nicely refactored, LGTM.
CC: @fkiraly, @SimonBlanke for review/merge.
SimonBlanke left a comment
@Omswastik-11 Do you see a way to increase the test coverage here? This is not a hard requirement.
geetu040 left a comment
Actually, there is no unit test for the function openml.runs.run_flow_on_task; could you please add one? There are some tests that use openml.runs.run_flow_on_task internally, but it would be nice to have an independent test that only checks this functionality. You can add this test in tests/test_runs.
Also, if the helper functions in openml.runs.run_flow_on_task can be unit-tested (suggested in #1596 (review)), that would be nice, but again, it's not a hard requirement.
openml/runs/functions.py (outdated)
        task, flow = flow, task

    if not isinstance(flow, OpenMLFlow):
        raise TypeError("Flow must be OpenMLFlow after validation")
please include error location, and correct reference to variable
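One way to read this request is sketched below: the message names the function that raised it, refers to the argument by its actual name (`flow`), and reports the type that was received. This is only an illustration, not the code merged in the PR; the `OpenMLFlow` class here is a stand-in so the snippet runs without openml installed, and `check_flow_type` is a hypothetical helper name.

```python
class OpenMLFlow:
    """Stand-in for openml.flows.OpenMLFlow (assumption, for illustration only)."""


def check_flow_type(flow: object) -> None:
    """Raise a TypeError that states where it came from and which argument is wrong."""
    if not isinstance(flow, OpenMLFlow):
        raise TypeError(
            # Error location first, then the real argument name, then what was received.
            "run_flow_on_task: argument `flow` must be an OpenMLFlow, "
            f"got {type(flow).__name__}."
        )


# Passing a plain string triggers the error; a real flow object passes silently.
try:
    check_flow_type("not a flow")
except TypeError as err:
    message = str(err)

check_flow_type(OpenMLFlow())
```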
openml/runs/functions.py (outdated)
    if isinstance(flow, OpenMLTask) and isinstance(task, OpenMLFlow):
        # We want to allow either order of argument (to avoid confusion).
        warnings.warn(
            "The old argument order (Flow, model) is deprecated and "
please capitalize variable names correctly. Include source of warning.
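A possible shape for the revised warning is sketched below, under assumptions: argument names are written in lowercase exactly as they appear in the signature, and the message starts with the name of the function that emits it. The exact wording and the order named in the message are guesses inferred from the quoted `isinstance` check (a task passed where the flow is expected), not the text that was merged. `warn_swapped_arguments` is a hypothetical helper.

```python
import warnings


def warn_swapped_arguments() -> None:
    """Emit a deprecation warning that names its source and uses the real argument names."""
    warnings.warn(
        # Assumed wording: the caller passed (task, flow) where (flow, task) is expected.
        "run_flow_on_task: the argument order (task, flow) is deprecated and "
        "will not be supported in the future. Please use the order (flow, task).",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )


# Capture the warning so the message and category can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_swapped_arguments()
```

Setting `stacklevel=2` is optional, but it makes the reported location the caller's line rather than the helper's, which also helps with the "source of warning" part of the request.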
    """
    # We only need to sync with the server right now if we want to upload the flow,
    # or ensure no duplicate runs exist. Otherwise it can be synced at upload time.
    flow_id = None
code smell, too many indentations. Please simplify this further
openml/runs/functions.py (outdated)
    # We only need to sync with the server right now if we want to upload the flow,
    # or ensure no duplicate runs exist. Otherwise it can be synced at upload time.
    flow_id = None
    if upload_flow or avoid_duplicate_runs:
for example, this could be if not upload_flow and not avoid_duplicate_runs: return flow_id, then one indentation less below.
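The guard-clause suggestion can be sketched as follows. Both functions are simplified, hypothetical stand-ins for the helper under review: the real server synchronization is replaced by a `fetch_flow_id` callback so the snippet runs on its own. The point is only that inverting the condition and returning early removes one level of nesting without changing behavior.

```python
from typing import Callable, Optional


def sync_flow_nested(
    upload_flow: bool,
    avoid_duplicate_runs: bool,
    fetch_flow_id: Callable[[], int],
) -> Optional[int]:
    """Original shape: the real work sits inside an `if` block."""
    flow_id = None
    if upload_flow or avoid_duplicate_runs:
        flow_id = fetch_flow_id()
    return flow_id


def sync_flow_guarded(
    upload_flow: bool,
    avoid_duplicate_runs: bool,
    fetch_flow_id: Callable[[], int],
) -> Optional[int]:
    """Suggested shape: bail out early, then do the work at the top level."""
    if not upload_flow and not avoid_duplicate_runs:
        return None
    return fetch_flow_id()


# Compare both versions over every combination of the two flags.
results = [
    (sync_flow_nested(u, a, lambda: 42), sync_flow_guarded(u, a, lambda: 42))
    for u in (False, True)
    for a in (False, True)
]
```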
        task=task,
        flow=flow,
        flow_id=flow_id,
        dataset_id=dataset.dataset_id,
why is it fine to delete these args?
fkiraly left a comment
Nice! I left some recommendations on how to further simplify the code flow.

Summary
This PR refactors run_flow_on_task, which had grown to ~160 lines with high cyclomatic complexity, by extracting small helper functions with clear, single responsibilities. The main function is now a readable orchestrator with clearly defined steps.

Changes

Extracted helper functions
- _validate_flow_and_task_inputs: handles input validation and backward-compatible argument handling
- _sync_flow_with_server: synchronizes the flow with the server and checks for duplicate runs
- _prepare_run_environment: prepares environment information and run tags
- _create_run_from_results: builds the OpenMLRun object from execution results

Internal structure improvements
- Adds a _RunResults NamedTuple to bundle execution outputs (data_content, trace, evaluations) and reduce long parameter lists

Type safety improvements
- Replaces assert statements with explicit ValueError/TypeError exceptions

Fixes #1580
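To illustrate the NamedTuple idea from the summary, here is a minimal sketch. Only the three field names (data_content, trace, evaluations) come from the PR description; the field types, the class name `RunResults`, and the `summarize` consumer are assumptions made so the example is self-contained, and the PR's actual definition may differ.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class RunResults(NamedTuple):
    """Bundle of execution outputs; field types are assumed for illustration."""

    data_content: List[List[Any]]
    trace: Optional[Any]
    evaluations: Dict[str, float]


def summarize(results: RunResults) -> str:
    """Hypothetical consumer: takes one bundled argument instead of three."""
    n_rows = len(results.data_content)
    has_trace = results.trace is not None
    n_measures = len(results.evaluations)
    return f"{n_rows} prediction rows, trace={has_trace}, {n_measures} measures"


results = RunResults(
    data_content=[[0, 0, 1]],
    trace=None,
    evaluations={"accuracy": 1.0},
)
summary = summarize(results)
```

Because a NamedTuple is still a tuple, existing code that unpacks the three values positionally keeps working, while new code can use the named fields.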