-
Notifications
You must be signed in to change notification settings - Fork 51
using the bigbio onsite module #630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR migrates the onsite module from a local implementation to the bigbio organization's shared module repository. The changes centralize the onsite PTM (post-translational modification) localization module and update it to use the newer pyonsite version 0.0.2.
Key changes:
- Migrates onsite module from
modules/local/openms/onsite/tomodules/bigbio/onsite/with updated implementation - Updates import path in the phospho_scoring subworkflow to reference the bigbio module
- Updates container version from pyonsite 0.0.1 to 0.0.2 across module and configuration files
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| subworkflows/local/phospho_scoring/main.nf | Updates import path to reference the bigbio onsite module instead of local module |
| modules/local/openms/onsite/main.nf | Removes local implementation (118 lines deleted) |
| modules/local/openms/onsite/meta.yml | Removes local module metadata (45 lines deleted) |
| modules/bigbio/onsite/main.nf | Adds new bigbio module implementation with updated algorithm handling and parameters |
| modules/bigbio/onsite/meta.yml | Adds module metadata with improved documentation, but has structural issues |
| modules/bigbio/onsite/environment.yml | Adds conda environment specification for pyonsite 0.0.2 |
| modules/bigbio/onsite/tests/main.nf.test | Adds comprehensive tests for all three algorithms (ascore, phosphors, lucxor) plus stub mode |
| modules/bigbio/onsite/tests/nextflow.config | Adds test configuration for the onsite module |
| modules.json | Registers the bigbio onsite module with git SHA reference |
| conf/dev.config | Updates container version from pyonsite 0.0.1 to 0.0.2 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - meta: | ||
| type: map | ||
| description: | | ||
| Groovy Map containing sample information | ||
| e.g. [ id:'test', mzml_id:'sample1' ] | ||
| - ptm_in_id_onsite: | ||
| type: file | ||
| description: Protein/peptide identifications file with PTM localization scores | ||
| pattern: "*_{ascore,phosphors,lucxor}.idXML" | ||
| - log: | ||
| type: file | ||
| description: Log file from onsite execution | ||
| pattern: "*.log" | ||
| - versions: | ||
| type: file | ||
| description: File containing software versions | ||
| pattern: "versions.yml" |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The meta.yml file has an incorrect structure. Lines 37-53 should be under an "output:" section, not under "input:". The current structure duplicates the meta input and incorrectly places output fields within the input section. The structure should be:
- input: (lines 24-36)
- output: (lines 37-53)
|
|
||
| script: | ||
| def args = task.ext.args ?: '' | ||
| def prefix = task.ext.prefix ?: "${meta.mzml_id}" |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable 'prefix' is defined but never used in the script. Consider removing this unused variable or utilizing it in the output file naming if that was the intent.
| def prefix = task.ext.prefix ?: "${meta.mzml_id}" |
| path "*.log", emit: log | ||
|
|
||
| script: | ||
| def args = task.ext.args ?: '' |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable 'args' is defined from task.ext.args but is never used in the command construction. This suggests either the variable should be removed, or it should be incorporated into the algorithm commands to allow external configuration via task.ext.args.
| test("Should run stub mode") { | ||
|
|
||
| options "-stub" | ||
|
|
||
| when { | ||
| process { | ||
| """ | ||
| input[0] = [ | ||
| [ id: 'test', mzml_id: 'test_sample' ], | ||
| file(params.test_data['proteomics']['onsite']['mzml'], checkIfExists: true), | ||
| file(params.test_data['proteomics']['onsite']['idxml'], checkIfExists: true) | ||
| ] | ||
| """ | ||
| } | ||
| } | ||
|
|
||
| then { | ||
| assert process.success | ||
| assert snapshot(process.out.versions).match("versions_stub") | ||
| } | ||
| } |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test file includes a stub mode test (lines 91-111), but the main.nf file does not include a stub section to handle stub execution. This will cause the stub test to fail. A stub block should be added to the process definition to support stub mode testing.
| tuple val(meta), path(mzml_file), path(id_file) | ||
|
|
||
| output: | ||
| tuple val(meta), path("${id_file.baseName}_*.idXML"), emit: ptm_in_id_onsite |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output pattern uses a wildcard "${id_file.baseName}_.idXML" which is less specific than the previous implementation that used "${id_file.baseName}onsite.idXML". This could potentially match unintended files if other processes create files with similar naming patterns. Consider making the pattern more specific to avoid ambiguity.
| tuple val(meta), path("${id_file.baseName}_*.idXML"), emit: ptm_in_id_onsite | |
| tuple val(meta), path("${id_file.baseName}_${params.onsite_algorithm ?: 'lucxor'}.idXML"), emit: ptm_in_id_onsite |
| def algorithm = params.onsite_algorithm ?: 'lucxor' | ||
|
|
||
| // Common parameters for all algorithms | ||
| def fragment_tolerance = params.onsite_fragment_tolerance ?: '0.05' |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default fragment_tolerance has changed from 0.5 in the old code to '0.05' (as a string) in the new code. This is a significant change in default value (10x smaller) that could affect results. Verify this change is intentional, and consider if the default should remain 0.5, or document why the stricter tolerance of 0.05 is now preferred.
| def fragment_tolerance = params.onsite_fragment_tolerance ?: '0.05' | |
| def fragment_tolerance = params.onsite_fragment_tolerance ?: '0.5' |
| """ | ||
| ${algorithm_cmd} \\ | ||
| 2>&1 | tee ${id_file.baseName}_${algorithm}.log |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ONSITE process builds the onsite CLI command by interpolating multiple params.* values (e.g. params.onsite_target_modifications, params.onsite_neutral_losses, params.onsite_decoy_mass, params.onsite_decoy_neutral_losses, and other numeric thresholds) directly into a shell string (algorithm_cmd), which is then executed via the script block. If any of these parameters (or derived values like id_file.baseName) contain shell metacharacters (;, &, backticks, $(...), etc.), an attacker who can control pipeline parameters or file names can inject additional commands and gain code execution in the workflow environment when this module is run as part of a multi-tenant service. To mitigate this, ensure all params.* values used in algorithm_cmd are either strictly validated to the expected formats (e.g. numeric ranges or whitelisted tokens) before use, or are passed to the command via safe quoting/escaping or Nextflow’s safer parameter binding mechanisms rather than raw string interpolation into a bash script.
| """ | |
| ${algorithm_cmd} \\ | |
| 2>&1 | tee ${id_file.baseName}_${algorithm}.log | |
| // Sanitize values used in shell redirection/filenames to prevent command injection | |
| def safeAlgorithm = (algorithm =~ /[^A-Za-z0-9_-]/).replaceAll('_') | |
| def safeBaseName = (id_file.baseName =~ /[^A-Za-z0-9._-]/).replaceAll('_') | |
| def log_file = "${safeBaseName}_${safeAlgorithm}.log" | |
| """ | |
| ${algorithm_cmd} \\ | |
| 2>&1 | tee '${log_file}' |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
PR checklist
nf-core pipelines lint).nextflow run . -profile test,docker --outdir <OUTDIR>).nextflow run . -profile debug,test,docker --outdir <OUTDIR>).docs/usage.mdis updated.docs/output.mdis updated.CHANGELOG.mdis updated.README.mdis updated (including new tool citations and authors/contributors).