using the bigbio onsite module #630

weizhongchun · 2025-12-30T08:31:55Z

PR checklist

coderabbitai · 2025-12-30T08:32:01Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.


Copilot

Pull request overview

This PR migrates the onsite module from a local implementation to the bigbio organization's shared module repository. The changes centralize the onsite PTM (post-translational modification) localization module and update it to use the newer pyonsite version 0.0.2.

Key changes:

- Migrates onsite module from modules/local/openms/onsite/ to modules/bigbio/onsite/ with updated implementation
- Updates import path in the phospho_scoring subworkflow to reference the bigbio module
- Updates container version from pyonsite 0.0.1 to 0.0.2 across module and configuration files

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.


File	Description
subworkflows/local/phospho_scoring/main.nf	Updates import path to reference the bigbio onsite module instead of local module
modules/local/openms/onsite/main.nf	Removes local implementation (118 lines deleted)
modules/local/openms/onsite/meta.yml	Removes local module metadata (45 lines deleted)
modules/bigbio/onsite/main.nf	Adds new bigbio module implementation with updated algorithm handling and parameters
modules/bigbio/onsite/meta.yml	Adds module metadata with improved documentation, but has structural issues
modules/bigbio/onsite/environment.yml	Adds conda environment specification for pyonsite 0.0.2
modules/bigbio/onsite/tests/main.nf.test	Adds comprehensive tests for all three algorithms (ascore, phosphors, lucxor) plus stub mode
modules/bigbio/onsite/tests/nextflow.config	Adds test configuration for the onsite module
modules.json	Registers the bigbio onsite module with git SHA reference
conf/dev.config	Updates container version from pyonsite 0.0.1 to 0.0.2


Copilot · 2026-01-04T14:44:45Z

modules/bigbio/onsite/meta.yml

+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test', mzml_id:'sample1' ]
+  - ptm_in_id_onsite:
+      type: file
+      description: Protein/peptide identifications file with PTM localization scores
+      pattern: "*_{ascore,phosphors,lucxor}.idXML"
+  - log:
+      type: file
+      description: Log file from onsite execution
+      pattern: "*.log"
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"


The meta.yml file has an incorrect structure. Lines 37-53 should be under an "output:" section, not under "input:". The current structure duplicates the meta input and incorrectly places output fields within the input section. The structure should be:

input: (lines 24-36)

output: (lines 37-53)

Copilot · 2026-01-04T14:44:46Z

modules/bigbio/onsite/main.nf

+
+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.mzml_id}"


The variable 'prefix' is defined but never used in the script. Consider removing this unused variable or utilizing it in the output file naming if that was the intent.

Suggested change

-    def prefix = task.ext.prefix ?: "${meta.mzml_id}"
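If the variable is kept rather than removed, one low-risk use is naming the log file, since the module's `*.log` output glob already accepts any name. A sketch, assuming `algorithm_cmd` is assembled exactly as in the PR (omitted here); note the log would then be named after `meta.mzml_id` instead of `id_file.baseName`, and `task.ext.prefix` could override it from config:

```groovy
    script:
    // prefix now drives the log name, so task.ext.prefix can rename it from config
    def prefix    = task.ext.prefix ?: "${meta.mzml_id}"
    def algorithm = params.onsite_algorithm ?: 'lucxor'
    // ... algorithm_cmd is assembled as in the PR (omitted here) ...
    """
    ${algorithm_cmd} \\
        2>&1 | tee ${prefix}_${algorithm}.log
    """
```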

Copilot · 2026-01-04T14:44:46Z

modules/bigbio/onsite/main.nf

+    path "*.log", emit: log
+
+    script:
+    def args = task.ext.args ?: ''


The variable 'args' is defined from task.ext.args but is never used in the command construction. This suggests either the variable should be removed, or it should be incorporated into the algorithm commands to allow external configuration via task.ext.args.
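A sketch of the second option, assuming onsite accepts extra options appended after the algorithm's own arguments (this has not been checked against the pyonsite CLI). `algorithm_cmd` is built as in the PR and omitted here, and the config value below is a placeholder rather than a real onsite flag:

```groovy
    script:
    def args      = task.ext.args ?: ''
    def algorithm = params.onsite_algorithm ?: 'lucxor'
    // ... algorithm_cmd is assembled as in the PR (omitted here) ...
    // args is appended so callers can add options without editing the module
    """
    ${algorithm_cmd} \\
        ${args} \\
        2>&1 | tee ${id_file.baseName}_${algorithm}.log
    """
```

```groovy
// Hypothetical config entry: pass extra options to ONSITE without editing the module
process {
    withName: 'ONSITE' {
        ext.args = '<extra onsite options>'  // placeholder, not a real onsite flag
    }
}
```

When `ext.args` is unset, the `${args}` line expands to whitespace plus a line continuation, which bash ignores.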

modules/bigbio/onsite/main.nf

Copilot · 2026-01-04T14:44:46Z

modules/bigbio/onsite/tests/main.nf.test

+    test("Should run stub mode") {
+
+        options "-stub"
+
+        when {
+            process {
+                """
+                input[0] = [
+                    [ id: 'test', mzml_id: 'test_sample' ],
+                    file(params.test_data['proteomics']['onsite']['mzml'], checkIfExists: true),
+                    file(params.test_data['proteomics']['onsite']['idxml'], checkIfExists: true)
+                ]
+                """
+            }
+        }
+
+        then {
+            assert process.success
+            assert snapshot(process.out.versions).match("versions_stub")
+        }
+    }


The test file includes a stub mode test (lines 91-111), but the main.nf file does not include a stub section to handle stub execution. This will cause the stub test to fail. A stub block should be added to the process definition to support stub mode testing.
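A minimal stub block could look like the sketch below. It assumes the outputs are named as in the PR's `output:` block (`${id_file.baseName}_*.idXML`, `*.log`, `versions.yml`); the `pyonsite` key and the hard-coded 0.0.2 are assumptions taken from the container version and should mirror whatever the script block writes, so that the `versions_stub` snapshot stays stable:

```groovy
    stub:
    def algorithm = params.onsite_algorithm ?: 'lucxor'
    """
    # Create empty placeholder outputs matching the declared output patterns
    touch ${id_file.baseName}_${algorithm}.idXML
    touch ${id_file.baseName}_${algorithm}.log

    # Assumed version key and value; keep in sync with the script block
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        pyonsite: 0.0.2
    END_VERSIONS
    """
```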

Copilot · 2026-01-04T14:44:47Z

modules/bigbio/onsite/main.nf

+    tuple val(meta), path(mzml_file), path(id_file)
+
+    output:
+    tuple val(meta), path("${id_file.baseName}_*.idXML"), emit: ptm_in_id_onsite


The output pattern uses a wildcard, "${id_file.baseName}_*.idXML", which is less specific than the previous implementation's "${id_file.baseName}onsite.idXML". It could match unintended files if the task writes other idXML files with a similar name. Consider making the pattern more specific to avoid ambiguity.

Suggested change

-    tuple val(meta), path("${id_file.baseName}_*.idXML"), emit: ptm_in_id_onsite
+    tuple val(meta), path("${id_file.baseName}_${params.onsite_algorithm ?: 'lucxor'}.idXML"), emit: ptm_in_id_onsite
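To illustrate the difference, here is a small plain-Groovy sketch that uses `java.nio.file.PathMatcher` to compare the PR's glob with the algorithm-specific name. The base name and candidate files are hypothetical, and Nextflow's own glob handling only approximates `PathMatcher`. Nextflow also excludes staged input files from output globs by default, so the realistic risk is extra idXML files written by the tool itself inside the task directory:

```groovy
import java.nio.file.FileSystems
import java.nio.file.PathMatcher
import java.nio.file.Paths

// Hypothetical base name of the staged id_file
def baseName = 'sample1_consensus'

// Glob from the PR versus the fixed, algorithm-specific name from the suggestion
PathMatcher loose  = FileSystems.getDefault().getPathMatcher("glob:${baseName}_*.idXML".toString())
PathMatcher strict = FileSystems.getDefault().getPathMatcher("glob:${baseName}_lucxor.idXML".toString())

// Illustrative files that could sit in the same task directory
def candidates = [
    'sample1_consensus_lucxor.idXML',
    'sample1_consensus_lucxor_debug.idXML',
    'sample1_consensus.idXML',
]

def looseHits  = candidates.findAll { loose.matches(Paths.get(it)) }
def strictHits = candidates.findAll { strict.matches(Paths.get(it)) }

println "loose:  ${looseHits}"
println "strict: ${strictHits}"
```

The loose glob also picks up the stray `_debug` file; the fixed name captures only the intended output.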

Copilot · 2026-01-04T14:44:47Z

modules/bigbio/onsite/main.nf

+    def algorithm = params.onsite_algorithm ?: 'lucxor'
+
+    // Common parameters for all algorithms
+    def fragment_tolerance = params.onsite_fragment_tolerance ?: '0.05'


The default fragment_tolerance has changed from 0.5 in the old code to '0.05' (as a string) in the new code. This is a significant change in default value (10x smaller) that could affect results. Verify this change is intentional, and consider if the default should remain 0.5, or document why the stricter tolerance of 0.05 is now preferred.

Suggested change

-    def fragment_tolerance = params.onsite_fragment_tolerance ?: '0.05'
+    def fragment_tolerance = params.onsite_fragment_tolerance ?: '0.5'
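Whether 0.5 or 0.05 is the right default is a scientific decision the code cannot settle. Separately, the `?:` (Elvis) form has a Groovy quirk worth knowing: `0` and `''` are falsy, so an explicit zero would be silently replaced by the default. A sketch of a typed, explicit-null default, with a hypothetical helper name:

```groovy
// Hypothetical helper: resolve a numeric parameter with an explicit null check.
// Unlike `value ?: fallback`, it only falls back when the parameter is unset (null).
BigDecimal resolveTolerance(Object value, BigDecimal fallback) {
    value == null ? fallback : new BigDecimal(value.toString())
}

// Groovy truthiness: 0 is falsy, so the Elvis form replaces it with the default
println(0 ?: 0.5)
// The explicit null check keeps 0, falls back only when unset,
// and turns a string value from the command line into a number
println(resolveTolerance(0, 0.5))
println(resolveTolerance(null, 0.5))
println(resolveTolerance('0.05', 0.5))
```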

Copilot · 2026-01-04T14:44:47Z

modules/bigbio/onsite/main.nf

+    """
+    ${algorithm_cmd} \\
+        2>&1 | tee ${id_file.baseName}_${algorithm}.log


The ONSITE process builds the onsite CLI command by interpolating multiple params.* values (e.g. params.onsite_target_modifications, params.onsite_neutral_losses, params.onsite_decoy_mass, params.onsite_decoy_neutral_losses, and other numeric thresholds) directly into a shell string (algorithm_cmd), which is then executed via the script block. If any of these parameters (or derived values like id_file.baseName) contain shell metacharacters (;, &, backticks, $(...), etc.), an attacker who can control pipeline parameters or file names can inject additional commands and gain code execution in the workflow environment when this module is run as part of a multi-tenant service. To mitigate this, ensure all params.* values used in algorithm_cmd are either strictly validated to the expected formats (e.g. numeric ranges or whitelisted tokens) before use, or are passed to the command via safe quoting/escaping or Nextflow’s safer parameter binding mechanisms rather than raw string interpolation into a bash script.

Suggested change

-    """
-    ${algorithm_cmd} \\
-        2>&1 | tee ${id_file.baseName}_${algorithm}.log
+    // Sanitize values used in shell redirection/filenames to prevent command injection
+    def safeAlgorithm = (algorithm =~ /[^A-Za-z0-9_-]/).replaceAll('_')
+    def safeBaseName  = (id_file.baseName =~ /[^A-Za-z0-9._-]/).replaceAll('_')
+    def log_file      = "${safeBaseName}_${safeAlgorithm}.log"
+    """
+    ${algorithm_cmd} \\
+        2>&1 | tee '${log_file}'

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
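The suggested change sanitizes only the log-file name; the `params.*` values interpolated into `algorithm_cmd` still reach the shell unchanged. Below is a plain-Groovy sketch of the reviewer's "validate or quote" advice. The helper names are hypothetical, the whitelist assumes the three algorithms covered by the tests are the only valid ones, and the numeric pattern should be adjusted to the range each parameter (for example a decoy mass) actually allows:

```groovy
// Algorithms exercised by the PR's nf-test cases (assumed to be the full set)
def allowedAlgorithms = ['ascore', 'phosphors', 'lucxor'] as Set

// Accept only an exact whitelisted algorithm name
def checkAlgorithm = { String algo ->
    if (!(algo in allowedAlgorithms)) {
        throw new IllegalArgumentException("Unsupported onsite algorithm: ${algo}")
    }
    algo
}

// Accept only plain non-negative decimals such as 0.05 or 0.5
def checkNumber = { String name, Object value ->
    String s = value.toString()
    if (!(s ==~ /\d+(\.\d+)?/)) {
        throw new IllegalArgumentException("Parameter ${name} must be a plain number, got: ${s}")
    }
    s
}

// Wrap a free-text value (e.g. a modification list) in single quotes for bash,
// escaping any embedded single quote as '\''
def shellQuote = { Object value ->
    "'" + value.toString().replace("'", "'\\''") + "'"
}

println checkAlgorithm('ascore')
println checkNumber('onsite_fragment_tolerance', 0.05)
println shellQuote("Phospho (S)")
```

Inside the module, the validated or quoted values would replace the raw `params.*` references when `algorithm_cmd` is assembled.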

using the bigbio onsite module

cc3f895

weizhongchun and others added 4 commits December 30, 2025 16:44

some changes

928a835

pre-commit

d262ece

some changes

e69769c

Merge branch 'dev' into dev

cd6cd05

ypriverol requested review from Copilot and daichengxin January 4, 2026 14:19

Copilot started reviewing on behalf of ypriverol January 4, 2026 14:34

Copilot AI reviewed Jan 4, 2026


ypriverol and others added 2 commits January 4, 2026 17:50

Update modules/bigbio/onsite/main.nf

a379a92

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Remove 'openms' label from ONSITE process

0f9b061


2 participants
