Skip to content

Conversation

@daichengxin
Copy link
Collaborator

@daichengxin daichengxin commented Dec 30, 2025

User description

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the bigbio/quantms branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

PR Type

Enhancement, Bug fix


Description

  • Refactored rescoring workflow to move feature extraction into peptide database search subworkflow

  • Updated container image versions from 0.0.13 to 0.0.14 for quantms-rescoring modules

  • Enhanced data flow by propagating search engine type through rescoring pipeline

  • Simplified PSM rescoring subworkflow by removing duplicated feature extraction logic

  • Fixed fine-tuning model directory handling with improved parameter validation


Diagram Walkthrough

flowchart LR
  A["PEPTIDE_DATABASE_SEARCH"] -->|"ch_id_files_idx"| B["Feature Extraction<br/>MSRESCORE_FEATURES<br/>SPECTRUM_FEATURES"]
  B -->|"ch_id_files_feats"| C["PSM_RESCORING<br/>PERCOLATOR"]
  C -->|"ch_consensus_input"| D["CONSENSUSID"]
  E["Container Update<br/>0.0.13 → 0.0.14"] -.->|"applies to"| B
Loading

File Walkthrough

Relevant files
Enhancement
main.nf
Update container version and add search engine tracking   

modules/local/utils/msrescore_features/main.nf

  • Updated container image version from 0.0.13 to 0.0.14
  • Added search_engine parameter to input tuple for tracking search
    engine type
  • Propagated search_engine through output tuple
  • Enhanced model directory parameter validation with improved
    conditional logic
+10/-8   
main.nf
Update container and improve fine-tuning parameter handling

modules/local/utils/msrescore_fine_tuning/main.nf

  • Updated container image version from 0.0.13 to 0.0.14
  • Added groupkey parameter to input for better grouping during
    fine-tuning
  • Changed output to include groupkey for proper result tracking
  • Fixed model directory handling with explicit parameter validation
  • Updated prefix generation to use groupkey instead of meta.mzml_id
+16/-10 
main.nf
Add search engine type propagation                                             

modules/local/utils/spectrum_features/main.nf

  • Added search_engine parameter to input tuple
  • Propagated search_engine through output tuple for downstream tracking
+2/-2     
main.nf
Pass experiment design to database search                               

subworkflows/local/id/main.nf

  • Updated PEPTIDE_DATABASE_SEARCH call to pass ch_expdesign parameter
+2/-1     
Refactoring
main.nf
Remove rescoring logic, move to database search                   

subworkflows/local/dda_id/main.nf

  • Removed module includes for rescoring-related processes
    (MSRESCORE_FEATURES, MSRESCORE_FINE_TUNING, SPECTRUM_FEATURES,
    PSM_CLEAN, GET_SAMPLE)
  • Removed ID_MERGER include as it moved to peptide_database_search
    subworkflow
  • Simplified rescoring logic by removing feature extraction and
    fine-tuning code
  • Updated PEPTIDE_DATABASE_SEARCH call to pass ch_expdesign parameter
  • Consolidated rescoring to only handle percolator and consensus ID
    steps
+5/-113 
main.nf
Integrate rescoring into database search workflow               

subworkflows/local/peptide_database_search/main.nf

  • Added includes for rescoring modules (PSM_CLEAN,
    MSRESCORE_FINE_TUNING, MSRESCORE_FEATURES, GET_SAMPLE,
    SPECTRUM_FEATURES, ID_MERGER)
  • Added ch_expdesign parameter to workflow input
  • Implemented complete feature extraction pipeline within database
    search workflow
  • Added fine-tuning support with per-search-engine model training
  • Implemented SNR feature extraction with conditional logic
  • Added sample-based and project-based ID merging logic
  • Added get_sample_map helper function for sample mapping
+182/-2 
main.nf
Simplify rescoring to percolator only                                       

subworkflows/local/psm_rescoring/main.nf

  • Removed all feature extraction logic (MSRESCORE_FEATURES,
    SPECTRUM_FEATURES, PSM_CLEAN, MSRESCORE_FINE_TUNING, GET_SAMPLE,
    ID_MERGER)
  • Simplified workflow to only handle percolator and ID ripper steps
  • Changed input parameter from ch_id_files to ch_id_files_feats
    (pre-processed features)
  • Removed get_sample_map function (moved to peptide_database_search
    subworkflow)
  • Removed complex branching and merging logic for search engines
+5/-103 

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 30, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@daichengxin daichengxin changed the title fixed quantms-rescoring fixed quantms-rescoring (DON'T MERGE) Dec 30, 2025
@ypriverol ypriverol marked this pull request as draft December 30, 2025 16:40
@jpfeuffer
Copy link
Collaborator

I think it still fails because of permissions. Can't you set the env var for pep deep and matplotlib inside of the task? Just to PWD or a subfolder of it. You know you will always have access there

1 similar comment
@jpfeuffer
Copy link
Collaborator

I think it still fails because of permissions. Can't you set the env var for pep deep and matplotlib inside of the task? Just to PWD or a subfolder of it. You know you will always have access there

@daichengxin daichengxin marked this pull request as ready for review January 4, 2026 11:17
@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Misleading variable usage: The new code uses file_name as both the destination and the input (file(file_name)) in
add_file_prefix, making the identifier misleading and likely incorrect.

Referred Code
def add_file_prefix(file_path) {
    position = file(file_path).name.lastIndexOf('_sage_perc.idXML')
    if (position == -1) {
        position = file(file_path).name.lastIndexOf('_comet_perc.idXML')
        if (position == -1) {
            position = file(file_path).name.lastIndexOf('_msgf_perc.idXML')
        }
    }
    file_name = file(file_name).name.take(position)
    return [file_name, file_path]

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Unhandled edge cases: add_file_prefix can compute position == -1 and then uses take(position) plus an incorrect
file(file_name) reference, which can trigger runtime failures without a clear, handled
error path.

Referred Code
def add_file_prefix(file_path) {
    position = file(file_path).name.lastIndexOf('_sage_perc.idXML')
    if (position == -1) {
        position = file(file_path).name.lastIndexOf('_comet_perc.idXML')
        if (position == -1) {
            position = file(file_path).name.lastIndexOf('_msgf_perc.idXML')
        }
    }
    file_name = file(file_name).name.take(position)
    return [file_name, file_path]

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Missing input validation: get_sample_map assumes row.Spectra_Filepath contains a . and required fields exist, but
does not validate/guard against null/empty values or missing columns before string/path
operations.

Referred Code
def get_sample_map(LinkedHashMap row) {
    def sample_map = [:]

    filestr               = row.Spectra_Filepath
    file_name             = file(filestr).name.take(file(filestr).name.lastIndexOf('.'))
    sample                = row.Sample

    return [file_name, sample]

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Use correct model directory variable

Correct the --ms2_model_dir argument to use the params.ms2features_model_dir
variable instead of model_weight.

modules/local/utils/msrescore_features/main.nf [30-34]

 } else if (params.ms2features_model_dir && params.ms2features_model_dir != true){
-    ms2_model_dir = "--ms2_model_dir ${model_weight}"
+    ms2_model_dir = "--ms2_model_dir ${params.ms2features_model_dir}"
 } else {
     ms2_model_dir = "--ms2_model_dir ./"
 }
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: This suggestion identifies a critical bug where the wrong variable is used, causing the program to point to a file instead of a directory, which would lead to a runtime error.

High
Fix meta map lookup

Correct the access to the meta map by removing the incorrect [0] indexer.

modules/local/utils/msrescore_fine_tuning/main.nf [29-30]

-ms2_tolerance = meta[0]['fragmentmasstolerance']
-ms2_tolerance_unit = meta[0]['fragmentmasstoleranceunit']
+ms2_tolerance = meta['fragmentmasstolerance']
+ms2_tolerance_unit = meta['fragmentmasstoleranceunit']
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a bug where meta is incorrectly indexed as a list, which would cause the process to fail at runtime.

High
Use absolute path for default models

Use an absolute path with ${baseDir} for the default model directory to improve
workflow portability and reliability.

subworkflows/local/peptide_database_search/main.nf [83-87]

 if (params.ms2features_model_dir && params.ms2features_model_dir != true) {
     ms2_model_dir = Channel.from(file(params.ms2features_model_dir, checkIfExists: true))
 } else {
-    ms2_model_dir = Channel.from(file("pretrained_models"))
+    ms2_model_dir = Channel.from(file("${baseDir}/assets/pretrained_models", checkIfExists: true))
 }
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential portability issue with a relative path and proposes using the Nextflow built-in baseDir variable for a more robust solution.

Medium
Correct model_dir assignment

Refactor the ms2_model_dir assignment to use a ternary operator, removing the
redundant self-assignment and clarifying the logic.

modules/local/utils/msrescore_fine_tuning/main.nf [32-36]

-if (params.ms2features_model_dir && params.ms2features_model_dir != true) {
-    ms2_model_dir = ms2_model_dir
-} else {
-    ms2_model_dir = "./"
-}
+def ms2_model_dir_var = (params.ms2features_model_dir && params.ms2features_model_dir != true) ? params.ms2features_model_dir : './'
+ms2_model_dir = ms2_model_dir_var
  • Apply / Chat
Suggestion importance[1-10]: 5

__

Why: This suggestion correctly identifies a no-op assignment and proposes a more concise and correct implementation using a ternary operator, improving code clarity and correctness.

Low
High-level
Re-evaluate the subworkflow refactoring

The feature extraction and pre-processing logic should be moved from the
peptide_database_search subworkflow into a new, separate subworkflow. This would
improve modularity and adhere to the single-responsibility principle.

Examples:

subworkflows/local/peptide_database_search/main.nf [77-235]
    if (params.skip_rescoring != true) {

        if (params.ms2features_enable == true){
            // Only add ms2_model_dir if it's actually set and not empty
            // Handle cases where parameter might be empty string, null, boolean true, or whitespace
            // When --ms2features_model_dir is passed with no value, Nextflow may set it to boolean true
            if (params.ms2features_model_dir && params.ms2features_model_dir != true) {
                ms2_model_dir = Channel.from(file(params.ms2features_model_dir, checkIfExists: true))
            } else {
                ms2_model_dir = Channel.from(file("pretrained_models"))

 ... (clipped 149 lines)
subworkflows/local/psm_rescoring/main.nf [10-69]
workflow PSM_RESCORING {
    take:
    ch_file_preparation_results
    ch_id_files_feats
    ch_expdesign

    main:
    ch_software_versions = Channel.empty()
    ch_results  = Channel.empty()
    ch_fdridpep = Channel.empty()

 ... (clipped 50 lines)

Solution Walkthrough:

Before:

// subworkflows/local/peptide_database_search/main.nf
workflow PEPTIDE_DATABASE_SEARCH {
    take:
        ch_mzmls_search
        ch_searchengine_in_db
        ch_expdesign

    main:
        // ... database search logic (SAGE, MSGF, COMET)

        if (params.skip_rescoring != true) {
            // ... extensive feature extraction, fine-tuning,
            // and ID merging logic is now here ...
            MSRESCORE_FINE_TUNING(...)
            MSRESCORE_FEATURES(...)
            SPECTRUM_FEATURES(...)
            ID_MERGER(...)
        }
    emit:
        ch_id_files_idx // files with features
        ...
}

After:

// subworkflows/local/peptide_database_search/main.nf
workflow PEPTIDE_DATABASE_SEARCH {
    // ... database search logic (SAGE, MSGF, COMET)
    emit:
        ch_id_files_idx // raw search results
        ...
}

// subworkflows/local/feature_extraction/main.nf (New subworkflow)
workflow FEATURE_EXTRACTION {
    take:
        ch_id_files
        ...
    main:
        // ... extensive feature extraction, fine-tuning,
        // and ID merging logic moved here ...
        MSRESCORE_FINE_TUNING(...)
        MSRESCORE_FEATURES(...)
        SPECTRUM_FEATURES(...)
        ID_MERGER(...)
    emit:
        ch_id_files_feats // files with features
        ...
}
Suggestion importance[1-10]: 8

__

Why: This is a valid and impactful architectural suggestion that correctly identifies a major design issue in the PR, where the peptide_database_search subworkflow has been overloaded with responsibilities beyond its name, harming modularity and maintainability.

Medium
General
Refactor repeated code into a closure

Extract the repetitive logic for preparing training datasets into a reusable
closure to reduce code duplication.

subworkflows/local/peptide_database_search/main.nf [95-117]

-sage_train_datasets = ch_id_sage
-    .combine(ch_mzmls_search, by: 0)
-    .toSortedList()
-    .flatMap()
-    .randomSample(params.fine_tuning_sample_run, 2025)
-    .combine(Channel.value("sage"))
-    .groupTuple(by: 3)
+def prepare_train_data = { id_channel, engine_name ->
+    return id_channel
+        .combine(ch_mzmls_search, by: 0)
+        .toSortedList()
+        .flatMap()
+        .randomSample(params.fine_tuning_sample_run, 2025)
+        .combine(Channel.value(engine_name))
+        .groupTuple(by: 3)
+}
 
-msgf_train_datasets = ch_id_msgf
-    .combine(ch_mzmls_search, by: 0)
-    .toSortedList()
-    .flatMap()
-    .randomSample(params.fine_tuning_sample_run, 2025)
-    .combine(Channel.value("msgf"))
-    .groupTuple(by: 3)
+sage_train_datasets = prepare_train_data(ch_id_sage, "sage")
+msgf_train_datasets = prepare_train_data(ch_id_msgf, "msgf")
+comet_train_datasets = prepare_train_data(ch_id_comet, "comet")
 
-comet_train_datasets = ch_id_comet
-    .combine(ch_mzmls_search, by: 0)
-    .toSortedList()
-    .flatMap()
-    .randomSample(params.fine_tuning_sample_run, 2025)
-    .combine(Channel.value("comet"))
-    .groupTuple(by: 3)
-
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies significant code duplication and proposes a good refactoring using a closure, which improves maintainability and readability.

Low
  • More

@daichengxin daichengxin changed the title fixed quantms-rescoring (DON'T MERGE) fixed quantms-rescoring Jan 4, 2026
@ypriverol ypriverol self-requested a review January 4, 2026 14:12
@ypriverol ypriverol merged commit 4659730 into bigbio:dev Jan 4, 2026
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants