fix(maxquant): Filter out decoys with decoy column #133
📝 Walkthrough
Changes: Contaminant Filtering Enhancement
Estimated Code Review Effort: 🎯 2 (Simple) | ⏱️ ~5 minutes
🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@R/clean_MaxQuant.R`:
- Around line 20-27: The current filter_cols in clean_MaxQuant.R uses incorrect
MaxQuant header names (e.g., "Potentialcontaminant" and "Decoy") so filtering
silently skips expected columns; change filter_cols to use the literal MaxQuant
column names present in our inputs (e.g., "Contaminant",
"Potential.contaminant", "Reverse") and remove "Decoy"; if remove_by_site is
true append "Only.identified.by.site" to filter_cols and update the msg text to
match these literal names so the log reflects the actual columns being filtered;
locate the filter_cols and msg variables in clean_MaxQuant.R to make this edit.
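The filtering this comment describes can be sketched in base R as follows. This is a minimal illustration, not the package's actual `.cleanRawMaxQuant()` code: the helper name `filter_flagged_rows` and the example data are hypothetical, the dot-style column names are the ones the comment proposes, and the sketch assumes flagged rows carry `"+"` as in MaxQuant output. It also reports any filter column that is absent, so a mismatched name does not pass silently.

```r
# Hypothetical helper for illustration; not part of MSstatsConvert.
# Assumption: flagged rows are marked with "+" in each filter column.
filter_flagged_rows <- function(input, remove_by_site = FALSE) {
  filter_cols <- c("Contaminant", "Potential.contaminant", "Reverse")
  if (remove_by_site) {
    filter_cols <- c(filter_cols, "Only.identified.by.site")
  }

  # Filter only on columns that exist, and report the ones that were skipped
  present <- intersect(filter_cols, colnames(input))
  skipped <- setdiff(filter_cols, present)
  if (length(skipped) > 0) {
    message("Columns not found, skipped: ", paste(skipped, collapse = ", "))
  }

  # Keep a row only if none of the present filter columns flags it
  keep <- rep(TRUE, nrow(input))
  for (col in present) {
    flagged <- !is.na(input[[col]]) & input[[col]] == "+"
    keep <- keep & !flagged
  }
  input[keep, , drop = FALSE]
}

# Made-up example data: P2 is a potential contaminant, P3 a reverse hit,
# P4 is identified by site only.
example <- data.frame(
  Protein = c("P1", "P2", "P3", "P4"),
  Potential.contaminant = c("", "+", "", ""),
  Reverse = c("", "", "+", ""),
  Only.identified.by.site = c("", "", "", "+"),
  stringsAsFactors = FALSE
)

filtered <- filter_flagged_rows(example, remove_by_site = TRUE)
```

Because the example data has no `Contaminant` column, the call emits a message naming it instead of skipping it quietly. Whether the real function should match dot-style names or names without dots depends on how the package normalizes column names before filtering, which this sketch does not model.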
Motivation and Context
https://groups.google.com/g/msstats/c/NwsByfS2Y5M
MaxQuant proteomics software has introduced the "Potential.contaminant" column in its output format as an additional means of identifying potentially problematic proteins. The `.cleanRawMaxQuant()` function previously filtered proteins based only on the `Contaminant`, `Reverse`, and `Decoy` columns. This PR updates the function to also filter out proteins marked in the new `Potential.contaminant` column, so that the MSstatsConvert package handles recent changes to MaxQuant's output format and keeps potentially problematic proteins out of downstream analysis.
Changes
`R/clean_MaxQuant.R`:
- Added `"Potentialcontaminant"` to the `filter_cols` vector (line 20) to filter rows where this column contains marked values
- Updated the message for the `remove_by_site = TRUE` case (lines 25-26) to also mention "Potential.contaminant" alongside existing filters ("Contaminant", "Reverse", "Decoy", and "Only.identified.by.site")
Unit Tests
No unit tests were added or modified in this PR. The existing test suite in `inst/tinytest/test_cleanRaw.R` does test MaxQuant cleaning functionality (lines 40-55), and the test data in `mq_pg.csv` already contains the `Potential.contaminant` column, so the filtering behavior is only implicitly covered by existing tests. No new test cases were created to specifically validate the `Potentialcontaminant` filtering behavior.
Coding Guidelines
No violations of coding guidelines identified. The changes follow the existing code patterns and maintain consistency with the R coding style used throughout the package.