Skip to content

Freed heavy fit objects and large dataProcess intermediates early#207

Merged
tonywu1999 merged 6 commits into
develfrom
MSstats/work/20260514_free-heavy-objects
May 26, 2026
Merged

Freed heavy fit objects and large dataProcess intermediates early#207
tonywu1999 merged 6 commits into
develfrom
MSstats/work/20260514_free-heavy-objects

Conversation

@tonywu1999
Copy link
Copy Markdown
Contributor

@tonywu1999 tonywu1999 commented May 14, 2026

User description

  • Stripped survreg() $y / $linear.predictors and lm() $model after fitting in .fitSurvival and .fitLinearModel; downstream predict / summary / vcov paths are unaffected.
  • Rewrote .updateUnequalVariances to mutate a single data.frame in place instead of rebuilding it ~5x per iteration via data.frame(...).
  • Added rm(raw), rm(peptides_dict), and gc(verbose = FALSE) inside dataProcess() so the ~250-400 MB held by function arguments is freed before MSstatsSummarizationOutput runs.
  • Added rm(survival_fit) and rm(fit) inside the per-protein linear / TMP summarizers so local fit objects drop as soon as survival / result is materialised.
  • Tests: inst/tinytest/test_memory_optimization.R Issue 1 (tests 1a, 1b) — 14 assertions, all green.

PR Type

Bug fix, Enhancement, Tests


Description

  • Free large pipeline inputs earlier

  • Strip heavy fields from fit objects

  • Avoid repeated copies in variance updates

  • Add fit memory regression tests


Diagram Walkthrough

flowchart LR
  raw["raw and peptides_dict inputs"]
  proc["dataProcess() processing pipeline"]
  fits["Stripped survreg and lm objects"]
  var["In-place unequal variance updates"]
  tests["Memory optimization regression tests"]
  raw -- "freed earlier" --> proc
  fits -- "reduce object size" --> proc
  var -- "avoids repeated copies" --> proc
  proc -- "validated by" --> tests
Loading

File Walkthrough

Relevant files
Bug fix
dataProcess.R
Free pipeline intermediates and fit objects early               

R/dataProcess.R

  • Remove raw after initial preparation
  • Remove peptides_dict after normalization
  • Run gc(verbose = FALSE) before output assembly
  • Drop local survival_fit and fit sooner
+11/-3   
Enhancement
utils_imputation.R
Strip unused `survreg` fields after fitting                           

R/utils_imputation.R

  • Clear fit$y after survreg() fitting
  • Clear fit$linear.predictors before returning
  • Keep downstream prediction behavior intact
+3/-1     
utils_summarization.R
Reduce `lm` footprint and variance-copy churn                       

R/utils_summarization.R

  • Clear linear_model$model before returning
  • Convert input to data.frame once upfront
  • Add and remove helper columns in place
  • Avoid repeated full-frame copies per iteration
+17/-15 
Tests
test_memory_optimization_fits.R
Add regression tests for memory-stripped fits                       

inst/tinytest/test_memory_optimization_fits.R

  • Add tests for stripped survreg fields
  • Verify stripped survival fits still predict
  • Add tests for stripped lm$model
  • Verify summary() and vcov() still work
+111/-0 

Motivation & Context

This PR addresses memory usage issues in the MSstats package by aggressively freeing large intermediate objects that are no longer needed after their primary use. The dataProcess() function was identified as holding approximately 250–400 MB of function arguments unnecessarily before proceeding to summarization. Additionally, fit objects from iterative per-protein summarization were accumulating in memory across loop iterations, causing unnecessary memory overhead.

Solution Summary

The PR implements memory optimizations across three main areas:

  1. Explicit intermediate cleanup in dataProcess(): Large local objects (raw, peptides_dict) are explicitly removed and garbage collection is triggered before summarization begins
  2. Fit object field stripping: Unnecessary components are removed from fitted model objects—$y and $linear.predictors from survreg() objects and $model from lm() objects—while preserving fields required for downstream operations like predict(), summary(), and vcov()
  3. Reduced copy churn in updateUnequalVariances(): The function was refactored to mutate a single data.frame in place during iterations rather than repeatedly rebuilding it

Detailed Changes

R/dataProcess.R:

  • Explicitly removes the raw object immediately after use
  • Removes the peptides_dict object immediately after use
  • Calls gc(verbose = FALSE) to trigger garbage collection before summarization
  • In MSstatsSummarizeSingleLinear() and MSstatsSummarizeSingleTMP() per-protein summarizers: explicitly removes survival_fit and fit objects immediately after their results have been materialized, reducing retained memory across iterations

R/utils_imputation.R:

  • .fitSurvival() now properly closes nested if/else branches when handling unlabeled inputs with n_features != 1L and countdf = FALSE
  • Strips fit$y and fit$linear.predictors fields from the fitted survival object before returning, while preserving fields required for predict() and other downstream operations

R/utils_summarization.R:

  • .fitLinearModel(): reformatted call to .updateUnequalVariances() using expanded named-argument layout
  • .updateUnequalVariances(): refactored to convert input to a data.table once and use in-place := assignments for intermediate column computation (abs.resids, fitted, loess.fitted, weight), eliminating repeated data.frame() reconstruction per iteration; intermediate columns are cleared by setting to NULL rather than being dropped and rebuilt
  • .fitLinearModel() now strips fit$model field from the fitted linear model object before returning, preserving $coefficients, $qr, $residuals, and other fields required for summary(), vcov(), and predict()

Unit Tests

Two test files were added to verify memory optimizations:

inst/tinytest/test_memory_optimization_fits.R (82 lines):

  • Tests that MSstats:::.fitSurvival() returns objects with $y and $linear.predictors set to NULL while preserving $coefficients and $scale
  • Verifies predict() works correctly on stripped survival fit objects
  • Confirms object.size() of stripped survival fits is smaller than unstripped equivalents
  • Tests that MSstats:::.fitLinearModel() returns objects with $model set to NULL while preserving $coefficients, $qr, and $residuals
  • Verifies summary() and vcov() work correctly on stripped linear fit objects
  • Confirms object.size() of stripped linear fits is smaller than unstripped equivalents

inst/tinytest/test_memory_optimization.R:

  • Tests Issue 1 cases 1a and 1b with large dataset benchmarking
  • Reports 14 assertions passing

Coding Guidelines

No coding guideline violations were identified. The changes maintain backward compatibility by preserving all fields required for downstream operations (predict(), summary(), vcov()), only removing fields that hold large data objects ($y, $linear.predictors, $model) that are not needed post-fit. The memory optimization is transparent to users.

Review Change Stack

* Stripped survreg() $y / $linear.predictors and lm() $model after
  fitting in .fitSurvival and .fitLinearModel; downstream predict /
  summary / vcov paths are unaffected.
* Rewrote .updateUnequalVariances to mutate a single data.frame in
  place instead of rebuilding it ~5x per iteration via data.frame(...).
* Added rm(raw), rm(peptides_dict), and gc(verbose = FALSE) inside
  dataProcess() so the ~250-400 MB held by function arguments is freed
  before MSstatsSummarizationOutput runs.
* Added rm(survival_fit) and rm(fit) inside the per-protein linear /
  TMP summarizers so local fit objects drop as soon as survival /
  result is materialised.
* Tests: inst/tinytest/test_memory_optimization.R Issue 1 (tests 1a,
  1b) — 14 assertions, all green.

See MSstats-ai/todos/active/TODO-MS-20260514_fix-memory-bugs.md

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 621e1c2c-78ca-4642-9dc7-a9856445fc9d

📥 Commits

Reviewing files that changed from the base of the PR and between 4d2d41a and 2e6cf17.

📒 Files selected for processing (1)
  • R/utils_summarization.R
🚧 Files skipped from review as they are similar to previous changes (1)
  • R/utils_summarization.R

📝 Walkthrough

Walkthrough

Memory optimizations reduce peak memory usage in the dataProcess fitting pipeline by stripping large dataset-holding fields from survival and linear model fits, explicitly freeing intermediate objects, refactoring iterative variance adjustment to use in-place column updates, and introducing tests to verify correctness and object-size reduction.

Changes

Memory optimization in data processing and fitting

Layer / File(s) Summary
Survival fit field stripping
R/utils_imputation.R
Fix missing brace in .fitSurvival() and null fit$y and fit$linear.predictors after model fitting to reduce retained memory.
Unequal-variance adjustment refactor
R/utils_summarization.R
Reformat .fitLinearModel() call to .updateUnequalVariances(). Refactor .updateUnequalVariances() to convert input to data.table once and use in-place := assignments for iterative loess-to-WLS computations, clearing intermediates via NULL instead of rebuilding data.frames.
Pipeline intermediate and fitted-model cleanup
R/dataProcess.R
Free raw and peptides_dict after use, call gc() before summarization, and remove survival_fit and fit objects after predictions and coefficients are extracted in linear and TMP summarization paths.
Memory-optimization tests
inst/tinytest/test_memory_optimization_fits.R
Add tests verifying .fitSurvival() strips heavy fields while preserving usability and that .fitLinearModel() strips $model while preserving usability, both achieving smaller object size than unstripped equivalents.

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

Review effort 3/5

🐰 A rabbit hops through memory lanes,
Stripping weights and chains,
Fits lean clean, and gc() runs bright—
Peak memory shrinks with all its might!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the primary change: freeing heavy fit objects and large dataProcess intermediates early for memory optimization.
Description check ✅ Passed The description includes motivation, detailed change list, testing summary, but is missing explicit checklist item completion and devtools::document() confirmation.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch MSstats/work/20260514_free-heavy-objects

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Unguarded cleanup

In MSstatsSummarizeSingleTMP, survival_fit is created inside a tryCatch and the code already handles the failure case by setting converged = FALSE. The new unconditional rm(survival_fit) runs even when fitting failed, so this path now emits object 'survival_fit' not found warnings and can become a hard failure in environments that promote warnings to errors.

rm(survival_fit)

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

No code suggestions found for the PR.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@R/utils_summarization.R`:
- Around line 208-223: The loess iterations reuse stale fitted values because
input[["fitted"]] is only set from the initial fit; after each weighted refit
(wls.fit) update the stored fitted values so subsequent loess calls use
consistent residual/fitted pairs — e.g. inside the loop after computing wls.fit
(the variable wls.fit) assign input[["fitted"]] <- fitted(wls.fit) or
input[["fitted"]] <- wls.fit$fitted.values before recalculating
input[["abs.resids"]] and proceeding to the next iteration.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7d6992f3-0ec8-4947-80c1-53b71f9e9caa

📥 Commits

Reviewing files that changed from the base of the PR and between 3fa0bd1 and 5693fc9.

📒 Files selected for processing (4)
  • R/dataProcess.R
  • R/utils_imputation.R
  • R/utils_summarization.R
  • inst/tinytest/test_memory_optimization_fits.R

Comment thread R/utils_summarization.R
Comment on lines 208 to 223
for (i in seq_len(num_iter)) {
if (i == 1) {
abs.resids = data.frame(abs.resids = abs(fit$residuals))
fitted = data.frame(fitted = fit$fitted.values)
input = data.frame(input,
"abs.resids" = abs.resids,
"fitted" = fitted)
input[["abs.resids"]] = abs(fit$residuals)
input[["fitted"]] = fit$fitted.values
}
fit.loess = loess(abs.resids ~ fitted, data = input)
loess.fitted = data.frame(loess.fitted = fitted(fit.loess))
input = data.frame(input, "loess.fitted" = loess.fitted)
## loess fitted valuaes are predicted sd
input$weight = 1 / (input$loess.fitted ^ 2)
input = input[, !(colnames(input) %in% "abs.resids")]
input[["loess.fitted"]] = fitted(fit.loess)
## loess fitted values are predicted sd
input[["weight"]] = 1 / (input[["loess.fitted"]] ^ 2)
input[["abs.resids"]] = NULL
## re-fit using weight
wls.fit = lm(formula(fit), data = input, weights = weight)
abs.resids = data.frame(abs.resids = abs(wls.fit$residuals))
input = data.frame(input, "abs.resids" = abs.resids)
input = input[, -which(colnames(input) %in% c("loess.fitted", "weight"))]
input[["abs.resids"]] = abs(wls.fit$residuals)
input[["loess.fitted"]] = NULL
input[["weight"]] = NULL
}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Refresh fitted after each weighted refit.

At Line 220 you refresh abs.resids from wls.fit, but fitted is never refreshed after Line 211. For num_iter > 1, later loess iterations use stale fitted values and inconsistent residual/fitted pairs.

Suggested fix
     for (i in seq_len(num_iter)) {
         if (i == 1) {
             input[["abs.resids"]] = abs(fit$residuals)
             input[["fitted"]] = fit$fitted.values
         }
         fit.loess = loess(abs.resids ~ fitted, data = input)
         input[["loess.fitted"]] = fitted(fit.loess)
         ## loess fitted values are predicted sd
         input[["weight"]] = 1 / (input[["loess.fitted"]] ^ 2)
         input[["abs.resids"]] = NULL
         ## re-fit using weight
         wls.fit = lm(formula(fit), data = input, weights = weight)
         input[["abs.resids"]] = abs(wls.fit$residuals)
+        input[["fitted"]] = wls.fit$fitted.values
         input[["loess.fitted"]] = NULL
         input[["weight"]] = NULL
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@R/utils_summarization.R` around lines 208 - 223, The loess iterations reuse
stale fitted values because input[["fitted"]] is only set from the initial fit;
after each weighted refit (wls.fit) update the stored fitted values so
subsequent loess calls use consistent residual/fitted pairs — e.g. inside the
loop after computing wls.fit (the variable wls.fit) assign input[["fitted"]] <-
fitted(wls.fit) or input[["fitted"]] <- wls.fit$fitted.values before
recalculating input[["abs.resids"]] and proceeding to the next iteration.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this piece of code be changed in a future PR? It uses data.frame and [[ a lot

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mstaniak, it looks like the current state of the file already addresses that — the version of .updateUnequalVariances in the branch now uses data.table with := for all in-place operations rather than data.frame and [[:

.updateUnequalVariances = function(input, fit, num_iter) {
    weight = abs.resids = loess.fitted = NULL
    if (!data.table::is.data.table(input)) {
        input = data.table::as.data.table(input)
    }
    for (i in seq_len(num_iter)) {
        if (i == 1) {
            input[, abs.resids := abs(fit$residuals)]
            input[, fitted := fit$fitted.values]
        }
        fit.loess = loess(abs.resids ~ fitted, data = input)
        input[, loess.fitted := fitted(fit.loess)]
        input[, weight := 1 / (loess.fitted ^ 2)]
        input[, abs.resids := NULL]
        wls.fit = lm(formula(fit), data = input, weights = weight)
        input[, abs.resids := abs(wls.fit$residuals)]
        input[, loess.fitted := NULL]
        input[, weight := NULL]
    }
    wls.fit
}

The diff in the review was anchored to an earlier commit snapshot — the branch has since been updated to the data.table/:= style. My earlier fitted refresh suggestion still applies to this current version, though.

@mstaniak
Copy link
Copy Markdown
Contributor

Thanks, fantastic update.
I wonder if people ever use the unequal variances option, btw.

Copy link
Copy Markdown
Contributor

@mstaniak mstaniak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for his fantastic update, just one question

Comment thread R/utils_summarization.R Outdated
Comment thread inst/tinytest/test_memory_optimization_fits.R
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmark/profile_dataprocess_peak.R`:
- Around line 58-60: The required-column validation in
profile_dataprocess_peak.R currently defines required_cols as c("ProteinName",
"PeptideSequence", "PrecursorCharge", "FragmentIon", "ProductCharge",
"IsotopeLabelType", "Run", "BioReplicate", "Condition", "Intensity") which omits
PeptideModifiedSequence expected by the package validity path and risks later
failures; update the required_cols vector (and the analogous check around lines
85-89) to include "PeptideModifiedSequence", and ensure any check/handling for
"AnomalyScores" remains separate (i.e., validate presence of
PeptideModifiedSequence as required while treating AnomalyScores as an
optional/special-case column).
- Around line 143-148: The benchmark is using non-default parameters which makes
results non-comparable to dataProcess(); update the calls to
MSstats:::MSstatsSelectFeatures and MSstats:::MSstatsPrepareForSummarization to
use the same default knobs as dataProcess() rather than the explicit overrides
(featureSubset = "all", n_top_feature = 3, remove_uninformative_feature_outlier
= TRUE), i.e. remove the custom arguments and call the functions with their
defaults (or explicitly pass the same parameter values used by dataProcess()) so
the benchmark reflects the standard dataProcess() baseline.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b06a9a32-3f43-4c52-841e-6e968edfde2d

📥 Commits

Reviewing files that changed from the base of the PR and between 9edb718 and d07f4d1.

📒 Files selected for processing (1)
  • benchmark/profile_dataprocess_peak.R

Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment on lines +58 to +60
required_cols <- c("ProteinName", "PeptideSequence", "PrecursorCharge",
"FragmentIon", "ProductCharge", "IsotopeLabelType",
"Run", "BioReplicate", "Condition", "Intensity")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten required-column validation to match the pipeline contract.

The script currently accepts a 10-column schema, but the package validity path expects PeptideModifiedSequence too (with AnomalyScores handled separately). This can admit unsupported inputs and fail later in less obvious places.

Suggested patch
-required_cols <- c("ProteinName", "PeptideSequence", "PrecursorCharge",
+required_cols <- c("ProteinName", "PeptideSequence", "PeptideModifiedSequence", "PrecursorCharge",
                    "FragmentIon", "ProductCharge", "IsotopeLabelType",
                    "Run", "BioReplicate", "Condition", "Intensity")

Also applies to: 85-89

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark/profile_dataprocess_peak.R` around lines 58 - 60, The
required-column validation in profile_dataprocess_peak.R currently defines
required_cols as c("ProteinName", "PeptideSequence", "PrecursorCharge",
"FragmentIon", "ProductCharge", "IsotopeLabelType", "Run", "BioReplicate",
"Condition", "Intensity") which omits PeptideModifiedSequence expected by the
package validity path and risks later failures; update the required_cols vector
(and the analogous check around lines 85-89) to include
"PeptideModifiedSequence", and ensure any check/handling for "AnomalyScores"
remains separate (i.e., validate presence of PeptideModifiedSequence as required
while treating AnomalyScores as an optional/special-case column).

Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
@tonywu1999
Copy link
Copy Markdown
Contributor Author

@Rudhik1904 If you could report your benchmarking results with a large dataset in this PR (i.e. amount of memory freed up, etc.), that would be great for documentation purposes

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
R/utils_summarization.R (1)

219-221: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Refresh fitted after each weighted refit.

At Line 219, abs.resids is updated from wls.fit, but fitted is not. For num_iter > 1, subsequent loess steps use stale fitted values.

Suggested fix
         wls.fit = lm(formula(fit), data = input, weights = weight)
         input[, abs.resids := abs(wls.fit$residuals)]
+        input[, fitted := wls.fit$fitted.values]
         input[, loess.fitted := NULL]
         input[, weight := NULL]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@R/utils_summarization.R` around lines 219 - 221, The code updates input[,
abs.resids := abs(wls.fit$residuals)] but does not refresh the fitted values, so
subsequent iterations (when num_iter > 1) use stale predictions; after computing
wls.fit add a line to update the stored fitted values (e.g., set input[, fitted
:= wls.fit$fitted] or equivalent) so each loess reweighting uses the most recent
fitted vector before clearing loess.fitted and weight.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmark/profile_dataprocess_peak.R`:
- Line 8: Replace the garbled header comment on line containing "mark and
elapsed time..." with a clear, readable description; e.g., rewrite the comment
used by the benchmark (the commented header string) to: "Mark elapsed time at
every stage boundary; useful for tracking the running max of `used` after
gc(reset = TRUE) to locate peak memory usage." Ensure you update the same
comment block that contains the malformed text so the benchmark header is
readable and informative.

---

Duplicate comments:
In `@R/utils_summarization.R`:
- Around line 219-221: The code updates input[, abs.resids :=
abs(wls.fit$residuals)] but does not refresh the fitted values, so subsequent
iterations (when num_iter > 1) use stale predictions; after computing wls.fit
add a line to update the stored fitted values (e.g., set input[, fitted :=
wls.fit$fitted] or equivalent) so each loess reweighting uses the most recent
fitted vector before clearing loess.fitted and weight.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dad8d3f4-a2a6-430c-be4c-8fd22547e10c

📥 Commits

Reviewing files that changed from the base of the PR and between d07f4d1 and 4d2d41a.

📒 Files selected for processing (2)
  • R/utils_summarization.R
  • benchmark/profile_dataprocess_peak.R

Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread R/utils_summarization.R Outdated

weight = abs.resids = loess.fitted = NULL
if (data.table::is.data.table(input)) {
input = data.table::copy(input)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we copying the data table here? This step seems unnecessary.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this addressed before merging?

Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
# renamed proteins to
# scale up the input

default_input <- "data/reduce_output_spectronaut_input_qc_test.csv"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use absolute paths here?

@tonywu1999 tonywu1999 merged commit 5b07ebb into devel May 26, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the MSstats/work/20260514_free-heavy-objects branch May 26, 2026 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants