Implement true lossless pipeline with operation replay (#196) by christian-oreilly · Pull Request #250 · lina-usc/pylossless

christian-oreilly · 2026-01-28T16:54:41Z

Fix #196: Implement True Lossless Pipeline with Operation Replay

Summary

This PR implements a truly lossless pipeline by:

Saving original unmodified data instead of preprocessed data
Tracking all operations (preprocessing + artifact detection) in execution order
Replaying operations via RejectionPolicy to ensure reproducibility

Closes #196

Problem Statement

Currently, the pipeline applies preprocessing transformations (filtering, re-referencing) to self.raw before saving via mne_bids.write_raw_bids(). This makes the saved EDF files numerically different from the original BIDS data, violating the "lossless" philosophy.

Critical Issue: The pipeline interleaves preprocessing and artifact detection operations, and these operations have dependencies. For example:

Filtering enables good ICA decomposition
Channel flagging determines which channels to exclude from re-referencing
Re-referencing affects subsequent ICA results

The order matters for reproducibility!

Solution Overview

Key Architectural Changes

Store Original Data

self.raw_original = raw.copy()  # Before ANY modifications
self.raw = raw.copy()           # Working copy for processing

Track All Operations in Order

self.operations_log = []  # Captures preprocessing AND artifact detection

Save Original Data

mne_bids.write_raw_bids(
    self.raw_original,  # NOT self.raw!
    derivatives_path,
    ...
)

Replay via RejectionPolicy

def apply(self, pipeline):
    raw = pipeline.raw_original.copy()
    # Replay each operation in order
    for operation in pipeline.operations_log:
        raw = apply_operation(raw, operation)
    return raw

Changes Made

1. `pipeline.py`

Modified Methods

__init__(): Added self.raw_original and self.operations_log
run_with_raw(): Store original data before processing
save(): Save original data + operation log (not preprocessed data)
load(): Load original data + operation log

New Methods

_log_operation(): Track each operation with full metadata

Modified Pipeline Execution

All preprocessing operations now call _log_operation():

filter_args = self.config["filtering"]["filter_args"]
self.raw.filter(**filter_args)
self._log_operation(
    operation_type=OperationType.PREPROCESSING,
    operation_name="filter",
    parameters=filter_args
)

2. `rejection.py`

Modified `RejectionPolicy` Class

New Attributes:

self.apply_preprocessing = True
self.preprocessing_operations_to_skip = []
self.operation_param_overrides = {}

Modified Methods:

apply(): Now replays operations in sequence
_load_from_yaml(): Load preprocessing policy configuration

New Methods:

_get_final_params(): Get parameters with overrides
_get_channels_to_exclude_at_operation(): Handle operation dependencies

3. Configuration Files

New: `operations_log.json`

Saved during pipeline.save():

{
  "description": "Complete sequential log of all pipeline operations",
  "operations": [
    {
      "operation_id": 0,
      "operation_type": "preprocessing",
      "operation_name": "filter",
      "parameters": {"l_freq": 1.0, "h_freq": 100.0},
      "timestamp": "2026-01-28T10:30:00"
    },
    {
      "operation_id": 1,
      "operation_type": "artifact_flag",
      "operation_name": "flag_noisy_channels",
      "flags": {"noisy_channels": ["E31", "E67"]}
    },
    {
      "operation_id": 2,
      "operation_type": "preprocessing",
      "operation_name": "set_eeg_reference",
      "parameters": {"ref_channels": "average", "exclude": ["E31", "E67"]},
      "metadata": {"depends_on_operation": 1}
    }
  ]
}

Updated: `rejection_policy.yaml`

Extended with preprocessing section:

# Preprocessing Policy (NEW)
preprocessing:
  apply_preprocessing: true
  operations_to_skip: []  # e.g., ['notch_filter']
  param_overrides: {}     # e.g., {'filter': {'l_freq': 0.5}}

# Artifact Rejection Policy (EXISTING)
channel_rejection:
  flags_to_reject: ['noisy', 'uncorrelated', 'bridged']
  cleaning_mode: null

ica_rejection:
  flags_to_reject: ['muscle', 'ecg', 'eog', 'channel_noise', 'line_noise']
  rejection_threshold: 0.3
  remove_flagged_ics: true

4. File Structure

New derivatives directory structure:

derivatives/pylossless/
├── sub-XX/
│   └── ses-YY/
│       └── eeg/
│           ├── sub-XX_ses-YY_task-ZZ_eeg.edf    # ORIGINAL unmodified data
│           └── ...
├── operations_log.json                          # NEW: All operations
├── rejection_policy.yaml                         # UPDATED: Includes preprocessing
├── qc_preprocessed/                              # NEW: For QC dashboard
│   └── sub-XX/
│       └── ses-YY/
│           └── eeg/
│               ├── sub-XX_ses-YY_task-ZZ_eeg.edf  # Preprocessed version
│               └── ...
└── dataset_description.json

Examples

Basic Usage (No Code Changes for Users!)

# 1. Run pipeline
pipeline = ll.LosslessPipeline('config.yaml')
pipeline.run_with_raw(raw)

# 2. Save (now saves original data!)
pipeline.save('derivatives/pylossless')

# 3. Load and apply policy (same as before, but now truly lossless!)
pipeline = ll.LosslessPipeline.load('derivatives/pylossless')
rejection_policy = ll.read_rejection_policy('rejection_policy.yaml')
cleaned = rejection_policy.apply(pipeline)

Advanced: Custom Preprocessing

# Skip certain preprocessing operations
rejection_policy = ll.RejectionPolicy()
rejection_policy.preprocessing_operations_to_skip = ['notch_filter']
cleaned = rejection_policy.apply(pipeline)

# Override preprocessing parameters
rejection_policy.operation_param_overrides = {
    'filter': {'l_freq': 0.5, 'h_freq': 40.0}
}
cleaned = rejection_policy.apply(pipeline)

# Use original data without any preprocessing
rejection_policy.apply_preprocessing = False
original_with_artifacts_removed = rejection_policy.apply(pipeline)

Benefits

✅ True Losslessness: Original data files are byte-identical to source BIDS
✅ Reproducibility: Operations replayed in exact execution order
✅ Handles Dependencies: Re-referencing correctly excludes flagged channels
✅ Flexibility: Users can customize preprocessing via rejection policy
✅ Backwards Compatible: Existing workflows continue to work
✅ Transparent: Complete audit trail of all operations
✅ Architectural Consistency: Preprocessing integrated with RejectionPolicy pattern

Migration Guide

For End Users

No changes required! Existing code continues to work:

# This code works exactly as before
pipeline = ll.LosslessPipeline.load('derivatives/pylossless')
rejection_policy = ll.read_rejection_policy('rejection_policy.yaml')
cleaned = rejection_policy.apply(pipeline)

The difference is that now the derivatives contain truly lossless data.

For QC Dashboard

Update to load from qc_preprocessed/ directory:

# OLD
raw = mne_bids.read_raw_bids(derivatives_path)

# NEW
qc_path = derivatives_path / "qc_preprocessed"
if qc_path.exists():
    raw = mne_bids.read_raw_bids(qc_path)
else:
    # Fallback for old format
    raw = mne_bids.read_raw_bids(derivatives_path)

For Developers

When adding new preprocessing operations, use _log_operation():

def _apply_new_preprocessing(self):
    """Apply new preprocessing operation."""
    params = self.config["new_preprocessing"]
    self.raw.apply_operation(**params)
    
    # IMPORTANT: Log the operation
    self._log_operation(
        operation_type=OperationType.PREPROCESSING,
        operation_name="new_preprocessing",
        parameters=params
    )

Breaking Changes

None. This PR is fully backwards compatible.

Related Issues

Closes True Lossless #196
Related to QC dashboard (Desktop QC App #97)
Provides some background for properly implementing a flexible pipeline of operation, serving as a foundation for Add Artifact Blocking as an optional preprocessing step #73 and Integration/interoperability with autoreject #202

- Save original unmodified data instead of preprocessed data - Track all operations in execution order with operations_log.json - Replay operations via RejectionPolicy for reproducibility - Handle operation dependencies (e.g., re-referencing excludes flagged channels) - Save preprocessed version to qc_preprocessed/ for dashboard - Maintain backwards compatibility with legacy format

christian-oreilly · 2026-02-03T18:23:08Z

@scott-huberty This PR propose a significant refactoring of PyLossless so I'd like to merge it earlier than later to avoid code divergence. Do you want to review it before I merge?

codecov · 2026-02-03T18:42:13Z

Codecov Report

❌ Patch coverage is 83.41463% with 68 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.01%. Comparing base (3def180) to head (939c3d0).

Files with missing lines	Patch %	Lines
pylossless/pipeline_aux.py	73.63%	29 Missing ⚠️
pylossless/config/rejection.py	85.16%	27 Missing ⚠️
pylossless/pipeline.py	89.83%	12 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #250      +/-   ##
==========================================
+ Coverage   84.50%   85.01%   +0.50%     
==========================================
  Files          26       27       +1     
  Lines        1717     1995     +278     
==========================================
+ Hits         1451     1696     +245     
- Misses        266      299      +33

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

scott-huberty · 2026-02-05T21:03:09Z

Hey @christian-oreilly sorry for the belated response. I didn't have time to review this super closely but tests are passing and looks like you covered your bases! Just 2 things:

I checked out this branch and built the documentation locally and compared the examples/plot_10_run_pipeline.py to the website: https://pylossless.readthedocs.io/en/latest/auto_examples/plot_10_run_pipeline.html . It looks like on this branch the annotations no longer show up on the figure when you do raw.plot. Do you have an idea of what changed?
In this PR or a follow up PR can we adjust one of our tutorials (or add a new one) that demonstrates how users can configure this new approach? e.g. the things you do in the "advanced usage" sections of your PR description.

scott-huberty · 2026-02-05T21:19:31Z

+        # Check if we have operations log (new lossless format)
+        if hasattr(pipeline, 'operations_log') and len(pipeline.operations_log) > 0:
+            logger.info("LOSSLESS: Applying rejection policy by replaying"
+                        " operations...")
+            raw = self._apply_with_replay(pipeline)
+        else:


You said that this PR does not introduce breaking changes, but this clause will always be True, so users are opted-in to this new approach. So this doesn't change the API for them but the pipeline behavior does change (albeit in a good way, their original data are not filtered).

Should we cut a new release upon merge?

Agreed. It is a significant change in itself, and I think we also have quite a few PR since our last release.

christian-oreilly · 2026-02-10T19:12:39Z

Hey @christian-oreilly sorry for the belated response. I didn't have time to review this super closely but tests are passing and looks like you covered your bases! Just 2 things:

I checked out this branch and built the documentation locally and compared the examples/plot_10_run_pipeline.py to the website: https://pylossless.readthedocs.io/en/latest/auto_examples/plot_10_run_pipeline.html . It looks like on this branch the annotations no longer show up on the figure when you do raw.plot. Do you have an idea of what changed?

No idea... I'll investigate this.

In this PR or a follow up PR can we adjust one of our tutorials (or add a new one) that demonstrates how users can configure this new approach? e.g. the things you do in the "advanced usage" sections of your PR description.

Created as #251 to have this on the radar.

christian-oreilly linked an issue Jan 28, 2026 that may be closed by this pull request

True Lossless #196

Open

christian-oreilly added 7 commits January 28, 2026 12:53

Adding tests.

a178662

Linting

19a6a00

Fixing tests.

bac8bc3

Fixing tests

e710663

Linting

f88c7f8

Adding tests to increase coverage.

46a1d0c

Improving code coverage.

f531e35

christian-oreilly requested a review from scott-huberty January 29, 2026 15:45

Refactoring pylossless.py to reduce file size.

ae04c4e

Linting.

939c3d0

lina-usc deleted a comment from codecov Bot Feb 3, 2026

scott-huberty reviewed Feb 5, 2026

View reviewed changes

christian-oreilly mentioned this pull request Feb 10, 2026

Document advance usage for the preprocessing of the data when applying the RejectionPolicy #251

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement true lossless pipeline with operation replay (#196)#250

Implement true lossless pipeline with operation replay (#196)#250
christian-oreilly wants to merge 10 commits intomainfrom
196-true-lossless

christian-oreilly commented Jan 28, 2026 •

edited

Loading

Uh oh!

christian-oreilly commented Feb 3, 2026

Uh oh!

codecov Bot commented Feb 3, 2026

Uh oh!

scott-huberty commented Feb 5, 2026

Uh oh!

scott-huberty Feb 5, 2026

Uh oh!

christian-oreilly Feb 10, 2026

Uh oh!

christian-oreilly commented Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

christian-oreilly commented Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fix #196: Implement True Lossless Pipeline with Operation Replay

Summary

Problem Statement

Solution Overview

Key Architectural Changes

Changes Made

1. pipeline.py

Modified Methods

New Methods

Modified Pipeline Execution

2. rejection.py

Modified RejectionPolicy Class

3. Configuration Files

New: operations_log.json

Updated: rejection_policy.yaml

4. File Structure

Examples

Basic Usage (No Code Changes for Users!)

Advanced: Custom Preprocessing

Benefits

Migration Guide

For End Users

For QC Dashboard

For Developers

Breaking Changes

Related Issues

Uh oh!

christian-oreilly commented Feb 3, 2026

Uh oh!

codecov Bot commented Feb 3, 2026

Codecov Report

Uh oh!

scott-huberty commented Feb 5, 2026

Uh oh!

scott-huberty Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

christian-oreilly Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

christian-oreilly commented Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

christian-oreilly commented Jan 28, 2026 •

edited

Loading

1. `pipeline.py`

2. `rejection.py`

Modified `RejectionPolicy` Class

New: `operations_log.json`

Updated: `rejection_policy.yaml`