fast forward seeks on compressed streams#807

Open
oh6hay wants to merge 4 commits into
log2timeline:mainfrom
oh6hay:compressed-stream-seek
oh6hay commented May 26, 2026

`compressed_stream_io.py` discarded the decompressor on forward seeks, which made stream access very slow when the access pattern consisted of a large number of small forward seeks. I found this bug while investigating why log2timeline/plaso failed with a stuck worker that was analyzing a fairly large xz-compressed tarball. Enumerating the tar contents thrashed the CompressedStream because it initialized a new decompressor and decompressed the stream from the beginning on every forward seek.

The fix makes the CompressedStream retain the decompressor once it has been initialized and discard it only when seeking backwards, since compressed streams generally cannot be seeked backwards. Prior to the fix, even a small forward seek caused the decompressor to be reinitialized and the stream to be decompressed from the start up to the new seek position.
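
To illustrate the idea outside of dfvfs, here is a minimal sketch of a reader that keeps its decompressor across forward seeks and only starts over on a backward seek. The class `ForwardSeekXZReader`, its `resets` counter and the chunk size are hypothetical names chosen for this example; this is not the code in `compressed_stream_io.py`, and it assumes a single-stream xz input.

```python
import io
import lzma
import os


class ForwardSeekXZReader:
    """Hypothetical reader for illustration; not the dfvfs CompressedStream."""

    _CHUNK_SIZE = 64 * 1024  # assumed size of each compressed read

    def __init__(self, file_object):
        self._file_object = file_object
        self._decompressor = None
        self._buffer = b''  # decompressed bytes not yet consumed
        self._offset = 0    # decompressed offset of the start of _buffer
        self.resets = 0     # number of times the decompressor was (re)created

    def _reset(self):
        """Starts decompression over from the beginning of the stream."""
        self._file_object.seek(0, os.SEEK_SET)
        self._decompressor = lzma.LZMADecompressor()
        self._buffer = b''
        self._offset = 0
        self.resets += 1

    def _fill(self):
        """Decompresses one more chunk; returns False when no input is left."""
        if self._decompressor.eof:
            return False
        compressed = self._file_object.read(self._CHUNK_SIZE)
        if not compressed:
            return False
        self._buffer += self._decompressor.decompress(compressed)
        return True

    def seek(self, offset):
        # Only a backward seek (or the very first use) needs a fresh decompressor.
        if self._decompressor is None or offset < self._offset:
            self._reset()
        # A forward seek just decompresses and discards data up to the target.
        while offset - self._offset > len(self._buffer):
            self._offset += len(self._buffer)
            self._buffer = b''
            if not self._fill():
                break
        skip = min(offset - self._offset, len(self._buffer))
        self._buffer = self._buffer[skip:]
        self._offset += skip

    def read(self, size):
        while len(self._buffer) < size and self._fill():
            pass
        data = self._buffer[:size]
        self._buffer = self._buffer[size:]
        self._offset += len(data)
        return data


# Many small forward seeks reuse the same decompressor.
data = os.urandom(200_000)
reader = ForwardSeekXZReader(io.BytesIO(lzma.compress(data)))
for offset in range(0, len(data), 1000):
    reader.seek(offset)
    chunk = reader.read(16)
```

In this sketch `reader.resets` stays at 1 for the whole loop, and a later `reader.seek(10)` bumps it to 2, which mirrors the "discard only when seeking backwards" behavior described above.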

I included a test case that checks that 5000 forward seeks on a large xz-compressed stream of random bytes complete reasonably fast. (The data must not be a repeated pattern, because that compresses too well and the test would then not fail on an unfixed CompressedStream implementation.)
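
The point about test data can be checked with a quick comparison. The sketch below is not the test from this PR; the 1 MiB size and the 16-byte pattern are arbitrary values picked for illustration. It compresses random bytes and a repeated pattern of the same length with `lzma.compress` and prints the resulting sizes.

```python
import lzma
import os

SIZE = 1024 * 1024  # arbitrary size chosen for this illustration

random_data = os.urandom(SIZE)
repeated_data = b'0123456789abcdef' * (SIZE // 16)

random_compressed = lzma.compress(random_data)
repeated_compressed = lzma.compress(repeated_data)

# Random bytes are essentially incompressible, while the repeated pattern
# shrinks to a tiny fraction of its original size.
print(len(random_compressed), len(repeated_compressed))
```

With random input the compressed stream stays about as large as the original, so an implementation that restarts decompression on every seek has a comparable amount of compressed input to process each time.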

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 90.47619% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.86%. Comparing base (0f55fbc) to head (28c24a2).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dfvfs/file_io/compressed_stream_io.py | 90.47% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #807      +/-   ##
==========================================
+ Coverage   87.84%   87.86%   +0.01%     
==========================================
  Files         297      297              
  Lines       12287    12296       +9     
==========================================
+ Hits        10794    10804      +10     
+ Misses       1493     1492       -1     

