Enable parquet prefetch #522

Merged: adsharma merged 1 commit into main from enable-parquet-prefetch on May 27, 2026

adsharma (Contributor) commented May 26, 2026

Summary

  • enable Parquet scan prefetch by default, now that remote VFS paths support positioned reads
  • keep skipped columns out of prefetch accounting and registration
  • remove the stale local-filesystem-only TODO
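
As a rough illustration of the second bullet, the sketch below shows one way skipped columns could be left out of both the byte accounting and the list of registered ranges. All names here (`ColumnChunk`, `PrefetchPlan`, `planPrefetch`) are hypothetical and are not taken from the actual change; this is a sketch of the idea under those assumptions, not the code in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one column chunk inside a row group.
struct ColumnChunk {
    uint64_t offset;  // byte offset of the chunk in the file
    uint64_t size;    // byte length of the chunk
    bool skipped;     // true when the scan does not project this column
};

// Hypothetical byte range handed to the prefetcher.
struct ByteRange {
    uint64_t offset;
    uint64_t size;
};

// Hypothetical result: the total bytes to prefetch plus the ranges to register.
struct PrefetchPlan {
    uint64_t totalBytes = 0;
    std::vector<ByteRange> ranges;
};

// Builds a plan that only counts and registers columns the scan will read.
PrefetchPlan planPrefetch(const std::vector<ColumnChunk>& chunks) {
    PrefetchPlan plan;
    for (const auto& chunk : chunks) {
        if (chunk.skipped) {
            continue;  // neither counted nor registered
        }
        plan.totalBytes += chunk.size;
        plan.ranges.push_back({chunk.offset, chunk.size});
    }
    return plan;
}
```

With three chunks of 100, 200, and 300 bytes where the middle one is skipped, this plan would register two ranges and account for 400 bytes rather than 600.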

Validation

lbug -i xet://datasets/ladybugdb/small-kgs/main/kg_history/icebug-disk/schema.cypher
lbug> MATCH (a:person)-[b:proposed]->(c:concept) RETURN *;
(16 tuples)
(3 columns)
Time: 621.70ms (compiling), 8356.26ms (executing)

adsharma requested a review from aheev on May 26, 2026 00:10
aheev (Contributor) commented May 26, 2026

I will take a look at it tomorrow. Seems like it would need changes on the thrift side as well.

adsharma (Contributor, Author) commented

Ah right. I made the changes on the thrift side as well, but they didn't help on the xet:/.../kg_history database, so I switched to batching (which worked) and threw away the previous work.

Let me look into recreating it.

adsharma force-pushed the enable-parquet-prefetch branch from 302d241 to 64f42ff on May 26, 2026 23:26
adsharma (Contributor, Author) commented

The code had:

std::min(existing_head->GetEnd(), new_read_head.GetEnd()) - new_start

That computes the intersection-ish end, not the union end. Example:

existing: [1000, 2000)
new: [1800, 2600)

wrong merged range: [1000, 2000)
correct range: [1000, 2600)

With the old code, the new range could be partially dropped from the read-ahead buffer. Correctness of query results usually survived because a later read that missed the prefetch buffer fell back to readFromFile, but the prefetch registration no longer meant “this full range is buffered.” For remote files, that causes extra range reads and defeats part of prefetch.
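
To make the difference concrete, here is a minimal, self-contained sketch of the two merge rules applied to the example above. The `ReadHead` struct and the `mergeWithMin`, `mergeWithMax`, and `covers` helpers are simplified stand-ins written for this comment, not the project's actual types; only `GetEnd()` and the `std::min` expression come from the snippet quoted earlier. It assumes the two ranges overlap or touch, which is the only case where merging applies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified stand-in for a read-ahead entry covering [start, start + size).
struct ReadHead {
    uint64_t start;
    uint64_t size;
    uint64_t GetEnd() const { return start + size; }
};

// Old rule: the smaller end wins, so the tail of the longer range is dropped.
ReadHead mergeWithMin(const ReadHead& existing, const ReadHead& incoming) {
    uint64_t newStart = std::min(existing.start, incoming.start);
    uint64_t newEnd = std::min(existing.GetEnd(), incoming.GetEnd());
    return {newStart, newEnd - newStart};
}

// Union rule: smallest start and largest end, so both ranges stay covered.
ReadHead mergeWithMax(const ReadHead& existing, const ReadHead& incoming) {
    uint64_t newStart = std::min(existing.start, incoming.start);
    uint64_t newEnd = std::max(existing.GetEnd(), incoming.GetEnd());
    return {newStart, newEnd - newStart};
}

// True when [offset, offset + length) lies entirely inside the read head.
bool covers(const ReadHead& head, uint64_t offset, uint64_t length) {
    return offset >= head.start && offset + length <= head.GetEnd();
}
```

For existing = {1000, 1000} and new = {1800, 800}, `mergeWithMin` yields [1000, 2000) while `mergeWithMax` yields [1000, 2600). A later read of bytes [2000, 2600) is covered only by the second result; with the first it would miss the buffer and take the fallback path.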

adsharma (Contributor, Author) commented

The thrift change didn't change query performance. Improving further requires either changes to the query plan or more aggressive local caching. Probably post-0.17.0 release work.

adsharma merged commit 9da6864 into main on May 27, 2026
4 checks passed
adsharma deleted the enable-parquet-prefetch branch on May 27, 2026 04:38