fix(scanner): don't leak start_tag across serialize/deserialize#361
Open
lucieleblanc wants to merge 2 commits into
Open
fix(scanner): don't leak start_tag across serialize/deserialize#361lucieleblanc wants to merge 2 commits into
lucieleblanc wants to merge 2 commits into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
I was running macOS
leakson a long-running host that parses many.sqlfiles and noticed a growing set of allocations rooted intree_sitter_sql_external_scanner_deserializethat grew linearly with the number of parses.I think the external scanner's
_serialize/_deserializepair mishandles theLexerState.start_tagfor strings delimited by dollar-sign tags instead of quotes:_serializefrees and nullsstart_tagafter copying it into the snapshot buffer. If tree-sitter callsscan()on the same scanner instance again, it sees a NULLstart_tagand can no longer match the pending dollar-quoted-string end tag._deserializeassigns tostart_tagwithout freeing whatever the state already owns, leaking the previous allocation every time saved state is restored.Solution
Free any existing
start_tagat the top of_deserializeinstead of_serialize, using the same pattern as_destroy.Testing
After this fix, running
leakson the same program shows no more live allocations fromtree_sitter_sql_external_scanner_deserialize.I added a corpus entry to assert the AST shape and guard the named-tag parse against future regressions. It produces the same AST as the previous test, but uses named tags (
$body$ ... $body$instead of$$...$$).