fix(scanner): don't leak start_tag across serialize/deserialize #361

Open
lucieleblanc wants to merge 2 commits into DerekStride:main from lucieleblanc:scanner-deserialize-leak

@lucieleblanc

Problem

I was running macOS leaks on a long-running host that parses many .sql files and noticed allocations rooted in tree_sitter_sql_external_scanner_deserialize whose count grew linearly with the number of parses.

I think the external scanner's _serialize/_deserialize pair mishandles LexerState.start_tag, which is used for strings delimited by dollar-sign tags instead of quotes:

  • _serialize frees and nulls start_tag after copying it into the snapshot buffer. If tree-sitter calls scan() on the same scanner instance again, it sees a NULL start_tag and can no longer match the pending dollar-quoted-string end tag.
  • _deserialize assigns to start_tag without freeing whatever the state already owns, leaking the previous allocation every time saved state is restored.

Solution

Stop freeing start_tag in _serialize. Instead, free any existing start_tag at the top of _deserialize, using the same pattern as _destroy.
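
For reviewers who want the ownership rule in one place, here is a minimal, self-contained sketch of the intended behavior. It is not the diff itself: SketchState, sketch_serialize, sketch_deserialize, and SKETCH_BUFFER_SIZE are hypothetical stand-ins for LexerState, the real _serialize/_deserialize functions, and tree-sitter's serialization buffer limit, and the struct is assumed to hold only the tag.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed stand-in for tree-sitter's serialization buffer limit. */
#define SKETCH_BUFFER_SIZE 1024

/* Hypothetical, simplified scanner state: only the pending dollar-quote tag. */
typedef struct {
  char *start_tag; /* heap-allocated, NUL-terminated, or NULL */
} SketchState;

/* Copy the tag into the snapshot buffer WITHOUT freeing it, so the live
 * scanner can still match the closing tag if scan() runs again. */
unsigned sketch_serialize(const SketchState *state, char *buffer) {
  if (state == NULL || state->start_tag == NULL) {
    return 0;
  }
  size_t length = strlen(state->start_tag) + 1; /* include the '\0' */
  if (length >= SKETCH_BUFFER_SIZE) {
    return 0; /* tag does not fit in the snapshot */
  }
  memcpy(buffer, state->start_tag, length);
  return (unsigned)length;
}

/* Release whatever the state already owns, then restore from the snapshot. */
void sketch_deserialize(SketchState *state, const char *buffer, unsigned length) {
  free(state->start_tag); /* free(NULL) is a no-op */
  state->start_tag = NULL;
  if (length > 1) { /* a length of 1 would be an empty tag */
    state->start_tag = malloc(length);
    if (state->start_tag != NULL) {
      memcpy(state->start_tag, buffer, length);
      state->start_tag[length - 1] = '\0'; /* defensive termination */
    }
  }
}
```

With this shape, calling sketch_serialize leaves the live state untouched, and calling sketch_deserialize any number of times leaves at most one live allocation for the tag, which is the property the leaks run checks for.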

Testing

After this fix, running leaks on the same program shows no more live allocations from tree_sitter_sql_external_scanner_deserialize.

I added a corpus entry to assert the AST shape and guard the named-tag parse against future regressions. It produces the same AST as the previous test, but uses named tags ($body$ ... $body$ instead of $$...$$).
