fix: skip unavailable tree-sitter parsers #459
Pull request overview
This PR improves resilience of the graph build when a specific tree_sitter_language_pack grammar hangs or fails to load by (1) avoiding module-import-time parser loading and (2) proactively probing parser loadability in a short-lived child process, so only the problematic language is skipped rather than blocking the entire build.
Changes:
- Lazily import `tree_sitter_language_pack` inside `CodeParser._get_parser()` instead of at module import time.
- Add a subprocess-based probe (`_parser_load_probe_succeeds`) to detect hanging/failing language bindings and mark only that language as unavailable.
- Add a regression test ensuring a probe timeout marks a language unavailable and `_get_parser()` returns `None`.
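
For readers who have not opened the diff, the probe idea in the second bullet can be sketched roughly as below. This is an illustration under assumptions, not the PR's actual implementation: the helper name `snippet_runs_in_time`, the 5-second default, and the exact child-process snippet are all hypothetical. Only `tree_sitter_language_pack.get_parser` and the `_parser_load_probe_succeeds` name come from the PR.

```python
import subprocess
import sys


def snippet_runs_in_time(snippet: str, *args: str, timeout_seconds: float = 5.0) -> bool:
    """Run `snippet` in a fresh interpreter; True only if it exits 0 before the timeout.

    Hypothetical helper. subprocess.run kills the child when the timeout
    expires, so a hanging import cannot block the calling process.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", snippet, *args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return False
    return completed.returncode == 0


# Code executed in the child process. With `-c`, the language name passed
# after the snippet arrives as sys.argv[1].
_LOAD_SNIPPET = (
    "import sys\n"
    "import tree_sitter_language_pack\n"
    "tree_sitter_language_pack.get_parser(sys.argv[1])\n"
)


def parser_load_probe_succeeds(language: str, timeout_seconds: float = 5.0) -> bool:
    """Assumed shape of the probe: load one grammar in a short-lived child process."""
    return snippet_runs_in_time(_LOAD_SNIPPET, language, timeout_seconds=timeout_seconds)
```

In this sketch a hang, a crash, and a failed import in the child all produce `False`. The real probe may well distinguish those cases.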
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| code_review_graph/parser.py | Adds subprocess probing + unavailable-language tracking; switches to lazy import of tree_sitter_language_pack. |
| tests/test_parser.py | Adds a test validating timeout probing behavior and resets the unavailable-language cache between tests. |

    def _get_parser(self, language: str):  # type: ignore[arg-type]
        if language in _UNAVAILABLE_LANGUAGES:
            return None
        if language not in self._parsers:
            if not _parser_load_probe_succeeds(language):
                _UNAVAILABLE_LANGUAGES.add(language)
                logger.warning("Skipping unavailable tree-sitter parser for %s", language)
                return None

I left this out of v2.3.4. The goal is reasonable, but parser availability and parser start-up behaviour are risky areas. This needs a smaller failure-mode description and focused tests showing which unavailable grammars are skipped and which parser failures should still surface. It is better suited to a parser reliability pass than to this release.
Summary
- Lazily load `tree_sitter_language_pack` parsers instead of importing the package at module import time

Why

On macOS with `tree-sitter-language-pack==0.13.0`, loading the TSX binding can hang inside `tree_sitter_language_pack.get_parser("tsx")`. Because parser loading previously happened in the main process, one broken language binding could block the entire graph build until an outer timeout killed it.

This change makes that failure language-scoped: the affected language is skipped, while all other languages still contribute graph context.
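
To make "language-scoped" concrete, here is a minimal sketch of the control flow. It is not the code in `code_review_graph/parser.py`. The class `LazyParserCache`, its injected `load` and `probe` callables, and `load_from_language_pack` are hypothetical names chosen so the example runs without the grammar package installed.

```python
from typing import Any, Callable, Dict, Optional, Set


def load_from_language_pack(language: str) -> Any:
    """Hypothetical loader showing the lazy import.

    The import lives inside the function body, so importing this module
    never touches the grammar bindings.
    """
    import tree_sitter_language_pack

    return tree_sitter_language_pack.get_parser(language)


class LazyParserCache:
    """Hypothetical cache: probe once per language, skip only the ones that fail."""

    def __init__(self, load: Callable[[str], Any], probe: Callable[[str], bool]) -> None:
        self._load = load
        self._probe = probe
        self._parsers: Dict[str, Any] = {}
        self.unavailable: Set[str] = set()

    def get(self, language: str) -> Optional[Any]:
        if language in self.unavailable:
            return None  # already known to be broken; do not probe again
        if language not in self._parsers:
            if not self._probe(language):
                self.unavailable.add(language)
                return None
            # Assumption: errors raised by the real load still propagate here.
            self._parsers[language] = self._load(language)
        return self._parsers[language]
```

With a probe that rejects only `"tsx"`, `get("tsx")` returns `None` and every other language loads normally. Whether load errors after a successful probe should propagate, as they do in this sketch, is one of the questions raised in the review above.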
Verification
- `uv run python -X faulthandler -m pytest tests/test_parser.py::TestCodeParser::test_parser_probe_timeout_marks_language_unavailable -q -p no:asyncio`
- `uv run python -m py_compile code_review_graph/parser.py tests/test_parser.py`
- `git diff --check`
- `code-review-graph build --skip-flows`: timed out after 10s
- `PYTHONPATH` with `CRG_PARSER_LOAD_TIMEOUT_SECONDS=1`: completed in ~1.9s, produced 640 nodes / 6555 edges while skipping unavailable `tsx`

Note: `ruff check` currently hangs in my local dev environment even with `--no-cache`; no lint output was produced.
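
The verification run sets `CRG_PARSER_LOAD_TIMEOUT_SECONDS=1`. One plausible way to read such a setting is sketched below. The function name, the 5-second fallback, and the handling of invalid values are assumptions for illustration. The PR description does not state the real default or parsing rules.

```python
import os
from typing import Mapping, Optional

# Assumed fallback; the PR description does not state the actual default.
DEFAULT_PARSER_LOAD_TIMEOUT_SECONDS = 5.0


def parser_load_timeout_seconds(environ: Optional[Mapping[str, str]] = None) -> float:
    """Read the probe timeout from the environment, falling back on bad input."""
    env = os.environ if environ is None else environ
    raw = env.get("CRG_PARSER_LOAD_TIMEOUT_SECONDS")
    if raw is None:
        return DEFAULT_PARSER_LOAD_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_PARSER_LOAD_TIMEOUT_SECONDS
    # Zero, negative, and NaN values fail this check and use the fallback.
    return value if value > 0 else DEFAULT_PARSER_LOAD_TIMEOUT_SECONDS
```

Passing a mapping instead of reading `os.environ` directly keeps the function easy to test without patching the process environment.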