Fix platform-dependent byte offset drift and Unicode alignment in tree-sitter-ng generators #444
Open
victorgveloso wants to merge 2 commits into
Root Causes Identified (see #439 for more details):
… System.lineSeparator(). On Windows, this forced CRLF (2 bytes) onto files that originally used LF (1 byte), causing a +1 byte drift for every line in the file.
… files containing multi-byte Unicode characters (e.g., emoji or non-Latin scripts).
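The line-ending drift described above can be illustrated with a minimal sketch. It is not the generator's actual code: the class name, the helper methods, and the sample input are hypothetical, and the split-then-join step only mimics the effect of re-joining lines with a platform separator. The separator is passed in explicitly so the Windows behaviour can be reproduced on any platform.

```java
import java.nio.charset.StandardCharsets;

public class LineSeparatorDrift {
    /** UTF-8 byte offset at which line `lineIndex` (0-based) starts in `text`. */
    static int lineStartByteOffset(String text, int lineIndex) {
        int charPos = 0;
        for (int i = 0; i < lineIndex; i++) {
            int nl = text.indexOf('\n', charPos);
            if (nl < 0) {
                throw new IllegalArgumentException("no such line: " + lineIndex);
            }
            charPos = nl + 1; // first character after the line feed
        }
        return text.substring(0, charPos).getBytes(StandardCharsets.UTF_8).length;
    }

    /** Hypothetical stand-in for the problematic step: split into lines, re-join with `separator`. */
    static String rejoin(String text, String separator) {
        return String.join(separator, text.split("\r?\n", -1));
    }

    public static void main(String[] args) {
        String original = "a = 1\nb = 2\nc = 3\n"; // file stored with LF endings
        String rejoined = rejoin(original, "\r\n"); // what a CRLF platform separator would produce
        for (int line = 0; line < 3; line++) {
            int before = lineStartByteOffset(original, line);
            int after = lineStartByteOffset(rejoined, line);
            System.out.println("line " + line + ": " + before + " -> " + after
                    + " (drift " + (after - before) + ")");
        }
    }
}
```

In this sketch the start of line n moves by n bytes, because every preceding line gained one byte. Offsets computed on the re-joined text therefore no longer index into the bytes that are actually on disk.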
The Fix:
Verification:
We reproduced the bug by comparing the official Tree-sitter CLI (v0.26.9) against GumTree on a large Python project (FastAPI), and verified the fix on both Windows 10 (PowerShell) and Linux (WSL/Ubuntu).
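I have added three new test cases to AbstractTreeSitterNgGeneratorTest that specifically target these discrepancies; they appear in the report below as OffsetConsistency_testMultiByteOffsets, OffsetConsistency_testCRLFOffsets, and OffsetConsistency_testLFOffsets.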
Test report before fix (Failing on Windows/WSL)
AbstractTreeSitterNgGeneratorTest > testMatchNodeOrAncestorTypes() PASSED
AbstractTreeSitterNgGeneratorTest > OffsetConsistency_testMultiByteOffsets() FAILED
org.opentest4j.AssertionFailedError: Line 2 should start at byte offset 7 after a 4-byte emoji and LF ==> expected: <7> but was: <5>
at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:563)
at app//com.github.gumtreediff.gen.treesitterng.AbstractTreeSitterNgGeneratorTest.OffsetConsistency_testMultiByteOffsets(AbstractTreeSitterNgGeneratorTest.java:71)
AbstractTreeSitterNgGeneratorTest > OffsetConsistency_testCRLFOffsets() FAILED
org.opentest4j.AssertionFailedError: Line 2 should start at byte offset 7 for CRLF content ==> expected: <7> but was: <6>
at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:563)
at app//com.github.gumtreediff.gen.treesitterng.AbstractTreeSitterNgGeneratorTest.OffsetConsistency_testCRLFOffsets(AbstractTreeSitterNgGeneratorTest.java:58)
AbstractTreeSitterNgGeneratorTest > OffsetConsistency_testLFOffsets() PASSED
4 tests completed, 2 failed
FAILURE: Build failed with an exception.
Execution failed for task ':gen.treesitter-ng:test'.
BUILD FAILED in 11s
13 actionable tasks: 3 executed, 10 up-to-date
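The numbers in the two failure messages above can be reproduced with simple arithmetic. The sketch below uses hypothetical inputs, since the real test fixtures are not shown in this report: one first line made of two ASCII characters, a 4-byte emoji, and LF; and one 5-character first line terminated by CRLF. It compares UTF-8 byte offsets with Java String indices (UTF-16 code units), and CRLF content with the same content collapsed to LF. That this is exactly what the generator did before the fix is an assumption, though it is consistent with the reported values.

```java
import java.nio.charset.StandardCharsets;

public class OffsetArithmetic {
    /** Start of the second line counted in UTF-16 code units (Java String indices). */
    static int secondLineCharOffset(String text) {
        return text.indexOf('\n') + 1;
    }

    /** Start of the second line counted in UTF-8 bytes. */
    static int secondLineByteOffset(String text) {
        String firstLine = text.substring(0, secondLineCharOffset(text));
        return firstLine.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Hypothetical input: "x=" (2 bytes) + U+1F600 (4 bytes in UTF-8) + LF (1 byte).
        String emoji = "x=\uD83D\uDE00\ny=1";
        System.out.println(secondLineByteOffset(emoji)); // 7: 2 + 4 + 1 bytes
        System.out.println(secondLineCharOffset(emoji)); // 5: the emoji is 2 UTF-16 code units

        // Hypothetical input: "x = 1" (5 bytes) + CR LF (2 bytes).
        String crlf = "x = 1\r\ny = 2";
        System.out.println(secondLineByteOffset(crlf)); // 7
        // Collapsing CRLF to LF loses one byte per line.
        System.out.println(secondLineByteOffset(crlf.replace("\r\n", "\n"))); // 6
    }
}
```

Under these assumptions, "expected 7 but was 5" matches an offset counted in UTF-16 code units rather than bytes, and "expected 7 but was 6" matches CRLF content that was rewritten with LF before offsets were computed.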
Test report after fix (Passing all 57 project tasks)
$ ./gradlew test
Starting a Gradle Daemon, 1 incompatible and 1 stopped Daemons could not be reused, use --status for details
TestActionGenerator > testAlignChildren() PASSED
TestActionGenerator > testWithUnmappedRoot() PASSED
TestActionGenerator > testWithActionExample() PASSED
TestActionGenerator > testWithZsCustomExample() PASSED
TestActionGenerator > testWithActionExampleNoMove() PASSED
TestActionIoUtils > testActionsIoUtilsMove() PASSED
TestActionIoUtils > testActionsIoUtilsInsert() PASSED
TestActionIoUtils > testActionsIoUtilsUpdate() PASSED
TestAlgorithms > testLcs() PASSED
TestAlgorithms > testLcss() PASSED
TestAlgorithms > testHungarianAlgorithm() PASSED
TestAutoMatcher > testAutoMatcher() PASSED
TestCdMatcher > testLeafMatcher() PASSED
TestClassicGumtreeStability > testStability() PASSED
TestDefaultPriorityTreeQueue > testPopOpenWithHeight() PASSED
TestDefaultPriorityTreeQueue > testPopOpenWithSize() PASSED
TestDefaultPriorityTreeQueue > testPopOpenWithSizeAndMinPriority() PASSED
TestDefaultPriorityTreeQueue > testSynchronize() PASSED
TestDiff > testComputeWithReaders() PASSED
TestDirectoryComparator > testDirectoryComparatorOnTwoFolders() PASSED
TestDirectoryComparator > testPairFilesInvalidArguments() PASSED
TestDirectoryComparator > testPairAndUnpairFiles() PASSED
TestDirectoryComparator > testDirectoryComparatorOnFileAndFolder() PASSED
TestDirectoryComparator > testDirectoryComparatorOnNonExistentFiles() PASSED
TestDirectoryComparator > testDirectoryComparatorOnTwoFiles() PASSED
TestGumtreeMatcher > testMappingComparatorPosInParent() PASSED
TestGumtreeMatcher > testMinHeightThreshold() PASSED
TestGumtreeMatcher > testSimAndSizeThreshold() PASSED
TestGumtreeMatcher > testMappingComparatorPosInTree() PASSED
TestGumtreeProperties > testGreedyBottomUpMatcher() PASSED
TestGumtreeProperties > testAbstractSubtreeMatcher() PASSED
TestGumtreeProperties > testCompositeMatcher() PASSED
TestGumtreeProperties > testBottomUpMatcher() PASSED
TestGumtreeProperties > testChangeDistillerBottomUpMatcher() PASSED
TestGumtreeProperties > testChangeDistillerLeavesMatcher() PASSED
TestIdMatcher > testIdMatcher() PASSED
TestMappingComparators > testTwinMappings() PASSED
TestMappingStore > testMultiMappingStore() PASSED
TestMappingStore > testMappingStore() PASSED
TestMetadata > testExportInvalid1() PASSED
TestMetadata > testExportInvalid2() PASSED
TestMetadata > testPutNode() PASSED
TestMetadata > testExportCustom() PASSED
TestMetadata > testGlobalIterator() PASSED
TestMetadata > testLocalIterator() PASSED
TestOptimizedMatchers > testRtedThetaMatcher() PASSED
TestOptimizedMatchers > testChangeDistillerThetaParMatcher() PASSED
TestOptimizedMatchers > testClassicGumtreeThetaMatcher() PASSED
TestPair > testToString() PASSED
TestPair > testEquals() PASSED
TestRegistries > testTreeGenerators() PASSED
TestRegistries > testMatchers() PASSED
TestRtedMatcher > testRtedMatcher() PASSED
TestSequenceAlgorithms > testITreeLcssIsomorphism() PASSED
TestSequenceAlgorithms > testLcs() PASSED
TestSequenceAlgorithms > testITreeLcss() PASSED
TestSequenceAlgorithms > testStringLcss() PASSED
TestSimilarityMetrics > testOverlapSimilarity() PASSED
TestSimilarityMetrics > testChawatheSimilarity() PASSED
TestSimilarityMetrics > testJaccardSimilarity() PASSED
TestSimilarityMetrics > testDiceSimilarity() PASSED
TestTree > testGetDescendants() PASSED
TestTree > testImmutable() PASSED
TestTree > testChildUrl() PASSED
TestTree > testToString() PASSED
TestTree > testTypeThreading() PASSED
TestTree > testInsertChild() PASSED
TestTree > testGetParents() PASSED
TestTree > testIsomophism() PASSED
TestTree > testTypesAndLabels() PASSED
TestTree > testIsostructure() PASSED
TestTree > testTreesBetweenPositions() PASSED
TestTree > testDeepCopy() PASSED
TestTree > testSearchSubtree() PASSED
TestTree > testIsRoot() PASSED
TestTree > testFakeTree() PASSED
TestTree > testIsClone() PASSED
TestTree > testChildManipulation() PASSED
TestTreeClassifier > testOnlyRootsClassifier() PASSED
TestTreeClassifier > testAllNodesClassifier() PASSED
TestTreeIoUtils > testSerializeTree() PASSED
TestTreeIoUtils > testLineReader() PASSED
TestTreeIoUtils > testPrintTextTree() PASSED
TestTreeIoUtils > testDotFormatter() PASSED
TestTreeUtils > testPostOrder() PASSED
TestTreeUtils > testPreOrderList() PASSED
TestTreeUtils > testBfs() PASSED
TestTreeUtils > testDepth() PASSED
TestTreeUtils > testBfs2() PASSED
TestTreeUtils > testBfs3() PASSED
TestTreeUtils > testHash() PASSED
TestTreeUtils > testSize() PASSED
TestTreeUtils > testBfsList() PASSED
TestTreeUtils > testPostOrderNumbering() PASSED
TestTreeUtils > testLeafIterator() PASSED
TestTreeUtils > testPostOrder2() PASSED
TestTreeUtils > testPostOrder3() PASSED
TestTreeUtils > testHashValue() PASSED
TestTreeUtils > testHeight() PASSED
TestZsMatcher > testWithCustomExample() PASSED
TestZsMatcher > testWithSlideExample() PASSED
TestCssTreeGenerator > badSyntax() PASSED
TestCssTreeGenerator > testSimple() PASSED
TestJavaParserGenerator > testBadSyntax() PASSED
TestJavaParserGenerator > testRange() PASSED
TestJavaParserGenerator > testSimpleSyntax(String, int, String) > [1] CompilationUnit, 12, package foo.bar; public class Foo { public int foo; } PASSED
TestJavaParserGenerator > testSimpleSyntax(String, int, String) > [2] CompilationUnit, 37, public class Foo { public List foo; public void foo() { for (A f : foo)
{ System.out.println(f); } } } PASSED
TestJavaParserGenerator > testSimpleSyntax(String, int, String) > [3] CompilationUnit, 23, public class Foo {
public void foo() {
new ArrayList().stream().forEach(a -> {});
}
} PASSED
TestJdtGenerator > testTagElement() PASSED
TestJdtGenerator > testPrefixExpression() PASSED
TestJdtGenerator > testArrayCreation() PASSED
TestJdtGenerator > testIds() PASSED
TestJdtGenerator > testComments2() PASSED
TestJdtGenerator > testClassReservedKeywords() PASSED
TestJdtGenerator > testMethodInvocation() PASSED
TestJdtGenerator > testJava8Syntax() PASSED
TestJdtGenerator > testSimpleSyntax() PASSED
TestJdtGenerator > testVarargs() PASSED
TestJdtGenerator > testComments() PASSED
TestJdtGenerator > testAssignment() PASSED
TestJdtGenerator > testGenericFunctionWithTypeParameter() PASSED
TestJdtGenerator > testTypeDefinition() PASSED
TestJdtGenerator > testClassReservedKeywords2() PASSED
TestJdtGenerator > testClassReservedKeywords3() PASSED
TestJdtGenerator > testJava5Syntax() PASSED
TestJdtGenerator > testPostfixExpression() PASSED
TestJdtGenerator > testEnumReservedKeywords() PASSED
TestJdtGenerator > badSyntax() PASSED
TestJdtGenerator > testRecordReservedKeywords() PASSED
TestJdtGenerator > testInfixOperator() PASSED
TestJdtMatching > testCase_1_20391Classic() SKIPPED
TestJdtMatching > testSpurious1WithClassicDefault() SKIPPED
TestJdtMatching > testSpurious1WithSimple() SKIPPED
TestJdtMatching > testCase_1_0007_Classic() SKIPPED
TestJdtMatching > testSpurious1WithClassic1_Default_0007d191fec7fe2d6a0c4e87594cb286a553f92c() SKIPPED
TestJdtMatching > testCase_1_0a66_Simple() SKIPPED
TestJdtMatching > testNotSpurious1() SKIPPED
TestJdtMatching > testSpurious1WithClassicConfiguredGreedySubtreeMatcher() SKIPPED
TestJdtMatching > testSpurious1WithClassicConfiguredGreedyBottomUpMatcher() SKIPPED
TestJdtMatching > testCase_1_0007_Simple() SKIPPED
TestJdtMatching > testSpurious1WithClassic_Configured_1_0007d191fec7fe2d6a0c4e87594cb286a553f92c() SKIPPED
TestJdtMatching > testSpurious1WithClassic_Configured4Passing_1_0007d191fec7fe2d6a0c4e87594cb286a553f92c() SKIPPED
TestJdtMatching > testSpurious1WithSimple_0007d191fec7fe2d6a0c4e87594cb286a553f92c() SKIPPED
TestJsGenerator > testStatement() PASSED
TestJsGenerator > testComment() PASSED
TestJsGenerator > testLambda() PASSED
TestJsGenerator > badSyntax() PASSED
TestJsonTreeGenerator > testSyntaxError1() PASSED
TestJsonTreeGenerator > testSyntaxError2() PASSED
TestJsonTreeGenerator > testSyntaxError3() PASSED
TestJsonTreeGenerator > testJsonArray() PASSED
TestJsonTreeGenerator > testJsonObject() PASSED
WARNING: A restricted method in java.lang.System has been called
WARNING: java.lang.System::load has been called by org.treesitter.utils.NativeUtils in an unnamed module (file:/home/victor/.gradle/caches/modules-2/files-2.1/io.github.bonede/tree-sitter/0.26.6/a06f40ff61859e602985bc8850ebe28d3f54ebd0/tree-sitter-0.26.6.jar)
WARNING: Use --enable-native-access=ALL-UNNAMED to avoid a warning for callers in this module
WARNING: Restricted methods will be blocked in a future release unless native access is enabled
AbstractTreeSitterNgGeneratorTest > testMatchNodeOrAncestorTypes() PASSED
CMakeTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
CTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
CppTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
GoTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
HaskellTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
JavaScriptTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
JavaTreeSitterNgTreeGeneratorTest > testUnicodeInComment() PASSED
JavaTreeSitterNgTreeGeneratorTest > testUnicodeInString() PASSED
JavaTreeSitterNgTreeGeneratorTest > testCommentLine() PASSED
JavaTreeSitterNgTreeGeneratorTest > testAffectationOperatorChange() PASSED
JavaTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
KotlinTreeSitterNgTreeGeneratorTest > testUnicodeInString() PASSED
KotlinTreeSitterNgTreeGeneratorTest > testAffectationOperatorChange() PASSED
KotlinTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
OcamlTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
PhpTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
PythonTreeSitterNgTreeGeneratorTest > testComparisonOperators(String) > [1] < PASSED
PythonTreeSitterNgTreeGeneratorTest > testComparisonOperators(String) > [2] <= PASSED
PythonTreeSitterNgTreeGeneratorTest > testComparisonOperators(String) > [3] > PASSED
PythonTreeSitterNgTreeGeneratorTest > testComparisonOperators(String) > [4] >= PASSED
PythonTreeSitterNgTreeGeneratorTest > testComparisonOperators(String) > [5] == PASSED
PythonTreeSitterNgTreeGeneratorTest > testComparisonOperators(String) > [6] != PASSED
PythonTreeSitterNgTreeGeneratorTest > testBooleanOperators(String) > [1] and PASSED
PythonTreeSitterNgTreeGeneratorTest > testBooleanOperators(String) > [2] or PASSED
PythonTreeSitterNgTreeGeneratorTest > testAssignmentOperators(String) > [1] = PASSED
PythonTreeSitterNgTreeGeneratorTest > testAssignmentOperators(String) > [2] += PASSED
PythonTreeSitterNgTreeGeneratorTest > testAssignmentOperators(String) > [3] -= PASSED
PythonTreeSitterNgTreeGeneratorTest > testAssignmentOperators(String) > [4] *= PASSED
PythonTreeSitterNgTreeGeneratorTest > testAssignmentOperators(String) > [5] /= PASSED
PythonTreeSitterNgTreeGeneratorTest > testAssignmentOperators(String) > [6] //= PASSED
PythonTreeSitterNgTreeGeneratorTest > testAssignmentOperators(String) > [7] %= PASSED
PythonTreeSitterNgTreeGeneratorTest > testAssignmentOperators(String) > [8] **= PASSED
PythonTreeSitterNgTreeGeneratorTest > testString() PASSED
PythonTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
RTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
RubyTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
RustTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
SwiftTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
TsxTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
TypeScriptTreeSitterNgTreeGeneratorTest > testHelloWorld() PASSED
TestXmlTreeGenerator > testSimpleSyntax() PASSED
TestXmlTreeGenerator > testXmlDeclaration() PASSED
TestYamlGenerator > testSyntaxError() PASSED
TestYamlGenerator > testSimpleSyntax() PASSED
[Incubating] Problems report is available at: file:///mnt/c/Users/victor/Downloads/gumtree/build/reports/problems/problems-report.html
BUILD SUCCESSFUL in 1m 52s
57 actionable tasks: 45 executed, 12 up-to-date
Consider enabling configuration cache to speed up this build: https://docs.gradle.org/9.5.1/userguide/configuration_cache_enabling.html