diff --git a/CHANGELOG.md b/CHANGELOG.md index 90dcbfca5..e438cf6bf 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -28,6 +28,9 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). - A stuck git command can no longer hang CodeGraph indefinitely. The git checks behind worktree detection and git-hook setup, and the installer's optional `npm install -g` step, now time out and fail gracefully instead of blocking forever — this matters most for the background MCP server, where an unbounded git hang (network filesystems, a wedged fsmonitor) could previously freeze it long enough for the safety watchdog to kill it. Thanks @inth3shadows for the report. (#1139) - The context hook's new plain-words matching works immediately on projects indexed by an older CodeGraph version. The word lookup it relies on is built at index time, so a project indexed before the upgrade had an empty one, and the hook would silently find nothing until something else happened to refresh the index; the hook now fills it in on first use (a one-time step — normally the background MCP server's startup catch-up gets there first). Thanks @inth3shadows for the report. (#1142) - Several accuracy fixes to the plain-words matching: a renamed symbol (for example a NestJS route after its module prefix is applied) stays findable under its new name (#1141); a word that only appears in your code as an import statement's package name is no longer presented as a matched symbol (#1144); plural words no longer generate garbled lookup keys ("services" no longer also looks up "servic") (#1145); and a name matching both the singular and plural of one word can no longer squeeze out a genuine two-word match (#1146). Thanks @inth3shadows for the reports. +- Heavily-reflected Unreal Engine C++ classes are no longer dropped from the index. 
The reflection macros that decorate members (`UPROPERTY(...)`, `UFUNCTION(...)`, `UCLASS(...)`, `GENERATED_BODY()`, `UE_DEPRECATED_*(...)`, `DECLARE_DELEGATE_*(...)`) are no-semicolon macro calls that tree-sitter doesn't recognize, so each drops into error recovery; in a big class the errors pile up until the whole `class_specifier` collapses and the class, its base clause, and its members vanish (`UCharacterMovementComponent`, with ~240 such macros, disappeared entirely, breaking every subclass/type-hierarchy and blast-radius query that went through it). These line-leading annotation macros are now blanked (offset-preserving) before parsing so the class survives. Thanks @luoyxy for the report and root-cause analysis. (#1093 follow-up) +- Unreal Engine members and methods prefixed by an export/visibility macro are no longer lost. The `*_API` macro doesn't only sit on the class header — it prefixes almost every exported member of a large UE class (`ENGINE_API virtual void Tick(...)`, `static ENGINE_API void AddReferencedObjects(...)`); the parser read the macro as an extra type token and each such declaration fell into error recovery, so on headers like `Actor.h` and `World.h` hundreds of return types piled up as orphan errors and could still tip the class into collapse. Member/method-level `*_API` / `*_EXPORT` / `*_ABI` macros (Unreal, Qt/Boost, LLVM) are now blanked before parsing, mirroring the existing class-header recovery. (#1093 follow-up) +- Unreal Engine annotation macros that appear mid-line — an enum value's `UMETA(DisplayName=...)`, a parameter's `UPARAM(ref)`, or a deprecation tag wedged into a `using` alias (`using FOnNetTick UE_DEPRECATED(5.5, "...") = ...;`, which alone collapsed `UWorld` in `World.h`) — are now stripped too. These sit in positions the line-leading recovery structurally can't reach, and a single one could take down the surrounding enum or class. 
They are matched by an Unreal-only name list (`UMETA`, `UPARAM`, `UE_DEPRECATED*`) so no standard-C++ or other-library code is affected. Together these three fixes recover the main class of every large Unreal Engine header tested (`Actor`, `ActorComponent`, `SkeletalMeshComponent`, `World`, `LightComponent`, `CharacterMovementComponent`). (#1093 follow-up) ## [1.2.0] - 2026-07-02 diff --git a/__tests__/extraction.test.ts b/__tests__/extraction.test.ts index b5e9d6a64..a0a2cdf28 100644 --- a/__tests__/extraction.test.ts +++ b/__tests__/extraction.test.ts @@ -11,7 +11,7 @@ import * as os from 'os'; import { CodeGraph } from '../src'; import { extractFromSource, scanDirectory, buildDefaultIgnore, discoverEmbeddedRepoRoots, buildScopeIgnore } from '../src/extraction'; import { detectLanguage, isLanguageSupported, getSupportedLanguages, initGrammars, loadAllGrammars, isSourceFile } from '../src/extraction/grammars'; -import { stripCppTemplateArgs, blankCppExportMacros, blankCppInlineMacros, blankMetalAttributes, recoverMangledCppName } from '../src/extraction/languages/c-cpp'; +import { stripCppTemplateArgs, blankCppExportMacros, blankCppInlineMacros, blankMetalAttributes, blankCppAnnotationMacroCalls, blankCppApiPrefixMacros, blankCppInlineAnnotationMacros, recoverMangledCppName } from '../src/extraction/languages/c-cpp'; import { normalizePath } from '../src/utils'; beforeAll(async () => { @@ -2928,6 +2928,179 @@ kernel void computeBlur(texture2d inTexture [[texture(0)]], }); }); + describe('C++ in-body reflection-macro annotations do not collapse the class (UE)', () => { + // Unreal reflection markup — `UPROPERTY(...)`, `UFUNCTION(...)`, + // `GENERATED_BODY()`, `UE_DEPRECATED_*(...)`, `DECLARE_DELEGATE_*(...)` — are + // no-semicolon macro CALLS decorating members. 
tree-sitter doesn't know they + // are macros, so each drops into error recovery; in a heavily-reflected class + // the errors accumulate until the enclosing class_specifier can't close and + // the whole class (its base clause and members) collapses into an ERROR node + // and disappears from the graph. blankCppAnnotationMacroCalls strips them, + // offset-preserving, so the class parses normally. + it('recovers a heavily-reflected class with multiple inheritance + members', () => { + const code = `UCLASS(MinimalAPI) +class UMyMovement : public UPawnMovementComponent, public IRVOAvoidanceInterface, public INetworkPredictionInterface +{ +\tGENERATED_BODY() +public: +\tUE_DEPRECATED_FORGAME(5.0, "Deprecated; note the commas, and (parens) inside the string") +\tUPROPERTY(Category="Move", EditAnywhere, BlueprintReadWrite, meta=(ClampMin="0", UIMin="0")) +\tfloat MaxWalkSpeed; + +\tUFUNCTION(BlueprintCallable, Category="Move") +\tfloat ComputeSpeed() const { return MaxWalkSpeed * 2.0f; } +}; +`; + const result = extractFromSource('movement.cpp', code); + const cls = result.nodes.find((n) => n.kind === 'class' && n.name === 'UMyMovement'); + expect(cls).toBeTruthy(); + // The class body parses, so its inline method definition is extracted too — + // proof the class_specifier closed instead of collapsing into an ERROR node. + expect(result.nodes.some((n) => n.name === 'ComputeSpeed')).toBe(true); + // The base clause survives (inheritance queries keep working). 
+ expect( + result.unresolvedReferences.find( + (r) => r.referenceKind === 'extends' && r.referenceName === 'UPawnMovementComponent' + ) + ).toBeTruthy(); + }); + + it('strips line-leading no-semicolon ALL-CAPS calls, offset-preserving', () => { + const inp = `\tUPROPERTY(EditAnywhere, meta=(ClampMin="0"))\n\tfloat X;\n`; + const out = blankCppAnnotationMacroCalls(inp); + expect(out.length).toBe(inp.length); // every byte offset preserved + expect(out).not.toContain('UPROPERTY'); + expect(out).toContain('float X;'); + // A macro whose args carry commas/parens inside a string still balances. + const inp2 = `UE_DEPRECATED_FORGAME(5.0, "a, b (c)")\nUPROPERTY(Foo)\nfloat Y;\n`; + const out2 = blankCppAnnotationMacroCalls(inp2); + expect(out2.length).toBe(inp2.length); + expect(out2).not.toContain('UE_DEPRECATED_FORGAME'); + expect(out2).not.toContain('UPROPERTY'); + expect(out2).toContain('float Y;'); + }); + + it('does NOT blank expression / condition / statement / init-list macro uses', () => { + for (const c of [ + 'void f() {\n\tif (CHECK_FLAG(x)) { g(); }\n}', // condition — not line-leading + 'void f() {\n\tLOG_MESSAGE("hi");\n}', // statement call — trailing ; + 'C::C()\n\t: MEMBER_A(1)\n\t, MEMBER_B(2)\n{}', // init-list — comma / not line-leading + 'C::C() :\n\tMEMBER_A(1),\n\tMEMBER_B(2)\n{}', // init-list wrapped — trailing , / { + 'auto y =\n\tMAKE_THING(a) + 1;', // line-leading but an expression fragment + ]) { + expect(blankCppAnnotationMacroCalls(c)).toBe(c); + } + }); + }); + + describe('C++ member/method-level export macros do not orphan declarations (UE)', () => { + // The `*_API` visibility macro doesn't only prefix the class header — it + // prefixes almost every exported member/method of a big UE class + // (`ENGINE_API virtual void Tick(…)`, `static ENGINE_API void Foo(…)`). 
+ // blankCppExportMacros only recovers the class-HEADER form; without blanking + // the member form, tree-sitter reads the macro as an extra type + // token and each declaration drops into error recovery. + it('recovers a class + base + members when members are *_API-prefixed', () => { + const code = `class ENGINE_API AActor : public UObject +{ +\tGENERATED_BODY() +public: +\tENGINE_API virtual void Tick(float DeltaSeconds); +\tstatic ENGINE_API void AddReferencedObjects(int32 Count); +\tENGINE_API float GetLifeSpan() const { return LifeSpan; } +}; +`; + const result = extractFromSource('actor.cpp', code); + expect(result.nodes.some((n) => n.kind === 'class' && n.name === 'AActor')).toBe(true); + // The inline definition (its declaration prefixed by ENGINE_API) is extracted — + // proof the class_specifier closed instead of collapsing into an ERROR. + expect(result.nodes.some((n) => n.name === 'GetLifeSpan')).toBe(true); + // The base clause survives (inheritance queries keep working). + expect( + result.unresolvedReferences.find( + (r) => r.referenceKind === 'extends' && r.referenceName === 'UObject' + ) + ).toBeTruthy(); + }); + + it('blanks only the prefix macro before a declaration, offset-preserving', () => { + const inp = `ENGINE_API void Tick();\nstatic MYMOD_EXPORT int32 X;\nLLVM_ABI bool Y();\n`; + const out = blankCppApiPrefixMacros(inp); + expect(out.length).toBe(inp.length); // every byte offset preserved + expect(out).not.toContain('ENGINE_API'); + expect(out).not.toContain('MYMOD_EXPORT'); + expect(out).not.toContain('LLVM_ABI'); + expect(out).toContain('void Tick();'); + expect(out).toContain('int32 X;'); + expect(out).toContain('bool Y();'); + expect(out).toMatch(/static\s+int32 X;/); // `static` kept, only the macro blanked + }); + + it('does NOT blank an *_API token used as a value or in non-declaration position', () => { + for (const c of [ + 'int x = SOME_API;', // rvalue — trailing ; + 'if (mode == FOO_API) { g(); }', // comparison — trailing ) + 
'return DEFAULT_API, other;', // comma operand + 'auto v = NS_API::Make();', // qualified name — trailing :: + 'x = A_API + B_API;', // operands of + / trailing ; + ]) { + expect(blankCppApiPrefixMacros(c)).toBe(c); + } + }); + + it('leaves a longer word that merely contains _API alone', () => { + // A longer word merely CONTAINING _API (not ending in it) must not match. + const inp = 'FOO_APIENTRY handler;'; + expect(blankCppApiPrefixMacros(inp)).toBe(inp); + }); + }); + + describe('C++ mid-line UE annotation macros do not collapse the enum/class (UE)', () => { + // UMETA / UPARAM / UE_DEPRECATED can sit MID-LINE (not line-leading), where + // blankCppAnnotationMacroCalls structurally can't reach them: an enum value's + // `UMETA(...)`, or a deprecation tag wedged into a class-scope `using` + // (`using X UE_DEPRECATED(5.5, "…") = …;`) — which alone collapsed UWorld in + // World.h. blankCppInlineAnnotationMacros strips them, offset-preserving. + it('recovers a class whose in-body using-alias carries a mid-line UE_DEPRECATED', () => { + const code = `class ENGINE_API UWorld : public UObject +{ +\tGENERATED_BODY() +public: +\tusing FOnNetTickEvent UE_DEPRECATED(5.5, "use TMulticastDelegate") = TMulticastDelegate; +\tENGINE_API float GetTimeSeconds() const { return TimeSeconds; } +}; +`; + const result = extractFromSource('world.cpp', code); + expect(result.nodes.some((n) => n.kind === 'class' && n.name === 'UWorld')).toBe(true); + // The member after the poison using-alias is reached — the class closed. 
+ expect(result.nodes.some((n) => n.name === 'GetTimeSeconds')).toBe(true); + expect( + result.unresolvedReferences.find( + (r) => r.referenceKind === 'extends' && r.referenceName === 'UObject' + ) + ).toBeTruthy(); + }); + + it('blanks mid-line UMETA/UPARAM/UE_DEPRECATED with balanced parens, offset-preserving', () => { + const inp = `enum class EMode : uint8 {\n\tWalk UMETA(DisplayName="Walk (fast), safe"),\n\tRun\n};\n`; + const out = blankCppInlineAnnotationMacros(inp); + expect(out.length).toBe(inp.length); + expect(out).not.toContain('UMETA'); + expect(out).toContain('Walk'); + expect(out).toContain('Run'); + const inp2 = `void F(UPARAM(ref) int& x) {}\n`; + const out2 = blankCppInlineAnnotationMacros(inp2); + expect(out2.length).toBe(inp2.length); + expect(out2).not.toContain('UPARAM'); + expect(out2).toContain('int& x'); + }); + + it('does NOT touch source without those UE-only macro names', () => { + const c = 'enum class E { A, B };\nvoid metadata(int meta) { return; }\n'; + expect(blankCppInlineAnnotationMacros(c)).toBe(c); + }); + }); + describe('C++ forward declarations do not mint phantom class nodes (#1093)', () => { // `class Foo;` parses as a bodiless class_specifier. 
Repeated across headers, // each forward decl minted a phantom bodiless `class` node that crowded out — diff --git a/src/extraction/languages/c-cpp.ts b/src/extraction/languages/c-cpp.ts index 983d1a5c1..6dc57f5fe 100644 --- a/src/extraction/languages/c-cpp.ts +++ b/src/extraction/languages/c-cpp.ts @@ -384,11 +384,182 @@ export function blankMetalAttributes(source: string): string { return source.replace(METAL_ATTRIBUTE_RE, (m) => ' '.repeat(m.length)); } -/** C/C++ source pre-processing before tree-sitter: recover both macro-annotated - * class definitions and macro-prefixed function definitions — plus, for `.metal` +/** + * Blank annotation-style macro invocations that decorate a declaration but carry + * NO terminating semicolon — the pervasive Unreal-Engine reflection markup + * (`UPROPERTY(...)`, `UFUNCTION(...)`, `UCLASS(...)`, `GENERATED_BODY()`, + * `UE_DEPRECATED_FORGAME(...)`, `DECLARE_DELEGATE_*(...)`, …) that sits on its + * own line right before a member/type. tree-sitter's C++ grammar doesn't know + * these are macros, so each one drops into error recovery; in a big reflected + * class (`CharacterMovementComponent.h` has ~240 of them) the errors accumulate + * until the enclosing `class_specifier` can't close and collapses into an ERROR + * node — the whole class definition, its members, and its `extends` edges vanish + * from the graph. Neither `blankCppExportMacros` (class-header export macros) nor + * `blankCppInlineMacros` (return-type inline specifiers) touches these in-body + * markup macros. Replacing each with equal-length spaces preserves every byte + * offset (so line/column stay exact) and the class then parses normally. 
+ * + * Deliberately name-list-FREE — UE alone has hundreds of such macros and projects + * add their own — so it keys on structure, not a curated list, matched tightly to + * avoid touching legitimate C++: + * - the macro must be the FIRST non-whitespace token on its line (`^[ \t]*`), + * which is where declaration markup lives — so a macro used inside an + * expression or condition (`if (CHECK(x))`, `x = MACRO(a) + b`) is never + * matched (it isn't line-leading); + * - the name must be ALL-CAPS (`[A-Z][A-Z0-9_]{2,}`), since ordinary + * function/type names called at line start are lower/mixed case; + * - the char after the balanced `(...)` must START A DECLARATION — a letter, + * `_`, `~` (destructor), or `#` (a following directive). Declaration markup is + * always followed by the thing it decorates (`UPROPERTY(...)\n float X;`, + * `UE_DEPRECATED(...) UPROPERTY(...)`), whereas a statement call is followed by + * `;` (`FOO(x);`), an init-list item by `,`/`{`, and an expression fragment by + * an operator (`MAKE(a) + 1`) — all rejected. String/char literals inside the + * args are skipped so an embedded `)` can't mis-close the balance. + * + * C++-only (wired into cppExtractor). A blanked macro inside a block comment is + * harmless (comments don't parse), and the rare line-leading no-semicolon + * ALL-CAPS call that isn't markup only loses that one annotation, never a whole + * class. + */ +export function blankCppAnnotationMacroCalls(source: string): string { + if (!/^[ \t]*[A-Z][A-Z0-9_]{2,}\s*\(/m.test(source)) return source; + const chars = source.split(''); + const re = /^([ \t]*)([A-Z][A-Z0-9_]{2,})(\s*)\(/gm; + let m: RegExpExecArray | null; + while ((m = re.exec(source)) !== null) { + const macroStart = m.index + (m[1] ?? 
'').length; // skip leading indent + let i = m.index + m[0].length - 1; // index of the opening '(' + let depth = 0; + let end = -1; + for (; i < source.length; i++) { + const c = source[i]; + if (c === '"' || c === "'") { + const quote = c; + i++; + while (i < source.length && source[i] !== quote) { + if (source[i] === '\\') i++; + i++; + } + continue; + } + if (c === '(') depth++; + else if (c === ')') { + depth--; + if (depth === 0) { end = i + 1; break; } + } + } + if (end < 0) continue; + let j = end; + while (j < source.length && /\s/.test(source[j] as string)) j++; + const after = source[j]; + // Only markup is followed by the declaration it decorates; a statement call + // (`;`), init-list item (`,`/`{`), or expression fragment (operator) is not. + if (!after || !/[A-Za-z_~#]/.test(after)) continue; + for (let k = macroStart; k < end; k++) { + if (chars[k] !== '\n' && chars[k] !== '\r') chars[k] = ' '; + } + re.lastIndex = end; + } + return chars.join(''); +} + +/** + * Blank an export/visibility macro sitting in front of a *member* or *method* + * declaration inside a class/namespace (`ENGINE_API virtual void Tick(…)`, + * `static ENGINE_API void AddReferencedObjects(…)`, `UE_API FVector GetVel() + * const`), before parsing. `blankCppExportMacros` only recovers the macro in a + * `class MACRO Name` *header*; the very same macro also prefixes almost every + * exported member of a big Unreal-Engine class, and tree-sitter — not knowing + * it's a macro — reads the macro as an extra type token and + * drops each such declaration into error recovery. In a heavily-exported header + * (`Actor.h`, `World.h`, …) hundreds of these accumulate: the return types pile + * up as orphan ERROR tokens and, combined with other markup, can still tip the + * enclosing class into collapse. Replacing the macro with equal-length spaces + * preserves every byte offset (line/column stay exact) and each member parses + * as an ordinary declaration. 
+ * + * Matched tightly so it can't touch the same token used as a value + * (`int x = SOME_API;`, `if (mode == FOO_API)`): the token must be ALL-CAPS AND + * end in the conventional visibility-macro suffix `_API` / `_EXPORT` / `_ABI` + * (Unreal `*_API`, Qt/Boost `*_EXPORT`, LLVM `*_ABI`) — ordinary identifiers + * effectively never carry these suffixes — and must be immediately followed by + * whitespace then a declaration token (`\s+[A-Za-z_]`: a type, `virtual`, + * `static`, or the name). A value use is instead followed by `;`, `)`, `,`, + * `=`, `::`, or an operator, all of which fail the look-ahead. C++-only (wired + * into cppExtractor). + */ +const CPP_API_PREFIX_RE = /\b[A-Z][A-Z0-9_]*(?:_API|_EXPORT|_ABI)\b(?=\s+[A-Za-z_])/g; +export function blankCppApiPrefixMacros(source: string): string { + if (!/_(?:API|EXPORT|ABI)\b/.test(source)) return source; + return source.replace(CPP_API_PREFIX_RE, (m) => ' '.repeat(m.length)); +} + +/** + * Blank an Unreal-Engine annotation macro that appears MID-LINE (not + * line-leading, so `blankCppAnnotationMacroCalls` never sees it) inside a + * declaration: an enum value's `UMETA(DisplayName="…")`, a parameter's + * `UPARAM(ref)`, or a deprecation tag wedged into a `using`/member declaration + * (`using FOnNetTick UE_DEPRECATED(5.5, "…") = TMulticastDelegate;` + * in `World.h`, which otherwise collapses `UWorld`). tree-sitter can't reconcile + * these embedded macro calls and drops into error recovery, and a mid-line one + * inside a big enum or a class-scope `using` can cascade into the whole enum / + * class being lost. Replacing the entire `MACRO(...)` (balanced parens, string + * literals skipped so an embedded `)` can't mis-close) with equal-length spaces + * preserves every byte offset and the declaration parses normally. 
+ * + * Keyed on an explicit UE-only name list (`UMETA`, `UPARAM`, and the + * `UE_DEPRECATED*` family) — these identifiers are exclusive to Unreal's + * reflection layer and appear in no standard-C++ or other-library code, so + * blanking them is zero-risk to non-UE sources. (The line-LEADING forms of + * `UE_DEPRECATED(...)` are already handled by `blankCppAnnotationMacroCalls`; + * this covers the mid-line forms it structurally can't.) C++-only. + */ +const CPP_INLINE_ANNOTATION_RE = /\b(?:UMETA|UPARAM|UE_DEPRECATED\w*)\s*\(/g; +export function blankCppInlineAnnotationMacros(source: string): string { + if (!/\b(?:UMETA|UPARAM|UE_DEPRECATED)/.test(source)) return source; + const chars = source.split(''); + const re = new RegExp(CPP_INLINE_ANNOTATION_RE.source, 'g'); + let m: RegExpExecArray | null; + while ((m = re.exec(source)) !== null) { + let i = m.index + m[0].length - 1; // index of the opening '(' + let depth = 0; + let end = -1; + for (; i < source.length; i++) { + const c = source[i]; + if (c === '"' || c === "'") { + const quote = c; + i++; + while (i < source.length && source[i] !== quote) { + if (source[i] === '\\') i++; + i++; + } + continue; + } + if (c === '(') depth++; + else if (c === ')') { + depth--; + if (depth === 0) { end = i + 1; break; } + } + } + if (end < 0) continue; + for (let k = m.index; k < end; k++) { + if (chars[k] !== '\n' && chars[k] !== '\r') chars[k] = ' '; + } + re.lastIndex = end; + } + return chars.join(''); +} + +/** C/C++ source pre-processing before tree-sitter: recover macro-annotated class + * definitions, macro-prefixed function definitions, macro-prefixed members, and + * macro-decorated members (Unreal-Engine reflection markup) — plus, for `.metal` * shaders (parsed with the C++ grammar), MSL attribute annotations. Offset-preserving. 
*/ function preParseCppSource(source: string, filePath?: string): string { - const blanked = blankCppInlineMacros(blankCppExportMacros(source)); + const blanked = blankCppAnnotationMacroCalls( + blankCppInlineAnnotationMacros( + blankCppApiPrefixMacros(blankCppInlineMacros(blankCppExportMacros(source))) + ) + ); return filePath && filePath.toLowerCase().endsWith('.metal') ? blankMetalAttributes(blanked) : blanked;