
name_segment_vocab silently drifts from renamed nodes — updateNode() never calls insertNameSegments(), and the only backfill never re-triggers #1141

Description

@inth3shadows

Summary

insertNode() (src/db/queries.ts:255-324) populates the new name_segment_vocab table (#1136) on
every insert of a non-file node. updateNode() (src/db/queries.ts:355-415), which is used whenever a
node's name is rewritten after initial extraction, has no equivalent call. A renamed node's new name
is therefore never indexed for prose search, and the only backfill mechanism can't catch it either
(see below), so the gap is permanent, not just until the next sync.

Root cause

// src/db/queries.ts:316-324 (insertNode)
// Segment vocabulary rides the same write path (and transaction) so it can
// never drift ahead of the nodes it describes. ...
if (node.kind !== 'file') this.insertNameSegments(node.name);
}

// src/db/queries.ts:355-415 (updateNode) — no insertNameSegments call anywhere in this method

The comment on insertNode asserts that the vocab "can never drift ahead of the nodes it describes".
That holds for insertNode's own path, but updateNode is a second, real write path to the same nodes
table, and the comment doesn't account for it.
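To make the two-write-path gap concrete, here is a minimal in-memory model. It is a hypothetical sketch, not the real src/db/queries.ts: GraphNode, MiniQueries, the Map/Set storage and the splitting rule in segmentsOf are all assumptions standing in for the SQLite tables and whatever segmentation #1136 actually does. Only the shape mirrors the issue: insertNode writes vocab, updateNode does not.

```typescript
// Hypothetical in-memory model of the two write paths. GraphNode, MiniQueries
// and segmentsOf are illustrative stand-ins, not the real src/db/queries.ts.
interface GraphNode {
  id: string;
  kind: string;
  name: string;
}

// ASSUMPTION: split on non-alphanumerics and camelCase boundaries, lowercase.
// The real #1136 segmentation rule may differ.
function segmentsOf(nodeName: string): string[] {
  return nodeName
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .split(/[^A-Za-z0-9]+/)
    .filter((s) => s.length > 0)
    .map((s) => s.toLowerCase());
}

class MiniQueries {
  readonly nodes = new Map<string, GraphNode>();
  // segment -> names that contributed it (stand-in for name_segment_vocab)
  readonly vocab = new Map<string, Set<string>>();
  private readonly segmentedNames = new Set<string>();

  insertNameSegments(nodeName: string): void {
    if (this.segmentedNames.has(nodeName)) return; // idempotent, like the real one
    this.segmentedNames.add(nodeName);
    for (const seg of segmentsOf(nodeName)) {
      let names = this.vocab.get(seg);
      if (!names) {
        names = new Set<string>();
        this.vocab.set(seg, names);
      }
      names.add(nodeName);
    }
  }

  insertNode(node: GraphNode): void {
    this.nodes.set(node.id, { ...node });
    if (node.kind !== "file") this.insertNameSegments(node.name);
  }

  // Mirrors the reported bug: the row is rewritten, the vocab is not.
  updateNode(node: GraphNode): void {
    this.nodes.set(node.id, { ...node });
  }

  // True only if every segment of the name has a vocab entry pointing back at it.
  hasVocabFor(nodeName: string): boolean {
    return segmentsOf(nodeName).every((seg) => this.vocab.get(seg)?.has(nodeName) === true);
  }
}

const q = new MiniQueries();
q.insertNode({ id: "r1", kind: "route", name: "GET /" });
q.updateNode({ id: "r1", kind: "route", name: "GET /admin/users/:id" });

console.log(q.hasVocabFor("GET /admin/users/:id")); // false: the new name was never segmented
console.log(q.hasVocabFor("GET /")); // true: the old name's entries linger as orphans
```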

Repro (traced directly, not run end-to-end against a real NestJS project)

  • src/resolution/frameworks/nestjs.ts:259-260: the NestJS route-prefixing pass computes
    applyModulePrefix(route, prefix) and returns the updated node when the name changed.
  • src/resolution/index.ts:277-287 (runPostExtract): for every framework resolver's returned
    node, calls this.queries.updateNode(node) — confirmed no insertNameSegments call on this path.
  • src/index.ts:457 and :571: runPostExtract() runs on every indexAll() and every sync()
    that touched files — this is the normal, routine code path for any NestJS repo, not a rare edge
    case.
  • The only backfill, rebuildNameSegmentVocab() (src/index.ts:1017), is gated on
    vocabWasEmpty, which is captured at src/index.ts:560 BEFORE that sync's own writes; it only
    runs if (vocabWasEmpty && nodes > 0) (src/index.ts:633). The vocab table has rows immediately
    after the first indexAll, because insertNode populates it during extraction, before
    runPostExtract renames anything. From then on vocabWasEmpty is false on every sync, so the
    backfill never re-runs to catch a post-rename gap, even on a repo that has been fully
    re-indexed since.
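The gating described in the last bullet can be modeled in a few lines. This is a simplified sketch: DbCounts, indexAll and sync are hypothetical stand-ins, and "one vocab row per node" is a deliberate simplification. Only the condition `vocabWasEmpty && nodes > 0` and the fact that vocabWasEmpty is captured before the sync's own writes come from the code paths cited above.

```typescript
// Hypothetical model of the backfill gate. DbCounts, indexAll and sync are
// simplified stand-ins; only the condition `vocabWasEmpty && nodes > 0` and
// the "captured before this sync's writes" ordering come from the issue.
interface DbCounts {
  nodes: number;
  vocabRows: number;
}

// indexAll(): extraction inserts nodes and their vocab rows in one pass.
function indexAll(db: DbCounts, extracted: number): void {
  db.nodes += extracted;
  db.vocabRows += extracted; // simplification: one vocab row per node
}

// sync(): returns true if rebuildNameSegmentVocab() would fire on this run.
function sync(db: DbCounts, changedNodes: number): boolean {
  const vocabWasEmpty = db.vocabRows === 0; // snapshot before any writes
  db.nodes += changedNodes;
  db.vocabRows += changedNodes;
  // runPostExtract() renames via updateNode(): no vocab rows added here.
  return vocabWasEmpty && db.nodes > 0;
}

// ASSUMPTION: the gate targets databases with nodes but an empty vocab table,
// e.g. ones indexed before #1136 existed.
const legacy: DbCounts = { nodes: 10, vocabRows: 0 };
console.log(sync(legacy, 0)); // true

// A fresh index: the vocab is non-empty from the first indexAll onward.
const fresh: DbCounts = { nodes: 0, vocabRows: 0 };
indexAll(fresh, 10);
const fired = [sync(fresh, 2), sync(fresh, 0), sync(fresh, 5)];
console.log(fired.some((f) => f)); // false: the backfill never fires again
```

The point of the model is the snapshot ordering: because vocabWasEmpty is read before extraction writes anything, any database whose vocab was ever populated by insertNode can never satisfy the gate again, regardless of how stale individual names become.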

So concretely: index a NestJS repo → route nodes get segments under their pre-prefix name
("GET /") → runPostExtract renames them to the prefixed route ("GET /admin/users/:id") via
updateNode → the prefixed name has zero vocab rows, permanently. The pre-prefix name's rows become
orphans; the honesty re-check in getSegmentMatches silently drops them, by design, so at least no
wrong data surfaces. The real route is simply unreachable via prose search.
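The "no wrong data, but unreachable" outcome can be illustrated with a toy lookup. This sketch assumes the honesty re-check amounts to "keep only vocab hits whose name still belongs to a live node"; getSegmentMatchesSketch, the SegmentVocab shape and the segment "get" for "GET /" are hypothetical, and the real getSegmentMatches may work differently.

```typescript
// Hypothetical model of a segment lookup followed by an "honesty re-check".
// ASSUMPTION: the re-check keeps only vocab hits whose name still belongs to a
// live node; the real getSegmentMatches may be implemented differently.
type SegmentVocab = Map<string, Set<string>>; // segment -> contributing names

function getSegmentMatchesSketch(
  segment: string,
  vocab: SegmentVocab,
  liveNames: Set<string>,
): string[] {
  const hits = vocab.get(segment) ?? new Set<string>();
  // Drop orphans: names that no longer exist in the nodes table.
  return Array.from(hits).filter((n) => liveNames.has(n));
}

// State after extraction + rename: vocab only knows the pre-prefix name.
const vocab: SegmentVocab = new Map([["get", new Set(["GET /"])]]);
const live = new Set(["GET /admin/users/:id"]);

console.log(getSegmentMatchesSketch("get", vocab, live)); // orphan dropped: empty
console.log(getSegmentMatchesSketch("admin", vocab, live)); // never indexed: empty

// What a fixed updateNode() would add for the new name.
for (const seg of ["get", "admin", "users", "id"]) {
  const names = vocab.get(seg) ?? new Set<string>();
  names.add("GET /admin/users/:id");
  vocab.set(seg, names);
}
console.log(getSegmentMatchesSketch("admin", vocab, live)); // now reachable
```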

Impact

Prose-search coverage (the #1136 MEDIUM tier) permanently misses any framework-renamed symbol.
NestJS route-prefixing is the concrete case I traced; I haven't audited the other resolvers in
src/resolution/frameworks/ to see whether their postExtract implementations also rename nodes via
updateNode.

Suggested fix

Add the same segment-vocab write to updateNode() that insertNode() already has. Ideally it would
fire only when the name actually changed, to avoid a redundant INSERT OR IGNORE on the (presumably
common) case of an update that doesn't touch the name; the minimal version just calls it
unconditionally:

updateNode(node: Node): void {
  // ... existing update logic ...
  if (node.kind !== 'file') this.insertNameSegments(node.name);
}

insertNameSegments is already idempotent (it checks the in-memory segmentedNames Set first, and the
underlying write is INSERT OR IGNORE), so calling it unconditionally on every updateNode wouldn't be
incorrect, just slightly wasteful on no-rename updates. Whether to gate on a name-changed check
(comparing against the previous name) or always call it is a maintainer call; the idempotency makes
either safe, and I don't have enough context to choose confidently.
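For comparison, the gated variant might look roughly like this. It is a sketch under stated assumptions: QueriesSketch, its Map storage and the segmentCalls counter are hypothetical stand-ins for the real SQLite-backed methods, and reading the previous name from the in-memory map is an assumption about how the old value would be obtained.

```typescript
// Hypothetical sketch of a name-change-gated updateNode(). QueriesSketch,
// its Map storage and segmentCalls are illustrative stand-ins for the real
// SQLite-backed methods in src/db/queries.ts.
interface GraphNode {
  id: string;
  kind: string;
  name: string;
}

class QueriesSketch {
  private readonly nodes = new Map<string, GraphNode>();
  private readonly segmentedNames = new Set<string>();
  segmentCalls = 0; // how many times insertNameSegments was invoked at all

  insertNameSegments(nodeName: string): void {
    this.segmentCalls += 1;
    if (this.segmentedNames.has(nodeName)) return; // in-memory idempotency check
    this.segmentedNames.add(nodeName); // stands in for INSERT OR IGNORE into name_segment_vocab
  }

  isSegmented(nodeName: string): boolean {
    return this.segmentedNames.has(nodeName);
  }

  insertNode(node: GraphNode): void {
    this.nodes.set(node.id, { ...node });
    if (node.kind !== "file") this.insertNameSegments(node.name);
  }

  updateNode(node: GraphNode): void {
    // Read the stored name before overwriting it.
    const previousName = this.nodes.get(node.id)?.name;
    this.nodes.set(node.id, { ...node }); // ... existing update logic ...
    if (node.kind !== "file" && node.name !== previousName) {
      this.insertNameSegments(node.name);
    }
  }
}

const qs = new QueriesSketch();
qs.insertNode({ id: "r1", kind: "route", name: "GET /" });
qs.updateNode({ id: "r1", kind: "route", name: "GET /" }); // no rename: gate skips the call
qs.updateNode({ id: "r1", kind: "route", name: "GET /admin/users/:id" }); // rename: segmented
console.log(qs.segmentCalls); // 2
```

One trade-off worth weighing: obtaining the previous name is itself a lookup, and in the real SQLite-backed method that read could cost about as much as the INSERT OR IGNORE it saves, which is one argument for the unconditional call.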

Verification / scope

  • Checked the changed-file lists of all 29 currently open PRs for src/db/queries.ts. Only one
    touches it: #1005 ("fix(db): chunk resolved reference deletes"), whose hunk is at line ~1731,
    nowhere near updateNode at line 355.
  • Checked issues/PRs for "name_segment_vocab", "updateNode segment", "vocab drift" — nothing.
  • Did not run this end-to-end against a real NestJS project — the call chain above is traced by
    reading the code directly (grep + Read), not executed.

Environment

Found on main (tip e699ee9, v1.2.0).
