feat(#132): SDLC artifacts and P1 implementation for smart codebase indexing#142
Bekovmi wants to merge 8 commits into cyberfabric:main
📝 Walkthrough
Replaces fragile line-number CPT indexing with a structural traceability index: adds ADRs, PRD/design/feature docs, a new trace_graph implementation (anchors, hashing, JSON cache, typed graph, session watcher), tests, and validate CLI flags (…).
Sequence Diagram(s)
sequenceDiagram
participant CLI as Validate CLI
participant Cache as IndexCache ('.cypilot-cache/trace-index.json')
participant GitFS as Git / Filesystem
participant Scanner as scan_cpt_ids / parser
participant Graph as TraceGraph
participant Session as SessionIndex
CLI->>Cache: load (if --incremental)
CLI->>GitFS: query changed files (git_changed_files / mtimes)
CLI->>Scanner: scan stale files (or use precomputed_hits)
Scanner->>Graph: build/update nodes & edges
Graph-->>CLI: report validation results
CLI->>Cache: save updated entries
alt --watch and PASS
CLI->>Session: register files & start poll loop
Session->>GitFS: poll mtimes (interval)
GitFS-->>Session: detect changed file
Session->>Graph: refresh_file -> affected_by_change
Session-->>CLI: emit StaleNotification
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
…mart codebase indexing

SDLC artifacts for issue cyberfabric#132 (structural traceability graph):
- PRD: add Section 5.3 with 4 new FRs (structural anchoring, incremental index, traceability graph, session sync)
- ADR-0019: document 5 architectural decisions (stdlib ast, zero deps, JSON cache, custom TraceGraph, shadow-then-replace migration)
- FEATURE: smart-indexing.md with 4 flows, 6 algorithms, 1 state machine, 4 definitions of done, 14 acceptance criteria
- DECOMPOSITION: add entry 2.17 Smart Codebase Indexing with phase deps

P1 implementation (structural anchoring + content hashing):
- New module: trace_graph.py with StructuralAnchor, AnchoredHit, content hashing, heading-tree doc anchors, ast-based Python container detection, regex fallback for other languages
- 27 new tests in test_trace_graph.py (all passing)
- All 2162 existing tests still pass
- cpt validate: PASS (0 errors, 0 warnings, 42 artifacts, 175/175 coverage)

Also adds market research document for AI code quality landscape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
de08c09 to 00e60e5
…ity graph, session sync

P2: Incremental diff-aware index
- IndexCache class with mtime + content hash staleness detection
- JSON persistence at .cypilot-cache/trace-index.json
- --incremental flag on cpt validate: re-parse only changed files
- git_changed_files() helper for narrowing scope via git diff
- precomputed_hits parameter on cross_validate_artifacts() for cache bypass

P3: Traceability graph
- TraceGraph class with dual adjacency lists (forward + reverse)
- NodeType enum: ARTIFACT, SECTION, DEFINITION, REFERENCE, CODE_BLOCK
- EdgeType enum: CONTAINS, DEFINES, REFERENCES, IMPLEMENTS
- Query methods: neighbors, reverse_neighbors, affected_by_change, definitions_for_id, references_for_id, implementations_for_id
- build_trace_graph() bridge from flat defs_by_id/refs_by_id dicts

P4: Real-time session sync
- SessionIndex wrapping graph + cache with mtime polling
- StaleNotification dataclass for change events
- --watch flag on cpt validate: poll files and report stale markers
- register_files(), check_for_changes(), refresh_file() methods

Tests: 54 total (27 P1 + 27 P2-P4), all passing
Full suite: 2822 passed, cpt validate: PASS (0 errors, 182/182 coverage)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
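The "dual adjacency lists (forward + reverse)" design named in the P3 notes can be pictured with a minimal sketch. `MiniTraceGraph` and its storage layout are hypothetical stand-ins, not the PR's `TraceGraph`; only the `EdgeType` member names and the `neighbors` / `reverse_neighbors` method names come from the commit message.

```python
from collections import defaultdict
from enum import Enum


class EdgeType(Enum):
    # Member names taken from the commit message; the values are assumptions.
    CONTAINS = "contains"
    DEFINES = "defines"
    REFERENCES = "references"
    IMPLEMENTS = "implements"


class MiniTraceGraph:
    """Hypothetical stand-in: every edge is stored once per direction."""

    def __init__(self):
        self._forward = defaultdict(list)  # src -> [(edge_type, dst)]
        self._reverse = defaultdict(list)  # dst -> [(edge_type, src)]

    def add_edge(self, src, dst, edge_type):
        self._forward[src].append((edge_type, dst))
        self._reverse[dst].append((edge_type, src))

    def neighbors(self, node_id, edge_type=None):
        # Outgoing edges, optionally filtered by type.
        return [d for t, d in self._forward.get(node_id, []) if edge_type in (None, t)]

    def reverse_neighbors(self, node_id, edge_type=None):
        # Incoming edges: "who points at this node?" without scanning all edges.
        return [s for t, s in self._reverse.get(node_id, []) if edge_type in (None, t)]


graph = MiniTraceGraph()
graph.add_edge("PRD.md", "def:cpt-x", EdgeType.DEFINES)
graph.add_edge("DESIGN.md", "def:cpt-x", EdgeType.REFERENCES)
```

Keeping the reverse list costs one extra append per edge, and in exchange a query such as "which documents reference this definition" becomes a single dictionary lookup.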
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
skills/cypilot/scripts/cypilot/utils/constraints.py (1)
1063-1078: ⚠️ Potential issue | 🟠 Major
This still rereads unchanged artifacts during "incremental" validation.
When `precomputed_hits` is present, only `scan_cpt_ids()` is skipped. `heading_constraint_ids_by_line()` / `headings_by_line()` still reopen and parse every artifact here, so cross-validation remains O(all artifacts) and misses the "re-parse only stale files" goal. Cache the normalized heading context alongside the hits, or gate heading loading behind checks that actually need it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/cypilot/scripts/cypilot/utils/constraints.py` around lines 1063 - 1078, The loop over artifacts still reparses headings for every artifact even when precomputed_hits exists; change the logic so that when precomputed_hits contains an entry for hkey you also load the normalized heading context from that cache (or store it there during index creation) instead of calling heading_constraint_ids_by_line/headings_by_line; alternatively, gate the heading parsing behind a conditional that checks if the artifact is stale (using the same key and metadata as scan_cpt_ids/precomputed_hits) so headings_cache[hkey] is only populated for changed files; update the code referencing precomputed_hits, hkey, headings_cache, heading_constraint_ids_by_line and headings_by_line to use the cached heading context when available.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/cypilot/scripts/cypilot/commands/validate.py`:
- Around line 799-815: The watch mode creates a fresh SessionIndex and only
registers file mtimes, so SessionIndex.graph remains empty and
check_for_changes() will never return affected IDs; after creating session and
calling session.register_files(watch_paths) (and before entering the while
loop), seed/populate SessionIndex.graph with the current dependency graph
derived from all_artifacts_for_cross (e.g., iterate all_artifacts_for_cross to
add nodes/edges or call the SessionIndex method that builds/loads the graph) so
that session.check_for_changes() can compute affected_ids and trigger
dependency-aware refreshes.
- Around line 450-463: The stale-file branch updates the IndexCache but doesn't
populate precomputed_hits, causing cross_validate_artifacts to re-scan files;
modify the block inside the for-loop that handles stale entries in validate.py
so after computing hits = scan_cpt_ids(art.path) and calling
_index_cache.update_entry(art.path, hits) you also set
precomputed_hits[str(art.path)] = hits (same keying used for non-stale entries)
so cross_validate_artifacts can reuse the freshly computed hits; reference
symbols: IndexCache, _index_cache, precomputed_hits, scan_cpt_ids, update_entry,
entries, hits, cross_validate_artifacts, art.path.
In `@skills/cypilot/scripts/cypilot/utils/trace_graph.py`:
- Around line 205-221: compute_code_containers_regex never clears the active
container, so a top-level declaration remains attached to later unrelated lines;
update the function to track the indentation of the last matched declaration
(use the match start or leading-space count from _FUNC_RE) and store it
alongside current_container, then on each line compute its indentation and: (1)
if a new _FUNC_RE match appears, update current_container and current_indent;
(2) otherwise if the line is non-blank and its indentation is less than or equal
to current_indent, clear current_container/current_indent so the scope ends;
also treat blank lines as not continuing a scope. Use the existing function name
compute_code_containers_regex and regex _FUNC_RE to locate the code to change.
- Around line 347-357: is_stale currently ignores the stored content_hash and
treats any mtime change as stale; change it to consult entry.content_hash: if no
entry -> return True; stat the path (handle OSError -> True); if current_mtime
== entry.mtime -> return False; otherwise read the file, compute the same
content hash used by update_entry (e.g., sha256 of bytes), compare it to
entry.content_hash — if hashes match, update entry.mtime to current_mtime and
return False (not stale), else return True (stale); ensure you reference
is_stale, update_entry, and entry.content_hash when implementing.
- Around line 70-84: The to_legacy_row method on AnchoredHit (def to_legacy_row)
omits the "type" field, breaking downstream validators that split on h["type"];
update to_legacy_row to include "type": self.type (or appropriate source such as
self.anchor.type if stored on anchor) in the returned dict so the legacy bridge
preserves the definition/reference classification alongside the existing keys
like "id", "line", and "headings".
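The `is_stale` finding above describes a two-step check: use mtime as a cheap gate, and fall back to a content hash only when the mtime differs. A minimal sketch of that logic follows, assuming SHA-256 over the file bytes; `Entry`, `file_hash`, and the plain `entries` dict are hypothetical stand-ins for the PR's cache types.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Entry:
    """Hypothetical stand-in for a cached per-file index entry."""
    mtime: float
    content_hash: str


def file_hash(path: Path) -> str:
    # Assumption: the cache hashes raw bytes with SHA-256.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_stale(entries: dict, path: Path) -> bool:
    entry = entries.get(str(path))
    if entry is None:
        return True  # never indexed
    try:
        current_mtime = path.stat().st_mtime
    except OSError:
        return True  # missing or unreadable
    if current_mtime == entry.mtime:
        return False  # cheap gate: nothing touched the file
    try:
        unchanged = file_hash(path) == entry.content_hash
    except OSError:
        return True
    if unchanged:
        # A plain touch: remember the new mtime, skip the reparse.
        entry.mtime = current_mtime
        return False
    return True
```

The hash is only computed when the mtime gate fails, so unchanged files still cost a single `stat()` call.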
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 005621a2-4782-4f8c-ba84-c5a34f37c3b0
📒 Files selected for processing (11)
- .gitignore
- architecture/ADR/0019-cpt-cypilot-adr-structural-traceability-graph-v1.md
- architecture/DECOMPOSITION.md
- architecture/DESIGN.md
- architecture/MARKET-RESEARCH.md
- architecture/PRD.md
- architecture/features/smart-indexing.md
- skills/cypilot/scripts/cypilot/commands/validate.py
- skills/cypilot/scripts/cypilot/utils/constraints.py
- skills/cypilot/scripts/cypilot/utils/trace_graph.py
- tests/test_trace_graph.py
✅ Files skipped from review due to trivial changes (4)
- .gitignore
- architecture/ADR/0019-cpt-cypilot-adr-structural-traceability-graph-v1.md
- architecture/DECOMPOSITION.md
- architecture/PRD.md
🚧 Files skipped from review as they are similar to previous changes (1)
- architecture/DESIGN.md
- Store freshly scanned hits in precomputed_hits for stale files to avoid double-scanning in cross_validate_artifacts (validate.py)
- Seed SessionIndex.graph with current defs/refs before entering watch mode so affected_ids are populated in stale notifications
- Add missing 'type' field to AnchoredHit.to_legacy_row() for backward compatibility with downstream validators
- Fix regex container tracking to exit scope when indentation returns to declaration level (prevents false container attribution)
- Enhance IndexCache.is_stale() to verify content hash when mtime changes, so a plain touch does not force unnecessary reparse

Tests: 56 trace_graph tests pass, 2824 total pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/cypilot/scripts/cypilot/commands/validate.py`:
- Around line 799-825: The watch-mode seed only builds the trace graph from
document hits and registers only artifact paths, so code files with `@cpt-*`
markers aren’t tracked; update the watch block (the SessionIndex/session setup
and build_trace_graph call) to 1) collect code-file refs alongside document refs
from the available code trace sources you already have in scope (e.g., pass code
refs into build_trace_graph in addition to _defs/_refs), and 2) include those
code file paths when calling session.register_files (add code paths to
watch_paths derived from your code-ref list or from all_artifacts_for_cross
equivalents) so SessionIndex sees edits to code files and triggers
dependency-aware notifications.
- Around line 447-467: The incremental path still reparses unchanged artifacts
because only precomputed_hits is passed to cross_validate_artifacts; instead, in
the cmd_validate flow build a single hits_by_path dict (use variable name
hits_by_path) when handling args.incremental by loading IndexCache
(IndexCache.load), calling scan_cpt_ids only for stale artifacts, updating the
cache via _index_cache.update_entry, and populating hits_by_path[hkey] for every
art; then pass and reuse this hits_by_path throughout the rest of cmd_validate
(so replace direct calls to scan_cpt_ids elsewhere and let functions like
cross_validate_artifacts and subsequent coverage/reference passes consult
hits_by_path) to ensure unchanged files are not reparsed.
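The single `hits_by_path` mapping requested above can be sketched as one pass that scans only stale files and reuses cached rows for the rest. This is an illustration under assumptions: `build_hits_by_path` and its callback parameters are hypothetical, and the real `IndexCache` / `scan_cpt_ids` signatures may differ.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List

Hit = dict  # assumption: hits are plain dict rows, as in the legacy pipeline


def build_hits_by_path(
    paths: Iterable[Path],
    cached_hits: Dict[str, List[Hit]],
    is_stale: Callable[[Path], bool],
    scan: Callable[[Path], List[Hit]],
) -> Dict[str, List[Hit]]:
    """Return hits for every path, calling scan() only for stale or unknown files."""
    hits_by_path: Dict[str, List[Hit]] = {}
    for path in paths:
        key = str(path)
        if key in cached_hits and not is_stale(path):
            hits_by_path[key] = cached_hits[key]  # reuse, no file I/O
        else:
            hits = scan(path)
            cached_hits[key] = hits  # keep the cache in step with the scan
            hits_by_path[key] = hits
    return hits_by_path
```

Later passes (cross-validation, coverage, reference checks) would then read from the returned mapping instead of calling the scanner again.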
In `@skills/cypilot/scripts/cypilot/utils/trace_graph.py`:
- Around line 206-229: The function compute_code_containers_regex currently uses
a single current_container and container_indent which causes outer containers to
be lost when entering/exiting nested scopes; replace that with a stack of
(indent, name) tuples: on a matched function/class (in
compute_code_containers_regex where _FUNC_RE.match(line) is used) push
(container_indent, container_name) onto the stack, and on non-blank lines
compute line_indent and pop entries while stack and stack[-1].indent >=
line_indent; then set current_container to stack[-1].name if stack else "" and
assign result[line_no] accordingly so unwinding nested scopes restores the
parent container instead of clearing it.
- Around line 693-726: register_files currently only seeds self._mtime_snapshot,
making SessionIndex mtime-driven; instead seed the content hash cache
(self.cache) for each Path so we can detect real content changes like
IndexCache.is_stale does. In register_files, read each file's bytes (or a
streamed hash) and populate self.cache with the file's hash alongside the
existing self._mtime_snapshot; in check_for_changes, when mtime differs, compute
the current content hash and call the same cache/is_stale logic (use
IndexCache.is_stale or the cache API you have) before creating a
StaleNotification from graph.affected_by_change; if the hash shows no content
change, update _mtime_snapshot but do not emit a notification, and if content
changed update self.cache and _mtime_snapshot and then emit the notification.
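The stack-of-`(indent, name)` approach described for `compute_code_containers_regex` can be sketched as follows. `DECL_RE` is an assumed pattern, not the PR's `_FUNC_RE`, and the handling of blank lines (they inherit the current container) is a choice made for this sketch.

```python
import re

# Assumed declaration pattern; the real _FUNC_RE in trace_graph.py may differ.
DECL_RE = re.compile(r"^(\s*)(?:def|class|function|func|fn)\s+([A-Za-z_]\w*)")


def containers_by_line(text: str) -> dict:
    """Map 1-based line numbers to the innermost enclosing declaration name."""
    result = {}
    stack = []  # (indent, name), innermost scope last
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # Blank lines neither open nor close a scope in this sketch.
            result[line_no] = stack[-1][1] if stack else ""
            continue
        indent = len(line) - len(line.lstrip())
        # Unwind every scope that this line's indentation has left.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        match = DECL_RE.match(line)
        if match:
            stack.append((indent, match.group(2)))
        result[line_no] = stack[-1][1] if stack else ""
    return result


sample = (
    "class Outer:\n"
    "    def inner(self):\n"
    "        x = 1\n"
    "    y = 2\n"
    "z = 3\n"
)
```

For `sample`, line 4 is attributed to `Outer` again after `inner` ends, which is the behaviour a single `current_container` variable cannot provide. Indentation is counted in characters, so mixed tabs and spaces would need extra care.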
In `@tests/test_trace_graph.py`:
- Around line 492-501: The test_detect_change uses time.sleep(0.05) which is
flaky; replace it by explicitly bumping the file mtime (or polling) before
calling SessionIndex.check_for_changes. After writing the first content and
registering files with SessionIndex.register_files, capture the original mtime
(os.stat(f).st_mtime), then write the new content and call os.utime(f, (now,
original_mtime + 1)) or os.utime with ns to set the file's mtime to a value
strictly greater than the original; ensure you import os and remove the
time.sleep so SessionIndex.check_for_changes reliably sees the change.
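A minimal sketch of the `os.utime` approach suggested for the test: set the modification time explicitly instead of sleeping and hoping the clock resolution is fine enough. `bump_mtime`, `changed_paths`, and `demo_detect_change` are hypothetical helpers; `changed_paths` only imitates an mtime comparison and is not `SessionIndex.check_for_changes`.

```python
import os
import tempfile
from pathlib import Path


def bump_mtime(path: Path, seconds: float = 1.0) -> float:
    """Move the file's mtime forward by a fixed amount; returns the new value."""
    st = path.stat()
    new_mtime = st.st_mtime + seconds
    os.utime(path, (st.st_atime, new_mtime))  # (atime, mtime)
    return new_mtime


def changed_paths(snapshot: dict) -> list:
    # Hypothetical stand-in for an mtime-based change check.
    return [p for p, old in snapshot.items() if Path(p).stat().st_mtime != old]


def demo_detect_change() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        f = Path(tmp) / "doc.md"
        f.write_text("v1")
        snapshot = {str(f): f.stat().st_mtime}  # state at "registration"
        f.write_text("v2")
        bump_mtime(f)  # deterministic: no sleep, no timing assumptions
        return [Path(p).name for p in changed_paths(snapshot)]
```

Because the new mtime is derived from the current one plus a whole second, the check does not depend on filesystem timestamp granularity.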
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fb647018-f5d9-4a50-bcb9-bdac18e7fec9
📒 Files selected for processing (3)
- skills/cypilot/scripts/cypilot/commands/validate.py
- skills/cypilot/scripts/cypilot/utils/trace_graph.py
- tests/test_trace_graph.py
- Use _cached_scan() for all scan_cpt_ids calls in validate.py
- Include code files in watch mode for @cpt-* stale detection
- Stack-based regex container for correct nested scope handling
- Content-hash-aware SessionIndex (mtime gate + SHA-256 verify)
- Deterministic tests with os.utime instead of time.sleep

Tests: 58 trace_graph tests, 2826 total pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/test_trace_graph.py (1)
277-298: Minor: `FakeBlock` shadows builtin `id` in constructor.
The `id` parameter in `FakeBlock.__init__` shadows the Python builtin. While this is a test helper and the impact is minimal, renaming to `block_id` would be cleaner.
💡 Proposed fix
 class FakeBlock:
-    def __init__(self, id, phase, inst, content):
-        self.id = id
+    def __init__(self, block_id, phase, inst, content):
+        self.id = block_id
         self.phase = phase
         self.inst = inst
         self.content = content

 blocks = [
-    FakeBlock("cpt-test-flow-a", 1, "step-1", ("x = 1", "y = 2")),
-    FakeBlock("cpt-test-flow-a", 1, "step-2", ("z = 3",)),
+    FakeBlock(block_id="cpt-test-flow-a", phase=1, inst="step-1", content=("x = 1", "y = 2")),
+    FakeBlock(block_id="cpt-test-flow-a", phase=1, inst="step-2", content=("z = 3",)),
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_trace_graph.py` around lines 277 - 298, The test helper FakeBlock should not shadow the builtin id: change the __init__ signature from def __init__(self, id, phase, inst, content): to def __init__(self, block_id, phase, inst, content):, assign self.id = block_id inside FakeBlock, and update all instantiations in TestComputeCodeFingerprints (the two FakeBlock(...) calls) to pass the same values (first arg) unchanged so compute_code_fingerprints still receives the same block.id values.
skills/cypilot/scripts/cypilot/commands/validate.py (2)
813-829: Graph seeding re-scans files instead of using `_cached_scan`.
The watch mode initialization at lines 820-828 calls `_scan(_art.path)` directly (imported as `scan_cpt_ids`) rather than using the `_cached_scan` helper defined earlier. When `--incremental --watch` is used together, this duplicates scanning work for files that were already cached.
💡 Proposed fix to reuse cached scans
-        from ..utils.document import scan_cpt_ids as _scan
         _defs: dict = {}
         _refs: dict = {}
         for _art in all_artifacts_for_cross:
-            for _h in _scan(_art.path):
+            for _h in _cached_scan(_art.path):
                 _hid = str(_h.get("id", ""))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/cypilot/scripts/cypilot/commands/validate.py` around lines 813 - 829, The graph-seeding loop is re-scanning artifact files by calling the imported scan function (_scan or scan_cpt_ids) instead of reusing the earlier caching helper; change the loop that iterates over all_artifacts_for_cross to call the existing _cached_scan helper (the one used by watch/incremental logic) for each artifact rather than _scan(_art.path), and feed the cached scan results into the same logic that builds _defs/_refs so session.graph = build_trace_graph(_defs, _refs) uses cached data; update any call sites inside the loop that assume a path string to instead handle the cached scan item shape returned by _cached_scan.
807-848: Consider logging the swallowed exception instead of silently passing.
The bare `except Exception: pass` block at lines 830-831 swallows all errors during graph seeding, which makes debugging difficult if something goes wrong. While graceful degradation is appropriate here, logging the exception would aid troubleshooting.
💡 Proposed improvement
             session.graph = build_trace_graph(_defs, _refs)
-        except Exception:
-            pass  # Degrade gracefully — watch still detects file changes
+        except Exception as exc:
+            _sys.stderr.write(f"[watch] Warning: Could not seed trace graph: {exc}\n")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/cypilot/scripts/cypilot/commands/validate.py` around lines 807 - 848, The try/except that seeds session.graph currently swallows all exceptions; update the except to capture the Exception (e.g., "except Exception as e:") and emit the error details and stack trace to stderr instead of silently passing — locate the try block around imports of _xva/_scan, the loop that builds _defs/_refs, the call to build_trace_graph and the assignment session.graph, and replace the bare "except Exception: pass" with logic that writes traceback.format_exc() (or equivalent) to stderr so failures are logged while still allowing graceful degradation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/cypilot/scripts/cypilot/utils/trace_graph.py`:
- Around line 751-758: refresh_file currently updates cache and _mtime_snapshot
but leaves _hash_snapshot stale; modify refresh_file to recompute the file's
content hash using the same hashing helper used by check_for_changes (the
project's file-hash function, e.g., self._compute_hash or compute_file_hash) and
set self._hash_snapshot[str(path)] to that value after updating the cache, while
handling exceptions (like OSError or read errors) similarly to the existing
mtime handling so no new exceptions propagate.
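Why a stale `_hash_snapshot` matters can be shown with a small sketch: if `refresh_file` does not record the new hash, the next plain touch looks like a content change. `MiniSession` is a hypothetical stand-in for `SessionIndex`; the snapshot attribute names follow the review text and everything else is assumed.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class MiniSession:
    """Hypothetical stand-in: mtime gate plus SHA-256 verification."""

    def __init__(self):
        self._mtime_snapshot = {}
        self._hash_snapshot = {}

    @staticmethod
    def _hash(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def refresh_file(self, path: Path) -> None:
        # Record BOTH mtime and hash so later polls compare against fresh state.
        key = str(path)
        try:
            self._mtime_snapshot[key] = path.stat().st_mtime
            self._hash_snapshot[key] = self._hash(path)
        except OSError:
            self._mtime_snapshot.pop(key, None)
            self._hash_snapshot.pop(key, None)

    def check_for_changes(self) -> list:
        changed = []
        for key, old_mtime in list(self._mtime_snapshot.items()):
            path = Path(key)
            try:
                mtime = path.stat().st_mtime
                if mtime == old_mtime:
                    continue  # cheap gate
                digest = self._hash(path)
            except OSError:
                # Deleted or unreadable: report once, then stop tracking.
                changed.append(key)
                self._mtime_snapshot.pop(key, None)
                self._hash_snapshot.pop(key, None)
                continue
            self._mtime_snapshot[key] = mtime
            if digest != self._hash_snapshot.get(key):
                changed.append(key)  # hash is updated by refresh_file()
        return changed
```

In this sketch the caller is expected to call `refresh_file` after handling a notification; skipping the hash update there would make every later touch compare against the old digest and fire again.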
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 98504dae-6f55-45e4-a27f-15e0a812d14f
📒 Files selected for processing (3)
- skills/cypilot/scripts/cypilot/commands/validate.py
- skills/cypilot/scripts/cypilot/utils/trace_graph.py
- tests/test_trace_graph.py
…t spurious notifications

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/cypilot/scripts/cypilot/utils/trace_graph.py`:
- Around line 293-315: _extract_section_text currently treats lines inside
fenced code as headings because it re-scans raw lines; update it to be
fence-aware (or use compute_doc_anchors/anchor map) so fenced code blocks are
ignored when finding section_start/section_end. Specifically, inside
_extract_section_text implement the same fenced-block state machine used by
compute_doc_anchors (detect fence open/close with backticks/tildes and ignore
heading regex _HEADING_RE while inside a fence) or alternatively locate the
target StructuralAnchor from the anchor map (compute_doc_anchors output) and
derive section bounds from that anchor's start/end, then return the joined lines
for that range so StructuralAnchor.content_hash becomes stable for docs with
code samples.
- Around line 343-350: FileIndexEntry.hits is being stored and restored as plain
dict rows, which loses AnchoredHit/StructuralAnchor state (heading_path,
container, content_hash) and causes cross_validate_artifacts() to fall back to
line-based rows; update the serialization/deserialization logic used by save()
and load() (and any helpers that populate FileIndexEntry.hits) to explicitly
serialize AnchoredHit objects (or their anchor state) and rehydrate them back
into AnchoredHit instances on load so that StructuralAnchor.heading_path,
container and content_hash are preserved before the hits are passed into
validate.py -> cross_validate_artifacts(); ensure any code that mutates or
appends to FileIndexEntry.hits (e.g., the functions around the current save/load
areas) uses AnchoredHit.to_dict/from_dict (or equivalent) rather than raw dict
rows.
- Around line 555-577: The traversal in affected_by_change starts from
CODE_BLOCK nodes (via nodes_for_file) but only follows reverse_edges
(reverse_neighbors) for EdgeType.REFERENCES and EdgeType.IMPLEMENTS, so it never
follows the forward IMPLEMENTS edge from a CODE_BLOCK to its DEFINITIONs; as a
result definitions and their dependents are missed. Update affected_by_change to
also traverse forward IMPLEMENTS edges (call neighbors(node.id,
EdgeType.IMPLEMENTS) or equivalent) when exploring a node so that CODE_BLOCK ->
DEFINITION links are followed, while keeping the existing reverse_neighbors(…,
EdgeType.REFERENCES) behavior; reference the affected_by_change function,
nodes_for_file, reverse_neighbors, neighbors, and EdgeType.IMPLEMENTS to locate
and change the traversal logic.
- Around line 719-727: When Path.stat() raises in the loop over
self._mtime_snapshot (the block handling p = Path(path_str)), remove that path
from both self._mtime_snapshot and self._hash_snapshot (and any related cache
entries) after appending the StaleNotification so the deleted/inaccessible file
is not repeatedly reported; i.e., on OSError catch, append the notification, pop
path_str from self._mtime_snapshot and self._hash_snapshot (or set a tombstone
state) and then continue. Ensure you reference the same keys (path_str) when
removing entries so snapshots stay consistent.
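The fence-aware section extraction asked for in the first finding above can be sketched with a small state machine that ignores heading-like lines while inside a fenced block. `HEADING_RE` and `FENCE_RE` are assumed patterns, and ending a section at the next heading of the same or a higher level is a rule chosen for this sketch; the PR's `_extract_section_text` may differ on both points.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")  # assumed shape of _HEADING_RE
FENCE_RE = re.compile(r"^\s{0,3}(`{3,}|~{3,})")     # opening or closing fence marker


def extract_section_text(lines: list, heading_line: int) -> str:
    """Return the section that starts at the 1-based heading_line."""
    start = HEADING_RE.match(lines[heading_line - 1])
    if not start:
        return ""
    level = len(start.group(1))
    fence = None  # marker of the currently open fence, if any
    end = len(lines)
    for idx in range(heading_line, len(lines)):  # lines after the heading
        line = lines[idx]
        fm = FENCE_RE.match(line)
        if fm:
            marker = fm.group(1)
            if fence is None:
                fence = marker  # fence opens
            elif (marker[0] == fence[0] and len(marker) >= len(fence)
                  and line.strip() == marker):
                fence = None    # matching fence closes
            continue
        if fence is not None:
            continue  # inside code: '#' lines are not headings
        hm = HEADING_RE.match(line)
        if hm and len(hm.group(1)) <= level:
            end = idx
            break
    return "\n".join(lines[heading_line - 1:end])


FENCE = "`" * 3
doc = ["# Title", "intro", FENCE, "# not a heading", FENCE, "tail", "# Next", "other"]
section = extract_section_text(doc, 1)
```

Here `section` keeps the commented line inside the fence and stops before `# Next`, so a content hash over it would not shift when code samples contain `#` lines.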
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4c0daf4f-fad9-422b-aced-6e20ae77f640
📒 Files selected for processing (1)
skills/cypilot/scripts/cypilot/utils/trace_graph.py
- Make _extract_section_text fence-aware so headings inside code blocks do not split sections and destabilize content hashes
- Document FileIndexEntry cache design (plain dicts by-design for backward compat with cross_validate_artifacts)
- Include untracked files in git_changed_files via git ls-files --others so newly created artifacts are picked up for incremental indexing
- Fix affected_by_change to follow forward IMPLEMENTS/REFERENCES edges so code changes reach the definitions they implement and the docs that reference those definitions
- Remove deleted files from SessionIndex snapshots after first notification to prevent spam on every subsequent poll cycle
- Add tests: code->definition->referencing-doc traversal, deleted-file tombstone behavior

Tests: 60 trace_graph tests, 2828 total pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
…-code whitelist

- Remove redundant if-condition and unused _xva import in validate.py
- Remove unused os/time imports from trace_graph.py
- Prefix unused headings_at param with underscore
- Add check=False to subprocess.run calls in git_changed_files
- Narrow broad Exception catch to (OSError, ValueError, KeyError)
- Add vulture whitelist entries for trace_graph.py public API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
@Bekovmi it is possible to run the same CI locally, using |
            _defs.setdefault(_hid, []).append(_row)
        else:
            _refs.setdefault(_hid, []).append(_row)
session.graph = build_trace_graph(_defs, _refs)
Watch mode still does not propagate code-file edits into dependency-aware affected_ids.
This block now watches code files, but the graph is still seeded with:
build_trace_graph(_defs, _refs)
That means the watch graph contains document definitions/references only, but no CODE_BLOCK nodes for traced source files. When a watched code file changes, SessionIndex.check_for_changes() calls graph.affected_by_change(path), but the traversal starts from nodes_for_file(path). For code files, that set remains empty, so the notification is emitted without meaningful affected IDs.
Why it matters
This leaves the new watch-mode behavior only partially implemented: source-file edits are detected, but the code→spec propagation is missing, so dependency-aware refresh/reporting is incomplete.
Fix prompt
Invoke skill cypilot.
Use /cypilot-generate for skills/cypilot/scripts/cypilot/commands/validate.py.
Target the watch-mode block around session.graph = build_trace_graph(_defs, _refs).
Required fix:
- Build `code_refs` from the already-available parsed code trace data in this validation pass.
- Pass those refs into `build_trace_graph(_defs, _refs, code_refs=...)`.
- Keep watched code paths and graph contents aligned so a changed source file produces non-empty `affected_ids` when it implements traced IDs.
Validate that:
- a changed code file is present in the watch set,
- `graph.nodes_for_file(code_path)` returns code nodes,
- `affected_by_change(code_path)` reaches implemented definitions and referencing docs.
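The three checks above can be exercised on a toy graph. The sketch assumes edge directions of code block -IMPLEMENTS-> definition and reference -REFERENCES-> definition; `MiniGraph`, the node naming scheme, and the file names are hypothetical and do not reproduce the PR's `TraceGraph`.

```python
from collections import defaultdict, deque

IMPLEMENTS, REFERENCES = "implements", "references"


class MiniGraph:
    """Hypothetical stand-in for the trace graph, with string node IDs."""

    def __init__(self):
        self.forward = defaultdict(list)   # src -> [(kind, dst)]
        self.reverse = defaultdict(list)   # dst -> [(kind, src)]
        self.file_nodes = defaultdict(set)  # path -> node IDs living in that file

    def add_node(self, node, path):
        self.file_nodes[path].add(node)

    def add_edge(self, src, dst, kind):
        self.forward[src].append((kind, dst))
        self.reverse[dst].append((kind, src))

    def nodes_for_file(self, path):
        return set(self.file_nodes.get(path, ()))

    def affected_by_change(self, path):
        seen = self.nodes_for_file(path)
        queue = deque(seen)
        while queue:
            node = queue.popleft()
            # Forward: code block -> the definition it implements.
            nxt = [d for k, d in self.forward.get(node, []) if k == IMPLEMENTS]
            # Reverse: definition -> everything referencing or implementing it.
            nxt += [s for k, s in self.reverse.get(node, [])
                    if k in (REFERENCES, IMPLEMENTS)]
            for n in nxt:
                if n not in seen:
                    seen.add(n)
                    queue.append(n)
        return seen


graph = MiniGraph()
graph.add_node("code:indexer.py#build", "indexer.py")
graph.add_node("def:cpt-index", "PRD.md")
graph.add_node("ref:DESIGN.md#cpt-index", "DESIGN.md")
graph.add_edge("code:indexer.py#build", "def:cpt-index", IMPLEMENTS)
graph.add_edge("ref:DESIGN.md#cpt-index", "def:cpt-index", REFERENCES)
```

A file with no nodes in the graph yields an empty set, which mirrors the situation described in this comment: a watched code file that was never added to the graph produces a notification with nothing attached.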
Review prompt
Invoke skill cypilot.
Use /cypilot-analyze to review the watch-mode implementation in skills/cypilot/scripts/cypilot/commands/validate.py.
Scope:
- the watch block that seeds `SessionIndex`,
- the data flow from parsed code traces into `build_trace_graph(...)`,
- whether code-file edits produce correct `affected_ids`,
- whether changed code nodes reach implemented definitions and referencing artifacts.
Focus on:
- correctness of code→definition→reference traversal,
- consistency between watched paths and graph nodes,
- edge cases where source files are watched but absent from the graph,
- whether the implementation matches the intended dependency-aware session-sync behavior.
hits = scan_cpt_ids(art.path)
hkey = str(art.path)
# Use precomputed hits when available (incremental index)
if precomputed_hits is not None and hkey in precomputed_hits:
--incremental still reparses heading context for unchanged artifacts, so it does not fully satisfy the “re-parse only changed files” behavior.
precomputed_hits avoids re-running scan_cpt_ids() for unchanged files, but this path still rebuilds heading context for every artifact through heading_constraint_ids_by_line(...) / headings_by_line(...). Those helpers reopen and rescan artifact content even when hits came from cache.
Why it matters
The feature is presented as incremental indexing, but unchanged artifacts are still re-read during heading reconstruction. That weakens the main performance benefit and makes the implementation diverge from the advertised behavior.
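One possible shape for the gating option is sketched below. It is a sketch under stated assumptions, not the project's implementation: HeadingCache, parse_headings, and the stale_paths argument are hypothetical names, and the heading parser is a simplification that ignores fenced code blocks. The assumption is that the incremental index already knows which paths are stale, so heading context only needs to be rebuilt for those.

```python
import re
import tempfile
from pathlib import Path

_HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def parse_headings(text):
    """Map 1-based line number -> heading title (simplified, not fence-aware)."""
    headings = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        match = _HEADING_RE.match(line)
        if match:
            headings[lineno] = match.group(2)
    return headings


class HeadingCache:
    """Hypothetical cache that re-reads a file only when it is reported stale."""

    def __init__(self):
        self._entries = {}    # path string -> headings dict
        self.parse_count = 0  # number of times a file was actually re-read

    def headings_for(self, path, stale_paths):
        key = str(path)
        if key in self._entries and key not in stale_paths:
            return self._entries[key]  # unchanged file: reuse cached context
        text = Path(path).read_text(encoding="utf-8")
        self.parse_count += 1
        self._entries[key] = parse_headings(text)
        return self._entries[key]


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "DESIGN.md"
    doc.write_text("# Design\n\ntext\n\n## Indexing\n", encoding="utf-8")

    cache = HeadingCache()
    first = cache.headings_for(doc, stale_paths={str(doc)})
    second = cache.headings_for(doc, stale_paths=set())  # not stale: no re-read
    parses_after_unchanged = cache.parse_count

    doc.write_text("# Design v2\n", encoding="utf-8")
    third = cache.headings_for(doc, stale_paths={str(doc)})
    parses_after_change = cache.parse_count
```

The alternative of persisting heading context alongside hits in the JSON cache would also survive across runs. The in-memory gate shown here only avoids repeated reads within one process.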
Fix prompt
Invoke skill cypilot.
Use /cypilot-generate for skills/cypilot/scripts/cypilot/utils/constraints.py and any directly related support code.
Target the heading-cache path in cross_validate_artifacts(...).
Required fix:
- Avoid recomputing heading context for unchanged artifacts when cached index data is reused.
- Either cache normalized heading context alongside hits, or gate heading reconstruction so it runs only for stale files.
- Preserve current validation behavior and compatibility with the existing pipeline.
Validate that:
- unchanged artifacts do not get rescanned for headings in incremental mode,
- changed artifacts still rebuild both hits and heading context correctly,
- validation results stay identical to the non-incremental path.
Review prompt
Invoke skill cypilot.
Use /cypilot-analyze to review the incremental validation path in skills/cypilot/scripts/cypilot/utils/constraints.py.
Scope:
- cross_validate_artifacts(...),
- interaction between precomputed_hits and heading reconstruction,
- whether unchanged artifacts are still re-read in incremental mode,
- whether implementation behavior matches the feature claim of reparsing only changed files.
Focus on:
- hidden rescans of unchanged artifacts,
- correctness and compatibility of any heading-context caching strategy,
- risk of stale heading data,
- parity of validation results between incremental and full modes.
…shold
- trace_graph.py: 89% -> 96% (+29 tests covering edge cases: fence-aware section extraction, OSError fallbacks, cache staleness paths, graph query methods, session hash updates)
- validate.py: 87% -> 93% (+3 integration tests for --incremental, --watch, and combined flags)
- Total: 2850 tests pass, vulture clean, pylint clean
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
Summary
- trace_graph.py with 54 tests
- --incremental and --watch flags
SDLC Artifacts Created/Updated
- architecture/PRD.md
- architecture/ADR/0019-...-v1.md
- architecture/features/smart-indexing.md
- architecture/DECOMPOSITION.md
- architecture/DESIGN.md
- architecture/MARKET-RESEARCH.md
Implementation (P1-P4)
P1: Structural Anchoring + Content Hashing
- StructuralAnchor / AnchoredHit dataclasses with to_legacy_row() backward compat
- compute_doc_anchors(): heading-tree path for markdown files
- compute_py_containers(): AST-based container detection for .py files
- compute_code_containers_regex(): regex fallback for non-Python
- content_hash() / hash_block_content(): SHA-256 content hashing
P2: Incremental Diff-Aware Index
- IndexCache class with mtime + content hash staleness detection
- .cypilot-cache/trace-index.json
- --incremental flag on cpt validate
- precomputed_hits parameter on cross_validate_artifacts()
P3: Traceability Graph
- TraceGraph class with dual adjacency lists (forward + reverse)
- NodeType / EdgeType enums for typed graph
- affected_by_change(), neighbors(), reverse_neighbors()
- build_trace_graph() bridge from existing flat dicts
P4: Real-Time Session Sync
- SessionIndex with mtime polling and StaleNotification events
- --watch flag on cpt validate for live monitoring
Files Changed
- skills/cypilot/scripts/cypilot/utils/trace_graph.py
- skills/cypilot/scripts/cypilot/utils/constraints.py: precomputed_hits param to cross_validate_artifacts()
- skills/cypilot/scripts/cypilot/commands/validate.py: --incremental and --watch flags with cache integration
- tests/test_trace_graph.py
Validation
- cpt validate: PASS (0 errors, 0 warnings, 42 artifacts, 182/182 coverage)
- pytest tests/: 2822 passed (all existing + 54 new)
Test plan
- cpt validate passes with 0 errors
- pytest tests/test_trace_graph.py: 54 tests pass
- pytest tests/: all existing tests still pass
- cpt validate --incremental works (creates cache, second run is faster)
Closes #132
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests
Chores