feat(#132): SDLC artifacts and P1 implementation for smart codebase indexing #142

Open
Bekovmi wants to merge 8 commits into cyberfabric:main from Bekovmi:feat/132-smart-indexing-sdlc-p1

Conversation

@Bekovmi Bekovmi commented Apr 1, 2026

Summary

SDLC Artifacts Created/Updated

| File | Action |
| --- | --- |
| architecture/PRD.md | Added §5.3 Smart Indexing with 4 new FRs |
| architecture/ADR/0019-...-v1.md | Created: 5 architectural decisions (stdlib ast, zero deps, JSON cache, custom TraceGraph, shadow migration) |
| architecture/features/smart-indexing.md | Created: FEATURE spec (4 flows, 6 algos, 1 state machine, 4 DoDs, 14 acceptance criteria) |
| architecture/DECOMPOSITION.md | Added entry 2.17 with phase dependencies |
| architecture/DESIGN.md | Added ADR-0019 reference |
| architecture/MARKET-RESEARCH.md | Market research for AI code quality landscape |

Implementation (P1-P4)

P1: Structural Anchoring + Content Hashing

  • StructuralAnchor / AnchoredHit dataclasses with to_legacy_row() backward compat
  • compute_doc_anchors() — heading-tree path for markdown files
  • compute_py_containers() — AST-based container detection for .py files
  • compute_code_containers_regex() — regex fallback for non-Python
  • content_hash() / hash_block_content() — SHA-256 content hashing
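
To make the P1 bullets concrete, here is a minimal sketch of a heading-tree path lookup for markdown plus SHA-256 content hashing. The names `heading_path_by_line` and `sha256_hex` are hypothetical and are not taken from `trace_graph.py`; the real `compute_doc_anchors()` and `content_hash()` may be shaped differently, and this sketch does not skip `#` lines inside fenced code blocks.

```python
from __future__ import annotations

import hashlib
import re

# ATX-style markdown heading: one to six '#' characters, then the title.
_HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of text encoded as UTF-8."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def heading_path_by_line(markdown: str) -> dict[int, tuple[str, ...]]:
    """Map each 1-based line number to the chain of headings enclosing it."""
    stack: list[tuple[int, str]] = []  # (heading level, heading title)
    paths: dict[int, tuple[str, ...]] = {}
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        match = _HEADING_RE.match(line)
        if match:
            level = len(match.group(1))
            # A new heading closes every open heading at the same or deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, match.group(2)))
        paths[line_no] = tuple(title for _, title in stack)
    return paths


# Illustrative document: the ID on line 3 is anchored by its heading path,
# so inserting lines above it changes the line number but not the anchor.
EXAMPLE_DOC = "# PRD\n## Smart Indexing\ncpt-fr-anchor\n## Other\ntext\n"
EXAMPLE_PATHS = heading_path_by_line(EXAMPLE_DOC)
```

The point of the heading path is that it survives unrelated edits higher up in the file, which a raw line number does not.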

P2: Incremental Diff-Aware Index

  • IndexCache class with mtime + content hash staleness detection
  • JSON persistence at .cypilot-cache/trace-index.json
  • --incremental flag on cpt validate
  • precomputed_hits parameter on cross_validate_artifacts()
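
The staleness check behind P2 can be sketched roughly as follows: compare mtime first (cheap), and only hash the file when the mtime differs. `TinyIndexCache` is a hypothetical stand-in, not the PR's `IndexCache`; its entry layout and method bodies are assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import json
import os
import tempfile
from pathlib import Path


class TinyIndexCache:
    """Hypothetical cache: path -> {"mtime", "content_hash", "hits"}."""

    def __init__(self) -> None:
        self.entries: dict[str, dict] = {}

    @staticmethod
    def _hash(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def update_entry(self, path: Path, hits: list) -> None:
        self.entries[str(path)] = {
            "mtime": path.stat().st_mtime,
            "content_hash": self._hash(path),
            "hits": hits,
        }

    def is_stale(self, path: Path) -> bool:
        entry = self.entries.get(str(path))
        if entry is None:
            return True  # never indexed
        try:
            mtime = path.stat().st_mtime
            if mtime == entry["mtime"]:
                return False  # cheap gate: file was not touched
            if self._hash(path) == entry["content_hash"]:
                entry["mtime"] = mtime  # touched, but content is unchanged
                return False
        except OSError:
            return True  # unreadable or deleted files are treated as stale
        return True

    def save(self, cache_file: Path) -> None:
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(self.entries), encoding="utf-8")

    @classmethod
    def load(cls, cache_file: Path) -> "TinyIndexCache":
        cache = cls()
        if cache_file.exists():
            cache.entries = json.loads(cache_file.read_text(encoding="utf-8"))
        return cache


def demo() -> tuple[bool, bool, bool, bool]:
    """Return staleness after: no change, touch only, real edit, JSON reload."""
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "PRD.md"
        doc.write_text("cpt-fr-1\n", encoding="utf-8")
        cache = TinyIndexCache()
        cache.update_entry(doc, hits=[{"id": "cpt-fr-1", "line": 1}])
        unchanged = cache.is_stale(doc)

        st = doc.stat()
        os.utime(doc, (st.st_atime, st.st_mtime + 10))  # bump mtime only
        touched = cache.is_stale(doc)

        # Round-trip through JSON before editing the file.
        cache_file = Path(tmp) / ".cache" / "trace-index.json"
        cache.save(cache_file)
        reloaded = TinyIndexCache.load(cache_file).is_stale(doc)

        doc.write_text("cpt-fr-2\n", encoding="utf-8")
        st = doc.stat()
        os.utime(doc, (st.st_atime, st.st_mtime + 20))
        edited = cache.is_stale(doc)
        return unchanged, touched, edited, reloaded
```

Hashing only after the mtime gate fails keeps the common case (nothing changed) at one `stat()` call per file.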

P3: Traceability Graph

  • TraceGraph class with dual adjacency lists (forward + reverse)
  • NodeType / EdgeType enums for typed graph
  • Query methods: affected_by_change(), neighbors(), reverse_neighbors()
  • build_trace_graph() bridge from existing flat dicts
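
A minimal sketch of a typed graph with dual adjacency lists follows. The `EdgeType` member names come from this PR's commit message, but `TinyTraceGraph` itself is hypothetical, and the semantics chosen here for `affected_by_change()` (edges point from dependent to dependency, and a change propagates along reverse edges) are an assumption rather than the PR's confirmed behavior.

```python
from __future__ import annotations

from collections import defaultdict, deque
from enum import Enum


class EdgeType(Enum):
    CONTAINS = "contains"
    DEFINES = "defines"
    REFERENCES = "references"
    IMPLEMENTS = "implements"


class TinyTraceGraph:
    """Hypothetical graph keeping forward and reverse adjacency in sync."""

    def __init__(self) -> None:
        self._forward: dict[str, set[tuple[str, EdgeType]]] = defaultdict(set)
        self._reverse: dict[str, set[tuple[str, EdgeType]]] = defaultdict(set)

    def add_edge(self, src: str, dst: str, edge_type: EdgeType) -> None:
        # Every edge is recorded twice so both directions are O(1) lookups.
        self._forward[src].add((dst, edge_type))
        self._reverse[dst].add((src, edge_type))

    def neighbors(self, node: str) -> list[str]:
        return sorted(dst for dst, _ in self._forward.get(node, ()))

    def reverse_neighbors(self, node: str) -> list[str]:
        return sorted(src for src, _ in self._reverse.get(node, ()))

    def affected_by_change(self, node: str) -> set[str]:
        """Breadth-first walk over reverse edges: everything depending on node."""
        seen: set[str] = set()
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for src, _ in self._reverse.get(current, ()):
                if src != node and src not in seen:
                    seen.add(src)
                    queue.append(src)
        return seen


graph = TinyTraceGraph()
graph.add_edge("DESIGN.md", "cpt-fr-1", EdgeType.REFERENCES)
graph.add_edge("indexer.py", "cpt-fr-1", EdgeType.IMPLEMENTS)
graph.add_edge("test_indexer.py", "indexer.py", EdgeType.REFERENCES)
affected = graph.affected_by_change("cpt-fr-1")
```

Keeping the reverse list explicitly trades a little memory for not having to scan every edge when answering "what depends on this?".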

P4: Real-Time Session Sync

  • SessionIndex with mtime polling and StaleNotification events
  • --watch flag on cpt validate for live monitoring
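
One polling pass of a session watcher might look like the sketch below. `TinySessionIndex` and `StaleEvent` are hypothetical stand-ins for the PR's `SessionIndex` and `StaleNotification`; the mapping from file path to affected IDs is passed in directly here instead of being derived from a graph, and a real watch loop would call `check_for_changes()` repeatedly with a sleep between passes.

```python
from __future__ import annotations

import hashlib
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StaleEvent:
    path: str
    affected_ids: tuple[str, ...]


class TinySessionIndex:
    def __init__(self, ids_by_path: dict[str, set[str]]) -> None:
        self._ids_by_path = ids_by_path
        self._mtime: dict[str, float] = {}
        self._hash: dict[str, str] = {}

    def register_files(self, paths: list[Path]) -> None:
        for path in paths:
            key = str(path)
            self._mtime[key] = path.stat().st_mtime
            self._hash[key] = hashlib.sha256(path.read_bytes()).hexdigest()

    def check_for_changes(self) -> list[StaleEvent]:
        events: list[StaleEvent] = []
        for key, old_mtime in list(self._mtime.items()):
            path = Path(key)
            try:
                mtime = path.stat().st_mtime
                if mtime == old_mtime:
                    continue  # mtime gate: nothing to do
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
            except OSError:
                continue  # file vanished mid-poll; skip this pass
            self._mtime[key] = mtime
            if digest == self._hash[key]:
                continue  # touched only, so no notification
            self._hash[key] = digest
            ids = tuple(sorted(self._ids_by_path.get(key, ())))
            events.append(StaleEvent(path=key, affected_ids=ids))
        return events


def demo() -> tuple[int, int, tuple[str, ...], int]:
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "DESIGN.md"
        doc.write_text("cpt-design-1\n", encoding="utf-8")
        session = TinySessionIndex({str(doc): {"cpt-design-1", "cpt-fr-1"}})
        session.register_files([doc])
        idle = len(session.check_for_changes())

        st = doc.stat()
        os.utime(doc, (st.st_atime, st.st_mtime + 10))  # touch only
        touched = len(session.check_for_changes())

        doc.write_text("cpt-design-1 changed\n", encoding="utf-8")
        st = doc.stat()
        os.utime(doc, (st.st_atime, st.st_mtime + 20))
        events = session.check_for_changes()
        after = len(session.check_for_changes())
        return idle, touched, events[0].affected_ids, after
```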

Files Changed

| File | Description |
| --- | --- |
| skills/cypilot/scripts/cypilot/utils/trace_graph.py | NEW: 700+ lines, all P1-P4 data structures and functions |
| skills/cypilot/scripts/cypilot/utils/constraints.py | Added precomputed_hits param to cross_validate_artifacts() |
| skills/cypilot/scripts/cypilot/commands/validate.py | Added --incremental and --watch flags with cache integration |
| tests/test_trace_graph.py | NEW: 54 tests covering all P1-P4 functions |

Validation

  • cpt validate: PASS (0 errors, 0 warnings, 42 artifacts, 182/182 coverage)
  • pytest tests/: 2822 passed (all existing + 54 new)
  • Semantic review: PASS WITH NOTES (0 critical, 0 high — 2 medium findings fixed)

Test plan

  • cpt validate passes with 0 errors
  • pytest tests/test_trace_graph.py — 54 tests pass
  • pytest tests/ — all existing tests still pass
  • cpt validate --incremental works (creates cache, second run is faster)
  • New FR IDs cross-referenced across PRD → ADR → FEATURE → DECOMPOSITION

Closes #132

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Smart Codebase Indexing: structural anchors, persistent incremental index, typed traceability graph, and session sync with live stale notifications; validate supports incremental and watch modes (--incremental, --watch).
  • Documentation

    • Added ADRs, design, PRD, feature spec, decomposition, and market research describing Smart Indexing and rollout.
  • Tests

    • Comprehensive tests for hashing, anchoring, indexing, graph queries, cache persistence, and session sync.
  • Chores

    • Simplified ignore rules to broadly ignore plan cache directories.

coderabbitai Bot commented Apr 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Replaces fragile line-number CPT indexing with a structural traceability index: adds ADRs, PRD/design/feature docs, a new trace_graph implementation (anchors, hashing, JSON cache, typed graph, session watcher), tests, validate CLI flags (--incremental, --watch), updates cross-validation to accept precomputed hits, and consolidates a .gitignore rule for .bootstrap/.plans/.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Architecture & Specs: architecture/ADR/0019-cpt-cypilot-adr-structural-traceability-graph-v1.md, architecture/DESIGN.md, architecture/DECOMPOSITION.md, architecture/PRD.md, architecture/MARKET-RESEARCH.md, architecture/features/smart-indexing.md | Added ADR, design, decomposition, PRD, market research, and feature spec describing structural anchors, per-block hashing, incremental JSON cache, typed traceability graph, session sync/watch, migration strategy, and acceptance criteria. |
| Indexing Implementation: skills/cypilot/scripts/cypilot/utils/trace_graph.py | New module implementing StructuralAnchor/AnchoredHit models, hashing utilities, markdown heading & Python AST container extraction with regex fallback, FileIndexEntry/IndexCache (mtime/hash staleness, JSON persistence, git diff helper), TraceGraph (node/edge types, forward/reverse adjacencies, graph queries), SessionIndex watcher and StaleNotification. |
| Validation CLI & Integration: skills/cypilot/scripts/cypilot/commands/validate.py, skills/cypilot/scripts/cypilot/utils/constraints.py | Added --incremental and --watch flags to validate; validate loads/saves IndexCache and uses precomputed_hits to avoid re-scanning stale artifacts; cross_validate_artifacts signature updated to accept optional precomputed_hits. |
| Tests: tests/test_trace_graph.py | New comprehensive tests covering hashing, heading/container extraction, anchored hits, cache persistence/staleness/round-trip, TraceGraph node/edge behaviors and queries, build_trace_graph, and SessionIndex change detection and notifications. |
| Configuration: .gitignore | Replaced several specific ignore patterns with a single rule ignoring the entire .bootstrap/.plans/ directory. |

Sequence Diagram(s)

sequenceDiagram
  participant CLI as Validate CLI
  participant Cache as IndexCache ('.cypilot-cache/trace-index.json')
  participant GitFS as Git / Filesystem
  participant Scanner as scan_cpt_ids / parser
  participant Graph as TraceGraph
  participant Session as SessionIndex

  CLI->>Cache: load (if --incremental)
  CLI->>GitFS: query changed files (git_changed_files / mtimes)
  CLI->>Scanner: scan stale files (or use precomputed_hits)
  Scanner->>Graph: build/update nodes & edges
  Graph-->>CLI: report validation results
  CLI->>Cache: save updated entries

  alt --watch and PASS
    CLI->>Session: register files & start poll loop
    Session->>GitFS: poll mtimes (interval)
    GitFS-->>Session: detect changed file
    Session->>Graph: refresh_file -> affected_by_change
    Session-->>CLI: emit StaleNotification
  end
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

I hop through headings, roots and leaves,
Anchors snug where code believes. 🐇
Hashes hum and graphs grow wide,
Watchers blink when changes hide,
I nibble stale lines, stitch the guide. 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.87% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and specifically describes the main change: implementing SDLC artifacts and Phase 1 of smart codebase indexing (issue #132), which aligns directly with the comprehensive changes across architecture, implementation, and tests. |
| Linked Issues check | ✅ Passed | The PR fully implements all P1–P4 objectives from issue #132: structural anchoring via AST and heading paths with content hashing (P1), incremental diff-aware indexing with cache (P2), typed traceability graph with node/edge queries (P3), and session sync with file watching (P4). All code requirements are met. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the #132 objectives: architecture artifacts (PRD, ADR, specs, market research, decomposition), core implementation (trace_graph.py, validate.py, constraints.py), and comprehensive tests (test_trace_graph.py). No unrelated changes detected. |


…mart codebase indexing

SDLC artifacts for issue cyberfabric#132 (structural traceability graph):
- PRD: add Section 5.3 with 4 new FRs (structural anchoring, incremental
  index, traceability graph, session sync)
- ADR-0019: document 5 architectural decisions (stdlib ast, zero deps,
  JSON cache, custom TraceGraph, shadow-then-replace migration)
- FEATURE: smart-indexing.md with 4 flows, 6 algorithms, 1 state machine,
  4 definitions of done, 14 acceptance criteria
- DECOMPOSITION: add entry 2.17 Smart Codebase Indexing with phase deps

P1 implementation (structural anchoring + content hashing):
- New module: trace_graph.py with StructuralAnchor, AnchoredHit,
  content hashing, heading-tree doc anchors, ast-based Python container
  detection, regex fallback for other languages
- 27 new tests in test_trace_graph.py (all passing)
- All 2162 existing tests still pass
- cpt validate: PASS (0 errors, 0 warnings, 42 artifacts, 175/175 coverage)

Also adds market research document for AI code quality landscape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
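
The ast-based Python container detection mentioned in this commit can be illustrated with a short sketch. `containers_by_line` is a hypothetical name, not the PR's `compute_py_containers()`, and the dotted naming of nested scopes is an assumption made for readability.

```python
from __future__ import annotations

import ast


def containers_by_line(source: str) -> dict[int, str]:
    """Map each 1-based line inside a def/class to its innermost container."""
    tree = ast.parse(source)
    result: dict[int, str] = {}

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                name = f"{prefix}.{child.name}" if prefix else child.name
                # end_lineno is available on Python 3.8+.
                for line_no in range(child.lineno, child.end_lineno + 1):
                    result[line_no] = name
                # Inner scopes are visited afterwards and overwrite the
                # outer name, so the innermost container wins.
                visit(child, name)
            else:
                visit(child, prefix)

    visit(tree, "")
    return result


EXAMPLE_SOURCE = "class A:\n    def f(self):\n        return 1\n\nx = 2\n"
EXAMPLE_CONTAINERS = containers_by_line(EXAMPLE_SOURCE)
```

Lines outside any function or class (such as `x = 2` above) are simply absent from the mapping.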
@Bekovmi Bekovmi force-pushed the feat/132-smart-indexing-sdlc-p1 branch from de08c09 to 00e60e5 Compare April 1, 2026 09:44
…ity graph, session sync

P2: Incremental diff-aware index
- IndexCache class with mtime + content hash staleness detection
- JSON persistence at .cypilot-cache/trace-index.json
- --incremental flag on cpt validate: re-parse only changed files
- git_changed_files() helper for narrowing scope via git diff
- precomputed_hits parameter on cross_validate_artifacts() for cache bypass

P3: Traceability graph
- TraceGraph class with dual adjacency lists (forward + reverse)
- NodeType enum: ARTIFACT, SECTION, DEFINITION, REFERENCE, CODE_BLOCK
- EdgeType enum: CONTAINS, DEFINES, REFERENCES, IMPLEMENTS
- Query methods: neighbors, reverse_neighbors, affected_by_change,
  definitions_for_id, references_for_id, implementations_for_id
- build_trace_graph() bridge from flat defs_by_id/refs_by_id dicts

P4: Real-time session sync
- SessionIndex wrapping graph + cache with mtime polling
- StaleNotification dataclass for change events
- --watch flag on cpt validate: poll files and report stale markers
- register_files(), check_for_changes(), refresh_file() methods

Tests: 54 total (27 P1 + 27 P2-P4), all passing
Full suite: 2822 passed, cpt validate: PASS (0 errors, 182/182 coverage)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
skills/cypilot/scripts/cypilot/utils/constraints.py (1)

1063-1078: ⚠️ Potential issue | 🟠 Major

This still rereads unchanged artifacts during "incremental" validation.

When precomputed_hits is present, only scan_cpt_ids() is skipped. heading_constraint_ids_by_line() / headings_by_line() still reopen and parse every artifact here, so cross-validation remains O(all artifacts) and misses the "re-parse only stale files" goal. Cache the normalized heading context alongside the hits, or gate heading loading behind checks that actually need it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/cypilot/scripts/cypilot/utils/constraints.py` around lines 1063 -
1078, The loop over artifacts still reparses headings for every artifact even
when precomputed_hits exists; change the logic so that when precomputed_hits
contains an entry for hkey you also load the normalized heading context from
that cache (or store it there during index creation) instead of calling
heading_constraint_ids_by_line/headings_by_line; alternatively, gate the heading
parsing behind a conditional that checks if the artifact is stale (using the
same key and metadata as scan_cpt_ids/precomputed_hits) so headings_cache[hkey]
is only populated for changed files; update the code referencing
precomputed_hits, hkey, headings_cache, heading_constraint_ids_by_line and
headings_by_line to use the cached heading context when available.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/cypilot/scripts/cypilot/commands/validate.py`:
- Around line 799-815: The watch mode creates a fresh SessionIndex and only
registers file mtimes, so SessionIndex.graph remains empty and
check_for_changes() will never return affected IDs; after creating session and
calling session.register_files(watch_paths) (and before entering the while
loop), seed/populate SessionIndex.graph with the current dependency graph
derived from all_artifacts_for_cross (e.g., iterate all_artifacts_for_cross to
add nodes/edges or call the SessionIndex method that builds/loads the graph) so
that session.check_for_changes() can compute affected_ids and trigger
dependency-aware refreshes.
- Around line 450-463: The stale-file branch updates the IndexCache but doesn't
populate precomputed_hits, causing cross_validate_artifacts to re-scan files;
modify the block inside the for-loop that handles stale entries in validate.py
so after computing hits = scan_cpt_ids(art.path) and calling
_index_cache.update_entry(art.path, hits) you also set
precomputed_hits[str(art.path)] = hits (same keying used for non-stale entries)
so cross_validate_artifacts can reuse the freshly computed hits; reference
symbols: IndexCache, _index_cache, precomputed_hits, scan_cpt_ids, update_entry,
entries, hits, cross_validate_artifacts, art.path.

In `@skills/cypilot/scripts/cypilot/utils/trace_graph.py`:
- Around line 205-221: compute_code_containers_regex never clears the active
container, so a top-level declaration remains attached to later unrelated lines;
update the function to track the indentation of the last matched declaration
(use the match start or leading-space count from _FUNC_RE) and store it
alongside current_container, then on each line compute its indentation and: (1)
if a new _FUNC_RE match appears, update current_container and current_indent;
(2) otherwise if the line is non-blank and its indentation is less than or equal
to current_indent, clear current_container/current_indent so the scope ends;
also treat blank lines as not continuing a scope. Use the existing function name
compute_code_containers_regex and regex _FUNC_RE to locate the code to change.
- Around line 347-357: is_stale currently ignores the stored content_hash and
treats any mtime change as stale; change it to consult entry.content_hash: if no
entry -> return True; stat the path (handle OSError -> True); if current_mtime
== entry.mtime -> return False; otherwise read the file, compute the same
content hash used by update_entry (e.g., sha256 of bytes), compare it to
entry.content_hash — if hashes match, update entry.mtime to current_mtime and
return False (not stale), else return True (stale); ensure you reference
is_stale, update_entry, and entry.content_hash when implementing.
- Around line 70-84: The to_legacy_row method on AnchoredHit (def to_legacy_row)
omits the "type" field, breaking downstream validators that split on h["type"];
update to_legacy_row to include "type": self.type (or appropriate source such as
self.anchor.type if stored on anchor) in the returned dict so the legacy bridge
preserves the definition/reference classification alongside the existing keys
like "id", "line", and "headings".
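
The stale-file finding for validate.py above comes down to filling `precomputed_hits` in both branches of the loop. A minimal sketch, assuming hypothetical callables `is_stale` and `scan` in place of the PR's `IndexCache` and `scan_cpt_ids()`:

```python
from __future__ import annotations

from typing import Callable


def collect_hits(
    paths: list[str],
    cached_hits: dict[str, list[dict]],
    is_stale: Callable[[str], bool],
    scan: Callable[[str], list[dict]],
) -> tuple[dict[str, list[dict]], list[str]]:
    """Return (precomputed_hits, rescanned_paths) for one validation run."""
    precomputed: dict[str, list[dict]] = {}
    rescanned: list[str] = []
    for path in paths:
        if path not in cached_hits or is_stale(path):
            hits = scan(path)  # re-parse only what changed
            cached_hits[path] = hits  # refresh the persistent cache
            rescanned.append(path)
        else:
            hits = cached_hits[path]
        # Populated in BOTH branches, so later passes never re-scan.
        precomputed[path] = hits
    return precomputed, rescanned


# Illustrative run: only b.md is stale, so only b.md is scanned.
scan_calls: list[str] = []


def fake_scan(path: str) -> list[dict]:
    scan_calls.append(path)
    return [{"id": f"cpt-{path}", "type": "definition", "line": 1}]


cache = {"a.md": [{"id": "cpt-a.md", "type": "definition", "line": 1}], "b.md": []}
hits, rescanned = collect_hits(["a.md", "b.md"], cache, lambda p: p == "b.md", fake_scan)
```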

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 005621a2-4782-4f8c-ba84-c5a34f37c3b0

📥 Commits

Reviewing files that changed from the base of the PR and between de08c09 and 3edb9c5.

📒 Files selected for processing (11)
  • .gitignore
  • architecture/ADR/0019-cpt-cypilot-adr-structural-traceability-graph-v1.md
  • architecture/DECOMPOSITION.md
  • architecture/DESIGN.md
  • architecture/MARKET-RESEARCH.md
  • architecture/PRD.md
  • architecture/features/smart-indexing.md
  • skills/cypilot/scripts/cypilot/commands/validate.py
  • skills/cypilot/scripts/cypilot/utils/constraints.py
  • skills/cypilot/scripts/cypilot/utils/trace_graph.py
  • tests/test_trace_graph.py
✅ Files skipped from review due to trivial changes (4)
  • .gitignore
  • architecture/ADR/0019-cpt-cypilot-adr-structural-traceability-graph-v1.md
  • architecture/DECOMPOSITION.md
  • architecture/PRD.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • architecture/DESIGN.md

Comment thread skills/cypilot/scripts/cypilot/commands/validate.py
Comment thread skills/cypilot/scripts/cypilot/commands/validate.py
Comment thread skills/cypilot/scripts/cypilot/utils/trace_graph.py
Comment thread skills/cypilot/scripts/cypilot/utils/trace_graph.py
Comment thread skills/cypilot/scripts/cypilot/utils/trace_graph.py Outdated
- Store freshly scanned hits in precomputed_hits for stale files to
  avoid double-scanning in cross_validate_artifacts (validate.py)
- Seed SessionIndex.graph with current defs/refs before entering watch
  mode so affected_ids are populated in stale notifications
- Add missing 'type' field to AnchoredHit.to_legacy_row() for backward
  compatibility with downstream validators
- Fix regex container tracking to exit scope when indentation returns
  to declaration level (prevents false container attribution)
- Enhance IndexCache.is_stale() to verify content hash when mtime
  changes, so a plain touch does not force unnecessary reparse

Tests: 56 trace_graph tests pass, 2824 total pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/cypilot/scripts/cypilot/commands/validate.py`:
- Around line 799-825: The watch-mode seed only builds the trace graph from
document hits and registers only artifact paths, so code files with `@cpt-`*
markers aren’t tracked; update the watch block (the SessionIndex/session setup
and build_trace_graph call) to 1) collect code-file refs alongside document refs
from the available code trace sources you already have in scope (e.g., pass code
refs into build_trace_graph in addition to _defs/_refs), and 2) include those
code file paths when calling session.register_files (add code paths to
watch_paths derived from your code-ref list or from all_artifacts_for_cross
equivalents) so SessionIndex sees edits to code files and triggers
dependency-aware notifications.
- Around line 447-467: The incremental path still reparses unchanged artifacts
because only precomputed_hits is passed to cross_validate_artifacts; instead, in
the cmd_validate flow build a single hits_by_path dict (use variable name
hits_by_path) when handling args.incremental by loading IndexCache
(IndexCache.load), calling scan_cpt_ids only for stale artifacts, updating the
cache via _index_cache.update_entry, and populating hits_by_path[hkey] for every
art; then pass and reuse this hits_by_path throughout the rest of cmd_validate
(so replace direct calls to scan_cpt_ids elsewhere and let functions like
cross_validate_artifacts and subsequent coverage/reference passes consult
hits_by_path) to ensure unchanged files are not reparsed.

In `@skills/cypilot/scripts/cypilot/utils/trace_graph.py`:
- Around line 206-229: The function compute_code_containers_regex currently uses
a single current_container and container_indent which causes outer containers to
be lost when entering/exiting nested scopes; replace that with a stack of
(indent, name) tuples: on a matched function/class (in
compute_code_containers_regex where _FUNC_RE.match(line) is used) push
(container_indent, container_name) onto the stack, and on non-blank lines
compute line_indent and pop entries while stack and stack[-1].indent >=
line_indent; then set current_container to stack[-1].name if stack else "" and
assign result[line_no] accordingly so unwinding nested scopes restores the
parent container instead of clearing it.
- Around line 693-726: register_files currently only seeds self._mtime_snapshot,
making SessionIndex mtime-driven; instead seed the content hash cache
(self.cache) for each Path so we can detect real content changes like
IndexCache.is_stale does. In register_files, read each file's bytes (or a
streamed hash) and populate self.cache with the file's hash alongside the
existing self._mtime_snapshot; in check_for_changes, when mtime differs, compute
the current content hash and call the same cache/is_stale logic (use
IndexCache.is_stale or the cache API you have) before creating a
StaleNotification from graph.affected_by_change; if the hash shows no content
change, update _mtime_snapshot but do not emit a notification, and if content
changed update self.cache and _mtime_snapshot and then emit the notification.

In `@tests/test_trace_graph.py`:
- Around line 492-501: The test_detect_change uses time.sleep(0.05) which is
flaky; replace it by explicitly bumping the file mtime (or polling) before
calling SessionIndex.check_for_changes. After writing the first content and
registering files with SessionIndex.register_files, capture the original mtime
(os.stat(f).st_mtime), then write the new content and call os.utime(f, (now,
original_mtime + 1)) or os.utime with ns to set the file's mtime to a value
strictly greater than the original; ensure you import os and remove the
time.sleep so SessionIndex.check_for_changes reliably sees the change.
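
The stack-based scope tracking suggested above for `compute_code_containers_regex` can be sketched as follows. The regex `_DECL_RE` and the function `containers_by_indent` are hypothetical (the PR's `_FUNC_RE` is not shown in this conversation), blank lines are skipped here as an assumption, and the approach only suits indentation-structured sources.

```python
from __future__ import annotations

import re

# Hypothetical declaration pattern; the real _FUNC_RE may differ.
_DECL_RE = re.compile(r"^\s*(?:def|class|function|func|fn)\s+([A-Za-z_]\w*)")


def containers_by_indent(source: str) -> dict[int, str]:
    """Map each non-blank 1-based line to its enclosing declaration name."""
    stack: list[tuple[int, str]] = []  # (indent of declaration, name)
    result: dict[int, str] = {}
    for line_no, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines neither open nor close a scope
        indent = len(line) - len(line.lstrip())
        # Leaving a scope: drop every declaration indented at least this far.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        match = _DECL_RE.match(line)
        if match:
            stack.append((indent, match.group(1)))
        # After unwinding, the top of the stack is the parent container.
        result[line_no] = stack[-1][1] if stack else ""
    return result


EXAMPLE_SOURCE = (
    "class Outer:\n"
    "    def inner(self):\n"
    "        pass\n"
    "    x = 1\n"
    "y = 2\n"
)
EXAMPLE_RESULT = containers_by_indent(EXAMPLE_SOURCE)
```

Line 4 is the case the review calls out: with a single `current_container` variable it would lose `Outer` once `inner` ends, while the stack restores it.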
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fb647018-f5d9-4a50-bcb9-bdac18e7fec9

📥 Commits

Reviewing files that changed from the base of the PR and between 3edb9c5 and 64a1b75.

📒 Files selected for processing (3)
  • skills/cypilot/scripts/cypilot/commands/validate.py
  • skills/cypilot/scripts/cypilot/utils/trace_graph.py
  • tests/test_trace_graph.py

Comment thread skills/cypilot/scripts/cypilot/commands/validate.py
Comment thread skills/cypilot/scripts/cypilot/commands/validate.py
Comment thread skills/cypilot/scripts/cypilot/utils/trace_graph.py Outdated
Comment thread skills/cypilot/scripts/cypilot/utils/trace_graph.py
Comment thread tests/test_trace_graph.py
- Use _cached_scan() for all scan_cpt_ids calls in validate.py
- Include code files in watch mode for @cpt-* stale detection
- Stack-based regex container for correct nested scope handling
- Content-hash-aware SessionIndex (mtime gate + SHA-256 verify)
- Deterministic tests with os.utime instead of time.sleep

Tests: 58 trace_graph tests, 2826 total pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/test_trace_graph.py (1)

277-298: Minor: FakeBlock shadows builtin id in constructor.

The id parameter in FakeBlock.__init__ shadows the Python builtin. While this is a test helper and the impact is minimal, renaming to block_id would be cleaner.

💡 Proposed fix
     class FakeBlock:
-        def __init__(self, id, phase, inst, content):
-            self.id = id
+        def __init__(self, block_id, phase, inst, content):
+            self.id = block_id
             self.phase = phase
             self.inst = inst
             self.content = content
 
     blocks = [
-        FakeBlock("cpt-test-flow-a", 1, "step-1", ("x = 1", "y = 2")),
-        FakeBlock("cpt-test-flow-a", 1, "step-2", ("z = 3",)),
+        FakeBlock(block_id="cpt-test-flow-a", phase=1, inst="step-1", content=("x = 1", "y = 2")),
+        FakeBlock(block_id="cpt-test-flow-a", phase=1, inst="step-2", content=("z = 3",)),
     ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_trace_graph.py` around lines 277 - 298, The test helper FakeBlock
should not shadow the builtin id: change the __init__ signature from def
__init__(self, id, phase, inst, content): to def __init__(self, block_id, phase,
inst, content):, assign self.id = block_id inside FakeBlock, and update all
instantiations in TestComputeCodeFingerprints (the two FakeBlock(...) calls) to
pass the same values (first arg) unchanged so compute_code_fingerprints still
receives the same block.id values.
skills/cypilot/scripts/cypilot/commands/validate.py (2)

813-829: Graph seeding re-scans files instead of using _cached_scan.

The watch mode initialization at lines 820-828 calls _scan(_art.path) directly (scan_cpt_ids imported as _scan) rather than using the _cached_scan helper defined earlier. When --incremental and --watch are used together, this duplicates scanning work for files that were already cached.

💡 Proposed fix to reuse cached scans
-            from ..utils.document import scan_cpt_ids as _scan
             _defs: dict = {}
             _refs: dict = {}
             for _art in all_artifacts_for_cross:
-                for _h in _scan(_art.path):
+                for _h in _cached_scan(_art.path):
                     _hid = str(_h.get("id", ""))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/cypilot/scripts/cypilot/commands/validate.py` around lines 813 - 829,
The graph-seeding loop is re-scanning artifact files by calling the imported
scan function (_scan or scan_cpt_ids) instead of reusing the earlier caching
helper; change the loop that iterates over all_artifacts_for_cross to call the
existing _cached_scan helper (the one used by watch/incremental logic) for each
artifact rather than _scan(_art.path), and feed the cached scan results into the
same logic that builds _defs/_refs so session.graph = build_trace_graph(_defs,
_refs) uses cached data; update any call sites inside the loop that assume a
path string to instead handle the cached scan item shape returned by
_cached_scan.

807-848: Consider logging the swallowed exception instead of silently passing.

The except Exception: pass block at lines 830-831 swallows all errors during graph seeding, which makes debugging difficult if something goes wrong. While graceful degradation is appropriate here, logging the exception would aid troubleshooting.

💡 Proposed improvement
             session.graph = build_trace_graph(_defs, _refs)
-        except Exception:
-            pass  # Degrade gracefully — watch still detects file changes
+        except Exception as exc:
+            _sys.stderr.write(f"[watch] Warning: Could not seed trace graph: {exc}\n")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/cypilot/scripts/cypilot/commands/validate.py` around lines 807 - 848,
The try/except that seeds session.graph currently swallows all exceptions;
update the except to capture the Exception (e.g., "except Exception as e:") and
emit the error details and stack trace to stderr instead of silently passing —
locate the try block around imports of _xva/_scan, the loop that builds
_defs/_refs, the call to build_trace_graph and the assignment session.graph, and
replace the bare "except Exception: pass" with logic that writes
traceback.format_exc() (or equivalent) to stderr so failures are logged while
still allowing graceful degradation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/cypilot/scripts/cypilot/utils/trace_graph.py`:
- Around line 751-758: refresh_file currently updates cache and _mtime_snapshot
but leaves _hash_snapshot stale; modify refresh_file to recompute the file's
content hash using the same hashing helper used by check_for_changes (the
project's file-hash function, e.g., self._compute_hash or compute_file_hash) and
set self._hash_snapshot[str(path)] to that value after updating the cache, while
handling exceptions (like OSError or read errors) similarly to the existing
mtime handling so no new exceptions propagate.

---

Nitpick comments:
In `@skills/cypilot/scripts/cypilot/commands/validate.py`:
- Around line 813-829: The graph-seeding loop is re-scanning artifact files by
calling the imported scan function (_scan or scan_cpt_ids) instead of reusing
the earlier caching helper; change the loop that iterates over
all_artifacts_for_cross to call the existing _cached_scan helper (the one used
by watch/incremental logic) for each artifact rather than _scan(_art.path), and
feed the cached scan results into the same logic that builds _defs/_refs so
session.graph = build_trace_graph(_defs, _refs) uses cached data; update any
call sites inside the loop that assume a path string to instead handle the
cached scan item shape returned by _cached_scan.
- Around line 807-848: The try/except that seeds session.graph currently
swallows all exceptions; update the except to capture the Exception (e.g.,
"except Exception as e:") and emit the error details and stack trace to stderr
instead of silently passing — locate the try block around imports of _xva/_scan,
the loop that builds _defs/_refs, the call to build_trace_graph and the
assignment session.graph, and replace the bare "except Exception: pass" with
logic that writes traceback.format_exc() (or equivalent) to stderr so failures
are logged while still allowing graceful degradation.

In `@tests/test_trace_graph.py`:
- Around line 277-298: The test helper FakeBlock should not shadow the builtin
id: change the __init__ signature from def __init__(self, id, phase, inst,
content): to def __init__(self, block_id, phase, inst, content):, assign self.id
= block_id inside FakeBlock, and update all instantiations in
TestComputeCodeFingerprints (the two FakeBlock(...) calls) to pass the same
values (first arg) unchanged so compute_code_fingerprints still receives the
same block.id values.
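
The `FakeBlock` rename described above could look like the following sketch. The constructor arguments in the usage line are made-up values, not the ones used in `tests/test_trace_graph.py`.

```python
class FakeBlock:
    """Test double; `block_id` avoids shadowing the builtin `id`."""

    def __init__(self, block_id, phase, inst, content):
        self.id = block_id  # attribute name stays `id`, so callers are unaffected
        self.phase = phase
        self.inst = inst
        self.content = content


# Positional call sites stay unchanged (illustrative values).
block = FakeBlock("cpt-example-id", 1, "inst-a", ["line one"])
```

Because only the parameter name changes, code that reads `block.id` behaves as before.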

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 98504dae-6f55-45e4-a27f-15e0a812d14f

📥 Commits

Reviewing files that changed from the base of the PR and between 64a1b75 and 356d914.

📒 Files selected for processing (3)
  • skills/cypilot/scripts/cypilot/commands/validate.py
  • skills/cypilot/scripts/cypilot/utils/trace_graph.py
  • tests/test_trace_graph.py

Comment thread skills/cypilot/scripts/cypilot/utils/trace_graph.py
…t spurious notifications

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/cypilot/scripts/cypilot/utils/trace_graph.py`:
- Around line 293-315: _extract_section_text currently treats lines inside
fenced code as headings because it re-scans raw lines; update it to be
fence-aware (or use compute_doc_anchors/anchor map) so fenced code blocks are
ignored when finding section_start/section_end. Specifically, inside
_extract_section_text implement the same fenced-block state machine used by
compute_doc_anchors (detect fence open/close with backticks/tildes and ignore
heading regex _HEADING_RE while inside a fence) or alternatively locate the
target StructuralAnchor from the anchor map (compute_doc_anchors output) and
derive section bounds from that anchor's start/end, then return the joined lines
for that range so StructuralAnchor.content_hash becomes stable for docs with
code samples.
- Around line 343-350: FileIndexEntry.hits is being stored and restored as plain
dict rows, which loses AnchoredHit/StructuralAnchor state (heading_path,
container, content_hash) and causes cross_validate_artifacts() to fall back to
line-based rows; update the serialization/deserialization logic used by save()
and load() (and any helpers that populate FileIndexEntry.hits) to explicitly
serialize AnchoredHit objects (or their anchor state) and rehydrate them back
into AnchoredHit instances on load so that StructuralAnchor.heading_path,
container and content_hash are preserved before the hits are passed into
validate.py -> cross_validate_artifacts(); ensure any code that mutates or
appends to FileIndexEntry.hits (e.g., the functions around the current save/load
areas) uses AnchoredHit.to_dict/from_dict (or equivalent) rather than raw dict
rows.
- Around line 555-577: The traversal in affected_by_change starts from
CODE_BLOCK nodes (via nodes_for_file) but only follows reverse_edges
(reverse_neighbors) for EdgeType.REFERENCES and EdgeType.IMPLEMENTS, so it never
follows the forward IMPLEMENTS edge from a CODE_BLOCK to its DEFINITIONs; as a
result definitions and their dependents are missed. Update affected_by_change to
also traverse forward IMPLEMENTS edges (call neighbors(node.id,
EdgeType.IMPLEMENTS) or equivalent) when exploring a node so that CODE_BLOCK ->
DEFINITION links are followed, while keeping the existing reverse_neighbors(…,
EdgeType.REFERENCES) behavior; reference the affected_by_change function,
nodes_for_file, reverse_neighbors, neighbors, and EdgeType.IMPLEMENTS to locate
and change the traversal logic.
- Around line 719-727: When Path.stat() raises in the loop over
self._mtime_snapshot (the block handling p = Path(path_str)), remove that path
from both self._mtime_snapshot and self._hash_snapshot (and any related cache
entries) after appending the StaleNotification so the deleted/inaccessible file
is not repeatedly reported; i.e., on OSError catch, append the notification, pop
path_str from self._mtime_snapshot and self._hash_snapshot (or set a tombstone
state) and then continue. Ensure you reference the same keys (path_str) when
removing entries so snapshots stay consistent.
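
The fence-aware section extraction from the first item above can be sketched as follows. This is an illustrative version, not the project's `_extract_section_text`: the regexes, `heading_lines`, and `extract_section` are assumptions, and the fence handling is a simplified reading of Markdown fence rules.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE_RE = re.compile(r"^\s{0,3}(`{3,}|~{3,})")


def heading_lines(lines):
    """Return (index, level, title) for headings outside fenced code."""
    found = []
    fence = None  # (char, length) of the currently open fence, if any
    for i, line in enumerate(lines):
        m = FENCE_RE.match(line)
        if m:
            marker = m.group(1)
            if fence is None:
                fence = (marker[0], len(marker))  # opening fence
            else:
                same_char = marker[0] == fence[0]
                long_enough = len(marker) >= fence[1]
                bare = line.strip() == marker  # a closing fence has no info string
                if same_char and long_enough and bare:
                    fence = None
            continue
        if fence is None:
            h = HEADING_RE.match(line)
            if h:
                found.append((i, len(h.group(1)), h.group(2)))
    return found


def extract_section(lines, title):
    """Join the lines from `title` up to the next heading of equal or higher rank."""
    heads = heading_lines(lines)
    for pos, (start, level, name) in enumerate(heads):
        if name == title:
            end = len(lines)
            for nxt_start, nxt_level, _ in heads[pos + 1:]:
                if nxt_level <= level:
                    end = nxt_start
                    break
            return "\n".join(lines[start:end])
    return ""
```

With this approach a `# comment` line inside a code sample no longer ends the section, so a hash computed over the returned text stays stable.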

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4c0daf4f-fad9-422b-aced-6e20ae77f640

📥 Commits

Reviewing files that changed from the base of the PR and between 356d914 and 7523a50.

📒 Files selected for processing (1)
  • skills/cypilot/scripts/cypilot/utils/trace_graph.py

Comment thread skills/cypilot/scripts/cypilot/utils/trace_graph.py
Bekovmi and others added 2 commits April 1, 2026 15:26
- Make _extract_section_text fence-aware so headings inside code blocks
  do not split sections and destabilize content hashes
- Document FileIndexEntry cache design (plain dicts by-design for
  backward compat with cross_validate_artifacts)
- Include untracked files in git_changed_files via git ls-files --others
  so newly created artifacts are picked up for incremental indexing
- Fix affected_by_change to follow forward IMPLEMENTS/REFERENCES edges
  so code changes reach the definitions they implement and the docs
  that reference those definitions
- Remove deleted files from SessionIndex snapshots after first
  notification to prevent spam on every subsequent poll cycle
- Add tests: code->definition->referencing-doc traversal, deleted-file
  tombstone behavior
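
The deleted-file cleanup listed above might be sketched like this, assuming both snapshots are plain dictionaries keyed by path string. `poll_deleted` is a hypothetical helper, not a function from the PR.

```python
from pathlib import Path


def poll_deleted(mtime_snapshot, hash_snapshot):
    """Return paths that vanished since the last poll, dropping them from both snapshots."""
    deleted = []
    for path_str in list(mtime_snapshot):  # copy the keys: the dict is mutated below
        try:
            Path(path_str).stat()
        except OSError:
            deleted.append(path_str)
            # Remove from both snapshots so the file is reported only once.
            mtime_snapshot.pop(path_str, None)
            hash_snapshot.pop(path_str, None)
    return deleted
```

A second poll then returns an empty list for the same missing file, which is the behavior the tombstone test checks for.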

Tests: 60 trace_graph tests, 2828 total pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
…-code whitelist

- Remove redundant if-condition and unused _xva import in validate.py
- Remove unused os/time imports from trace_graph.py
- Prefix unused headings_at param with underscore
- Add check=False to subprocess.run calls in git_changed_files
- Narrow broad Exception catch to (OSError, ValueError, KeyError)
- Add vulture whitelist entries for trace_graph.py public API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
@ainetx
Collaborator

ainetx commented Apr 2, 2026

@Bekovmi you can run the same CI locally with `make ci`.

            _defs.setdefault(_hid, []).append(_row)
        else:
            _refs.setdefault(_hid, []).append(_row)
session.graph = build_trace_graph(_defs, _refs)
Collaborator


Watch mode still does not propagate code-file edits into dependency-aware affected_ids.

This block now watches code files, but the graph is still seeded with:

build_trace_graph(_defs, _refs)

That means the watch graph contains document definitions/references only, but no CODE_BLOCK nodes for traced source files. When a watched code file changes, SessionIndex.check_for_changes() calls graph.affected_by_change(path), but the traversal starts from nodes_for_file(path). For code files, that set remains empty, so the notification is emitted without meaningful affected IDs.

Why it matters

This leaves the new watch-mode behavior only partially implemented: source-file edits are detected, but the code→spec propagation is missing, so dependency-aware refresh/reporting is incomplete.

Fix prompt

Invoke skill cypilot.

Use /cypilot-generate for skills/cypilot/scripts/cypilot/commands/validate.py.

Target the watch-mode block around session.graph = build_trace_graph(_defs, _refs).

Required fix:

  1. Build code_refs from the already-available parsed code trace data in this validation pass.
  2. Pass those refs into build_trace_graph(_defs, _refs, code_refs=...).
  3. Keep watched code paths and graph contents aligned so a changed source file produces non-empty affected_ids when it implements traced IDs.

Validate that:

  • a changed code file is present in the watch set,
  • graph.nodes_for_file(code_path) returns code nodes,
  • affected_by_change(code_path) reaches implemented definitions and referencing docs.
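
The three checks above can be illustrated with a toy graph. `MiniGraph`, the string edge labels, and the node IDs are all hypothetical; the PR's `TraceGraph` uses `NodeType` / `EdgeType` enums and may differ in structure.

```python
from collections import defaultdict, deque


class MiniGraph:
    """Toy stand-in for TraceGraph; node ids and edge names are illustrative."""

    def __init__(self):
        self.forward = defaultdict(set)  # src -> {(edge, dst)}
        self.reverse = defaultdict(set)  # dst -> {(edge, src)}
        self.by_file = defaultdict(set)  # path -> node ids

    def add_node(self, node_id, path):
        self.by_file[path].add(node_id)

    def add_edge(self, src, edge, dst):
        self.forward[src].add((edge, dst))
        self.reverse[dst].add((edge, src))

    def affected_by_change(self, path):
        # Start from the nodes that live in the changed file.
        seen = set(self.by_file.get(path, ()))
        queue = deque(seen)
        while queue:
            node = queue.popleft()
            # Forward IMPLEMENTS: code block -> definition it implements.
            nxt = {d for e, d in self.forward[node] if e == "implements"}
            # Reverse REFERENCES: definition -> docs that reference it.
            nxt |= {s for e, s in self.reverse[node] if e == "references"}
            for n in nxt - seen:
                seen.add(n)
                queue.append(n)
        return seen


g = MiniGraph()
g.add_node("code:auth.py#login", "src/auth.py")
g.add_node("def:cpt-login", "docs/DESIGN.md")
g.add_node("ref:prd-login", "docs/PRD.md")
g.add_edge("code:auth.py#login", "implements", "def:cpt-login")
g.add_edge("ref:prd-login", "references", "def:cpt-login")
```

If the code node is never added, `affected_by_change("src/auth.py")` starts from an empty set and returns nothing, which is the gap this review comment describes.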

Review prompt

Invoke skill cypilot.

Use /cypilot-analyze to review the watch-mode implementation in skills/cypilot/scripts/cypilot/commands/validate.py.

Scope:

  • the watch block that seeds SessionIndex,
  • the data flow from parsed code traces into build_trace_graph(...),
  • whether code-file edits produce correct affected_ids,
  • whether changed code nodes reach implemented definitions and referencing artifacts.

Focus on:

  • correctness of code→definition→reference traversal,
  • consistency between watched paths and graph nodes,
  • edge cases where source files are watched but absent from the graph,
  • whether the implementation matches the intended dependency-aware session-sync behavior.

hits = scan_cpt_ids(art.path)
hkey = str(art.path)
# Use precomputed hits when available (incremental index)
if precomputed_hits is not None and hkey in precomputed_hits:
Collaborator


--incremental still reparses heading context for unchanged artifacts, so it does not fully satisfy the “re-parse only changed files” behavior.

precomputed_hits avoids re-running scan_cpt_ids() for unchanged files, but this path still rebuilds heading context for every artifact through heading_constraint_ids_by_line(...) / headings_by_line(...). Those helpers reopen and rescan artifact content even when hits came from cache.

Why it matters

The feature is presented as incremental indexing, but unchanged artifacts are still re-read during heading reconstruction. That weakens the main performance benefit and makes the implementation diverge from the advertised behavior.

Fix prompt

Invoke skill cypilot.

Use /cypilot-generate for skills/cypilot/scripts/cypilot/utils/constraints.py and any directly related support code.

Target the heading-cache path in cross_validate_artifacts(...).

Required fix:

  1. Avoid recomputing heading context for unchanged artifacts when cached index data is reused.
  2. Either cache normalized heading context alongside hits, or gate heading reconstruction so it runs only for stale files.
  3. Preserve current validation behavior and compatibility with the existing pipeline.

Validate that:

  • unchanged artifacts do not get rescanned for headings in incremental mode,
  • changed artifacts still rebuild both hits and heading context correctly,
  • validation results stay identical to the non-incremental path.
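
A minimal sketch of the second option in the required fix, gating heading reconstruction on staleness. `heading_context_for`, `stale`, `cache`, and `compute` are hypothetical names; `compute` stands in for whatever function actually rescans headings.

```python
def heading_context_for(paths, stale, cache, compute):
    """Return heading context per path, recomputing only stale or uncached files.

    `compute` is a hypothetical callable standing in for the real heading scan.
    """
    result = {}
    for path in paths:
        if path in stale or path not in cache:
            # Only stale (or never-seen) files are re-read from disk.
            cache[path] = compute(path)
        result[path] = cache[path]
    return result
```

Unchanged artifacts are served from `cache`, so in incremental mode `compute` runs once per stale file rather than once per artifact.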

Review prompt

Invoke skill cypilot.

Use /cypilot-analyze to review the incremental validation path in skills/cypilot/scripts/cypilot/utils/constraints.py.

Scope:

  • cross_validate_artifacts(...),
  • interaction between precomputed_hits and heading reconstruction,
  • whether unchanged artifacts are still re-read in incremental mode,
  • whether implementation behavior matches the feature claim of reparsing only changed files.

Focus on:

  • hidden rescans of unchanged artifacts,
  • correctness and compatibility of any heading-context caching strategy,
  • risk of stale heading data,
  • parity of validation results between incremental and full modes.

…shold

- trace_graph.py: 89% -> 96% (+29 tests covering edge cases: fence-aware
  section extraction, OSError fallbacks, cache staleness paths, graph
  query methods, session hash updates)
- validate.py: 87% -> 93% (+3 integration tests for --incremental,
  --watch, and combined flags)
- Total: 2850 tests pass, vulture clean, pylint clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Dmitrii Bleklov <Bekovmi@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Smart codebase indexing: structural traceability graph instead of row/line-based index

2 participants