Add support for unparseable/encrypted files#309
Conversation
9e123c1 to
3d9735e
Compare
IEEE-1735 encrypted IP makes slang trip at the surrounding `endmodule` even though the lexer skips the protect envelope itself. Previously, that error in any one file aborted the entire `bender script --top X` invocation because session.cpp threw on the first per-file error. bender-slang now collects per-tree pass/fail status instead of throwing on parse errors. System-load failures (file unreadable) are still fatal — only parse errors are tolerated. Diagnostics are still surfaced to stderr per file so users can see exactly what slang complained about. A new failed_tree_indices() bridge function lets the Rust side learn which trees had errors. apply_slang_filters builds a force_kept_paths set from those indices and unions it with the reachability-derived kept_paths during file filtering, then prints a clear warning per file: "slang reported parse errors in <path>; preserving in script output regardless of reachability". The tolerance is on by default with loud per-file warnings so real syntax bugs still get noticed; users wanting strict behavior can read stderr and act on it. The dangling-reference case (parent instantiates a module slang can't see) is already handled in analysis.cpp's DFS, which silently skips refs missing from the symbol table. Fixture: tests/pickle has a new `encrypted` target with three files — encrypted_ip.sv (realistic IEEE-1735 envelope around a module body), encrypted_user.sv (instantiates encrypted_ip via a dangling ref), and encrypted_top.sv. The test confirms all three survive `--top encrypted_top` even though slang fails to parse encrypted_ip.sv. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iles Three follow-ups on the parse-error tolerance added in the previous commit: * script now prints a single summary warning at end of filtering: "warning: slang reported parse errors in N file(s); kept in script output: a.sv, b.sv" — instead of one stderr line per file, which got noisy on big designs. The diagnostic for each file still comes through (slang emits those itself). * script gains `--drop-unparseable`. Default behavior (keep) is unchanged. With the flag set, files slang couldn't parse are dropped from the output regardless of reachability; the summary warning's verb switches from "kept" to "dropped". Useful when the user knows the failures are real bugs in their RTL and wants the script to reflect that. * `--source-annotations` now emits `// UNPARSEABLE: slang reported parse errors` above each kept-but-failed file in the script, so a reader can spot them in the file list itself. Reuses the existing FileEntry.comment plumbing via a fresh `unparseable_comment` helper in emit_template. * pickle refuses on slang parse errors instead of silently emitting corrupt output: the slang printer reconstructs from the (partial) parse tree, and for encrypted protect envelopes that meant the base64 data block got mangled into something no downstream decrypt tool would accept. Pre-tolerance pickle threw automatically; now we check `failed_indices()` after parsing and surface a clear error pointing users at `bender script --top` for the file-list-only case. apply_slang_filters now returns the filtered groups alongside the set of kept-unparseable paths, which the caller threads into emit_template for the comment annotation. The retain predicate is unified into a single closure handling both the --top axis and the --drop-unparseable axis. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously, the parse-error tolerance was blanket: any file slang
couldn't parse was kept in the script output with a warning. That
masked genuine syntax bugs in user RTL.
Use slang's own taxonomy to discriminate. The lexer/preprocessor emit
`slang::diag::ProtectedEnvelope` (the [-Wprotected-envelope] warning)
every time they skip an IEEE-1735 protect envelope; real syntax bugs
never trigger it. We capture that signal per tree alongside the
existing parse-error bit, expose it via `protected_tree_indices()`,
and split the failed set into two:
* encrypted_paths — failed AND emitted ProtectedEnvelope. Always
tolerated (legal SystemVerilog that slang merely struggles to
digest fully).
* broken_paths — failed without ProtectedEnvelope. Looks like a
real syntax bug; aborts the run by default.
A new `--allow-broken` flag flips that abort into a warning, mirroring
how `--allow-broken` (and similar escape hatches) work elsewhere in
bender. The summary stderr now prints separate lines for encrypted vs
broken so users can tell apart automatic vs explicit tolerance.
bender-slang itself stays policy-free: parse_group is still lenient
(returns Ok on parse errors, throws only on system-load failures), and
the new protected_indices() complements the existing failed_indices().
Policy lives in `bender` (script.rs decides; pickle.rs continues to
refuse all parse errors since rewriting a partial tree corrupts the
output).
bender-slang/tests/basic.rs: the old `parse_invalid_file_returns_parse_error`
test was wrong after the previous tolerance commit and is updated to
the new contract — it asserts parse_group succeeds, failed_indices()
reports broken.sv, and protected_indices() is empty.
Fixture: tests/pickle/Bender.yml gains a `broken` target wired to the
existing tests/pickle/src/broken.sv. New script tests cover both
failure (default) and `--allow-broken` (kept with warning).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The changes makes sense in general. I have some ideas on how to make the slang interface cleaner. Particularly the |
fischeti
left a comment
There was a problem hiding this comment.
I refactored the bender-slang FFI now which looks cleaner now imo. A new ParsedTree type now bundles the actual syntax tree as well as parsed_ok and encrypted flags that can be probed easily. I also bumped the bender-slang version again.
The only remaining issue I have with it now are the --drop-unparseable and --allow-broken flags. For me it is not entirely clear how they differ resp. what the actual use cases are.
| /// Drop files slang could not parse (default: keep them) | ||
| #[cfg(feature = "slang")] | ||
| #[arg(long, global = true, help_heading = "General Script Options")] | ||
| pub drop_unparseable: bool, | ||
|
|
||
| /// Tolerate slang parse errors in non-encrypted files instead of failing | ||
| #[cfg(feature = "slang")] | ||
| #[arg(long, global = true, help_heading = "General Script Options")] | ||
| pub allow_broken: bool, | ||
|
|
There was a problem hiding this comment.
what's the difference between those two flags?
|
|
||
| /// Tolerate slang parse errors in non-encrypted files instead of failing | ||
| #[cfg(feature = "slang")] | ||
| #[arg(long, global = true, help_heading = "General Script Options")] |
There was a problem hiding this comment.
potentially, it could make sense to group the slang flags separately
No description provided.