postprocess: Preprocess the C source code passed to transforms #1846
thedataking wants to merge 3 commits into
Force-pushed 332513d to c1d7cd9
Force-pushed 19142b4 to 24d771a
Force-pushed 24d771a to e1ff226
Force-pushed 8ee5007 to 9d8d679
The transpiler now emits an allow for clippy::missing_safety_doc, which shifts the expected function spans by one line.
Restructure *.c_decls.json from a flat identifier-to-snippet map into
{definitions: {ident: {definition, preprocessed_definition}}}.
The AST exporter runs an in-process preprocessor-only pass per
translation unit (clang -E -fdirectives-only -C equivalent): conditional
directives are resolved, dropping inactive regions and their comments,
while macro invocations and comments are preserved. Line markers map the
output back to each function definition's source range; the main file is
recognized by the spelling in the leading line marker, since marker
paths are relative to the compilation directory. Functions whose mapping
fails get a null preprocessed_definition.
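The line-marker mapping described above can be pictured with a simplified sketch. It assumes whole-line ranges (1-based, inclusive) rather than full source ranges, treats the path in the first marker as the main file, and uses a deliberately naive marker parser that does not handle escaped quotes. `extract_main_file_lines` and `parse_marker` are hypothetical names, not the exporter's code, which runs the preprocessor in process.

```rust
/// Simplified marker parser for lines like `# <line> "<path>" [flags]`.
/// Naive on purpose: an escaped quote inside the path ends it early.
fn parse_marker(line: &str) -> Option<(u64, &str)> {
    let rest = line.strip_prefix("# ")?;
    let (no, rest) = rest.split_once(' ')?;
    let no: u64 = no.parse().ok()?;
    let path = rest.strip_prefix('"')?;
    Some((no, &path[..path.find('"')?]))
}

/// Hypothetical helper: collect the preprocessed text of lines
/// `start..=end` (1-based) of the main file, or `None` if no line maps.
fn extract_main_file_lines(pp: &str, start: u64, end: u64) -> Option<String> {
    let mut main_file: Option<&str> = None; // spelling from the leading marker
    let mut current_file: Option<&str> = None;
    let mut next_line: u64 = 0; // source line of the next non-marker line
    let mut out = Vec::new();
    for line in pp.lines() {
        if let Some((no, path)) = parse_marker(line) {
            // Compare by spelling only: marker paths are relative to the
            // compilation directory, so no canonicalization is attempted.
            if main_file.is_none() {
                main_file = Some(path);
            }
            current_file = Some(path);
            next_line = no;
            continue;
        }
        // Keep the line only if it belongs to the main file and the range.
        if current_file.is_some()
            && current_file == main_file
            && (start..=end).contains(&next_line)
        {
            out.push(line);
        }
        next_line += 1;
    }
    if out.is_empty() { None } else { Some(out.join("\n")) }
}
```

Returning `None` when nothing maps corresponds to the null `preprocessed_definition`; a fuller version would presumably also check that every line of the range was actually seen.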
…prompts
Read both forms from the structured *.c_decls.json instead of running a clang subprocess per snippet. The CommentsTransform prompt includes the preprocessed text only when it differs from the original, so prompts (and llm-cache keys) are unchanged for directive-free functions. Comment gating and response validation use the preprocessed text, so comments in inactive preprocessor regions are not transferred.
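The prompt rule and the comment gating can be summarized in a small sketch. The struct mirrors one entry of `definitions`, with a JSON `null` modeled as `None`; the prompt wording, the method names, and the fallback to the original text when no preprocessed form exists are assumptions for illustration.

```rust
/// Hypothetical mirror of one entry in `definitions` of *.c_decls.json.
struct CDecl {
    definition: String,
    /// `None` models a JSON `null` (the line-marker mapping failed).
    preprocessed_definition: Option<String>,
}

impl CDecl {
    /// Text used for comment gating and response validation.
    /// Falling back to the original definition is an assumption.
    fn gating_text(&self) -> &str {
        self.preprocessed_definition
            .as_deref()
            .unwrap_or(self.definition.as_str())
    }

    /// Builds the C part of a prompt. The preprocessed form is appended only
    /// when it differs, so directive-free functions yield the same prompt
    /// (and therefore the same cache key) as before.
    fn prompt_c_section(&self) -> String {
        let mut s = format!("C function:\n{}\n", self.definition);
        match self.preprocessed_definition.as_deref() {
            Some(pp) if pp != self.definition => {
                s.push_str("After preprocessing:\n");
                s.push_str(pp);
                s.push('\n');
            }
            _ => {}
        }
        s
    }
}
```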
Force-pushed 9d8d679 to 9a5f17f
fw-immunant left a comment
Generally looks good; I left inline comments about a couple of edge cases and about formatting in the tests.
One larger question: how does this interact with configurability? Is this an improvement only for the case where we fully remove all preprocessor logic (as opposed to lowering it to #[cfg]s with Hayroll)?
    let line_no: u64 = rest.get(..digits_len)?.parse().ok()?;
    let rest = rest[digits_len..].trim_start_matches([' ', '\t']);
    let path = rest.strip_prefix('"')?;
    let path = &path[..path.find('"')?];
Nit: what about filenames containing quotes? Are they escaped somehow, or should we use rfind instead of find? Will we ever need to unescape paths before comparing them with what the filesystem holds?
This code probably doesn't have to be absolutely bulletproof, but a comment here would be a good start.
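One way to address the quoting question is to scan the quoted path and honor backslash escapes instead of stopping at the first `"`. This is a hedged sketch: it assumes the preprocessor writes `\\` and `\"` for backslash and quote, `\n` and `\t` for those characters, and other non-printable bytes as three-digit octal escapes. That matches my understanding of clang's and GCC's marker output but should be verified. `parse_line_marker` is a hypothetical name.

```rust
/// Hypothetical escape-aware parser for `# <line> "<path>" [flags]`.
/// Returns the line number and the unescaped path.
fn parse_line_marker(line: &str) -> Option<(u64, String)> {
    let rest = line.strip_prefix('#')?.trim_start_matches([' ', '\t']);
    let digits_len = rest.bytes().take_while(u8::is_ascii_digit).count();
    let line_no: u64 = rest.get(..digits_len)?.parse().ok()?;
    let rest = rest[digits_len..].trim_start_matches([' ', '\t']);
    let quoted = rest.strip_prefix('"')?;

    // Walk byte by byte so an escaped quote does not end the path.
    let bytes = quoted.as_bytes();
    let mut path = Vec::new();
    let mut i = 0;
    loop {
        match *bytes.get(i)? {
            b'"' => break, // unescaped closing quote
            b'\\' => {
                let esc = *bytes.get(i + 1)?;
                match esc {
                    b'\\' | b'"' => { path.push(esc); i += 2; }
                    b'n' => { path.push(b'\n'); i += 2; }
                    b't' => { path.push(b'\t'); i += 2; }
                    b'0'..=b'7' => {
                        // Assumed three-digit octal escape, e.g. `\303`.
                        let oct = quoted.get(i + 1..i + 4)?;
                        path.push(u8::from_str_radix(oct, 8).ok()?);
                        i += 4;
                    }
                    _ => return None, // unknown escape: reject the marker
                }
            }
            b => { path.push(b); i += 1; }
        }
    }
    // Octal-escaped bytes may be parts of UTF-8 sequences; reassemble them.
    Some((line_no, String::from_utf8(path).ok()?))
}
```

Because the result is unescaped, it can be compared with the main-file spelling as long as that spelling is unescaped the same way.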
    },
    "f": {
      "definition": "void f(void){}",
      "preprocessed_definition": "void f(void){}void g(void){}/*comment for h*/void h(void){}int d;int e;;;;;;;;int another;"
preprocessed_definition here contains other functions that follow on the same line. Does this pattern trip up the postprocessor?
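If the mapping extracts whole lines, one way to avoid picking up neighbouring definitions would be to clip the first and last extracted lines to the definition's start and end columns. This is a minimal sketch, assuming 1-based byte columns with an inclusive end column, and assuming that a directives-only pass leaves those columns unchanged on lines without directives. `clip_to_columns` is a hypothetical helper, not code from this PR.

```rust
/// Hypothetical helper: keep only the text between `start_col` on the
/// first line and `end_col` (inclusive) on the last line.
fn clip_to_columns(lines: &[&str], start_col: usize, end_col: usize) -> Option<String> {
    let last = lines.len().checked_sub(1)?; // None for an empty range
    let mut out: Vec<&str> = lines.to_vec();
    // Clip the end first so that, for a one-line range, start_col still
    // indexes into the original line.
    let clipped_last = out[last].get(..end_col)?;
    out[last] = clipped_last;
    let clipped_first = out[0].get(start_col.checked_sub(1)?..)?;
    out[0] = clipped_first;
    Some(out.join("\n"))
}
```

For the snapshot above, clipping line 1 to columns 1 through 14 would keep just `void f(void){}`.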
    #[test]
    fn test_c_decls_directives() {
        let c_path = Path::new("tests/c_decls_snapshots/directives.c");
        transpile_with_c_decl_map_snapshot(c_path);
    }
This test should come before test_c_decls_nh; each section in this file is sorted alphabetically.
The CommentsTransform can fail when it is given a C and Rust function pair in which some comments sit inside preprocessor-conditional regions that are compiled out during preprocessing; those comments should not be transferred to the Rust code.

We can simply run the C compiler on the C snippet; it will pick up the compile commands (if any) and give us the preprocessed version back.