
postprocess: Preprocess the C source code passed to transforms #1846

Open: thedataking wants to merge 3 commits into master from perl/postprocess-json-c-gpt-5.5-2026-04-23


@thedataking thedataking commented Jun 9, 2026


The CommentsTransform can fail when given a C and Rust function pair in which some C comments sit inside conditional directives that are compiled out during preprocessing and therefore should not be transferred to the Rust code.

We can simply run the C compiler on the C snippet; it will pick up the compile commands (if any) and give us back the preprocessed version.

@thedataking thedataking requested a review from Crocodoctopus June 9, 2026 04:32
@thedataking thedataking force-pushed the perl/postprocess-json-c branch from 332513d to c1d7cd9 Compare June 9, 2026 08:19
@thedataking thedataking force-pushed the perl/postprocess-json-c-gpt-5.5-2026-04-23 branch 2 times, most recently from 19142b4 to 24d771a Compare June 9, 2026 09:18
@thedataking thedataking marked this pull request as draft June 9, 2026 18:43
@thedataking thedataking force-pushed the perl/postprocess-json-c-gpt-5.5-2026-04-23 branch from 24d771a to e1ff226 Compare June 10, 2026 09:34
@thedataking thedataking force-pushed the perl/postprocess-json-c-gpt-5.5-2026-04-23 branch 3 times, most recently from 8ee5007 to 9d8d679 Compare June 11, 2026 06:30
@thedataking thedataking marked this pull request as ready for review June 11, 2026 07:03
The transpiler now emits an allow for clippy::missing_safety_doc, which
shifts the expected function spans by one line.

Restructure *.c_decls.json from a flat identifier-to-snippet map into
{definitions: {ident: {definition, preprocessed_definition}}}.

The AST exporter runs an in-process preprocessor-only pass per
translation unit (clang -E -fdirectives-only -C equivalent): conditional
directives are resolved, dropping inactive regions and their comments,
while macro invocations and comments are preserved. Line markers map the
output back to each function definition's source range; the main file is
recognized by the spelling in the leading line marker, since marker
paths are relative to the compilation directory. Functions whose mapping
fails get a null preprocessed_definition.
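
The line-marker mapping described above can be sketched roughly as follows. This is a minimal illustration, not the exporter's actual code: `parse_line_marker` mirrors the parsing shown in the review snippet further down, while `extract_range` and its line-based bookkeeping are hypothetical names and assumptions made for this example. It assumes markers of the form `# <line> "<path>" [flags]` and that the first line of the output is the marker naming the main file.

```rust
/// Parse a line marker of the form `# <line> "<path>" [flags...]`.
/// Returns the line number and the path spelling exactly as written.
/// Directives such as `#define X 1` are not markers and yield `None`.
fn parse_line_marker(line: &str) -> Option<(u64, &str)> {
    let rest = line.strip_prefix('#')?.trim_start_matches([' ', '\t']);
    let digits_len = rest.bytes().take_while(u8::is_ascii_digit).count();
    let line_no: u64 = rest.get(..digits_len)?.parse().ok()?;
    let rest = rest[digits_len..].trim_start_matches([' ', '\t']);
    let path = rest.strip_prefix('"')?;
    Some((line_no, &path[..path.find('"')?]))
}

/// Hypothetical helper: collect the preprocessed text that corresponds to
/// source lines `start..=end` of the main file. Returns `None` when nothing
/// maps into the range (the "null preprocessed_definition" case).
fn extract_range(preprocessed: &str, start: u64, end: u64) -> Option<String> {
    let mut lines = preprocessed.lines();
    // Assumption: the leading marker names the main file by its spelling.
    let (first_no, main_file) = parse_line_marker(lines.next()?)?;
    let mut in_main = true;
    let mut next_line = first_no; // source line of the next non-marker line
    let mut out = Vec::new();
    for line in lines {
        if let Some((line_no, path)) = parse_line_marker(line) {
            // A marker resets both the current file and the line counter.
            in_main = path == main_file;
            next_line = line_no;
            continue;
        }
        if in_main && (start..=end).contains(&next_line) {
            out.push(line);
        }
        next_line += 1;
    }
    if out.is_empty() {
        None
    } else {
        Some(out.join("\n"))
    }
}
```

Because the mapping is line-granular in this sketch, anything else that shares a source line with the function would be pulled into the extracted text as well.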
…prompts

Read both forms from the structured *.c_decls.json instead of running a
clang subprocess per snippet. The CommentsTransform prompt includes the
preprocessed text only when it differs from the original, so prompts
(and llm-cache keys) are unchanged for directive-free functions; comment
gating and response validation use the preprocessed text so comments in
inactive preprocessor regions are not transferred.
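
A rough sketch of the "include only when it differs" rule from the commit message above. The struct fields follow the JSON keys named in the commit message; everything else (`CDefinition`, `gating_text`, `prompt_extra`, `build_prompt`, and the prompt wording) is hypothetical. Falling back to the original definition when `preprocessed_definition` is null is an assumption, not something the PR states.

```rust
/// Hypothetical in-memory form of one entry under `definitions`
/// in `*.c_decls.json`.
struct CDefinition {
    definition: String,
    /// `None` models a null `preprocessed_definition` (mapping failed).
    preprocessed_definition: Option<String>,
}

impl CDefinition {
    /// Text used for comment gating and response validation.
    /// Assumption: fall back to the original when no preprocessed form exists.
    fn gating_text(&self) -> &str {
        self.preprocessed_definition
            .as_deref()
            .unwrap_or(&self.definition)
    }

    /// Preprocessed text to add to the prompt, only when it differs
    /// from the original definition.
    fn prompt_extra(&self) -> Option<&str> {
        self.preprocessed_definition
            .as_deref()
            .filter(|pp| *pp != self.definition.as_str())
    }
}

/// Hypothetical prompt builder; the wording is a placeholder.
fn build_prompt(def: &CDefinition) -> String {
    let mut prompt = format!("C function:\n{}\n", def.definition);
    if let Some(pp) = def.prompt_extra() {
        prompt.push_str(&format!("Preprocessed C function:\n{}\n", pp));
    }
    prompt
}
```

With this shape, a directive-free function yields the same prompt string whether or not a preprocessed form is present, which is what keeps cache keys derived from the prompt stable.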
@thedataking thedataking force-pushed the perl/postprocess-json-c-gpt-5.5-2026-04-23 branch from 9d8d679 to 9a5f17f Compare June 15, 2026 04:47
@thedataking thedataking changed the base branch from perl/postprocess-json-c to master June 15, 2026 04:48
@thedataking thedataking requested review from fw-immunant and removed request for Crocodoctopus June 15, 2026 04:48

@fw-immunant fw-immunant left a comment


Generally looks good; inline comments about a couple edge-cases and formatting in tests.

One larger question I have is how this interacts with configurability--is this an improvement for only the case where we fully remove all preprocessor logic (as opposed to lowering it to #[cfg]s with Hayroll)?

let line_no: u64 = rest.get(..digits_len)?.parse().ok()?;
let rest = rest[digits_len..].trim_start_matches([' ', '\t']);
let path = rest.strip_prefix('"')?;
let path = &path[..path.find('"')?];


Nit: what about filenames containing quotes--are they escaped somehow, or should we prefer rfind over find? Will we ever need to unescape paths somehow before comparison to what the filesystem holds?

This code probably doesn't have to be absolutely bulletproof, but a comment here would be a good start.
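
For illustration, one escape-aware alternative to a plain `find('"')` might look like the sketch below. It assumes that a quote or backslash inside a marker path is written with a preceding backslash; whether the preprocessor output in question actually does this should be checked before relying on it. `parse_quoted_path` is a hypothetical helper, and it only handles single-character escapes such as `\"` and `\\`, not octal or other escape forms.

```rust
/// Hypothetical helper: read a quoted path that starts at the opening `"`.
/// Returns the unescaped path, or `None` if the input does not start with
/// a quote or the closing quote is missing.
fn parse_quoted_path(s: &str) -> Option<String> {
    let mut chars = s.strip_prefix('"')?.chars();
    let mut path = String::new();
    while let Some(c) = chars.next() {
        match c {
            // First unescaped quote ends the path.
            '"' => return Some(path),
            // Assumption: a backslash escapes the next character verbatim.
            '\\' => path.push(chars.next()?),
            _ => path.push(c),
        }
    }
    None // ran out of input before the closing quote
}
```

Unlike `rfind`, this stops at the first unescaped quote, so trailing marker flags or stray quotes later on the line do not affect the result, and the returned path is already unescaped for comparison against filesystem paths.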

},
"f": {
"definition": "void f(void){}",
"preprocessed_definition": "void f(void){}void g(void){}/*comment for h*/void h(void){}int d;int e;;;;;;;;int another;"


preprocessed_definition here contains other functions that follow on the same line. Does this pattern trip up the postprocessor?

Comment on lines +599 to +604

#[test]
fn test_c_decls_directives() {
    let c_path = Path::new("tests/c_decls_snapshots/directives.c");
    transpile_with_c_decl_map_snapshot(c_path);
}


Should be sorted before test_c_decls_nh; each section in this file is alphabetically sorted.
