This repository was archived by the owner on Jun 17, 2026. It is now read-only.

docs: add native C++ export engine architecture plan #678

Open
kaili-yang wants to merge 4 commits into siddharthvaddem:main from kaili-yang:perf/export-phase1

@kaili-yang kaili-yang commented Jun 2, 2026

Summary

This is a draft blueprint for export optimization. Most of the techniques it describes are industry-standard, conservative choices that suit quick, low-risk iterations.
Adds docs/export-optimize-native-cpp-plan.md, a design document for a native C++ export engine intended to replace the current WebCodecs-based pipeline as the primary export path.
Suggestions for improving it are welcome.

What's in the doc

  • Prior art — documents the three CapCut (https://www.capcut.com) export optimisation strategies this plan draws on: full-stack hardware acceleration, background pre-render cache, and on-demand trim-aware decode.
  • Why WebCodecs has a hard ceiling — explains the structural constraints (single-threaded serial loop, no GPU zero-copy, opaque HW encoder selection, real-time audio bottleneck) that cannot be addressed by incremental JS fixes.
  • Target architecture — a standalone openscreen-export-helper C++ binary following the same child-process pattern as the existing openscreen-screencapturekit-helper and openscreen-wgc-capture-helper. The helper owns the full decode → GPU composite → HW encode → mux pipeline while the renderer stays untouched.
  • Hardware acceleration stack — per-platform priority table covering VideoToolbox (macOS), NVENC / AMF / Quick Sync (Windows), VAAPI (Linux), and a libx264 software fallback.
  • Phased delivery roadmap — six milestones from skeleton encode (no effects) through full feature parity (cursor, webcam PiP, audio, GIF), each milestone independently shippable with a fallback to the existing WebCodecs path.

P1-C: Prefer hardware acceleration on all platforms including Windows.
Previously Windows tried software first to avoid known driver bugs, but
modern hardware encoders (NVENC, VideoToolbox, VAAPI) are 5–10× faster.
Software remains the fallback if hardware configure/encode throws.

P1-D: Thread encoder latency mode through VideoExporterConfig. Medium
and good quality presets now use latencyMode "realtime" which skips
encoder lookahead for ~3–5× faster encode throughput. Source quality
keeps "quality" mode for maximum compression efficiency.

P1-B: Reuse the Pixi TextureSource resource across frames instead of
destroying and recreating the GPU texture on every frame. Updates the
backing resource and calls source.update() to re-upload, eliminating
per-frame GPU alloc/free churn (~5–15 % overhead on long exports).
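
The P1-C and P1-D changes could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `QualityPreset`, `buildEncoderAttempts`, `configureWithFallback`, and the codec string are hypothetical names and values. Only the `hardwareAcceleration` and `latencyMode` fields are standard WebCodecs `VideoEncoderConfig` members.

```typescript
// Hypothetical preset names, taken from the wording of the commit notes.
type QualityPreset = "medium" | "good" | "source";

// Subset of the WebCodecs VideoEncoderConfig fields that matter here.
interface EncoderAttempt {
  codec: string;
  width: number;
  height: number;
  hardwareAcceleration: "prefer-hardware" | "prefer-software";
  latencyMode: "realtime" | "quality";
}

// P1-D: medium and good use "realtime"; source keeps "quality".
function latencyModeFor(preset: QualityPreset): "realtime" | "quality" {
  return preset === "source" ? "quality" : "realtime";
}

// P1-C: hardware first on every platform, software as the fallback.
function buildEncoderAttempts(
  preset: QualityPreset,
  width: number,
  height: number,
): EncoderAttempt[] {
  // The codec string is an assumption (H.264 High profile, level 4.0).
  const base = { codec: "avc1.640028", width, height, latencyMode: latencyModeFor(preset) };
  return [
    { ...base, hardwareAcceleration: "prefer-hardware" },
    { ...base, hardwareAcceleration: "prefer-software" },
  ];
}

// Try each attempt in order and move on to the next one if configuring throws.
function configureWithFallback<T>(
  attempts: EncoderAttempt[],
  tryConfigure: (attempt: EncoderAttempt) => T,
): T {
  let lastError: unknown = new Error("no encoder attempts supplied");
  for (const attempt of attempts) {
    try {
      return tryConfigure(attempt);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

In a real exporter, `tryConfigure` would presumably wrap `VideoEncoder.configure()` plus a first test encode, since some driver failures only surface at encode time.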

coderabbitai Bot commented Jun 2, 2026


📝 Walkthrough


New draft doc outlines a standalone native C++ export helper process spawned from Electron, replacing the WebCodecs browser pipeline. Specifies hardware-acceleration stack (VideoToolbox/NVENC/AMF/Quick Sync/VAAPI), GPU zero-copy decode/composite/encode stages, JSON IPC contract, phased feature delivery, and performance targets (under ~15s for typical 1080p with acceleration).

Changes

Native C++ Export Helper Design Plan

| Layer / File(s) | Summary |
|---|---|
| Rationale & Architecture (docs/export-optimize-native-cpp-plan.md, lines 1–77) | Problem statement (WebCodecs performance ceiling and browser-process limitations), prior art context, and the proposed architecture: separate helper process for isolation, parallelism, and direct OS API access. Specifies platform-specific hardware acceleration backends (VideoToolbox, NVENC, AMF, Quick Sync, VAAPI with FFmpeg/software fallbacks) and emphasizes GPU zero-copy as the cross-stage optimization requirement. |
| Technical Pipeline & IPC Contract (docs/export-optimize-native-cpp-plan.md, lines 80–126) | Functional pipeline breakdown: decode (GOP-level skipping within trim regions), composite (multi-pass GPU shader assembly and baked shadow reuse), encode (overlapped with composite), offline audio (time-stretch and re-encode), and MP4 muxing with faststart. JSON-based CLI input and newline-delimited progress events on stdout; SIGTERM-based cancellation and TypeScript fallback to WebCodecs for unsupported configs. |
| Build, Deployment & Feature Roadmap (docs/export-optimize-native-cpp-plan.md, lines 129–157) | CMake build conventions, per-platform/arch packaging, FFmpeg static-link subset, macOS code-signing/notarization. Phased delivery milestones from skeleton (basic decode→pass-through→encode) through effects, cursor overlay, webcam PiP+audio, and GIF output. Performance targets: sub-15s for typical 2-minute 1080p with hardware acceleration, with software fallback parity to current WebCodecs path. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes


Poem

A helper process rises, GPU-bound and fleet,
Leaving WebCodecs behind on the browser's back street.
Decode, composite, encode—each stage flows,
From trim region to faststart MP4.
Near real-time exports, where CapCut goes.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'docs: add native C++ export engine architecture plan' clearly and specifically describes the main change: adding a design document for a new native C++ export engine. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description effectively communicates the purpose, motivation, and scope of the change, but deviates from the template structure by omitting several standard sections. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/export-optimize-native-cpp-plan.md`:
- Around line 121-122: Update the docs to make stdin the canonical input channel
instead of a large JSON command-line argument: replace the line describing Input
as “a single JSON object passed as a command-line argument” with language that
the exporter reads the full JSON job description from stdin (and that argv
should only contain a tiny bootstrap token/flags), and keep Output described as
newline-delimited JSON progress events on stdout (`ready`, `progress`, `done`,
`error`); explicitly call out that large payloads (trim maps, cursor/effects)
must be provided via stdin to avoid argv length limits.
- Around line 46-49: The fenced diagram block showing the process boundary (the
three-line block that starts with "Electron Renderer  ──IPC──►  Electron Main 
──spawn──►  openscreen-export-helper") should include a language tag (e.g., use
```text) to satisfy markdownlint MD040; update that fenced code block to begin
with a language identifier like "text" so the diagram is properly tagged.
- Around line 123-126: Define and implement an explicit cancellation/error
contract for the native helper invoked by NativeExporter: ensure that on SIGTERM
or any encoding/muxing error the helper stops the pipeline, cleans up temporary
artifacts (partial files, temp dirs), and exits non‑zero; only on successful
completion write output to a temp path and publish the final file via an atomic
rename/move into the target path. Update NativeExporter (the TypeScript wrapper
around VideoExporterConfig → helper JSON) to pass a temp-output path, listen for
helper exit codes/events, delete temp artifacts on non‑zero/error/SIGTERM, and
only surface success callbacks (and call onProgress finalization) after the
atomic rename; keep VideoExporter (WebCodecs) as the documented fallback for
systems where the helper binary is unavailable.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 48382e63-d821-4b99-84b9-82c4ce9d2e50

📥 Commits

Reviewing files that changed from the base of the PR and between d2dd44a and fb57d5d.

📒 Files selected for processing (1)
  • docs/export-optimize-native-cpp-plan.md

Comment on lines +46 to +49
```
Electron Renderer  ──IPC──►  Electron Main  ──spawn──►  openscreen-export-helper
(React / UI)                 (Node.js)                  (C++ encode engine)
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Nit: add a language tag to fenced diagram block.

markdownlint MD040 is valid here; use something like ```text for the process-boundary diagram. nit, but cleaner CI/docs hygiene.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 46-46: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Comment on lines +121 to +122
- **Input**: a single JSON object passed as a command-line argument describing the full export job (paths, effects, quality, trim, cursor data, etc.)
- **Output**: newline-delimited JSON progress events on stdout (`ready`, `progress`, `done`, `error`)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Input transport is inconsistent and lowkey risky at scale.

Doc says “stdin/stdout JSON contract” but then defines input as a JSON command-line arg. For large jobs (trim maps, cursor/effects payloads), argv length limits (especially on Windows) can fail export startup. recommend making stdin the canonical input channel and keeping argv to a tiny bootstrap token only.
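
A stdin-first contract along these lines could look like the sketch below. It is only an illustration: the event names (`ready`, `progress`, `done`, `error`) come from the doc, while the payload fields, `encodeJob`, and `createEventParser` are hypothetical. The serialized job would be written to the helper's stdin, and the parser would be fed raw stdout chunks, which may split a JSON line at any byte.

```typescript
// Event names come from the doc; the payload fields are assumptions.
type HelperEvent =
  | { type: "ready" }
  | { type: "progress"; fraction: number }
  | { type: "done"; outputPath: string }
  | { type: "error"; message: string };

// Serialize the whole job as a single line destined for the helper's stdin,
// so large payloads (trim maps, cursor data) never touch argv.
function encodeJob(job: object): string {
  return JSON.stringify(job) + "\n";
}

// Returns a function that accepts raw stdout chunks and emits one event per
// complete line, buffering any partial line until its newline arrives.
function createEventParser(
  onEvent: (event: HelperEvent) => void,
): (chunk: string) => void {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    let newline = buffer.indexOf("\n");
    while (newline !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line.length > 0) {
        onEvent(JSON.parse(line) as HelperEvent);
      }
      newline = buffer.indexOf("\n");
    }
  };
}
```

A production wrapper would also want to validate each parsed object rather than trust the cast, and treat a malformed line as an `error` event.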

Comment on lines +123 to +126
- **Cancellation**: SIGTERM

The JS side (`NativeExporter`) wraps this in a thin TypeScript class that translates the existing `VideoExporterConfig` into the helper's JSON format and maps progress events back to the `onProgress` callback. The `VideoExporter` (WebCodecs) remains as a fallback for systems where the helper binary is unavailable.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cancellation contract needs explicit cleanup + atomic output semantics.

SIGTERM-only is kinda under-specified for long-running encode/mux. Please define required behavior on cancel/error: stop pipeline, delete temp artifacts, and only publish output via atomic rename on success. otherwise users can end up with corrupt/partial files that look “done-ish”.
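
One possible shape for the wrapper side of that contract is sketched below. Every name here (`FileOps`, `HelperOutcome`, `finalizeExport`) is hypothetical. File operations are injected so the sketch stays independent of any runtime; in the Electron main process they could map to `fs.renameSync` and `fs.rmSync`.

```typescript
// Minimal file operations, injected by the caller (assumed interface).
interface FileOps {
  rename(from: string, to: string): void;
  remove(path: string): void;
}

// How the helper process ended: a normal exit code or a terminating signal.
type HelperOutcome =
  | { kind: "exited"; code: number }
  | { kind: "signaled"; signal: string };

// Publish the temp file only when the helper exited with code 0; on any
// other outcome (non-zero exit, SIGTERM) delete the partial output.
// Returns true when the export was published to finalPath.
function finalizeExport(
  outcome: HelperOutcome,
  tempPath: string,
  finalPath: string,
  files: FileOps,
): boolean {
  if (outcome.kind === "exited" && outcome.code === 0) {
    files.rename(tempPath, finalPath);
    return true;
  }
  files.remove(tempPath);
  return false;
}
```

A rename is only atomic within a single filesystem, so the temp path would need to live next to the target (for example in the same directory) rather than in a system temp directory.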

- **Background pre-render cache.** While the user scrubs the timeline and previews effects, CapCut silently renders affected segments into a low-bitrate segment cache. When export is triggered, cached segments are assembled directly without re-rendering — reducing the export to a mux-and-encode pass over pre-computed frames.
- **On-demand decode.** Only frames that survive trim boundaries are decoded. The demuxer seeks to the nearest keyframe before each active segment and skips the rest at the packet level, so a 10-minute source with a 30-second active region decodes approximately 30 seconds of video, not 10 minutes.

The architecture proposed here applies the first and third strategies directly. The second (segment pre-render cache) is a longer-term addition that can layer on top once the core C++ pipeline is in place.
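
The on-demand decode idea can be illustrated with a small planning function. This is a sketch under assumptions, not the helper's algorithm: `Segment`, `decodeRanges`, and `totalDecoded` are hypothetical, and times are plain seconds rather than container timestamps. For each active segment it picks the last keyframe at or before the segment start as the seek point, so only that span needs decoding.

```typescript
// A time span in seconds; end is exclusive.
interface Segment {
  start: number;
  end: number;
}

// keyframes: ascending keyframe timestamps in seconds.
// For each active segment, decode from the last keyframe at or before the
// segment start (falling back to 0, the start of the file) to the segment end.
function decodeRanges(keyframes: number[], active: Segment[]): Segment[] {
  return active.map(({ start, end }) => {
    let seek = 0;
    for (const k of keyframes) {
      if (k <= start) {
        seek = k;
      } else {
        break;
      }
    }
    return { start: seek, end };
  });
}

// Total seconds of video that would be decoded for the given ranges.
function totalDecoded(ranges: Segment[]): number {
  return ranges.reduce((sum, r) => sum + (r.end - r.start), 0);
}
```

With a keyframe every 2 seconds in a 10-minute source and one active segment from 301 s to 331 s, the plan decodes from 300 s to 331 s: 31 seconds instead of 600. The sketch does not merge overlapping ranges, which a real planner would likely do.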

Why C++? I'm not against it; I just see no argument in favor of it or against it.

Comment on lines +64 to +76
The helper selects the best available backend at runtime, in priority order:

| Platform | Decode | Composite | Encode |
|---|---|---|---|
| macOS (Apple Silicon) | VideoToolbox | Metal compute | VideoToolbox (H.264 / HEVC) |
| macOS (Intel) | VideoToolbox | Metal compute | VideoToolbox |
| Windows (NVIDIA) | NVDEC | D3D11 compute | NVENC |
| Windows (AMD) | AMF decoder | D3D11 compute | AMF encoder |
| Windows (Intel) | Quick Sync | D3D11 compute | Quick Sync |
| Linux | VAAPI / NVDEC | OpenGL compute | VAAPI / NVENC |
| All (fallback) | FFmpeg software | CPU | libx264 / libx265 |

The critical optimisation at each stage is **GPU zero-copy**: the decoded frame lives on a GPU surface, the compositor reads and writes GPU textures, and the encoder consumes the GPU surface directly — no pixel data crosses the CPU bus until the final muxed file is written to disk.
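
The runtime selection described above could be reduced to a priority walk like the one below. This is a sketch only, written in TypeScript for consistency with the wrapper rather than in the helper's C++. The encoder names follow the table, but the ordering among Windows vendors, the `isAvailable` probe, and all identifiers are assumptions.

```typescript
type Platform = "macos" | "windows" | "linux";

// Hardware encoders per platform, in the order they would be tried.
// The relative order of NVENC, AMF, and Quick Sync is an assumption.
const ENCODER_PRIORITY: Record<Platform, string[]> = {
  macos: ["videotoolbox"],
  windows: ["nvenc", "amf", "quicksync"],
  linux: ["vaapi", "nvenc"],
};

// Software fallback from the last row of the table.
const SOFTWARE_FALLBACK = "libx264";

// Return the first encoder the probe reports as usable, else the fallback.
function selectEncoder(
  platform: Platform,
  isAvailable: (name: string) => boolean,
): string {
  for (const name of ENCODER_PRIORITY[platform]) {
    if (isAvailable(name)) {
      return name;
    }
  }
  return SOFTWARE_FALLBACK;
}
```

In practice the probe would have to actually open an encoder session, because a driver can advertise a backend that then fails to initialize.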

Why not use an abstraction like the libavcodec library from the FFmpeg project, which already maintains all these backend layers?

Contributor Author

The C++ helper does use FFmpeg (libavcodec / libavformat / libswscale). FFmpeg is the abstraction layer over every hardware backend listed in this document: VideoToolbox, NVENC, AMF, Quick Sync, and VAAPI. The question is not whether to use FFmpeg, but where to run it.

There are two realistic ways to run FFmpeg-backed code from an Electron app:

Option A: WebAssembly (ffmpeg.wasm). Compile FFmpeg to WASM and run it inside the renderer or main process. This is a real project and works for simple transcodes. The problem is that WASM runs inside the browser's sandbox, which means it cannot open a VideoToolbox session, cannot acquire an NVENC encoder context, cannot use D3D11VA or VAAPI, and cannot share GPU surfaces with the compositor. Every frame would require a CPU copy. WASM is also single-threaded by default; SharedArrayBuffer threads are available but cannot call native OS APIs. For a pipeline whose entire performance argument is GPU zero-copy and hardware encode, WASM erases the benefit entirely.

Option B: Native process or addon. Compile FFmpeg into a native binary (or N-API addon) that runs outside the browser sandbox with full OS API access. This is what this document proposes, and it is the approach taken by native desktop video tools such as CapCut and DaVinci Resolve.

So the stack is: C++ process → libavcodec (FFmpeg) → platform HW API (VideoToolbox / NVENC / VAAPI). FFmpeg is not an alternative to this plan; it is a core dependency of it. The C++ layer exists specifically to host FFmpeg outside the sandbox where it can actually reach the hardware.
