
BridgeJS: Optimize string encoding for JS-to-Swift crossings #10

Draft: krodak wants to merge 2 commits into main from kr/string-encoding-optimization



@krodak krodak commented May 20, 2026

Overview

String parameters crossing from JS to Swift go through TextEncoder.encode() + object heap retain/release on every call. This is measurably slow for repeated strings and for string arrays. Two independent optimizations target the two string-encoding paths in the generated JS glue, without touching BridgeType or the codegen structure.

Related: swiftwasm#677, swiftwasm#700 (different approach - adds JSString as a new BridgeType; this PR avoids that)

What changed

1. Encoding cache for parameter and return paths (commit 1 - no Swift changes)

When JS passes a string to an exported Swift function (or returns a string from an imported JS function), the glue calls textEncoder.encode(string) to get a Uint8Array, then retains it in the object heap. The same string encoded 100k times means 100k allocations.

A 4096-entry encoding cache (Map<string, Uint8Array>) now sits in front of textEncoder.encode(). Repeated strings hit the cache and skip encoding. The cache relies on JS Map insertion-order semantics for O(1) LRU eviction: delete-and-reinsert on a hit, delete the first key on eviction. 4096 entries covers realistic vocabularies without the pathological eviction churn that smaller caches exhibit.

Affected fragments: stringLowerParameter, stringLowerReturn
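
The cache behavior described above can be sketched as follows. This is a minimal illustration, not the generated glue: the names makeCachedEncoder, encode, and cache are hypothetical, and the limit is made a parameter only so that eviction is easy to observe.

```javascript
// Hypothetical sketch of an LRU encoding cache built on Map insertion order.
// makeCachedEncoder is an illustrative name, not an identifier from the PR.
const textEncoder = new TextEncoder();

function makeCachedEncoder(limit = 4096) {
    const cache = new Map(); // string -> Uint8Array

    function encode(str) {
        const hit = cache.get(str);
        if (hit !== undefined) {
            // Delete-and-reinsert moves the entry to the most-recently-used end.
            cache.delete(str);
            cache.set(str, hit);
            return hit;
        }
        const bytes = textEncoder.encode(str);
        if (cache.size >= limit) {
            // Map iterates in insertion order, so the first key is the
            // least recently used entry.
            const oldest = cache.keys().next().value;
            cache.delete(oldest);
        }
        cache.set(str, bytes);
        return bytes;
    }

    return { encode, cache };
}
```

Note that a hit returns the same Uint8Array instance as earlier calls, so a cache like this is only safe if consumers treat the bytes as read-only.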

2. Direct string retain + encodeInto() for the stack ABI (commit 2)

Arrays, struct fields, enum payloads, and dictionary entries use the stack ABI, which encodes each string element independently. Instead of allocating a Uint8Array per element, the JS glue now retains the JS string itself and passes a worst-case buffer capacity via _maxUTF8Len() (str.length * 3).

A new dedicated swift_js_init_memory_from_string WASM import handles this path - it encodes UTF-8 directly into the WASM buffer via encodeInto() and returns the actual byte count written. The existing swift_js_init_memory is unchanged.
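
As a rough sketch of that path: the function below encodes a string straight into WebAssembly linear memory with encodeInto() and returns the number of bytes written. The name initMemoryFromString and its signature are assumptions for illustration; the real import presumably receives an object heap reference rather than the string itself.

```javascript
// Hypothetical stand-in for the swift_js_init_memory_from_string handler.
// Assumption: the caller has already resolved the retained JS string and
// reserved `capacity` bytes at `ptr` in linear memory.
const textEncoder = new TextEncoder();

function initMemoryFromString(memory, str, ptr, capacity) {
    // View exactly the reserved region so encodeInto cannot write past it.
    const dest = new Uint8Array(memory.buffer, ptr, capacity);
    const { written } = textEncoder.encodeInto(str, dest);
    return written; // actual UTF-8 byte count, at most `capacity`
}

// Usage: one 64 KiB page of memory, worst-case capacity of str.length * 3.
const memory = new WebAssembly.Memory({ initial: 1 });
const str = "héllo";
const written = initMemoryFromString(memory, str, 16, str.length * 3);
console.log(written); // 6: "é" takes two bytes, the other four take one each
```

No intermediate Uint8Array is allocated per string here; the only temporary object is the view over the destination region.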

On str.length * 3: each UTF-16 code unit produces at most 3 UTF-8 bytes. A surrogate pair (2 code units) produces 4 UTF-8 bytes, so the per-unit ratio stays <= 3. This is the standard worst-case estimate: wasm-bindgen uses str.length * 3 with encodeInto, then calls realloc to shrink. Emscripten's stringToUTF8Array documents "at most str.length*4+1 bytes" (their *4 accounts for code points rather than UTF-16 units, and the +1 is for the null terminator). We don't need wasm-bindgen's shrink step because Swift's String(unsafeUninitializedCapacity:initializingUTF8With:) uses the returned byte count as the string length, regardless of the buffer capacity.
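
The bound can be checked on a few representative strings. In this sketch maxUTF8Len is a hypothetical stand-in for the _maxUTF8Len() helper, assumed to compute str.length * 3 as stated above; the sample strings are chosen for illustration.

```javascript
// Hypothetical stand-in for _maxUTF8Len(): 3 bytes per UTF-16 code unit.
const textEncoder = new TextEncoder();

function maxUTF8Len(str) {
    return str.length * 3;
}

const samples = [
    "hello",   // ASCII: 1 byte per code unit
    "héllo",   // "é" is 1 code unit, 2 bytes
    "日本語",   // each character is 1 code unit, 3 bytes (bound is tight)
    "😀",      // surrogate pair: 2 code units, 4 bytes
    "\uD800",  // lone surrogate: encoded as U+FFFD, 3 bytes
];

const rows = samples.map((s) => ({
    units: s.length,
    bytes: textEncoder.encode(s).length,
    capacity: maxUTF8Len(s),
}));

for (const row of rows) {
    console.log(row.units, row.bytes, row.capacity);
}
```

In every row the encoded byte count is at most the capacity, and the three-byte cases show that the factor of 3 cannot be lowered.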

Affected fragments: stackLowerFragment for .string / .rawValueEnum(_, .string)

Benchmarks

100k iterations, Node.js v22, 15-run average:

| Benchmark | Before | After | Change |
| --- | --- | --- | --- |
| StringRoundtrip/takeString | 33.35 ms | 26.48 ms | -21% |
| ArrayRoundtrip/takeStringArray | 162.35 ms | 106.18 ms | -35% |
| ArrayRoundtrip/roundtripStringArray | 223.98 ms | 158.87 ms | -29% |
| StringRoundtrip/makeString | 10.53 ms | 10.29 ms | neutral |
| ArrayRoundtrip/makeStringArray | 59.07 ms | 57.64 ms | neutral |

The make* benchmarks (Swift-to-JS direction) are unaffected - those paths already use direct memory reads via decodeString(ptr, len).
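
For contrast, the Swift-to-JS direction needs no staging on the JS side. A direct memory read of this kind might look like the sketch below; the extra memory parameter and the function body are assumptions, since the PR only names decodeString(ptr, len).

```javascript
// Hypothetical sketch of a direct memory read for Swift-to-JS strings.
// Assumption: Swift has written `len` UTF-8 bytes at `ptr` in linear memory.
const textDecoder = new TextDecoder("utf-8");

function decodeString(memory, ptr, len) {
    // View the bytes in place and decode them without copying first.
    const bytes = new Uint8Array(memory.buffer, ptr, len);
    return textDecoder.decode(bytes);
}

// Usage: simulate Swift writing "héllo" (6 UTF-8 bytes) at offset 32.
const wasmMemory = new WebAssembly.Memory({ initial: 1 });
new Uint8Array(wasmMemory.buffer).set(new TextEncoder().encode("héllo"), 32);
console.log(decodeString(wasmMemory, 32, 6)); // "héllo"
```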

Independence of the two techniques

The two commits are independent. Commit 1 (cache) has zero Swift-side changes and can be cherry-picked alone for ~21% improvement on takeString. Commit 2 adds a new WASM import (swift_js_init_memory_from_string) without modifying the existing swift_js_init_memory.

Comparison with alternative approaches

We benchmarked alternative approaches for comparison.

Approach: PR swiftwasm#700 JSString (new BridgeType)

PR swiftwasm#700 adds JSString as a new BridgeType. Users opt in per-parameter by changing String to JSString. It passes a JS object heap reference (single i32) with no encoding.

| Benchmark | Baseline | This PR | PR swiftwasm#700 JSString |
| --- | --- | --- | --- |
| takeString | 36 ms | 26 ms (-21%) | 36 ms (no change) |
| takeJSString | N/A | N/A | 18 ms (-50%) |
| makeString | 10 ms | 10 ms | 10 ms |
| makeJSString | N/A | N/A | 32 ms (3x slower) |
| takeStringArray | 238 ms | 106 ms (-35%) | 241 ms (no change) |
| takeJSStringArray | N/A | N/A | 131 ms (-45%) |
| makeJSStringArray | N/A | N/A | 179 ms (2.5x slower) |
| roundtripStringArray | 310 ms | 159 ms (-29%) | 313 ms (no change) |
| roundtripJSStringArray | N/A | N/A | 162 ms (-48%) |

PR swiftwasm#700 wins on the take direction (no encoding at all), but has significant regressions on all make/return paths due to object heap management overhead. It also only helps callers who rewrite their API surface from String to JSString. This PR improves all existing String usage transparently.

Approach: switching String to Stack ABI

Per Yuta's suggestion in swiftwasm#677, we tested replacing the inline WASM params (bytes: i32, length: i32) with the stack ABI (push to JS arrays, zero WASM params). Result: no measurable difference.

| Benchmark | Inline params (current) | Stack ABI | Change |
| --- | --- | --- | --- |
| StringRoundtrip/takeString | 36.55 ms | 37.31 ms | +2.1% (noise) |
| ArrayRoundtrip/takeStringArray | 230.75 ms | 229.86 ms | -0.4% (noise) |

The bottleneck is encoding and heap management, not the parameter passing mechanism. Branch string-opt/stack-abi-test has this experiment.

Files changed (excluding snapshots)

  • Sources/JavaScriptKit/BridgeJSIntrinsics.swift - new _swift_js_init_memory_from_string import; bridgeJSStackPop uses it; existing _swift_js_init_memory unchanged
  • Plugins/BridgeJS/Sources/BridgeJSLink/BridgeJSLink.swift - encoding cache preamble; new swift_js_init_memory_from_string handler; _maxUTF8Len helper
  • Plugins/BridgeJS/Sources/BridgeJSLink/JSGlueGen.swift - stringLowerParameter and stringLowerReturn use cache; stackLowerFragment uses direct retain + _maxUTF8Len; reserved variable names
  • Plugins/PackageToJS/Templates/instantiate.js - stub for new swift_js_init_memory_from_string import

@krodak krodak force-pushed the kr/string-encoding-optimization branch 9 times, most recently from 0334cfa to 5e55f1c Compare May 20, 2026 13:43
krodak added 2 commits May 20, 2026 16:12
Add a size-limited encoding cache (Map<string, Uint8Array>) in front
of textEncoder.encode() for the ExportSwift parameter path and the
ImportTS return path. Repeated strings skip encoding entirely on
cache hit.

No BridgeType changes. No Swift-side changes.

For arrays, struct fields, enum payloads, and dictionary entries, the
stack ABI now retains the JS string directly (instead of encoding to a
Uint8Array first) and passes the worst-case UTF-8 byte length via
_maxUTF8Len() as buffer capacity.

A new dedicated swift_js_init_memory_from_string import handles the
string path - it encodes UTF-8 directly into the WASM buffer via
encodeInto() and returns the actual byte count written. This avoids
modifying the existing swift_js_init_memory contract.

This eliminates the intermediate Uint8Array allocation for every string
element in arrays and struct fields.
@krodak krodak force-pushed the kr/string-encoding-optimization branch from 5e55f1c to 4d11610 Compare May 20, 2026 14:13