BridgeJS: Optimize string encoding for JS-to-Swift crossings by krodak · Pull Request #10 · PassiveLogic/JavaScriptKit

krodak · 2026-05-20T09:18:42Z

Overview

String parameters crossing from JS to Swift go through TextEncoder.encode() plus an object heap retain/release on every call. This is measurably slow for repeated strings and for string arrays. This PR adds two independent optimizations, one for each of the two string-encoding paths in the generated JS glue, without touching BridgeType or the codegen structure.

Related: swiftwasm#677, swiftwasm#700 (different approach - adds JSString as a new BridgeType; this PR avoids that)

What changed

1. Encoding cache for parameter and return paths (commit 1 - no Swift changes)

When JS passes a string to an exported Swift function (or returns a string from an imported JS function), the glue calls textEncoder.encode(string) to get a Uint8Array, then retains it in the object heap. The same string encoded 100k times means 100k allocations.

A 4096-entry encoding cache (Map<string, Uint8Array>) now sits in front of textEncoder.encode(). Repeated strings get a cache hit and skip encoding. The cache relies on JS Map insertion-order semantics for O(1) LRU bookkeeping: a hit deletes and reinserts the entry (moving it to the end of the order), and eviction deletes the first (oldest) entry. 4096 entries covers realistic vocabularies without the pathological eviction churn that smaller caches exhibit.
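A minimal sketch of the caching scheme described above, assuming a standalone helper. The names encodeCached, encodeCache, and MAX_ENTRIES are hypothetical and are not taken from the generated glue; only the Map-based delete-and-reinsert / delete-first behavior comes from the description.

```typescript
// Hypothetical names for illustration; the actual glue code may differ.
const MAX_ENTRIES = 4096;
const textEncoder = new TextEncoder();
const encodeCache = new Map<string, Uint8Array>();

function encodeCached(str: string, limit: number = MAX_ENTRIES): Uint8Array {
    const hit = encodeCache.get(str);
    if (hit !== undefined) {
        // Delete-and-reinsert moves the entry to the end of the Map's
        // insertion order, marking it as most recently used.
        encodeCache.delete(str);
        encodeCache.set(str, hit);
        return hit;
    }
    const bytes = textEncoder.encode(str);
    if (encodeCache.size >= limit) {
        // The first key in iteration order is the least recently used one.
        const oldest = encodeCache.keys().next().value;
        if (oldest !== undefined) encodeCache.delete(oldest);
    }
    encodeCache.set(str, bytes);
    return bytes;
}
```

One consequence of this design is that a cache hit returns the same Uint8Array instance as the earlier call, so callers in this sketch must treat the returned bytes as read-only.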

Affected fragments: stringLowerParameter, stringLowerReturn

2. Direct string retain + encodeInto() for the stack ABI (commit 2)

Arrays, struct fields, enum payloads, and dictionary entries use the stack ABI, which encodes each string element independently. Instead of allocating a Uint8Array per element, the JS glue now retains the JS string itself and passes a worst-case buffer capacity via _maxUTF8Len() (str.length * 3).

A new dedicated swift_js_init_memory_from_string WASM import handles this path - it encodes UTF-8 directly into the WASM buffer via encodeInto() and returns the actual byte count written. The existing swift_js_init_memory is unchanged.
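The flow above could look roughly like the following sketch. It uses a plain Uint8Array as a stand-in for WASM linear memory, and the function names and parameter order (maxUTF8Len, initMemoryFromString) are assumptions for illustration, not the actual signature of swift_js_init_memory_from_string.

```typescript
const encoder = new TextEncoder();

// Hypothetical stand-in for _maxUTF8Len(): worst-case UTF-8 size of a JS string.
function maxUTF8Len(str: string): number {
    return str.length * 3;
}

// Hypothetical stand-in for the import handler. Encodes `str` directly into
// `memory` at `ptr` and returns the number of bytes actually written.
function initMemoryFromString(
    memory: Uint8Array,
    ptr: number,
    capacity: number,
    str: string
): number {
    // View over the destination region only; no intermediate Uint8Array copy.
    const dest = memory.subarray(ptr, ptr + capacity);
    return encoder.encodeInto(str, dest).written;
}
```

In this sketch the caller reserves maxUTF8Len(str) bytes and then uses the returned count as the real string length, which mirrors why no shrink step is needed on the Swift side.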

On str.length * 3: Each UTF-16 code unit can produce at most 3 UTF-8 bytes. Surrogate pairs (2 code units) produce 4 UTF-8 bytes, so the per-unit ratio stays <= 3. This is the standard worst-case estimate - wasm-bindgen uses str.length * 3 with encodeInto, then realloc to shrink. Emscripten's stringToUTF8Array documents "at most str.length*4+1 bytes" (their *4 accounts for code points rather than UTF-16 units, +1 for null terminator). We don't need wasm-bindgen's shrink step because Swift's String(unsafeUninitializedCapacity:) uses the returned byte count as the string length regardless of the buffer capacity.
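The bound can be checked on a few representative characters. This is a small illustrative sketch (utf8Sizes is a hypothetical helper, not part of the PR) that compares the actual UTF-8 size against str.length * 3:

```typescript
const utf8 = new TextEncoder();

// Returns [UTF-16 code units, actual UTF-8 bytes, worst-case bound].
function utf8Sizes(str: string): [number, number, number] {
    return [str.length, utf8.encode(str).length, str.length * 3];
}

utf8Sizes("A");  // [1, 1, 3]  ASCII: 1 byte per code unit
utf8Sizes("é");  // [1, 2, 3]  U+00E9: 2 bytes per code unit
utf8Sizes("€");  // [1, 3, 3]  U+20AC: 3 bytes, the bound is tight here
utf8Sizes("😀"); // [2, 4, 6]  surrogate pair: 4 bytes for 2 code units
```

The euro sign is the tight case (3 bytes for one code unit), while the surrogate pair averages 2 bytes per code unit and so stays under the bound.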

Affected fragments: stackLowerFragment for .string / .rawValueEnum(_, .string)

Benchmarks

100k iterations, Node.js v22, 15-run average:

| Benchmark | Before | After | Change |
| --- | --- | --- | --- |
| `StringRoundtrip/takeString` | 33.35 ms | 26.48 ms | -21% |
| `ArrayRoundtrip/takeStringArray` | 162.35 ms | 106.18 ms | -35% |
| `ArrayRoundtrip/roundtripStringArray` | 223.98 ms | 158.87 ms | -29% |
| `StringRoundtrip/makeString` | 10.53 ms | 10.29 ms | neutral |
| `ArrayRoundtrip/makeStringArray` | 59.07 ms | 57.64 ms | neutral |

The make* benchmarks (Swift-to-JS direction) are unaffected - those paths already use direct memory reads via decodeString(ptr, len).

Independence of the two techniques

The two commits are independent. Commit 1 (cache) has zero Swift-side changes and can be cherry-picked alone for ~21% improvement on takeString. Commit 2 adds a new WASM import (swift_js_init_memory_from_string) without modifying the existing swift_js_init_memory.

Comparison with alternative approaches

We benchmarked two other approaches for comparison.

Approach: PR swiftwasm#700 JSString (new BridgeType)

PR swiftwasm#700 adds JSString as a new BridgeType. Users opt in per-parameter by changing String to JSString. It passes a JS object heap reference (single i32) with no encoding.

| Benchmark | Baseline | This PR | PR swiftwasm#700 JSString |
| --- | --- | --- | --- |
| `takeString` | 36 ms | 26 ms (-21%) | 36 ms (no change) |
| `takeJSString` | N/A | N/A | 18 ms (-50%) |
| `makeString` | 10 ms | 10 ms | 10 ms |
| `makeJSString` | N/A | N/A | 32 ms (3x slower) |
| `takeStringArray` | 238 ms | 106 ms (-35%) | 241 ms (no change) |
| `takeJSStringArray` | N/A | N/A | 131 ms (-45%) |
| `makeJSStringArray` | N/A | N/A | 179 ms (2.5x slower) |
| `roundtripStringArray` | 310 ms | 159 ms (-29%) | 313 ms (no change) |
| `roundtripJSStringArray` | N/A | N/A | 162 ms (-48%) |

PR swiftwasm#700 wins on the take direction (no encoding at all), but has significant regressions on all make/return paths due to object heap management overhead. It also only helps callers who rewrite their API surface from String to JSString. This PR improves all existing String usage transparently.

Approach: switching String to Stack ABI

Per Yuta's suggestion in swiftwasm#677, we tested replacing the inline WASM params (bytes: i32, length: i32) with the stack ABI (push to JS arrays, zero WASM params). Result: no measurable difference.

| Benchmark | Inline params (current) | Stack ABI | Change |
| --- | --- | --- | --- |
| `StringRoundtrip/takeString` | 36.55 ms | 37.31 ms | +2.1% (noise) |
| `ArrayRoundtrip/takeStringArray` | 230.75 ms | 229.86 ms | -0.4% (noise) |

The bottleneck is encoding and heap management, not the parameter passing mechanism. Branch string-opt/stack-abi-test has this experiment.

Files changed (excluding snapshots)

Sources/JavaScriptKit/BridgeJSIntrinsics.swift - new _swift_js_init_memory_from_string import; bridgeJSStackPop uses it; existing _swift_js_init_memory unchanged
Plugins/BridgeJS/Sources/BridgeJSLink/BridgeJSLink.swift - encoding cache preamble; new swift_js_init_memory_from_string handler; _maxUTF8Len helper
Plugins/BridgeJS/Sources/BridgeJSLink/JSGlueGen.swift - stringLowerParameter and stringLowerReturn use cache; stackLowerFragment uses direct retain + _maxUTF8Len; reserved variable names
Plugins/PackageToJS/Templates/instantiate.js - stub for new swift_js_init_memory_from_string import

Add a size-limited encoding cache (Map<string, Uint8Array>) in front of textEncoder.encode() for the ExportSwift parameter path and the ImportTS return path. Repeated strings skip encoding entirely on cache hit. No BridgeType changes. No Swift-side changes.

For arrays, struct fields, enum payloads, and dictionary entries, the stack ABI now retains the JS string directly (instead of encoding to a Uint8Array first) and passes the worst-case UTF-8 byte length via _maxUTF8Len() as buffer capacity. A new dedicated swift_js_init_memory_from_string import handles the string path - it encodes UTF-8 directly into the WASM buffer via encodeInto() and returns the actual byte count written. This avoids modifying the existing swift_js_init_memory contract. This eliminates the intermediate Uint8Array allocation for every string element in arrays and struct fields.

krodak force-pushed the kr/string-encoding-optimization branch 9 times, most recently from 0334cfa to 5e55f1c, May 20, 2026 13:43

krodak added 2 commits May 20, 2026 16:12

krodak force-pushed the kr/string-encoding-optimization branch from 5e55f1c to 4d11610, May 20, 2026 14:13

krodak wants to merge 2 commits into main from kr/string-encoding-optimization
