BridgeJS: Optimize string encoding for JS-to-Swift crossings #10
Draft
krodak wants to merge 2 commits into
Add a size-limited encoding cache (Map<string, Uint8Array>) in front of textEncoder.encode() for the ExportSwift parameter path and the ImportTS return path. Repeated strings skip encoding entirely on cache hit. No BridgeType changes. No Swift-side changes.
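A minimal sketch of such a size-limited cache, assuming the 4096-entry limit and the delete-and-reinsert scheme outlined in the PR description. The names CACHE_LIMIT, encodingCache, and encodeCached are hypothetical and are not taken from the generated glue.

```javascript
// Hypothetical names throughout; this is not the PR's generated glue code.
const textEncoder = new TextEncoder();
const CACHE_LIMIT = 4096;
const encodingCache = new Map(); // string -> Uint8Array, oldest entry first

function encodeCached(str) {
  const hit = encodingCache.get(str);
  if (hit !== undefined) {
    // Delete-and-reinsert moves the entry to the most-recent end of the Map.
    encodingCache.delete(str);
    encodingCache.set(str, hit);
    return hit;
  }
  const bytes = textEncoder.encode(str);
  if (encodingCache.size >= CACHE_LIMIT) {
    // A Map iterates in insertion order, so the first key is the least recently used.
    const oldest = encodingCache.keys().next().value;
    encodingCache.delete(oldest);
  }
  encodingCache.set(str, bytes);
  return bytes;
}
```

In this sketch a cache hit returns the same Uint8Array instance each time, so callers would need to treat the bytes as read-only.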
For arrays, struct fields, enum payloads, and dictionary entries, the stack ABI now retains the JS string directly (instead of encoding to a Uint8Array first) and passes the worst-case UTF-8 byte length via _maxUTF8Len() as buffer capacity. A new dedicated swift_js_init_memory_from_string import handles the string path - it encodes UTF-8 directly into the WASM buffer via encodeInto() and returns the actual byte count written. This avoids modifying the existing swift_js_init_memory contract. This eliminates the intermediate Uint8Array allocation for every string element in arrays and struct fields.
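The encodeInto() path can be sketched roughly as follows. This is an illustration under assumptions: initMemoryFromString is a hypothetical stand-in that takes the string directly, whereas the real swift_js_init_memory_from_string import works on a retained string and its exact signature is not shown here. maxUTF8Len mirrors the str.length * 3 estimate of _maxUTF8Len().

```javascript
const textEncoder = new TextEncoder();

// Worst-case UTF-8 size: each UTF-16 code unit yields at most 3 bytes.
function maxUTF8Len(str) {
  return str.length * 3;
}

// Hypothetical stand-in for the import handler: encodes `str` straight into
// linear memory at `ptr` and returns the number of bytes actually written.
function initMemoryFromString(memory, ptr, capacity, str) {
  const dest = new Uint8Array(memory.buffer, ptr, capacity);
  const { written } = textEncoder.encodeInto(str, dest);
  return written;
}

// Usage: a standalone memory stands in for the module's exported memory.
const memory = new WebAssembly.Memory({ initial: 1 });
const str = "héllo 🌍";
const capacity = maxUTF8Len(str); // 8 code units * 3 = 24 bytes reserved
const written = initMemoryFromString(memory, 0, capacity, str);
console.log(capacity, written);
```

The reserved capacity is usually larger than the bytes written; the receiver only needs the returned count to know where the string ends.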
Overview
String parameters crossing from JS to Swift go through TextEncoder.encode() + object heap retain/release on every call. This is measurably slow for repeated strings and for string arrays. Two independent optimizations target the two string-encoding paths in the generated JS glue, without touching BridgeType or the codegen structure.
Related: swiftwasm#677, swiftwasm#700 (different approach - adds JSString as a new BridgeType; this PR avoids that)
What changed
1. Encoding cache for parameter and return paths (commit 1 - no Swift changes)
When JS passes a string to an exported Swift function (or returns a string from an imported JS function), the glue calls textEncoder.encode(string) to get a Uint8Array, then retains it in the object heap. The same string encoded 100k times means 100k allocations.
A 4096-entry encoding cache (Map<string, Uint8Array>) now sits in front of textEncoder.encode(). Repeated strings get a cache hit and skip encoding. The cache uses JS Map insertion-order semantics for O(1) eviction - delete-and-reinsert on hit, delete-first on eviction. 4096 entries covers realistic vocabularies without pathological eviction churn that smaller caches exhibit.
Affected fragments: stringLowerParameter, stringLowerReturn
2. Direct string retain + encodeInto() for the stack ABI (commit 2)
Arrays, struct fields, enum payloads, and dictionary entries use the stack ABI, which encodes each string element independently. Instead of allocating a Uint8Array per element, the JS glue now retains the JS string itself and passes a worst-case buffer capacity via _maxUTF8Len() (str.length * 3).
A new dedicated swift_js_init_memory_from_string WASM import handles this path - it encodes UTF-8 directly into the WASM buffer via encodeInto() and returns the actual byte count written. The existing swift_js_init_memory is unchanged.
On str.length * 3: Each UTF-16 code unit can produce at most 3 UTF-8 bytes. Surrogate pairs (2 code units) produce 4 UTF-8 bytes, so the per-unit ratio stays <= 3. This is the standard worst-case estimate - wasm-bindgen uses str.length * 3 with encodeInto, then realloc to shrink. Emscripten's stringToUTF8Array documents "at most str.length*4+1 bytes" (their *4 accounts for code points rather than UTF-16 units, +1 for null terminator). We don't need wasm-bindgen's shrink step because Swift's String(unsafeUninitializedCapacity:) uses the returned byte count as the string length regardless of the buffer capacity.
Affected fragments: stackLowerFragment for .string / .rawValueEnum(_, .string)
Benchmarks
100k iterations, Node.js v22, 15-run average:
StringRoundtrip/takeString
ArrayRoundtrip/takeStringArray
ArrayRoundtrip/roundtripStringArray
StringRoundtrip/makeString
ArrayRoundtrip/makeStringArray
The make* benchmarks (Swift-to-JS direction) are unaffected - those paths already use direct memory reads via decodeString(ptr, len).
Independence of the two techniques
The two commits are independent. Commit 1 (cache) has zero Swift-side changes and can be cherry-picked alone for ~21% improvement on takeString. Commit 2 adds a new WASM import (swift_js_init_memory_from_string) without modifying the existing swift_js_init_memory.
Comparison with alternative approaches
We benchmarked three other approaches for comparison.
Approach: PR swiftwasm#700 JSString (new BridgeType)
PR swiftwasm#700 adds JSString as a new BridgeType. Users opt in per-parameter by changing String to JSString. It passes a JS object heap reference (single i32) with no encoding.
takeString
takeJSString
makeString
makeJSString
takeStringArray
takeJSStringArray
makeJSStringArray
roundtripStringArray
roundtripJSStringArray
PR swiftwasm#700 wins on the take direction (no encoding at all), but has significant regressions on all make/return paths due to object heap management overhead. It also only helps callers who rewrite their API surface from String to JSString. This PR improves all existing String usage transparently.
Approach: switching String to Stack ABI
Per Yuta's suggestion in swiftwasm#677, we tested replacing the inline WASM params (bytes: i32, length: i32) with the stack ABI (push to JS arrays, zero WASM params). Result: no measurable difference.
StringRoundtrip/takeString
ArrayRoundtrip/takeStringArray
The bottleneck is encoding and heap management, not the parameter passing mechanism. Branch string-opt/stack-abi-test has this experiment.
Files changed (excluding snapshots)
Sources/JavaScriptKit/BridgeJSIntrinsics.swift - new _swift_js_init_memory_from_string import; bridgeJSStackPop uses it; existing _swift_js_init_memory unchanged
Plugins/BridgeJS/Sources/BridgeJSLink/BridgeJSLink.swift - encoding cache preamble; new swift_js_init_memory_from_string handler; _maxUTF8Len helper
Plugins/BridgeJS/Sources/BridgeJSLink/JSGlueGen.swift - stringLowerParameter and stringLowerReturn use cache; stackLowerFragment uses direct retain + _maxUTF8Len; reserved variable names
Plugins/PackageToJS/Templates/instantiate.js - stub for new swift_js_init_memory_from_string import
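As a quick check of the str.length * 3 bound discussed under commit 2, the following sketch compares the actual UTF-8 size of a few sample strings against the estimate. The helper utf8Stats and the sample list are hypothetical and exist only for this illustration.

```javascript
const textEncoder = new TextEncoder();

// Hypothetical helper: reports UTF-16 code units, actual UTF-8 bytes,
// and the worst-case estimate for one string.
function utf8Stats(str) {
  return {
    units: str.length,
    actual: textEncoder.encode(str).length,
    bound: str.length * 3,
  };
}

// ASCII, 2-byte, 3-byte, surrogate pair, mixed, and a lone surrogate
// (which the encoder replaces with U+FFFD, 3 bytes).
const samples = ["A", "é", "€", "😀", "a€😀", "\uD800"];
const stats = samples.map(utf8Stats);
const allWithinBound = stats.every((s) => s.actual <= s.bound);
console.log(stats, allWithinBound);
```

The bound is tight for 3-byte characters such as "€" (1 code unit, 3 bytes) and loose for surrogate pairs such as "😀" (2 code units, 4 bytes against an estimate of 6).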