On-device speech, language and audio inference for iOS and macOS
on Apple Silicon. The SDK ships compiled CoreML and MLX engines
through HuggingFace, auto-detects the best backend per device
(ANE / GPU / CPU), and exposes a unified infer / infer_stream
API for every pipeline. No server in the hot path.
TheStageCore.xcframework/— pre-built SDK binary (ios-arm64+macos-arm64slices).Package.swift+Sources/TheStageSDK/— SwiftPM entry point for native Swift apps on iOS and macOS.import TheStageSDK.examples/macos_swift_tts/— minimal native-Swift streaming-TTS command-line demo (macOS, no Xcode). Start here.examples/tts_front_stream/— streaming neural TTS demo (Flutter, iPhone).examples/voice_agent/— full voice-assistant loop, mic → VAD → STT → LLM → streaming TTS (Flutter, iPhone).plugin/thestage_apple_sdk/— Flutter plugin over platform channels. iOS only for now.docs/— per-pipeline reference guides (LLM, Whisper, NeuTTS, VAD, Streaming, Voice Agent).scripts/setup.sh— one-time host setup (only needed for the Flutter examples).
A tiny native-Swift program that streams TTS straight to your speakers:
cd examples/macos_swift_tts
export TS_API_TOKEN=th_… # from app.thestage.ai
swift runThe first run downloads the NeuTTS engines from HuggingFace and caches them; subsequent runs start instantly. Output-only playback needs no microphone permission or entitlements. See examples/macos_swift_tts/README.md.
# 1. One-time host setup (xcframework symlink + secrets bootstrap).
# Idempotent — safe to re-run. (espeak is opt-in: --espeak, nano apps only.)
./scripts/setup.sh
# 2. Drop your API keys into the example you want to run.
cp examples/tts_front_stream/secrets.example.json \
examples/tts_front_stream/secrets.json
$EDITOR examples/tts_front_stream/secrets.jsonOpen examples/tts_front_stream/ios/Runner.xcodeproj in Xcode, select
the Runner target, and under Signing & Capabilities set your
Team and a unique Bundle Identifier. Then run on a device:
cd examples/tts_front_stream
flutter pub get
flutter run --release \
--dart-define-from-file=secrets.json \
-d <YOUR_IPHONE_DEVICE_ID>flutter devices lists attached devices. examples/voice_agent
follows the same recipe (it additionally needs OPENAI_API_KEY in its
secrets.json). See each example's README.md for app-specific notes.
| Requirement | Minimum | Tested with |
|---|---|---|
| macOS | 15.0 | 15.6 |
| iOS | 18.0 | 18.6 |
| Xcode | 16.0 | 26.1 |
| Swift | 6.0 | 6.2.1 |
| Flutter (only for the Flutter examples) | 3.24 | 3.38.7 |
| Dart | 3.5 | 3.10.7 |
| Hardware | Apple Silicon Mac or physical iPhone / iPad | — |
The Simulator is not supported — MLX requires Metal on real hardware. The Flutter plugin and the two Flutter example apps are iOS-only; native Swift via SwiftPM runs on both iOS and macOS.
You'll need a TheStage API token from
app.thestage.ai. It's validated once on
first model start, then runs offline (7-day grace window if the device
is briefly disconnected). For the Flutter path you also need a Flutter
toolchain (brew install flutter, then flutter config --enable-swift-package-manager).
In Xcode: File → Add Package Dependencies…, paste this repo's URL,
and add the TheStageSDK product to your target. Or in Package.swift:
.package(url: "https://github.com/TheStageAI/AppleSDK.git", from: "1.0.0")Then:
import TheStageSDK
let ai = TheStageAI.shared
try await ai.initialize(apiToken: "th_…")
// Construct any pipeline directly from an HF repo or local path.
// The same `on_load_progress` contract applies to all of them.
let llm = try await TheStageLLM(
engines_path: "TheStageAI/Qwen3-0.6B",
on_load_progress: { p in
print("[\(p.model)] \(p.phase) \(Int(p.fraction * 100))%")
}
)
let result = llm.infer(
prompt: "Give me a one-line haiku about Swift.",
max_new_tokens: 64
)
print(result.text)Every pipeline (TheStageLLM, WhisperPipeline,
NeuTTSMultilingualPipeline, NeuTTSNanoPipeline) shares the same
constructor shape. Prefer the singleton
TheStageAI.shared.start_model(...) / infer(model_name:input_json:)
flow when you want lifecycle and JSON dispatch (e.g. driving the SDK
from Flutter). Both flows share the same on-disk cache and the same
LoadProgress events.
The plugin bundles the native framework — nothing to build or link. Three steps:
1. Add the git: dependency in your app's pubspec.yaml, pinned to
a tag:
dependencies:
thestage_apple_sdk:
git:
url: https://github.com/TheStageAI/AppleSDK.git
path: plugin/thestage_apple_sdk
ref: v1.0.02. Configure the iOS project once: enable SwiftPM and set the deployment target to iOS 18.0+:
flutter config --enable-swift-package-manager
# then in Xcode: Runner target → General → Minimum Deployments → iOS 18.03. Use it (flutter pub get, then run on a physical device):
import 'package:thestage_apple_sdk/thestage_apple_sdk.dart';
await TheStageFlutterSDK.initialize(api_token: 'th_…');
await TheStageFlutterSDK.start_model(
model_name: 'llm',
engines_path: 'TheStageAI/Qwen3-0.6B',
);
final result = await TheStageFlutterSDK.infer(
model_name: 'llm',
input_json: {
'prompt': 'Give me a one-line haiku about Swift.',
'max_new_tokens': 64,
},
);
print(result[0]['text']);Full install notes, the voice-agent API and the audio player live in the
plugin README. The fastest way
to see a real app is to copy one of the examples/ apps.
Full API reference, with parallel Swift and Flutter examples for every
pipeline, lives under docs/:
- LLM —
TheStageLLM: Qwen2 / Qwen3 / Gemma3 chat with streaming, KV cache, chat-template auto-detect. - Whisper ASR — speech-to-text with automatic VAD chunking and long-audio stitching.
- NeuTTS — multilingual + Nano TTS, batch + push-based streaming.
- VAD —
SileroVAD: stateful per-chunk speech detection. - Streaming — TTS / LLM streaming patterns, back-pressure, sentence segmentation.
- Voice Agent —
TheStageVoiceAgent: end-to-end voice assistant with barge-in.
The Swift singleton (TheStageAI.shared) and the Flutter
TheStageFlutterSDK mirror each other one-to-one. Pipeline
constructors (TheStageLLM(...), WhisperPipeline(...), etc.) are
Swift-only — Dart consumers always go through the JSON path.
| Operation | Swift | Flutter (Dart) |
|---|---|---|
| Initialize | try await TheStageAI.shared.initialize(apiToken: "...") |
await TheStageFlutterSDK.initialize(api_token: '...') |
| Start a model | try await ai.start_model(model_name:engines_path:config:on_load_progress:) |
await TheStageFlutterSDK.start_model(model_name:, engines_path:, config:) |
| Stop a model | _ = try ai.stop_model(model_name: "llm") |
await TheStageFlutterSDK.stop_model(model_name: 'llm') |
| Single-shot inference | try ai.infer(model_name:input_json:) -> [[String: Any]] |
await TheStageFlutterSDK.infer(model_name:, input_json:) -> List<Map<String, dynamic>> |
| Streaming inference | try ai.infer_stream(model_name:input_json:) -> AsyncStream<InferenceStreamChunk> |
TheStageFlutterSDK.infer_stream(model_name:, input_json:, stream_id:?) -> Stream<Map<String, dynamic>> |
| Push text into a TTS stream | streamer.send(text); streamer.stop_stream() |
await TheStageFlutterSDK.send(stream_id:, text:); await TheStageFlutterSDK.finish_stream(stream_id:) |
| Cancel a running stream | streamer.stop_stream() |
await TheStageFlutterSDK.stop_stream(stream_id:) |
| Load progress | on_load_progress: LoadProgressHandler? on start_model / constructors |
Global stream TheStageFlutterSDK.on_progress ({model_name, phase, progress}) |
| Audio buffer type | [Float] |
Float32List (never Float64List) |
Note the one asymmetry that bites people: the Swift initializer is
initialize(apiToken:) (camelCase), while the Flutter call is
initialize(api_token:) (snake_case).
All public loaders accept an optional on_load_progress: LoadProgressHandler that fires through four phases with a monotonic
fraction in 0...1:
| Phase | Fraction band | Notes |
|---|---|---|
downloading |
0.00 – 0.70 | HuggingFace repo download (skipped on cache hit) |
extracting |
0.70 – 0.85 | Bundle unpack to local cache (skipped on cache hit) |
loading |
0.85 – 0.99 | Pipeline construction |
ready |
1.00 (terminal) | Emitted on success only |
The phase strings, fraction bands and terminal contract are identical on both surfaces. See docs/llm.md for the full event contract.
All audio crossing the public SDK surface uses PCM [Float], mono,
samples normalized to [-1.0, 1.0]. Sample rate depends on the
pipeline:
| Pipeline | Direction | Sample rate | Frame / chunking |
|---|---|---|---|
SileroVAD |
input | 16 000 Hz | exactly 512 samples per infer (32 ms); stateful |
WhisperPipeline |
input | 16 000 Hz | any length; auto-split into 10 s windows |
NeuTTSMultilingualPipeline / NeuTTSNanoPipeline |
output | 24 000 Hz | streamer emits per-sentence chunks; batch emits one full [Float] |
The mic stack runs at 16 kHz mono for VAD/ASR; TTS output is always
24 kHz. Rather than hardcoding it, read the rate from the pipeline
(tts.sample_rate) — see examples/macos_swift_tts.
The Flutter example apps read tokens at build time via
String.fromEnvironment(...) and --dart-define-from-file=secrets.json.
Each ships a secrets.example.json template — copy it to secrets.json
and fill in your keys. secrets.json is covered by .gitignore; real
keys never belong in source. The macOS example reads TS_API_TOKEN
from the environment instead.
See LICENSE.