Kotlin Multiplatform (KMP) podcast player app targeting Android and iOS. The key differentiating feature is automatic music detection that reverts playback speed to 1x during music sections, then returns to the user's chosen speed for speech — solving a real pain point for music-heavy podcasts.
```shell
# Compile shared code for Android
./gradlew :composeApp:compileDebugKotlinAndroid

# Build the Android app
./gradlew :composeApp:assembleDebug

# Install on a connected device
./gradlew :composeApp:installDebug

# iOS targets are disabled on non-macOS.
# On macOS: open MusiCast/iosApp/iosApp.xcodeproj in Xcode, or:
# ./gradlew :composeApp:compileKotlinIosArm64
# ./gradlew :composeApp:podInstall  # regenerates the Podfile + installs TensorFlowLiteC
```

Requirements: Android SDK at the path in `local.properties` (`sdk.dir=...`), JDK 17+, Gradle 8.10+. For iOS builds: Xcode + CocoaPods on macOS.
`:composeApp` — single KMP module combining all business logic, Compose Multiplatform UI, and the Android application shell. Source sets: `commonMain`, `androidMain`, `iosMain`.

- expect/actual for platform code: `AudioPlayer`, `PcmDecoder`, `EpisodeDownloader`, `DatabaseDriverFactory`
- Interface + Koin for platform code with asymmetric constructors: `YamNetClassifier` (Android needs `Context`, iOS does not)
- Koin DI — configured in `di/CommonModule.kt` + platform modules
- SQLDelight — schema in `composeApp/src/commonMain/sqldelight/com/musicast/musicast/db/PodcastDatabase.sq`
- Compose Multiplatform — single UI codebase in the `ui/` package
- StateFlow — all state management via Kotlin coroutines flows
The core feature lives in `composeApp/src/commonMain/kotlin/com/musicast/musicast/audio/` and uses Google's YAMNet model running on-device via TensorFlow Lite for accurate speech/music classification.
- `PcmDecoder` (platform-specific) — streams raw PCM audio in chunks (mono, 16kHz)
- `StreamingWindowBuffer` — accumulates PCM into 0.975-second windows (15,600 samples)
- `YamNetClassifier` (platform-specific) — runs YAMNet TFLite inference per window, returns 521 AudioSet class scores
- `SegmentClassifier` — maps YAMNet scores to SPEECH/MUSIC, applies median filter + segment merging
- `MusicDetector` — orchestrates the pipeline, returns `List<AudioSegment>`
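The buffering step can be sketched as follows — a minimal, simplified version of the window accumulator (names and shape are assumptions for illustration; the real `StreamingWindowBuffer` lives in the audio package):

```kotlin
// Minimal sketch of a streaming window buffer: accumulate PCM chunks and
// emit fixed-size windows (15,600 samples at 16 kHz ≈ 0.975 s for YAMNet).
class StreamingWindowBuffer(private val windowSize: Int = 15_600) {
    private val pending = ArrayDeque<Float>()

    /** Feed one decoded PCM chunk; invoke [onWindow] for every complete window. */
    fun add(chunk: FloatArray, onWindow: (FloatArray) -> Unit) {
        chunk.forEach { pending.addLast(it) }
        while (pending.size >= windowSize) {
            // Drain exactly one window's worth of samples; leftovers stay
            // buffered until the next chunk arrives.
            val window = FloatArray(windowSize) { pending.removeFirst() }
            onWindow(window)
        }
    }
}
```

Because windows are drained as soon as they fill, memory use stays bounded regardless of episode length.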
YAMNet is a pre-trained audio classification model (521 classes from Google AudioSet). Speech-related class scores (indices 0-5, 12) are summed against music-related scores (indices 24-31 singing, 132-276 instruments/genres) to determine each window's label.
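A hedged sketch of that per-window decision (the index sets mirror the text above; the authoritative sets and any thresholds live in `SegmentClassifier`):

```kotlin
// Speech-related AudioSet class indices (0-5 plus 12) vs music-related
// indices (24-31 singing, 132-276 instruments/genres), per the text above.
val SPEECH_INDICES: Set<Int> = ((0..5) + 12).toSet()
val MUSIC_INDICES: Set<Int> = ((24..31) + (132..276)).toSet()

enum class WindowLabel { SPEECH, MUSIC }

// Sum each group's scores from the 521-element YAMNet output and compare.
fun labelWindow(scores: FloatArray): WindowLabel {
    val speechScore = SPEECH_INDICES.sumOf { scores[it].toDouble() }
    val musicScore = MUSIC_INDICES.sumOf { scores[it].toDouble() }
    return if (musicScore > speechScore) WindowLabel.MUSIC else WindowLabel.SPEECH
}
```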
The pipeline uses streaming processing to avoid OOM on long episodes. The TFLite model file (`yamnet.tflite`, ~16MB) is bundled in `composeApp/src/androidMain/assets/` for Android and in `iosApp/iosApp/yamnet.tflite` for iOS.

Segments are persisted in the `segments_data` column in `"startMs:endMs:TYPE;..."` format.
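A round-trip sketch of that encoding (the `AudioSegment` shape here is assumed for illustration; the real type lives in the audio package):

```kotlin
// Hypothetical segment shape matching the "startMs:endMs:TYPE" fields.
data class AudioSegment(val startMs: Long, val endMs: Long, val type: String)

// Encode segments as "startMs:endMs:TYPE" joined with ';'.
fun encodeSegments(segments: List<AudioSegment>): String =
    segments.joinToString(";") { "${it.startMs}:${it.endMs}:${it.type}" }

// Decode the same format back into segments, ignoring blank fragments.
fun decodeSegments(data: String): List<AudioSegment> =
    data.split(";").filter { it.isNotBlank() }.map { part ->
        val (start, end, type) = part.split(":")
        AudioSegment(start.toLong(), end.toLong(), type)
    }
```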
- `PlaybackManager` — central coordinator. Manages `userSpeed` vs `currentSpeed`, checks segments at the current position, auto-adjusts speed. Saves playback position every 10 seconds.
- Android: Media3 ExoPlayer (`AndroidAudioPlayer`)
- iOS: AVFoundation AVPlayer (`IosAudioPlayer`)
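The `userSpeed` vs `currentSpeed` decision reduces to a small pure function — a sketch under assumed names (the real logic sits inside `PlaybackManager` and also drives the player):

```kotlin
// Hypothetical minimal segment shape for illustration.
data class Segment(val startMs: Long, val endMs: Long, val isMusic: Boolean)

// During a music segment playback reverts to 1x; otherwise the user's
// chosen speed applies. End bound treated as exclusive.
fun effectiveSpeed(positionMs: Long, userSpeed: Float, segments: List<Segment>): Float {
    val inMusic = segments.any {
        it.isMusic && positionMs >= it.startMs && positionMs < it.endMs
    }
    return if (inMusic) 1.0f else userSpeed
}
```

Keeping this decision pure makes it trivial to unit-test independently of ExoPlayer/AVPlayer.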
The Android app integrates with the system media infrastructure via Media3's MediaSessionService:
- `PlaybackService` (`composeApp/src/androidMain/kotlin/com/musicast/musicast/PlaybackService.kt`) — `MediaSessionService` subclass that manages the `MediaSession`, foreground notification, and custom notification actions.
- Lock screen / notification controls: custom button layout — skip back 15s, play/pause, skip forward 30s, speed toggle (renders the current speed like "1.5x" as a dynamically generated bitmap icon via `IconCompat.createWithBitmap()`).
- `SpeedAwareNotificationProvider` — custom `DefaultMediaNotificationProvider` subclass that overrides `addNotificationActions()` to handle both `SessionCommand` (custom actions) and `playerCommand` (play/pause) buttons.
- Audio focus: ExoPlayer configured with `setAudioAttributes()` + `handleAudioFocus = true` — starting playback automatically pauses other media apps.
- Samsung Now Bar / Dynamic Island: requires the service to be truly foreground. `MainActivity` calls `startForegroundService()` when playback starts, triggering `onStartCommand()`, which calls `startForeground()` with a `MediaStyle` notification linked to the session. The `MediaSession` also needs `setSessionActivity()` with a `PendingIntent` so Samsung can resolve the tap target.
- Metadata: set immediately in `onMediaItemTransition` so the system UI shows the episode title/podcast name before playback fully starts.
```
RSS Feed → RssFeedService → PodcastRepository → LocalDataSource → SQLDelight DB
                                                      ↕
Episode download → EpisodeDownloader → local file → PcmDecoder (16kHz) → StreamingWindowBuffer
    → YamNetClassifier (TFLite) → SegmentClassifier → segments saved to DB
                                                      ↓
                         PlaybackManager ← loads segments on play ← LocalDataSource
```
Paths are relative to `composeApp/src/` unless noted otherwise. All Kotlin packages live under `com.musicast.musicast`.
| File | Purpose |
|---|---|
| `commonMain/kotlin/com/musicast/musicast/audio/YamNetClassifier.kt` | Common interface for TFLite YAMNet inference |
| `androidMain/kotlin/com/musicast/musicast/audio/AndroidYamNetClassifier.kt` | Android TFLite interpreter implementation |
| `iosMain/kotlin/com/musicast/musicast/audio/IosYamNetClassifier.kt` | iOS TFLite C API implementation (uses `cocoapods.TensorFlowLiteC`) |
| `commonMain/kotlin/com/musicast/musicast/audio/MusicDetector.kt` | Orchestrates decode → YAMNet → classify pipeline |
| `commonMain/kotlin/com/musicast/musicast/audio/SegmentClassifier.kt` | Post-processes YAMNet scores into speech/music segments |
| `commonMain/kotlin/com/musicast/musicast/audio/StreamingWindowBuffer.kt` | Buffers PCM into 0.975s windows for YAMNet |
| `commonMain/kotlin/com/musicast/musicast/audio/AudioConstants.kt` | Shared constants (sample rate, window size, class count) |
| `commonMain/kotlin/com/musicast/musicast/player/PlaybackManager.kt` | Speed management, music detection during playback |
| `commonMain/kotlin/com/musicast/musicast/ui/viewmodel/EpisodeListViewModel.kt` | Download, analysis, and play orchestration |
| `commonMain/kotlin/com/musicast/musicast/data/local/LocalDataSource.kt` | All DB operations including segment persistence |
| `commonMain/sqldelight/com/musicast/musicast/db/PodcastDatabase.sq` | Database schema (SQLDelight) |
| `commonMain/kotlin/com/musicast/musicast/App.kt` | Main Compose entry point, navigation, Koin injection |
| `androidMain/kotlin/com/musicast/musicast/PlaybackService.kt` | `MediaSessionService` with notification controls, speed bitmap, Samsung Now Bar support |
| `androidMain/kotlin/com/musicast/musicast/MainActivity.kt` | Activity shell, notification permission, foreground service launch |
| `androidMain/kotlin/com/musicast/musicast/PodcastApplication.kt` | Android `Application` class that starts Koin |
- SQLDelight dialect: uses `sqlite_3_18` — no `RETURNING` clause. Use `SELECT last_insert_rowid()` instead.
- rss-parser API: feed item audio is at `rawEnclosure?.url`, not `enclosures`.
- Composable scope: `koinInject()` must be called at composable scope level, not inside `remember {}`.
- iOS native targets: won't compile on Linux/Windows. Build warnings are expected and suppressed via `kotlin.native.ignoreDisabledTargets`.
- Lifecycle version: currently pinned to `2.10.0` via the template's `libs.versions.toml`. The previous project required `2.8.4` for `lifecycle-viewmodel-compose` — if you hit runtime crashes or ViewModel scoping issues after an upgrade, try reverting to `2.8.4`.
- Memory: audio analysis uses a streaming pipeline — never accumulate the full decoded audio in memory.
- TFLite input resize: YAMNet's TFLite model has a dynamic input shape. Must call `resizeInput(0, [15600])` + `allocateTensors()` after creating the interpreter.
- TFLite multiple outputs: YAMNet has 3 output tensors (scores, embeddings, spectrogram). Use `runForMultipleInputsOutputs` with ByteBuffer placeholders for unused outputs.
- PcmDecoder resampling: the resampler accumulator can go negative between chunks — guard `idx0 >= 0` before array access.
- MediaSession notification ID: must use the same notification ID (1001) in both `onStartCommand()` and Media3's `MediaNotificationManager`. Using different IDs causes duplicate notifications (one plain, one media).
- Samsung Now Bar: requires a true foreground service (`startForeground()`, not just `NotificationManager.notify()`), a `MediaStyle` notification with the session token, and a non-null `PendingIntent` (`setSessionActivity()` on the session + `setContentIntent()` on the notification).
- Custom notification buttons: all custom buttons must use `SessionCommand` (not `playerCommand`) to appear in the notification. The `onConnect` callback must remove the default prev/next player commands and add the custom session commands. The speed icon uses `IconCompat.createWithBitmap()` but still needs `setIconResId()` on the `CommandButton` for the `PlaybackStateCompat` compat layer (crashes with `IllegalArgumentException` otherwise).
- iOS TFLite wiring: TensorFlowLiteC is integrated via the `kotlin-cocoapods` Gradle plugin (`pod("TensorFlowLiteC", "~> 2.14.0")` in `composeApp/build.gradle.kts`). The Kotlin-side imports live under `cocoapods.TensorFlowLiteC.*`. On first-time setup on macOS, run `./gradlew :composeApp:podInstall` to generate the Podfile and install the pod before opening the Xcode project.
- iOS framework name: the shared framework is `ComposeApp` (not `shared`). Swift sources must `import ComposeApp`.
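The resampler gotcha above is easiest to see in a stripped-down linear resampler — a stateless sketch (the real `PcmDecoder` carries its fractional position across chunks, which is exactly where the index can briefly go negative):

```kotlin
// Simplified linear resampler. The guard on idx0 >= 0 mirrors the gotcha:
// with cross-chunk state the interpolation index can dip below zero.
fun resampleLinear(input: FloatArray, srcRate: Int, dstRate: Int): FloatArray {
    val ratio = srcRate.toDouble() / dstRate
    val outLen = (input.size / ratio).toInt()
    return FloatArray(outLen) { i ->
        val pos = i * ratio
        val idx0 = pos.toInt()
        val idx1 = minOf(idx0 + 1, input.lastIndex)
        if (idx0 >= 0) {
            // Interpolate between the two nearest source samples.
            val frac = (pos - idx0).toFloat()
            input[idx0] * (1 - frac) + input[idx1] * frac
        } else {
            0f // Guard: never index the source array with a negative position.
        }
    }
}
```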
The classifier in `SegmentClassifier.kt` uses these key parameters:
- Speech indices: 0-5, 12 (Speech, Child speech, Conversation, Narration, Babbling, Speech synthesizer, Whispering)
- Music indices: 24-31 (singing), 132-276 (music, instruments, genres)
- Classification: per-window sum of speech scores vs music scores — music wins if `musicScore > speechScore`
- `MIN_SEGMENT_DURATION_MS = 3000` — segments shorter than this get merged with neighbors
- Median filter kernel size = 5 windows — smooths classification noise
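The median filter on binary labels reduces to a majority vote over each window's neighborhood — a sketch under that assumption (the real filter in `SegmentClassifier` may handle edges differently):

```kotlin
// Majority-vote median filter over boolean window labels (true = music).
// For binary values, the median of a kernel equals the majority vote.
fun medianFilter(labels: List<Boolean>, kernel: Int = 5): List<Boolean> {
    val half = kernel / 2
    return labels.indices.map { i ->
        // Clamp the neighborhood at the sequence edges.
        val from = maxOf(0, i - half)
        val to = minOf(labels.lastIndex, i + half)
        val votes = (from..to).count { labels[it] }
        votes * 2 > (to - from + 1) // true only on a strict majority
    }
}
```

A single misclassified window in a run of speech (or music) is flipped back, which is what keeps short glitches from triggering spurious speed changes.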
If detection is too aggressive or not sensitive enough, adjust the class index sets or add a bias factor to the score comparison in `SegmentClassifier.classify()`.