Commit 5f0347b

feat: podcast generation system with marketplace & TTS worker fix
- Added {{@Podcast:}} tag for AI-powered multi-speaker podcast generation
- Built podcast marketplace with 15+ curated templates (Tech, Science, Business, Creative, Education)
- 3-phase pipeline: web research → AI script generation → Kokoro TTS multi-speaker synthesis
- Configurable styles: debate, interview, chat, lecture, storytelling
- Fixed critical TTS worker bug: service worker cache-first served stale tts-worker.js
- Extracted processMultiSegments() and bundled segments with init message
- Added cache-busting, worker.onerror, per-chunk timeout, event loop yields
- Bumped service worker cache v2→v3, excluded worker files from caching
1 parent 14368ba commit 5f0347b

17 files changed

Lines changed: 4022 additions & 34 deletions

CHANGELOG-podcast-system.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# Podcast Generation System & TTS Worker Fixes

- Added `{{@Podcast:}}` tag for AI-powered podcast generation from any document content
- Built multi-speaker script generation with configurable styles (debate, interview, chat, lecture, storytelling)
- Integrated Kokoro TTS multi-speaker synthesis with voice pre-fetching and per-chunk progress
- Created podcast marketplace UI with pre-built podcast templates and search/filter
- Added podcast template system with 15+ curated templates across Tech, Science, Business, Creative, and Education categories
- Built WAV audio creation from Float32Array TTS output with download support
- Added real-time podcast generation progress UI with phase indicators (research → script → audio → done)
- Extracted `processMultiSegments()` as standalone async function in TTS worker for reliable synthesis
- Fixed: TTS worker message delivery bug; segments are now bundled with the `init` message so they are processed in the same handler execution
- Fixed: Service worker cache-first strategy serving stale `tts-worker.js`; added cache-busting `?v=` param
- Fixed: Worker files now excluded from service worker `shouldCacheResponse()` caching
- Bumped service worker cache version from `v2` to `v3` to force cache invalidation
- Added `worker.onerror` handler on main thread to catch worker-level errors
- Added TTS worker version identifier (`TTS_WORKER_VERSION`) with startup logging
- Added `_pendingMultiSegments` backup mechanism in `status: ready` handler
- Added per-chunk 90s timeout via `Promise.race` to prevent infinite synthesis hangs
- Added event loop yields (`setTimeout(0)`) between WASM calls so `postMessage` flushes
- Added voice pre-fetch phase before synthesis to separate network vs WASM issues
- Added heartbeat logger (10s interval) during multi-speaker synthesis
- Added detailed timestamped logging across `textToSpeech.js`, `tts-worker.js`, `podcast-docgen.js`
- Added help mode entries for podcast generation feature
- Added podcast renderer integration in `renderer.js` for `{{@Podcast:}}` tag processing

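The WAV bullet above (Float32Array TTS output to a downloadable file) can be illustrated with a minimal sketch. It assumes mono, 16-bit PCM output; the function name `encodeWav` and the 24000 Hz default sample rate are assumptions for illustration and are not taken from the project's `createWavBlob()`.

```javascript
// Hypothetical helper: encode mono Float32 samples as a 16-bit PCM WAV file.
// Returns an ArrayBuffer containing a 44-byte RIFF header followed by the samples.
function encodeWav(samples, sampleRate = 24000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                // file size minus first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                          // fmt chunk size for PCM
  view.setUint16(20, 1, true);                           // audio format 1 = PCM
  view.setUint16(22, 1, true);                           // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp each sample to [-1, 1] and scale to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, the result could be wrapped as `new Blob([encodeWav(samples)], { type: 'audio/wav' })` and handed to a download link.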
---

## Summary
Complete podcast generation system: users write `{{@Podcast: topic}}` in any document and get an AI-generated multi-speaker podcast with web research, script writing, and Kokoro TTS audio synthesis. Also fixed a critical TTS worker bug where the service worker's cache-first strategy served stale worker code, and the Web Worker silently dropped `speak-multi` messages sent after the async `init` handler completed.

---

## 1. Podcast Document Generator (`{{@Podcast:}}` Tag)
**Files:** `js/podcast-docgen.js`, `css/podcast-docgen.css`
**What:** New IIFE component that intercepts `{{@Podcast: topic}}` tags in rendered markdown. It performs 3-phase generation: (1) web search research via Jina API, (2) AI script generation with `[Speaker]` markers, (3) Kokoro TTS multi-speaker audio synthesis. Includes `parseScript()` for speaker segmentation, `createWavBlob()` for audio encoding, and real-time progress UI with phase indicators.
**Impact:** Users can generate full podcast episodes from any topic directly in their documents, with no external tools needed.

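Speaker segmentation of a script with `[Speaker]` markers, as described in the section above, might look like the following sketch. The function name `parseSpeakerSegments`, the optional colon after the marker, and the `{ speaker, text }` segment shape are assumptions; the actual `parseScript()` may differ.

```javascript
// Hypothetical parser: split a script into { speaker, text } segments.
// Assumes each turn starts with a "[Name]" marker at the beginning of a line;
// unmarked lines are treated as a continuation of the previous turn.
function parseSpeakerSegments(script) {
  const segments = [];
  for (const rawLine of script.split('\n')) {
    const line = rawLine.trim();
    if (!line) continue; // skip blank lines
    const match = line.match(/^\[([^\]]+)\]:?\s*(.*)$/);
    if (match) {
      segments.push({ speaker: match[1].trim(), text: match[2] });
    } else if (segments.length > 0) {
      const last = segments[segments.length - 1];
      last.text = last.text ? `${last.text} ${line}` : line;
    }
  }
  // Drop turns that ended up with no spoken text.
  return segments.filter((segment) => segment.text.length > 0);
}

const demoSegments = parseSpeakerSegments(
  '[Alice] Hello there.\nStill Alice.\n\n[Bob]: Hi!'
);
console.log(demoSegments);
```

Each resulting segment could then be paired with a voice before synthesis.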
## 2. Podcast Marketplace
**Files:** `js/podcast-marketplace.js`, `css/podcast-marketplace.css`, `js/templates/podcasts.js`
**What:** Built a browsable marketplace UI with 15+ curated podcast templates across categories (Tech, Science, Business, Creative, Education). Includes search/filter, category tabs, template cards with metadata (duration, speakers, style), and one-click generation. Templates define speaker count, style, custom prompts, and voice assignments.
**Impact:** Users can browse and generate podcasts from pre-built templates without writing prompts.

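The search/filter and category-tab behaviour described above could be sketched as follows. Everything here is illustrative: the template records, their field names, and `filterTemplates` are made up for the example and do not come from `js/templates/podcasts.js`.

```javascript
// Hypothetical template records; titles and field names are illustrative only.
const sampleTemplates = [
  { id: 'ai-debate', title: 'AI Ethics Debate', category: 'Tech', style: 'debate', speakers: 2 },
  { id: 'cosmos-chat', title: 'Cosmos Chat', category: 'Science', style: 'chat', speakers: 2 },
  { id: 'startup-101', title: 'Startup Lecture', category: 'Business', style: 'lecture', speakers: 1 },
];

// Keep templates that match the active category tab and the search text.
// The query is matched case-insensitively against the title and the style.
function filterTemplates(list, { query = '', category = 'All' } = {}) {
  const needle = query.trim().toLowerCase();
  return list.filter((template) => {
    const inCategory = category === 'All' || template.category === category;
    const matchesQuery =
      !needle ||
      template.title.toLowerCase().includes(needle) ||
      template.style.toLowerCase().includes(needle);
    return inCategory && matchesQuery;
  });
}

console.log(filterTemplates(sampleTemplates, { query: 'debate' }).map((t) => t.id));
```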
## 3. TTS Worker Multi-Speaker Fix (Critical Bug)
**Files:** `js/tts-worker.js`, `js/textToSpeech.js`
**What:** The Web Worker silently dropped `speak-multi` messages sent after the async `init` handler completed. Root cause: the service worker served a cached `tts-worker.js` via its cache-first strategy, and the worker could not reliably process a second `postMessage` after `init`. Fix: (1) extracted `processMultiSegments()` as standalone function, (2) bundled segments with `init` message via `pendingSegments` field, (3) worker processes segments inline at end of init handler, (4) added cache-busting `?v=` param to worker URL, (5) added `_pendingMultiSegments` backup dispatch from `status: ready` handler.
**Impact:** Podcast TTS synthesis now works reliably; previously it hung indefinitely after the model loaded.

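The bundling fix described above (segments travel with the `init` message and are processed at the end of the same handler) can be sketched in isolation. The `type` discriminator, the `loadModel` stub, the message shapes, and this stand-in `processMultiSegments` are assumptions for illustration; only the `pendingSegments` field name and the `status: ready` message come from the changelog.

```javascript
// Stand-in for model loading (hypothetical; the real worker loads Kokoro TTS).
async function loadModel() {
  return { ready: true };
}

// Stand-in for multi-speaker synthesis: reports one message per segment.
async function processMultiSegments(model, segments, post) {
  for (let i = 0; i < segments.length; i++) {
    post({ type: 'chunk', index: i, speaker: segments[i].speaker });
  }
  post({ type: 'done', count: segments.length });
}

// Worker-side handler: segments ride along with `init`, so synthesis does not
// depend on a second message arriving after the async model load.
async function handleMessage(data, post) {
  if (data.type !== 'init') return;
  const model = await loadModel();
  post({ status: 'ready' });
  if (Array.isArray(data.pendingSegments) && data.pendingSegments.length > 0) {
    await processMultiSegments(model, data.pendingSegments, post);
  }
}

// Inside a real worker this would be wired up roughly as:
//   self.onmessage = (e) => handleMessage(e.data, (msg) => self.postMessage(msg));
```

Passing `post` as a parameter keeps the handler testable outside a worker context.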
## 4. Service Worker Cache Fix
**Files:** `sw.js`
**What:** Bumped `CACHE_NAME` from `textagent-v2` to `textagent-v3` to invalidate stale caches. Added exclusion for `*worker*` files in `shouldCacheResponse()` so worker JS is always fetched fresh. This prevents the cache-first strategy from serving outdated worker code.
**Impact:** Future worker code changes take effect immediately without manual cache clearing.

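A minimal sketch of the two caching measures follows: excluding worker scripts from the cache decision and adding a `?v=` parameter to the worker URL. The `(url, status)` signature, the status check, and the `workerUrl` helper are assumptions; the real `shouldCacheResponse()` in `sw.js` may take different arguments.

```javascript
const CACHE_NAME = 'textagent-v3';

// Assumed signature: decide from the request URL and response status.
// Any path containing "worker" is never cached, so it is always fetched fresh.
function shouldCacheResponse(url, status) {
  const { pathname } = new URL(url, 'https://example.invalid');
  if (status !== 200) return false;
  if (/worker/i.test(pathname)) return false;
  return true;
}

// Cache-busting: a version query param gives each worker build a distinct URL.
// `workerUrl` is a hypothetical helper name.
function workerUrl(base, version) {
  return `${base}?v=${encodeURIComponent(version)}`;
}

console.log(CACHE_NAME, workerUrl('js/tts-worker.js', '3'));
```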
## 5. TTS Synthesis Robustness
**Files:** `js/tts-worker.js`, `js/textToSpeech.js`
**What:** Added per-chunk 90s timeout (`Promise.race`), event loop yields between WASM calls (`setTimeout(0)`), voice pre-fetch phase, heartbeat logger, `worker.onerror` handler, and version stamping. Failed chunks are skipped gracefully instead of aborting the entire podcast.
**Impact:** Audio synthesis is more resilient: it provides real-time progress, detects hangs, and degrades gracefully on failures.

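The timeout, yield, and skip-on-failure behaviour described above can be combined in one short sketch. The helper names (`withTimeout`, `yieldToEventLoop`, `synthesizeAll`) and the `synthesize` callback are hypothetical. Note that the yield wraps `setTimeout` in a promise, since awaiting the bare return value of `setTimeout` would not wait for the timer.

```javascript
// Resolve on a later macrotask so queued messages and timers get a turn.
const yieldToEventLoop = () => new Promise((resolve) => setTimeout(resolve, 0));

// Reject if `promise` does not settle within `ms` milliseconds.
// Promise.race does not cancel the losing work; it only stops waiting for it.
function withTimeout(promise, ms, label = 'chunk') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Synthesize chunks sequentially; a failed or hung chunk is skipped, not fatal.
async function synthesizeAll(chunks, synthesize, timeoutMs = 90_000) {
  const results = [];
  for (const [index, chunk] of chunks.entries()) {
    try {
      results.push(await withTimeout(synthesize(chunk), timeoutMs, `chunk ${index}`));
    } catch (err) {
      console.warn(`Skipping chunk ${index}: ${err.message}`);
    }
    await yieldToEventLoop();
  }
  return results;
}
```

Because a timed-out synthesis call keeps running in the background, a production version would likely also need a way to abort or ignore its late result.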
## 6. Integration & UI Updates
**Files:** `index.html`, `js/renderer.js`, `js/templates.js`, `js/modal-templates.js`, `js/help-mode.js`, `src/main.js`
**What:** Added podcast module imports in `main.js`, podcast tag processing in `renderer.js`, marketplace modal in `modal-templates.js`, help entries in `help-mode.js`, and toolbar button in `templates.js`. Updated `index.html` with podcast CSS imports.
**Impact:** Podcast features are fully integrated into the TextAgent UI with discoverable entry points.

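Detecting `{{@Podcast: topic}}` tags, the first step of the tag processing mentioned above, could be done with a small regular expression. This sketch is an assumption about the approach, not the code in `renderer.js`; `extractPodcastTopics` is a hypothetical name, and the pattern treats the tag name as case-sensitive.

```javascript
// Matches {{@Podcast: topic}} and captures the topic without surrounding spaces.
const PODCAST_TAG = /\{\{@Podcast:\s*([^}]*?)\s*\}\}/g;

// Hypothetical extractor: return the non-empty topics found in a markdown string.
function extractPodcastTopics(markdown) {
  return [...markdown.matchAll(PODCAST_TAG)]
    .map((match) => match[1])
    .filter((topic) => topic.length > 0);
}

const sampleDoc = 'Intro {{@Podcast: Quantum computing basics}} and {{@Podcast:AI safety }}';
console.log(extractPodcastTopics(sampleDoc));
```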
---

## Files Changed (14 total)

| File | Lines Changed | Type |
|------|:---:|------|
| `js/podcast-docgen.js` | +1046 | New: podcast generation engine |
| `js/podcast-marketplace.js` | +923 | New: marketplace UI |
| `css/podcast-marketplace.css` | +730 | New: marketplace styles |
| `css/podcast-docgen.css` | +406 | New: podcast player styles |
| `js/templates/podcasts.js` | +279 | New: podcast templates |
| `js/textToSpeech.js` | +205 −30 | Multi-speaker fix, worker caching, error handling |
| `js/tts-worker.js` | +203 −0 | processMultiSegments, bundled init, version stamp |
| `index.html` | +30 −23 | CSS imports, podcast integration |
| `js/renderer.js` | +12 −1 | Podcast tag processing |
| `js/help-mode.js` | +11 | Podcast help entries |
| `src/main.js` | +9 | Module imports |
| `js/templates.js` | +4 −1 | Toolbar button |
| `sw.js` | +3 −1 | Cache version bump, worker exclusion |
| `js/modal-templates.js` | +1 | Marketplace modal |

README.md

Lines changed: 3 additions & 1 deletion
@@ -30,6 +30,7 @@
| **📌 AI Annotations** | Right-click context menu on selected text → 5 annotation types: ⭐ Highlight, 📝 Sticky Note, ❓ Ask AI, 🔖 Bookmark, 📖 Define; color-coded pills render inline in preview; sliding thread panel for multi-turn AI Q&A with document context, web search, and model selector; annotations stored as HTML comments in markdown source (portable, no external DB); **Study Copy** workflow for annotating shared read-only documents; `findBlockEnd()` structural insertion prevents markdown syntax breakage |
| **🎤 Voice Dictation** | Dual-engine speech-to-text: **Voxtral Mini 3B** (WebGPU, primary, 13 languages, ~2.7 GB) or **Whisper Large V3 Turbo** (WASM fallback, ~800 MB) with consensus scoring; download consent popup with model info before first use; 50+ Markdown-aware voice commands — natural phrases ("heading one", "bold…end bold", "add table", "undo"); auto-punctuation via AI refinement or built-in fallback; streaming partial results |
| **🔊 Text-to-Speech** | Hybrid Kokoro TTS engine — 9 languages (English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese) via [Kokoro 82M v1.0 ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.0-ONNX) (~80 MB, off-thread WebWorker), Korean, German & others via Web Speech API fallback; **chunked synthesis** for long text (sentence-boundary splitting, ~500 chars/chunk, sequential synthesis with per-chunk progress); TTS card with separate ▶ Run (generate audio) / ▷ Play (replay) / 💾 Save (WAV download) buttons; hover any preview text and click 🔊 to hear pronunciation; voice auto-selection by language |
33+
| **🎙️ Podcast Generation** | `{{@Podcast:}}` tag for AI-powered multi-speaker podcast creation from any topic; 3-phase pipeline: web research (Jina API), AI script generation, Kokoro TTS multi-speaker audio synthesis; configurable styles (debate, interview, chat, lecture, storytelling); **Podcast Marketplace** with 15+ curated templates across Tech, Science, Business, Creative, Education categories; real-time progress with phase indicators; WAV audio download; per-segment voice assignments |
| **Import** | TXT, MD, DOCX, XLSX/XLS, CSV, TSV, HTML, JSON, XML, YAML, TOML, PDF — drag & drop or click Upload to import |
| **Export** | Markdown, self-contained styled HTML, PDF (smart page-breaks, shared rendering pipeline), LLM Memory (5 formats: XML, JSON, Compact JSON, Markdown, Plain Text + shareable link) |
| **Sharing** | AES-256-GCM encrypted sharing via Firebase; **compact share links** (`#s=<id>`, ~36 chars vs ~111 chars) with encryption key stored server-side; **custom named links** — optionally choose your own memorable name (e.g. `#s=mynotes`) with case-insensitive uniqueness, slug validation, and reserved-word protection; read-only shared links with auto-dismiss banner + floating "Read-only" pill indicator, **clean read-only view** (composer FAB + agent panel hidden when header collapsed), optional password protection (zero-knowledge — passphrase-derived key never stored); **view-locked links** (lock recipients to PPT or Preview mode, stored in Firestore to prevent URL tampering); **editor links** — cryptographic edit key system (`&ek=<token>`) grants write access to trusted collaborators (SHA-256 verified, AES-GCM encrypted write-token, auto-save to same document); **shared versions tracking** ("Previously Shared" panel with timestamps, view-mode badges, copy/delete actions); backward-compatible with legacy `#id=...&k=...` links |
@@ -548,7 +549,8 @@ TextAgent has undergone significant evolution since its inception. What started

| Date | Commits | Feature / Update |
|------|---------|-----------------:|
551-
| **2026-04-12** ||**React JSX Live Runtime** — new `jsx-autorun` code block type for interactive React components in markdown; `exec-jsx.js` (658 lines) with `@babel/standalone` transpilation (`['env', { modules: false }], 'react'` presets), sandboxed iframe rendering via React 18 CDN, and `LIB_REGISTRY` auto-detection of 12+ libraries (Recharts, Tailwind CSS, Lucide React, Framer Motion, Lodash, date-fns, dayjs, PapaParse, UUID, clsx, Chart.js, Google Fonts) with CDN injection; `⚛ React JSX Live` toolbar button in Coding dropdown; Help Mode docs with 3 FAQ examples; `export default function App()` component detection with PascalCase fallback; Load File button for `.jsx` import; Fixed: `exports is not defined` runtime error (import/export stripping moved to pre-transpilation to prevent Babel `env` preset from generating CommonJS references); 20 Playwright tests (module lifecycle, rendering, state, library detection, Run All pipeline) |
552+
| **2026-04-15** || 🎙️ **Podcast Generation System** — new `{{@Podcast:}}` document tag for AI-powered multi-speaker podcast creation; 3-phase pipeline (web research via Jina API → AI script generation with `[Speaker]` markers → Kokoro TTS multi-speaker audio synthesis); configurable styles (debate, interview, chat, lecture, storytelling); `parseScript()` speaker segmentation; `createWavBlob()` Float32Array→WAV encoder; real-time progress UI with phase indicators; WAV audio download; **Podcast Marketplace** with 15+ curated templates across 5 categories (Tech, Science, Business, Creative, Education); search/filter, template cards with metadata; `podcast-docgen.js` (~1046 lines) + `podcast-marketplace.js` (~923 lines) + `css/podcast-docgen.css` + `css/podcast-marketplace.css` + `js/templates/podcasts.js` |
553+
| **2026-04-15** || 🔧 **TTS Worker Multi-Speaker Fix** — fixed critical bug where Web Worker silently dropped `speak-multi` messages after async `init` handler completed; root cause: service worker (`sw.js`) used cache-first strategy for `.js` files, serving stale `tts-worker.js` indefinitely; fix: (1) extracted `processMultiSegments()` as standalone async function, (2) bundled segments with `init` message via `pendingSegments` field for same-handler-execution processing, (3) added cache-busting `?v=` param to worker URL, (4) excluded worker files from service worker caching, (5) bumped `CACHE_NAME` v2→v3, (6) added `worker.onerror` handler, (7) per-chunk 90s timeout, event loop yields, voice pre-fetch phase, heartbeat logger, version stamping |
| **2026-04-08** ||**Star on GitHub Button** — new gold/amber gradient pill button in header next to Issues button linking to the GitHub repo for starring; `.star-github-pill` CSS class with dark mode variant; fixed Issues button inline styles that prevented proper `.help-mode-pill` rendering |
| **2026-04-05** || 🔬 **Interactive Periodic Table Template** — new Science template category (`bi-atom` icon) with full 118-element interactive periodic table; React + Babel `html-autorun` block; 18×10 grid layout with 11 color-coded element categories (nonmetal, noble gas, alkali, alkaline, transition, post-transition, metalloid, halogen, lanthanide, actinide, unknown); search filter and category highlight; element detail view with 4 tabbed sections (Overview, Properties, Structure, Uses & Hazards); interactive Bohr model atom visualization with concentric orbit rings, animated electrons, 3D nucleus cluster, mouse drag rotation/tilt, scroll zoom; dark/light theme toggle (30+ CSS tokens per theme); lanthanide (57–71) and actinide (89–103) rows with labels; left-aligned header controls |
| **2026-04-04** | — | 🤖 **Agentic Tool Calling** — transitioned AI assistant from firehose context injection to **model-driven tool calling** for Groq cloud models; `buildToolDefinitions()` registers enabled connectors (Weather, HN, GitHub, Slack) + web search as OpenAI-format tools; model decides which tools to call via `tool_choice: 'auto'`; `executeToolCall()` runs tools in parallel; `handleToolCalls()` orchestrates two-pass generation (Pass 1: tool selection, Pass 2: synthesis with results); **query-relevance filter** (`queryNeedsConnectors()`) for local models — keyword-based gating prevents weather/news injection on general queries like "what is algebra?"; `extractLocationFromQuery()` with 5 extraction strategies (preposition patterns, weather patterns, capitalized words) for smart geocoding; softened grounding header allows general knowledge answers when connector data is irrelevant; Groq worker updated with non-streaming tool detection path and `rawMessages` for Pass 2; model-aware context budgets (4K local / 30K cloud); WebGPU buffer overflow translated to user-friendly error messages; model-size-aware context limits (0.8B→4K, 2B→8K, 4B+→32K chars) |
