Textagent
diff --git a/CHANGELOG-podcast-system.md b/CHANGELOG-podcast-system.md
Lines changed: 82 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 3 additions & 1 deletion
@@ -0,0 +1,82 @@
+# Podcast Generation System & TTS Worker Fixes
+
+- Added `{{@Podcast:}}` tag for AI-powered podcast generation from any document content
+- Built multi-speaker script generation with configurable styles (debate, interview, chat, lecture, storytelling)
+- Integrated Kokoro TTS multi-speaker synthesis with voice pre-fetching and per-chunk progress
+- Created podcast marketplace UI with pre-built podcast templates and search/filter
+- Added podcast template system with 15+ curated templates across Tech, Science, Business, Creative, and Education categories
+- Built WAV audio creation from Float32Array TTS output with download support
+- Added real-time podcast generation progress UI with phase indicators (research → script → audio → done)
+- Extracted `processMultiSegments()` as standalone async function in TTS worker for reliable synthesis
+- Fixed: TTS worker message delivery bug — bundled segments with `init` message to process in same handler execution
+- Fixed: Service worker cache-first strategy serving stale `tts-worker.js` — added cache-busting `?v=` param
+- Fixed: Worker files now excluded from service worker `shouldCacheResponse()` caching
+- Bumped service worker cache version from `v2` → `v3` to force cache invalidation
+- Added `worker.onerror` handler on main thread to catch worker-level errors
+- Added TTS worker version identifier (`TTS_WORKER_VERSION`) with startup logging
+- Added `_pendingMultiSegments` backup mechanism in `status: ready` handler
+- Added per-chunk 90s timeout via `Promise.race` to prevent infinite synthesis hangs
+- Added event loop yields (`setTimeout(0)`) between WASM calls so `postMessage` flushes
+- Added voice pre-fetch phase before synthesis to separate network vs WASM issues
+- Added heartbeat logger (10s interval) during multi-speaker synthesis
+- Added detailed timestamped logging across `textToSpeech.js`, `tts-worker.js`, `podcast-docgen.js`
+- Added help mode entries for podcast generation feature
+- Added podcast renderer integration in `renderer.js` for `{{@Podcast:}}` tag processing
+
+---
+
+## Summary
+Complete podcast generation system: users write `{{@Podcast: topic}}` in any document and get an AI-generated multi-speaker podcast with web research, script writing, and Kokoro TTS audio synthesis. Also fixed a critical TTS worker bug where the service worker's cache-first strategy served stale worker code, and the Web Worker silently dropped `speak-multi` messages sent after the async `init` handler completed.
+
+---
+
+## 1. Podcast Document Generator (`{{@Podcast:}}` Tag)
+**Files:** `js/podcast-docgen.js`, `css/podcast-docgen.css`
+**What:** New IIFE component that intercepts `{{@Podcast: topic}}` tags in rendered markdown. Performs 3-phase generation: (1) web search research via Jina API, (2) AI script generation with `[Speaker]` markers, (3) Kokoro TTS multi-speaker audio synthesis. Includes `parseScript()` for speaker segmentation, `createWavBlob()` for audio encoding, and real-time progress UI with phase indicators.
+**Impact:** Users can generate full podcast episodes from any topic directly in their documents — no external tools needed.
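To make the speaker segmentation and WAV encoding steps concrete, here is a minimal sketch of what helpers like `parseScript()` and `createWavBlob()` might do. It is not the code from `js/podcast-docgen.js`: the names `parseScriptSketch` and `encodeWavSketch`, the one-marker-per-line script format, and the mono 16-bit PCM layout are all assumptions made for illustration.

```javascript
// Hypothetical sketch: split a script with "[Speaker] text" lines into segments.
// Assumes one marker per line; unmarked lines continue the previous speaker.
function parseScriptSketch(script) {
  const segments = [];
  for (const line of script.split('\n')) {
    const match = line.match(/^\s*\[([^\]]+)\]\s*(.*)$/);
    if (match) {
      segments.push({ speaker: match[1].trim(), text: match[2].trim() });
    } else if (line.trim() && segments.length > 0) {
      const last = segments[segments.length - 1];
      last.text = (last.text + ' ' + line.trim()).trim();
    }
  }
  // Drop markers that never received any text.
  return segments.filter((segment) => segment.text.length > 0);
}

// Hypothetical sketch: encode mono Float32 samples as a 16-bit PCM WAV file.
// Returns an ArrayBuffer; the sample rate is supplied by the caller.
function encodeWavSketch(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp each sample to [-1, 1] and scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, the returned buffer could be wrapped as `new Blob([buffer], { type: 'audio/wav' })` and offered for download through an object URL. The sample rate to pass depends on the TTS model's output and is not stated in this changelog.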
+
+## 2. Podcast Marketplace
+**Files:** `js/podcast-marketplace.js`, `css/podcast-marketplace.css`, `js/templates/podcasts.js`
+**What:** Built a browsable marketplace UI with 15+ curated podcast templates across categories (Tech, Science, Business, Creative, Education). Includes search/filter, category tabs, template cards with metadata (duration, speakers, style), and one-click generation. Templates define speaker count, style, custom prompts, and voice assignments.
+**Impact:** Users can browse and generate podcasts from pre-built templates without writing prompts.
+
+## 3. TTS Worker Multi-Speaker Fix (Critical Bug)
+**Files:** `js/tts-worker.js`, `js/textToSpeech.js`
+**What:** The Web Worker silently dropped `speak-multi` messages sent after the async `init` handler completed. The root cause was twofold: the service worker served a stale cached `tts-worker.js` via its cache-first strategy, and the worker could not reliably process a second `postMessage` after `init`. Fix: (1) extracted `processMultiSegments()` as a standalone function, (2) bundled segments with the `init` message via a `pendingSegments` field, (3) the worker now processes those segments inline at the end of the init handler, (4) added a cache-busting `?v=` param to the worker URL, (5) added a `_pendingMultiSegments` backup dispatch from the `status: ready` handler.
+**Impact:** Podcast TTS synthesis now works reliably; previously it hung indefinitely after the model loaded.
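The bundled-init approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of `js/tts-worker.js`: the `createWorkerHandler` factory, the injected `loadModel`, `synthesize`, and `post` callbacks, and the `type`, `chunk`, and `done` message fields are hypothetical; only `init`, `speak-multi`, `pendingSegments`, `status: ready`, and `processMultiSegments()` come from the changelog.

```javascript
// Hypothetical sketch of a worker-side handler that accepts segments bundled
// with the init message. Dependencies are injected so the logic can run
// outside a real Web Worker.
function createWorkerHandler({ loadModel, synthesize, post }) {
  // Standalone async function, mirroring the extracted processMultiSegments().
  async function processMultiSegments(segments) {
    for (let i = 0; i < segments.length; i++) {
      const audio = await synthesize(segments[i]);
      post({ type: 'chunk', index: i, total: segments.length, audio });
    }
    post({ type: 'done' });
  }

  return async function onMessage(event) {
    const msg = event.data;
    if (msg.type === 'init') {
      await loadModel();
      post({ status: 'ready' });
      // Segments that arrived with init are processed in the same handler
      // execution, so no second postMessage is needed.
      if (Array.isArray(msg.pendingSegments) && msg.pendingSegments.length > 0) {
        await processMultiSegments(msg.pendingSegments);
      }
    } else if (msg.type === 'speak-multi') {
      await processMultiSegments(msg.segments);
    }
  };
}
```

Inside a real worker the handler would presumably be attached with something like `self.onmessage = createWorkerHandler({ loadModel, synthesize, post: (m) => self.postMessage(m) })`, and the main thread would create the worker with a versioned URL such as `'js/tts-worker.js?v=' + version` so that a stale cached copy is not reused.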
+
+## 4. Service Worker Cache Fix
+**Files:** `sw.js`
+**What:** Bumped `CACHE_NAME` from `textagent-v2` to `textagent-v3` to invalidate stale caches. Added exclusion for `*worker*` files in `shouldCacheResponse()` so worker JS is always fetched fresh. This prevents the cache-first strategy from serving outdated worker code.
+**Impact:** Future worker code changes take effect immediately without manual cache clearing.
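As a rough illustration of the worker exclusion, a predicate along these lines would keep worker scripts out of the cache. The real signature of `shouldCacheResponse()` in `sw.js` is not shown in this changelog, so the name `shouldCacheResponseSketch`, its `(requestUrl, response)` parameters, and the status check are assumptions.

```javascript
// Cache name after the version bump described above.
const CACHE_NAME = 'textagent-v3';

// Hypothetical sketch: decide whether a fetched response may be cached.
// Any file whose name contains "worker" is always fetched fresh.
function shouldCacheResponseSketch(requestUrl, response) {
  if (!response || response.status !== 200) return false;
  const { pathname } = new URL(requestUrl);
  const fileName = pathname.split('/').pop() || '';
  if (/worker/i.test(fileName)) return false;
  return true;
}
```

Because the check looks only at the file name, a query string such as `?v=3` on the worker URL does not affect the result.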
+
+## 5. TTS Synthesis Robustness
+**Files:** `js/tts-worker.js`, `js/textToSpeech.js`
+**What:** Added a per-chunk 90s timeout (`Promise.race`), event loop yields between WASM calls (awaiting a `setTimeout(0)`-based promise), a voice pre-fetch phase, a heartbeat logger, a `worker.onerror` handler, and version stamping. Failed chunks are skipped gracefully instead of aborting the entire podcast.
+**Impact:** Audio synthesis is more resilient — provides real-time progress, detects hangs, and degrades gracefully on failures.
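The timeout, yield, and skip-on-failure behaviour can be sketched as below. The helper names `withTimeout`, `yieldToEventLoop`, and `synthesizeChunks` are hypothetical, and the `synthesize` callback stands in for the actual Kokoro call; only the 90s `Promise.race` timeout and the `setTimeout(0)` yield are taken from the description above.

```javascript
// Resolve on the next macrotask so queued messages and timers can run.
const yieldToEventLoop = () => new Promise((resolve) => setTimeout(resolve, 0));

// Race a promise against a timer; the timer is cleared once either settles.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical sketch: synthesize chunks sequentially, recording null for any
// chunk that fails or times out instead of aborting the whole run.
async function synthesizeChunks(chunks, synthesize, timeoutMs = 90000) {
  const results = [];
  for (let i = 0; i < chunks.length; i++) {
    try {
      results.push(await withTimeout(synthesize(chunks[i]), timeoutMs, `chunk ${i}`));
    } catch (err) {
      results.push(null);
    }
    await yieldToEventLoop();
  }
  return results;
}
```

Note that `Promise.race` only stops waiting for the slow call; it does not cancel the underlying work, so a chunk that hangs may still occupy the worker until it returns.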
+
+## 6. Integration & UI Updates
+**Files:** `index.html`, `js/renderer.js`, `js/templates.js`, `js/modal-templates.js`, `js/help-mode.js`, `src/main.js`
+**What:** Added podcast module imports in `main.js`, podcast tag processing in `renderer.js`, the marketplace modal in `modal-templates.js`, help entries in `help-mode.js`, and a toolbar button in `templates.js`. Updated `index.html` with podcast CSS imports.
+**Impact:** Podcast features are fully integrated into the TextAgent UI with discoverable entry points.
+
+---
+
+## Files Changed (14 total)
+
+| File | Lines Changed | Type |
+|------|:---:|------|
+| `js/podcast-docgen.js` | +1046 | New: podcast generation engine |
+| `js/podcast-marketplace.js` | +923 | New: marketplace UI |
+| `css/podcast-marketplace.css` | +730 | New: marketplace styles |
+| `css/podcast-docgen.css` | +406 | New: podcast player styles |
+| `js/templates/podcasts.js` | +279 | New: podcast templates |
+| `js/textToSpeech.js` | +205 −30 | Multi-speaker fix, worker caching, error handling |
+| `js/tts-worker.js` | +203 −0 | processMultiSegments, bundled init, version stamp |
+| `index.html` | +30 −23 | CSS imports, podcast integration |
+| `js/renderer.js` | +12 −1 | Podcast tag processing |
+| `js/help-mode.js` | +11 | Podcast help entries |
+| `src/main.js` | +9 | Module imports |
+| `js/templates.js` | +4 −1 | Toolbar button |
+| `sw.js` | +3 −1 | Cache version bump, worker exclusion |
+| `js/modal-templates.js` | +1 | Marketplace modal |
@@ -30,6 +30,7 @@
 | **📌 AI Annotations** | Right-click context menu on selected text → 5 annotation types: ⭐ Highlight, 📝 Sticky Note, ❓ Ask AI, 🔖 Bookmark, 📖 Define; color-coded pills render inline in preview; sliding thread panel for multi-turn AI Q&A with document context, web search, and model selector; annotations stored as HTML comments in markdown source (portable, no external DB); **Study Copy** workflow for annotating shared read-only documents; `findBlockEnd()` structural insertion prevents markdown syntax breakage |
 | **🎤 Voice Dictation** | Dual-engine speech-to-text: **Voxtral Mini 3B** (WebGPU, primary, 13 languages, ~2.7 GB) or **Whisper Large V3 Turbo** (WASM fallback, ~800 MB) with consensus scoring; download consent popup with model info before first use; 50+ Markdown-aware voice commands — natural phrases ("heading one", "bold…end bold", "add table", "undo"); auto-punctuation via AI refinement or built-in fallback; streaming partial results |
 | **🔊 Text-to-Speech** | Hybrid Kokoro TTS engine — 9 languages (English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese) via [Kokoro 82M v1.0 ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.0-ONNX) (~80 MB, off-thread WebWorker), Korean, German & others via Web Speech API fallback; **chunked synthesis** for long text (sentence-boundary splitting, ~500 chars/chunk, sequential synthesis with per-chunk progress); TTS card with separate ▶ Run (generate audio) / ▷ Play (replay) / 💾 Save (WAV download) buttons; hover any preview text and click 🔊 to hear pronunciation; voice auto-selection by language |
+| **🎙️ Podcast Generation** | `{{@Podcast:}}` tag for AI-powered multi-speaker podcast creation from any topic; 3-phase pipeline: web research (Jina API), AI script generation, Kokoro TTS multi-speaker audio synthesis; configurable styles (debate, interview, chat, lecture, storytelling); **Podcast Marketplace** with 15+ curated templates across Tech, Science, Business, Creative, Education categories; real-time progress with phase indicators; WAV audio download; per-segment voice assignments |
 | **Import** | TXT, MD, DOCX, XLSX/XLS, CSV, TSV, HTML, JSON, XML, YAML, TOML, PDF — drag & drop or click Upload to import |
 | **Export** | Markdown, self-contained styled HTML, PDF (smart page-breaks, shared rendering pipeline), LLM Memory (5 formats: XML, JSON, Compact JSON, Markdown, Plain Text + shareable link) |
 | **Sharing** | AES-256-GCM encrypted sharing via Firebase; **compact share links** (`#s=<id>`, ~36 chars vs ~111 chars) with encryption key stored server-side; **custom named links** — optionally choose your own memorable name (e.g. `#s=mynotes`) with case-insensitive uniqueness, slug validation, and reserved-word protection; read-only shared links with auto-dismiss banner + floating "Read-only" pill indicator, **clean read-only view** (composer FAB + agent panel hidden when header collapsed), optional password protection (zero-knowledge — passphrase-derived key never stored); **view-locked links** (lock recipients to PPT or Preview mode, stored in Firestore to prevent URL tampering); **editor links** — cryptographic edit key system (`&ek=<token>`) grants write access to trusted collaborators (SHA-256 verified, AES-GCM encrypted write-token, auto-save to same document); **shared versions tracking** ("Previously Shared" panel with timestamps, view-mode badges, copy/delete actions); backward-compatible with legacy `#id=...&k=...` links |
@@ -548,7 +549,8 @@ TextAgent has undergone significant evolution since its inception. What started
 
 | Date | Commits | Feature / Update |
 |------|---------|-----------------:|
-| **2026-04-12** | — | ⚛ **React JSX Live Runtime** — new `jsx-autorun` code block type for interactive React components in markdown; `exec-jsx.js` (658 lines) with `@babel/standalone` transpilation (`['env', { modules: false }], 'react'` presets), sandboxed iframe rendering via React 18 CDN, and `LIB_REGISTRY` auto-detection of 12+ libraries (Recharts, Tailwind CSS, Lucide React, Framer Motion, Lodash, date-fns, dayjs, PapaParse, UUID, clsx, Chart.js, Google Fonts) with CDN injection; `⚛ React JSX Live` toolbar button in Coding dropdown; Help Mode docs with 3 FAQ examples; `export default function App()` component detection with PascalCase fallback; Load File button for `.jsx` import; Fixed: `exports is not defined` runtime error (import/export stripping moved to pre-transpilation to prevent Babel `env` preset from generating CommonJS references); 20 Playwright tests (module lifecycle, rendering, state, library detection, Run All pipeline) |
+| **2026-04-15** | — | 🎙️ **Podcast Generation System** — new `{{@Podcast:}}` document tag for AI-powered multi-speaker podcast creation; 3-phase pipeline (web research via Jina API → AI script generation with `[Speaker]` markers → Kokoro TTS multi-speaker audio synthesis); configurable styles (debate, interview, chat, lecture, storytelling); `parseScript()` speaker segmentation; `createWavBlob()` Float32Array→WAV encoder; real-time progress UI with phase indicators; WAV audio download; **Podcast Marketplace** with 15+ curated templates across 5 categories (Tech, Science, Business, Creative, Education); search/filter, template cards with metadata; `podcast-docgen.js` (~1046 lines) + `podcast-marketplace.js` (~923 lines) + `css/podcast-docgen.css` + `css/podcast-marketplace.css` + `js/templates/podcasts.js` |
+| **2026-04-15** | — | 🔧 **TTS Worker Multi-Speaker Fix** — fixed critical bug where Web Worker silently dropped `speak-multi` messages after async `init` handler completed; root cause: service worker (`sw.js`) used cache-first strategy for `.js` files, serving stale `tts-worker.js` indefinitely; fix: (1) extracted `processMultiSegments()` as standalone async function, (2) bundled segments with `init` message via `pendingSegments` field for same-handler-execution processing, (3) added cache-busting `?v=` param to worker URL, (4) excluded worker files from service worker caching, (5) bumped `CACHE_NAME` v2→v3, (6) added `worker.onerror` handler, (7) per-chunk 90s timeout, event loop yields, voice pre-fetch phase, heartbeat logger, version stamping |
 | **2026-04-08** | — | ⭐ **Star on GitHub Button** — new gold/amber gradient pill button in header next to Issues button linking to the GitHub repo for starring; `.star-github-pill` CSS class with dark mode variant; fixed Issues button inline styles that prevented proper `.help-mode-pill` rendering |
 | **2026-04-05** | — | 🔬 **Interactive Periodic Table Template** — new Science template category (`bi-atom` icon) with full 118-element interactive periodic table; React + Babel `html-autorun` block; 18×10 grid layout with 11 color-coded element categories (nonmetal, noble gas, alkali, alkaline, transition, post-transition, metalloid, halogen, lanthanide, actinide, unknown); search filter and category highlight; element detail view with 4 tabbed sections (Overview, Properties, Structure, Uses & Hazards); interactive Bohr model atom visualization with concentric orbit rings, animated electrons, 3D nucleus cluster, mouse drag rotation/tilt, scroll zoom; dark/light theme toggle (30+ CSS tokens per theme); lanthanide (57–71) and actinide (89–103) rows with labels; left-aligned header controls |
 | **2026-04-04** | — | 🤖 **Agentic Tool Calling** — transitioned AI assistant from firehose context injection to **model-driven tool calling** for Groq cloud models; `buildToolDefinitions()` registers enabled connectors (Weather, HN, GitHub, Slack) + web search as OpenAI-format tools; model decides which tools to call via `tool_choice: 'auto'`; `executeToolCall()` runs tools in parallel; `handleToolCalls()` orchestrates two-pass generation (Pass 1: tool selection, Pass 2: synthesis with results); **query-relevance filter** (`queryNeedsConnectors()`) for local models — keyword-based gating prevents weather/news injection on general queries like "what is algebra?"; `extractLocationFromQuery()` with 5 extraction strategies (preposition patterns, weather patterns, capitalized words) for smart geocoding; softened grounding header allows general knowledge answers when connector data is irrelevant; Groq worker updated with non-streaming tool detection path and `rawMessages` for Pass 2; model-aware context budgets (4K local / 30K cloud); WebGPU buffer overflow translated to user-friendly error messages; model-size-aware context limits (0.8B→4K, 2B→8K, 4B+→32K chars) |