Turn spoken developer thoughts into structured prompts, reusable templates, and voice input for MCP-enabled tools.
Developer Voice to Prompt is a desktop tray app for developers who think faster than they type. It gives you one consistent voice workflow across AI tools: capture an idea, edit it live, drop it into a reusable prompt template, optionally enhance it, and send it wherever you work.
The base experience works immediately with Web Speech. No developer account, API key, or subscription is required to start dictating. Whisper adds fully local transcription after a one-time download in Settings.
Prompting is now part of daily development work, but the input experience is fragmented.
- Some AI tools have voice input.
- Many do not.
- Some workflows need reusable prompt structure, exact file names, or tool names.
- Switching between dictation tools, editors, and AI clients breaks flow.
Developer Voice to Prompt gives you a single voice-first workflow that stays useful even when you switch models, IDEs, or AI tools.
This app is aimed at developers who want to:
- capture implementation ideas before they disappear
- fill reusable prompt templates with exact file names, modules, or tool names
- refine bug reports, refactoring requests, and architecture prompts by voice
- keep a reusable prompt history outside any one AI product
- provide voice input to MCP-aware tools without depending on their built-in dictation UX
Start speaking and the transcript appears live. If you notice something wrong, edit the text directly and continue speaking without restarting the session.
Templates let you keep the parts that should stay stable while changing only the details that matter for the current task.
Typical template fields developers fill in by voice:
- file names
- module names
- affected components
- MCP tool names
- agent or skill names
- acceptance criteria
This is especially useful when the target AI tool searches your codebase and performs better with exact names.
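As a rough illustration of the idea, the sketch below fills a stable prompt skeleton with task-specific values. The `$placeholder` syntax, the field names, and the example file path are hypothetical and chosen only for this example; the app's own template format may differ.

```python
from string import Template

# Hypothetical template: the stable parts are fixed text,
# the $placeholders are the details you would dictate.
BUG_TEMPLATE = Template(
    "Investigate a bug in $file_name (module: $module_name).\n"
    "Affected components: $components\n"
    "Acceptance criteria: $acceptance_criteria"
)


def fill_template(template: Template, fields: dict[str, str]) -> str:
    """Fill every placeholder; raises KeyError if a value is missing."""
    return template.substitute(fields)


prompt = fill_template(
    BUG_TEMPLATE,
    {
        "file_name": "src/auth/session.ts",  # illustrative path
        "module_name": "auth",
        "components": "login form, token refresh",
        "acceptance_criteria": "expired sessions redirect to the login page",
    },
)
print(prompt)
```

Failing loudly on a missing field is a deliberate choice here: a prompt with an unfilled placeholder is usually worse than no prompt.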
Every prompt is stored locally so you can search, reuse, refine, or delete it later. That history belongs to your workflow, not to a single AI product.
If you connect GitHub Copilot, raw dictation can be transformed into a cleaner, more structured prompt based on your own enhancement instructions.
Use this when you want to turn a rough stream of thoughts into something closer to:
- a bug investigation brief
- a refactoring request
- a feature implementation plan
- an agent instruction block
- a review or debugging prompt
Developer Voice to Prompt can also run as a local MCP server.
That means MCP-capable tools can ask this app for voice input, open the popup, show the reason they need input, and wait for you to speak or type the answer. This is useful when your AI client supports MCP tools but has weak or no native dictation support.
- runs a local HTTP MCP server on `localhost`
- exposes a `voice_to_text` tool
- opens the popup when a client requests voice input
- shows the request reason in the popup so you know what the AI is asking for
- optionally pre-fills context text that you can edit before submitting
- returns the final text back to the calling MCP client
- answer an agent's follow-up question without typing
- dictate missing implementation details into an MCP-driven workflow
- speak a bug reproduction description when the agent asks for clarification
- capture architecture context for an AI tool that can call MCP tools but has no voice UI
- use voice input in VS Code and in GitHub Copilot CLI through MCP instead of depending on whether the current tool has good built-in dictation
- make voice a first-class part of the developer workflow, not just a fallback for occasional note-taking
- use the popup as the default human-in-the-loop tool for agent systems when an agent is missing context, missing data, or missing the right tool during debugging and experimentation
- reuse local history to test agent workflows repeatedly, compare phrasing, and refine prompts across multiple runs
- turn unstructured brainstorming into a more structured prompt before sending it into an AI tool or agent workflow
- start with plain voice-to-text, then optionally apply a prompt template and an LLM-based enhancement step to shape the final transcript to your preferred style
- use Azure Speech for mixed-language dictation when technical terms and spoken language switch back and forth, and use local Whisper as the default mode when mixed-language support is not needed
- expose voice-to-text through MCP so other AI tools can adopt faster voice-driven workflows without having to build their own voice UI first
- Open Settings.
- Enable the MCP server in General.
- Choose the local port.
- Save settings.
- Add the local server URL to your MCP-capable client.
The server is disabled by default until you enable it and save.
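For the last step, the sketch below generates a client entry in the shape VS Code uses for its `mcp.json` file (a `servers` map with `type` and `url`). The server name, the port `3333`, and the `/mcp` path are placeholders, not values taken from the app; use the port you chose in Settings and check your client's documentation for its exact config format.

```python
import json


def mcp_client_entry(port: int, path: str = "/mcp") -> dict:
    """Build a VS Code style MCP config entry for a local HTTP server.

    The URL path and the server name are assumptions for illustration.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return {
        "servers": {
            "developer-voice-to-prompt": {
                "type": "http",
                "url": f"http://127.0.0.1:{port}{path}",
            }
        }
    }


# 3333 is a placeholder; substitute the port configured in Settings.
print(json.dumps(mcp_client_entry(3333), indent=2))
```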
The app exposes a local voice_to_text MCP tool that other tools can call.
One example is registering the local server in GitHub Copilot, but the feature is not limited to Copilot. Any same-machine MCP client that can talk to a local HTTP MCP server can use it.
Sample with GitHub Copilot: when Copilot calls the tool, the popup shows the request context so you can answer with voice or typed edits and send the result back to the caller.
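Under the hood, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `voice_to_text`. The argument names `reason` and `context` are assumptions based on the behavior described above (a request reason and optional pre-filled text); the tool's actual input schema is whatever the server reports to the client.

```python
import json


def build_voice_request(request_id: int, reason: str, context: str = "") -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for voice_to_text.

    'reason' and 'context' are hypothetical argument names.
    """
    arguments = {"reason": reason}
    if context:
        arguments["context"] = context  # optional pre-filled text
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": "voice_to_text", "arguments": arguments},
        }
    )


print(build_voice_request(1, "Describe how to reproduce the login bug"))
```

In practice your MCP client builds and sends this message for you; the sketch only shows what travels over the wire.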
These points reflect the current implementation:
- local only: the server binds to `127.0.0.1`
- one request at a time: concurrent dictation requests are rejected
- timeout: pending requests time out after 5 minutes if nothing is submitted
- cancel on close: closing the popup cancels the MCP request
- configurable port: set it in Settings
- no authentication layer: intended for trusted local workflows
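The request rules above (one request at a time, a 5-minute timeout, cancel on close) can be modeled with a small gate object. This is an illustrative Python model of the described behavior, not the app's implementation; the class and method names are invented for the example.

```python
import time

REQUEST_TIMEOUT_SECONDS = 5 * 60  # pending requests expire after 5 minutes


class DictationGate:
    """Allows a single pending dictation request at a time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started_at = None  # None means no request is pending

    def _expire_if_stale(self) -> None:
        if (
            self._started_at is not None
            and self._clock() - self._started_at >= REQUEST_TIMEOUT_SECONDS
        ):
            self._started_at = None

    def begin(self) -> bool:
        """Try to start a request; False means it is rejected as concurrent."""
        self._expire_if_stale()
        if self._started_at is not None:
            return False
        self._started_at = self._clock()
        return True

    def finish(self) -> None:
        """Called on submit or when the popup closes; frees the slot."""
        self._started_at = None


gate = DictationGate()
print(gate.begin(), gate.begin())  # True False: the second request is rejected
```

For client authors, the practical consequence is that a rejected or timed-out call should be retried later rather than queued in parallel.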
Choose the engine that matches your workflow.
| Feature | Web Speech | Azure | Whisper |
|---|---|---|---|
| Real-time transcription | ✅ | ✅ | ❌ |
| Editable while speaking | ✅ | ✅ | ✅ ¹ |
| Auto-punctuation | ❌ | ✅ | ✅ |
| Multi-language mixing | ❌ | ✅ | ❌ |
| Custom phrase boost | ❌ | ✅ | ✅ |
| Silence auto-stop | ✅ | ✅ | ✅ |
| Microphone selection | ✅ | ✅ | ✅ |
| GPU acceleration | ❌ | ❌ | ✅ ² |
| Zero setup | ✅ | ❌ | ❌ |
| Local offline use | ❓ ³ | ❌ | ✅ |
Azure is especially useful when your spoken language and your technical vocabulary do not match cleanly, for example when you speak one language but say English framework names, class names, or code terms.
- Open the popup with your global shortcut.
- Optionally load a template.
- Speak your prompt, idea, or answer.
- Edit while speaking if needed.
- Optionally run prompt enhancement if you use Copilot.
- Copy the result into your AI tool, or submit it back to an MCP client.
The app works out of the box with Web Speech.
- no developer account required
- no API key required
- no paid plan required
That makes it the fastest way to start using the app.
| Capability | What you need | Why you would use it |
|---|---|---|
| Azure Speech | Azure Speech key and region, free tier available | Better speech recognition, punctuation, phrase boosting, multi-language scenarios |
| Whisper local | Download the whisper-server binary + a model in Settings (see Whisper Setup) | Local offline transcription, privacy, GPU acceleration |
| Prompt enhancement | GitHub Copilot CLI installed and logged in, plus a Copilot plan (including free tier). The Full installer bundles Node.js; Lite requires Node.js 20+ separately. | Turn rough dictation into cleaner prompts |
| MCP voice input | An MCP-capable client running on the same machine | Let AI tools request voice input through the popup |
Without Azure, Whisper, Copilot, or MCP, the app is still useful as a standalone dictation and prompt-template tool.
Whisper runs entirely on your device using whisper.cpp. No data is sent to any cloud service. Setup differs by platform:
- Open Settings → Speech → Whisper (Local).
- Click Detect GPU to see your hardware and get a recommended binary variant (CPU, OpenBLAS, CUDA 11.8, or CUDA 12.4).
- Select the variant and click Download whisper-server. The app downloads the pre-built binary from the whisper.cpp GitHub releases.
- Download a model (e.g. "Base" for a good balance of speed and accuracy).
- Select Whisper as your speech provider and start dictating.
Pre-built macOS binaries are not available in whisper.cpp releases. Install via Homebrew instead:
`brew install whisper-cpp`
This installs whisper-server with Metal acceleration on Apple Silicon. The app detects the Homebrew installation automatically.
Then open Settings, download a model, and select Whisper as your speech provider.
The app manages a local whisper-server process that loads the model once and stays running during your dictation session. Audio is captured in the browser via an AudioWorklet, sent to the Rust backend, and forwarded as WAV via HTTP POST to the local server. This keeps the model in memory for fast inference (~50–150ms per decode cycle) without reloading on every request.
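To make the "forwarded as WAV" step concrete, the sketch below encodes raw float samples as an in-memory 16-bit mono WAV file. It is a Python illustration only (the app does this in its Rust backend), and the 16 kHz sample rate is an assumption based on what Whisper models expect. The HTTP upload is left out because the endpoint and port the app uses are not documented here.

```python
import io
import struct
import wave


def pcm_to_wav(samples, sample_rate: int = 16000) -> bytes:
    """Encode float samples in [-1.0, 1.0] as 16-bit mono WAV bytes."""
    # Clamp first so out-of-range input cannot overflow the 16-bit range.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buffer.getvalue()


# One second of silence at the assumed 16 kHz rate.
wav_bytes = pcm_to_wav([0.0] * 16000)
print(len(wav_bytes), "bytes")
```

The resulting bytes are what a client would place in the body of the POST to the local server.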
Each release ships two installer variants per platform:
| Variant | Includes Node.js | Copilot ready | Size |
|---|---|---|---|
| Full | ✅ | After installing the Copilot CLI | Larger (~30 MB extra) |
| Lite | ❌ | After installing Node.js 20+ and the Copilot CLI | Smaller |
Both variants require the GitHub Copilot CLI to be installed and logged in for the prompt-enhancement feature. The only difference is whether Node.js is bundled.
| Platform | Full | Lite |
|---|---|---|
| Windows | Full installer (.exe) | Lite installer (.exe) |
| macOS | Full DMG | Lite DMG |
All releases are listed on the Releases page.
If you want a clean uninstall that also removes your local settings, history, templates, and other stored app data, open Settings and go to the General tab before uninstalling.
Use the app data removal action there. It permanently removes all app data and restarts the app, which is useful if you do not want local data to remain on the machine after uninstalling.
If macOS shows a message like Developer Voice to Prompt is damaged and can't be opened, that is Gatekeeper blocking an app that is not yet code-signed and notarized by Apple.
Use this sequence:
- Download the DMG.
- Drag the app into Applications.
- Try to open it once.
- If macOS blocks it, run:
`xattr -cr "/Applications/Developer Voice to Prompt.app"`
If you have not copied the app to Applications yet and are trying to launch it directly from the mounted DMG, use the mounted path instead:
`xattr -cr "/Volumes/Developer Voice to Prompt/Developer Voice to Prompt.app"`
Then open the app again. This is a one-time step for that downloaded copy.
All shortcuts are customizable in Settings.
| Action | Windows | macOS |
|---|---|---|
| Show or hide popup | Ctrl+Alt+V | Cmd+Alt+V |
| Start or stop voice | Ctrl+Shift+M | Cmd+Shift+M |
| Copy and close | Ctrl+Enter | Cmd+Enter |
| Switch speech provider | Ctrl+Shift+P | Cmd+Shift+P |
| Enhance prompt | Ctrl+Shift+E | Cmd+Shift+E |
| Dismiss | Esc | Esc |
Press ? in the popup to see active shortcuts.
If you want to build from source, understand the architecture, or work on the project itself, see DEVELOPMENT.md.
MIT












