Developer Voice to Prompt (with MCP Support)

Turn spoken developer thoughts into structured prompts, reusable templates, and voice input for MCP-enabled tools.

Developer Voice to Prompt is a desktop tray app for developers who think faster than they type. It gives you one consistent voice workflow across AI tools: capture an idea, edit it live, drop it into a reusable prompt template, optionally enhance it, and send it wherever you work.

The base experience works immediately with Web Speech, and local Whisper becomes available after a one-time download of the binary and a model. No developer account, no API key, and no subscription are required to start dictating.

Why This Exists

Prompting is now part of daily development work, but the input experience is fragmented.

Some AI tools have voice input.
Many do not.
Some workflows need reusable prompt structure, exact file names, or tool names.
Switching between dictation tools, editors, and AI clients breaks flow.

Developer Voice to Prompt gives you a single voice-first workflow that stays useful even when you switch models, IDEs, or AI tools.

Built For Developers

This app is aimed at developers who want to:

capture implementation ideas before they disappear
fill reusable prompt templates with exact file names, modules, or tool names
refine bug reports, refactoring requests, and architecture prompts by voice
keep a reusable prompt history outside any one AI product
provide voice input to MCP-aware tools without depending on their built-in dictation UX

Core Workflows

Speak, Edit, Continue

Start speaking and the transcript appears live. If you notice something wrong, edit the text directly and continue speaking without restarting the session.

Reuse Prompt Templates

Templates let you keep the parts that should stay stable while changing only the details that matter for the current task.

Typical template fields developers fill in by voice:

file names
module names
affected components
MCP tool names
agent or skill names
acceptance criteria

This is especially useful when the target AI tool searches your codebase and performs better with exact names.
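As a rough illustration of the idea, a template can be treated as stable text with named placeholders that are filled from dictated values. This is a minimal sketch, not the app's implementation: the `$name` placeholder syntax, the template text, and the field names are all assumptions made for the example.

```python
from string import Template

# Hypothetical template text; the app's real placeholder syntax may differ.
REFACTOR_TEMPLATE = Template(
    "Refactor $module in $file_name.\n"
    "Affected components: $components.\n"
    "Acceptance criteria: $criteria"
)


def fill_template(template: Template, fields: dict[str, str]) -> str:
    """Substitute dictated values; placeholders without a value stay visible."""
    return template.safe_substitute(fields)


# Example values are invented for illustration.
prompt = fill_template(
    REFACTOR_TEMPLATE,
    {
        "module": "the retry logic",
        "file_name": "src/lib/http_client.ts",
        "components": "SettingsPanel, HistoryList",
    },
)
print(prompt)
```

Using `safe_substitute` leaves unfilled placeholders such as `$criteria` in the output, so a missing field is easy to spot before the prompt is sent.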

Keep Your Prompt History

Every prompt is stored locally so you can search, reuse, refine, or delete it later. That history belongs to your workflow, not to a single AI product.

Enhance Rough Dictation Into Better Prompts

If you connect GitHub Copilot, raw dictation can be transformed into a cleaner, more structured prompt based on your own enhancement instructions.

Use this when you want to turn a rough stream of thoughts into something closer to:

a bug investigation brief
a refactoring request
a feature implementation plan
an agent instruction block
a review or debugging prompt

MCP Server For Voice Input

Developer Voice to Prompt can also run as a local MCP server.

That means MCP-capable tools can ask this app for voice input, open the popup, show the reason they need input, and wait for you to speak or type the answer. This is useful when your AI client supports MCP tools but has weak or no native dictation support.

What It Does

runs a local HTTP MCP server on localhost
exposes a voice_to_text tool
opens the popup when a client requests voice input
shows the request reason in the popup so you know what the AI is asking for
optionally pre-fills context text that you can edit before submitting
returns the final text back to the calling MCP client
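To make the request and response flow concrete, here is a sketch of the JSON-RPC messages an MCP client might exchange with the `voice_to_text` tool. The `tools/call` method and the `content` list in the result follow the MCP specification; the argument names `reason` and `context` are assumptions, since the tool's actual input schema is not documented here.

```python
import json


def build_voice_request(request_id: int, reason: str, context: str = "") -> dict:
    """Build an MCP tools/call request for the voice_to_text tool."""
    # Assumption: "reason" and "context" are hypothetical argument names.
    arguments = {"reason": reason}
    if context:
        arguments["context"] = context
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "voice_to_text", "arguments": arguments},
    }


def extract_text(response: dict) -> str:
    """Join the text items of an MCP tool result into one string."""
    items = response.get("result", {}).get("content", [])
    return "\n".join(item["text"] for item in items if item.get("type") == "text")


request = build_voice_request(1, "Which file shows the failing test?")
print(json.dumps(request, indent=2))

# A hand-written response, shaped like an MCP tool result.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "tests/auth.spec.ts"}]},
}
print(extract_text(sample_response))
```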

Typical Use Cases

answer an agent's follow-up question without typing
dictate missing implementation details into an MCP-driven workflow
speak a bug reproduction description when the agent asks for clarification
capture architecture context for an AI tool that can call MCP tools but has no voice UI

Practical Use Cases

use voice input in VS Code and in GitHub Copilot CLI through MCP instead of depending on whether the current tool has good built-in dictation
make voice a first-class part of the developer workflow, not just a fallback for occasional note-taking
use the popup as the default human-in-the-loop tool for agent systems when an agent is missing context, missing data, or missing the right tool during debugging and experimentation
reuse local history to test agent workflows repeatedly, compare phrasing, and refine prompts across multiple runs
turn unstructured brainstorming into a more structured prompt before sending it into an AI tool or agent workflow
start with plain voice-to-text, then optionally apply a prompt template and an LLM-based enhancement step to shape the final transcript to your preferred style
use Azure Speech for mixed-language dictation when technical terms and spoken language switch back and forth, and use local Whisper as the default mode when mixed-language support is not needed
expose voice-to-text through MCP so other AI tools can adopt faster voice-driven workflows without having to build their own voice UI first

Setup

Open Settings.
Enable the MCP server in General.
Choose the local port.
Save settings.
Add the local server URL to your MCP-capable client.

The server is disabled by default until you enable it and save.

The app exposes a local voice_to_text MCP tool that other tools can call.

One example is registering the local server in GitHub Copilot, but the feature is not limited to Copilot. Any same-machine MCP client that can talk to a local HTTP MCP server can use it.
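The registration step might look like the sketch below, which generates a client configuration entry. The `servers` / `type` / `url` layout matches the one VS Code uses in `mcp.json`; other clients use different keys. The server name, the port `3845`, and the `/mcp` path are placeholders, so use the URL your own Settings page shows.

```python
import json


def mcp_client_entry(port: int, path: str = "/mcp") -> dict:
    """Build a client config entry that points at the local MCP server."""
    # Assumption: the endpoint path and server name are hypothetical.
    return {
        "servers": {
            "developer-voice-to-prompt": {
                "type": "http",
                "url": f"http://127.0.0.1:{port}{path}",
            }
        }
    }


# Hypothetical port; replace it with the one chosen in Settings.
config = mcp_client_entry(port=3845)
print(json.dumps(config, indent=2))
```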

Sample with GitHub Copilot: when Copilot calls the tool, the popup shows the request context so you can answer with voice or typed edits and send the result back to the caller.

Current MCP Behavior

These points reflect the current implementation:

local only: the server binds to 127.0.0.1
one request at a time: concurrent dictation requests are rejected
timeout: pending requests time out after 5 minutes if nothing is submitted
cancel on close: closing the popup cancels the MCP request
configurable port: set it in Settings
no authentication layer: intended for trusted local workflows
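The single-request, timeout, and cancel rules above can be modeled with a small gate object. This is an illustrative sketch of the described behavior, not the app's code (the backend is not written in Python); the class and method names are invented for the example.

```python
import threading
from typing import Optional


class DictationSlot:
    """Single-request gate: one pending dictation, concurrent ones rejected."""

    def __init__(self, timeout_seconds: float = 300.0) -> None:
        self._lock = threading.Lock()
        self._busy = False
        self._done = threading.Event()
        self._result: Optional[str] = None
        self.timeout_seconds = timeout_seconds  # 5 minutes by default

    def try_begin(self) -> bool:
        """Claim the slot; return False if a request is already pending."""
        with self._lock:
            if self._busy:
                return False
            self._busy = True
            self._result = None
            self._done.clear()
            return True

    def submit(self, text: str) -> None:
        """Called when the user submits the popup text."""
        self._result = text
        self._done.set()

    def cancel(self) -> None:
        """Called when the popup is closed without submitting."""
        self._result = None
        self._done.set()

    def wait(self) -> Optional[str]:
        """Block until submit, cancel, or timeout; None means no text."""
        try:
            self._done.wait(self.timeout_seconds)
            return self._result
        finally:
            with self._lock:
                self._busy = False


slot = DictationSlot(timeout_seconds=0.05)  # short timeout for the demo
first_accepted = slot.try_begin()
second_accepted = slot.try_begin()  # rejected while the first is pending
timed_out_result = slot.wait()      # nothing submitted, so this times out
```

A client that gets a rejection can retry once the pending request has been submitted, cancelled, or timed out.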

Speech Engines

Choose the engine that matches your workflow.

Feature	Web Speech	Azure	Whisper
Real-time transcription	✅	✅	❌
Editable while speaking	✅	✅	✅¹
Auto-punctuation	❌	✅	✅
Multi-language mixing	❌	✅	❌
Custom phrase boost	❌	✅	✅
Silence auto-stop	✅	✅	✅
Microphone selection	✅	✅	✅
GPU acceleration	❌	❌	✅²
Zero setup	✅	❌	❌
Local offline use	❓³	❌	✅

Azure is especially useful when your spoken language and your technical vocabulary do not match cleanly, for example when you speak one language but say English framework names, class names, or code terms.
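One way to read the table is as a lookup: list the features you need and see which engines qualify. The sketch below encodes a subset of the rows as data; the feature keys are names invented for the example, and the `None` entry mirrors the platform-dependent offline behavior of Web Speech.

```python
from typing import Optional

# True = supported, False = not supported, None = depends on the platform.
FEATURES: dict[str, dict[str, Optional[bool]]] = {
    "Web Speech": {
        "real_time": True, "auto_punctuation": False, "multi_language": False,
        "phrase_boost": False, "zero_setup": True, "offline": None,
    },
    "Azure": {
        "real_time": True, "auto_punctuation": True, "multi_language": True,
        "phrase_boost": True, "zero_setup": False, "offline": False,
    },
    "Whisper": {
        "real_time": False, "auto_punctuation": True, "multi_language": False,
        "phrase_boost": True, "zero_setup": False, "offline": True,
    },
}


def engines_supporting(*required: str) -> list[str]:
    """Return engines where every required feature is definitely supported."""
    return [
        name
        for name, feats in FEATURES.items()
        if all(feats[feature] is True for feature in required)
    ]


print(engines_supporting("multi_language"))
print(engines_supporting("auto_punctuation", "phrase_boost"))
print(engines_supporting("offline"))
```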

Typical Workflow

Open the popup with your global shortcut.
Optionally load a template.
Speak your prompt, idea, or answer.
Edit while speaking if needed.
Optionally run prompt enhancement if you use Copilot.
Copy the result into your AI tool, or submit it back to an MCP client.

Requirements

What Works Immediately

The app works out of the box with Web Speech.

no developer account required
no API key required
no paid plan required

That makes it the fastest way to start using the app.

Optional Upgrades

Capability	What you need	Why you would use it
Azure Speech	Azure Speech key and region, free tier available	Better speech recognition, punctuation, phrase boosting, multi-language scenarios
Whisper local	Download the whisper-server binary + a model in Settings (see Whisper Setup)	Local offline transcription, privacy, GPU acceleration
Prompt enhancement	GitHub Copilot CLI installed and logged in, plus a Copilot plan (including free tier). The Full installer bundles Node.js; Lite requires Node.js 20+ separately.	Turn rough dictation into cleaner prompts
MCP voice input	An MCP-capable client running on the same machine	Let AI tools request voice input through the popup

Without Azure, Whisper, Copilot, or MCP, the app is still useful as a standalone dictation and prompt-template tool.

Whisper Setup

Whisper runs entirely on your device using whisper.cpp. No data is sent to any cloud service. Setup differs by platform:

Windows

Open Settings → Speech → Whisper (Local).
Click Detect GPU to see your hardware and get a recommended binary variant (CPU, OpenBLAS, CUDA 11.8, or CUDA 12.4).
Select the variant and click Download whisper-server. The app downloads the pre-built binary from the whisper.cpp GitHub releases.
Download a model (e.g. "Base" for a good balance of speed and accuracy).
Select Whisper as your speech provider and start dictating.

macOS

Pre-built macOS binaries are not available in whisper.cpp releases. Install via Homebrew instead:

brew install whisper-cpp

This installs whisper-server with Metal acceleration on Apple Silicon. The app detects the Homebrew installation automatically.

Then open Settings, download a model, and select Whisper as your speech provider.

How It Works

The app manages a local whisper-server process that loads the model once and stays running during your dictation session. Audio is captured in the browser via an AudioWorklet, sent to the Rust backend, and forwarded as WAV via HTTP POST to the local server. This keeps the model in memory for fast inference (~50–150ms per decode cycle) without reloading on every request.
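The "forwarded as WAV" step can be illustrated with a short sketch that packs captured samples into a WAV container in memory. This is not the app's Rust code; it assumes 16 kHz mono 16-bit audio, which is the format whisper.cpp expects, and it leaves out the HTTP POST because the server's port and endpoint are not stated here.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono, the format whisper.cpp expects


def pcm_to_wav(samples: list[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Pack float samples in [-1.0, 1.0] into a 16-bit mono WAV container."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buffer.getvalue()


# 100 ms of a 440 Hz tone stands in for microphone audio.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 10)]
wav_bytes = pcm_to_wav(tone)
print(len(wav_bytes), "bytes ready to POST to the local server")
```

Because the server keeps the model loaded, each request only pays for encoding and inference, not for model startup.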

Downloads

Each release ships two installer variants per platform:

Variant	Includes Node.js	Copilot ready	Size
Full	✅	After installing the Copilot CLI	Larger (~30 MB extra)
Lite	❌	After installing Node.js 20+ and the Copilot CLI	Smaller

Both variants require the GitHub Copilot CLI to be installed and logged in for the prompt-enhancement feature. The only difference is whether Node.js is bundled.

Platform	Full	Lite
Windows	Full installer (.exe)	Lite installer (.exe)
macOS	Full DMG	Lite DMG

All releases are listed on the Releases page.

Uninstall Note

If you want a clean uninstall that also removes your local settings, history, templates, and other stored app data, open Settings and go to the General tab before uninstalling.

Use the app data removal action there. It permanently removes all app data and restarts the app, which is useful if you do not want local data to remain on the machine after uninstalling.

macOS Install Note

If macOS shows a message like "Developer Voice to Prompt is damaged and can't be opened", Gatekeeper is blocking the app because it is not yet code-signed and notarized by Apple.

Use this sequence:

Download the DMG.
Drag the app into Applications.
Try to open it once.
If macOS blocks it, run:

xattr -cr "/Applications/Developer Voice to Prompt.app"

If you have not copied the app to Applications yet and are trying to launch it directly from the mounted DMG, use the mounted path instead:

xattr -cr "/Volumes/Developer Voice to Prompt/Developer Voice to Prompt.app"

Then open the app again. This is a one-time step for that downloaded copy.

Keyboard Shortcuts

All shortcuts are customizable in Settings.

Action	Windows	macOS
Show or hide popup	`Ctrl+Alt+V`	`Cmd+Alt+V`
Start or stop voice	`Ctrl+Shift+M`	`Cmd+Shift+M`
Copy and close	`Ctrl+Enter`	`Cmd+Enter`
Switch speech provider	`Ctrl+Shift+P`	`Cmd+Shift+P`
Enhance prompt	`Ctrl+Shift+E`	`Cmd+Shift+E`
Dismiss	`Esc`	`Esc`

Press ? in the popup to see active shortcuts.

Developer Docs

If you want to build from source, understand the architecture, or work on the project itself, see DEVELOPMENT.md.

License

MIT

¹ Whisper is not true real-time transcription, so some latency can occur while speaking.
² CUDA on Windows (download the CUDA variant); Metal on macOS Apple Silicon (via Homebrew).
³ Web Speech can be local or cloud-backed depending on the platform WebView implementation.
