# whisper.php

A PHP binding for [whisper.cpp](https://github.com/ggerganov/whisper.cpp/), enabling high-performance automatic speech recognition and transcription.

## Requirements

- PHP 8.1+
- FFI extension

## Platform Support

Currently, whisper.php supports the following platforms:

- Linux (x86_64 and arm64)
- macOS (Apple Silicon and Intel)

Note: Windows support is currently in development. Contributions and help are welcome to expand platform compatibility!

## Features

Speech recognition can be complex, but it doesn't have to be. Whisper.php simplifies the process by providing:

- 🚀 High and low-level APIs
- 📁 Model auto-downloading
- 🎧 Support for various audio formats
- 📝 Multiple output format exports
- 🔊 Callback support for streaming and progress tracking

## Installation

Install the library using Composer:

```bash
composer require codewithkyrian/whisper.php
```

Whisper.php requires the FFI extension to be enabled. In your php.ini configuration file, uncomment or add:

```ini
extension = ffi
```
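
If you are unsure whether your runtime meets these requirements, a short script can check them before you load a model. The sketch below uses a hypothetical helper, `checkWhisperRequirements()`, that is not part of whisper.php. It relies only on core PHP (`version_compare()`, `extension_loaded()`, `ini_get()`) and on FFI's `ffi.enable` directive. That directive defaults to `preload`, which restricts FFI to the CLI and to preloaded scripts.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of whisper.php): lists reasons the current
 * runtime may not be able to use whisper.php. An empty array means all checks passed.
 */
function checkWhisperRequirements(): array
{
    $problems = [];

    // whisper.php requires PHP 8.1 or newer.
    if (version_compare(PHP_VERSION, '8.1.0', '<')) {
        $problems[] = 'PHP 8.1+ is required, found ' . PHP_VERSION;
    }

    // The FFI extension must be loaded...
    if (!extension_loaded('ffi')) {
        $problems[] = 'The FFI extension is not loaded (add "extension = ffi" to php.ini)';
        return $problems;
    }

    // ...and not switched off through the ffi.enable ini setting.
    $enable = strtolower((string) ini_get('ffi.enable'));
    if (in_array($enable, ['', '0', 'off', 'false', 'no'], true)) {
        $problems[] = 'FFI is loaded but disabled via ffi.enable';
    } elseif ($enable === 'preload' && PHP_SAPI !== 'cli') {
        // The default "preload" mode only allows FFI in the CLI and in preloaded scripts.
        $problems[] = 'ffi.enable=preload restricts FFI to the CLI and preloaded scripts';
    }

    return $problems;
}

foreach (checkWhisperRequirements() as $problem) {
    echo $problem, PHP_EOL;
}
```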

## Quick Start

### Low-Level API

The low-level API provides developers with granular control over the transcription process. It closely mimics the original C implementation, allowing for detailed configuration and manual segment processing:

```php
// Initialize context with a model
$contextParams = WhisperContextParameters::default();
$ctx = new WhisperContext("path/to/model.bin", $contextParams);

// Read the audio as 16 kHz float samples
$pcm = readAudio("path/to/audio.wav");

// Create state and set parameters
$state = $ctx->createState();
$fullParams = WhisperFullParams::default()
    ->withNThreads(4)
    // ... other with*() options
    ->withLanguage('en');

// Transcribe audio
$state->full($pcm, $fullParams);

// Process segments
$numSegments = $state->nSegments();
for ($i = 0; $i < $numSegments; $i++) {
    $segment = $state->getSegmentText($i);
    $startTimestamp = $state->getSegmentStartTime($i);
    $endTimestamp = $state->getSegmentEndTime($i);

    printf(
        "[%s - %s]: %s\n",
        toTimestamp($startTimestamp),
        toTimestamp($endTimestamp),
        $segment
    );
}
```
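
The loop above formats times with `toTimestamp()`. In whisper.cpp, segment start and end times (`whisper_full_get_segment_t0()` / `t1()`) are expressed in units of 10 ms. If `getSegmentStartTime()` and `getSegmentEndTime()` pass those values through unchanged (an assumption), a formatter like the one below produces the familiar `HH:MM:SS.mmm` form. `formatCentiseconds()` is a hypothetical name used only for illustration; in real code, use the library's `toTimestamp()`.

```php
<?php

declare(strict_types=1);

/**
 * Formats a whisper.cpp-style timestamp (assumed to be in 10 ms units) as HH:MM:SS.mmm.
 * Hypothetical stand-in for toTimestamp(), modelled on whisper.cpp's to_timestamp().
 */
function formatCentiseconds(int $t, string $msSeparator = '.'): string
{
    $ms = $t * 10;                       // 10 ms units -> milliseconds
    $hours = intdiv($ms, 3_600_000);
    $ms -= $hours * 3_600_000;
    $minutes = intdiv($ms, 60_000);
    $ms -= $minutes * 60_000;
    $seconds = intdiv($ms, 1_000);
    $ms -= $seconds * 1_000;

    return sprintf('%02d:%02d:%02d%s%03d', $hours, $minutes, $seconds, $msSeparator, $ms);
}

echo formatCentiseconds(12345), PHP_EOL;      // 00:02:03.450
echo formatCentiseconds(12345, ','), PHP_EOL; // 00:02:03,450 (SRT style)
```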

#### Model Loading

Downloading and managing whisper models can be a complex process. Whisper.php simplifies this with the `ModelLoader`, a convenient utility that streamlines model acquisition and management.

```php
// Get the model path, downloading the model first if it isn't already in the models directory
$modelPath = ModelLoader::loadModel('tiny.en', __DIR__.'/models');
```

The `ModelLoader::loadModel()` method accepts two key parameters:

1. **Model Name**: Specify the model variant you want to use:
   - Supported base models: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large, large.en
   - Note: Quantized models (q5, q8, etc.) are not supported by this utility
2. **Model Directory**: Specify the local directory where models should be stored and searched

In the example above, it looks for `ggml-tiny.en.bin` in the `__DIR__.'/models'` directory. If the model isn't found locally, it is automatically downloaded from the official `whisper.cpp` Hugging Face repository.
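
The lookup described above can be sketched as follows. This is a hypothetical helper, not `ModelLoader`'s actual code. It assumes the `ggml-<name>.bin` naming shown above and the URL layout used by whisper.cpp's own `download-ggml-model.sh` script (`https://huggingface.co/ggerganov/whisper.cpp/resolve/main/...`).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: resolves the local path and the assumed download URL for a model name,
 * following whisper.cpp's ggml-<name>.bin naming. Not whisper.php's implementation.
 */
function resolveModel(string $name, string $modelDir): array
{
    $file = "ggml-{$name}.bin";

    return [
        'path' => rtrim($modelDir, '/') . '/' . $file,
        // Assumed download location, mirroring whisper.cpp's download script.
        'url' => "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/{$file}",
    ];
}

$model = resolveModel('tiny.en', __DIR__ . '/models');

if (is_file($model['path'])) {
    echo "Using cached model at {$model['path']}\n";
} else {
    echo "Would download {$model['url']} to {$model['path']}\n";
}
```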

#### Libraries Loading

Whisper.php relies on platform-specific shared libraries, which are automatically downloaded the first time you initialize a model context. This may cause a slight delay on the first run, but the download happens only once (unless you update the library via Composer). Once the libraries are cached, subsequent runs start without this delay.

#### Audio Input

The Whisper model expects a float array of audio data sampled at 16 kHz. While tools like ffmpeg can generate this data, Whisper.php provides a built-in helper function to simplify the process for you.

```php
// Convenient audio reading function
$pcm = readAudio($audioPath);
```

The `readAudio()` helper function supports multiple audio formats (MP3, WAV, OGG, M4A) and automatically resamples to 16 kHz, using `libsndfile` and `libsamplerate` to do this efficiently.
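
To show what that float array holds, the sketch below converts raw signed 16-bit little-endian mono PCM into floats in the range [-1.0, 1.0), which is the sample format whisper.cpp consumes. `pcm16ToFloat()` is a hypothetical helper. It assumes the input is already mono at 16 kHz and performs no decoding or resampling; that is exactly the work `readAudio()` takes off your hands. Whether `readAudio()` returns a plain PHP array or another buffer type is not stated here.

```php
<?php

declare(strict_types=1);

/**
 * Converts raw signed 16-bit little-endian mono PCM bytes (even length) into floats in [-1.0, 1.0).
 * Illustrative only: it assumes the input is already mono at 16 kHz and does no resampling.
 */
function pcm16ToFloat(string $bytes): array
{
    if ($bytes === '') {
        return [];
    }

    $samples = [];
    // 'v*' unpacks unsigned 16-bit little-endian values.
    foreach (unpack('v*', $bytes) as $u) {
        $s = $u >= 0x8000 ? $u - 0x10000 : $u; // reinterpret as signed
        $samples[] = $s / 32768.0;             // scale to [-1.0, 1.0)
    }
    return $samples;
}

// One second of audio at 16 kHz is 16,000 samples (32,000 bytes of 16-bit PCM).
$silence = str_repeat("\x00\x00", 16000);
echo count(pcm16ToFloat($silence)), PHP_EOL; // 16000
```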

The low-level approach is ideal for developers who need:

- Exact control over transcription parameters
- Custom segment processing
- Integration with existing complex audio processing pipelines

### High-Level API

For those seeking a more straightforward experience, the high-level API offers a simpler, more abstracted workflow:

```php
// Simple transcription
$whisper = Whisper::fromPretrained('tiny.en', baseDir: __DIR__.'/models');
$audio = readAudio(__DIR__.'/sounds/sample.wav');
$segments = $whisper->transcribe($audio, 4);

// Accessing segment data
foreach ($segments as $segment) {
    echo toTimestamp($segment->startTimestamp) . ': ' . $segment->text . "\n";
}
```
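
If you only need the plain transcript rather than timed segments, the segment texts can be joined into one string. The sketch below uses a hypothetical `joinSegments()` helper that works with any objects exposing a public `text` property, as the loop above uses. It is exercised with stand-in objects so that it runs without a model.

```php
<?php

declare(strict_types=1);

/**
 * Joins segment texts into a single transcript string, skipping empty segments.
 * Hypothetical helper: accepts any objects with a public $text property.
 */
function joinSegments(iterable $segments): string
{
    $parts = [];
    foreach ($segments as $segment) {
        $text = trim($segment->text);
        if ($text !== '') {
            $parts[] = $text;
        }
    }
    return implode(' ', $parts);
}

// Stand-in objects for illustration; real code would pass $segments from transcribe().
$demo = [(object) ['text' => ' Hello there.'], (object) ['text' => ' How are you?']];
echo joinSegments($demo), PHP_EOL; // Hello there. How are you?
```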

The `Whisper::fromPretrained()` method simplifies the entire setup process with three key parameters:

1. **Model Name**: Specify the whisper model variant (e.g., 'tiny.en', 'base', 'small.en')
2. **Base Directory**: Specify where models should be stored and searched
3. **Transcription Parameters**: Optionally customize transcription behavior

```php
// Advanced usage with custom parameters
$params = WhisperFullParams::default()
    ->withNThreads(4)
    ->withLanguage('en');

$whisper = Whisper::fromPretrained(
    'tiny.en',                   // Model name
    baseDir: __DIR__.'/models',  // Model storage directory
    params: $params              // Custom transcription parameters
);
```

The high-level API is perfect for quick prototyping, simple projects, or when you want to minimize boilerplate code while maintaining the power of the underlying whisper.cpp technology.

## Whisper Full Parameters

`WhisperFullParams` offers a comprehensive and flexible configuration mechanism for fine-tuning the transcription process. Its fluent interface supports method chaining, giving you a clean, readable way to configure transcription parameters.
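
For readers unfamiliar with the `withX()` style, the sketch below shows one common way to build such a fluent interface in PHP. `DemoParams` is a hypothetical class, not `WhisperFullParams`. In this version, each wither returns a modified clone. Whether `WhisperFullParams` clones or mutates itself is not stated here, but reassigning the result, as in `$fullParams = $fullParams->with...()`, works either way.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical example of the fluent "withX()" style, not WhisperFullParams itself.
 * Each wither clones the object, so the original configuration is left untouched.
 */
final class DemoParams
{
    private int $nThreads = 4;
    private ?string $language = null;

    public static function default(): self
    {
        return new self();
    }

    public function withNThreads(int $n): self
    {
        $clone = clone $this;
        $clone->nThreads = $n;
        return $clone;
    }

    public function withLanguage(string $code): self
    {
        $clone = clone $this;
        $clone->language = $code;
        return $clone;
    }

    public function nThreads(): int
    {
        return $this->nThreads;
    }

    public function language(): ?string
    {
        return $this->language;
    }
}

$base = DemoParams::default();
$custom = $base->withNThreads(8)->withLanguage('de'); // $base still has 4 threads, no language
```

Cloning keeps a shared default configuration safe to reuse. The trade-off is that you must always use the returned object.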

### Language Detection

While the Whisper model is remarkably good at automatic language detection, there are scenarios where manually specifying the language can improve accuracy:

```php
$fullParams = WhisperFullParams::default()
    ->withLanguage('en'); // Two-letter language code, e.g. 'en' (English), 'de' (German), 'es' (Spanish)
```

### Threading

You can tune computational performance by adjusting the number of threads used during transcription:

```php
$fullParams = WhisperFullParams::default()
    ->withNThreads(8); // Default is 4
```

More threads can speed up transcription on multi-core systems. For very short audio files, however, extra threads may add more overhead than they save. Experiment with thread counts to find the sweet spot for your specific use case and hardware configuration.
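
One way to run that experiment is to time the same job at several thread counts. The sketch below uses a hypothetical `benchmarkThreads()` helper built only on core PHP (`hrtime()`, `asort()`). The commented-out call shows how it could wrap the low-level API from the Quick Start, assuming a fresh state per run is acceptable. In practice, consider a warm-up run first and use a representative audio file.

```php
<?php

declare(strict_types=1);

/**
 * Runs $run once per thread count and returns [threadCount => seconds], fastest first.
 * Hypothetical helper: $run receives the thread count and should perform one transcription.
 */
function benchmarkThreads(callable $run, array $threadCounts): array
{
    $timings = [];
    foreach ($threadCounts as $n) {
        $start = hrtime(true);                        // monotonic clock, in nanoseconds
        $run($n);
        $timings[$n] = (hrtime(true) - $start) / 1e9; // elapsed seconds
    }
    asort($timings); // sort by elapsed time, keeping the thread counts as keys
    return $timings;
}

// Assumed real usage, with $ctx and $pcm from the low-level example in scope:
// $results = benchmarkThreads(function (int $n) use ($ctx, $pcm): void {
//     $ctx->createState()->full($pcm, WhisperFullParams::default()->withNThreads($n));
// }, [1, 2, 4, 8]);

// Self-contained demo with a dummy workload instead of a real transcription.
$demo = benchmarkThreads(fn (int $n) => usleep(2000 * $n), [1, 2, 4]);
foreach ($demo as $threads => $seconds) {
    printf("%d thread(s): %.4f s\n", $threads, $seconds);
}
```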

### Segment Callback

In many real-world applications, you'll want to process transcription segments as they're generated rather than waiting for the entire transcription to complete. You can do this by providing the full params object with a callback that accepts a `SegmentData` object:

```php
$fullParams = WhisperFullParams::default()
    ->withSegmentCallback(function (SegmentData $data) {
        printf(
            "[%s - %s]: %s\n",
            toTimestamp($data->startTimestamp),
            toTimestamp($data->endTimestamp),
            $data->text
        );
    });
```
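
If you also want to keep the segments as they arrive, for example to export them afterwards, the callback can append them to an array captured by reference. The sketch uses a hypothetical `makeSegmentCollector()` helper. It declares the parameter as `object` so that it runs without whisper.php installed; in real code, you would type-hint `SegmentData` as shown above.

```php
<?php

declare(strict_types=1);

/**
 * Returns a callback that appends every segment it receives to $sink.
 * The `use (&$sink)` capture is what lets the closure write to the caller's array.
 */
function makeSegmentCollector(array &$sink): Closure
{
    return function (object $segment) use (&$sink): void {
        $sink[] = $segment;
    };
}

$collected = [];
$collector = makeSegmentCollector($collected);

// In real code (assumed): WhisperFullParams::default()->withSegmentCallback($collector)
// Here two callback invocations are simulated with stand-in objects.
$collector((object) ['text' => 'first']);
$collector((object) ['text' => 'second']);

echo count($collected), PHP_EOL; // 2
```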

### Progress Callback

To track transcription progress, provide a progress callback to the full params. The callback receives the progress as an integer percentage:

```php
$fullParams = $fullParams
    ->withProgressCallback(function (int $progress) {
        printf("Transcribing: %d%%\n", $progress);
    });
```
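
Progress may be reported far more often than you want to print it. The sketch below wraps any handler so that it fires only when progress has advanced by at least `$step` points, and it always reports 100%. `throttleProgress()` is a hypothetical helper and is not part of whisper.php. The demo feeds it simulated values rather than real progress events.

```php
<?php

declare(strict_types=1);

/**
 * Wraps a progress handler so it only fires on forward jumps of at least $step points
 * (100% is always reported once). Hypothetical helper, not part of whisper.php.
 */
function throttleProgress(callable $handler, int $step = 10): Closure
{
    $last = -$step; // lets the very first report (e.g. 0%) through
    return function (int $progress) use ($handler, $step, &$last): void {
        if ($progress > $last && ($progress - $last >= $step || $progress >= 100)) {
            $last = $progress;
            $handler($progress);
        }
    };
}

// Assumed real usage:
// $fullParams = $fullParams->withProgressCallback(
//     throttleProgress(fn (int $p) => printf("Transcribing: %d%%\n", $p), 10)
// );

$reported = [];
$onProgress = throttleProgress(function (int $p) use (&$reported): void {
    $reported[] = $p;
}, 25);

foreach ([0, 5, 20, 26, 40, 51, 99, 100] as $p) {
    $onProgress($p);
}

echo implode(', ', $reported), PHP_EOL; // 0, 26, 51, 99, 100
```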

`WhisperFullParams` exposes many more configuration options. An IDE with good PHP IntelliSense will list the available configuration methods as you type, with suggestions and documentation for each parameter. Simply start typing `withXXX()` after `WhisperFullParams::default()` and your IDE will guide you through the available configuration options.

## Exporting Outputs

Once you've generated your transcription segments, you'll often need to export them in various formats for different use cases. Whisper.php provides convenient helper functions to export transcription segments to the most popular and widely used formats. Each exporter takes an array of `SegmentData` objects, each containing precise timestamp and text information.

```php
outputTxt($segments, 'transcription.txt');  // Ideal for quick reading, documentation, or further text processing
outputVtt($segments, 'subtitles.vtt');      // Primarily used for web-based video subtitles, compatible with HTML5 video players
outputSrt($segments, 'subtitles.srt');      // Widely supported by media players, video editing software, and streaming platforms
outputCsv($segments, 'transcription.csv');  // Perfect for data analysis and spreadsheet applications
```
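
`outputSrt()` writes SRT files for you. To show what the format contains, the sketch below renders segments to an SRT string. It is a hypothetical illustration, not whisper.php's implementation. It assumes timestamps in 10 ms units (whisper.cpp's convention) and segments with `startTimestamp`, `endTimestamp` and `text` properties, as used in the segment callback example.

```php
<?php

declare(strict_types=1);

/** Formats a timestamp in 10 ms units (assumed whisper.cpp convention) as SRT's HH:MM:SS,mmm. */
function srtTime(int $cs): string
{
    $ms = $cs * 10;
    return sprintf(
        '%02d:%02d:%02d,%03d',
        intdiv($ms, 3_600_000),
        intdiv($ms, 60_000) % 60,
        intdiv($ms, 1_000) % 60,
        $ms % 1_000
    );
}

/**
 * Renders segments as SubRip (SRT) text: a 1-based cue number, a time range,
 * the text, then a blank line. Illustrative only, not whisper.php's outputSrt().
 */
function renderSrt(iterable $segments): string
{
    $srt = '';
    $index = 1;
    foreach ($segments as $segment) {
        $srt .= sprintf(
            "%d\n%s --> %s\n%s\n\n",
            $index++,
            srtTime($segment->startTimestamp),
            srtTime($segment->endTimestamp),
            trim($segment->text)
        );
    }
    return $srt;
}

// Stand-in segments; real code would pass the SegmentData array from a transcription.
$demo = [
    (object) ['startTimestamp' => 0, 'endTimestamp' => 250, 'text' => ' Hello there.'],
    (object) ['startTimestamp' => 250, 'endTimestamp' => 612, 'text' => ' How are you?'],
];
echo renderSrt($demo);
```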

## Logging

Whisper.php provides flexible logging capabilities and is fully compatible with the PSR-3 standard, so it integrates seamlessly with popular logging libraries like Monolog and Laravel's logging system.

Logging is disabled by default, but the library includes a built-in `WhisperLogger` for quick and easy logging:

```php
// Log to a file
Whisper::setLogger(new WhisperLogger('whisper.log'));

// Log to standard output
Whisper::setLogger(new WhisperLogger(STDOUT));
```

Make sure to call `setLogger()` before initializing your `WhisperContext`.

For more advanced logging needs, whisper.php integrates with Monolog, the most popular PHP logging library:

```php
use Monolog\Handler\FirePHPHandler;
use Monolog\Handler\StreamHandler;
use Monolog\Logger;

$monologLogger = new Logger('whisper');
$monologLogger->pushHandler(new StreamHandler('whisper.log', Logger::DEBUG));
$monologLogger->pushHandler(new FirePHPHandler());

// Set the Monolog logger
Whisper::setLogger($monologLogger);
```

Or with Laravel's application logger:

```php
use Illuminate\Support\Facades\Log;

// Using Laravel's Log facade
Whisper::setLogger(Log::getLogger());

// Or directly with Laravel's logger
Whisper::setLogger(app('log'));
```

## Contributing

Contributions are welcome, especially for:

- Windows platform support
- Additional features
- Bug fixes

## License

This project is licensed under the MIT License. See the [LICENSE](https://github.com/codewithkyrian/whisper.php/blob/main/LICENSE) file for more information.

## Acknowledgements

- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) - The underlying speech recognition technology