Commit f1a240e

feat: Add README.md and LICENSE
1 parent ccf9955 commit f1a240e

8 files changed

Lines changed: 367 additions & 67 deletions

.github/workflows/tests.yml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        os: [macos-latest, macos-13, ubuntu-latest, windows-latest]
+        os: [macos-latest, macos-13, ubuntu-latest]
         php: [8.1, 8.2, 8.3, 8.4]

     name: Tests PHP${{ matrix.php }} - ${{ matrix.os }}

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Obikwelu Kyrian Sochima

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 299 additions & 0 deletions
@@ -0,0 +1,299 @@
# whisper.php

A PHP binding for [whisper.cpp](https://github.com/ggerganov/whisper.cpp/), enabling high-performance automatic speech recognition and transcription.

## Requirements

- PHP 8.1+
- FFI extension

## Platform Support

Currently, whisper.php supports the following platforms:

- Linux (x86_64 and arm64)
- macOS (Apple Silicon and Intel)

Note: Windows support is currently in development. Contributions and help are welcome to expand platform compatibility!

## Features

Speech recognition can be complex, but it doesn't have to be. Whisper.php simplifies the process by providing:

- 🚀 High-level and low-level APIs
- 📁 Model auto-downloading
- 🎧 Support for various audio formats
- 📝 Export to multiple output formats
- 🔊 Callback support for streaming and progress tracking

## Installation

Install the library using Composer:

```bash
composer require codewithkyrian/whisper.php
```

Whisper.php requires the FFI extension to be enabled. In your php.ini configuration file, uncomment or add:

```ini
extension = ffi
```
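
Loading the extension is not always enough on its own. PHP's `ffi.enable` setting defaults to `preload`, which permits the FFI API only in the CLI and in preloaded scripts, so under PHP-FPM or another web SAPI you may also need `ffi.enable = true`. The following is a minimal preflight sketch; `checkFfiAvailability()` is a hypothetical helper, not part of whisper.php.

```php
<?php

/**
 * Hypothetical preflight check (not part of whisper.php): returns a list of
 * problems that would stop FFI-based libraries from loading. Empty = OK.
 */
function checkFfiAvailability(): array
{
    $problems = [];

    if (!extension_loaded('ffi')) {
        $problems[] = 'The FFI extension is not loaded; add "extension = ffi" to php.ini.';
        return $problems;
    }

    // ffi.enable accepts "true", "false" or "preload" (the default).
    $mode = strtolower((string) ini_get('ffi.enable'));

    if (in_array($mode, ['0', 'false', 'off', ''], true)) {
        $problems[] = 'FFI is disabled (ffi.enable is off).';
    } elseif ($mode === 'preload' && PHP_SAPI !== 'cli') {
        $problems[] = 'ffi.enable=preload only allows FFI in the CLI and preloaded scripts.';
    }

    return $problems;
}

foreach (checkFfiAvailability() as $problem) {
    echo $problem, PHP_EOL;
}
```

Running this once from the same SAPI that will host your application (CLI or web) tells you whether the php.ini change above is sufficient.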

## Quick Start

### Low-Level API

The low-level API gives you granular control over the transcription process. It closely mirrors whisper.cpp's original C API, allowing detailed configuration and manual segment processing:

```php
use Codewithkyrian\Whisper\WhisperContext;
use Codewithkyrian\Whisper\WhisperContextParameters;
use Codewithkyrian\Whisper\WhisperFullParams;

use function Codewithkyrian\Whisper\readAudio;
use function Codewithkyrian\Whisper\toTimestamp;

// Load the audio as 16 kHz float samples (see "Audio Input" below)
$pcm = readAudio('path/to/audio.wav');

// Initialize a context with a model
$contextParams = WhisperContextParameters::default();
$ctx = new WhisperContext("path/to/model.bin", $contextParams);

// Create a state and set the transcription parameters
$state = $ctx->createState();
$fullParams = WhisperFullParams::default()
    ->withNThreads(4)
    // ... further with*() options
    ->withLanguage('en');

// Transcribe the audio
$state->full($pcm, $fullParams);

// Process the segments
$numSegments = $state->nSegments();
for ($i = 0; $i < $numSegments; $i++) {
    $segment = $state->getSegmentText($i);
    $startTimestamp = $state->getSegmentStartTime($i);
    $endTimestamp = $state->getSegmentEndTime($i);

    printf(
        "[%s - %s]: %s\n",
        toTimestamp($startTimestamp),
        toTimestamp($endTimestamp),
        $segment
    );
}
```
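
The integer times passed to `toTimestamp()` deserve a note. whisper.cpp reports segment start and end times in centiseconds (units of 10 ms); assuming whisper.php passes these values through unchanged, a formatter modeled on whisper.cpp's own `to_timestamp()` helper could look like the sketch below. `formatCentiseconds()` is a hypothetical name, not the library's `toTimestamp()` implementation.

```php
<?php

/**
 * Hypothetical formatter: converts a time value assumed to be in
 * centiseconds (10 ms units, as in whisper.cpp) to "HH:MM:SS.mmm".
 * Pass ',' as the separator for SRT-style timestamps.
 */
function formatCentiseconds(int $t, string $separator = '.'): string
{
    $msec = $t * 10;

    $hours = intdiv($msec, 3_600_000);
    $msec -= $hours * 3_600_000;

    $minutes = intdiv($msec, 60_000);
    $msec -= $minutes * 60_000;

    $seconds = intdiv($msec, 1000);
    $msec -= $seconds * 1000;

    return sprintf('%02d:%02d:%02d%s%03d', $hours, $minutes, $seconds, $separator, $msec);
}

echo formatCentiseconds(1234), PHP_EOL;         // 00:00:12.340
echo formatCentiseconds(372_150, ','), PHP_EOL; // 01:02:01,500
```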

#### Model Loading

Downloading and managing Whisper models can be tedious. Whisper.php simplifies this with `ModelLoader`, a utility that handles model acquisition and management for you.

```php
use Codewithkyrian\Whisper\ModelLoader;

// Use the model from the given directory, downloading it first if it isn't there yet
$modelPath = ModelLoader::loadModel('tiny.en', __DIR__.'/models');
```

The `ModelLoader::loadModel()` method accepts two key parameters:

1. **Model Name**: Specify the model variant you want to use:
    - Supported base models: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large, large.en
    - Note: Quantized models (q5, q8, etc.) are not supported by this utility
2. **Model Directory**: Specify the local directory where models should be stored and searched

In the example above, it looks for `ggml-tiny.en.bin` in the `__DIR__.'/models'` directory and, if the model isn't found locally, automatically downloads it from the official `whisper.cpp` Hugging Face repository.
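
The lookup described above can be sketched as follows. This is not `ModelLoader`'s actual code: the `ggml-<name>.bin` file name follows the example above, and the download URL assumes the `ggerganov/whisper.cpp` model repository on Hugging Face and its `resolve/main/` download path. For simplicity, the hypothetical `resolveModel()` only accepts the tiny, base, small and medium families and stops short of downloading anything.

```php
<?php

/**
 * Hypothetical sketch of a model lookup: returns the expected local path
 * and, if the file is missing, the URL it would be downloaded from.
 */
function resolveModel(string $modelName, string $modelDir): array
{
    // Only plain (non-quantized) tiny/base/small/medium models, optionally ".en"
    if (!preg_match('/^(tiny|base|small|medium)(\.en)?$/', $modelName)) {
        throw new InvalidArgumentException("Unsupported model name: {$modelName}");
    }

    $fileName = "ggml-{$modelName}.bin";
    $localPath = rtrim($modelDir, '/') . '/' . $fileName;

    return [
        'path' => $localPath,
        // null when the model is already present locally (assumed URL layout)
        'downloadUrl' => is_file($localPath)
            ? null
            : "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/{$fileName}",
    ];
}

$model = resolveModel('tiny.en', __DIR__ . '/models');
echo $model['path'], PHP_EOL;
echo $model['downloadUrl'] ?? '(already downloaded)', PHP_EOL;
```

Separating "where should the file be" from "fetch it" keeps the lookup easy to test without network access.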

#### Library Loading

Whisper.php relies on platform-specific shared libraries, which are downloaded automatically the first time you initialize a model context. This may cause a slight delay on the initial run, but it happens only once (unless you update the library via Composer). Once the libraries are cached, subsequent runs start without that delay.
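
Picking the right prebuilt library comes down to detecting the operating system and CPU architecture at runtime. The sketch below does this with PHP's `PHP_OS_FAMILY` constant and `php_uname('m')`; the `libwhisper.so`/`libwhisper.dylib` file names and the `<os>-<arch>/` directory layout are illustrative assumptions, not whisper.php's actual cache layout.

```php
<?php

/**
 * Hypothetical helper: maps a platform to a shared-library path such as
 * "linux-x86_64/libwhisper.so". Names and layout are illustrative only.
 */
function platformLibraryPath(string $osFamily = PHP_OS_FAMILY, ?string $machine = null): string
{
    $machine = strtolower($machine ?? php_uname('m'));

    // Linux reports "aarch64" where macOS reports "arm64"; normalize both.
    $arch = match ($machine) {
        'x86_64', 'amd64' => 'x86_64',
        'arm64', 'aarch64' => 'arm64',
        default => throw new RuntimeException("Unsupported architecture: {$machine}"),
    };

    return match ($osFamily) {
        'Linux' => "linux-{$arch}/libwhisper.so",
        'Darwin' => "darwin-{$arch}/libwhisper.dylib",
        default => throw new RuntimeException("Unsupported OS: {$osFamily}"),
    };
}

try {
    echo platformLibraryPath(), PHP_EOL;
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL; // e.g. on Windows, which is not yet supported
}
```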

#### Audio Input

The Whisper model expects mono audio as an array of floating-point samples at 16 kHz. While tools like ffmpeg can produce this data, Whisper.php provides a built-in helper function that does it for you.

```php
use function Codewithkyrian\Whisper\readAudio;

// Read an audio file into 16 kHz float samples
$pcm = readAudio($audioPath);
```

The `readAudio()` helper function supports multiple audio formats (MP3, WAV, OGG, M4A) and automatically resamples to 16 kHz, using `libsndfile` and `libsamplerate` under the hood for efficiency.
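
To make the 16 kHz requirement concrete, here is a deliberately naive sketch of two steps that follow decoding: scaling 16-bit integer samples to floats in [-1, 1] and resampling by linear interpolation. It is only an illustration with hypothetical function names; `libsamplerate` offers band-limited (sinc-based) converters that avoid the aliasing this naive approach can introduce when downsampling.

```php
<?php

/** Scales signed 16-bit PCM samples to floats in the range [-1.0, 1.0). */
function int16ToFloat(array $samples): array
{
    return array_map(fn (int $s): float => $s / 32768.0, $samples);
}

/**
 * Naive linear-interpolation resampler (illustration only: no low-pass
 * filtering, so downsampling can alias).
 */
function resampleLinear(array $samples, int $fromRate, int $toRate = 16000): array
{
    $inCount = count($samples);
    if ($inCount === 0 || $fromRate === $toRate) {
        return $samples;
    }

    $outCount = (int) floor($inCount * $toRate / $fromRate);
    $ratio = $fromRate / $toRate;
    $out = [];

    for ($i = 0; $i < $outCount; $i++) {
        $pos = $i * $ratio;              // fractional position in the input
        $idx = (int) floor($pos);
        $frac = $pos - $idx;
        $next = $samples[min($idx + 1, $inCount - 1)];
        $out[] = $samples[$idx] + ($next - $samples[$idx]) * $frac;
    }

    return $out;
}

// One second of a 440 Hz tone at 44.1 kHz, resampled to 16 kHz
$tone = [];
for ($n = 0; $n < 44100; $n++) {
    $tone[] = (int) round(0.5 * 32767 * sin(2 * M_PI * 440 * $n / 44100));
}
$pcm = resampleLinear(int16ToFloat($tone), 44100);
echo count($pcm), PHP_EOL; // 16000
```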

The low-level approach is ideal for developers who need:

- Exact control over transcription parameters
- Custom segment processing
- Integration with existing complex audio processing pipelines

### High-Level API

For a more straightforward experience, the high-level API offers a simpler, more abstracted workflow:

```php
use Codewithkyrian\Whisper\Whisper;

use function Codewithkyrian\Whisper\readAudio;
use function Codewithkyrian\Whisper\toTimestamp;

// Simple transcription
$whisper = Whisper::fromPretrained('tiny.en', baseDir: __DIR__.'/models');
$audio = readAudio(__DIR__.'/sounds/sample.wav');
$segments = $whisper->transcribe($audio, 4);

// Access the segment data
foreach ($segments as $segment) {
    echo toTimestamp($segment->startTimestamp) . ': ' . $segment->text . "\n";
}
```

The `Whisper::fromPretrained()` method handles the entire setup with three key parameters:

1. **Model Name**: Specify the Whisper model variant (e.g., 'tiny.en', 'base', 'small.en')
2. **Base Directory**: Specify where models should be stored and searched
3. **Transcription Parameters**: Optionally customize transcription behavior

```php
use Codewithkyrian\Whisper\Whisper;
use Codewithkyrian\Whisper\WhisperFullParams;

// Advanced usage with custom parameters
$params = WhisperFullParams::default()
    ->withNThreads(4)
    ->withLanguage('en');

$whisper = Whisper::fromPretrained(
    'tiny.en',                   // Model name
    baseDir: __DIR__.'/models',  // Model storage directory
    params: $params              // Custom transcription parameters
);
```

The high-level API is well suited to quick prototyping, simple projects, or any case where you want to minimize boilerplate while keeping the power of the underlying whisper.cpp engine.

## Whisper Full Parameters

The `WhisperFullParams` class provides a comprehensive, flexible way to fine-tune the transcription process. It has a fluent interface, so configuration methods can be chained into a clean, readable setup.

### Language Detection

While the Whisper model is remarkably good at automatic language detection, there are scenarios where manually specifying the language can improve accuracy:

```php
$fullParams = WhisperFullParams::default()
    ->withLanguage('en'); // Two-letter language code, e.g. 'en' (English), 'de' (German), 'es' (Spanish)
```

### Threading

You can tune computational performance by adjusting the number of threads used during transcription:

```php
$fullParams = WhisperFullParams::default()
    ->withNThreads(8); // Default is 4
```

More threads can speed up transcription on multi-core systems. For very short audio files, however, extra threads may add more overhead than they save. Experiment with thread counts to find the sweet spot for your use case and hardware.

### Segment Callback

In many real-world applications, you'll want to process transcription segments as they're generated rather than waiting for the entire transcription to complete. You can do that by passing the full params object a callback that accepts a `SegmentData` object:

```php
$fullParams = WhisperFullParams::default()
    ->withSegmentCallback(function (SegmentData $data) {
        printf("[%s - %s]: %s\n",
            toTimestamp($data->startTimestamp),
            toTimestamp($data->endTimestamp),
            $data->text
        );
    });
```

### Progress Callback

To track transcription progress, pass the full params a callback that receives the current progress as an integer percentage:

```php
$fullParams = $fullParams
    ->withProgressCallback(function (int $progress) {
        printf("Transcribing: %d%%\n", $progress);
    });
```
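
The progress callback can be invoked repeatedly during a long transcription, so a common refinement is to report only when progress has advanced by a meaningful step. The sketch below is a hypothetical helper, `makeThrottledReporter()`, that wraps any printer and returns a closure with the same `function (int $progress)` shape used with `withProgressCallback()` above; it assumes progress values are integer percentages that never decrease.

```php
<?php

/**
 * Hypothetical helper: returns a progress callback that forwards progress
 * only when it has advanced by at least $step points (and always forwards
 * the first value of 100 it sees).
 */
function makeThrottledReporter(callable $report, int $step = 10): Closure
{
    $lastReported = -$step;

    return function (int $progress) use (&$lastReported, $report, $step): void {
        if ($progress - $lastReported >= $step || ($progress >= 100 && $lastReported < 100)) {
            $lastReported = $progress;
            $report($progress);
        }
    };
}

$reported = [];
$callback = makeThrottledReporter(function (int $p) use (&$reported): void {
    $reported[] = $p;
    printf("Transcribing: %d%%\n", $p);
}, 25);

// Simulated progress values; in real use, pass $callback to withProgressCallback()
foreach ([0, 5, 12, 26, 30, 51, 77, 99, 100] as $progress) {
    $callback($progress);
}
```

Keeping the state in the closure (captured by reference) means the throttling needs no global variables or extra class.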

`WhisperFullParams` exposes many more options than those shown here. An IDE with good PHP code completion will list the available configuration methods, along with their documentation, as you type: start typing `with` after `WhisperFullParams::default()->` and let the IDE guide you through the options.

## Exporting Outputs

Once you've generated your transcription segments, you'll often need to export them in different formats for different use cases. Whisper.php provides helper functions that export an array of `SegmentData` objects, each carrying timestamp and text information, to the most widely used formats:

```php
outputTxt($segments, 'transcription.txt');  // Plain text for quick reading, documentation, or further text processing
outputVtt($segments, 'subtitles.vtt');      // WebVTT subtitles for web video, compatible with HTML5 video players
outputSrt($segments, 'subtitles.srt');      // SubRip subtitles, widely supported by media players, editors, and streaming platforms
outputCsv($segments, 'transcription.csv');  // CSV for data analysis and spreadsheet applications
```
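
For reference, SRT and WebVTT differ mainly in small details: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header line and uses a period. The sketch below renders both from plain arrays. The `start`/`end`/`text` array shape, with times in centiseconds, is an assumption standing in for `SegmentData`, and `renderSubtitles()` is a hypothetical helper, not the library's `outputSrt()` or `outputVtt()`.

```php
<?php

/** Formats a centisecond value as HH:MM:SS<sep>mmm. */
function cueTime(int $t, string $sep): string
{
    $ms = $t * 10;
    return sprintf(
        '%02d:%02d:%02d%s%03d',
        intdiv($ms, 3_600_000),
        intdiv($ms, 60_000) % 60,
        intdiv($ms, 1000) % 60,
        $sep,
        $ms % 1000
    );
}

/**
 * Renders segments (hypothetical shape: start/end in centiseconds, text)
 * as SubRip ("srt") or WebVTT ("vtt") subtitles.
 */
function renderSubtitles(array $segments, string $format): string
{
    $isVtt = $format === 'vtt';
    $sep = $isVtt ? '.' : ',';
    $out = $isVtt ? "WEBVTT\n\n" : '';

    foreach (array_values($segments) as $i => $seg) {
        if (!$isVtt) {
            $out .= ($i + 1) . "\n"; // SRT cues are numbered from 1
        }
        $out .= cueTime($seg['start'], $sep) . ' --> ' . cueTime($seg['end'], $sep) . "\n";
        $out .= trim($seg['text']) . "\n\n";
    }

    return $out;
}

$segments = [
    ['start' => 0, 'end' => 250, 'text' => ' Hello there.'],
    ['start' => 250, 'end' => 612, 'text' => ' This is a test.'],
];

echo renderSubtitles($segments, 'vtt');
```

Writing either result to disk is then a single `file_put_contents()` call.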

## Logging

Whisper.php provides flexible, PSR-3 compatible logging, so it integrates seamlessly with popular logging libraries such as Monolog and Laravel's logging system.

Logging is disabled by default, but the library includes a built-in `WhisperLogger` for quick and easy logging:

```php
// Log to a file
Whisper::setLogger(new WhisperLogger('whisper.log'));

// Log to standard output
Whisper::setLogger(new WhisperLogger(STDOUT));
```

Make sure to call `setLogger()` before initializing your `WhisperContext`.

For more advanced logging needs, whisper.php works with Monolog, the most widely used PHP logging library:

```php
use Monolog\Handler\FirePHPHandler;
use Monolog\Handler\StreamHandler;
use Monolog\Logger;

$monologLogger = new Logger('whisper');
$monologLogger->pushHandler(new StreamHandler('whisper.log', Logger::DEBUG));
$monologLogger->pushHandler(new FirePHPHandler());

// Set the Monolog logger
Whisper::setLogger($monologLogger);
```

Or with Laravel's application logger:

```php
use Illuminate\Support\Facades\Log;

// Using Laravel's Log facade
Whisper::setLogger(Log::getLogger());

// Or directly with Laravel's logger
Whisper::setLogger(app('log'));
```

## Contributing

Contributions are welcome, especially for:

- Windows platform support
- Additional features
- Bug fixes

## License

This project is licensed under the MIT License. See the [LICENSE](https://github.com/codewithkyrian/whisper.php/blob/main/LICENSE) file for more information.

## Acknowledgements

- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) - The underlying speech recognition technology

composer.json

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
     "name": "codewithkyrian/whisper.php",
     "description": "PHP bindings for OpenAI Whisper made possible by whisper.cpp",
     "type": "library",
-    "version": "1.0.0",
+    "version": "1.7.2",
     "require": {
         "php": "^8.1",
         "ext-ffi": "*",

examples/high-level.php

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
<?php

declare(strict_types=1);

use Codewithkyrian\Whisper\ModelLoader;
use Codewithkyrian\Whisper\SegmentData;
use Codewithkyrian\Whisper\Whisper;
use Codewithkyrian\Whisper\WhisperContext;
use Codewithkyrian\Whisper\WhisperContextParameters;
use Codewithkyrian\Whisper\WhisperException;
use Codewithkyrian\Whisper\WhisperFullParams;

use function Codewithkyrian\Whisper\outputSrt;
use function Codewithkyrian\Whisper\readAudio;
use function Codewithkyrian\Whisper\toTimestamp;

require_once __DIR__.'/../vendor/autoload.php';

try {
    $fullParams = WhisperFullParams::default()
        ->withNThreads(4);

    $whisper = Whisper::fromPretrained('tiny.en', baseDir: __DIR__.'/models', params: $fullParams);

    $audio = readAudio(__DIR__.'/sounds/jfk.wav');

    $segments = $whisper->transcribe($audio, 4);

    printf("Generated Segments: %d\n", count($segments));

    // Create output files
    $transcriptionPath = __DIR__.'/outputs/transcription.srt';
    outputSrt($segments, $transcriptionPath);
} catch (WhisperException $e) {
    fprintf(STDERR, "Whisper error: %s\n", $e->getMessage());
    exit(1);
} catch (Exception $e) {
    fprintf(STDERR, "Error: %s\n", $e->getMessage());
    exit(1);
}
