Whispr: Push-to-talk application for macOS in Rust
Published at
Dec 23, 2024
Whispr
Your voice, your keyboard, no cloud required 🎙️
Whispr is a macOS menubar application written in Rust for local voice-to-text transcription using Whisper.cpp.
Note: Apple Silicon is required to run Whispr.
Features
- Push-to-talk (right ⌘ Command key by default)
- Local processing
- Real-time transcription
- Menubar integration
- Configurable input and models
- Silence removal to prevent hallucinations
- Custom vocabulary/dictionary based on config (to improve transcription quality with 'uncommon' words)
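One common way to bias Whisper toward uncommon words is to pass them as an initial prompt; whisper.cpp exposes an `initial_prompt` field in its full-parameters struct for this. Whether Whispr applies its dictionary this way is not stated here, so the sketch below is only an assumption. The function name `build_initial_prompt` and the "Glossary:" wording are hypothetical.

```rust
/// Builds a prompt string from custom vocabulary words.
/// Assumption: the words are handed to Whisper as an initial prompt; the
/// README does not say how Whispr actually applies its dictionary.
fn build_initial_prompt(dictionary: &[&str]) -> Option<String> {
    // Trim whitespace and drop empty entries.
    let words: Vec<&str> = dictionary
        .iter()
        .map(|w| w.trim())
        .filter(|w| !w.is_empty())
        .collect();

    if words.is_empty() {
        // Nothing to bias toward: no prompt at all.
        None
    } else {
        // The "Glossary:" prefix is an arbitrary, hypothetical choice.
        Some(format!("Glossary: {}.", words.join(", ")))
    }
}
```

Returning `None` for an empty dictionary lets the caller skip setting a prompt entirely instead of passing an empty string.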
Usage
- The app requires a Whisper.cpp-compatible model to be downloaded and placed in ~/.whispr/model.bin
- I highly recommend Whisper Large V3 Turbo
- Download link: ggml-large-v3-turbo.bin
mkdir -p ~/.whispr && wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin -O ~/.whispr/model.bin
- Launch Whispr
- Hold right ⌘ Command
- Speak
- Release to insert text
- Right click Whispr menubar to configure
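Since the model has to be placed by hand, it can help to verify the file before launching. The following is a minimal sketch using only the Rust standard library; the helper names `model_path` and `model_is_present` are hypothetical and not part of Whispr.

```rust
use std::path::{Path, PathBuf};

/// Returns the expected model location, `<home>/.whispr/model.bin`.
fn model_path(home: &Path) -> PathBuf {
    home.join(".whispr").join("model.bin")
}

/// Returns true if the model file exists, is a regular file, and is not empty.
/// Any I/O error (for example, a missing file) is treated as "not present".
fn model_is_present(home: &Path) -> bool {
    std::fs::metadata(model_path(home))
        .map(|m| m.is_file() && m.len() > 0)
        .unwrap_or(false)
}
```

A caller would typically obtain the home directory from `std::env::var_os("HOME")` and pass it in; taking it as a parameter keeps the check easy to test.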
Known Issues
- The startup experience is pretty rough: you have to download the model and grant permissions yourself.
- Silence removal is not tuned yet and is static; ideally it should be dynamic.
- Sometimes when right-clicking the menu bar icon, the menu doesn't open but flickers.
- Manually downloading the model is painful.
- The overlay lags when Whisper runs.
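As an illustration of what "dynamic" silence removal could mean, one option is to derive the threshold from the recording itself rather than from a fixed value. The sketch below is not Whispr's implementation: it estimates a noise floor from a low percentile of per-frame RMS levels and scales it by a hypothetical `factor`.

```rust
/// Root-mean-square level of one frame of samples.
fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Estimates a silence threshold from the recording itself.
/// The noise floor is taken as roughly the 10th-percentile frame RMS and
/// then scaled by `factor` (a hypothetical tuning parameter).
fn adaptive_threshold(samples: &[f32], frame_len: usize, factor: f32) -> Option<f32> {
    if frame_len == 0 || samples.is_empty() {
        return None;
    }
    // One RMS level per frame, sorted from quietest to loudest.
    let mut levels: Vec<f32> = samples.chunks(frame_len).map(rms).collect();
    levels.sort_by(|a, b| a.total_cmp(b));

    // Index near the 10th percentile of the sorted levels.
    let idx = (levels.len() - 1) / 10;
    Some(levels[idx] * factor)
}
```

Using a low percentile instead of the minimum makes the estimate less sensitive to a single unusually quiet frame in longer recordings.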
⚙️ Configuration
Whispr is highly configurable through its settings:
Audio Settings
- Choose input device
- Silence removal
- Recording options
Model Options
- Multiple Whisper models available
- Language selection
- Translation capabilities
Developer Features
- Save recordings for debugging
- Enable Whisper logging
- Detailed configuration options
Getting Started
- Download release
- Launch Whispr
- Configure settings (optional)
- Hold right ⌘ Command to speak
- Right click Whispr menubar to configure
Advanced usage
The advanced configuration for Whispr is located in ~/.whispr/settings.json. Below is an example of the parameters you can configure:
{
  "audio": {
    "device_name": "MacBook Pro Microphone",
    "remove_silence": true,
    "silence_threshold": 0.9,
    "min_silence_duration": 250,
    "recordings_dir": ".whispr"
  },
  "developer": {
    "save_recordings": true,
    "whisper_logging": false
  },
  "whisper": {
    "model_name": "base.en",
    "language": "auto",
    "translate": false,
    "dictionary": ["USail", "CustomWord"]
  },
  "start_at_login": false,
  "keyboard_shortcut": "right_command_key",
  "model": {
    "display_name": "Whisper Large v3 Turbo",
    "url": "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin",
    "filename": "ggml-large-v3-turbo.bin"
  }
}
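To give a feel for how a threshold and a minimum silence duration can interact, here is a minimal sketch of static silence removal. It is not Whispr's implementation, and the units of `silence_threshold` and `min_silence_duration` are not documented above. The sketch assumes mono `f32` samples in [-1.0, 1.0], a threshold expressed as an RMS level, a duration in milliseconds, and 10 ms frames; the value 0.9 in the example config may well be on a different scale.

```rust
/// Removes runs of quiet frames that last at least `min_silence_ms`.
/// Assumptions (not taken from Whispr): mono f32 samples in [-1.0, 1.0],
/// `threshold` is an RMS level, and frames are roughly 10 ms long.
fn remove_silence(
    samples: &[f32],
    sample_rate: u32,
    threshold: f32,
    min_silence_ms: u32,
) -> Vec<f32> {
    let frame_len = (sample_rate as usize / 100).max(1); // samples per 10 ms
    let min_frames = (min_silence_ms as usize / 10).max(1);
    let mut out = Vec::with_capacity(samples.len());
    let mut quiet: Vec<&[f32]> = Vec::new(); // pending run of quiet frames

    for frame in samples.chunks(frame_len) {
        let energy = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        if energy.sqrt() < threshold {
            quiet.push(frame);
        } else {
            // A short pause is kept; a long one is dropped.
            if quiet.len() < min_frames {
                for q in &quiet {
                    out.extend_from_slice(q);
                }
            }
            quiet.clear();
            out.extend_from_slice(frame);
        }
    }
    // A trailing quiet run follows the same rule.
    if quiet.len() < min_frames {
        for q in &quiet {
            out.extend_from_slice(q);
        }
    }
    out
}
```

Keeping pauses shorter than the minimum duration preserves the natural rhythm of speech, while long silent stretches, which tend to trigger hallucinated text, are cut.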
Roadmap
- Model Management: Automated model downloads
- Headless experience & redesign status icon
- The overlay is not actually needed: add a headless mode and use menubar icon coloring as the recording indicator.
- Meeting mode with diarization and system audio recording
- Application context awareness
- We can use a small local model and feed it an OCR'ed version of the current active window, the cursor position, and more through a customizable prompt template to post-process the transcription, allowing more expressive interaction.
- MLX-powered LLM post-processing
- Apple Vision API integration
- Add Windows support
- Replacements
- GitHub Actions for Builds and Releases
- Automate builds/releases using GitHub Actions.
- Brew formulae
Contributing
Open source project - contributions welcome.
License
MIT License
Made with ❤️ in Germany together with Claude