VoiceInk
A native macOS dictation app that turns your voice into polished text using local AI and cloud transcription.
The Challenge
The core challenge was bridging two worlds: a Rust backend capturing system-level audio with CPAL and a React frontend rendering the UI, all inside a single Tauri application that feels like a native macOS app rather than a wrapped browser. Tauri v2 provides a WebView-based UI with a Rust core, but the integration layer required careful design of the Tauri commands and events to avoid blocking the main thread.
The second challenge was latency. The full pipeline (record, encode, transmit to the API, receive text, optionally rewrite with an LLM, paste) had to complete in under two seconds on average. This meant streaming audio to Deepgram in real time rather than waiting for the full recording.
The third challenge was global hotkey registration on macOS. System-level Fn-key interception requires accessibility permissions and careful event loop management to avoid conflicts with other apps.
Architectural Decisions
Tauri v2 over Electron for native menubar footprint
Electron bundles Chromium and Node.js, resulting in a 150-300MB app that idles at around 200MB of RAM, which is unacceptable for a background menubar utility. Tauri v2 uses the system WebView and a Rust backend, producing a sub-5MB binary that idles under 20MB of RAM.
Streaming audio to Deepgram over batch upload
Rather than recording a full audio buffer and POSTing it to a transcription API, audio is streamed to Deepgram via WebSocket as it is captured. This lets Deepgram begin transcribing mid-sentence, returning interim results that the UI displays in real time.
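To make the streaming path concrete, the sketch below converts captured samples to 16-bit little-endian PCM and cuts them into short fixed-duration chunks, each of which would go out as one binary WebSocket message. It is an illustration under stated assumptions, not VoiceInk's actual code: the 16 kHz mono format, the 100 ms chunk length, and all function names are hypothetical, and the WebSocket send itself is left out.

```rust
/// Convert f32 samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes.
fn f32_to_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        // Clamp first so out-of-range input saturates instead of wrapping.
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Number of PCM bytes in one chunk of `chunk_ms` milliseconds (mono, 16-bit).
fn chunk_size_bytes(sample_rate_hz: u32, chunk_ms: u32) -> usize {
    (sample_rate_hz as usize * chunk_ms as usize / 1000) * 2
}

/// Split PCM bytes into chunks, one per outgoing message.
/// The final chunk may be shorter than the others.
fn pcm_chunks(pcm: &[u8], sample_rate_hz: u32, chunk_ms: u32) -> Vec<&[u8]> {
    // `.max(2)` guards against a zero chunk size, which `chunks` rejects.
    pcm.chunks(chunk_size_bytes(sample_rate_hz, chunk_ms).max(2)).collect()
}
```

At the assumed 16 kHz, a 100 ms chunk is 3,200 bytes. A real streaming sender would chunk incrementally as buffers arrive rather than over a completed recording; the slicing logic is the same.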
CPAL for cross-platform audio capture in Rust
Audio capture uses the cpal crate rather than calling macOS CoreAudio bindings directly. CPAL abstracts the host audio API while still exposing low-level buffer access, and it runs audio callbacks on a dedicated real-time thread.
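Because the callback runs on a real-time thread, it should do as little work as possible and hand samples off to another thread. The sketch below shows that hand-off using only the standard library. `on_audio_data` is a hypothetical stand-in for the data closure passed to cpal's `build_input_stream`; the channel-based design and all names are assumptions for illustration.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Stand-in for the body of an audio input callback: copy the buffer and return.
/// A send error only means the consumer has shut down, so it is ignored.
fn on_audio_data(tx: &Sender<Vec<f32>>, data: &[f32]) {
    let _ = tx.send(data.to_vec());
}

/// Spawn a consumer thread that drains buffers until every sender is dropped,
/// then returns all samples it received, in order.
fn spawn_consumer(rx: Receiver<Vec<f32>>) -> JoinHandle<Vec<f32>> {
    thread::spawn(move || {
        let mut all = Vec::new();
        for buf in rx {
            all.extend_from_slice(&buf);
        }
        all
    })
}

/// Wire a channel between the (simulated) callback and the consumer thread.
fn capture_pipeline() -> (Sender<Vec<f32>>, JoinHandle<Vec<f32>>) {
    let (tx, rx) = mpsc::channel();
    (tx, spawn_consumer(rx))
}
```

Note that `to_vec` allocates on every callback. Strict real-time code often uses a preallocated lock-free ring buffer instead; the channel is used here only to keep the sketch dependency-free.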
Groq for sub-second LLM polishing
After transcription, an optional LLM pass rewrites the raw transcript into clean prose. Groq was chosen over OpenAI because its LPU inference delivers 300+ tokens/second, so generating a 100-word rewrite takes under 400ms.
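The latency figure can be sanity-checked with simple arithmetic: generation time is output tokens divided by throughput. The sketch below is a back-of-the-envelope model only. The tokens-per-word ratio is a rule-of-thumb assumption (about 1.3 for English), the function names are hypothetical, and network round trips and time to first token are not included.

```rust
/// Estimated generation time in milliseconds for `words` of output.
/// `tokens_per_word` is a rule-of-thumb assumption, not a measured value.
fn generation_ms(words: f64, tokens_per_word: f64, tokens_per_sec: f64) -> f64 {
    words * tokens_per_word / tokens_per_sec * 1000.0
}

/// Minimum throughput (tokens/second) needed to finish within `budget_ms`.
fn required_tokens_per_sec(words: f64, tokens_per_word: f64, budget_ms: f64) -> f64 {
    words * tokens_per_word / (budget_ms / 1000.0)
}
```

Under these assumptions, a 100-word rewrite is about 130 tokens. At exactly 300 tokens/second that takes roughly 433ms, and staying under 400ms needs about 325 tokens/second or more, which fits the "300+" range quoted above.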
Global Fn-key hotkey via Tauri plugin-global-shortcut
Registering a global hotkey that fires even when the app has no focused window requires macOS accessibility permissions. The recording state machine (idle → recording → transcribing → pasting) lives in Rust so that rapid key presses cannot cause race conditions.
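A minimal sketch of such a state machine is shown below. The four states come from the description above; the event names and the hold-to-record behavior are assumptions made for illustration. Any event that is not valid in the current state is ignored, which is what makes repeated or out-of-order key events harmless.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Recording,
    Transcribing,
    Pasting,
}

/// Hypothetical events; the real app may name or group these differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    HotkeyDown,
    HotkeyUp,
    TranscriptReady,
    PasteDone,
}

impl State {
    /// Return the next state. Events that do not apply to the current
    /// state leave it unchanged instead of starting a second pipeline.
    fn next(self, event: Event) -> State {
        match (self, event) {
            (State::Idle, Event::HotkeyDown) => State::Recording,
            (State::Recording, Event::HotkeyUp) => State::Transcribing,
            (State::Transcribing, Event::TranscriptReady) => State::Pasting,
            (State::Pasting, Event::PasteDone) => State::Idle,
            (state, _) => state,
        }
    }
}
```

In an application, the current state would typically sit behind a `Mutex` (or be owned by a single task) so that transitions triggered from different threads are serialized.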
Pasteboard injection via AppleScript for universal paste
VoiceInk writes the transcribed text to the system pasteboard and then simulates Cmd+V via AppleScript. This works in any app with a text cursor, with no per-app integration needed.
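The two steps can be sketched with the standard library alone by shelling out to macOS command-line tools. This is one possible approach, not necessarily how VoiceInk does it: using `pbcopy` to fill the pasteboard is an assumption, and the function names are hypothetical. The code only runs meaningfully on macOS.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// AppleScript that asks System Events to press Cmd+V in the frontmost app.
const PASTE_SCRIPT: &str =
    r#"tell application "System Events" to keystroke "v" using command down"#;

/// Build (but do not run) the `osascript` invocation that simulates Cmd+V.
fn paste_command() -> Command {
    let mut cmd = Command::new("osascript");
    cmd.arg("-e").arg(PASTE_SCRIPT);
    cmd
}

/// Write `text` to the system pasteboard by piping it into `pbcopy`.
fn copy_to_pasteboard(text: &str) -> io::Result<()> {
    let mut child = Command::new("pbcopy").stdin(Stdio::piped()).spawn()?;
    // The stdin handle is a temporary, so it is closed at the end of this
    // statement; that signals end-of-input to pbcopy.
    child.stdin.take().expect("stdin was piped").write_all(text.as_bytes())?;
    child.wait()?;
    Ok(())
}

/// Copy the text, then trigger the paste keystroke.
fn paste_text(text: &str) -> io::Result<()> {
    copy_to_pasteboard(text)?;
    let status = paste_command().status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, "osascript exited with an error"))
    }
}
```

Passing the text through `pbcopy` on stdin, rather than embedding it in the AppleScript source, avoids having to escape quotes and backslashes in the transcript.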
Whisper local fallback for offline use
When Deepgram is unavailable or the user is offline, VoiceInk falls back to a local Whisper model running via whisper.cpp bindings. The local model is downloaded once on first use and runs fully on-device.
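The fallback decision itself is small and can be sketched independently of either engine. In the example below, the two closures are hypothetical stand-ins for the Deepgram client and the local Whisper call; the function name, the `online` flag, and the string error type are assumptions made to keep the sketch self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cloud,
    Local,
}

/// Try the cloud transcriber when online; on failure, or when offline,
/// fall back to the local one. Returns which backend produced the text.
fn transcribe_with_fallback<C, L>(
    audio: &[u8],
    online: bool,
    cloud: C,
    local: L,
) -> Result<(Backend, String), String>
where
    C: FnOnce(&[u8]) -> Result<String, String>,
    L: FnOnce(&[u8]) -> Result<String, String>,
{
    if online {
        // A cloud error is not fatal: it simply triggers the fallback.
        if let Ok(text) = cloud(audio) {
            return Ok((Backend::Cloud, text));
        }
    }
    local(audio).map(|text| (Backend::Local, text))
}
```

Returning the backend alongside the text lets the UI indicate when a transcript came from the on-device model.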