Mudassir

VoiceInk

A native macOS dictation app that turns your voice into polished text using local AI and cloud transcription.

Role: Solo Developer
Timeline: Early 2025
Stack: Tauri v2, Rust, React 19, TypeScript, Deepgram API, OpenAI Whisper, Groq SDK, Web Audio API, cpal (Rust audio), Tokio async runtime, Tailwind CSS, Vite

The Challenge

The core challenge was bridging two worlds: a Rust backend capturing system-level audio with CPAL, and a React frontend rendering the UI, all inside a single Tauri application that feels like a native macOS app rather than a wrapped browser. Tauri v2 provides a WebView-based UI with a Rust core, but the integration layer required careful design of the Tauri command and event API to avoid blocking the main thread.

The second challenge was latency. The pipeline (record → encode → transmit to the API → receive text → optionally rewrite with an LLM → paste) had to complete in under two seconds on average. That meant streaming audio to Deepgram in real time rather than waiting for the full recording.

The third challenge was global hotkey registration on macOS: system-level Fn-key interception requires accessibility permissions and careful event-loop management to avoid conflicts with other apps.
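Hitting a two-second target only works if every stage has an explicit budget. A back-of-the-envelope sketch of how the stages add up; the per-stage millisecond figures here are purely illustrative assumptions, not measurements from the app (recording overlaps streaming, so it carries no budget of its own):

```rust
// Hypothetical latency budget for the voice-to-text pipeline.
// Stage names follow the pipeline described above; the numbers
// are illustrative assumptions, not measured values.
fn pipeline_budget_ms() -> Vec<(&'static str, u64)> {
    vec![
        ("encode + transmit final audio chunk", 100),
        ("Deepgram final transcript", 600),
        ("optional LLM polish pass", 400),
        ("pasteboard write + Cmd+V", 100),
    ]
}

// Total must stay under the 2000ms target.
fn total_budget_ms() -> u64 {
    pipeline_budget_ms().iter().map(|(_, ms)| *ms).sum()
}
```

Because streaming transcription runs concurrently with recording, only the tail of the pipeline counts against the budget, which is what makes the sub-two-second average plausible.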

Architectural Decisions

Tauri v2 over Electron for native menubar footprint

Electron bundles Chromium and Node.js, resulting in a 150-300MB app that idles at 200MB RAM — unacceptable for a background menubar utility. Tauri v2 uses the system WebView and a Rust backend, producing a sub-5MB binary that idles under 20MB RAM.

Streaming audio to Deepgram over batch upload

Rather than recording a full audio buffer and POSTing it to a transcription API, audio is streamed to Deepgram via WebSocket as it is captured. This lets Deepgram begin transcribing mid-sentence, returning interim results that the UI displays in real time.
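Deepgram's streaming responses are JSON messages carrying the transcript alternatives and an `is_final` flag, which is what lets the UI distinguish interim text from committed text. The real app would use a WebSocket client and serde_json; as a dependency-free sketch, here is a naive extractor for the two fields the UI cares about (the field names match Deepgram's documented live-transcription message shape, but the parsing is deliberately simplified):

```rust
// Pull the "transcript" string and "is_final" flag out of a Deepgram
// streaming message. A naive string scan for illustration only; it
// does not handle escaped quotes. Real code would use serde_json.
fn parse_interim(msg: &str) -> Option<(String, bool)> {
    let key = "\"transcript\":\"";
    let start = msg.find(key)? + key.len();
    let end = start + msg[start..].find('"')?;
    let transcript = msg[start..end].to_string();
    let is_final = msg.contains("\"is_final\":true");
    Some((transcript, is_final))
}
```

Interim messages arrive with `is_final: false` and are replaced in the UI as corrections stream in; a final message commits the text.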

CPAL for cross-platform audio capture in Rust

Audio capture uses the cpal crate rather than calling macOS CoreAudio bindings directly. CPAL abstracts the host audio API while still exposing low-level buffer access, and it runs audio callbacks on a dedicated real-time thread.
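CPAL delivers samples as `f32` in the range -1.0 to 1.0, while Deepgram's linear16 encoding expects little-endian signed 16-bit PCM, so the audio callback has to convert each buffer before it is queued for the network. A minimal sketch of that conversion (the cpal stream setup itself is omitted):

```rust
// Convert a buffer of f32 samples (-1.0..=1.0, as produced by a cpal
// input callback) into little-endian 16-bit PCM bytes, the format
// Deepgram's linear16 encoding expects.
fn f32_to_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        let clamped = s.clamp(-1.0, 1.0);
        let v = (clamped * i16::MAX as f32) as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

Because this runs on CPAL's real-time audio thread, the converted bytes would be pushed onto a channel for the Tokio side to send, rather than doing any I/O in the callback itself.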

Groq for sub-second LLM polishing

After transcription, an optional LLM pass rewrites the raw transcript into clean prose. Groq was chosen over OpenAI because its LPU inference delivers 300+ tokens/second, making a 100-word rewrite take under 400ms.
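Groq exposes an OpenAI-compatible chat completions endpoint, so the polish pass is a single POST with the raw transcript as the user message. As a sketch, here is the request body construction; the model name and system prompt are illustrative assumptions, not the app's actual configuration, and the HTTP call itself is omitted:

```rust
// Build the JSON body for Groq's OpenAI-compatible chat completions
// endpoint. Model name and system prompt are illustrative assumptions.
// Escaping is minimal (quotes and backslashes only) for the sketch;
// real code would serialize with serde_json.
fn polish_request_body(transcript: &str) -> String {
    let escaped = transcript.replace('\\', "\\\\").replace('"', "\\\"");
    format!(
        r#"{{"model":"llama-3.1-8b-instant","messages":[{{"role":"system","content":"Rewrite the raw transcript as clean prose. Preserve meaning."}},{{"role":"user","content":"{}"}}]}}"#,
        escaped
    )
}
```

Keeping the polish pass optional matters for the latency budget: when the user disables it, the transcript pastes as soon as Deepgram finalizes.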

Global Fn-key hotkey via Tauri plugin-global-shortcut

Registering a global hotkey that fires even when the app has no focused window requires macOS accessibility permissions. The state machine (idle — recording — transcribing — pasting) lives in Rust to prevent race conditions from rapid key presses.
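Keeping the state machine in Rust means a second hotkey press mid-transcription is resolved in exactly one place instead of racing through the WebView. A simplified sketch of the transitions, assuming the hotkey toggles recording and that events invalid for the current state are ignored:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum State { Idle, Recording, Transcribing, Pasting }

#[derive(Debug, Clone, Copy)]
enum Event { HotkeyPressed, TranscriptReady, PasteDone }

// Single authority over state changes: any (state, event) pair not
// listed here leaves the state unchanged, so hotkey mashing during
// transcription or pasting is simply ignored.
fn transition(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::HotkeyPressed) => State::Recording,
        (State::Recording, Event::HotkeyPressed) => State::Transcribing,
        (State::Transcribing, Event::TranscriptReady) => State::Pasting,
        (State::Pasting, Event::PasteDone) => State::Idle,
        (s, _) => s,
    }
}
```

In the real app this would sit behind a mutex or an actor task so the hotkey handler and the transcription task both funnel events through the same `transition`.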

Pasteboard injection via AppleScript for universal paste

VoiceInk writes the transcribed text to the system pasteboard and then simulates Cmd+V via AppleScript. This works universally in any app with a text cursor, with no per-app integration needed.
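The paste step amounts to shelling out to `osascript` with the standard System Events keystroke idiom. A sketch of building that invocation (assuming the transcript is already on the pasteboard; actually running it requires macOS and the accessibility permission the app already holds for the hotkey):

```rust
use std::process::Command;

// AppleScript that simulates Cmd+V via System Events.
const PASTE_SCRIPT: &str =
    r#"tell application "System Events" to keystroke "v" using command down"#;

// Build (but don't run) the osascript invocation; the caller would
// `.status()` it after writing the transcript to the pasteboard.
fn paste_command() -> Command {
    let mut cmd = Command::new("osascript");
    cmd.arg("-e").arg(PASTE_SCRIPT);
    cmd
}
```

Going through the pasteboard rather than typing the text key-by-key keeps the paste instant regardless of transcript length.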

Whisper local fallback for offline use

When Deepgram is unavailable or the user is offline, VoiceInk falls back to a local Whisper model running via whisper.cpp bindings. The local model is downloaded once on first use and runs fully on-device.
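The failover decision itself is simple: prefer Deepgram when the network is reachable and a key is configured, otherwise fall back to the on-device Whisper model. A sketch of that selection logic (the names and inputs are illustrative; in the real app they would come from a connectivity check and the app's settings):

```rust
#[derive(Debug, PartialEq)]
enum Transcriber { DeepgramStreaming, WhisperLocal }

// Provider selection: cloud when reachable and configured,
// on-device whisper.cpp otherwise.
fn pick_transcriber(online: bool, has_api_key: bool) -> Transcriber {
    if online && has_api_key {
        Transcriber::DeepgramStreaming
    } else {
        Transcriber::WhisperLocal
    }
}
```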

Impact

- <2s: average voice-to-text latency with Deepgram
- <5MB: binary size vs 150MB+ for Electron alternatives
- 2: transcription providers with automatic failover
- 100%: universal paste; works in any macOS text field