---
title: AI Voice Chat
emoji: 🎙️
colorFrom: green
colorTo: blue
sdk: static
pinned: false
license: mit
short_description: 100% in-browser, hands-free AI voice chat
arxiv: "2503.23108"
---

# AI Voice Chat

**A 100% in-browser solution for hands-free AI voice chat.** No API keys, no server, no data leaves your device. Uses Silero VAD, Whisper STT, WebLLM (Qwen 1.5B), and Supertonic TTS - all running locally via WebGPU.

**Swap in your own LLM** - The built-in model is just a demo. The real value is the voice pipeline. Point it at Claude, GPT-4, Ollama, or any LLM in ~10 lines of code.

## How It Works

1. **Click the green phone button** to start a call
2. **Speak naturally** - it detects when you're talking
3. **Wait for the response** - the AI thinks and speaks back
4. **Click the red button** to end the call

## What's Running Locally

| Component | Model | Purpose |
|-----------|-------|---------|
| 🎤 Speech-to-Text | Whisper | Converts your voice to text |
| 🧠 LLM | Qwen 1.5B | Generates responses (swappable) |
| 🔊 Text-to-Speech | Supertonic | Speaks the response |
| 👂 Voice Detection | Silero VAD | Knows when you're talking |

All models download once (~1GB) and are cached in your browser.

## Requirements

- **Browser**: Chrome 113+ or Edge 113+ (needs WebGPU)
- **RAM**: ~4GB available
- **Microphone**: Click "Allow" when prompted

## Controls

- 🎤 **Mic button** - Mute/unmute your microphone
- 🔊 **Speaker button** - Mute/unmute the AI voice
- 📢 **Voice selector** - Choose from 10 voices (F1-F5, M1-M5)
- 📞 **Phone button** - Start/end the call

## Privacy

**100% local.** Your voice is processed in your browser. Nothing is sent to any server. The only network requests are to download the AI models (once, then cached).

## Source Code

Want to run this yourself or swap in a different LLM? See the sketch below.

👉 [GitHub Repository](https://github.com/iRelate-AI/voice-chat)

Built with Next.js, WebLLM, Transformers.js, and Supertonic TTS.
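
As a rough illustration of the "~10 lines of code" claim, here is a minimal sketch of swapping the built-in Qwen model for any OpenAI-compatible chat endpoint (Ollama shown). The function name `generateReply`, the endpoint URL, and the model name are placeholders, not names from this repo; wire the same shape into wherever the pipeline hands Whisper's transcript to the LLM.

```ts
// Sketch only: replace the WebLLM call with any OpenAI-compatible chat endpoint.
// `generateReply`, the URL, and the model name are illustrative placeholders.
async function generateReply(userText: string): Promise<string> {
  const response = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2", // any model served by your endpoint
      messages: [
        { role: "system", content: "You are a concise voice assistant." },
        { role: "user", content: userText }, // text produced by Whisper STT
      ],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content; // hand this string to Supertonic TTS
}
```

Note that routing the LLM call to a remote API changes the privacy story: the transcript leaves your device, while STT, TTS, and voice detection stay local.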