---
title: AI Voice Chat
emoji: 🎙️
colorFrom: green
colorTo: blue
sdk: static
pinned: false
license: mit
short_description: 100% in-browser, hands-free AI voice chat
arxiv: "2503.23108"
---
# AI Voice Chat

**A 100% in-browser solution for hands-free AI voice chat.** No API keys, no server, no data leaves your device. Uses Silero VAD, Whisper STT, WebLLM (Qwen 1.5B), and Supertonic TTS - all running locally via WebGPU.

**Swap in your own LLM** - The built-in model is just a demo. The real value is the voice pipeline. Point it at Claude, GPT-4, Ollama, or any LLM in ~10 lines of code.
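For example, the built-in WebLLM call could be replaced with a fetch to any OpenAI-compatible endpoint, such as a local Ollama server. This is a minimal sketch of that swap; the `generateReply` name, the endpoint URL, and the model name are assumptions for illustration, not the repo's actual API:

```typescript
// Hypothetical replacement for the built-in WebLLM call.
// Assumes an OpenAI-compatible endpoint (e.g. a local Ollama server);
// the function name and message shape are illustrative, not from the repo.
export async function generateReply(userText: string): Promise<string> {
  const res = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2", // any model served by your endpoint
      messages: [
        { role: "system", content: "You are a concise voice assistant." },
        { role: "user", content: userText },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // plain text handed on to the TTS step
}
```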
## How It Works

1. **Click the green phone button** to start a call
2. **Speak naturally** - it detects when you're talking
3. **Wait for the response** - the AI thinks and speaks back
4. **Click the red button** to end the call
## What's Running Locally

| Component | Model | Purpose |
|-----------|-------|---------|
| 🎤 Speech-to-Text | Whisper | Converts your voice to text |
| 🧠 LLM | Qwen 1.5B | Generates responses (swappable) |
| 🔊 Text-to-Speech | Supertonic | Speaks the response |
| 👂 Voice Detection | Silero VAD | Knows when you're talking |
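These components form a straight loop: VAD gates the microphone, Whisper transcribes the detected speech, the LLM produces a reply, and the TTS speaks it. The sketch below shows that wiring only; every declared function is a placeholder for whatever the real implementation exposes, not an actual export of this repo:

```typescript
// Illustrative wiring of the voice loop. These signatures are placeholders
// (names and types are assumptions, not the project's real API).
declare function callActive(): boolean;
declare function listenForSpeech(): Promise<Float32Array>;         // Silero VAD: wait for a spoken segment
declare function transcribe(audio: Float32Array): Promise<string>; // Whisper STT: audio -> text
declare function generateReply(text: string): Promise<string>;     // LLM (WebLLM Qwen 1.5B, or your own)
declare function speak(text: string): Promise<void>;               // Supertonic TTS: text -> audio playback

async function voiceLoop(): Promise<void> {
  while (callActive()) {
    const audio = await listenForSpeech();
    const text = await transcribe(audio);
    const reply = await generateReply(text);
    await speak(reply);
  }
}
```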
All models download once (~1GB) and are cached in your browser.
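If you want to confirm the models are cached rather than re-downloaded, the browser's standard Storage API can report how much origin storage they occupy. This is a generic illustration of the caching claim, not code from this project:

```typescript
// Report how much origin storage the cached models are using (Storage API).
async function reportModelStorage(): Promise<void> {
  if (navigator.storage && navigator.storage.estimate) {
    const { usage, quota } = await navigator.storage.estimate();
    console.log(`Using ${((usage ?? 0) / 1e6).toFixed(0)} MB of ${((quota ?? 0) / 1e9).toFixed(1)} GB quota`);
  }
}
```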
## Requirements

- **Browser**: Chrome 113+ or Edge 113+ (needs WebGPU)
- **RAM**: ~4GB available
- **Microphone**: Click "Allow" when prompted
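The WebGPU requirement above can be checked up front via `navigator.gpu`; this is a generic feature check, not code taken from the project:

```typescript
// Quick WebGPU availability check (Chrome/Edge 113+ expose navigator.gpu).
// Cast to `any` so this compiles without WebGPU type declarations
// (e.g. @webgpu/types) installed.
async function hasWebGPU(): Promise<boolean> {
  const gpu = (navigator as any).gpu;
  if (!gpu) return false;
  const adapter = await gpu.requestAdapter();
  return adapter !== null;
}
```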
## Controls

- 🎤 **Mic button** - Mute/unmute your microphone
- 🔊 **Speaker button** - Mute/unmute the AI voice
- 🗣️ **Voice selector** - Choose from 10 voices (F1-F5, M1-M5)
- 📞 **Phone button** - Start/end the call
## Privacy

**100% local.** Your voice is processed in your browser. Nothing is sent to any server. The only network requests are to download the AI models (once, then cached).
## Source Code

Want to run this yourself or swap in a different LLM?

👉 [GitHub Repository](https://github.com/iRelate-AI/voice-chat)

Built with Next.js, WebLLM, Transformers.js, and Supertonic TTS.