---
title: AI Voice Chat
emoji: 🎙️
colorFrom: green
colorTo: blue
sdk: static
pinned: false
license: mit
short_description: 100% in-browser, hands-free AI voice chat
arxiv: '2503.23108'
---

# AI Voice Chat
A 100% in-browser solution for hands-free AI voice chat. No API keys, no server, no data leaves your device. Uses Silero VAD, Whisper STT, WebLLM (Qwen 1.5B), and Supertonic TTS - all running locally via WebGPU.
**Swap in your own LLM.** The built-in model is just a demo; the real value is the voice pipeline. Point it at Claude, GPT-4, Ollama, or any other LLM in ~10 lines of code.
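For example, a minimal sketch of the swap, assuming an OpenAI-compatible endpoint (Ollama's default local URL is shown; the function name, URL, and model name are placeholders, not part of this repo):

```ts
// Minimal sketch: route the pipeline's "generate a reply" step to any
// OpenAI-compatible endpoint instead of the built-in WebLLM model.
// The URL, model name, and function name here are placeholders.
async function generateReply(
  history: { role: "system" | "user" | "assistant"; content: string }[],
): Promise<string> {
  const res = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3.2", messages: history }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // the string the TTS step consumes
}
```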
## How It Works
1. Click the green phone button to start a call
2. Speak naturally - the app detects when you're talking
3. Wait for the response - the AI thinks and speaks back
4. Click the red button to end the call
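The whole call is one loop around voice-activity detection: VAD segments your speech, Whisper transcribes it, the LLM replies, and TTS speaks. A rough sketch, assuming the `@ricky0123/vad-web` wrapper around Silero VAD (this Space may wire it up differently); `transcribe`, `generateReply`, and `speak` are hypothetical stand-ins for the STT, LLM, and TTS steps:

```ts
import { MicVAD } from "@ricky0123/vad-web";

// Hypothetical helpers standing in for the Space's actual STT/LLM/TTS code.
declare function transcribe(audio: Float32Array): Promise<string>;
declare function generateReply(
  history: { role: string; content: string }[],
): Promise<string>; // e.g. the swap sketch above
declare function speak(reply: string): Promise<void>;

// Rough sketch of the call loop, assuming the @ricky0123/vad-web wrapper
// around Silero VAD.
const vad = await MicVAD.new({
  onSpeechEnd: async (audio) => {
    // audio is the finished utterance as 16 kHz mono samples
    const text = await transcribe(audio);                           // Whisper
    const reply = await generateReply([{ role: "user", content: text }]);
    await speak(reply);                                             // Supertonic
  },
});
vad.start(); // the green phone button; vad.pause() ends the call
```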
## What's Running Locally
| Component | Model | Purpose |
|---|---|---|
| 🎤 Speech-to-Text | Whisper | Converts your voice to text |
| 🧠 LLM | Qwen 1.5B | Generates responses (swappable) |
| 🔊 Text-to-Speech | Supertonic | Speaks the response |
| 👂 Voice Detection | Silero VAD | Knows when you're talking |
All models download once (~1GB) and are cached in your browser.
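Loading looks roughly like this; the model IDs are plausible guesses, not necessarily the ones this Space ships. Both libraries persist downloaded weights in browser storage, which is why the download happens only once:

```ts
import { pipeline } from "@huggingface/transformers";
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// One-time model loads; both libraries cache fetched weights in browser
// storage, so repeat visits skip the ~1GB download. Model IDs are assumptions.
const stt = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/whisper-base",
  { device: "webgpu" },
);
const llm = await CreateMLCEngine("Qwen2.5-1.5B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (report) => console.log(report.text), // download progress
});
```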
## Requirements
- **Browser:** Chrome 113+ or Edge 113+ (needs WebGPU)
- **RAM:** ~4GB available
- **Microphone:** Click "Allow" when prompted
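You can test for WebGPU before downloading anything; a minimal feature check:

```ts
// Minimal WebGPU capability check, run before loading any models.
if (!("gpu" in navigator)) {
  throw new Error("WebGPU not available: use Chrome or Edge 113+.");
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
  throw new Error("WebGPU is present but no usable GPU adapter was found.");
}
```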
## Controls
- 🎤 **Mic button** - Mute/unmute your microphone
- 🔊 **Speaker button** - Mute/unmute the AI voice
- 🔢 **Voice selector** - Choose from 10 voices (F1-F5, M1-M5)
- 📞 **Phone button** - Start/end the call
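Mic mute works by disabling the audio track rather than stopping it, a standard MediaStream pattern: the stream stays open but feeds silence, so the VAD simply stops firing. A sketch (the Space's actual handler may differ):

```ts
// Sketch of mic mute: a disabled track keeps the stream alive but
// outputs silence, so voice detection pauses while muted.
function setMicMuted(stream: MediaStream, muted: boolean): void {
  for (const track of stream.getAudioTracks()) {
    track.enabled = !muted;
  }
}
```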
## Privacy
100% local. Your voice is processed in your browser. Nothing is sent to any server. The only network requests are to download the AI models (once, then cached).
## Source Code
Want to run this yourself or swap in a different LLM?
🔗 GitHub Repository
Built with Next.js, WebLLM, Transformers.js, and Supertonic TTS.