---
title: AI Voice Chat
emoji: 🎙️
colorFrom: green
colorTo: blue
sdk: static
pinned: false
license: mit
short_description: 100% in-browser, hands-free AI voice chat
arxiv: "2503.23108"
---
# AI Voice Chat
**A 100% in-browser solution for hands-free AI voice chat.** No API keys, no server, no data leaves your device. Uses Silero VAD, Whisper STT, WebLLM (Qwen 1.5B), and Supertonic TTS - all running locally via WebGPU.
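**Swap in your own LLM** - The built-in model is just a demo; the real value is the voice pipeline. Point it at Claude, GPT-4, Ollama, or any other LLM in about 10 lines of code. (With a cloud API such as Claude or GPT-4, your transcribed text leaves the device; a local server such as Ollama keeps it on your machine.)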
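The exact integration point in the repository isn't described here, so the following is only an illustrative sketch. It assumes a hypothetical `ChatBackend` interface with a single `reply(history)` method and shows an adapter for Ollama's `/api/chat` endpoint with streaming turned off. `ChatBackend`, `FetchLike`, and `ollamaBackend` are names invented for this example, and the fetch function is passed in so the adapter can be tested without a running server.

```typescript
// Hypothetical pluggable LLM interface; the real one in the repo may differ.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatBackend {
  reply(history: ChatMessage[]): Promise<string>;
}

// Minimal structural type for fetch, so the adapter can be tested with a fake.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Adapter for Ollama's /api/chat endpoint with streaming disabled, so the
// whole reply arrives as one JSON object: { message: { role, content }, ... }.
function ollamaBackend(
  model: string,
  baseUrl: string,
  fetchImpl: FetchLike,
): ChatBackend {
  return {
    async reply(history) {
      const res = await fetchImpl(`${baseUrl}/api/chat`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model, messages: history, stream: false }),
      });
      if (!res.ok) throw new Error(`Ollama request failed: HTTP ${res.status}`);
      const data = (await res.json()) as { message: { content: string } };
      return data.message.content;
    },
  };
}
```

In the browser you would pass the global `fetch`, for example `ollamaBackend("llama3.2", "http://localhost:11434", fetch)` (the model name is only an example; `11434` is Ollama's usual local port). Because the page and Ollama run on different origins, Ollama may need its `OLLAMA_ORIGINS` setting adjusted to accept requests from the page.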
## How It Works
1. **Click the green phone button** to start a call
2. **Speak naturally** - it detects when you're talking
3. **Wait for the response** - the AI thinks and speaks back
4. **Click the red button** to end the call
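Behind steps 2 and 3, each detected utterance passes through speech-to-text, the LLM, and text-to-speech in turn. The sketch below shows one such turn under stated assumptions: `Stt`, `Llm`, `Tts`, and `handleUtterance` are hypothetical interfaces and names, not the actual Whisper, WebLLM, or Supertonic APIs, and the audio is assumed to be a mono `Float32Array` already cut out by the VAD.

```typescript
// Hypothetical stage interfaces; the real app wires Whisper, WebLLM and
// Supertonic, whose APIs are not reproduced here.
type Role = "system" | "user" | "assistant";
interface Msg { role: Role; content: string }
interface Stt { transcribe(audio: Float32Array): Promise<string> }
interface Llm { reply(history: Msg[]): Promise<string> }
interface Tts { speak(text: string): Promise<void> }

// Handles one user utterance: transcribe it, ask the LLM for a reply,
// speak the reply, and append both turns to the running history.
async function handleUtterance(
  audio: Float32Array,
  history: Msg[],
  stt: Stt,
  llm: Llm,
  tts: Tts,
): Promise<string | null> {
  const text = (await stt.transcribe(audio)).trim();
  if (text === "") return null; // VAD false positive: nothing was said
  history.push({ role: "user", content: text });
  const answer = await llm.reply(history);
  history.push({ role: "assistant", content: answer });
  await tts.speak(answer); // resolves once playback has finished
  return answer;
}
```

Keeping the history outside the function lets the LLM see the whole conversation on each turn, and skipping empty transcripts avoids sending noise to the model when the VAD fires on a non-speech sound.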
## What's Running Locally
| Component | Model | Purpose |
|-----------|-------|---------|
| 🎤 Speech-to-Text | Whisper | Converts your voice to text |
| 🧠 LLM | Qwen 1.5B | Generates responses (swappable) |
| 🔊 Text-to-Speech | Supertonic | Speaks the response |
| 👂 Voice Detection | Silero VAD | Knows when you're talking |
All models download once (~1GB) and are cached in your browser.
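A VAD model such as Silero produces a speech probability for each short audio frame; deciding when you started and stopped talking is a separate step. One common approach is hysteresis with a minimum silence window, sketched below. The function name and the thresholds (`0.5` to start, `0.35` to end, 3 silent frames) are illustrative assumptions, not the values this app uses.

```typescript
// Turns per-frame speech probabilities into utterance segments using
// hysteresis: a high threshold to start speech, a lower one to end it,
// and a run of silent frames before the segment is closed.
interface Segment { start: number; end: number } // frame indices, end exclusive

function segmentSpeech(
  probs: number[],
  startThreshold = 0.5,
  endThreshold = 0.35,
  minSilenceFrames = 3,
): Segment[] {
  const segments: Segment[] = [];
  let start = -1; // -1 means "not currently in speech"
  let silence = 0; // consecutive frames below endThreshold
  for (let i = 0; i < probs.length; i++) {
    const p = probs[i];
    if (start < 0) {
      if (p >= startThreshold) {
        start = i;
        silence = 0;
      }
    } else if (p < endThreshold) {
      silence++;
      if (silence >= minSilenceFrames) {
        // Close the segment at the first frame of the silent run.
        segments.push({ start, end: i - silence + 1 });
        start = -1;
        silence = 0;
      }
    } else {
      silence = 0; // speech (or ambiguous) frame resets the silence counter
    }
  }
  // Speech still active at the end: close it, trimming trailing silence.
  if (start >= 0) segments.push({ start, end: probs.length - silence });
  return segments;
}
```

The gap between the two thresholds keeps brief dips in probability (a short pause mid-sentence) from splitting one utterance into several, while the silence window decides how quickly the turn is handed to the LLM.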
## Requirements
- **Browser**: Chrome 113+ or Edge 113+ (needs WebGPU)
- **RAM**: ~4GB available
- **Microphone**: Click "Allow" when prompted
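Because everything depends on WebGPU, it helps to check for support before downloading about 1GB of models. The sketch below uses the standard `navigator.gpu.requestAdapter()` call, which resolves to `null` when no suitable GPU is available. `checkWebGPU` and the small structural types are hypothetical names chosen so the function can be tested outside a browser.

```typescript
// Minimal structural types so this compiles without WebGPU type definitions.
interface GpuLike { requestAdapter(): Promise<object | null> }
interface NavigatorLike { gpu?: GpuLike }

type GpuSupport = "ok" | "no-webgpu" | "no-adapter";

async function checkWebGPU(nav: NavigatorLike): Promise<GpuSupport> {
  if (!nav.gpu) return "no-webgpu"; // browser does not expose the WebGPU API
  const adapter = await nav.gpu.requestAdapter(); // null if no usable GPU
  return adapter ? "ok" : "no-adapter";
}
```

In the browser this would be called as `checkWebGPU(navigator as unknown as NavigatorLike)`, showing a "please use Chrome or Edge 113+" message for `"no-webgpu"` and a hardware or driver message for `"no-adapter"`.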
## Controls
- 🎤 **Mic button** - Mute/unmute your microphone
- 🔊 **Speaker button** - Mute/unmute the AI voice
- 📒 **Voice selector** - Choose from 10 voices (F1-F5, M1-M5)
- 📞 **Phone button** - Start/end the call
## Privacy
**100% local.** Your voice is processed in your browser. Nothing is sent to any server. The only network requests are to download the AI models (once, then cached).
## Source Code
Want to run this yourself or swap in a different LLM?
👉 [GitHub Repository](https://github.com/iRelate-AI/voice-chat)
Built with Next.js, WebLLM, Transformers.js, and Supertonic TTS.