---
title: AI Voice Chat
emoji: πŸŽ™οΈ
colorFrom: green
colorTo: blue
sdk: static
pinned: false
license: mit
short_description: 100% in-browser, hands-free AI voice chat
arxiv: '2503.23108'
---

# AI Voice Chat

A 100% in-browser solution for hands-free AI voice chat. No API keys, no server, no data leaves your device. Uses Silero VAD, Whisper STT, WebLLM (Qwen 1.5B), and Supertonic TTS - all running locally via WebGPU.

**Swap in your own LLM** - The built-in model is just a demo; the real value is the voice pipeline. Point it at Claude, GPT-4, Ollama, or any other LLM in ~10 lines of code (see the sketch below).
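As a rough illustration (not this project's actual API), the swap amounts to replacing the text-generation step with a call to any OpenAI-compatible chat endpoint. Here `generateReply` is a hypothetical name for that step, pointed at a local Ollama server:

```ts
// Hypothetical drop-in for the pipeline's text-generation step: instead of
// WebLLM, call any OpenAI-compatible chat endpoint (here, a local Ollama
// server; browser access may require setting OLLAMA_ORIGINS on the server).
async function generateReply(userText: string): Promise<string> {
  const res = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2", // any model you have pulled into Ollama
      messages: [{ role: "user", content: userText }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // plain text handed to the TTS stage
}
```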

## How It Works

  1. Click the green phone button to start a call
  2. Speak naturally - it detects when you're talking
  3. Wait for the response - the AI thinks and speaks back
  4. Click the red button to end the call
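Behind those steps, each conversational turn chains the four local models in sequence. The sketch below is a schematic of that flow using illustrative names, not this project's actual exports:

```ts
// Illustrative shape of one conversational turn; each stage corresponds to
// one of the locally running models.
type Stage<I, O> = (input: I) => Promise<O>;

async function handleTurn(
  speech: Float32Array,                       // audio chunk Silero VAD flagged as speech
  transcribe: Stage<Float32Array, string>,    // Whisper STT
  generateReply: Stage<string, string>,       // Qwen 1.5B via WebLLM (or your own LLM)
  synthesize: Stage<string, AudioBuffer>,     // Supertonic TTS
  play: (audio: AudioBuffer) => Promise<void>,
) {
  const userText = await transcribe(speech);
  const replyText = await generateReply(userText);
  const replyAudio = await synthesize(replyText);
  await play(replyAudio);
}
```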

## What's Running Locally

| Component | Model | Purpose |
| --- | --- | --- |
| 🎀 Speech-to-Text | Whisper | Converts your voice to text |
| 🧠 LLM | Qwen 1.5B | Generates responses (swappable) |
| πŸ”Š Text-to-Speech | Supertonic | Speaks the response |
| πŸ‘‚ Voice Detection | Silero VAD | Knows when you're talking |

All models download once (~1GB) and are cached in your browser.
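For reference, this is roughly how the two heaviest pieces are typically loaded with WebLLM and Transformers.js. The model ids below are assumptions for illustration, not necessarily the exact builds this Space ships with; both libraries cache downloaded weights in browser storage so later visits skip the download:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";
import { pipeline } from "@huggingface/transformers";

// Both libraries fetch weights on first use and cache them in the browser,
// so subsequent visits load from local storage instead of the network.
async function loadModels() {
  const llm = await CreateMLCEngine("Qwen2-1.5B-Instruct-q4f16_1-MLC"); // example model id
  const stt = await pipeline(
    "automatic-speech-recognition",
    "Xenova/whisper-base", // example model id; any Whisper ONNX build works
    { device: "webgpu" },
  );
  return { llm, stt };
}
```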

## Requirements

- **Browser**: Chrome 113+ or Edge 113+ (needs WebGPU - see the check below)
- **RAM**: ~4GB available
- **Microphone**: Click "Allow" when prompted
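A simple way to verify the WebGPU requirement at runtime, using the standard `navigator.gpu` API:

```ts
// Capability check before starting a call: WebGPU must be present and able
// to hand back an adapter, otherwise the models cannot run locally.
async function hasWebGPU(): Promise<boolean> {
  if (!("gpu" in navigator)) return false;
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}
```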

## Controls

- 🎀 **Mic button** - Mute/unmute your microphone
- πŸ”Š **Speaker button** - Mute/unmute the AI voice
- πŸ“’ **Voice selector** - Choose from 10 voices (F1-F5, M1-M5)
- πŸ“ž **Phone button** - Start/end the call

## Privacy

100% local. Your voice is processed in your browser. Nothing is sent to any server. The only network requests are to download the AI models (once, then cached).

## Source Code

Want to run this yourself or swap in a different LLM?

πŸ‘‰ GitHub Repository

Built with Next.js, WebLLM, Transformers.js, and Supertonic TTS.