---
title: AI Voice Chat
emoji: πŸŽ™οΈ
colorFrom: green
colorTo: blue
sdk: static
pinned: false
license: mit
short_description: 100% in-browser, hands-free AI voice chat
arxiv: '2503.23108'
---

# AI Voice Chat

A 100% in-browser solution for hands-free AI voice chat. No API keys, no server, no data leaves your device. Uses Silero VAD, Whisper STT, WebLLM (Qwen 1.5B), and Supertonic TTS - all running locally via WebGPU.

**Swap in your own LLM** - The built-in model is just a demo; the real value is the voice pipeline. Point it at Claude, GPT-4, Ollama, or any other LLM in ~10 lines of code (see the sketch below).
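As a rough illustration (not this project's actual API), the swap amounts to replacing the text-generation step with a call to any OpenAI-compatible chat endpoint. Here `generateReply` is a hypothetical name for that step, pointed at a local Ollama server:

```ts
// Hypothetical drop-in for the pipeline's text-generation step: instead of
// WebLLM, call any OpenAI-compatible chat endpoint (here, a local Ollama
// server; browser access may require setting OLLAMA_ORIGINS on the server).
async function generateReply(userText: string): Promise<string> {
  const res = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2", // any model you have pulled into Ollama
      messages: [{ role: "user", content: userText }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // plain text handed to the TTS stage
}
```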

## How It Works

  1. Click the green phone button to start a call
  2. Speak naturally - it detects when you're talking
  3. Wait for the response - the AI thinks and speaks back
  4. Click the red button to end the call
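Behind those steps, each conversational turn chains the four local models in sequence. The sketch below is a schematic of that flow using illustrative names, not this project's actual exports:

```ts
// Illustrative shape of one conversational turn; each stage corresponds to
// one of the locally running models.
type Stage<I, O> = (input: I) => Promise<O>;

async function handleTurn(
  speech: Float32Array,                       // audio chunk Silero VAD flagged as speech
  transcribe: Stage<Float32Array, string>,    // Whisper STT
  generateReply: Stage<string, string>,       // Qwen 1.5B via WebLLM (or your own LLM)
  synthesize: Stage<string, AudioBuffer>,     // Supertonic TTS
  play: (audio: AudioBuffer) => Promise<void>,
) {
  const userText = await transcribe(speech);
  const replyText = await generateReply(userText);
  const replyAudio = await synthesize(replyText);
  await play(replyAudio);
}
```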

## What's Running Locally

| Component | Model | Purpose |
| --- | --- | --- |
| 🎀 Speech-to-Text | Whisper | Converts your voice to text |
| 🧠 LLM | Qwen 1.5B | Generates responses (swappable) |
| πŸ”Š Text-to-Speech | Supertonic | Speaks the response |
| πŸ‘‚ Voice Detection | Silero VAD | Knows when you're talking |

All models download once (~1GB) and are cached in your browser.
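For reference, this is roughly how the two heaviest pieces are typically loaded with WebLLM and Transformers.js. The model ids below are assumptions for illustration, not necessarily the exact builds this Space ships with; both libraries cache downloaded weights in browser storage so later visits skip the download:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";
import { pipeline } from "@huggingface/transformers";

// Both libraries fetch weights on first use and cache them in the browser,
// so subsequent visits load from local storage instead of the network.
async function loadModels() {
  const llm = await CreateMLCEngine("Qwen2-1.5B-Instruct-q4f16_1-MLC"); // example model id
  const stt = await pipeline(
    "automatic-speech-recognition",
    "Xenova/whisper-base", // example model id; any Whisper ONNX build works
    { device: "webgpu" },
  );
  return { llm, stt };
}
```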

## Requirements

- **Browser**: Chrome 113+ or Edge 113+ (needs WebGPU - see the check below)
- **RAM**: ~4GB available
- **Microphone**: Click "Allow" when prompted
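A simple way to verify the WebGPU requirement at runtime, using the standard `navigator.gpu` API:

```ts
// Capability check before starting a call: WebGPU must be present and able
// to hand back an adapter, otherwise the models cannot run locally.
async function hasWebGPU(): Promise<boolean> {
  if (!("gpu" in navigator)) return false;
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}
```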

## Controls

- 🎀 **Mic button** - Mute/unmute your microphone
- πŸ”Š **Speaker button** - Mute/unmute the AI voice
- πŸ“’ **Voice selector** - Choose from 10 voices (F1-F5, M1-M5)
- πŸ“ž **Phone button** - Start/end the call

## Privacy

100% local. Your voice is processed in your browser. Nothing is sent to any server. The only network requests are to download the AI models (once, then cached).

## Source Code

Want to run this yourself or swap in a different LLM?

πŸ‘‰ GitHub Repository

Built with Next.js, WebLLM, Transformers.js, and Supertonic TTS.