	---
	title: AI Voice Chat
	emoji: 🎙️
	colorFrom: green
	colorTo: blue
	sdk: static
	pinned: false
	license: mit
	short_description: 100% in-browser, hands-free AI voice chat
	arxiv: "2503.23108"
	---

	# AI Voice Chat

	Hands-free AI voice chat that runs entirely in the browser: no API keys, no backend server, and no audio or text leaves your device. It uses Silero VAD, Whisper STT, WebLLM (Qwen 1.5B), and Supertonic TTS, all running locally via WebGPU.

	Swap in your own LLM: the built-in model is just a demo, and the real value is the voice pipeline. Point it at Claude, GPT-4, Ollama, or any other LLM in about 10 lines of code.
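
One way to picture the swap: treat the LLM as a single async function from chat history to reply text. The sketch below implements that function against an OpenAI-style `/chat/completions` endpoint, which Ollama also offers under `/v1`. The names `ChatMessage`, `LLMBackend`, `FetchLike`, and `makeOpenAICompatibleBackend` are illustrative assumptions, not this project's actual API; see the GitHub repository for the real integration point.

```typescript
// Hypothetical types: the actual project may structure this differently.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type LLMBackend = (messages: ChatMessage[]) => Promise<string>;

// Minimal subset of fetch, so the sketch can be exercised without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Pull the assistant text out of an OpenAI-style chat-completions response.
function extractReply(data: unknown): string {
  const choices = (
    data as { choices?: { message?: { content?: unknown } }[] } | null
  )?.choices;
  const content = choices?.[0]?.message?.content;
  if (typeof content !== "string") throw new Error("Unexpected response shape");
  return content;
}

// Build a backend for any server that accepts OpenAI-style chat requests.
function makeOpenAICompatibleBackend(
  baseUrl: string, // e.g. "http://localhost:11434/v1" for a local Ollama
  model: string,
  fetchImpl: FetchLike, // in the browser, pass the global fetch
  apiKey?: string, // only needed for hosted APIs
): LLMBackend {
  return async (messages) => {
    const headers: Record<string, string> = { "Content-Type": "application/json" };
    if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
    const res = await fetchImpl(`${baseUrl}/chat/completions`, {
      method: "POST",
      headers,
      body: JSON.stringify({ model, messages }),
    });
    if (!res.ok) throw new Error(`LLM request failed: ${res.status}`);
    return extractReply(await res.json());
  };
}
```

Pointing the backend at a hosted API sends the transcribed text off-device, unlike the built-in model; a local Ollama server keeps everything on your machine.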

	## How It Works

	1. Click the green phone button to start a call
	2. Speak naturally - it detects when you're talking
	3. Wait for the response - the AI thinks and speaks back
	4. Click the red button to end the call
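
The four steps above amount to a small state machine. Here is a minimal sketch of it, assuming four states and five events; the state and event names are hypothetical and are not taken from the project's source.

```typescript
// Hypothetical state and event names; the real app may track more detail.
type CallState = "idle" | "listening" | "thinking" | "speaking";
type CallEvent =
  | "startCall" // green phone button
  | "speechEnded" // voice detection saw the user stop talking
  | "replyReady" // the LLM finished generating
  | "playbackDone" // the spoken reply finished playing
  | "endCall"; // red button

function nextState(state: CallState, event: CallEvent): CallState {
  if (event === "endCall") return "idle"; // hanging up works from any state
  switch (state) {
    case "idle":
      return event === "startCall" ? "listening" : state;
    case "listening":
      return event === "speechEnded" ? "thinking" : state;
    case "thinking":
      return event === "replyReady" ? "speaking" : state;
    case "speaking":
      return event === "playbackDone" ? "listening" : state;
  }
}
```

Events that do not apply to the current state are ignored, so a stray `replyReady` after hanging up leaves the call idle.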

	## What's Running Locally

	| Component | Model | Purpose |
	|-----------|-------|---------|
	| 🎤 Speech-to-Text | Whisper | Converts your voice to text |
	| 🧠 LLM | Qwen 1.5B | Generates responses (swappable) |
	| 🔊 Text-to-Speech | Supertonic | Speaks the response |
	| 👂 Voice Detection | Silero VAD | Knows when you're talking |

	All models download once (~1GB) and are cached in your browser.
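
To show how the components in the table fit together, here is a rough sketch of one conversational turn. It assumes voice detection has already isolated a speech segment; the `Stages` type and `runTurn` function are hypothetical stand-ins, with the real models sitting behind each stage.

```typescript
// Hypothetical stage signatures; real models would sit behind each function.
type Stages = {
  transcribe: (audio: Float32Array) => Promise<string>; // Whisper's role
  respond: (userText: string) => Promise<string>; // Qwen's role (swappable)
  synthesize: (text: string) => Promise<Float32Array>; // Supertonic's role
};

// Run one turn on a speech segment that voice detection has already isolated.
async function runTurn(segment: Float32Array, stages: Stages) {
  const userText = (await stages.transcribe(segment)).trim();
  if (userText === "") return null; // nothing intelligible was said; skip the turn
  const replyText = await stages.respond(userText);
  const replyAudio = await stages.synthesize(replyText);
  return { userText, replyText, replyAudio };
}
```

Because each stage is just a function, replacing the LLM only means supplying a different `respond`.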

	## Requirements

	- Browser: Chrome 113+ or Edge 113+ (needs WebGPU)
	- RAM: ~4GB available
	- Microphone: Click "Allow" when prompted
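
If you want to check the WebGPU requirement in code, a minimal sketch looks like this. It relies on the standard `navigator.gpu.requestAdapter()` call; the `NavigatorLike` type and the `hasWebGPU` helper are illustrative names, not part of this project.

```typescript
// Structural type covering only the parts of `navigator` this check needs.
type NavigatorLike = { gpu?: { requestAdapter(): Promise<unknown> } };

// Resolves to true only if the browser exposes WebGPU and returns an adapter.
async function hasWebGPU(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false; // API missing: older browser, or WebGPU disabled
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined; // null means no usable GPU
  } catch {
    return false;
  }
}
```

In a browser, calling `hasWebGPU(navigator)` before starting the model download avoids fetching about 1GB on a machine that cannot run the models.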

	## Controls

	- 🎤 Mic button - Mute/unmute your microphone
	- 🔊 Speaker button - Mute/unmute the AI voice
	- 📢 Voice selector - Choose from 10 voices (F1-F5, M1-M5)
	- 📞 Phone button - Start/end the call
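
As a rough model of these controls, the sketch below enumerates the ten voice IDs and keeps the mute flags in one settings object. The `Controls` type and both helper functions are hypothetical; only the voice IDs come from the list above.

```typescript
// Voice IDs as listed in the selector: F1-F5 and M1-M5.
const VOICES: string[] = [];
for (const prefix of ["F", "M"]) {
  for (let n = 1; n <= 5; n++) VOICES.push(`${prefix}${n}`);
}

// Hypothetical settings object mirroring the on-screen controls.
type Controls = { micMuted: boolean; speakerMuted: boolean; voice: string };

// Return a copy of the settings with the mic mute flag flipped.
function toggleMic(controls: Controls): Controls {
  return { ...controls, micMuted: !controls.micMuted };
}

// Return a copy of the settings with a new voice, rejecting unknown IDs.
function selectVoice(controls: Controls, voice: string): Controls {
  if (VOICES.indexOf(voice) === -1) throw new Error(`Unknown voice: ${voice}`);
  return { ...controls, voice };
}
```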

	## Privacy

	100% local. Your voice is processed in your browser, and no audio or text is sent to any server. The only network requests download the AI models (once, then cached).

	## Source Code

	Want to run this yourself or swap in a different LLM?

	👉 [GitHub Repository](https://github.com/iRelate-AI/voice-chat)

	Built with Next.js, WebLLM, Transformers.js, and Supertonic TTS.