AI Apps voicechat2

voicechat2: Enhanced Local Voice Communication

Cut text-to-speech costs with Unreal Speech. 11x cheaper than 11Labs. Production-ready. Stream in 300ms. Generate 10-hr audio. 48 voices. 8 languages. Per-word timestamps. 250K chars free. Try live demo:

Non-Fiction

Fiction

News

Blog

Conversation

0/250

Speed

0 s

Filesize

0 kb

Get Started for Free →

Try voicechat2 →

Overview of Voicechat2: Local SRT/LLM/TTS Voicechat Solution

Voicechat2 is an open-source project hosted on GitHub that provides a fully local AI-powered voice chat system using WebSockets. This application facilitates real-time voice communication with low latency, leveraging modern AI technologies for speech recognition, language modeling, and text-to-speech capabilities. It is designed to be modular, allowing users to swap out different components as needed.

Key Features

WebSocket Server: Enables simple remote access while maintaining fast communication.
Voice Activity Detection (VAD): Utilizes the ricky0123/vad for detecting speech within the audio.
Opus Support: Integrates symblai/opus-encdec for audio encoding and decoding.
Modular Components: Supports various servers for SRT (Speech Recognition Technology), LLM (Large Language Models), and TTS (Text-to-Speech):
- SRT Options: Includes whisper.cpp, faster-whisper, and HF Transformers whisper.
- LLM Options: Compatible with llama.cpp and any OpenAI API compatible server.
- TTS Options: Supports coqui-tts, StyleTTS2, Piper, and MeloTTS.

Performance

On an AMD RDNA3 7900-class card, the voice-to-voice latency is approximately 1 second.
On an NVIDIA 4090, using Faster Whisper with faster-distil-whisper-large-v2, latency can be reduced to as low as 300 milliseconds.

Installation

The installation process is tailored for Ubuntu LTS and assumes that the user has already set up ROCm or CUDA. It is recommended to use conda or mamba for environment management. The installation involves updating the system, installing necessary audio processing tools, and setting up the Python environment with required libraries.

Usage

Voicechat2 includes convenience scripts for launching all servers on a GPU machine in separate byobu sessions, and for establishing remote and local tunnels for easy connectivity.

Additional Information

License: Apache-2.0
Contributors: The project currently has contributions from Leonard (lhl) and Utku Ege Tuluk (uetuluk).
Languages Used: Primarily written in Python (74.2%), with HTML (23.0%) and Shell scripts (2.8%).

Related Projects

Voicechat2 is part of a broader ecosystem of AI voice chat projects. Similar projects include:

Speech To Speech: Focuses on a modular approach but is oriented towards local devices.
webrtc-ai-voice-chat: Uses WebRTC instead of WebSockets and shows higher latency.
june: A console-based local client using similar technologies.
GlaDOS: Offers VAD and interruption support with a console-based interface.
local-talking-llm: A proof of concept with a detailed blog write-up.

Voicechat2 stands out for its low-latency capabilities and modular design, making it a suitable choice for developers looking to implement or experiment with local voice chat systems powered by the latest AI technologies.