Voicechat2 is an open-source project hosted on GitHub that provides a fully local AI-powered voice chat system using WebSockets. This application facilitates real-time voice communication with low latency, leveraging modern AI technologies for speech recognition, language modeling, and text-to-speech capabilities. It is designed to be modular, allowing users to swap out different components as needed.
ricky0123/vad
for detecting speech within the audio.symblai/opus-encdec
for audio encoding and decoding.whisper.cpp
, faster-whisper
, and HF Transformers whisper
.llama.cpp
and any OpenAI API compatible server.coqui-tts
, StyleTTS2
, Piper
, and MeloTTS
.Faster Whisper
with faster-distil-whisper-large-v2
, latency can be reduced to as low as 300 milliseconds.The installation process is tailored for Ubuntu LTS and assumes that the user has already set up ROCm or CUDA. It is recommended to use conda
or mamba
for environment management. The installation involves updating the system, installing necessary audio processing tools, and setting up the Python environment with required libraries.
Voicechat2 includes convenience scripts for launching all servers on a GPU machine in separate byobu
sessions, and for establishing remote and local tunnels for easy connectivity.
Voicechat2 is part of a broader ecosystem of AI voice chat projects. Similar projects include:
Voicechat2 stands out for its low-latency capabilities and modular design, making it a suitable choice for developers looking to implement or experiment with local voice chat systems powered by the latest AI technologies.