AI Apps voicechat2

voicechat2: Enhanced Local Voice Communication

Cut text-to-speech costs with Unreal Speech. 11x cheaper than 11Labs. Production-ready. Stream in 300ms. Generate 10-hr audio. 48 voices. 8 languages. Per-word timestamps. 250K chars free. Try live demo:
Non-Fiction
Fiction
News
Blog
Conversation
0/250
Filesize
0 kb
Get Started for Free
voicechat2

voicechat2

Local voice chat system with real-time speech processing.

voicechat2

Overview of Voicechat2: Local SRT/LLM/TTS Voicechat Solution

Voicechat2 is an open-source project hosted on GitHub that provides a fully local AI-powered voice chat system using WebSockets. This application facilitates real-time voice communication with low latency, leveraging modern AI technologies for speech recognition, language modeling, and text-to-speech capabilities. It is designed to be modular, allowing users to swap out different components as needed.

Key Features

  • WebSocket Server: Enables simple remote access while maintaining fast communication.
  • Voice Activity Detection (VAD): Utilizes the ricky0123/vad for detecting speech within the audio.
  • Opus Support: Integrates symblai/opus-encdec for audio encoding and decoding.
  • Modular Components: Supports various servers for SRT (Speech Recognition Technology), LLM (Large Language Models), and TTS (Text-to-Speech):
    • SRT Options: Includes whisper.cpp, faster-whisper, and HF Transformers whisper.
    • LLM Options: Compatible with llama.cpp and any OpenAI API compatible server.
    • TTS Options: Supports coqui-tts, StyleTTS2, Piper, and MeloTTS.

Performance

  • On an AMD RDNA3 7900-class card, the voice-to-voice latency is approximately 1 second.
  • On an NVIDIA 4090, using Faster Whisper with faster-distil-whisper-large-v2, latency can be reduced to as low as 300 milliseconds.

Installation

The installation process is tailored for Ubuntu LTS and assumes that the user has already set up ROCm or CUDA. It is recommended to use conda or mamba for environment management. The installation involves updating the system, installing necessary audio processing tools, and setting up the Python environment with required libraries.

Usage

Voicechat2 includes convenience scripts for launching all servers on a GPU machine in separate byobu sessions, and for establishing remote and local tunnels for easy connectivity.

Additional Information

  • License: Apache-2.0
  • Contributors: The project currently has contributions from Leonard (lhl) and Utku Ege Tuluk (uetuluk).
  • Languages Used: Primarily written in Python (74.2%), with HTML (23.0%) and Shell scripts (2.8%).

Related Projects

Voicechat2 is part of a broader ecosystem of AI voice chat projects. Similar projects include:

  • Speech To Speech: Focuses on a modular approach but is oriented towards local devices.
  • webrtc-ai-voice-chat: Uses WebRTC instead of WebSockets and shows higher latency.
  • june: A console-based local client using similar technologies.
  • GlaDOS: Offers VAD and interruption support with a console-based interface.
  • local-talking-llm: A proof of concept with a detailed blog write-up.

Voicechat2 stands out for its low-latency capabilities and modular design, making it a suitable choice for developers looking to implement or experiment with local voice chat systems powered by the latest AI technologies.

Share voicechat2:

Related Apps

Audioread
Audioread
Use AI to listen to articles, PDFs, emails, etc in your podcast player. "Read" while walking, driving, cleaning, and more.
Taskade
Productivity Tools
Taskade
Enhances team efficiency with automated tasks and collaborative tools.
Deepgram
Speech Recognition
Deepgram
Provides high-quality speech-to-text and text-to-speech APIs.
Big Speak
Speech Recognition
Big Speak
Converts text to speech and speech to text efficiently.
Teamble AI
Performance Management
Teamble AI
Enhances employee feedback and performance management in organizations.
Prompto
AI Interaction Tools
Prompto
Web application for interacting with large language models.
Universal-1
Speech Recognition
Universal-1
Advanced speech transcription and analysis services for diverse applications.
VoicePen
AI Transcription Tools
VoicePen
Transforms spoken language into structured written text.
Wallow - Slack for makers
Product Development
Wallow - Slack for makers
Unified product development tool integrating communication, management, and payments.
FarHouse
Audio Networking
FarHouse
Audio spaces for community engagement and real-time discussions.
Vocaldo AI
Speech Recognition
Vocaldo AI
Transcribes and translates spoken language into text efficiently.
Speech to Note
Speech Recognition
Speech to Note
Converts spoken language into written text efficiently.
Silvia
Voice Dictation
Silvia
Multilingual voice dictation system for seamless language switching.
Swatle
Project Management
Swatle
Enhances team efficiency and collaboration with advanced management tools.
ElevenLabs Studio
AI Audio Tools
ElevenLabs Studio
Generates realistic speech and audio content in multiple languages.
Sign In