Zyphra Zonos: Cutting-Edge Real-Time Voice Cloning Technology

Overview of Zyphra: Advanced Text-to-Speech Models

Zyphra’s latest release, the Zonos-v0.1 beta, is a suite of text-to-speech (TTS) models designed for real-time, high-fidelity voice cloning. It includes two models, a 1.6B transformer and a 1.6B hybrid, both available under the Apache 2.0 license. The models produce expressive, natural speech from text input and can clone a voice from a short audio clip.

Key Features

Model Types: The suite includes two types of models:
- Transformer Model: Known for its effectiveness in handling sequential data.
- SSM Hybrid Model: A hybrid state-space model (SSM) architecture, presented as the first open-source SSM model for TTS.
Voice Cloning: Clones voices with high fidelity from reference clips of 5 to 30 seconds.
Expressiveness: Generates expressive speech suitable for various applications, from audiobook narration to dynamic dialogues.
Emotion and Speech Modulation: Lets users control speaking rate, pitch, audio quality, and emotions such as sadness, happiness, and anger.
High-Quality Output: Delivers speech at a native sample rate of 44 kHz.
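
The 5 to 30 second range for reference clips can be checked before a clip is submitted for cloning. The following is a minimal sketch using only Python's standard library, not part of Zyphra's code: the function names are hypothetical, the bounds are treated as inclusive (an assumption), and only WAV input is handled.

```python
import io
import wave

MIN_CLIP_SECONDS = 5.0   # lower bound stated in the feature list
MAX_CLIP_SECONDS = 30.0  # upper bound stated in the feature list


def clip_duration_seconds(wav_source):
    """Return the duration in seconds of a WAV file (path or binary file object)."""
    with wave.open(wav_source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(duration_seconds):
    """True when the clip length is inside the 5 to 30 second range (inclusive, by assumption)."""
    return MIN_CLIP_SECONDS <= duration_seconds <= MAX_CLIP_SECONDS


def make_silent_wav(seconds, sample_rate=8000):
    """Build an in-memory mono 16-bit WAV of silence, used here only for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    duration = clip_duration_seconds(make_silent_wav(6))
    print(duration, is_usable_reference(duration))
```

In practice the argument to clip_duration_seconds would be the path to a recorded reference clip; the silent buffer only stands in for one here.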

Training and Data

Extensive Training Data: Models are trained on approximately 200,000 hours of speech data, covering a wide range of speech types and emotions.
Language Support: Primarily supports English, with substantial datasets in Chinese, Japanese, French, Spanish, and German. Performance in other languages may vary.

Availability and Licensing

Open Source Licensing: Both models are released under the permissive Apache 2.0 license, facilitating broad use and further development by the community.
Access and Integration: Model weights are available on Hugging Face, and sample inference code is on GitHub. Python and TypeScript integration is supported through Zyphra’s API and model playground.

Pricing Structure

Flat-Rate Pricing: Usage is priced at $0.02 per minute.
Subscription Options:
- Free Tier: 100 free minutes per month.
- Pro Tier: 300 minutes for $5 per month.
- Custom Enterprise Tiers: Tailored solutions with unlimited voice cloning and no restrictions on concurrent generations.
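
The listed prices can be compared with a little arithmetic. The sketch below uses only the figures stated above; it does not model how the tiers combine with the flat rate (for example, how minutes beyond a tier's allowance are billed), because the source does not say. The function names are illustrative.

```python
from decimal import Decimal

FLAT_RATE_PER_MINUTE = Decimal("0.02")  # USD, from the flat-rate line
PRO_MONTHLY_PRICE = Decimal("5")        # USD per month, from the Pro tier
PRO_INCLUDED_MINUTES = 300              # minutes included in the Pro tier


def flat_rate_cost(minutes):
    """Cost in USD of `minutes` of audio at the flat per-minute rate."""
    return FLAT_RATE_PER_MINUTE * minutes


def pro_effective_rate():
    """Per-minute cost of the Pro tier when all included minutes are used."""
    return PRO_MONTHLY_PRICE / PRO_INCLUDED_MINUTES


if __name__ == "__main__":
    print(flat_rate_cost(300))                                # 6.00
    print(pro_effective_rate().quantize(Decimal("0.0001")))   # 0.0167
```

At full use, the Pro tier's 300 minutes cost $5, compared with $6 for the same minutes at the flat rate.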

Model Comparisons

Zyphra provides samples comparing Zonos with proprietary models such as ElevenLabs and Cartesia and with open-source models such as FishSpeech-v1.5, to demonstrate the quality and capabilities of its models in realistic settings.

Zyphra’s Zonos-v0.1 models represent a significant step forward in the text-to-speech domain, offering tools for developers and content creators to generate high-quality, expressive speech outputs efficiently and affordably.