Kolors: Text-to-Image Diffusion Magic

Overview of Kolors: A Text-to-Image Generation Model

Kolors is a text-to-image generation model developed by the Kuaishou Kolors team. It is built on latent diffusion and generates high-quality images from text prompts written in either Chinese or English. The model targets a broad range of image synthesis tasks.

Key Features

Multilingual Support: Kolors is proficient in generating images from both Chinese and English text inputs.
Large Training Dataset: The model has been trained on billions of text-image pairs, enhancing its ability to understand and generate complex semantic content accurately.
Advanced Model Architecture: Incorporates latent diffusion techniques, which contribute to the high visual quality of the generated images.

Recent Updates

2024.11.13: Release of Kolors-Portrait-with-Flux and Kolors-Character-With-Flux on Hugging Face Spaces.
2024.09.01: Launch of Kolors-Virtual-Try-On, a virtual try-on demo.
2024.08.06: Introduction of Pose ControlNet.
2024.07.31: Release of Kolors-IP-Adapter-FaceID-Plus weights and inference code.
2024.07.12: Integration with Diffusers for enhanced accessibility and usage.

Evaluation

Kolors was evaluated against other state-of-the-art models on KolorsPrompts, a dataset of more than 1,000 prompts spanning several categories and dimensions. The evaluation combined human and machine assessments and rated visual appeal, text faithfulness, and overall satisfaction.

Comparative Performance

Human Assessment: Kolors achieved the highest scores in overall satisfaction and visual appeal.
Machine Assessment: Kolors achieved the highest Multi-dimensional Human Preference Score (MPS), consistent with the human evaluation.

Usage

System Requirements

Python 3.8 or later
PyTorch 1.13.1 or later
Transformers 4.26.1 or later
Recommended: CUDA 11.7 or later
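
Before installing, it can help to confirm that the current environment meets the minimums listed above. The following is a minimal sketch rather than part of the Kolors repository: the helper names (parse_version, meets_minimum, check_environment) are hypothetical, and the version parsing is deliberately simple (it ignores local suffixes such as "+cu118" and pre-release tags).

```python
import re
import sys
from importlib import metadata  # note: importlib.metadata itself needs Python 3.8+

# Minimums copied from the requirements list above.
MINIMUMS = {"torch": "1.13.1", "transformers": "4.26.1"}
MIN_PYTHON = (3, 8)


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} is older than 3.8")
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems
```

Calling check_environment() and printing each returned string gives a quick report. The CUDA recommendation is not checked here, since it depends on the driver and on how PyTorch was built.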

Installation and Setup

Clone the repository and install dependencies:

git clone https://github.com/Kwai-Kolors/Kolors
cd Kolors
conda create --name kolors python=3.8
conda activate kolors
pip install -r requirements.txt
python3 setup.py install

Download model weights:

huggingface-cli download --resume-download Kwai-Kolors/Kolors --local-dir weights/Kolors

Running Inference

To generate an image from text:

python3 scripts/sample.py "一张瓢虫的照片，微距，变焦，高质量，电影，拿着一个牌子，写着‘可图’"
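
The prompt above reads roughly: "A photo of a ladybug, macro, zoom, high quality, cinematic, holding a sign that says '可图'". Because Kolors is also integrated with Diffusers (see Recent Updates), the same kind of generation can be sketched from Python. This is an illustrative sketch, not the repository's own script. It assumes a Diffusers release that ships KolorsPipeline, a CUDA-capable GPU, and that the Diffusers-format weights are published as Kwai-Kolors/Kolors-diffusers on the Hugging Face Hub. The function name generate_image, its default values, and the output filename are placeholders chosen for this example.

```python
def generate_image(prompt, output_path="kolors_sample.png",
                   steps=50, guidance_scale=5.0, seed=66):
    """Generate one image with the Diffusers Kolors pipeline and save it."""
    # Imported lazily so that defining this function does not require
    # torch or diffusers to be installed.
    import torch
    from diffusers import KolorsPipeline

    # Assumption: Diffusers-format weights live in this Hub repository.
    pipe = KolorsPipeline.from_pretrained(
        "Kwai-Kolors/Kolors-diffusers",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")  # assumes a CUDA-capable GPU

    # A fixed seed makes repeated runs comparable.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(
        prompt=prompt,
        negative_prompt="",
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]
    image.save(output_path)
    return output_path
```

Calling generate_image with the Chinese prompt above (or an English one) writes a PNG to output_path and returns that path. The first call downloads the weights, so it needs network access and several gigabytes of disk space.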

Web Demo

For a more interactive experience, users can run a web demo:

python3 scripts/sampleui.py

Licensing

The Kolors code is released under the Apache-2.0 license. The model weights may carry separate terms, so check the license information in the official repository before any commercial use.

For further technical details, the technical report, and community discussions, see the official Kolors GitHub repository.