ChatTTS: A Generative Speech Model for Natural Dialogue and LLM Assistants
Summary
ChatTTS is a text-to-speech model designed for dialogue scenarios such as LLM assistants. It aims for natural, expressive speech and offers fine-grained control over prosodic elements like laughter, pauses, and interjections. The project is written in Python and supports English and Chinese.
Introduction
ChatTTS is a generative speech model built for everyday dialogue and conversational scenarios, particularly LLM assistants. Its focus is natural, expressive speech synthesis that makes interactions feel more human.
Key highlights of ChatTTS include:
- Conversational TTS: Optimized for dialogue, it enables natural and expressive speech synthesis, supporting multiple speakers for interactive conversations.
- Fine-grained Control: The model can predict and control detailed prosodic features, such as laughter, pauses, and interjections, adding significant realism.
- Better Prosody: The project reports prosody that surpasses many open-source TTS models, and it provides pretrained models to support further research and development.
ChatTTS currently supports English and Chinese, with more languages planned for future releases.
Installation
Getting started with ChatTTS is straightforward. You can clone the repository and install dependencies, or install it directly via PyPI.
First, clone the repository:
git clone https://github.com/2noise/ChatTTS
cd ChatTTS
Then, install the required packages. You can do this directly or using a conda environment:
1. Install Directly:
pip install --upgrade -r requirements.txt
2. Install in a Conda Environment:
conda create -n chattts python=3.11
conda activate chattts
pip install -r requirements.txt
Alternatively, you can install ChatTTS from PyPI or directly from GitHub:
Install from PyPI (stable version):
pip install ChatTTS
Install the latest version from GitHub:
pip install git+https://github.com/2noise/ChatTTS
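After any of the install routes above, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper names `is_importable` and `installed_version` are hypothetical and not part of ChatTTS. Note that `installed_version` only reports something for pip-installed distributions, so a checkout that merely ran `pip install -r requirements.txt` may be importable from the repository directory while reporting no version.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name):
    """Return True if the module can be located by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "ChatTTS" is the name used by both `import ChatTTS` and `pip install ChatTTS`.
print("importable:", is_importable("ChatTTS"))
print("version:", installed_version("ChatTTS"))
```

If the first line prints `False`, the most likely cause is that a different Python environment is active than the one used for installation.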
Examples
ChatTTS provides various ways to generate speech, from a user-friendly WebUI to command-line inference and direct Python integration.
Quick Start:
1. Launch WebUI:
python examples/web/webui.py
2. Infer by Command Line:
python examples/cmd/run.py "Your text 1." "Your text 2."
This will save audio to ./output_audio_n.mp3.
Basic Usage (Python):
import ChatTTS
import torch
import torchaudio
chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance
texts = ["PUT YOUR 1st TEXT HERE", "PUT YOUR 2nd TEXT HERE"]
wavs = chat.infer(texts)
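for i, wav in enumerate(wavs):
    # torchaudio versions differ in the tensor shape they accept: try with an
    # added channel dimension first, then fall back to the array as returned.
    try:
        torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wav).unsqueeze(0), 24000)
    except Exception:
        torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wav), 24000)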
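The basic usage above relies on torchaudio to write the file. As an alternative, the sketch below writes a 24 kHz mono WAV file using only Python's standard library. It assumes the samples are floats in the range [-1.0, 1.0]; that range is an assumption here, not something the source states, so values outside it are clipped. The function name `save_wav_pcm16` is hypothetical.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # same rate that the basic usage passes to torchaudio.save


def save_wav_pcm16(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples as 16-bit PCM and return the frame count.

    Assumption: samples lie in [-1.0, 1.0]; anything outside is clipped.
    """
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))
        # Scale to the signed 16-bit range and pack as little-endian.
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
    return len(frames) // 2
```

With the `wavs` list from the basic usage, a call might look like `save_wav_pcm16(f"basic_output{i}.wav", wavs[i].flatten())`, where `flatten()` collapses any channel dimension into one sequence of samples.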
For more advanced control over speaker timbre, temperature, and fine-grained prosodic features like [laugh] or [uv_break], refer to the "Advanced Usage" section in the official GitHub README.
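As a rough illustration of how control tokens such as [uv_break] and [laugh] sit inside the input text, the sketch below assembles a string from separate sentences. The helper `join_with_breaks` is hypothetical and only does string handling. Whether and how the model honors each token depends on the ChatTTS version and inference settings, so treat the README's Advanced Usage section as the authority.

```python
def join_with_breaks(sentences, token="[uv_break]"):
    """Join sentences into one string with a control token between them.

    Empty or whitespace-only entries are dropped so tokens never double up.
    """
    cleaned = [s.strip() for s in sentences if s.strip()]
    return f" {token} ".join(cleaned)


# Build one utterance with a pause between sentences and a laugh at the end.
text = join_with_breaks(["So you came after all.", "I did not expect that."]) + " [laugh]"
print(text)
```

The resulting string can be placed in the `texts` list from the basic usage and passed to `chat.infer` in the same way.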
Why Use ChatTTS?
ChatTTS suits developers and researchers working on conversational AI, virtual assistants, or other applications that need natural, expressive speech. Its optimization for dialogue, combined with fine-grained control over prosody, helps produce engaging, lifelike audio. The project is actively developed, supports English and Chinese, and has community resources such as a Discord server and the Awesome-ChatTTS index.
Links
- GitHub Repository: https://github.com/2noise/ChatTTS
- Hugging Face Models: https://huggingface.co/2Noise/ChatTTS
- Discord Server: https://discord.gg/Ud5Jxgx5yD
- Awesome-ChatTTS (Community Index): https://github.com/libukai/Awesome-ChatTTS/tree/en