ChatTTS: A Generative Speech Model for Natural Dialogue and LLM Assistants

Introduction

ChatTTS is a generative speech model designed for everyday dialogue and conversational scenarios, which makes it particularly useful as the voice of an LLM assistant. It aims for natural, expressive speech synthesis so that interactions sound more human.

Key highlights of ChatTTS include:

Conversational TTS: Optimized for dialogue, it enables natural and expressive speech synthesis, supporting multiple speakers for interactive conversations.
Fine-grained Control: The model can predict and control detailed prosodic features, such as laughter, pauses, and interjections, adding significant realism.
Better Prosody: ChatTTS surpasses many open-source TTS models in terms of prosody. Pretrained models are provided to support further research and development.

Currently, ChatTTS supports English and Chinese, with more languages planned for future releases.

Installation

Getting started with ChatTTS is straightforward. You can clone the repository and install its dependencies, or install the package with pip from PyPI or GitHub.

First, clone the repository:

git clone https://github.com/2noise/ChatTTS
cd ChatTTS

Then install the required packages, either into your current Python environment or into a new conda environment:

1. Install Directly:

pip install --upgrade -r requirements.txt

2. Install in a Conda Environment:

conda create -n chattts python=3.11
conda activate chattts
pip install -r requirements.txt

Alternatively, you can install ChatTTS from PyPI or directly from GitHub:

Install from PyPI (stable version):

pip install ChatTTS

Install the latest version from GitHub:

pip install git+https://github.com/2noise/ChatTTS
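Whichever route you choose, it can help to confirm that Python can actually find the package before running the examples. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of ChatTTS.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "ChatTTS" is the module name used by the import in the usage example.
    status = "found" if is_installed("ChatTTS") else "not found"
    print(f"ChatTTS: {status}")
```

Note that when this is run from inside the cloned repository, the local ChatTTS folder may be found even if the package was never installed with pip.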

Examples

ChatTTS provides various ways to generate speech, from a user-friendly WebUI to command-line inference and direct Python integration.

Quick Start:

1. Launch WebUI:

python examples/web/webui.py

2. Infer by Command Line:

python examples/cmd/run.py "Your text 1." "Your text 2."

This saves one audio file per input text to ./output_audio_n.mp3, where n is the file's index.

Basic Usage (Python):

import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance

texts = ["PUT YOUR 1st TEXT HERE", "PUT YOUR 2nd TEXT HERE"]

wavs = chat.infer(texts)

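for i, wav in enumerate(wavs):
    tensor = torch.from_numpy(wav)
    try:
        # Add a channel dimension; 24000 is the sample rate in Hz.
        torchaudio.save(f"basic_output{i}.wav", tensor.unsqueeze(0), 24000)
    except Exception:
        # Fallback for setups where the first call fails: save the array as returned.
        torchaudio.save(f"basic_output{i}.wav", tensor, 24000)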
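The fallback in the loop above suggests that the waveform may arrive either as a 1-D array or as an array that already has a channel dimension. That reading is an assumption, not something the project states here. Under that assumption, a small helper can normalize the shape up front so that a single save call is enough. The name to_channels_first is hypothetical, and the sketch depends only on NumPy.

```python
import numpy as np


def to_channels_first(wav: np.ndarray) -> np.ndarray:
    """Return the audio as a 2-D (channels, samples) array."""
    wav = np.asarray(wav)
    if wav.ndim == 1:
        # Mono audio without a channel axis: add one in front.
        return wav[np.newaxis, :]
    if wav.ndim == 2:
        # Already (channels, samples): return unchanged.
        return wav
    raise ValueError(f"expected 1-D or 2-D audio, got shape {wav.shape}")
```

With this helper, the body of the loop could become one call such as torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(to_channels_first(wavs[i])), 24000).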

For more advanced control over speaker timbre, temperature, and fine-grained prosodic features like [laugh] or [uv_break], refer to the "Advanced Usage" section in the official GitHub README.
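Control tokens such as [uv_break] are written directly into the input text. As an illustration only, the sketch below joins several sentences into one string with a token between them. The helper join_with_breaks is hypothetical, and whether the model honors tokens placed this way depends on the inference options described in the README's "Advanced Usage" section.

```python
def join_with_breaks(sentences, token="[uv_break]"):
    """Join non-empty sentences into one string with a control token between them."""
    cleaned = [s.strip() for s in sentences if s.strip()]
    return f" {token} ".join(cleaned)


if __name__ == "__main__":
    text = join_with_breaks(["What is your favorite food?", "Mine is pizza."])
    print(text)
```

The resulting string can be passed to chat.infer in place of one of the placeholder texts from the basic example.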

Why Use ChatTTS?

ChatTTS is a good fit for developers and researchers working on conversational AI, virtual assistants, or other applications that need natural, expressive speech. Its dialogue-oriented design and fine-grained prosody control help produce audio that sounds engaging and lifelike. Active development, support for English and Chinese, and community backing add to its appeal as an open-source TTS option.
