CosyVoice is a multilingual large voice generation model that provides a full-stack solution for training, inference, and deployment of high-quality TTS systems. It supports multiple languages, including Chinese, English, Japanese, and Korean, as well as a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese.

The model is designed for zero-shot voice cloning and for cross-lingual and mixed-lingual scenarios, so a single reference voice can be used to synthesize speech across languages and in code-switching contexts. CosyVoice 2.0 improves substantially on version 1.0 in accuracy, stability, speed, and overall speech quality, making it better suited to production environments.

The repository contains training recipes, inference pipelines, deployment scripts, and integration examples, making it a comprehensive toolkit rather than just a set of model weights.
## Features
- Multilingual TTS with support for major languages and many Chinese dialects
- Zero-shot voice cloning, including cross-lingual and code-switching speech synthesis
- CosyVoice 2.0 architecture with higher accuracy, greater stability, and faster generation than version 1.0
- End-to-end recipes for training, inference, and deployment in real applications
- Integration examples with other FunAudioLLM components for emotional voice chat and complex audio agents
- Open-source code and models that can run on standard GPU hardware with documented requirements
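The zero-shot cloning workflow above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the model directory, asset paths, and constructor flags shown here are assumptions that should be checked against the repository's current documentation, and the pretrained weights must be downloaded separately before the script can run.

```python
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

# Load a pretrained CosyVoice 2.0 model; the directory path is illustrative
# and assumes the weights have already been downloaded locally.
cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B',
                       load_jit=False, load_trt=False, fp16=False)

# A short 16 kHz reference recording of the target voice (hypothetical path).
prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)

# Zero-shot cloning: synthesize new text in the reference speaker's voice,
# given the reference audio and its transcript. Streaming is disabled here,
# so each yielded chunk is a complete utterance tensor.
for i, out in enumerate(cosyvoice.inference_zero_shot(
        'Text to synthesize in the cloned voice.',
        'Transcript of the reference recording.',
        prompt_speech_16k,
        stream=False)):
    torchaudio.save(f'zero_shot_{i}.wav', out['tts_speech'], cosyvoice.sample_rate)
```

Because the same reference clip and transcript are passed regardless of the target text's language, this same call covers the cross-lingual and code-switching scenarios described above.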