OpenTalking

datascale-ai/opentalking

Speech Models

This section collects the deployment, weight download, and verification instructions for the speech models used by OpenTalking. Speech models fall into two groups:

- Speech Recognition Models: convert microphone or uploaded audio into text; locally deployable models include SenseVoice.
- Speech Generation Models: convert LLM text output into audio; locally deployable models include CosyVoice, IndexTTS, and Qwen3-TTS.

The LLM decides what to say and is not classified as a speech model; this section covers input recognition and output synthesis.
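
The division of roles described above can be pictured as a three-stage pipeline. The following is a minimal conceptual sketch, not OpenTalking's actual API: `SpeechPipeline`, the three type aliases, and the `fake_*` stand-in functions are all hypothetical names introduced for illustration only. It shows where recognition and synthesis sit relative to the LLM.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; none of these names come from OpenTalking.
Recognizer = Callable[[bytes], str]   # speech recognition: audio -> text
Responder = Callable[[str], str]      # LLM: text -> reply text (not a speech model)
Synthesizer = Callable[[str], bytes]  # speech generation: text -> audio


@dataclass
class SpeechPipeline:
    """Chains recognition, the LLM, and synthesis in that order."""

    recognize: Recognizer
    respond: Responder
    synthesize: Synthesizer

    def run(self, audio_in: bytes) -> bytes:
        text_in = self.recognize(audio_in)  # input recognition
        reply = self.respond(text_in)       # the LLM decides what to say
        return self.synthesize(reply)       # output synthesis


# Stand-in stages so the sketch runs without any model weights.
# They treat "audio" as UTF-8 bytes purely for demonstration.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechPipeline(fake_recognize, fake_respond, fake_synthesize)
audio_out = pipeline.run(b"hello")
```

In a real deployment, a recognition model such as SenseVoice would sit behind the first stage and a generation model such as CosyVoice, IndexTTS, or Qwen3-TTS behind the third; how OpenTalking actually wires them together is covered in the per-model pages, not by this sketch.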