OpenTalking
datascale-ai/opentalking

Wav2Lip

Wav2Lip is the recommended first real lip-sync model to try in OpenTalking. It is lighter than the heavyweight talking-head models, which makes it a good fit for moving from the mock backend to real video output and for testing the end-to-end audio-driven video pipeline.

Support Status

| Item | Value |
| --- | --- |
| Model ID | `wav2lip` |
| Backend | `local` / `omnirt` |
| Evidence level | Local adapter is built in; OmniRT compatibility path is documented |
| Best for | First real lip-sync model, lightweight demos, low-cost pipeline validation |
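
As a minimal sketch of how the table above could be turned into a pre-flight check, the snippet below validates a model/backend pair before a deployment is started. Only the values `wav2lip`, `local`, and `omnirt` come from this page; the names `SUPPORTED_BACKENDS` and `check_backend` are hypothetical and are not part of OpenTalking.

```python
# Values copied from the Support Status table. The helper itself is a
# hypothetical illustration, not an OpenTalking API.
SUPPORTED_BACKENDS = {"wav2lip": {"local", "omnirt"}}


def check_backend(model_id: str, backend: str) -> None:
    """Raise ValueError if the model/backend pair is not listed on this page."""
    allowed = SUPPORTED_BACKENDS.get(model_id)
    if allowed is None:
        raise ValueError(f"unknown model ID: {model_id!r}")
    if backend not in allowed:
        options = ", ".join(sorted(allowed))
        raise ValueError(f"{model_id!r} supports {options}; got {backend!r}")


check_backend("wav2lip", "omnirt")  # a listed pair passes silently
```

Failing early on an unlisted pair keeps a typo in a backend name from surfacing later as a harder-to-read runtime error.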

Benchmark Reference

The numbers below are summarized from the Benchmark page. Steady FPS measures model-generation throughput only, not the latency a user perceives end to end; STT, LLM, TTS, queueing, and WebRTC all add to the complete experience.

| Hardware | Backend | Output | Steady FPS | First-turn total (ms) | TTFV (ms) | Peak inference VRAM (GB) |
| --- | --- | --- | --- | --- | --- | --- |
| RTX 3090 | OmniRT | 498×832 / 30fps | 37.269 | 3002.526 | 1625.962 | 7.928 |
| RTX 4090 | OmniRT | 498×832 / 30fps | 31.542 | 3689.764 | 1955.629 | 8.133 |
| NPU 910B2 | OmniRT | 498×832 / 30fps | 23.945 | 4019.564 | 2615.322 | 9.113 |
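
One way to read the Steady FPS column is to compare it with the 30 fps output rate. The sketch below does that arithmetic for the rows above, assuming the `30fps` in the Output column is the playback target; the names `BENCHMARKS` and `realtime_factor` are illustrative only, and the result says nothing about the other pipeline stages.

```python
# (hardware, steady FPS) pairs copied from the benchmark table above.
BENCHMARKS = [
    ("RTX 3090", 37.269),
    ("RTX 4090", 31.542),
    ("NPU 910B2", 23.945),
]

TARGET_FPS = 30.0  # assumption: the output rate listed in the table


def realtime_factor(steady_fps: float, target_fps: float = TARGET_FPS) -> float:
    """Generation throughput divided by playback rate; >= 1.0 keeps up."""
    return steady_fps / target_fps


for hardware, fps in BENCHMARKS:
    factor = realtime_factor(fps)
    status = "keeps up with" if factor >= 1.0 else "falls behind"
    print(f"{hardware}: {factor:.2f}x, {status} {TARGET_FPS:.0f} fps playback")
```

Under that assumption, a factor below 1.0 means generation alone cannot sustain the listed output rate, before any STT, LLM, TTS, or transport overhead is counted.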

Choose a Deployment Mode

| Mode | Best for | Entry |
| --- | --- | --- |
| Local | Single-machine deployment, minimal moving parts, first real lip-sync validation | Wav2Lip Local Deployment |
| OmniRT | Isolated inference service, OmniRT preloading, and device configuration | Wav2Lip OmniRT Deployment |

When to Choose Another Model

- For lower-latency real-time speaking, see QuickTalk.
- For higher quality or official MuseTalk preprocessing, see MuseTalk.
- For a heavyweight, high-quality private deployment, see FlashTalk.
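
The guidance on this page can be sketched as a small lookup. This is an illustration of the decision only: `Need`, `RECOMMENDATION`, and `suggest_model` are hypothetical names, and the strings are display names rather than confirmed model IDs (only `wav2lip` is given as an ID on this page).

```python
from enum import Enum


class Need(Enum):
    """Requirements named on this page, one per recommendation."""
    FIRST_REAL_LIPSYNC = "first real lip-sync validation"
    LOWER_LATENCY = "lower-latency real-time speaking"
    HIGHER_QUALITY = "higher quality or official MuseTalk preprocessing"
    HEAVYWEIGHT_PRIVATE = "heavyweight high-quality private deployment"


# Mapping taken from this page's guidance; values are display names.
RECOMMENDATION = {
    Need.FIRST_REAL_LIPSYNC: "Wav2Lip",
    Need.LOWER_LATENCY: "QuickTalk",
    Need.HIGHER_QUALITY: "MuseTalk",
    Need.HEAVYWEIGHT_PRIVATE: "FlashTalk",
}


def suggest_model(need: Need) -> str:
    """Return the model page to read for a given requirement."""
    return RECOMMENDATION[need]


print(suggest_model(Need.LOWER_LATENCY))
```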