Models

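This module explains how to make the full OpenTalking model chain runnable, not only the talking-head backend. A usable digital-human session depends on five model-side parts (speech recognition, LLM, TTS, avatar assets, and the talking-head backend), with WebRTC delivering the resulting video to the browser: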

flowchart LR
    STT[Speech recognition<br/>optional voice input]
    LLM[LLM<br/>decides what to say]
    TTS[TTS<br/>text to audio]
    Avatar[Avatar assets<br/>image / frames / template]
    Head[Talking-head backend<br/>audio to video]
    WebRTC[WebRTC<br/>browser delivery]

    STT --> LLM --> TTS --> Head --> WebRTC
    Avatar --> Head

Recommended defaults

Layer	Default for first run	When to change it
LLM	DashScope OpenAI-compatible endpoint	Use OpenAI, vLLM, Ollama, or DeepSeek when those are already standard in your environment.
STT	DashScope Paraformer realtime	Keep it unless you need a different realtime STT provider.
TTS	Edge TTS	Use DashScope, CosyVoice, or ElevenLabs for production voices and voice cloning.
Avatar assets	Built-in examples	Prepare model-specific assets before selecting Wav2Lip, QuickTalk, FlashHead, or FlashTalk.
Talking-head backend	`mock` first, then the Wav2Lip local path	Use QuickTalk local/OmniRT, FlashTalk through OmniRT, FlashHead direct WS, or another model service.

Setup order

1. Run Quickstart with `mock`.
2. Check the Support Matrix to choose the right path.
3. Configure LLM and STT.
4. Choose and verify TTS.
5. Prepare Avatar assets.
6. Start a talking-head model.
7. Verify `/models`, create a session, and test through the browser.
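The `/models` check in the last step can be scripted. The following is a minimal sketch, not the project's own tooling. It assumes the API listens on port 8000 (the value passed as `--api-port` in the frontend command on this page) and that `/models` is served at the root of that port. The variable and function names are illustrative, and the response format is not described here, so the sketch only checks that the request succeeds.

```shell
# Assumed values: adjust host and port to match your API server.
API_HOST="${API_HOST:-127.0.0.1}"
API_PORT="${API_PORT:-8000}"

# Build the URL of the /models endpoint (path taken from the setup steps).
models_url() {
  printf 'http://%s:%s/models\n' "$API_HOST" "$API_PORT"
}

# Fetch the model list. A non-zero exit status means the API is not
# reachable or answered with an HTTP error (-f), within 5 seconds.
check_models() {
  curl -fsS --max-time 5 "$(models_url)"
}
```

After starting the API, running `check_models && echo "API reachable"` gives a quick pass/fail signal before you create a session.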

Model Shortcuts

Goal	Entry
End-to-end self-test with no weights	Mock
First real lip-sync model	Wav2Lip Local
Local STT/TTS + QuickTalk	Local STT/TTS + QuickTalk
V100 single-host FasterLivePortrait + FlashHead	V100 + FasterLivePortrait + FlashHead
Existing MuseTalk runtime	MuseTalk with OmniRT
Local realtime adapter	QuickTalk Local
Single-GPU realtime portrait with pasteback	FasterLivePortrait
High-quality heavy model	FlashTalk
Standalone FlashHead service	FlashHead

Keep model execution decoupled from OpenTalking itself: lightweight models should use the `local` or `direct_ws` path where possible, while OmniRT remains the recommended backend for heavyweight, multi-card, remote, or NPU deployments.
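The rule of thumb above can be written down as a small lookup. This is only an illustration: `recommend_backend` and the profile names are hypothetical and are not part of OpenTalking.

```shell
# Hypothetical helper: map a deployment profile to the execution path
# recommended in the text above.
recommend_backend() {
  case "$1" in
    lightweight) echo "local or direct_ws" ;;
    heavyweight|multi-card|remote|npu) echo "omnirt" ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}
```

For example, `recommend_backend npu` prints `omnirt`, and an unrecognized profile returns a non-zero status.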

Frontend Entry

After the model or backend service is running, use the OpenTalking WebUI:

Terminal

cd "$OPENTALKING_HOME"
bash scripts/quickstart/start_frontend.sh --api-port 8000 --web-port 5173 --host 0.0.0.0

For a remote server, forward a local port to port 5173 on the server (for example with SSH port forwarding), then open http://127.0.0.1:5173 in your local browser.
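One common way to set up that forwarding is OpenSSH local port forwarding. The sketch below only prints the command so you can review it before running it. `user@remote-server` is a placeholder, not a real host, and the helper name is illustrative.

```shell
# Placeholder (assumption): replace REMOTE with your own SSH login.
REMOTE="${REMOTE:-user@remote-server}"
# Web port used by start_frontend.sh in the command above.
WEB_PORT="${WEB_PORT:-5173}"

# Print the ssh command that maps local WEB_PORT to the same port on the server.
# -N: do not run a remote command; -L: local port forwarding.
forward_cmd() {
  printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$WEB_PORT" "$WEB_PORT" "$REMOTE"
}

forward_cmd
```

Run the printed command on your local machine and keep it open while you use the WebUI.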