# Talking-Head Models
This page turns the talking-head backend abstraction into runnable model paths. It covers where weights belong, how to download them from international and China-friendly sources, how to start each backend, and how to verify that OpenTalking can create a session.
OpenTalking is the orchestration layer; the execution backend is selected per model:
| Model | Backend status | Recommended first path | Weight requirement |
|---|---|---|---|
| `mock` | `mock` | Built-in self-test | None |
| `wav2lip` | `omnirt` for compatibility; local-first target | Lightweight local or direct backend; OmniRT is the current runnable compatibility path | Wav2Lip + S3FD checkpoints |
| `musetalk` | `omnirt` | OmniRT or a future local adapter | MuseTalk 1.5 weights |
| `quicktalk` | `local` | Local adapter | QuickTalk hdModule asset bundle |
| `flashtalk` | `omnirt` | OmniRT on CUDA or Ascend | SoulX-FlashTalk-14B + wav2vec2 |
| `flashhead` | `direct_ws` | External FlashHead WebSocket service | Managed by the FlashHead service |
## Shared layout
Use one parent directory for OpenTalking, optional backend services, models, logs, and
runtime files. OmniRT is needed only for models configured with `backend: omnirt`.
```bash
export DIGITAL_HUMAN_HOME="$HOME/digital-human"
export OMNIRT_MODEL_ROOT="$DIGITAL_HUMAN_HOME/models"
mkdir -p "$DIGITAL_HUMAN_HOME" "$OMNIRT_MODEL_ROOT"
cd "$DIGITAL_HUMAN_HOME"
```
Expected layout:
```text
$DIGITAL_HUMAN_HOME/
├── opentalking/
├── omnirt/                     # optional, for backend: omnirt
├── models/
│   ├── wav2lip/
│   ├── SoulX-FlashTalk-14B/
│   ├── chinese-wav2vec2-base/
│   └── quicktalk/
├── logs/
└── run/
```
Install OpenTalking first:
```bash
git clone https://github.com/datascale-ai/opentalking.git
cd opentalking
uv sync --extra dev --python 3.11
source .venv/bin/activate
cp .env.example .env
```
Set at least the LLM/STT credentials in `.env`. The variable names below are hypothetical; copy the real keys from `.env.example`:
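```bash
# Hypothetical names -- copy the real keys from .env.example.
LLM_API_KEY=sk-...
STT_API_KEY=...
```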
## Download tools
International environments can use Hugging Face directly:
```bash
uv pip install -U "huggingface_hub[cli]"
hf auth login  # optional, required for gated/private models
```
China-friendly environments can use ModelScope when the model is mirrored there:
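Install the CLI first (the PyPI package is `modelscope`):

```bash
uv pip install -U modelscope
```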
ModelScope examples:
```bash
# Snapshot-style download.
modelscope download --model <namespace>/<model> --local_dir "$OMNIRT_MODEL_ROOT/<target>"

# Python fallback when the CLI version differs.
python - <<'PY'
from modelscope.hub.snapshot_download import snapshot_download
snapshot_download("<namespace>/<model>", local_dir="<target-dir>")
PY
```
MagicLego/Modelers mirrors are also useful in China when a community or vendor mirror exists. Use the Git/LFS instructions provided by the model page, and keep the same target directory names used below.
## Mock
`mock` is the fastest end-to-end path. It exercises the API, frontend, LLM, STT, TTS,
events, and WebRTC without model weights.
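Start the stack; `mock` needs no OmniRT endpoint, so the plain invocation below is assumed to be enough:

```bash
cd "$DIGITAL_HUMAN_HOME/opentalking"
bash scripts/quickstart/start_all.sh
```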
Open http://127.0.0.1:5173, select `demo-avatar`, then select `mock`.
Verify:
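```bash
curl -s http://127.0.0.1:8000/models | jq '.statuses[] | select(.id=="mock")'
```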
Expected status: the `mock` entry reports `connected=true` with no failure `reason`.
## Wav2Lip
Wav2Lip is the recommended first real model because it is lightweight and easy to
debug. The product default should be local or a single-model direct backend, not a
mandatory OmniRT dependency. The current release keeps `backend: omnirt` as a
compatibility default because the bundled local Wav2Lip adapter is not complete yet;
the steps below are the runnable compatibility path.
### 1. Download weights
Primary Hugging Face sources:
```bash
mkdir -p "$OMNIRT_MODEL_ROOT/wav2lip"
hf download Pypa/wav2lip384 \
  wav2lip384.pth \
  --local-dir "$OMNIRT_MODEL_ROOT/wav2lip"
hf download rippertnt/wav2lip \
  s3fd.pth \
  --local-dir "$OMNIRT_MODEL_ROOT/wav2lip"
```
China-friendly options:
- Search ModelScope for wav2lip384
- Search ModelScope for s3fd wav2lip
- Search Modelers for wav2lip384
Keep the final files in:
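```text
$OMNIRT_MODEL_ROOT/wav2lip/wav2lip384.pth
$OMNIRT_MODEL_ROOT/wav2lip/s3fd.pth
```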
### 2. Choose the backend
Recommended target deployment:
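```bash
# Target state once the bundled local adapter ships.
export OPENTALKING_WAV2LIP_BACKEND=local
```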
Current runnable compatibility path:
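```bash
# The omnirt value is assumed to mirror the backend names in the table above;
# OMNIRT_ENDPOINT is the variable named in Troubleshooting.
export OPENTALKING_WAV2LIP_BACKEND=omnirt
export OMNIRT_ENDPOINT=http://127.0.0.1:9000
```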
If you set `OPENTALKING_WAV2LIP_BACKEND=local` before installing a local adapter,
`/models` intentionally reports `connected=false` with `reason=local_adapter_missing`.
This is expected and prevents a silent fallback to OmniRT.
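You can confirm the guard with the same `/models` probe used throughout this page:

```bash
curl -s http://127.0.0.1:8000/models \
  | jq '.statuses[] | select(.id=="wav2lip") | {connected, reason}'
```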
### 3. Prepare OmniRT for the compatibility path
cd "$DIGITAL_HUMAN_HOME"
git clone https://github.com/datascale-ai/omnirt.git
cd omnirt
uv sync --extra server --python 3.11
### 4. Start Wav2Lip through OmniRT
CUDA:
cd "$DIGITAL_HUMAN_HOME/opentalking"
bash scripts/quickstart/start_omnirt_wav2lip.sh --device cuda
Ascend:
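```bash
# Assumes the Wav2Lip helper accepts the same --device npu flag as the
# FlashTalk helper later on this page.
source /usr/local/Ascend/ascend-toolkit/set_env.sh
bash scripts/quickstart/start_omnirt_wav2lip.sh --device npu
```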
### 5. Start OpenTalking
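Point OpenTalking at the OmniRT endpoint started above:

```bash
cd "$DIGITAL_HUMAN_HOME/opentalking"
bash scripts/quickstart/start_all.sh --omnirt http://127.0.0.1:9000
```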
Verify:
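```bash
curl -s http://127.0.0.1:8000/models | jq '.statuses[] | select(.id=="wav2lip")'
```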
Expected when OmniRT reports Wav2Lip: the `wav2lip` entry shows `connected=true` with no failure `reason`.
Then select a Wav2Lip avatar such as `singer`, `office-woman`, or `laozi`.
## MuseTalk 1.5
MuseTalk is configured as a pluggable model, but this repository currently provides the backend framework rather than a bundled local MuseTalk runtime. Use one of these paths:
- `backend: omnirt` when OmniRT serves `musetalk` through `/v1/audio2video/musetalk`.
- `backend: direct_ws` when you run a standalone MuseTalk-compatible WebSocket service.
- `backend: local` only after adding a local adapter under `opentalking/models/musetalk/`.
Primary upstream sources:
- TMElyralab/MuseTalk
- MuseTalk 1.5 on Hugging Face
- Search ModelScope for MuseTalk
- Search Modelers for MuseTalk
Example OmniRT configuration (the sketch below is illustrative only; consult the OmniRT documentation for the actual schema):
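```yaml
# Hypothetical shape -- check the OmniRT documentation for the real key names.
models:
  musetalk:
    weights: ${OMNIRT_MODEL_ROOT}/musetalk
    route: /v1/audio2video/musetalk
```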
Start OpenTalking against an OmniRT instance that already serves MuseTalk:
```bash
bash scripts/quickstart/start_all.sh --omnirt http://127.0.0.1:9000
curl -s http://127.0.0.1:8000/models | jq '.statuses[] | select(.id=="musetalk")'
```
If you intentionally test a local adapter before implementing it:
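```bash
# Variable name assumed by analogy with OPENTALKING_WAV2LIP_BACKEND above.
export OPENTALKING_MUSETALK_BACKEND=local
```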
Expected failure mode: `/models` reports `connected=false` with `reason=local_adapter_missing`, the same guard described for Wav2Lip above.
## QuickTalk
QuickTalk is the reference local adapter. It does not use OmniRT. The adapter imports
from `opentalking/models/quicktalk/` and loads a QuickTalk asset bundle at runtime.
Required asset shape:
```text
$OMNIRT_MODEL_ROOT/quicktalk/hdModule/
└── checkpoints/
    ├── 256.onnx
    ├── repair.npy
    ├── chinese-hubert-large/
    └── auxiliary_min/ or auxiliary/
```
Avatar metadata must point to both the QuickTalk asset root and a template video:
```json
{
  "id": "quicktalk-demo",
  "name": "QuickTalk Demo",
  "model_type": "quicktalk",
  "fps": 25,
  "sample_rate": 16000,
  "width": 512,
  "height": 512,
  "metadata": {
    "asset_root": "/absolute/path/to/models/quicktalk/hdModule",
    "template_video": "/absolute/path/to/template.mp4"
  }
}
```
Runtime environment:
```bash
OPENTALKING_QUICKTALK_ASSET_ROOT=/absolute/path/to/models/quicktalk/hdModule
OPENTALKING_QUICKTALK_TEMPLATE_VIDEO=/absolute/path/to/template.mp4
OPENTALKING_QUICKTALK_WORKER_CACHE=1
OPENTALKING_TORCH_DEVICE=cuda:0
```
Start:
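```bash
# QuickTalk runs through the local adapter; no --omnirt flag is assumed to be needed.
cd "$DIGITAL_HUMAN_HOME/opentalking"
bash scripts/quickstart/start_all.sh
```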
Verify:
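```bash
curl -s http://127.0.0.1:8000/models | jq '.statuses[] | select(.id=="quicktalk")'
```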
Expected: the `quicktalk` entry reports `connected=true` through the local adapter.
## FlashTalk
FlashTalk is the high-quality path. It is heavier than Wav2Lip and is best deployed through OmniRT on a dedicated GPU/NPU host.
### 1. Download weights
Primary Hugging Face sources:
```bash
hf download Soul-AILab/SoulX-FlashTalk-14B \
  --local-dir "$OMNIRT_MODEL_ROOT/SoulX-FlashTalk-14B"
hf download TencentGameMate/chinese-wav2vec2-base \
  --local-dir "$OMNIRT_MODEL_ROOT/chinese-wav2vec2-base"
```
China-friendly options:
- Search ModelScope for SoulX-FlashTalk-14B
- Search ModelScope for chinese-wav2vec2-base
- Search Modelers for SoulX-FlashTalk-14B
Optional source checkout used by the CUDA helper:
```bash
git clone https://github.com/Soul-AILab/SoulX-FlashTalk.git \
  "$OMNIRT_MODEL_ROOT/SoulX-FlashTalk"
```
### 2. Start FlashTalk through OmniRT
CUDA single process:
cd "$DIGITAL_HUMAN_HOME/opentalking"
bash scripts/quickstart/start_omnirt_flashtalk.sh --device cuda --nproc 1
Ascend multi-process:
```bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
bash scripts/quickstart/start_omnirt_flashtalk.sh --device npu --nproc 8
```
The helper starts the FlashTalk worker service, points OmniRT at it, and exposes
OpenTalking-compatible audio2video routes on port 9000.
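Before starting OpenTalking, you can confirm the routes are live with the same endpoint used in Troubleshooting:

```bash
curl http://127.0.0.1:9000/v1/audio2video/models
```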
### 3. Start OpenTalking
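Point OpenTalking at the OmniRT routes on port 9000:

```bash
cd "$DIGITAL_HUMAN_HOME/opentalking"
bash scripts/quickstart/start_all.sh --omnirt http://127.0.0.1:9000
```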
Verify:
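```bash
curl -s http://127.0.0.1:8000/models | jq '.statuses[] | select(.id=="flashtalk")'
```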
Expected: the `flashtalk` entry reports `connected=true` once OmniRT lists the model.
A legacy direct WebSocket fallback remains available for existing deployments.
Use the explicit `direct_ws` backend for new single-model services:
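```bash
# Variable name follows the OPENTALKING_<MODEL>_WS_URL pattern from
# Troubleshooting; the host, port, and route depend on your worker service.
OPENTALKING_FLASHTALK_WS_URL=ws://<flashtalk-host>:<port>/<route>
```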
## FlashHead
FlashHead uses a model-specific WebSocket protocol, so OpenTalking treats it as
`backend: direct_ws`. Start the FlashHead service separately, then point OpenTalking
at its realtime endpoint.
Upstream/project links:
- Search Hugging Face for SoulX FlashHead
- Search ModelScope for FlashHead
- Search Modelers for FlashHead
OpenTalking configuration:
```bash
OPENTALKING_FLASHHEAD_WS_URL=ws://<flashhead-host>:8766/v1/avatar/realtime
OPENTALKING_FLASHHEAD_BASE_URL=http://<flashhead-host>:8766
OPENTALKING_FLASHHEAD_MODEL=soulx-flashhead-1.3b
```
YAML equivalent (key names are assumed from the environment variables above; verify them against the shipped configuration):
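```yaml
# Key names assumed to mirror the OPENTALKING_FLASHHEAD_* variables above.
flashhead:
  backend: direct_ws
  ws_url: ws://<flashhead-host>:8766/v1/avatar/realtime
  base_url: http://<flashhead-host>:8766
  model: soulx-flashhead-1.3b
```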
Start OpenTalking:
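```bash
# FlashHead is a direct_ws model; the --omnirt flag is assumed unnecessary here.
cd "$DIGITAL_HUMAN_HOME/opentalking"
bash scripts/quickstart/start_all.sh
```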
Verify:
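```bash
curl -s http://127.0.0.1:8000/models | jq '.statuses[] | select(.id=="flashhead")'
```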
Expected when the WebSocket URL is configured: the `flashhead` entry reports `connected=true` instead of `reason=not_configured`.
Use an avatar whose manifest has `model_type: "flashhead"`, such as `anchor`.
## Common verification
Check OpenTalking:
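```bash
curl -s http://127.0.0.1:8000/models | jq '.statuses[]'
```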
Check OmniRT-backed models:
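```bash
curl http://127.0.0.1:9000/v1/audio2video/models
```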
Start the UI:
```bash
bash scripts/quickstart/start_all.sh --omnirt http://127.0.0.1:9000
open http://127.0.0.1:5173
```
## Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| `reason=not_configured` | Required endpoint or WebSocket URL is empty. | Set `OMNIRT_ENDPOINT` for `omnirt` models, or `OPENTALKING_<MODEL>_WS_URL` for `direct_ws`. |
| `reason=omnirt_unavailable` | OmniRT is reachable but does not report the selected model. | Check `curl http://127.0.0.1:9000/v1/audio2video/models`, model root paths, and OmniRT logs. |
| `reason=local_adapter_missing` | The model is configured as `local` but no adapter is registered. | Add `opentalking/models/<name>/adapter.py` and register it, or switch the backend to `omnirt`/`direct_ws`. |
| Wav2Lip helper reports missing checkpoints | Files are not under `$OMNIRT_MODEL_ROOT/wav2lip/`. | Move or re-download `wav2lip384.pth` and `s3fd.pth`. |
| FlashTalk helper reports missing directories | FlashTalk or wav2vec2 weights are missing. | Ensure `$OMNIRT_MODEL_ROOT/SoulX-FlashTalk-14B/` and `$OMNIRT_MODEL_ROOT/chinese-wav2vec2-base/` exist. |
| Browser shows the model but session creation fails | The avatar's `model_type` does not match the selected model. | Select an avatar whose manifest matches the model, or prepare a matching avatar bundle. |