MuseTalk WebSocket (FlashTalk protocol compatible)
OmniRT exposes `model_backends/musetalk/musetalk_ws_server.py`, which speaks the same WebSocket protocol as FlashTalk / Wav2Lip WS (JSON init / init_ok, binary AUDI chunks, VIDX JPEG frames). In OpenTalking, set `OPENTALKING_FLASHTALK_MODE=remote` and point `OPENTALKING_FLASHTALK_WS_URL` at this service; no client protocol changes are needed.
Inference and MuseTalk source loading run on the OmniRT side. OpenTalking only selects the model, orchestrates sessions, and connects to this service. The official MuseTalk source is managed by omnirt runtime install musetalk --device <cuda|npu|cpu>.
Directory layout, weights, and troubleshooting: model_backends/musetalk/README.md.
Wiring OpenTalking
| Setting | Notes |
|---|---|
| `OPENTALKING_FLASHTALK_MODE` | `remote` |
| `OPENTALKING_FLASHTALK_WS_URL` | e.g. `ws://<host>:8766`; 8766 is the usual MuseTalk port (8765 is common for Wav2Lip) |
| Default model / avatar | Configure the remote FlashTalk session path; frames come from this MuseTalk service |
Note: configs/default.yaml may override .env. If the client still does not hit the remote WS, verify flashtalk.mode and the URL.
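The wiring above can be sketched as a short shell check. This is only an illustration: `ws_host` and `ws_port` are hypothetical helpers (not part of OpenTalking or OmniRT), `127.0.0.1` is a placeholder host, and the `grep` simply shows any `flashtalk` lines in `configs/default.yaml` so an override is easy to spot.

```shell
# Hypothetical helpers: split a ws://host:port URL into host and port.
ws_host() { printf '%s\n' "$1" | sed -E 's#^wss?://([^:/]+).*#\1#'; }
ws_port() { printf '%s\n' "$1" | sed -nE 's#^wss?://[^:/]+:([0-9]+).*#\1#p'; }

export OPENTALKING_FLASHTALK_MODE=remote
export OPENTALKING_FLASHTALK_WS_URL="ws://127.0.0.1:8766"  # placeholder host

echo "mode=$OPENTALKING_FLASHTALK_MODE" \
     "host=$(ws_host "$OPENTALKING_FLASHTALK_WS_URL")" \
     "port=$(ws_port "$OPENTALKING_FLASHTALK_WS_URL")"

# configs/default.yaml may override .env, so show what it says as well.
if [ -f configs/default.yaml ]; then
    grep -n -i 'flashtalk' configs/default.yaml || true
fi
```

If the printed port is not the one the MuseTalk launcher listens on, fix the URL before debugging anything else.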
Huawei Ascend (Ascend 910 / CANN): recommended path
Server-side UNet / VAE / Whisper run through torch_npu (actual device follows OMNIRT_MUSETALK_DEVICE).
1. Host and drivers
- CANN toolkit + drivers installed; `npu-smi` healthy.
- CANN shared libraries such as `libhccl.so` must be loadable before Python starts, or the `torch_npu` import fails.
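Both host checks can be scripted before touching Python. The sketch below assumes the CANN libraries are exposed through `LD_LIBRARY_PATH` (which is what sourcing `set_env.sh` normally arranges); `find_lib` is a hypothetical helper written for this example.

```shell
# find_lib NAME PATHLIST: print the first directory in a colon-separated
# list that contains NAME; return 1 if none does.
find_lib() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Driver check: print device status if npu-smi is installed.
if command -v npu-smi >/dev/null 2>&1; then
    npu-smi info
else
    echo "npu-smi not found on PATH" >&2
fi

# Library check against the current loader path.
if lib_dir=$(find_lib libhccl.so "${LD_LIBRARY_PATH:-}"); then
    echo "libhccl.so found in $lib_dir"
else
    echo "libhccl.so not on LD_LIBRARY_PATH; source set_env.sh first" >&2
fi
```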
2. Environment (set_env.sh)
Do not rely on pip-only PyTorch for NPU. Source CANN's script first, for example `source /usr/local/Ascend/ascend-toolkit/set_env.sh` (adjust the path to your install).
`bash scripts/start_musetalk_ws.sh` searches for:
- `OMNIRT_MUSETALK_ENV_SCRIPT` if set
- `/usr/local/Ascend/ascend-toolkit/set_env.sh`
- `${ASCEND_TOOLKIT_HOME}/set_env.sh`
- `.../ascend-toolkit/latest/set_env.sh`
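A lookup of this kind can be sketched as follows. This is not the launcher's actual code: `pick_env_script` is a hypothetical helper, the candidates are assumed to be tried in the order listed, and the last candidate is left out because its path is abbreviated in this page.

```shell
# pick_env_script CANDIDATE...: print the first candidate that is a
# readable file; return 1 if none is.
pick_env_script() {
    for candidate in "$@"; do
        if [ -n "$candidate" ] && [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if env_script=$(pick_env_script \
        "${OMNIRT_MUSETALK_ENV_SCRIPT:-}" \
        /usr/local/Ascend/ascend-toolkit/set_env.sh \
        "${ASCEND_TOOLKIT_HOME:-}/set_env.sh"); then
    echo "sourcing $env_script"
    . "$env_script"
else
    echo "no CANN set_env.sh found; set OMNIRT_MUSETALK_ENV_SCRIPT" >&2
fi
```

Setting `OMNIRT_MUSETALK_ENV_SCRIPT` explicitly is the simplest way to avoid a path mismatch on hosts with a non-default toolkit location.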
Visibility defaults match FlashTalk / Wav2Lip launchers (cards 0–7). For single-card, export ASCEND_RT_VISIBLE_DEVICES=0 before launch.
3. Python venv and dependencies
Create a dedicated venv and install Huawei wheels matching your CANN version:
```shell
python3 -m venv /path/to/venvs/omnirt-musetalk-ascend
source /path/to/venvs/omnirt-musetalk-ascend/bin/activate
source /usr/local/Ascend/ascend-toolkit/set_env.sh # adjust path
export PIP_EXTRA_INDEX_URL=<Huawei Ascend wheel index per your cluster docs>
pip install -r model_backends/musetalk/requirements-musetalk-ascend.txt
```
Align torch / torchvision / torch-npu with model_backends/flashtalk/requirements-ascend.txt and wav2lip/requirements-wav2lip-ascend.txt so one venv can serve multiple backends.
Graph compilation may still require attrs, psutil, PyYAML, etc. (listed in the file); add packages if imports fail.
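To see whether the pins actually line up across backends, the requirement files can be compared directly. The sketch below assumes the files use plain `package==version` pins, which this page does not confirm; `pinned_version` is a hypothetical helper.

```shell
# pinned_version PACKAGE FILE: print the version from the first
# "PACKAGE==VERSION" line in FILE (empty output if there is no such pin).
pinned_version() {
    sed -nE "s/^[[:space:]]*$1==([^[:space:];#]+).*/\1/p" "$2" | head -n 1
}

for pkg in torch torchvision torch-npu; do
    for req in model_backends/musetalk/requirements-musetalk-ascend.txt \
               model_backends/flashtalk/requirements-ascend.txt; do
        if [ -f "$req" ]; then
            echo "$pkg $(pinned_version "$pkg" "$req") ($req)"
        fi
    done
done
```

Any package that prints different versions for the two files needs to be reconciled before sharing one venv.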
4. MuseTalk source and weights
- MuseTalk source: run `omnirt runtime install musetalk --device npu`; by default it clones to `${OMNIRT_HOME}/model-repos/MuseTalk`.
- Root directory: default `<omnirt>/models`, controlled by `OMNIRT_MUSETALK_MODELS_DIR`.
- Layout must satisfy MuseTalk v1.5 loading (`musetalk/`, `sd-vae-ft-mse/`, `whisper/tiny.pt`, …); see `model_backends/musetalk/README.md`.
- `sd-vae-ft-mse/` should use the official Hugging Face `stabilityai/sd-vae-ft-mse` Diffusers files: `config.json` plus `diffusion_pytorch_model.safetensors`. `diffusion_pytorch_model.bin` can be used as a fallback, but Diffusers will warn about unsafe serialization.
- `whisper/tiny.pt` must be the official OpenAI `openai-whisper` checkpoint (~72 MB). Do not rename a Hugging Face `pytorch_model.bin` and expect it to work.
Example:
```shell
mkdir -p models/sd-vae-ft-mse
wget -O models/sd-vae-ft-mse/config.json \
  https://huggingface.co/stabilityai/sd-vae-ft-mse/resolve/main/config.json
wget -O models/sd-vae-ft-mse/diffusion_pytorch_model.safetensors \
  https://huggingface.co/stabilityai/sd-vae-ft-mse/resolve/main/diffusion_pytorch_model.safetensors
```
If direct Hugging Face access is restricted, replace the domain with https://hf-mirror.com.
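After downloading, a quick layout check catches the most common mistakes described in this section. The sketch covers only the VAE and Whisper files named here, not the full v1.5 layout; `check_musetalk_models` is a hypothetical helper, and the 50 MB threshold is an assumed heuristic chosen because the official `tiny.pt` is about 72 MB.

```shell
# Assumed heuristic: anything far below ~72 MB is not the official tiny.pt.
MIN_TINY_BYTES=50000000

# check_musetalk_models ROOT: report missing or suspicious files under ROOT;
# return 1 if anything needs fixing.
check_musetalk_models() {
    root=$1
    problems=0
    vae=$root/sd-vae-ft-mse
    if [ ! -f "$vae/config.json" ]; then
        echo "missing: $vae/config.json"; problems=1
    fi
    if [ -f "$vae/diffusion_pytorch_model.safetensors" ]; then
        :
    elif [ -f "$vae/diffusion_pytorch_model.bin" ]; then
        echo "note: only .bin VAE weights present (Diffusers will warn)"
    else
        echo "missing: VAE weights under $vae"; problems=1
    fi
    whisper=$root/whisper/tiny.pt
    if [ ! -f "$whisper" ]; then
        echo "missing: $whisper"; problems=1
    elif [ "$(($(wc -c < "$whisper")))" -lt "$MIN_TINY_BYTES" ]; then
        echo "suspicious: $whisper is too small to be the official checkpoint"
        problems=1
    fi
    return "$problems"
}

check_musetalk_models "${OMNIRT_MUSETALK_MODELS_DIR:-models}" \
    || echo "fix the items above before starting the server" >&2
```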
5. Inference variables (selection)
| Variable | Meaning |
|---|---|
| `OMNIRT_MUSETALK_PYTHON` | Python from the venv above |
| `OMNIRT_MUSETALK_DEVICE` | `auto` (default, prefers NPU), `npu`, or `cpu` |
| `OMNIRT_MUSETALK_NPU_INDEX` | Logical NPU index (default 0) |
| `OMNIRT_MUSETALK_REPO` | MuseTalk source checkout; default `${OMNIRT_HOME}/model-repos/MuseTalk` |
| `OMNIRT_MUSETALK_MODELS_DIR` | Model root |
| `OMNIRT_MUSETALK_MAX_LONG_EDGE` | Max long edge for reference images in init (default 768; 0 disables scaling) |
| `OMNIRT_MUSETALK_PRELOAD` | When 1, loads models before listening |
| `OMNIRT_MUSETALK_DEFAULT_REF_IMAGE` | Local fallback image when init omits `ref_image` (optional) |
Full list: scripts/start_musetalk_ws.sh --help.
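Before launching, it can help to print the values the server would see once the documented defaults are applied. The sketch below only mirrors the defaults stated in the table; `musetalk_effective_env` is a hypothetical helper, and variables without a documented default are shown as `<unset>`.

```shell
# Print the effective value of each variable, using the defaults from the table.
musetalk_effective_env() {
    echo "OMNIRT_MUSETALK_DEVICE=${OMNIRT_MUSETALK_DEVICE:-auto}"
    echo "OMNIRT_MUSETALK_NPU_INDEX=${OMNIRT_MUSETALK_NPU_INDEX:-0}"
    echo "OMNIRT_MUSETALK_REPO=${OMNIRT_MUSETALK_REPO:-${OMNIRT_HOME:-}/model-repos/MuseTalk}"
    echo "OMNIRT_MUSETALK_MODELS_DIR=${OMNIRT_MUSETALK_MODELS_DIR:-<omnirt>/models}"
    echo "OMNIRT_MUSETALK_MAX_LONG_EDGE=${OMNIRT_MUSETALK_MAX_LONG_EDGE:-768}"
    echo "OMNIRT_MUSETALK_PRELOAD=${OMNIRT_MUSETALK_PRELOAD:-<unset>}"
    echo "OMNIRT_MUSETALK_DEFAULT_REF_IMAGE=${OMNIRT_MUSETALK_DEFAULT_REF_IMAGE:-<unset>}"
}

musetalk_effective_env
```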
6. Example launch (Ascend)
```shell
cd /path/to/omnirt
omnirt runtime install musetalk --device npu
export OMNIRT_MUSETALK_PYTHON=$OMNIRT_HOME/runtimes/musetalk/npu/venv/bin/python
export OMNIRT_MUSETALK_REPO=$OMNIRT_HOME/model-repos/MuseTalk
export OMNIRT_MUSETALK_MODELS_DIR=/path/to/omnirt/models
bash scripts/start_musetalk_ws.sh
```
Logs should include MuseTalk inference device=npu:0 when device selection is correct.
Background mode:
Default log file: outputs/omnirt-musetalk-ws.log.
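The device check can be run against that log file. The sketch relies only on the `MuseTalk inference device=` text quoted above and makes no assumption about the rest of the log format; `log_reports_device` is a hypothetical helper.

```shell
# log_reports_device LOGFILE DEVICE: succeed if the log contains
# "MuseTalk inference device=DEVICE".
log_reports_device() {
    grep -qF "MuseTalk inference device=$2" "$1"
}

log=outputs/omnirt-musetalk-ws.log
if [ -f "$log" ]; then
    if log_reports_device "$log" npu:0; then
        echo "server reports npu:0"
    else
        # Show whatever device the server did report, if any.
        grep -F "MuseTalk inference device=" "$log" | tail -n 1
    fi
fi
```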
NVIDIA GPU (CUDA)
Install requirements-musetalk-gpu.txt with CUDA-enabled torch / torchvision / torchaudio from the PyTorch index matching your driver, for example:
```shell
pip install -r model_backends/musetalk/requirements-musetalk-gpu.txt \
  --extra-index-url https://download.pytorch.org/whl/cu124
```
Set OMNIRT_MUSETALK_DEVICE=cuda, or keep auto on machines without NPU so it falls back to CUDA. No CANN set_env.sh or torch_npu is required; tune CUDA_VISIBLE_DEVICES as needed.
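The `auto` behavior described here can be modeled in a few lines to predict which device a given host will end up on. This is a sketch, not the server's logic: `resolve_device` is hypothetical, the final fallback to `cpu` is an assumption, and detecting hardware through `npu-smi` / `nvidia-smi` stands in for whatever probing the server performs through PyTorch.

```shell
# resolve_device REQUESTED HAS_NPU HAS_CUDA  (flags are 1 or 0)
# Prints the device that the documented rules would select.
resolve_device() {
    case $1 in
        auto)
            if [ "$2" = 1 ]; then echo npu
            elif [ "$3" = 1 ]; then echo cuda
            else echo cpu   # assumed last resort
            fi ;;
        npu|cuda|cpu) echo "$1" ;;
        *) echo "unknown device: $1" >&2; return 1 ;;
    esac
}

has_npu=0
if command -v npu-smi >/dev/null 2>&1; then has_npu=1; fi
has_cuda=0
if command -v nvidia-smi >/dev/null 2>&1; then has_cuda=1; fi

echo "expected device: $(resolve_device "${OMNIRT_MUSETALK_DEVICE:-auto}" "$has_npu" "$has_cuda")"
```

If the printed device differs from what the server log reports, set `OMNIRT_MUSETALK_DEVICE` explicitly rather than relying on `auto`.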
Troubleshooting (Ascend)
| Symptom | Likely cause |
|---|---|
| Missing `libhccl.so` | CANN `set_env.sh` not sourced or launcher path mismatch |
| MuseTalk v1.5 fails to load | Incomplete weights; missing `config.json` under `sd-vae-ft-mse/`; `whisper/tiny.pt` not the official OpenAI file (for example, a tiny XML placeholder) |
| VAE warns about unsafe serialization | `sd-vae-ft-mse/` only has `diffusion_pytorch_model.bin`; add the official `diffusion_pytorch_model.safetensors` |
| `UnpicklingError` / Whisper load failure | PyTorch vs openai-whisper checkpoint compatibility; `musetalk_ws_server.py` patches loading for the official `tiny.pt`, so use the current tree |
| Toolkit directory owner warning | Often root-installed toolkit; usually harmless |
| Misaligned mouth vs background | OpenTalking composer must paste using infer crop boxes; use an upstream-fixed `composer.py` |
See also
- Backend details: `model_backends/musetalk/README.md`
- Same protocol, lighter backend (Wav2Lip): `wav2lip_ws.en.md`
- SoulX FlashTalk WS reference: `flashtalk_ws.en.md`