QuickTalk OmniRT Deployment

OmniRT mode runs QuickTalk inference outside the OpenTalking process. Use it when multiple models share one service endpoint, GPU dependencies need isolation, or inference runs on a separate machine.

Use Cases

- OpenTalking owns sessions, TTS, and WebRTC while QuickTalk is served externally.
- One OmniRT endpoint needs to expose quicktalk, wav2lip, and other models.
- Web-service resources and inference GPU resources need to scale separately.

Weight Preparation

By default, OmniRT reads QuickTalk weights from `$OMNIRT_MODEL_ROOT/quicktalk`. Download them into the `checkpoints` directory under that path:

Terminal

export DIGITAL_HUMAN_HOME="$HOME/digital-human"
export OMNIRT_MODEL_ROOT="$DIGITAL_HUMAN_HOME/models"
mkdir -p "$OMNIRT_MODEL_ROOT/quicktalk/checkpoints"

uv pip install -U "huggingface_hub[cli]"
export HF_ENDPOINT="${HF_ENDPOINT:-https://hf-mirror.com}"

hf download datascale-ai/quicktalk \
  quicktalk.pth \
  repair.npy \
  chinese-hubert-large/config.json \
  chinese-hubert-large/preprocessor_config.json \
  chinese-hubert-large/pytorch_model.bin \
  --local-dir "$OMNIRT_MODEL_ROOT/quicktalk/checkpoints"

Confirm that `quicktalk.pth`, `repair.npy`, the HuBERT files, and InsightFace `buffalo_l` all exist under the QuickTalk model directory. The download above does not include InsightFace; prepare it as described in the Local deployment guide.
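
The check above can be scripted. The following is a minimal sketch, assuming the layout produced by the download command in this section; `check_quicktalk_weights` is a hypothetical helper, not part of OmniRT, and because the exact `buffalo_l` location is not stated here, it only searches for a directory with that name.

```shell
# Sketch: report which expected QuickTalk files are missing under a model root.
# check_quicktalk_weights is a hypothetical helper, not an OmniRT command.
check_quicktalk_weights() {
  local root="$1" missing=0 f
  for f in \
    checkpoints/quicktalk.pth \
    checkpoints/repair.npy \
    checkpoints/chinese-hubert-large/config.json \
    checkpoints/chinese-hubert-large/preprocessor_config.json \
    checkpoints/chinese-hubert-large/pytorch_model.bin
  do
    # -s is true only when the file exists and is not empty.
    if [ ! -s "$root/$f" ]; then
      echo "missing or empty: $root/$f" >&2
      missing=1
    fi
  done
  # Assumption: buffalo_l is a directory somewhere below the model root.
  if [ -z "$(find "$root" -type d -name buffalo_l 2>/dev/null | head -n 1)" ]; then
    echo "missing: InsightFace buffalo_l directory under $root" >&2
    missing=1
  fi
  return "$missing"
}

# Usage:
#   check_quicktalk_weights "$OMNIRT_MODEL_ROOT/quicktalk" && echo "weights OK"
```

The function returns a non-zero status when anything is missing, so it can gate a startup script.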

Start Command

Start OmniRT first:

Terminal

cd "$OMNIRT_HOME"
uv sync --extra server --extra quicktalk-cuda --python 3.11
source .venv/bin/activate

export OMNIRT_QUICKTALK_RUNTIME=1
export OMNIRT_QUICKTALK_MODEL_ROOT="$OMNIRT_MODEL_ROOT/quicktalk"
export OMNIRT_QUICKTALK_CHECKPOINT="$OMNIRT_MODEL_ROOT/quicktalk/checkpoints/quicktalk.pth"
export OMNIRT_QUICKTALK_DEVICE=cuda:0
export OMNIRT_QUICKTALK_HUBERT_DEVICE=cuda:0
export OMNIRT_QUICKTALK_MAX_LONG_EDGE=900
export OMNIRT_QUICKTALK_MAX_TEMPLATE_SECONDS=1

omnirt serve-avatar-ws --host 0.0.0.0 --port 9000 --backend cuda
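
Because OpenTalking depends on OmniRT being reachable, it may help to wait until the OmniRT endpoint responds before continuing. The sketch below is one way to do that; `wait_until` is a hypothetical helper, and the attempt count and delay are illustrative values, not recommendations from the OmniRT project.

```shell
# Sketch: retry a probe command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper; attempts and delay are illustrative.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Run the probe quietly; success ends the wait immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage (up to 60 probes, 2 seconds apart):
#   wait_until 60 2 curl -fsS http://127.0.0.1:9000/v1/audio2video/models \
#     || echo "OmniRT did not become ready" >&2
```

Passing the probe as arguments keeps the helper independent of curl, so the same loop can wrap any readiness check.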

Then start OpenTalking:

Terminal

cd "$DIGITAL_HUMAN_HOME/opentalking"
bash scripts/start_unified.sh \
  --backend omnirt \
  --model quicktalk \
  --omnirt http://127.0.0.1:9000 \
  --api-port 8310 \
  --web-port 5380

Verification

Terminal

curl -fsS http://127.0.0.1:9000/v1/audio2video/models | jq
curl -fsS http://127.0.0.1:8310/models | jq '.statuses[] | select(.id=="quicktalk")'

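The first command should list `quicktalk` among the OmniRT models. In the second, the OpenTalking status entry for `quicktalk` should report backend=omnirt and connected=true.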
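
For scripted checks, the expected status can be turned into an exit code. This is a minimal sketch, assuming `backend` and `connected` are keys on each entry of `statuses` (the key names are inferred from the expected output and are not confirmed here); `quicktalk_ready` is a hypothetical helper.

```shell
# Sketch: succeed only if the /models JSON on stdin shows quicktalk served via OmniRT.
# Assumes "backend" and "connected" are keys on each status entry (not confirmed).
quicktalk_ready() {
  jq -e 'any(.statuses[]?; .id == "quicktalk" and .backend == "omnirt" and .connected == true)' >/dev/null
}

# Usage:
#   curl -fsS http://127.0.0.1:8310/models | quicktalk_ready && echo "quicktalk ready"
```

With `-e`, jq exits non-zero when the filter yields false, so a missing or disconnected `quicktalk` entry fails the check.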

Common Errors

| Symptom | Action |
| --- | --- |
| `reason=omnirt_unavailable` | Check the OmniRT port, `OMNIRT_ENDPOINT`, and that `/v1/audio2video/models` responds. |
| OmniRT does not list `quicktalk` | Check `OMNIRT_QUICKTALK_RUNTIME=1`, the checkpoint paths, and the startup logs. |
| Slow first frame or high VRAM | Tune `OMNIRT_QUICKTALK_MAX_LONG_EDGE`, the HuBERT device, or the prewarm strategy. |
| Avatar asset unavailable | Check that the selected avatar is uploaded and readable, and that the session configuration is complete. |