Skip to content

FasterLivePortrait / JoyVASA

Support Status

Item Value
Model ID fasterliveportrait
Backend omnirt
Evidence level Documented; realtime path exposed through the OmniRT runtime
Best for Single-GPU realtime audio-driven portrait avatars, original-image pasteback, video clone, frontend amplitude hot updates

Common Errors

Symptom Action
/models shows runtime_not_enabled Ensure OmniRT was started with OMNIRT_FASTLIVEPORTRAIT_RUNTIME=1, then check checkpoint paths and logs/omnirt.
Audio driving has no lip motion Check JoyVASA/motion_generator, JoyVASA/motion_template, and chinese-hubert-base/pytorch_model.bin.
Generation reports an ONNXRuntime GridSample error Re-run uv sync --extra server --extra fasterliveportrait --python 3.11, confirm import tensorrt works, and start with OMNIRT_FASTLIVEPORTRAIT_CFG=configs/trt_infer.yaml.
Browser sees the model but session creation fails Select an avatar whose model_type matches fasterliveportrait, or prepare a matching avatar bundle.

FasterLivePortrait also runs through the OmniRT audio2video compatibility path. OpenTalking owns sessions, TTS/audio streaming, WebRTC playback, and frontend parameter updates. OmniRT keeps FasterLivePortrait and JoyVASA resident and exposes /v1/audio2video/fasterliveportrait. This repository does not include an in-process local backend for FasterLivePortrait; even for single-machine deployments, start OmniRT on the same host and point OpenTalking at http://127.0.0.1:9000.

This path is intended for single-GPU realtime avatars. The default live profile uses 25fps, one-second audio chunks, a 448px width, and pasteback into the original avatar image. Full-body uploads are still driven through the detected face region; body motion is not synthesized by this runtime.

The same runtime can also serve the Video Clone workflow. OpenTalking keeps an avatar-library image as the source, streams browser camera frames or uploaded-video frames as driving input, and forwards them to OmniRT /v1/avatar/video-clone/fasterliveportrait. This path does not call LLM, STT, or TTS and does not reuse the realtime conversation speak queue.

1. Prepare code and weights

Prepare the shared directory variables first. FASTERLIVEPORTRAIT_HOME is the FasterLivePortrait source checkout; OMNIRT_MODEL_ROOT is the model-weight root. Do not put model weights inside the OpenTalking or OmniRT repository.

terminal
export DIGITAL_HUMAN_HOME="${DIGITAL_HUMAN_HOME:-/path/to/digital_human}"
export OPENTALKING_HOME="${OPENTALKING_HOME:-$DIGITAL_HUMAN_HOME/opentalking}"
export OMNIRT_REPO="${OMNIRT_REPO:-$DIGITAL_HUMAN_HOME/omnirt}"
export FASTERLIVEPORTRAIT_HOME="${FASTERLIVEPORTRAIT_HOME:-$DIGITAL_HUMAN_HOME/FasterLivePortrait}"
export OMNIRT_MODEL_ROOT="${OMNIRT_MODEL_ROOT:-/path/to/model}"
export FASTERLIVEPORTRAIT_REF="${FASTERLIVEPORTRAIT_REF:-5dcf03aa2e6b2eb2a55b971efdc28fc0afdb1494}"

The current OpenTalking video-clone path and OmniRT runtime depend on FasterLivePortrait patches for fine-grained motion controls, TensorRT output ordering, and new PyTorch checkpoint loading behavior. For now, deploy the pinned zyairehhh/FasterLivePortrait fork. Switch to the upstream package only after those patches are available in an official stable package.

terminal
if [ ! -d "$FASTERLIVEPORTRAIT_HOME/.git" ]; then
  git clone https://github.com/zyairehhh/FasterLivePortrait.git "$FASTERLIVEPORTRAIT_HOME"
fi

git -C "$FASTERLIVEPORTRAIT_HOME" fetch origin master
git -C "$FASTERLIVEPORTRAIT_HOME" checkout "$FASTERLIVEPORTRAIT_REF"

mkdir -p "$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints"

The checkpoint directory must include at least:

$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints/
  JoyVASA/
    motion_generator/motion_generator_hubert_chinese.pt
    motion_template/motion_template.pkl
  chinese-hubert-base/
    config.json
    preprocessor_config.json
    pytorch_model.bin
  liveportrait/ or appearance_feature_extractor.onnx and the other FasterLivePortrait ONNX/TRT files

If the model files already exist elsewhere, copy real files with rsync:

terminal
rsync -a /path/to/FasterLivePortrait/checkpoints/ \
  "$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints/"

Preflight check:

terminal
test -f "$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt"
test -f "$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints/JoyVASA/motion_template/motion_template.pkl"
test -f "$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints/chinese-hubert-base/pytorch_model.bin"

2. Prepare the OmniRT environment

On servers, keep the uv cache on a data disk and use a PyPI mirror to speed up dependency installation. PIP_INDEX_URL is a fallback for build steps that still read pip settings.

terminal
cd "$OMNIRT_REPO"
export UV_DEFAULT_INDEX="${UV_DEFAULT_INDEX:-https://pypi.tuna.tsinghua.edu.cn/simple}"
export PIP_INDEX_URL="${PIP_INDEX_URL:-$UV_DEFAULT_INDEX}"
export UV_CACHE_DIR="${UV_CACHE_DIR:-$DIGITAL_HUMAN_HOME/.uv-cache}"
uv sync --extra server --extra fasterliveportrait --python 3.11

The realtime FasterLivePortrait path uses TensorRT by default. The fasterliveportrait extra installs onnxruntime-gpu, tensorrt-cu12, tensorrt-cu12-bindings, and tensorrt-cu12-libs. The TensorRT libs wheel is about 4GB, so keep UV_CACHE_DIR on a data disk with enough space; do not let it fall back to a small /root/.cache/uv.

Before deployment, verify that uv run python -c "import tensorrt as trt; print(trt.__version__)" prints a version.

The TensorRT wheel places libnvinfer.so.10 under the OmniRT .venv site-packages/tensorrt_libs directory. Add that directory to the dynamic library search path before starting the TRT runtime; otherwise libgrid_sample_3d_plugin.so fails with libnvinfer.so.10: cannot open shared object file:

terminal
export TRT_LIB_DIR="$OMNIRT_REPO/.venv/lib/python3.11/site-packages/tensorrt_libs"
export LD_LIBRARY_PATH="$TRT_LIB_DIR:${LD_LIBRARY_PATH:-}"

3. Start the OmniRT FasterLivePortrait runtime

terminal
cd "$OMNIRT_REPO"
mkdir -p "$DIGITAL_HUMAN_HOME/logs"
nohup env \
  OMNIRT_FASTLIVEPORTRAIT_RUNTIME=1 \
  OMNIRT_FASTLIVEPORTRAIT_LOAD_MODELS=1 \
  OMNIRT_FASTLIVEPORTRAIT_ROOT="$FASTERLIVEPORTRAIT_HOME" \
  OMNIRT_FASTLIVEPORTRAIT_CHECKPOINTS_DIR="$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints" \
  OMNIRT_FASTLIVEPORTRAIT_CFG=configs/trt_infer.yaml \
  OMNIRT_FASTLIVEPORTRAIT_DEVICE=cuda:0 \
  OMNIRT_FASTLIVEPORTRAIT_JPEG_QUALITY=85 \
  uv run omnirt serve-avatar-ws --host 0.0.0.0 --port 9000 --backend cuda \
  > "$DIGITAL_HUMAN_HOME/logs/omnirt-fasterliveportrait-9000.log" 2>&1 &
echo $! > "$DIGITAL_HUMAN_HOME/logs/omnirt-fasterliveportrait-9000.pid"

Verify OmniRT reports the model:

terminal
curl -s http://127.0.0.1:9000/v1/audio2video/models | python3 -m json.tool

Expected status:

{"id":"fasterliveportrait","connected":true,"reason":"fasterliveportrait_runtime"}

4. Configure and start OpenTalking

Sync the OpenTalking environment first. Use the same uv mirror and cache directory as OmniRT.

terminal
cd "$OPENTALKING_HOME"
export UV_DEFAULT_INDEX="${UV_DEFAULT_INDEX:-https://pypi.tuna.tsinghua.edu.cn/simple}"
export PIP_INDEX_URL="${PIP_INDEX_URL:-$UV_DEFAULT_INDEX}"
export UV_CACHE_DIR="${UV_CACHE_DIR:-$DIGITAL_HUMAN_HOME/.uv-cache}"
uv sync --extra dev --python 3.11

OpenTalking configures fasterliveportrait as backend: omnirt by default. The realtime profile lives in configs/synthesis/fasterliveportrait.yaml; common defaults are:

configs/synthesis/fasterliveportrait.yaml
width: 448
fps: 25
chunk_samples: 16000
emit_frames_per_chunk: 25
head_motion_multiplier: 0.3
pose_motion_multiplier: 0.35
yaw_multiplier: 0.85
pitch_multiplier: 1.0
roll_multiplier: 0.85
animation_region: lip
expression_multiplier: 1.0
mouth_open_multiplier: 1.25
mouth_corner_multiplier: 0.85
cheek_jaw_multiplier: 0.9
driving_multiplier: 1.0
cfg_scale: 4.0
flag_relative_motion: true
flag_stitching: true
head_only_pasteback: false

Start OpenTalking against OmniRT. scripts/start_unified.sh sets OPENTALKING_FASTLIVEPORTRAIT_BACKEND=omnirt, OPENTALKING_DEFAULT_MODEL=fasterliveportrait, and OMNIRT_ENDPOINT, then starts the WebUI after the API is ready:

terminal
cd "$OPENTALKING_HOME"
bash scripts/start_unified.sh \
  --backend omnirt \
  --model fasterliveportrait \
  --omnirt http://127.0.0.1:9000 \
  --api-port 8000 \
  --web-port 5173 \
  --host 0.0.0.0

The previous command already starts the WebUI. To restart only the frontend while the API is already running on port 8000, use a second terminal:

terminal
cd "$OPENTALKING_HOME"
bash scripts/quickstart/start_frontend.sh --api-port 8000 --web-port 5173 --host 0.0.0.0

Verify OpenTalking sees the model:

terminal
curl -s http://127.0.0.1:8000/models | python3 -m json.tool

Expected status:

{"id":"fasterliveportrait","backend":"omnirt","connected":true,"reason":"omnirt"}

Also verify the video-clone entry:

terminal
curl -s http://127.0.0.1:8000/video-clone/status | python3 -m json.tool

Expected:

{"model":"fasterliveportrait","connected":true,"reason":"omnirt"}

5. Frontend controls and hot updates

After selecting FasterLivePortrait, the frontend shows a parameter panel. Before a session starts, clicking Apply stores values for the next session. During a session, clicking Apply sends a hot update and takes effect on the next audio chunk without restarting the conversation.

Parameter Effect Suggested range
head_motion_multiplier Overall head motion amplitude default 0.3, common 0.2-0.8
pose_motion_multiplier pitch/yaw/roll amplitude; lower this first when the head sways too much 0.2-0.5
yaw_multiplier Left/right head turn amplitude default 0.85, common 0.6-1.0
pitch_multiplier Up/down nod amplitude default 1.0, common 0.7-1.1
roll_multiplier Side tilt amplitude default 0.85, common 0.6-1.0
animation_region FLP animation region; realtime defaults to mouth-only to reduce wide eyes and exaggerated full-face motion default lip; use all for full expression
expression_multiplier Overall expression and lip amplitude default 1.0, common 0.9-1.2
mouth_open_multiplier Mouth opening amplitude default 1.25, common 1.0-1.4
mouth_corner_multiplier Mouth-corner movement default 0.85, common 0.7-1.0
cheek_jaw_multiplier Cheek and jaw movement default 0.9, common 0.7-1.1
driving_multiplier Overall keypoint driving amplitude 0.8-1.2
cfg_scale JoyVASA audio-following strength default 4.0, common 3.5-4.5

Start with head_motion_multiplier=0.3, pose_motion_multiplier=0.35, yaw_multiplier=0.85, roll_multiplier=0.85, animation_region=lip, expression_multiplier=1.0, mouth_open_multiplier=1.25, mouth_corner_multiplier=0.85, cheek_jaw_multiplier=0.9, cfg_scale=4.0, and keep flag_relative_motion=true. If the head sways left/right, lower yaw_multiplier to 0.7. If the mouth looks pursed or the smile is too strong, lower mouth_corner_multiplier to 0.75. Switch the region from lip to all only when you need richer facial expression. Do not improve speed by dropping mouth-open frames.

6. Video Clone Mode

Video Clone is shown in the WebUI top navigation next to “Realtime Conversation”. After entering it:

  • Source: select an existing avatar on the left, or upload a new source image. The source is the digital-human asset being driven.
  • Driving: select a camera on the right, or upload a driving video. Driving only provides expression, head motion, and mouth motion.
  • Output: inspect realtime output in the center, with sent frames, received frames, dropped frames, and latency.

The frontend connects to OpenTalking:

ws://<opentalking-host>/video-clone/fasterliveportrait/ws

OpenTalking then forwards the source image and driving frame stream to OmniRT:

ws://<omnirt-host>/v1/avatar/video-clone/fasterliveportrait

Common tuning notes:

  • Enable pasteback when you want to preserve the original source composition.
  • If uploaded driving video does not open the mouth enough, raise mouth opening first. If motion collapses into simple vertical mouth opening, lower lip retargeting.
  • If the mouth looks puffy or misaligned, first disable driving-face crop and confirm the driving input is not over-cropped.
  • If camera permission fails, open the page from localhost, 127.0.0.1, or HTTPS. You can also upload a driving video first to validate the backend.

When stopped or when the page changes, the frontend releases the camera track, WebSocket, and current video-clone session.

7. Performance check

terminal
cd "$OMNIRT_REPO"
uv run python scripts/bench_fasterliveportrait_ws.py \
  --url ws://127.0.0.1:9000/v1/audio2video/fasterliveportrait \
  --duration 30 \
  --chunk-samples 16000

For single-GPU realtime use, watch first packet latency, per-chunk render time, output fps, and whether the browser queue keeps growing. If 448px width cannot stay above 25fps, drop to 416px. Use 480px or 540px only for quality-first runs, not as the realtime default.