FasterLivePortrait / JoyVASA

Support Status

Item	Value
Model ID	`fasterliveportrait`
Backend	`omnirt`
Evidence level	Documented; realtime path exposed through the OmniRT runtime
Best for	Single-GPU realtime audio-driven portrait avatars, original-image pasteback, video clone, frontend amplitude hot updates

Common Errors

Symptom	Action
`/models` shows `runtime_not_enabled`	Ensure OmniRT was started with `OMNIRT_FASTLIVEPORTRAIT_RUNTIME=1`, then check checkpoint paths and `logs/omnirt`.
Audio driving has no lip motion	Check `JoyVASA/motion_generator`, `JoyVASA/motion_template`, and `chinese-hubert-base/pytorch_model.bin`.
Generation reports an ONNXRuntime `GridSample` error	Re-run `uv sync --extra server --extra fasterliveportrait --python 3.11`, confirm `import tensorrt` works, and start with `OMNIRT_FASTLIVEPORTRAIT_CFG=configs/trt_infer.yaml`.
Browser sees the model but session creation fails	Select an avatar whose `model_type` matches `fasterliveportrait`, or prepare a matching avatar bundle.

FasterLivePortrait also runs through the OmniRT audio2video compatibility path. OpenTalking owns sessions, TTS/audio streaming, WebRTC playback, and frontend parameter updates. OmniRT keeps FasterLivePortrait and JoyVASA resident and exposes `/v1/audio2video/fasterliveportrait`. This repository does not include an in-process local backend for FasterLivePortrait; even for single-machine deployments, start OmniRT on the same host and point OpenTalking at `http://127.0.0.1:9000`.

This path is intended for single-GPU realtime avatars. The default live profile uses 25fps, one-second audio chunks, a 448px width, and pasteback into the original avatar image. Full-body uploads are still driven through the detected face region; body motion is not synthesized by this runtime.

The same runtime can also serve the Video Clone workflow. OpenTalking keeps an avatar-library image as the source, streams browser camera frames or uploaded-video frames as driving input, and forwards them to the OmniRT endpoint `/v1/avatar/video-clone/fasterliveportrait`. This path does not call the LLM, STT, or TTS services and does not reuse the realtime conversation speak queue.

1. Prepare code and weights

Prepare the shared directory variables first. `FASTERLIVEPORTRAIT_HOME` is the FasterLivePortrait source checkout; `OMNIRT_MODEL_ROOT` is the model-weight root. Do not put model weights inside the OpenTalking or OmniRT repository.

terminal

export DIGITAL_HUMAN_HOME="${DIGITAL_HUMAN_HOME:-/path/to/digital_human}"
export OPENTALKING_HOME="${OPENTALKING_HOME:-$DIGITAL_HUMAN_HOME/opentalking}"
export OMNIRT_REPO="${OMNIRT_REPO:-$DIGITAL_HUMAN_HOME/omnirt}"
export FASTERLIVEPORTRAIT_HOME="${FASTERLIVEPORTRAIT_HOME:-$DIGITAL_HUMAN_HOME/FasterLivePortrait}"
export OMNIRT_MODEL_ROOT="${OMNIRT_MODEL_ROOT:-/path/to/model}"
export FASTERLIVEPORTRAIT_REF="${FASTERLIVEPORTRAIT_REF:-5dcf03aa2e6b2eb2a55b971efdc28fc0afdb1494}"
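
The rule about keeping weights out of the two repositories can be checked mechanically once the variables are set. The sketch below uses two hypothetical helpers, `path_is_inside` and `check_model_root`, which are not part of OpenTalking or OmniRT. It compares path strings only, so it assumes absolute paths and does not resolve symlinks.

```shell
# Hypothetical helper (not part of OpenTalking or OmniRT):
# succeeds when path $1 equals directory $2 or lies beneath it.
# Pure string comparison; symlinks and ".." are not resolved.
path_is_inside() {
  child="${1%/}"
  parent="${2%/}"
  case "$child/" in
    "$parent"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn when OMNIRT_MODEL_ROOT sits inside either repository.
check_model_root() {
  for repo in "$OPENTALKING_HOME" "$OMNIRT_REPO"; do
    [ -n "$repo" ] || continue   # skip variables that are unset or empty
    if path_is_inside "$OMNIRT_MODEL_ROOT" "$repo"; then
      echo "warning: OMNIRT_MODEL_ROOT is inside $repo"
      return 1
    fi
  done
  echo "ok: model root is outside both repositories"
}
```

Run `check_model_root` after the exports; a warning means the weight directory should be moved before continuing.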

The current OpenTalking video-clone path and OmniRT runtime depend on FasterLivePortrait patches for fine-grained motion controls, TensorRT output ordering, and new PyTorch checkpoint loading behavior. For now, deploy the pinned zyairehhh/FasterLivePortrait fork. Switch to the upstream package only after those patches are available in an official stable package.

terminal

if [ ! -d "$FASTERLIVEPORTRAIT_HOME/.git" ]; then
  git clone https://github.com/zyairehhh/FasterLivePortrait.git "$FASTERLIVEPORTRAIT_HOME"
fi

git -C "$FASTERLIVEPORTRAIT_HOME" fetch origin master
git -C "$FASTERLIVEPORTRAIT_HOME" checkout "$FASTERLIVEPORTRAIT_REF"

mkdir -p "$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints"
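
Because the runtime depends on patches in the pinned fork, it can be worth confirming that the checkout really sits on the pinned commit. This is a minimal sketch; `check_pinned_ref` is a hypothetical helper, and it only uses `git rev-parse`, so it works whether the reference is a full commit hash, a tag, or a branch name.

```shell
# Hypothetical helper: compare the checked-out commit with the pinned reference.
# $1 = repository directory, $2 = commit hash, tag, or branch to compare against.
check_pinned_ref() {
  repo="$1"
  want="$2"
  have_sha="$(git -C "$repo" rev-parse HEAD)" || return 1
  # Resolve the reference to a commit; fails if it is unknown to this clone.
  want_sha="$(git -C "$repo" rev-parse --verify "${want}^{commit}")" || return 1
  if [ "$have_sha" = "$want_sha" ]; then
    echo "ok: HEAD is $have_sha"
  else
    echo "mismatch: HEAD is $have_sha, expected $want_sha"
    return 1
  fi
}
```

A typical call would be `check_pinned_ref "$FASTERLIVEPORTRAIT_HOME" "$FASTERLIVEPORTRAIT_REF"`.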

The checkpoint directory must include at least:

$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints/
  JoyVASA/
    motion_generator/motion_generator_hubert_chinese.pt
    motion_template/motion_template.pkl
  chinese-hubert-base/
    config.json
    preprocessor_config.json
    pytorch_model.bin
  liveportrait/ or appearance_feature_extractor.onnx and the other FasterLivePortrait ONNX/TRT files

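If the model files already exist elsewhere, copy them into place with rsync rather than symlinking the directory. Note that `rsync -a` keeps any symlinks inside the source as symlinks; add `-L` if the source tree contains symlinks and you need the real files: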

terminal

rsync -a /path/to/FasterLivePortrait/checkpoints/ \
  "$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints/"

Preflight check:

terminal

test -f "$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt"
test -f "$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints/JoyVASA/motion_template/motion_template.pkl"
test -f "$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints/chinese-hubert-base/pytorch_model.bin"
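
The `test -f` commands above print nothing, so in an interactive shell a missing file is easy to overlook. As a sketch, the hypothetical helper below walks the JoyVASA and chinese-hubert-base files from the directory listing and names each one that is absent. The `liveportrait/` ONNX/TRT files are left out because their exact names are not listed here.

```shell
# Hypothetical helper: report every required checkpoint file missing under $1.
# The relative paths are taken from the checkpoint directory listing.
check_flp_checkpoints() {
  ckpt_dir="$1"
  missing=0
  for rel in \
    JoyVASA/motion_generator/motion_generator_hubert_chinese.pt \
    JoyVASA/motion_template/motion_template.pkl \
    chinese-hubert-base/config.json \
    chinese-hubert-base/preprocessor_config.json \
    chinese-hubert-base/pytorch_model.bin
  do
    if [ ! -f "$ckpt_dir/$rel" ]; then
      echo "missing: $rel"
      missing=1
    fi
  done
  # Exit status 0 only when every file was found.
  return "$missing"
}
```

For example: `check_flp_checkpoints "$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints" && echo "checkpoints ok"`.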

2. Prepare the OmniRT environment

On servers, keep the uv cache on a data disk and use a PyPI mirror to speed up dependency installation. `PIP_INDEX_URL` is a fallback for build steps that still read pip settings.

terminal

cd "$OMNIRT_REPO"
export UV_DEFAULT_INDEX="${UV_DEFAULT_INDEX:-https://pypi.tuna.tsinghua.edu.cn/simple}"
export PIP_INDEX_URL="${PIP_INDEX_URL:-$UV_DEFAULT_INDEX}"
export UV_CACHE_DIR="${UV_CACHE_DIR:-$DIGITAL_HUMAN_HOME/.uv-cache}"
uv sync --extra server --extra fasterliveportrait --python 3.11

The realtime FasterLivePortrait path uses TensorRT by default. The `fasterliveportrait` extra installs `onnxruntime-gpu`, `tensorrt-cu12`, `tensorrt-cu12-bindings`, and `tensorrt-cu12-libs`. The TensorRT libs wheel is about 4 GB, so keep `UV_CACHE_DIR` on a data disk with enough space; do not let it fall back to a small `/root/.cache/uv`.

Before deployment, verify that `uv run python -c "import tensorrt as trt; print(trt.__version__)"` prints a version.

The TensorRT wheel places `libnvinfer.so.10` under the OmniRT `.venv` `site-packages/tensorrt_libs` directory. Add that directory to the dynamic library search path before starting the TRT runtime; otherwise `libgrid_sample_3d_plugin.so` fails with `libnvinfer.so.10: cannot open shared object file`:

terminal

export TRT_LIB_DIR="$OMNIRT_REPO/.venv/lib/python3.11/site-packages/tensorrt_libs"
export LD_LIBRARY_PATH="$TRT_LIB_DIR:${LD_LIBRARY_PATH:-}"
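
The path above hard-codes `python3.11`, so a quick check that the library is really there can save a failed start. The sketch below is a hypothetical helper, not an OmniRT command. It looks for `libnvinfer.so.10` in the given directory and confirms the directory appears in the search path, which defaults to the current `LD_LIBRARY_PATH`.

```shell
# Hypothetical helper: confirm the TensorRT runtime library is where the
# dynamic loader will look.
# $1 = candidate tensorrt_libs directory
# $2 = colon-separated search path (defaults to LD_LIBRARY_PATH)
check_trt_lib_dir() {
  dir="$1"
  path_list="${2-${LD_LIBRARY_PATH:-}}"
  if [ ! -e "$dir/libnvinfer.so.10" ]; then
    echo "libnvinfer.so.10 not found in $dir"
    return 1
  fi
  # Wrap both sides in colons so only whole path entries match.
  case ":$path_list:" in
    *":$dir:"*) echo "ok: $dir is on the library path" ;;
    *) echo "$dir is not on the library path"; return 1 ;;
  esac
}
```

After the two exports, `check_trt_lib_dir "$TRT_LIB_DIR"` should print the `ok:` line.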

3. Start the OmniRT FasterLivePortrait runtime

terminal

cd "$OMNIRT_REPO"
mkdir -p "$DIGITAL_HUMAN_HOME/logs"
nohup env \
  OMNIRT_FASTLIVEPORTRAIT_RUNTIME=1 \
  OMNIRT_FASTLIVEPORTRAIT_LOAD_MODELS=1 \
  OMNIRT_FASTLIVEPORTRAIT_ROOT="$FASTERLIVEPORTRAIT_HOME" \
  OMNIRT_FASTLIVEPORTRAIT_CHECKPOINTS_DIR="$OMNIRT_MODEL_ROOT/FasterLivePortrait/checkpoints" \
  OMNIRT_FASTLIVEPORTRAIT_CFG=configs/trt_infer.yaml \
  OMNIRT_FASTLIVEPORTRAIT_DEVICE=cuda:0 \
  OMNIRT_FASTLIVEPORTRAIT_JPEG_QUALITY=85 \
  uv run omnirt serve-avatar-ws --host 0.0.0.0 --port 9000 --backend cuda \
  > "$DIGITAL_HUMAN_HOME/logs/omnirt-fasterliveportrait-9000.log" 2>&1 &
echo $! > "$DIGITAL_HUMAN_HOME/logs/omnirt-fasterliveportrait-9000.pid"

Verify OmniRT reports the model:

terminal

curl -s http://127.0.0.1:9000/v1/audio2video/models | python3 -m json.tool

Expected status:

{"id":"fasterliveportrait","connected":true,"reason":"fasterliveportrait_runtime"}
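
The runtime is started in the background, so the endpoint may not answer immediately; how long model loading takes is not stated here and is assumed to vary by machine. The sketch below is a generic retry helper with a hypothetical name, `wait_until`. It repeats any command once per second until it succeeds or the attempts run out.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the attempt limit is reached.
# Usage: wait_until <max_attempts> <command> [args...]
wait_until() {
  attempts="$1"
  shift
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep 1
  done
  return 0
}
```

For example, `wait_until 60 curl -sf -o /dev/null http://127.0.0.1:9000/v1/audio2video/models` waits up to about a minute for the endpoint to respond. A successful response only shows that the server is up; still inspect the JSON for `"connected":true`.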

4. Configure and start OpenTalking

Sync the OpenTalking environment first. Use the same uv mirror and cache directory as OmniRT.

terminal

cd "$OPENTALKING_HOME"
export UV_DEFAULT_INDEX="${UV_DEFAULT_INDEX:-https://pypi.tuna.tsinghua.edu.cn/simple}"
export PIP_INDEX_URL="${PIP_INDEX_URL:-$UV_DEFAULT_INDEX}"
export UV_CACHE_DIR="${UV_CACHE_DIR:-$DIGITAL_HUMAN_HOME/.uv-cache}"
uv sync --extra dev --python 3.11

By default, OpenTalking configures `fasterliveportrait` with `backend: omnirt`. The realtime profile lives in `configs/synthesis/fasterliveportrait.yaml`; common defaults are:

configs/synthesis/fasterliveportrait.yaml

width: 448
fps: 25
chunk_samples: 16000
emit_frames_per_chunk: 25
head_motion_multiplier: 0.3
pose_motion_multiplier: 0.35
yaw_multiplier: 0.85
pitch_multiplier: 1.0
roll_multiplier: 0.85
animation_region: lip
expression_multiplier: 1.0
mouth_open_multiplier: 1.25
mouth_corner_multiplier: 0.85
cheek_jaw_multiplier: 0.9
driving_multiplier: 1.0
cfg_scale: 4.0
flag_relative_motion: true
flag_stitching: true
head_only_pasteback: false
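
The timing fields in this profile are tied together: `chunk_samples` divided by the audio sample rate gives the chunk duration, and that duration times `fps` gives the frames per chunk. The sketch below works through the arithmetic. The 16 kHz sample rate is an assumption inferred from the one-second chunks and `chunk_samples: 16000`; it is not a field in the YAML, and the function names are hypothetical.

```shell
# Assumption: audio is sampled at 16 kHz (inferred, not stated in the YAML).
SAMPLE_RATE=16000

# Chunk duration in milliseconds. $1 = chunk_samples
chunk_ms() {
  echo $(( $1 * 1000 / SAMPLE_RATE ))
}

# Video frames covered by one audio chunk (integer division).
# $1 = chunk_samples, $2 = fps
frames_per_chunk() {
  echo $(( $1 * $2 / SAMPLE_RATE ))
}
```

With the defaults, `chunk_ms 16000` prints `1000` and `frames_per_chunk 16000 25` prints `25`, matching `emit_frames_per_chunk: 25`. If you change `chunk_samples` or `fps`, recompute the frame count rather than leaving the old value in place.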

Start OpenTalking against OmniRT. `scripts/start_unified.sh` sets `OPENTALKING_FASTLIVEPORTRAIT_BACKEND=omnirt`, `OPENTALKING_DEFAULT_MODEL=fasterliveportrait`, and `OMNIRT_ENDPOINT`, then starts the WebUI after the API is ready:

terminal

cd "$OPENTALKING_HOME"
bash scripts/start_unified.sh \
  --backend omnirt \
  --model fasterliveportrait \
  --omnirt http://127.0.0.1:9000 \
  --api-port 8000 \
  --web-port 5173 \
  --host 0.0.0.0

The previous command already starts the WebUI. To restart only the frontend while the API is already running on port 8000, use a second terminal:

terminal

cd "$OPENTALKING_HOME"
bash scripts/quickstart/start_frontend.sh --api-port 8000 --web-port 5173 --host 0.0.0.0

Verify OpenTalking sees the model:

terminal

curl -s http://127.0.0.1:8000/models | python3 -m json.tool

Expected status:

{"id":"fasterliveportrait","backend":"omnirt","connected":true,"reason":"omnirt"}

Also verify the video-clone entry:

terminal

curl -s http://127.0.0.1:8000/video-clone/status | python3 -m json.tool

Expected:

{"model":"fasterliveportrait","connected":true,"reason":"omnirt"}

5. Frontend controls and hot updates

After selecting FasterLivePortrait, the frontend shows a parameter panel. Before a session starts, clicking Apply stores values for the next session. During a session, clicking Apply sends a hot update and takes effect on the next audio chunk without restarting the conversation.

Parameter	Effect	Suggested range
`head_motion_multiplier`	Overall head motion amplitude	default 0.3, common 0.2-0.8
`pose_motion_multiplier`	Pitch/yaw/roll amplitude; lower this first when the head sways too much	default 0.35, common 0.2-0.5
`yaw_multiplier`	Left/right head turn amplitude	default 0.85, common 0.6-1.0
`pitch_multiplier`	Up/down nod amplitude	default 1.0, common 0.7-1.1
`roll_multiplier`	Side tilt amplitude	default 0.85, common 0.6-1.0
`animation_region`	FasterLivePortrait animation region; the realtime profile defaults to mouth-only to reduce wide eyes and exaggerated full-face motion	default `lip`; use `all` for full expression
`expression_multiplier`	Overall expression and lip amplitude	default 1.0, common 0.9-1.2
`mouth_open_multiplier`	Mouth opening amplitude	default 1.25, common 1.0-1.4
`mouth_corner_multiplier`	Mouth-corner movement	default 0.85, common 0.7-1.0
`cheek_jaw_multiplier`	Cheek and jaw movement	default 0.9, common 0.7-1.1
`driving_multiplier`	Overall keypoint driving amplitude	default 1.0, common 0.8-1.2
`cfg_scale`	JoyVASA audio-following strength	default 4.0, common 3.5-4.5

Start with `head_motion_multiplier=0.3`, `pose_motion_multiplier=0.35`, `yaw_multiplier=0.85`, `roll_multiplier=0.85`, `animation_region=lip`, `expression_multiplier=1.0`, `mouth_open_multiplier=1.25`, `mouth_corner_multiplier=0.85`, `cheek_jaw_multiplier=0.9`, and `cfg_scale=4.0`, and keep `flag_relative_motion=true`. If the head sways left and right, lower `yaw_multiplier` to 0.7. If the mouth looks pursed or the smile is too strong, lower `mouth_corner_multiplier` to 0.75. Switch `animation_region` from `lip` to `all` only when you need richer facial expression. Do not try to gain speed by dropping frames in which the mouth is open.
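
When values are scripted rather than typed into the panel, a small check against the suggested ranges in the table can catch typos before a session starts. This is a sketch only: `in_range` is a hypothetical helper, and the ranges are guidance from the table, not limits the runtime is known to enforce.

```shell
# Hypothetical helper: succeed when a numeric value lies within a closed range.
# Usage: in_range <value> <low> <high>
in_range() {
  # awk compares the three values numerically; exit status 0 means "in range".
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v >= lo && v <= hi) }'
}
```

For example, `in_range 0.7 0.6 1.0 || echo "yaw_multiplier outside common range"` stays silent, because 0.7 falls inside the common range for `yaw_multiplier`.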

6. Video Clone Mode

Video Clone is shown in the WebUI top navigation next to “Realtime Conversation”. After entering it:

Source: select an existing avatar on the left, or upload a new source image. The source is the digital-human asset being driven.
Driving: select a camera on the right, or upload a driving video. Driving only provides expression, head motion, and mouth motion.
Output: inspect realtime output in the center, with sent frames, received frames, dropped frames, and latency.

The frontend connects to OpenTalking:

ws://<opentalking-host>/video-clone/fasterliveportrait/ws

OpenTalking then forwards the source image and driving frame stream to OmniRT:

ws://<omnirt-host>/v1/avatar/video-clone/fasterliveportrait

Common tuning notes:

Enable pasteback when you want to preserve the original source composition.
If the output from an uploaded driving video does not open the mouth enough, raise mouth opening first. If motion collapses into simple vertical mouth opening, lower lip retargeting.
If the mouth looks puffy or misaligned, first disable driving-face crop and confirm the driving input is not over-cropped.
If camera permission fails, open the page from localhost, 127.0.0.1, or HTTPS. You can also upload a driving video first to validate the backend.

When the session is stopped or the page changes, the frontend releases the camera track, the WebSocket, and the current video-clone session.

7. Performance check

terminal

cd "$OMNIRT_REPO"
uv run python scripts/bench_fasterliveportrait_ws.py \
  --url ws://127.0.0.1:9000/v1/audio2video/fasterliveportrait \
  --duration 30 \
  --chunk-samples 16000

For single-GPU realtime use, watch first-packet latency, per-chunk render time, output fps, and whether the browser queue keeps growing. If a 448px width cannot sustain 25fps, drop to 416px. Use 480px or 540px only for quality-first runs, not as the realtime default.
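
One way to read the per-chunk render time is as a real-time factor: render time divided by the audio duration of the chunk. With one-second chunks, the render time must stay below 1000 ms, and at 25fps each frame has a 40 ms budget. The sketch below assumes you read the render time off the benchmark output by hand, since the output format of the benchmark script is not described here; the function names are hypothetical.

```shell
# Real-time factor = render time per chunk / audio duration of the chunk.
# Values below 1.00 mean rendering keeps up with playback.
# Usage: realtime_factor <render_ms> <chunk_ms>
realtime_factor() {
  awk -v r="$1" -v c="$2" 'BEGIN { printf "%.2f\n", r / c }'
}

# Succeeds when a chunk renders faster than it plays.
# Usage: keeps_up <render_ms> <chunk_ms>
keeps_up() {
  awk -v r="$1" -v c="$2" 'BEGIN { exit !(r < c) }'
}
```

For example, `realtime_factor 800 1000` prints `0.80`, which leaves headroom, while `realtime_factor 1200 1000` prints `1.20` and implies that the browser queue will keep growing.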