QuickTalk on Apple Silicon

This page covers running QuickTalk locally on macOS with Apple Silicon. The local path is intended for development, demos, and integration checks. For stable realtime 25fps output, use the Linux CUDA path described in QuickTalk Local Deployment, or run QuickTalk behind OmniRT.

1. Install Dependencies

Terminal
brew install python@3.11 node uv

# Optional. OpenTalking can fall back to imageio-ffmpeg when this is absent.
brew install ffmpeg

Clone OpenTalking and create the environment with the CPU/macOS extra:

Terminal
git clone https://github.com/OpenTalker/opentalking.git
cd opentalking

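# Optional: use the TUNA PyPI mirror if the default index is slow from your network.
export UV_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple
export PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple
# Allow up to 300 seconds per HTTP request, and copy files instead of linking them.
export UV_HTTP_TIMEOUT=300
export UV_LINK_MODE=copy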

uv sync --extra dev --extra models --extra quicktalk-cpu --python 3.11
source .venv/bin/activate

Do not install the quicktalk-cuda extra on Apple Silicon. It pulls in onnxruntime-gpu, which does not provide a macOS arm64 wheel.
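The platform rule above can be expressed as a small preflight check. This is only a sketch: `is_apple_silicon` and `extra_problem` are hypothetical helper names, not part of OpenTalking, and the check encodes nothing beyond the rule stated on this page.

```python
import platform


def is_apple_silicon(system: str, machine: str) -> bool:
    """True for native arm64 macOS, as reported by the platform module."""
    return system == "Darwin" and machine == "arm64"


def extra_problem(extra: str, system: str, machine: str):
    """Return a warning string if `extra` cannot work on this platform, else None.

    Hypothetical helper: it only encodes the quicktalk-cuda rule from this page.
    """
    if extra == "quicktalk-cuda" and is_apple_silicon(system, machine):
        return (
            "quicktalk-cuda needs onnxruntime-gpu, which has no macOS arm64 "
            "wheel; use quicktalk-cpu instead"
        )
    return None


# Check the extra you are about to install against the current machine.
print(extra_problem("quicktalk-cuda", platform.system(), platform.machine()))
```

Note that a Python interpreter running under Rosetta reports `x86_64` as the machine, so this check only recognizes a native arm64 interpreter.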

2. Download QuickTalk Assets

Download the QuickTalk weights and HuBERT files:

Terminal
mkdir -p models/quicktalk/checkpoints

hf download datascale-ai/quicktalk \
  quicktalk.pth \
  repair.npy \
  chinese-hubert-large/config.json \
  chinese-hubert-large/preprocessor_config.json \
  chinese-hubert-large/pytorch_model.bin \
  --local-dir models/quicktalk/checkpoints
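If you prefer to script the download, the same files can likely be fetched through the `huggingface_hub` Python library that provides the `hf` CLI. The sketch below assumes `huggingface_hub` is installed in the active environment; `download_assets` is a hypothetical helper name, and the repository ID and file list are copied from the command above.

```python
from pathlib import Path

REPO_ID = "datascale-ai/quicktalk"

# Same files as the `hf download` command on this page.
FILES = [
    "quicktalk.pth",
    "repair.npy",
    "chinese-hubert-large/config.json",
    "chinese-hubert-large/preprocessor_config.json",
    "chinese-hubert-large/pytorch_model.bin",
]


def download_assets(local_dir: str = "models/quicktalk/checkpoints") -> list:
    """Download every file in FILES into local_dir and return the local paths."""
    # Imported lazily so the file list can be inspected without the library.
    from huggingface_hub import hf_hub_download

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=target)
        for name in FILES
    ]
```

Calling `download_assets()` from the repository root should produce the same `checkpoints/` layout as the CLI command.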

Download InsightFace buffalo_l into the QuickTalk auxiliary directory:

Terminal
mkdir -p /tmp/opentalking-insightface \
  models/quicktalk/checkpoints/auxiliary/models/buffalo_l

curl -L \
  -o /tmp/opentalking-insightface/buffalo_l.zip \
  https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip

unzip -q -o /tmp/opentalking-insightface/buffalo_l.zip \
  -d /tmp/opentalking-insightface
rsync -a /tmp/opentalking-insightface/buffalo_l/ \
  models/quicktalk/checkpoints/auxiliary/models/buffalo_l/

The final layout should be:

models/quicktalk/
  checkpoints/
    quicktalk.pth
    repair.npy
    chinese-hubert-large/
      config.json
      preprocessor_config.json
      pytorch_model.bin
    auxiliary/models/buffalo_l/
      *.onnx

Check the required files:

Terminal
stat models/quicktalk/checkpoints/quicktalk.pth
stat models/quicktalk/checkpoints/repair.npy
stat models/quicktalk/checkpoints/chinese-hubert-large/pytorch_model.bin
stat models/quicktalk/checkpoints/auxiliary/models/buffalo_l/det_10g.onnx
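For scripted setups, a check that returns the problem files instead of printing `stat` output may be easier to act on. The following is a minimal sketch: `missing_assets` is a hypothetical helper, and treating a zero-byte file as missing is an added assumption meant to catch interrupted downloads.

```python
from pathlib import Path

# The four files checked with `stat` on this page, relative to checkpoints/.
REQUIRED = [
    "quicktalk.pth",
    "repair.npy",
    "chinese-hubert-large/pytorch_model.bin",
    "auxiliary/models/buffalo_l/det_10g.onnx",
]


def missing_assets(root) -> list:
    """Return the required files that are absent or empty under `root`."""
    root = Path(root)
    problems = []
    for rel in REQUIRED:
        path = root / rel
        # Assumption: a zero-byte file is an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(rel)
    return problems


# An empty list means all four files are present and non-empty.
print(missing_assets("models/quicktalk/checkpoints"))
```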

3. Configure .env

Create .env if it does not exist:

Terminal
cp .env.example .env

Set these values:

.env
OPENTALKING_DEFAULT_MODEL=quicktalk
OPENTALKING_FFMPEG_BIN=
OPENTALKING_QUICKTALK_BACKEND=local
OPENTALKING_QUICKTALK_ASSET_ROOT=./models/quicktalk
OPENTALKING_QUICKTALK_MODEL_BACKEND=auto
OPENTALKING_QUICKTALK_WORKER_CACHE=1

# Optional. If unset, OpenTalking selects mps when PyTorch MPS is available,
# then falls back to cpu.
OPENTALKING_QUICKTALK_DEVICE=mps

# Apple Silicon default. Keep 12 so each generated chunk has enough audio budget.
OPENTALKING_QUICKTALK_SLICE_LEN=12

# Optional for long text. This lowers output cadence from model-native 25fps
# to 14fps so MPS generation can stay closer to playback.
OPENTALKING_QUICKTALK_FPS=14

Leaving OPENTALKING_FFMPEG_BIN= empty lets OpenTalking find system ffmpeg first and fall back to imageio-ffmpeg.
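The lookup order described above can be pictured roughly as follows. This is an illustrative sketch, not OpenTalking's actual code: `resolve_ffmpeg` is a hypothetical name, and using a non-empty `OPENTALKING_FFMPEG_BIN` value as-is is an assumption. `shutil.which` and `imageio_ffmpeg.get_ffmpeg_exe` are real calls.

```python
import shutil


def resolve_ffmpeg(configured: str = "") -> str:
    """Pick an ffmpeg binary: explicit setting, then PATH, then imageio-ffmpeg."""
    if configured:
        # Assumption: an explicit OPENTALKING_FFMPEG_BIN value wins.
        return configured
    system_ffmpeg = shutil.which("ffmpeg")
    if system_ffmpeg:
        return system_ffmpeg
    # imageio-ffmpeg ships its own ffmpeg binary; imported only when needed.
    import imageio_ffmpeg

    return imageio_ffmpeg.get_ffmpeg_exe()
```

For example, `resolve_ffmpeg(os.environ.get("OPENTALKING_FFMPEG_BIN", ""))` would return the Homebrew binary when `brew install ffmpeg` has been run and the variable is empty.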

4. Check the Environment

Terminal
python - <<'PY'
from pathlib import Path
import torch
import onnxruntime as ort
from opentalking.models.quicktalk.runtime_v2 import ensure_ffmpeg

root = Path("models/quicktalk/checkpoints")
for path in [
    root / "quicktalk.pth",
    root / "repair.npy",
    root / "chinese-hubert-large/pytorch_model.bin",
    root / "auxiliary/models/buffalo_l/det_10g.onnx",
]:
    print(path, path.exists())
print("mps:", torch.backends.mps.is_available())
print("onnxruntime providers:", ort.get_available_providers())
print("ffmpeg:", ensure_ffmpeg())
PY

Each file path should be followed by True. mps should print True on a healthy Apple Silicon PyTorch install; if it prints False, OpenTalking can still fall back to CPU.
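The device fallback can be summarized in a few lines. This sketch mirrors the behavior described in the .env comments (explicit setting, then mps, then cpu); `pick_device` is a hypothetical name and the real selection logic inside OpenTalking may differ.

```python
import os


def pick_device(requested: str = "", mps_available: bool = False) -> str:
    """Return the requested device, else mps when available, else cpu."""
    if requested:
        return requested
    return "mps" if mps_available else "cpu"


try:
    import torch

    mps_ok = torch.backends.mps.is_available()
except (ImportError, AttributeError):
    # No PyTorch, or a build without the MPS backend module.
    mps_ok = False

print(pick_device(os.environ.get("OPENTALKING_QUICKTALK_DEVICE", ""), mps_ok))
```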

5. Start OpenTalking

Terminal
bash scripts/start_unified.sh \
  --backend local \
  --model quicktalk \
  --api-port 8210 \
  --web-port 5280

Open http://127.0.0.1:5280, choose a front-facing avatar such as the built-in singer, and select quicktalk. The first run builds the avatar cache; later runs reuse it.

6. Verify the Realtime Digital Human Path

Terminal
curl -s http://127.0.0.1:8210/health | python -m json.tool
curl -s http://127.0.0.1:8210/models | python -m json.tool

The QuickTalk model should report connected: true with reason local_runtime.
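To automate this check, the JSON can be searched for the QuickTalk entry. The exact shape of the /models response is not documented on this page, so the sketch below walks the whole payload and makes no claim about the real schema. Only the `connected` and `reason` fields come from this page; the helper names and the `id` key in the sample payload are hypothetical.

```python
def find_model_status(payload, name: str = "quicktalk"):
    """Return the first dict that refers to `name` and has a `connected` key."""
    if isinstance(payload, dict):
        # Case 1: the entry is keyed by model name.
        entry = payload.get(name)
        if isinstance(entry, dict) and "connected" in entry:
            return entry
        # Case 2: the entry carries the model name as one of its string values.
        if "connected" in payload and name in [
            v for v in payload.values() if isinstance(v, str)
        ]:
            return payload
        children = list(payload.values())
    elif isinstance(payload, list):
        children = payload
    else:
        return None
    for child in children:
        found = find_model_status(child, name)
        if found is not None:
            return found
    return None


def is_local_ready(status) -> bool:
    """True when the entry reports connected: true with reason local_runtime."""
    return (
        status is not None
        and status.get("connected") is True
        and status.get("reason") == "local_runtime"
    )


# Hypothetical sample payload, for illustration only.
sample = {"models": [{"id": "quicktalk", "connected": True, "reason": "local_runtime"}]}
print(is_local_ready(find_model_status(sample)))
```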

Create a session and send a short sentence:

Terminal
curl -s -X POST http://127.0.0.1:8210/sessions \
  -H 'Content-Type: application/json' \
  -d '{"avatar_id":"singer","model":"quicktalk","tts_provider":"edge"}' \
  | tee /tmp/opentalking-session.json | python -m json.tool

sid=$(python - <<'PY'
import json
print(json.load(open("/tmp/opentalking-session.json"))["session_id"])
PY
)

curl -s -X POST "http://127.0.0.1:8210/sessions/$sid/start" \
  -H 'Content-Type: application/json' \
  -d '{}' | python -m json.tool

curl -s -X POST "http://127.0.0.1:8210/sessions/$sid/speak" \
  -H 'Content-Type: application/json' \
  -d '{"text":"Please confirm in one sentence that QuickTalk is running locally on this Mac.","tts_provider":"edge"}' \
  | python -m json.tool

When the session state returns from speaking to ready, and the WebUI shows generated audio and video frames for the selected avatar, the local realtime digital human path is working.
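The same three requests can be issued from Python, which may be handier for integration checks. This sketch uses only the standard library and the endpoints, request bodies, and `session_id` field shown in the curl commands above; `post_json` and `speak_once` are hypothetical helper names.

```python
import json
import urllib.request

API = "http://127.0.0.1:8210"


def post_json(url: str, body: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def speak_once(text: str, post=post_json, api: str = API) -> dict:
    """Create a session, start it, and send one sentence to speak."""
    session = post(
        f"{api}/sessions",
        {"avatar_id": "singer", "model": "quicktalk", "tts_provider": "edge"},
    )
    sid = session["session_id"]
    post(f"{api}/sessions/{sid}/start", {})
    return post(
        f"{api}/sessions/{sid}/speak",
        {"text": text, "tts_provider": "edge"},
    )
```

With the server from step 5 running, `speak_once("QuickTalk is running locally on this Mac.")` should trigger the same audio and video output in the WebUI as the curl sequence. The `post` parameter exists so the flow can be exercised with a stub when no server is available.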

Performance Notes

Apple Silicon can run the local path, but it is not the recommended realtime production target. If long text stalls, try:

.env
OPENTALKING_QUICKTALK_SLICE_LEN=12
OPENTALKING_QUICKTALK_FPS=14
OPENTALKING_QUICKTALK_MAX_LONG_EDGE=720

This trades motion FPS or image size for smoother playback. Use Linux CUDA or OmniRT when stable 25fps realtime output matters.
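A quick calculation shows why lowering the output rate helps. To keep up with playback, each frame must be generated within 1000 / fps milliseconds. The sketch below also assumes that `OPENTALKING_QUICKTALK_SLICE_LEN` counts video frames per generated chunk, which this page does not state explicitly.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to generate one frame without falling behind playback."""
    return 1000.0 / fps


def slice_duration_ms(slice_len: int, fps: float) -> float:
    """Playback time covered by one slice, assuming slice_len counts frames."""
    return slice_len * 1000.0 / fps


for fps in (25, 14):
    print(
        f"{fps} fps: {frame_budget_ms(fps):.1f} ms per frame, "
        f"{slice_duration_ms(12, fps):.0f} ms per 12-frame slice"
    )
# 25 fps: 40.0 ms per frame, 480 ms per 12-frame slice
# 14 fps: 71.4 ms per frame, 857 ms per 12-frame slice
```

Under that assumption, dropping from 25fps to 14fps gives the MPS backend roughly 1.8 times as long to produce each frame.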

Troubleshooting

| Symptom | Fix |
| --- | --- |
| onnxruntime-gpu fails to install | Use quicktalk-cpu; do not install quicktalk-cuda on Apple Silicon. |
| ffmpeg is missing | Keep OPENTALKING_FFMPEG_BIN= empty, or run brew install ffmpeg. |
| MPS shows an SVD CPU fallback warning | This is a PyTorch MPS operator coverage limitation. It can affect speed but usually does not block execution. |
| First startup is slow | The first run loads HuBERT, QuickTalk, and the avatar face cache. Reusing the same avatar is faster. |