QuickTalk on Apple Silicon
This page is for running QuickTalk locally on Apple Silicon macOS. It is intended for development, demos, and integration checks. For stable realtime 25fps output, use the Linux CUDA path in QuickTalk Local Deployment or run QuickTalk behind OmniRT.
1. Install Dependencies
brew install python@3.11 node uv
# Optional. OpenTalking can fall back to imageio-ffmpeg when this is absent.
brew install ffmpeg
Clone OpenTalking and create the environment with the CPU/macOS extra:
git clone https://github.com/OpenTalker/opentalking.git
cd opentalking
# Optional: use a PyPI mirror. Omit these two lines to use the default index.
export UV_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple
export PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple
export UV_HTTP_TIMEOUT=300
export UV_LINK_MODE=copy
uv sync --extra dev --extra models --extra quicktalk-cpu --python 3.11
source .venv/bin/activate
Do not install quicktalk-cuda on Apple Silicon. onnxruntime-gpu does not provide a macOS arm64 wheel.
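As a rough illustration of that rule, the helper below picks which extra to pass to `uv sync` from the host platform. The function name `pick_quicktalk_extra` is hypothetical and not part of OpenTalking; only the extra names `quicktalk-cpu` and `quicktalk-cuda` come from this page. Mapping every Linux host to the CUDA extra is an assumption, since the sketch does not check for an NVIDIA GPU.

```python
import platform


def pick_quicktalk_extra(system=None, machine=None):
    """Return the uv extra to install for this host (hypothetical helper)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # Apple Silicon reports Darwin/arm64; onnxruntime-gpu has no wheel for it.
    if system == "Darwin" and machine == "arm64":
        return "quicktalk-cpu"
    # Assumption: a Linux host has a CUDA-capable GPU; this is not verified here.
    if system == "Linux":
        return "quicktalk-cuda"
    # Anything else stays on the CPU extra.
    return "quicktalk-cpu"


print(pick_quicktalk_extra())
```

On an Apple Silicon Mac this prints `quicktalk-cpu`, matching the `uv sync` command above.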
2. Download QuickTalk Assets
Download the QuickTalk weights and HuBERT files:
mkdir -p models/quicktalk/checkpoints
hf download datascale-ai/quicktalk \
  quicktalk.pth \
  repair.npy \
  chinese-hubert-large/config.json \
  chinese-hubert-large/preprocessor_config.json \
  chinese-hubert-large/pytorch_model.bin \
  --local-dir models/quicktalk/checkpoints
Download InsightFace buffalo_l into the QuickTalk auxiliary directory:
mkdir -p /tmp/opentalking-insightface \
  models/quicktalk/checkpoints/auxiliary/models/buffalo_l
curl -L \
  -o /tmp/opentalking-insightface/buffalo_l.zip \
  https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip
unzip -q -o /tmp/opentalking-insightface/buffalo_l.zip \
  -d /tmp/opentalking-insightface
rsync -a /tmp/opentalking-insightface/buffalo_l/ \
  models/quicktalk/checkpoints/auxiliary/models/buffalo_l/
The final layout should be:
models/quicktalk/
  checkpoints/
    quicktalk.pth
    repair.npy
    chinese-hubert-large/
      config.json
      preprocessor_config.json
      pytorch_model.bin
    auxiliary/models/buffalo_l/
      *.onnx
Check the required files:
stat models/quicktalk/checkpoints/quicktalk.pth
stat models/quicktalk/checkpoints/repair.npy
stat models/quicktalk/checkpoints/chinese-hubert-large/pytorch_model.bin
stat models/quicktalk/checkpoints/auxiliary/models/buffalo_l/det_10g.onnx
3. Configure .env
Create .env if it does not exist, then set these values:
OPENTALKING_DEFAULT_MODEL=quicktalk
OPENTALKING_FFMPEG_BIN=
OPENTALKING_QUICKTALK_BACKEND=local
OPENTALKING_QUICKTALK_ASSET_ROOT=./models/quicktalk
OPENTALKING_QUICKTALK_MODEL_BACKEND=auto
OPENTALKING_QUICKTALK_WORKER_CACHE=1
# Optional. If unset, OpenTalking selects mps when PyTorch MPS is available,
# then falls back to cpu.
OPENTALKING_QUICKTALK_DEVICE=mps
# Apple Silicon default. Keep 12 so each generated chunk has enough audio budget.
OPENTALKING_QUICKTALK_SLICE_LEN=12
# Optional for long text. This lowers output cadence from model-native 25fps
# to 14fps so MPS generation can stay closer to playback.
OPENTALKING_QUICKTALK_FPS=14
Leaving OPENTALKING_FFMPEG_BIN= empty lets OpenTalking find system ffmpeg first and fall back to imageio-ffmpeg.
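The lookup order described above can be sketched as follows. This is an illustration, not OpenTalking's actual code: `resolve_ffmpeg` is a hypothetical name, and the real logic lives behind `ensure_ffmpeg`, whose internals this page does not show. The sketch uses only `shutil.which` and `imageio_ffmpeg.get_ffmpeg_exe`.

```python
import os
import shutil


def resolve_ffmpeg(env=None):
    """Return a path to ffmpeg, or None if no candidate is found (sketch)."""
    env = os.environ if env is None else env
    # 1. An explicit, non-empty OPENTALKING_FFMPEG_BIN wins.
    explicit = env.get("OPENTALKING_FFMPEG_BIN", "").strip()
    if explicit:
        return explicit
    # 2. Otherwise look for a system ffmpeg on PATH (e.g. from Homebrew).
    system = shutil.which("ffmpeg")
    if system:
        return system
    # 3. Finally fall back to the binary bundled with imageio-ffmpeg, if installed.
    try:
        import imageio_ffmpeg
        return imageio_ffmpeg.get_ffmpeg_exe()
    except (ImportError, RuntimeError):
        return None


print(resolve_ffmpeg())
```

With the variable left empty, the first branch is skipped, so a Homebrew ffmpeg on PATH is preferred over the bundled one.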
4. Check the Environment
python - <<'PY'
from pathlib import Path
import torch
import onnxruntime as ort
from opentalking.models.quicktalk.runtime_v2 import ensure_ffmpeg
root = Path("models/quicktalk/checkpoints")
for path in [
    root / "quicktalk.pth",
    root / "repair.npy",
    root / "chinese-hubert-large/pytorch_model.bin",
    root / "auxiliary/models/buffalo_l/det_10g.onnx",
]:
    print(path, path.exists())
print("mps:", torch.backends.mps.is_available())
print("onnxruntime providers:", ort.get_available_providers())
print("ffmpeg:", ensure_ffmpeg())
PY
Each file path should be followed by True. mps should also be True on a healthy Apple Silicon PyTorch install; if it is False, OpenTalking can still fall back to CPU.
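The device choice described in this guide (an explicit `OPENTALKING_QUICKTALK_DEVICE`, otherwise mps when available, otherwise cpu) can be sketched as a small pure function. `pick_device` is a hypothetical name, and degrading an unavailable `mps` request to `cpu` is an assumption about the fallback, not confirmed behavior.

```python
def pick_device(requested, mps_available):
    """Choose a device name: explicit value, then mps, then cpu (sketch)."""
    requested = (requested or "").strip().lower()
    if requested == "mps" and not mps_available:
        # Assumption: an unavailable mps request degrades to cpu instead of failing.
        return "cpu"
    if requested:
        return requested
    return "mps" if mps_available else "cpu"


# On a real install the second argument would come from
# torch.backends.mps.is_available(); fixed values keep this sketch torch-free.
print(pick_device(None, True))    # mps
print(pick_device(None, False))   # cpu
print(pick_device("cpu", True))   # cpu
```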
5. Start OpenTalking
bash scripts/start_unified.sh \
  --backend local \
  --model quicktalk \
  --api-port 8210 \
  --web-port 5280
Open http://127.0.0.1:5280, choose a front-facing avatar such as the built-in singer, and select quicktalk. The first run builds the avatar cache; later runs reuse it.
6. Verify the Realtime Digital Human Path
curl -s http://127.0.0.1:8210/health | python -m json.tool
curl -s http://127.0.0.1:8210/models | python -m json.tool
The QuickTalk model should report connected: true with reason local_runtime.
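To check this from a script rather than by eye, something like the following could work. The `/models` response schema is not shown on this page, so the sketch walks the decoded JSON generically. The key names `id`, `name`, and `model` used to spot the QuickTalk entry are guesses, and the sample payload is made up for illustration. Only `connected` and `local_runtime` come from the text above.

```python
def find_model_entries(payload, name="quicktalk"):
    """Collect every dict in a decoded JSON payload that names the model."""
    found = []
    if isinstance(payload, dict):
        # Assumption: the entry identifies itself under one of these keys.
        if name in (payload.get("id"), payload.get("name"), payload.get("model")):
            found.append(payload)
        for value in payload.values():
            found.extend(find_model_entries(value, name))
    elif isinstance(payload, list):
        for item in payload:
            found.extend(find_model_entries(item, name))
    return found


def quicktalk_is_local(payload):
    """True if some QuickTalk entry reports connected with reason local_runtime."""
    return any(
        entry.get("connected") is True and entry.get("reason") == "local_runtime"
        for entry in find_model_entries(payload)
    )


# Hypothetical payload shape, for illustration only.
sample = {"models": [{"id": "quicktalk", "connected": True, "reason": "local_runtime"}]}
print(quicktalk_is_local(sample))
```

In practice the payload would be the decoded body of the `/models` request shown above.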
Create a session and send a short sentence:
curl -s -X POST http://127.0.0.1:8210/sessions \
  -H 'Content-Type: application/json' \
  -d '{"avatar_id":"singer","model":"quicktalk","tts_provider":"edge"}' \
  | tee /tmp/opentalking-session.json | python -m json.tool
sid=$(python - <<'PY'
import json
print(json.load(open("/tmp/opentalking-session.json"))["session_id"])
PY
)
curl -s -X POST "http://127.0.0.1:8210/sessions/$sid/start" \
  -H 'Content-Type: application/json' \
  -d '{}' | python -m json.tool
curl -s -X POST "http://127.0.0.1:8210/sessions/$sid/speak" \
  -H 'Content-Type: application/json' \
  -d '{"text":"Please confirm in one sentence that QuickTalk is running locally on this Mac.","tts_provider":"edge"}' \
  | python -m json.tool
When the session state returns from speaking to ready, and the WebUI shows generated audio and video frames for the selected avatar, the local realtime digital human path is working.
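The speaking-to-ready transition can also be watched with a small polling loop. This page does not document how to read the session state, so the sketch takes a caller-supplied `get_state` callable instead of assuming an endpoint. The name `wait_until_ready` and the timeout values are illustrative.

```python
import time


def wait_until_ready(get_state, timeout=60.0, interval=0.5, sleep=time.sleep):
    """Poll get_state() until it returns "ready" after having been "speaking"."""
    deadline = time.monotonic() + timeout
    seen_speaking = False
    while time.monotonic() < deadline:
        state = get_state()
        if state == "speaking":
            seen_speaking = True
        elif state == "ready" and seen_speaking:
            return True
        sleep(interval)
    return False


# Simulated state sequence standing in for a real session.
states = iter(["ready", "speaking", "speaking", "ready"])
print(wait_until_ready(lambda: next(states), sleep=lambda _: None))  # True
```

Requiring a prior `speaking` state avoids treating the idle state before the request as success. A very short sentence could finish between two polls, so a smaller `interval` may be needed.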
Performance Notes
Apple Silicon can run the local path, but it is not the recommended realtime production target. If long text stalls, try:
OPENTALKING_QUICKTALK_SLICE_LEN=12
OPENTALKING_QUICKTALK_FPS=14
OPENTALKING_QUICKTALK_MAX_LONG_EDGE=720
This trades motion FPS or image size for smoother playback. Use Linux CUDA or OmniRT when stable 25fps realtime output matters.
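A little arithmetic shows why these settings help. The sketch assumes that `OPENTALKING_QUICKTALK_SLICE_LEN` counts video frames per generated chunk and that `OPENTALKING_QUICKTALK_MAX_LONG_EDGE` caps the longer image side in pixels while preserving aspect ratio. Neither is spelled out on this page, and the function names and the 1920x1080 input are illustrative.

```python
def chunk_seconds(slice_len, fps):
    """Playback time covered by one chunk of slice_len frames."""
    return slice_len / fps


def fit_long_edge(width, height, max_long_edge):
    """Scale (width, height) down so the longer side is at most max_long_edge."""
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    scale = max_long_edge / long_edge
    return round(width * scale), round(height * scale)


# At 25fps a 12-frame chunk must be generated within 0.48 s to keep up;
# at 14fps the same chunk covers about 0.857 s of playback.
print(f"{chunk_seconds(12, 25):.3f}")   # 0.480
print(f"{chunk_seconds(12, 14):.3f}")   # 0.857
print(fit_long_edge(1920, 1080, 720))   # (720, 405)
```

Under these assumptions, dropping to 14fps gives the MPS backend roughly 1.8 times as long to produce each chunk, and the 720 cap reduces the pixel count per frame.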
Troubleshooting
| Symptom | Fix |
|---|---|
| onnxruntime-gpu fails to install | Use quicktalk-cpu; do not install quicktalk-cuda on Apple Silicon. |
| ffmpeg is missing | Keep OPENTALKING_FFMPEG_BIN= empty, or run brew install ffmpeg. |
| MPS shows an SVD CPU fallback warning | This is a PyTorch MPS operator coverage limitation. It can affect speed but usually does not block execution. |
| First startup is slow | The first run loads HuBERT, QuickTalk, and the avatar face cache. Reusing the same avatar is faster. |