
Installation

OpenTalking ships two installation methods. Choosing between them comes down to two questions: where the work happens (development, single-machine production, multi-machine production) and what hardware is available (CPU only, NVIDIA GPU, or Ascend NPU).

This page presents the decision matrix and links to the detailed instructions. For a streamlined first-run procedure, see the Quickstart.

Choosing an installation method

| Use case | Hardware | Recommended method | Detailed guide |
| --- | --- | --- | --- |
| Local development, frontend changes, API iteration | Any | Source install + mock synthesis | From source |
| CPU evaluation | CPU | Source install + mock synthesis | From source |
| Evaluation on a single GPU machine | NVIDIA 3090 / 4090 / A100 (CUDA 12) | Source install + model-specific backend | From source → single GPU |
| Evaluation on Ascend NPU | Huawei 910B (CANN 8.0+) | Source install on the host CANN environment | From source → Ascend 910B |
| Continuous integration | CPU | Source install or Docker Compose, depending on reproducibility needs | From source or Docker Compose → CPU profile |
| Production single-host deployment | Linux + GPU or NPU | Source install or Docker, depending on operations preference | From source → Production or Docker Compose |
| Production multi-host deployment with horizontal Worker scaling | Linux + GPU or NPU | Source install, API/Worker split, external Redis | From source → API and Worker split and Deployment |

Platform support matrix

| Platform | Synthesis backends | Notes |
| --- | --- | --- |
| macOS (Apple Silicon and Intel) | mock | Suitable for orchestration and frontend development. Real talking-head models are not supported on macOS. |
| Linux x86_64 + CUDA 12 | mock, wav2lip, musetalk, flashtalk, flashhead, quicktalk | Primary deployment target. |
| Linux aarch64 + Ascend 910B (CANN 8.0+) | mock, wav2lip, flashtalk | NPU production deployment path. |
| Windows | mock (WSL2 recommended) | Not part of the continuous-integration matrix. |

Common prerequisites

Independent of the installation method, the following components are required:

  • A DashScope (Bailian) API key for the default language model (qwen-flash) and speech recognition (paraformer-realtime-v2). Other OpenAI-compatible endpoints may be used; see Configuration §1.
  • A WebRTC-compatible client. The bundled frontend has been tested against Chromium-based browsers; Safari requires additional CORS configuration.
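
API keys are typically supplied through the environment. A minimal sketch, assuming the orchestrator reads the key from a `DASHSCOPE_API_KEY` variable (the variable name is an assumption here; confirm the exact keys on the Configuration page):

```shell
# Hypothetical environment setup -- DASHSCOPE_API_KEY is an assumed
# variable name; the authoritative list is on the Configuration page.
export DASHSCOPE_API_KEY="sk-..."   # placeholder value, not a real key
```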

Additional requirements for a source installation:

  • Python 3.10 or later (3.11 recommended).
  • Node.js 18 or later for the frontend toolchain.
  • ffmpeg for the text-to-speech decoding stage.
  • Redis 6 or later (optional), required only for the API/Worker split deployment.
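
The version floors above can be gated mechanically in a bootstrap script. A minimal sketch using GNU `sort -V` for the comparison (the `ver_ge` helper name is illustrative, not part of OpenTalking):

```shell
# ver_ge A B -- succeed when dotted version A >= B (GNU sort -V ordering).
ver_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: warn when the interpreter is below the documented Python floor.
py_ver="$(python3 -c 'import platform; print(platform.python_version())')"
ver_ge "$py_ver" 3.10 || echo "warning: Python 3.10+ recommended, found $py_ver" >&2
```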

Docker Compose is a deployment packaging option, not the lightest evaluation path. Use it when repeatable images, containerized service boundaries, or production-like operations are more important than first-run simplicity.

Additional requirements for a Docker installation:

  • Docker Engine 20.10 or later and the Compose v2 plugin.
  • NVIDIA Container Toolkit when running the GPU profile.
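
The legacy `docker-compose` v1 binary and the Compose v2 plugin differ in invocation, so a quick presence probe can save debugging time. A sketch; the helper name is illustrative:

```shell
# Compose v2 is invoked as `docker compose` (a space, not a hyphen);
# this probe fails on hosts that only ship the legacy v1 binary.
compose_v2_available() { docker compose version >/dev/null 2>&1; }

if compose_v2_available; then
  echo "Compose v2 plugin found"
else
  echo "Compose v2 plugin missing; install it before continuing" >&2
fi
```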

Verification

Regardless of the installation method, the orchestrator can be verified with the following requests once it is running:

```shell
curl -s http://127.0.0.1:8000/health
# {"status":"ok"}

curl -s http://127.0.0.1:8000/models | jq
# Lists available synthesis backends.
```
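
In CI or a startup hook, the health body needs a machine-readable check rather than eyeballing the curl output. A sketch of a small predicate over the response body; the `healthy` helper is illustrative, and the host/port are the defaults shown above:

```shell
# healthy BODY -- succeed when a /health JSON body reports status "ok".
healthy() { printf '%s' "$1" | grep -q '"status"[[:space:]]*:[[:space:]]*"ok"'; }

# Example check against a live orchestrator (commented out so the snippet
# is self-contained; uncomment once the service is running):
# healthy "$(curl -sf http://127.0.0.1:8000/health)" && echo "orchestrator ready"

healthy '{"status":"ok"}' && echo "healthy body accepted"
```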

Next steps

  • From source — install from a git checkout. Covers development, production, and Ascend variants.
  • Docker Compose — install with the packaged Docker stack for reproducible deployments.
  • Configuration — required environment configuration after installation.
  • Deployment — selecting a runtime topology.