OpenTalking¶
The open-source orchestration layer for real-time digital humans.
OpenTalking is not a talking-head model. It is the layer that integrates a talking-head model with everything else a production conversational digital human requires: streaming speech recognition, large language models, text-to-speech synthesis, WebRTC delivery, and per-session control. Plug in the model and provider combination that fits the deployment; the orchestration contract stays the same.
The documentation site defaults to Chinese; the English version is at https://datascale-ai.github.io/opentalking/en/.
What OpenTalking is for¶
Building a digital human application that talks and listens in real time involves roughly a dozen moving parts: speech recognition with end-pointing, a streaming language model client, sentence-level text-to-speech synthesis, audio decoding, talking-head rendering, WebRTC track management, barge-in handling, and session state. OpenTalking implements all of these as a single FastAPI service, exposes a small REST and WebSocket interface, and delegates synthesis to the configured model backend for each session.
If the question is "I have a wav2lip checkpoint, how do I serve a real-time chat experience on top?" — OpenTalking is the answer. If the question is "how should the model itself run?", choose a backend: local adapter, direct WebSocket service, OmniRT, or a mock path for tests.
Key capabilities¶
- Realtime conversation pipeline: ASR, LLM, TTS, talking-head rendering, and WebRTC delivery in one interruptible session loop.
- Pluggable model backends: synthesis is resolved per model by its backend field (mock, local, direct_ws, or omnirt).
- Unified API: REST, SSE, WebSocket, and WebRTC signaling exposed through one FastAPI service.
- Replaceable providers: OpenAI-compatible LLM endpoints, with TTS switchable across Edge, DashScope, CosyVoice, or ElevenLabs (see the .env sketch after this list).
- Avatar and voice assets: manage avatar bundles, custom portraits, voice catalog entries, and cloned voices.
- Flexible deployment: run unified, split API/Worker, Docker Compose, or remote GPU/NPU model services.
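A rough sketch of what provider replacement looks like in the .env file is shown below. Only OPENTALKING_LLM_API_KEY and DASHSCOPE_API_KEY are taken from the quickstart; the base-URL and TTS-provider variable names here are hypothetical placeholders, so consult the Configuration reference for the authoritative names, defaults, and precedence.

# .env sketch: only the two API keys are documented in the quickstart below;
# the remaining variable names are illustrative placeholders (see Configuration).
OPENTALKING_LLM_API_KEY=sk-your-key                     # key for an OpenAI-compatible LLM endpoint
OPENTALKING_LLM_BASE_URL=https://llm.example.com/v1     # hypothetical: points at your LLM provider
DASHSCOPE_API_KEY=sk-your-key                           # DashScope credentials (ASR and DashScope TTS)
# OPENTALKING_TTS_PROVIDER=edge                         # hypothetical: edge | dashscope | cosyvoice | elevenlabs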
Pick your starting point¶
- Quickstart: five-minute walkthrough from source checkout to a working end-to-end session using the mock synthesis path.
- Configuration: reference for every environment variable and YAML field, with default values and precedence rules.
- Deployment: topologies covering single-process, API/Worker split, Docker Compose, and Ascend 910B.
- API Reference: complete REST, Server-Sent Events, and WebSocket reference for all endpoints.
- Model Adapter: integration guide for adding a new talking-head model to OpenTalking.
- Architecture: system architecture, session lifecycle, and event bus reference.
Minimal example¶
git clone https://github.com/datascale-ai/opentalking.git
cd opentalking
uv sync --extra dev --python 3.11
source .venv/bin/activate
cp .env.example .env
# Configure OPENTALKING_LLM_API_KEY and DASHSCOPE_API_KEY in .env, then:
bash scripts/quickstart/start_mock.sh
After startup, open http://localhost:5173, select the demo-avatar and mock
model, and initiate a conversation. To enable real talking-head synthesis, configure
the selected model's backend, for example:
models:
  wav2lip:
    backend: omnirt       # remote OmniRT model service
  quicktalk:
    backend: local        # in-process local adapter
  flashhead:
    backend: direct_ws    # standalone WebSocket model service
OMNIRT_ENDPOINT is required only for models using backend: omnirt; the
client-side workflow remains unchanged.
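As an illustration, the corresponding .env entry might look like the line below; the variable name comes from the note above, while the scheme, host, and port are placeholders to be replaced with the address of the actual OmniRT deployment.

# Needed only when at least one model is configured with backend: omnirt.
# The value is a placeholder; substitute the address of your OmniRT service.
OMNIRT_ENDPOINT=http://omnirt-host:9000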
System architecture¶
flowchart LR
Browser([Browser])
API[FastAPI<br/>HTTP / WS / WebRTC]
Worker[Pipeline driver<br/>LLM → TTS → synthesis]
Backend[(Synthesis backend<br/>local / direct_ws / OmniRT)]
LLM[(LLM endpoint<br/>OpenAI-compatible)]
TTS[(TTS<br/>Edge / DashScope / ElevenLabs)]
Browser -->|HTTP / SSE / WebRTC| API
API <-->|Redis or in-memory bus| Worker
Worker --> LLM
Worker --> TTS
Worker --> Backend
The complete system view — components, deployment topologies, session lifecycle, and event bus schema — is documented in Architecture.
Where OpenTalking fits¶
| Concern | OpenTalking | Synthesis backend | Hosted LLM | TTS provider |
|---|---|---|---|---|
| Session lifecycle and state | ✓ | | | |
| HTTP and WebSocket API | ✓ | | | |
| WebRTC signaling and tracks | ✓ | | | |
| Speech recognition (via DashScope) | ✓ | | | |
| Sentence-level streaming pipeline | ✓ | | | |
| Barge-in propagation | ✓ | | | |
| Voice catalog and cloning | ✓ | | | |
| Avatar bundle management | ✓ | | | |
| Talking-head model weights and inference | | ✓ | | |
| GPU and NPU scheduling and batching | | ✓ (when the backend supports it) | | |
| Chat completion inference | | | ✓ | |
| Speech synthesis | | | | ✓ |
Use cases¶
- Conversational assistants with a visual avatar for customer support, sales, or onboarding.
- Live broadcast applications where the avatar responds to viewer interactions in real time.
- Educational software that requires both speech recognition and synthesized speech feedback.
- Internal productivity tools that pair a corporate language model with a digital human interface.
- Research and prototyping for talking-head generation, where the orchestration layer is needed but not the focus of the work.
Community and support¶
- GitHub — datascale-ai/opentalking for issues, pull requests, and discussions.
- QQ group — 1103327938 (AI digital human discussion group), a primarily Chinese-language community.
- Documentation — User Guide, Developer Guide, API Reference.
- Contributing — see the Contributing guide for submission guidelines.
License¶
OpenTalking is released under the Apache License, Version 2.0. Talking-head model weights and external model services are governed by their individual licenses; consult the respective model repositories or backend deployments for details.