LLM and STT¶

The LLM decides what the digital human says. STT is required only when users speak through the microphone; text-only speak requests do not need STT.

LLM¶

OpenTalking uses an OpenAI-compatible chat-completions interface. DashScope is the default because it works with the default Chinese demo settings.

.env

OPENTALKING_LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
OPENTALKING_LLM_API_KEY=<dashscope-api-key>
OPENTALKING_LLM_MODEL=qwen-flash

Common alternatives:

Provider	Configuration notes
OpenAI	Set `OPENTALKING_LLM_BASE_URL=https://api.openai.com/v1` and use an OpenAI model id.
vLLM	Point `OPENTALKING_LLM_BASE_URL` to the vLLM OpenAI-compatible server.
Ollama	Use the Ollama OpenAI-compatible endpoint, usually `http://localhost:11434/v1`.
DeepSeek	Use the provider's OpenAI-compatible base URL and model id.
Atlas Cloud	OpenAI-compatible gateway hosting DeepSeek, Qwen, and other models. Set `OPENTALKING_LLM_BASE_URL=https://api.atlascloud.ai/v1`. See Atlas Cloud.

Atlas Cloud is OpenAI-compatible, so it works as a drop-in LLM backend:

.env

OPENTALKING_LLM_BASE_URL=https://api.atlascloud.ai/v1
OPENTALKING_LLM_API_KEY=<atlascloud-api-key>
OPENTALKING_LLM_MODEL=deepseek-ai/deepseek-v4-pro

deepseek-ai/deepseek-v4-pro is a reasoning model, so allow enough max_tokens (≥ 512) or the reply may be truncated before any visible text is produced.

All Atlas Cloud chat models (59)

- **Anthropic (Claude):** `anthropic/claude-haiku-4.5-20251001`, `anthropic/claude-opus-4.8`, `anthropic/claude-sonnet-4.6` - **OpenAI (GPT):** `openai/gpt-5.4`, `openai/gpt-5.5` - **Google (Gemini):** `google/gemini-3.1-flash-lite`, `google/gemini-3.1-pro-preview`, `google/gemini-3.5-flash` - **Qwen:** `qwen/qwen2.5-7b-instruct`, `Qwen/Qwen3-235B-A22B-Instruct-2507`, `qwen/qwen3-235b-a22b-thinking-2507`, `qwen/qwen3-30b-a3b`, `Qwen/Qwen3-30B-A3B-Instruct-2507`, `qwen/qwen3-30b-a3b-thinking-2507`, `qwen/qwen3-32b`, `qwen/qwen3-8b`, `Qwen/Qwen3-Coder`, `qwen/qwen3-coder-next`, `qwen/qwen3-max-2026-01-23`, `Qwen/Qwen3-Next-80B-A3B-Instruct`, `Qwen/Qwen3-Next-80B-A3B-Thinking`, `Qwen/Qwen3-VL-235B-A22B-Instruct`, `qwen/qwen3-vl-235b-a22b-thinking`, `qwen/qwen3-vl-30b-a3b-instruct`, `qwen/qwen3-vl-30b-a3b-thinking`, `qwen/qwen3-vl-8b-instruct`, `qwen/qwen3.5-122b-a10b`, `qwen/qwen3.5-27b`, `qwen/qwen3.5-35b-a3b`, `qwen/qwen3.5-397b-a17b`, `qwen/qwen3.6-35b-a3b`, `qwen/qwen3.6-plus` - **DeepSeek:** `deepseek-ai/deepseek-ocr`, `deepseek-ai/deepseek-r1-0528`, `deepseek-ai/DeepSeek-V3-0324`, `deepseek-ai/DeepSeek-V3.1`, `deepseek-ai/DeepSeek-V3.1-Terminus`, `deepseek-ai/deepseek-v3.2`, `deepseek-ai/DeepSeek-V3.2-Exp`, `deepseek-ai/deepseek-v4-flash`, `deepseek-ai/deepseek-v4-pro` - **Moonshot (Kimi):** `moonshotai/Kimi-K2-Instruct`, `moonshotai/Kimi-K2-Instruct-0905`, `moonshotai/Kimi-K2-Thinking`, `moonshotai/kimi-k2.5`, `moonshotai/kimi-k2.6` - **Zhipu (GLM):** `zai-org/GLM-4.6`, `zai-org/glm-4.7`, `zai-org/glm-5`, `zai-org/glm-5-turbo`, `zai-org/glm-5.1`, `zai-org/glm-5v-turbo` - **MiniMax:** `MiniMaxAI/MiniMax-M2`, `minimaxai/minimax-m2.1`, `minimaxai/minimax-m2.5`, `minimaxai/minimax-m2.7` - **xAI:** `xai/grok-4.3` - **Kwaipilot:** `kwaipilot/kat-coder-pro-v2` - **Other:** `owl`

Verify the API key and endpoint by starting OpenTalking and sending a text speak request after creating a mock session.

STT¶

Select the STT provider with OPENTALKING_STT_DEFAULT_PROVIDER. The frontend can also select local STT or API STT before a session starts. When API STT is selected, the provider-specific key must be configured; it is not populated from the LLM key.

DashScope Paraformer realtime¶

.env

OPENTALKING_STT_DEFAULT_PROVIDER=dashscope
OPENTALKING_STT_DASHSCOPE_API_KEY=<dashscope-api-key>
OPENTALKING_STT_DASHSCOPE_MODEL=paraformer-realtime-v2

For DashScope-based deployments, LLM and STT may use the same actual key, but it must be written explicitly to OPENTALKING_LLM_API_KEY and OPENTALKING_STT_DASHSCOPE_API_KEY. If microphone input fails but text speak works, verify the STT module key first.

Local SenseVoiceSmall¶

.env

OPENTALKING_STT_DEFAULT_PROVIDER=sensevoice
OPENTALKING_STT_ENABLED_PROVIDERS=sensevoice,dashscope
OPENTALKING_STT_SENSEVOICE_MODEL=iic/SenseVoiceSmall
OPENTALKING_STT_SENSEVOICE_MODEL_DIR=./models/local-audio/iic__SenseVoiceSmall
OPENTALKING_STT_SENSEVOICE_DEVICE=cpu

SenseVoiceSmall uses the local FunASR adapter and supports both uploaded audio and WebSocket PCM microphone input. CPU inference is usually enough for short realtime utterances, which makes it a good match for QuickTalk local.

Download the weights:

terminal

uv sync --extra dev --extra models --extra local-audio --python 3.11
python scripts/download_local_audio_models.py \
  --root ./models/local-audio \
  --model sensevoice-small

Verification¶

terminal

curl -fsS http://127.0.0.1:8000/health
curl -s -X POST http://127.0.0.1:8000/sessions \
  -H 'content-type: application/json' \
  -d '{"avatar_id":"demo-avatar","model":"mock"}'

Then use the frontend microphone flow to confirm STT events and LLM responses appear in the session event stream.

Frontend Entry¶

After the model or backend service is running, use the OpenTalking WebUI:

Terminal

cd "$OPENTALKING_HOME"
bash scripts/quickstart/start_frontend.sh --api-port 8000 --web-port 5173 --host 0.0.0.0

For a remote server, forward your local browser port to the server 5173, then open http://127.0.0.1:5173.