LLM and STT¶

The LLM decides what the digital human says. STT is required only when users speak through the microphone; text-only chat and speak requests do not need STT.

LLM¶

OpenTalking uses an OpenAI-compatible chat-completions interface. DashScope is the default because it works with the default Chinese demo settings.

.env

OPENTALKING_LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
OPENTALKING_LLM_API_KEY=<dashscope-api-key>
OPENTALKING_LLM_MODEL=qwen-flash

Common alternatives:

Provider	Configuration notes
OpenAI	Set `OPENTALKING_LLM_BASE_URL=https://api.openai.com/v1` and use an OpenAI model id.
vLLM	Point `OPENTALKING_LLM_BASE_URL` to the vLLM OpenAI-compatible server.
Ollama	Use the Ollama OpenAI-compatible endpoint, usually `http://localhost:11434/v1`.
DeepSeek	Use the provider's OpenAI-compatible base URL and model id.

Verify the API key and endpoint by starting OpenTalking and sending a text chat request after creating a mock session.

STT¶

The default speech-recognition backend is DashScope Paraformer realtime.

.env

DASHSCOPE_API_KEY=<dashscope-api-key>
OPENTALKING_STT_MODEL=paraformer-realtime-v2

For DashScope-based deployments, DASHSCOPE_API_KEY and OPENTALKING_LLM_API_KEY can use the same key. If microphone input fails but text chat works, verify this key first.

Verification¶

terminal

curl -fsS http://127.0.0.1:8000/health
curl -s -X POST http://127.0.0.1:8000/sessions \
  -H 'content-type: application/json' \
  -d '{"avatar_id":"demo-avatar","model":"mock"}'

Then use the frontend microphone flow to confirm STT events and LLM responses appear in the session event stream.