TTS and Voices¶

Endpoints for one-off text-to-speech synthesis (no session required) and for managing cloned voices.

TTS preview¶

`POST /tts/preview`¶

Synthesizes a short audio clip without creating a session. Used by the frontend for voice auditioning before a session is started.

Request body — application/json

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `text` | string | Yes | Text to synthesize. Recommended maximum length: 200 characters. |
| `voice` | string | No | Voice identifier. Format depends on `provider`. |
| `provider` | string | No | One of `edge`, `dashscope`, `cosyvoice`, `elevenlabs`. Defaults to `OPENTALKING_TTS_PROVIDER`. |
| `model` | string | No | Provider-specific model identifier. |

Response — 200 OK

Content-Type: audio/wav. Body is a 16-bit PCM WAV file at the session sample rate (default 16000 Hz).

curl

curl -s -X POST http://localhost:8000/tts/preview \
  -H 'content-type: application/json' \
  -d '{"text": "Hello, this is a voice preview.", "provider": "edge", "voice": "en-US-AriaNeural"}' \
  -o preview.wav

Error responses

| Code | Condition |
| --- | --- |
| `400` | `text` is empty or exceeds the configured length limit. |
| `502` | The upstream TTS provider returned an error. |
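
The request and response described above can also be exercised from Python. The following is a minimal sketch using only the standard library; the helper names `build_preview_request` and `describe_wav` are hypothetical, the base URL is taken from the curl example, and the client-side length check uses the recommended 200-character maximum rather than the server's configured limit, which is not stated here.

```python
import io
import json
import urllib.request
import wave

BASE_URL = "http://localhost:8000"  # assumption: same host as the curl example
MAX_PREVIEW_CHARS = 200  # recommended maximum from the request table


def build_preview_request(text, provider=None, voice=None, model=None,
                          base_url=BASE_URL):
    """Build (but do not send) a POST /tts/preview request."""
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_PREVIEW_CHARS:
        raise ValueError(f"text is longer than {MAX_PREVIEW_CHARS} characters")
    payload = {"text": text}
    # Optional fields are only sent when the caller supplies them,
    # so the server-side defaults still apply otherwise.
    for key, value in (("provider", provider), ("voice", voice), ("model", model)):
        if value is not None:
            payload[key] = value
    return urllib.request.Request(
        f"{base_url}/tts/preview",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_wav(data):
    """Return (channels, bits_per_sample, sample_rate) for WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


# Usage (requires a running server):
#   req = build_preview_request("Hello", provider="edge", voice="en-US-AriaNeural")
#   with urllib.request.urlopen(req) as resp:
#       print(describe_wav(resp.read()))
```

Checking the returned header with `describe_wav` is a cheap way to confirm that the clip is 16-bit PCM at the expected sample rate before handing it to an audio player.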

Voices¶

The voice catalog persists cloned voices in a local SQLite database. Cloned voices remain available until explicitly deleted; they survive process restarts.

`GET /voices`¶

Lists cloned voices.

Query parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `provider` | string \| null | Filter by provider (`cosyvoice` or `dashscope`). |

Response — 200 OK

{
  "items": [
    {
      "id": 1,
      "user_id": null,
      "provider": "dashscope",
      "voice_id": "u3e7c12ab",
      "display_label": "Alice's Voice",
      "target_model": "qwen3-tts-flash-realtime",
      "source": "clone"
    }
  ]
}

Field descriptions:

| Field | Type | Description |
| --- | --- | --- |
| `id` | integer | Catalog primary key. Used in `DELETE /voices/{entry_id}`. |
| `user_id` | string \| null | Reserved for future multi-tenant deployments. |
| `provider` | string | `cosyvoice` or `dashscope`. |
| `voice_id` | string | Provider-side voice identifier. Pass this value as the session's `tts_voice` when using the cloned voice. |
| `display_label` | string | Human-readable label. |
| `target_model` | string | TTS model identifier this clone was created for. |
| `source` | string | Always `"clone"` for entries created by `POST /voices/clone`. |

curl

curl -s 'http://localhost:8000/voices?provider=dashscope' | jq
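
A common follow-up to listing voices is resolving a label to the `voice_id` that a session expects. The sketch below assumes the response shape shown above; `find_voice_id` is a hypothetical helper, not part of OpenTalking.

```python
import json


def find_voice_id(listing, display_label, provider=None):
    """Return the voice_id of the first catalog entry matching the label.

    `listing` is the decoded JSON body of GET /voices. Returns None when
    no entry matches.
    """
    for item in listing.get("items", []):
        if item["display_label"] != display_label:
            continue
        if provider is not None and item["provider"] != provider:
            continue
        return item["voice_id"]
    return None


# Example with the response body shown above.
sample = json.loads("""
{"items": [{"id": 1, "user_id": null, "provider": "dashscope",
            "voice_id": "u3e7c12ab", "display_label": "Alice's Voice",
            "target_model": "qwen3-tts-flash-realtime", "source": "clone"}]}
""")
voice_id = find_voice_id(sample, "Alice's Voice", provider="dashscope")
```

The returned value is what would be passed as the session's `tts_voice`.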

`POST /voices/clone`¶

Clones a voice from an audio sample. Two providers are supported, each with its own requirements.

Request body — multipart/form-data

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `provider` | string | Yes | `cosyvoice` or `dashscope`. |
| `target_model` | string | Yes | TTS model identifier the clone will be used with. For DashScope: a voice-cloning-compatible model such as a Qwen VC model. For CosyVoice: a CosyVoice model identifier. |
| `display_label` | string | Yes | Human-readable label. The endpoint deduplicates labels by appending a timestamp suffix when a conflict exists. |
| `audio` | file | Yes | Audio sample. Minimum 256 bytes, maximum 12 MB. |
| `prefix` | string | No | CosyVoice only. Optional voice identifier prefix; random characters are generated when omitted. |
| `preferred_name` | string | No | DashScope only. Preferred voice name; a random identifier is generated when omitted. |

Provider-specific requirements

CosyVoice: the server stores the uploaded sample and passes a public URL for it to DashScope, so the OpenTalking server must be reachable from DashScope. Set `OPENTALKING_PUBLIC_BASE_URL` to the server's public base URL. The sample is removed from disk approximately 300 seconds after upload.
DashScope: the sample is sent inline as base64-encoded audio, so public reachability is not required.
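
These constraints can be checked on the client before uploading, which avoids a round trip that would end in a `400`. The following is a rough preflight sketch, not the server's validation logic: `check_clone_inputs` is a hypothetical helper, "12 MB" is read as 12 MiB, and treating a loopback host as unreachable is an assumption about what DashScope can access.

```python
from urllib.parse import urlparse

MIN_AUDIO_BYTES = 256
MAX_AUDIO_BYTES = 12 * 1024 * 1024  # assumption: "12 MB" read as 12 MiB
CLONE_PROVIDERS = ("cosyvoice", "dashscope")
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def check_clone_inputs(provider, audio_size, public_base_url=None):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    if provider not in CLONE_PROVIDERS:
        problems.append(f"unsupported provider: {provider!r}")
    if audio_size < MIN_AUDIO_BYTES:
        problems.append(f"audio is smaller than {MIN_AUDIO_BYTES} bytes")
    elif audio_size > MAX_AUDIO_BYTES:
        problems.append("audio is larger than 12 MB")
    if provider == "cosyvoice":
        # CosyVoice cloning needs a URL that DashScope can fetch from.
        host = urlparse(public_base_url or "").hostname
        if host is None:
            problems.append("OPENTALKING_PUBLIC_BASE_URL is not set")
        elif host in LOOPBACK_HOSTS:
            problems.append("OPENTALKING_PUBLIC_BASE_URL points at a loopback host")
    return problems
```

An empty list only means that nothing obviously wrong was found; the server and the upstream provider still perform their own checks, such as audio format conversion.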

Response — 200 OK

{
  "ok": true,
  "entry_id": 12,
  "voice_id": "u3e7c12ab",
  "display_label": "Alice's Voice",
  "provider": "dashscope",
  "target_model": "qwen3-tts-flash-realtime",
  "message": "..."
}

| Field | Type | Description |
| --- | --- | --- |
| `entry_id` | integer | Catalog primary key, for later deletion. |
| `voice_id` | string | Provider-side voice identifier. Use this value when invoking `speak` or `chat` with the cloned voice. |
| `display_label` | string | Resolved label (may include a deduplication suffix). |
| `message` | string | Human-readable status message. |

curl: DashScope clone

curl -s -X POST http://localhost:8000/voices/clone \
  -F provider=dashscope \
  -F target_model=qwen3-tts-flash-realtime \
  -F display_label="Alice's Voice" \
  -F audio=@sample.wav
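
For callers without curl, the same multipart upload can be assembled with the Python standard library. This is a sketch under stated assumptions: `encode_multipart` and `build_clone_request` are hypothetical helpers, the base URL comes from the curl example, and the part's `audio/wav` content type is assumed rather than documented.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # assumption: same host as the curl example


def encode_multipart(fields, file_field, filename, file_bytes,
                     file_type="audio/wav"):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        part = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n")
        chunks.append(part.encode("utf-8"))
    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="{file_field}"; '
                   f'filename="{filename}"\r\n'
                   f"Content-Type: {file_type}\r\n\r\n")
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_clone_request(provider, target_model, display_label, audio_bytes,
                        filename="sample.wav", base_url=BASE_URL):
    """Build (but do not send) a POST /voices/clone request."""
    fields = {
        "provider": provider,
        "target_model": target_model,
        "display_label": display_label,
    }
    body, content_type = encode_multipart(fields, "audio", filename, audio_bytes)
    return urllib.request.Request(
        f"{base_url}/voices/clone",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


# Usage (requires a running server and a real sample file):
#   with open("sample.wav", "rb") as f:
#       req = build_clone_request("dashscope", "qwen3-tts-flash-realtime",
#                                 "Alice's Voice", f.read())
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

The optional `prefix` and `preferred_name` fields could be added to `fields` in the same way when they are needed.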

Error responses

| Code | Condition |
| --- | --- |
| `400` | `provider` is not `cosyvoice` or `dashscope`; audio is too short, missing, or exceeds 12 MB; audio format cannot be converted to 24 kHz mono WAV. |
| `502` | The upstream provider returned an error (DashScope rejection, CosyVoice cloning failure). |

`DELETE /voices/{entry_id}`¶

Removes a cloned voice from the catalog.

Path parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `entry_id` | integer | Catalog primary key from `GET /voices`. |

Response — 200 OK

{"deleted": true}

Error responses

| Code | Condition |
| --- | --- |
| `404` | The entry identifier was not found. |
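
A client usually wants to treat "already gone" differently from other failures. The sketch below maps the `200` and `404` outcomes above to a boolean and re-raises anything else; `delete_voice` is a hypothetical helper, the base URL is assumed from the curl examples, and the injectable `opener` argument exists only so the function can be tried without a server.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: same host as the curl examples


def delete_voice(entry_id, base_url=BASE_URL, opener=urllib.request.urlopen):
    """Delete a catalog entry. Returns True if deleted, False if not found."""
    req = urllib.request.Request(
        f"{base_url}/voices/{int(entry_id)}", method="DELETE"
    )
    try:
        with opener(req) as resp:
            return bool(json.load(resp).get("deleted"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # the entry identifier was not found
        raise  # any other status is unexpected; let the caller decide


# Usage (requires a running server):
#   delete_voice(12)
```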

`GET /voice-uploads/{token}`¶

Internal endpoint that serves an uploaded audio sample for CosyVoice. The endpoint exists so that DashScope's CosyVoice service can retrieve the sample over HTTP.

Path parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `token` | string | 32-character hex token generated by `POST /voices/clone`. |

Response — 200 OK

Content-Type: audio/wav. Body is the uploaded sample.

Error responses

| Code | Condition |
| --- | --- |
| `404` | The token is malformed or the sample has expired. |
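
The token format lends itself to a strict pattern check, for example in middleware placed in front of this path. The sketch below is illustrative only: the server's own validation and token generator are not shown in this document, accepting both upper- and lowercase hex digits is an assumption, and `secrets.token_hex(16)` is simply one standard way to produce a 32-character hex string.

```python
import re
import secrets

# Exactly 32 hexadecimal characters; case-insensitivity is an assumption.
TOKEN_RE = re.compile(r"[0-9a-fA-F]{32}")


def is_well_formed_token(token):
    """Return True when the token looks like a 32-character hex string."""
    return TOKEN_RE.fullmatch(token) is not None


def new_upload_token():
    """Generate a random token: 16 random bytes rendered as 32 hex characters."""
    return secrets.token_hex(16)
```

Using `fullmatch` rather than `match` rejects tokens with trailing characters, which also rules out path separators.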

Public exposure

This endpoint serves user-uploaded audio. Production deployments should rate-limit the path at the reverse proxy and ensure that `OPENTALKING_PUBLIC_BASE_URL` points to an internet-reachable address only when CosyVoice cloning is in use.

Provider matrix¶

| Provider | Cloning | Voice format | Notes |
| --- | --- | --- | --- |
| `edge` | Not supported | `<lang>-<region>-<name>Neural` (e.g. `en-US-AriaNeural`) | Built-in, no API key required. |
| `dashscope` | Supported | Console-defined name (e.g. `xiaoxiao`) or `voice_id` from clone | Requires `DASHSCOPE_API_KEY`. |
| `cosyvoice` | Supported | `voice_id` returned by clone, prefixed | Requires the OpenTalking server to be reachable from DashScope. |
| `elevenlabs` | External to OpenTalking | ElevenLabs `voice_id` | Requires `OPENTALKING_TTS_ELEVENLABS_API_KEY`. |
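
The matrix can be transcribed into a small lookup table for startup checks. The sketch below encodes only what the table states; `PROVIDERS`, `cloning_providers`, and `missing_env` are hypothetical names, and any environment variable that a provider may need beyond those listed is deliberately left out.

```python
# One entry per row of the provider matrix.
# "cloning" means cloning through POST /voices/clone; ElevenLabs cloning
# happens outside OpenTalking, so it is False here.
PROVIDERS = {
    "edge": {"cloning": False, "env": None},
    "dashscope": {"cloning": True, "env": "DASHSCOPE_API_KEY"},
    "cosyvoice": {"cloning": True, "env": None},
    "elevenlabs": {"cloning": False, "env": "OPENTALKING_TTS_ELEVENLABS_API_KEY"},
}


def cloning_providers():
    """Return the providers that support cloning, sorted by name."""
    return sorted(name for name, row in PROVIDERS.items() if row["cloning"])


def missing_env(provider, environ):
    """Return the name of the required variable if it is unset, else None."""
    name = PROVIDERS[provider]["env"]
    if name and not environ.get(name):
        return name
    return None
```

Passing `os.environ` as `environ` at startup gives an early, readable error instead of a `502` on the first synthesis request.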

Source files¶

apps/api/routes/tts_preview.py — /tts/preview.
apps/api/routes/voices.py — /voices/*, /voice-uploads/{token}.
opentalking/providers/tts/dashscope_qwen/clone.py — DashScope and CosyVoice cloning implementations.
opentalking/voice/store.py — voice catalog (SQLite).
opentalking/tts/adapters/ — provider-specific TTS adapters used by /tts/preview and session synthesis.
