Speech Models
This directory covers deployment, weight download, and verification for the speech models used by OpenTalking. Speech models are split into two groups:
- Speech Recognition Models: convert microphone or uploaded audio into text; locally deployable models include SenseVoice.
- Speech Generation Models: convert LLM text output into audio; locally deployable models include CosyVoice, IndexTTS, and Qwen3-TTS.
The LLM decides what to say and is not classified as a speech model; this section covers input recognition and output synthesis.
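The division of responsibilities described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `SpeechRecognizer` and `SpeechSynthesizer` interfaces, the `run_turn` function, and the stand-in classes are hypothetical names and do not reflect OpenTalking's actual API or the APIs of SenseVoice, CosyVoice, IndexTTS, or Qwen3-TTS.

```python
from typing import Callable, Protocol


class SpeechRecognizer(Protocol):
    """Hypothetical interface for a speech recognition model."""

    def transcribe(self, audio: bytes) -> str: ...


class SpeechSynthesizer(Protocol):
    """Hypothetical interface for a speech generation model."""

    def synthesize(self, text: str) -> bytes: ...


def run_turn(
    audio_in: bytes,
    recognizer: SpeechRecognizer,
    llm: Callable[[str], str],
    synthesizer: SpeechSynthesizer,
) -> bytes:
    """One conversational turn: audio in, text, LLM reply, audio out."""
    user_text = recognizer.transcribe(audio_in)  # speech recognition
    reply_text = llm(user_text)                  # the LLM decides what to say
    return synthesizer.synthesize(reply_text)    # speech generation


# Stand-in implementations so the sketch runs without any model weights.
class EchoRecognizer:
    def transcribe(self, audio: bytes) -> str:
        # Pretend the audio bytes are already UTF-8 text.
        return audio.decode("utf-8")


class BytesSynthesizer:
    def synthesize(self, text: str) -> bytes:
        # Pretend the UTF-8 bytes of the text are the audio.
        return text.encode("utf-8")


if __name__ == "__main__":
    # str.upper stands in for the LLM here.
    print(run_turn(b"hello", EchoRecognizer(), str.upper, BytesSynthesizer()))
```

In this sketch the LLM is passed in as a plain callable and sits between the two speech components, which mirrors why it is not counted as a speech model: the recognizer and synthesizer only translate between audio and text.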