本地语音 + QuickTalk¶
这是一条面向私有化验证的全本地媒体链路:
flowchart LR
Mic["麦克风 / 上传音频"] --> STT["SenseVoiceSmall<br/>local CPU"]
Text["文本输入"] --> LLM["LLM<br/>OpenAI-compatible"]
STT --> LLM
LLM --> TTS["Fun-CosyVoice3-0.5B-2512<br/>local_cosyvoice service"]
TTS --> QT["QuickTalk local<br/>CUDA"]
QT --> RTC["WebRTC / 浏览器播放"]
LLM 仍是独立模块,可以指向百炼、OpenAI、vLLM、Ollama 或你自己的本地 OpenAI-compatible 服务。STT、TTS、Video 都可以在本机部署。
适合场景¶
- 希望语音输入和语音合成都在本地运行。
- 希望 QuickTalk 直接由 OpenTalking 的 local adapter 驱动,不先引入 OmniRT。
- 需要验证自定义头像、复刻音色和实时数字人链路。
不适合 8GB 显存机器直接全开本地 TTS + QuickTalk;显存紧张时优先保留 SenseVoiceSmall CPU + QuickTalk local,TTS 改为 Edge 或 DashScope。
Provider 配置¶
.env
OPENTALKING_LLM_PROVIDER=openai_compatible
OPENTALKING_LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
OPENTALKING_LLM_API_KEY=<llm-key>
OPENTALKING_LLM_MODEL=qwen-flash
OPENTALKING_STT_DEFAULT_PROVIDER=sensevoice
OPENTALKING_STT_ENABLED_PROVIDERS=sensevoice,dashscope
OPENTALKING_STT_SENSEVOICE_MODEL=iic/SenseVoiceSmall
OPENTALKING_STT_SENSEVOICE_MODEL_DIR=./models/local-audio/iic__SenseVoiceSmall
OPENTALKING_STT_SENSEVOICE_DEVICE=cpu
OPENTALKING_TTS_DEFAULT_PROVIDER=local_cosyvoice
OPENTALKING_TTS_ENABLED_PROVIDERS=local_cosyvoice,dashscope,edge
OPENTALKING_TTS_LOCAL_COSYVOICE_MODEL=FunAudioLLM/Fun-CosyVoice3-0.5B-2512
OPENTALKING_TTS_LOCAL_COSYVOICE_MODEL_DIR=./models/local-audio/FunAudioLLM__Fun-CosyVoice3-0.5B-2512
OPENTALKING_TTS_LOCAL_COSYVOICE_RUNTIME_DIR=./models/local-audio/runtime/CosyVoice
OPENTALKING_TTS_LOCAL_COSYVOICE_SERVICE_URL=http://127.0.0.1:19090/synthesize
OPENTALKING_TTS_LOCAL_COSYVOICE_DEVICE=cuda:0
OPENTALKING_QUICKTALK_BACKEND=local
OPENTALKING_QUICKTALK_ASSET_ROOT=./models/quicktalk
OPENTALKING_QUICKTALK_WORKER_CACHE=1
OPENTALKING_TORCH_DEVICE=cuda:0
*_DEFAULT_PROVIDER 只决定默认选择,不是 fallback。前端选择 API STT/TTS 时,必须配置对应 provider 的 key,例如:
.env
OPENTALKING_STT_DASHSCOPE_API_KEY=<dashscope-stt-key>
OPENTALKING_TTS_DASHSCOPE_API_KEY=<dashscope-tts-key>
安装与模型¶
终端
uv sync --extra dev --extra models --extra local-audio --extra local-cosyvoice-service --python 3.11
python scripts/download_local_audio_models.py \
--root ./models/local-audio \
--model sensevoice-small \
--model fun-cosyvoice3-0.5b-2512
QuickTalk 权重按 QuickTalk 页面准备。CosyVoice runtime 放在模型目录下即可:
终端
mkdir -p ./models/local-audio/runtime
git clone https://github.com/FunAudioLLM/CosyVoice.git ./models/local-audio/runtime/CosyVoice
cd ./models/local-audio/runtime/CosyVoice
git submodule update --init --recursive
启动¶
先启动本地 TTS service:
终端
OPENTALKING_TTS_LOCAL_COSYVOICE_PRELOAD=1 \
python scripts/local_cosyvoice_service.py --host 127.0.0.1 --port 19090
再启动 OpenTalking:
验证¶
终端
curl -fsS http://127.0.0.1:19090/health
curl -fsS http://127.0.0.1:8000/api/runtime/status
curl -s http://127.0.0.1:8000/models | jq '.statuses[] | select(.id=="quicktalk")'
期望:
stt_provider=sensevoicetts_provider=local_cosyvoicequicktalk_backend=localquicktalk.connected=true
前端选择本地 STT、本地 CosyVoice3 和 QuickTalk avatar 后,分别测试文本输入、麦克风输入和 TTS preview。
与 API provider 混用¶
全本地部署不是强制固定。用户可以在前端选择 API STT 或 API TTS,但后端不会隐式使用 LLM key 或 DASHSCOPE_API_KEY。API provider 缺少 key 时,前端启动前会提示错误;会话中 API 返回错误时,数字人对话界面会显示错误消息。