Skip to content

F5-TTS Local Deployment

F5-TTS is integrated through OpenTalking's local_f5_tts provider. Use it for local voice cloning, short realtime replies, and offline video dubbing. The integration runs as a same-machine HTTP sidecar so the OpenTalking main process stays isolated from F5-TTS runtime and CUDA dependencies.

Use Cases

  • Local voice cloning without a hosted TTS API.
  • A 3-15 second reference clip and matching transcript are available.
  • F5-TTS dependencies should stay outside the main OpenTalking venv.

Weight Preparation

Use a single local audio model root, for example $OPENTALKING_LOCAL_AUDIO_MODEL_ROOT:

Terminal
cd "$OPENTALKING_HOME"
export OPENTALKING_LOCAL_AUDIO_MODEL_ROOT="${OPENTALKING_LOCAL_AUDIO_MODEL_ROOT:-$OPENTALKING_HOME/models/local-audio}"

python scripts/download_local_audio_models.py \
  --root "$OPENTALKING_LOCAL_AUDIO_MODEL_ROOT" \
  --model f5-tts-v1-base

The downloader maps SWivid/F5-TTS F5TTS_v1_Base/model_1250000.safetensors to:

$OPENTALKING_LOCAL_AUDIO_MODEL_ROOT/SWivid__F5-TTS__F5TTS_v1_Base/model_1250000.safetensors

Prepare the runtime and sidecar venv:

Terminal
mkdir -p "$OPENTALKING_LOCAL_AUDIO_MODEL_ROOT/runtime"
cd "$OPENTALKING_LOCAL_AUDIO_MODEL_ROOT/runtime"

if [ ! -d F5-TTS/.git ]; then
  git clone https://github.com/SWivid/F5-TTS.git F5-TTS
fi

python3 -m venv --system-site-packages "$OPENTALKING_LOCAL_AUDIO_MODEL_ROOT/runtime/.venv-f5-tts-system"
. "$OPENTALKING_LOCAL_AUDIO_MODEL_ROOT/runtime/.venv-f5-tts-system/bin/activate"
pip install -U pip wheel setuptools
pip install --no-deps -e "$OPENTALKING_LOCAL_AUDIO_MODEL_ROOT/runtime/F5-TTS"
pip install fastapi "uvicorn[standard]" soundfile cached_path hydra-core ema_pytorch vocos x_transformers transformers_stream_generator rjieba pypinyin tomli bitsandbytes pydub torchcodec torchdiffeq unidecode wandb

Configuration

.env
OPENTALKING_TTS_DEFAULT_PROVIDER=local_f5_tts
OPENTALKING_TTS_LOCAL_F5_TTS_SERVICE_URL=http://127.0.0.1:19095/synthesize
OPENTALKING_LOCAL_AUDIO_MODEL_ROOT=./models/local-audio
OPENTALKING_TTS_LOCAL_F5_TTS_RUNTIME_DIR=./models/local-audio/runtime/F5-TTS
OPENTALKING_TTS_LOCAL_F5_TTS_DEVICE=cuda

Voice Cloning

local_f5_tts requires reference audio. Upload a clone voice through the API, or prepare the directory manually:

$OPENTALKING_LOCAL_AUDIO_MODEL_ROOT/voices/clones/my-f5-voice/
  prompt.wav
  prompt.txt
  meta.json

Example meta.json:

{"provider":"local_f5_tts"}

After upload, /api/voices?provider=local_f5_tts returns the voice id. TTS preview, realtime dialogue, and video generation can all use that voice.

Start Command

Start the F5-TTS sidecar first, then OpenTalking:

Terminal
cd "$OPENTALKING_HOME"
export OPENTALKING_LOCAL_AUDIO_MODEL_ROOT="${OPENTALKING_LOCAL_AUDIO_MODEL_ROOT:-$OPENTALKING_HOME/models/local-audio}"
export OPENTALKING_F5_TTS_VENV_DIR="$OPENTALKING_LOCAL_AUDIO_MODEL_ROOT/runtime/.venv-f5-tts-system"
bash scripts/quickstart/start_local_f5_tts.sh --port 19095

export OPENTALKING_TTS_DEFAULT_PROVIDER=local_f5_tts
export OPENTALKING_TTS_LOCAL_F5_TTS_SERVICE_URL=http://127.0.0.1:19095/synthesize
python -m apps.api.main

Verification

Terminal
curl -fsS http://127.0.0.1:19095/health
curl -fsS http://127.0.0.1:8000/health

TTS preview should use provider local_f5_tts and a clone voice with prompt.wav. Save the result as WAV and verify the spoken text and voice by ASR or listening.

Benchmark Log

Item Command / API Target Measured
TTS preview /tts/preview + SenseVoiceSmall ASR Playable WAV, correct text Passed: preview returned a 16 kHz mono WAV; SenseVoiceSmall ASR matched the target text.
Realtime dialogue local mode dialogue / warm TTS RTF < 1.0 Passed: warm RTF 0.278 for 3.31s audio in 0.918s; prior warm checks were 0.386 and 0.518, all below 1
Offline video video generation API / CLI Generation succeeds, audio drives avatar Passed: QuickTalk + F5 clone voice generated an MP4; ffprobe showed H.264 video and 16 kHz mono AAC audio.

Common Errors

Symptom Action
Missing F5-TTS checkpoint Confirm model_1250000.safetensors is under SWivid__F5-TTS__F5TTS_v1_Base.
requires prompt_audio Select a clone voice or set OPENTALKING_TTS_LOCAL_F5_TTS_PROMPT_AUDIO.
Dependency conflicts Do not run the sidecar from OpenTalking's main .venv; use a separate venv such as $OPENTALKING_LOCAL_AUDIO_MODEL_ROOT/runtime/.venv-f5-tts-system and reuse the host PyTorch/CUDA environment when appropriate.
Slow first request Set OPENTALKING_TTS_LOCAL_F5_TTS_PRELOAD=1 and run a short warm-up request after startup.
QuickTalk v3 reshape error Keep OPENTALKING_QUICKTALK_RESOLUTION=256 for the current TorchScript export when generating video; 160/128 resolution makes internal feature shapes mismatch.