Text to Audio¶

Given target text and a reference audio clip, generate a .wav speech artifact. OmniRT currently exposes CosyVoice3 through cosyvoice3-triton-trtllm, which targets the official Triton/TensorRT-LLM service path instead of a local Python-only shortcut.

Minimal Example¶

PythonCLIYAML

from omnirt import generate
from omnirt.requests import text2audio

result = generate(text2audio(
    model="cosyvoice3-triton-trtllm",
    prompt="Hello from OmniRT.",
    audio="inputs/reference.wav",
    reference_text="This is the reference voice text.",
    backend="cuda",
    server_addr="localhost",
    server_port=18001,
    seed=42,
))

print(result.outputs[0].path)

omnirt generate \
  --task text2audio \
  --model cosyvoice3-triton-trtllm \
  --prompt "Hello from OmniRT." \
  --audio inputs/reference.wav \
  --reference-text "This is the reference voice text." \
  --backend cuda \
  --server-addr localhost \
  --server-port 18001 \
  --seed 42

task: text2audio
model: cosyvoice3-triton-trtllm
backend: cuda
inputs:
  prompt: Hello from OmniRT.
  audio: inputs/reference.wav
  reference_text: This is the reference voice text.
config:
  server_addr: localhost
  server_port: 18001
  seed: 42

Key Parameters¶

Parameter	Type	Default	Notes
`prompt`	`str`	required	Target text to synthesize
`audio`	`str`	required	Reference audio path, resampled to 16 kHz before the Triton request
`reference_text`	`str`	`""`	Transcript for the reference audio; recommended for zero-shot voice reuse
`server_addr`	`str`	`127.0.0.1`	Triton gRPC server address
`server_port`	`int`	`8001`	Triton gRPC port; the current 146 validation container uses `18001`
`model_name`	`str`	`cosyvoice3`	Triton model-repository name
`sample_rate`	`int`	`24000`	Output wav sample rate
`seed`	`int`	unset	Forwarded as a Triton request parameter; the server-side BLS must consume it for deterministic sampling

Deployment Notes¶

The stable 146-machine service profile is GPU1, token2wav=2, vocoder=2, and kv_cache_free_gpu_memory_fraction=0.2; Triton gRPC is exposed on 18001 inside the validation container. On 2026-04-28, the OmniRT text2audio wrapper generated a 2.92s / 24kHz wav with denoise_loop_ms=1969.611; the official 26-sample streaming benchmark measured RTF=0.1303 and 699.13ms average first-chunk latency.

Full record: CosyVoice Benchmark.

Troubleshooting¶

No local Triton service: this wrapper calls an external official service. Start CosyVoice3 runtime/triton_trtllm before running OmniRT.
Missing tritonclient or soundfile: install the CosyVoice/Triton client dependencies first.
seed still does not stabilize results: verify that the Triton BLS reads and forwards seed to the OpenAI/TensorRT-LLM request; client-side parameters alone cannot change sampling.