# Text to Audio

Given target text and a reference audio clip, generate a `.wav` speech artifact. OmniRT currently exposes CosyVoice3 through `cosyvoice3-triton-trtllm`, which targets the official Triton/TensorRT-LLM service path rather than a local Python-only shortcut.
## Minimal Example

```python
from omnirt import generate
from omnirt.requests import text2audio

result = generate(text2audio(
    model="cosyvoice3-triton-trtllm",
    prompt="Hello from OmniRT.",
    audio="inputs/reference.wav",
    reference_text="This is the reference voice text.",
    backend="cuda",
    server_addr="localhost",
    server_port=18001,
    seed=42,
))
print(result.outputs[0].path)
```
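To sanity-check the artifact, you can read it back with `soundfile` (already required for the Triton client path); a minimal sketch:

```python
import soundfile as sf

# Load the generated wav and confirm its duration and sample rate.
audio, sr = sf.read(result.outputs[0].path)
print(f"{len(audio) / sr:.2f}s at {sr} Hz")  # expect 24000 Hz by default
```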
## Key Parameters

| Parameter | Type | Default | Notes |
|---|---|---|---|
| `prompt` | `str` | required | Target text to synthesize |
| `audio` | `str` | required | Reference audio path, resampled to 16 kHz before the Triton request |
| `reference_text` | `str` | `""` | Transcript for the reference audio; recommended for zero-shot voice reuse |
| `server_addr` | `str` | `127.0.0.1` | Triton gRPC server address |
| `server_port` | `int` | `8001` | Triton gRPC port; the current 146 validation container uses 18001 |
| `model_name` | `str` | `cosyvoice3` | Triton model-repository name |
| `sample_rate` | `int` | `24000` | Output wav sample rate |
| `seed` | `int` | unset | Forwarded as a Triton request parameter; the server-side BLS must consume it for deterministic sampling |
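The 16 kHz resampling happens on the client before the request is sent. A minimal sketch of that preprocessing step, assuming `soundfile` for I/O and `scipy` for resampling (the wrapper's actual resampler may differ):

```python
import numpy as np
import soundfile as sf
from scipy.signal import resample_poly

def load_reference_16k(path: str) -> np.ndarray:
    """Read a reference clip and resample it to the 16 kHz the service expects."""
    audio, sr = sf.read(path, dtype="float32")
    if audio.ndim > 1:  # downmix stereo references to mono
        audio = audio.mean(axis=1)
    if sr != 16000:
        audio = resample_poly(audio, up=16000, down=sr)
    return audio
```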
## Deployment Notes

The stable service profile on the 146 machine is GPU1, token2wav=2, vocoder=2, and kv_cache_free_gpu_memory_fraction=0.2; Triton gRPC is exposed on port 18001 inside the validation container. On 2026-04-28, the OmniRT text2audio wrapper generated a 2.92 s / 24 kHz wav with denoise_loop_ms=1969.611; the official 26-sample streaming benchmark measured RTF=0.1303 and a 699.13 ms average first-chunk latency.
Full record: CosyVoice Benchmark.
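Before synthesizing, a quick readiness probe against that endpoint can rule out connectivity issues; a sketch using the standard `tritonclient` gRPC API, assuming the model-repository name `cosyvoice3` from the table above:

```python
import tritonclient.grpc as grpcclient

# Point at the validation container's gRPC endpoint (18001, not the default 8001).
client = grpcclient.InferenceServerClient(url="localhost:18001")
assert client.is_server_ready(), "Triton is not up; start runtime/triton_trtllm first"
assert client.is_model_ready("cosyvoice3"), "cosyvoice3 is not loaded in the model repository"
```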
## Troubleshooting

- No local Triton service: this wrapper calls an external official service. Start the CosyVoice3 `runtime/triton_trtllm` service before running OmniRT.
- Missing `tritonclient` or `soundfile`: install the CosyVoice/Triton client dependencies first.
- `seed` still does not stabilize results: verify that the Triton BLS reads and forwards `seed` to the OpenAI/TensorRT-LLM request; client-side parameters alone cannot change sampling. A quick determinism probe is sketched below.
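To test determinism from the client side, synthesize the same request twice with one seed and compare the output bytes; this sketch reuses the wrapper call from the minimal example:

```python
import hashlib

from omnirt import generate
from omnirt.requests import text2audio

def synth(seed: int) -> bytes:
    result = generate(text2audio(
        model="cosyvoice3-triton-trtllm",
        prompt="Determinism check.",
        audio="inputs/reference.wav",
        reference_text="This is the reference voice text.",
        server_addr="localhost",
        server_port=18001,
        seed=seed,
    ))
    with open(result.outputs[0].path, "rb") as f:
        return f.read()

# Identical hashes mean the BLS is consuming the seed; differing hashes mean it is not.
a = hashlib.sha256(synth(42)).hexdigest()
b = hashlib.sha256(synth(42)).hexdigest()
print("deterministic" if a == b else "seed ignored server-side")
```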