Generation Tasks

This section is organized by task surface. Every page follows the same structure: minimal example (Python / CLI / HTTP) → key parameters → supported models → common combinations → troubleshooting.

| Task | Shape | Typical models | Page |
| --- | --- | --- | --- |
| text2image | text → single image | sd15, sdxl-base-1.0, flux2.dev, qwen-image | Text to Image |
| text2audio | text + reference audio → speech | cosyvoice3-triton-trtllm | Text to Audio |
| image2image | image + prompt → image | sd15, sdxl-base-1.0 | Image to Image |
| text2video | text → video | wan2.2-t2v-14b, cogvideox-2b, hunyuan-video | Text to Video |
| image2video | first frame + prompt → video | svd-xt, wan2.2-i2v-14b | Image to Video |
| audio2video | audio + portrait → video | soulx-flashtalk-14b | Talking Head |
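
To pick a task from the inputs you have and the output you want, the task shapes listed here can be written down as a plain lookup. The sketch below is illustrative only: it does not call OmniRT, and the names `TASKS` and `tasks_for` are hypothetical helpers introduced for this example, not part of any OmniRT API.

```python
# Illustrative lookup built from the task table; not an OmniRT API.
# TASKS and tasks_for are hypothetical names used only in this sketch.
TASKS = {
    "text2image": {"inputs": {"text"}, "output": "image"},
    "text2audio": {"inputs": {"text", "reference audio"}, "output": "speech"},
    "image2image": {"inputs": {"image", "prompt"}, "output": "image"},
    "text2video": {"inputs": {"text"}, "output": "video"},
    "image2video": {"inputs": {"first frame", "prompt"}, "output": "video"},
    "audio2video": {"inputs": {"audio", "portrait"}, "output": "video"},
}


def tasks_for(inputs, output):
    """Return the task names whose shape matches the given inputs and output."""
    wanted = set(inputs)
    return sorted(
        name
        for name, shape in TASKS.items()
        if shape["inputs"] == wanted and shape["output"] == output
    )


if __name__ == "__main__":
    print(tasks_for(["text"], "video"))
    print(tasks_for(["audio", "portrait"], "video"))
```

The match is exact on the input set, so a text-only request for video resolves to text2video rather than image2video, which additionally needs a first frame.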

Not sure where to start?

Read Text to Image first. It is the cheapest task to get running end-to-end on OmniRT, and it is the task that the omnirt validate and omnirt generate examples default to when introducing the tool.