Generation Tasks

This section is organized by task surface. Every page follows the same structure: minimal example (Python / CLI / HTTP) → key parameters → supported models → common combinations → troubleshooting.

| Task | Shape | Typical models | Page |
| --- | --- | --- | --- |
| text2image | text → single image | sdxl-base-1.0, flux2.dev, qwen-image | Text to Image |
| text2audio | text + reference audio → speech | cosyvoice3-triton-trtllm | Text to Audio |
| audio2text | audio → text | sensevoice-small | Audio to Text |
| image2image | image + prompt → image | sdxl-base-1.0, sdxl-refiner-1.0 | Image to Image |
| text2video | text → video | wan2.2-t2v-14b, animate-diff-sdxl | Text to Video |
| image2video | first frame + prompt → video | svd-xt, wan2.2-i2v-14b | Image to Video |
| audio2video | audio + portrait → video | soulx-flashtalk-14b, soulx-flashhead-1.3b, soulx-liveact-14b | Talking Head |
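
The task-to-model mapping above can also be read as plain data. The sketch below only transcribes the table. `TASK_MODELS` and `tasks_for_model` are hypothetical names chosen for illustration and are not part of any documented API. It shows, for example, that one model can serve more than one task.

```python
# Illustrative only: the task table transcribed as plain data.
# TASK_MODELS and tasks_for_model are hypothetical names, not a documented API.
TASK_MODELS = {
    "text2image": ["sdxl-base-1.0", "flux2.dev", "qwen-image"],
    "text2audio": ["cosyvoice3-triton-trtllm"],
    "audio2text": ["sensevoice-small"],
    "image2image": ["sdxl-base-1.0", "sdxl-refiner-1.0"],
    "text2video": ["wan2.2-t2v-14b", "animate-diff-sdxl"],
    "image2video": ["svd-xt", "wan2.2-i2v-14b"],
    "audio2video": [
        "soulx-flashtalk-14b",
        "soulx-flashhead-1.3b",
        "soulx-liveact-14b",
    ],
}


def tasks_for_model(model: str) -> list[str]:
    """Return every task whose typical-model list includes `model`."""
    return [task for task, models in TASK_MODELS.items() if model in models]


# sdxl-base-1.0 is listed under both text2image and image2image.
print(tasks_for_model("sdxl-base-1.0"))  # → ['text2image', 'image2image']
```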

Not sure where to start?

For digital-human products, start with Text to Audio and Talking Head. To learn the request contract at minimum cost, start with Text to Image.