Generation Tasks

This section is organized by task surface. Every page follows the same structure: minimal example (Python / CLI / HTTP) → key parameters → supported models → common combinations → troubleshooting.

| Task | Shape | Typical models | Page |
| --- | --- | --- | --- |
| `text2image` | text → single image | `sdxl-base-1.0`, `flux2.dev`, `qwen-image` | Text to Image |
| `text2audio` | text + reference audio → speech | `cosyvoice3-triton-trtllm` | Text to Audio |
| `audio2text` | audio → text | `sensevoice-small` | Audio to Text |
| `image2image` | image + prompt → image | `sdxl-base-1.0`, `sdxl-refiner-1.0` | Image to Image |
| `text2video` | text → video | `wan2.2-t2v-14b`, `animate-diff-sdxl` | Text to Video |
| `image2video` | first frame + prompt → video | `svd-xt`, `wan2.2-i2v-14b` | Image to Video |
| `audio2video` | audio + portrait → video | `soulx-flashtalk-14b`, `soulx-flashhead-1.3b`, `soulx-liveact-14b` | Talking Head |
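
For scripting against this index, the table above can be held as plain data. The sketch below is only an illustration: `TASK_MODELS` and `tasks_for_model` are hypothetical names, not part of any documented API, and the entries simply repeat the task and model identifiers listed in the table.

```python
from typing import Dict, List

# Hypothetical lookup built by hand from the table above.
# Keys are task names; values are the "Typical models" column.
TASK_MODELS: Dict[str, List[str]] = {
    "text2image": ["sdxl-base-1.0", "flux2.dev", "qwen-image"],
    "text2audio": ["cosyvoice3-triton-trtllm"],
    "audio2text": ["sensevoice-small"],
    "image2image": ["sdxl-base-1.0", "sdxl-refiner-1.0"],
    "text2video": ["wan2.2-t2v-14b", "animate-diff-sdxl"],
    "image2video": ["svd-xt", "wan2.2-i2v-14b"],
    "audio2video": [
        "soulx-flashtalk-14b",
        "soulx-flashhead-1.3b",
        "soulx-liveact-14b",
    ],
}


def tasks_for_model(model: str) -> List[str]:
    """Return every task whose typical-model list includes `model`."""
    return [task for task, models in TASK_MODELS.items() if model in models]


if __name__ == "__main__":
    # A model can appear under more than one task in the table.
    print(tasks_for_model("sdxl-base-1.0"))
```

The reverse lookup makes it easy to see that one model can be listed under more than one task, as `sdxl-base-1.0` is for both `text2image` and `image2image`.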

Not sure where to start?

For digital-human products, start with Text to Audio and Talking Head. For learning the request contract at minimum cost, use Text to Image.