Generation Tasks
This section is organized by task surface. Every page follows the same structure: minimal example (Python / CLI / HTTP) → key parameters → supported models → common combinations → troubleshooting.
| Task | Shape | Typical models | Page |
|---|---|---|---|
| text2image | text → single image | sdxl-base-1.0, flux2.dev, qwen-image | Text to Image |
| text2audio | text + reference audio → speech | cosyvoice3-triton-trtllm | Text to Audio |
| audio2text | audio → text | sensevoice-small | Audio to Text |
| image2image | image + prompt → image | sdxl-base-1.0, sdxl-refiner-1.0 | Image to Image |
| text2video | text → video | wan2.2-t2v-14b, animate-diff-sdxl | Text to Video |
| image2video | first frame + prompt → video | svd-xt, wan2.2-i2v-14b | Image to Video |
| audio2video | audio + portrait → video | soulx-flashtalk-14b, soulx-flashhead-1.3b, soulx-liveact-14b | Talking Head |
Not sure where to start?
For digital-human products, start with Text to Audio and Talking Head. To learn the request contract at minimal cost, start with Text to Image.
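For scripting against this index, the task table can be mirrored as a plain lookup structure. The sketch below is only an illustration: the `TASKS` dictionary, `models_for`, and `DIGITAL_HUMAN_TASKS` are hypothetical names, not part of any API documented here, and the entries are copied from the table rather than queried from a running service.

```python
# Hypothetical lookup table transcribed from the task table on this page.
# The layout and helper names are illustrative assumptions, not a project API.
TASKS = {
    "text2image": {
        "shape": "text -> single image",
        "models": ["sdxl-base-1.0", "flux2.dev", "qwen-image"],
    },
    "text2audio": {
        "shape": "text + reference audio -> speech",
        "models": ["cosyvoice3-triton-trtllm"],
    },
    "audio2text": {
        "shape": "audio -> text",
        "models": ["sensevoice-small"],
    },
    "image2image": {
        "shape": "image + prompt -> image",
        "models": ["sdxl-base-1.0", "sdxl-refiner-1.0"],
    },
    "text2video": {
        "shape": "text -> video",
        "models": ["wan2.2-t2v-14b", "animate-diff-sdxl"],
    },
    "image2video": {
        "shape": "first frame + prompt -> video",
        "models": ["svd-xt", "wan2.2-i2v-14b"],
    },
    "audio2video": {
        "shape": "audio + portrait -> video",
        "models": [
            "soulx-flashtalk-14b",
            "soulx-flashhead-1.3b",
            "soulx-liveact-14b",
        ],
    },
}

# Starting points suggested for digital-human products:
# Text to Audio first, then Talking Head.
DIGITAL_HUMAN_TASKS = ["text2audio", "audio2video"]


def models_for(task: str) -> list[str]:
    """Return the typical models listed for a task surface."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASKS)}")
    return TASKS[task]["models"]


if __name__ == "__main__":
    for task in DIGITAL_HUMAN_TASKS:
        print(f"{task}: {TASKS[task]['shape']} ({', '.join(models_for(task))})")
```

Keeping the table as data makes it easy to validate a task name before building a request. The per-task pages remain the reference for actual parameters and supported models.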