Skip to content

Text to Video

Generate an MP4 from a prompt. OmniRT wraps video tasks in the same request contract as image tasks; the artifact is always exported via imageio-ffmpeg.

Minimal example

from omnirt import generate
from omnirt.requests import text2video

result = generate(text2video(
    model="wan2.2-t2v-14b",
    prompt="aerial view over Shanghai skyline at dusk, cinematic",
    num_frames=81,
    fps=16,
    preset="balanced",
))
print(result.artifacts[0].path)   # path to the exported MP4
omnirt generate \
  --task text2video \
  --model wan2.2-t2v-14b \
  --prompt "aerial view over Shanghai skyline at dusk, cinematic" \
  --num-frames 81 --fps 16 --preset balanced
curl -sS http://localhost:8000/v1/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "task": "text2video",
    "model": "wan2.2-t2v-14b",
    "inputs": {"prompt": "aerial view over Shanghai skyline at dusk"},
    "config": {"num_frames": 81, "fps": 16, "preset": "balanced"}
  }'

Key parameters

Parameter Type Default Notes
prompt str required text prompt
num_frames int? model default typically 81 for Wan2.2, 49 for CogVideoX, 129 for Hunyuan
fps int? model default output frames per second
negative_prompt str? None negative prompt (if supported)
preset fast/balanced/quality/low-vram balanced see Presets
num_inference_steps int? preset explicit denoise override
guidance_scale float? preset CFG
width / height int? model default video models impose strict aspect / divisor constraints
seed int? random reproducibility

Supported models

  • High-quality: wan2.2-t2v-14b (24 GB+), hunyuan-video (48 GB+)
  • Mid-tier: cogvideox-2b, cogvideox-5b
  • Experimental / roadmap: see Roadmap

Full list: omnirt models --task text2video.

Common recipes

omnirt generate --task text2video --model wan2.2-t2v-14b \
  --prompt "..." --num-frames 81 --fps 16 --preset balanced
omnirt generate --task text2video --model cogvideox-2b \
  --prompt "..." --num-frames 49 --fps 8 --preset low-vram

Troubleshooting

Warning

  • MP4 encode fails — make sure imageio-ffmpeg is installed via runtime extras (pip install '.[runtime]')
  • OOM — video is the most memory-intensive task; drop num_frames first, then width/height, then preset=low-vram
  • Temporal flicker / inconsistency — raise num_inference_steps or switch to preset=quality; some models are scheduler-sensitive (see Architecture)
  • Ascend falls back to eager on video models — expected; recorded in RunReport.backend_timeline. See Ascend Backend.