Model Support Roadmap¶

This document defines the recommended model support roadmap for omnirt based on the current implemented baseline and the next support waves that should follow.

It is intentionally aligned with the current open-source ecosystem, especially:

Diffusers official pipeline coverage
ComfyUI native workflow-template coverage
InvokeAI's practical model support focus

Status note:

Last reviewed: 2026-04-20
This is a recommended OmniRT roadmap, not an upstream framework commitment

Current implementation note:

OmniRT currently ships sd15, sd21, sdxl-base-1.0, sdxl-turbo, sd3-medium, sd3.5-large, sd3.5-large-turbo, svd, svd-xt, flux-dev, flux-schnell, flux2.dev / flux2-dev, glm-image, hunyuan-image-2.1, omnigen, qwen-image, sana-1.6b, ovis-image, hidream-i1, cogvideox-2b, cogvideox-5b, kandinsky5-t2v, kandinsky5-i2v, wan2.1-t2v-14b, wan2.1-i2v-14b, wan2.2-t2v-14b, wan2.2-i2v-14b, hunyuan-video, hunyuan-video-1.5-t2v, hunyuan-video-1.5-i2v, helios-t2v, helios-i2v, sana-video, ltx-video, ltx2-i2v, soulx-flashtalk-14b, and soulx-flashhead-1.3b
This means the codebase is already ahead of the older roadmap on Flux and Wan family versions
The roadmap below treats those newer Flux2 and Wan2.2 integrations as real baseline support, while keeping the still-important flux-dev, flux-schnell, and wan2.1-* compatibility targets visible where they remain strategically useful
For multi-task families, the current registry uses task-specific suffixes where needed, for example helios-t2v / helios-i2v and hunyuan-video-1.5-t2v / hunyuan-video-1.5-i2v

Current snapshot¶

The authoritative list of implemented models is generated from the live registry: Supported Models. This document focuses on priorities and outstanding work.

Highest-priority unsupported targets:

sdxl-refiner-1.0
qwen-image-edit
flux-fill
qwen-image-edit-plus
flux-kontext
chronoedit
helios
hunyuan-video-1.5

Planning principles¶

Prefer models with first-class Diffusers support.
Favor models that also appear in mainstream production-facing tools such as ComfyUI or InvokeAI.
Keep text2image and image2video as the primary compatibility targets. Audio-driven avatar generation is now also a first-class OmniRT task surface via audio2video.
Prioritize open-weight models and safe formats such as safetensors.
Avoid investing early in deprecated upstream pipelines.

Registry key convention¶

Use lowercase kebab-case IDs with the following pattern:

<family>-<variant>[-<size>|-<mode>|-<task-suffix>]

Examples:

sdxl-base-1.0
sdxl-refiner-1.0
sd3-medium
sd3.5-large
flux2.dev
flux-dev
flux-schnell
flux-fill
qwen-image
qwen-image-edit
svd
svd-xt
wan2.1-t2v-14b
wan2.1-i2v-14b
hunyuan-video-1.5
ltx2-i2v

Naming rules:

Prefer the public model-family name used by the original project.
Preserve meaningful upstream version markers such as 2.1, 3.5, or 1.5.
Use task suffixes only when the same family has multiple task-specific pipelines.
Avoid backend names in the registry key. Backend remains a runtime choice, not a model identity.
Avoid vendor prefixes when the family name is already unambiguous.

Support tiers¶

P0: must-have baseline
P1: next major compatibility targets
P2: high-value extensions
P3: watchlist / opportunistic additions

Phase roadmap¶

Phase A: Finish the baseline¶

Goal:

complete true end-to-end CUDA and Ascend validation for the current implemented baseline

Models:

sdxl-base-1.0
svd
svd-xt
flux2.dev
wan2.2-t2v-14b
wan2.2-i2v-14b

Phase B: Mainstream image compatibility¶

Goal:

cover the image models most commonly encountered across Diffusers, ComfyUI, and InvokeAI workflows, including compatibility with older but still widely-used Flux and Stable Diffusion families

Models:

sd15
sd21
sdxl-refiner-1.0
sdxl-turbo
sd3-medium
sd3.5-large
sd3.5-large-turbo
flux2.dev
flux-dev
flux-schnell
flux-fill
glm-image
hunyuan-image-2.1
omnigen
qwen-image
qwen-image-edit
sana-1.6b
ovis-image
hidream-i1

Phase C: Video-first expansion¶

Goal:

make OmniRT genuinely competitive as an open image and video runtime rather than only an SDXL + SVD + Wan wrapper

Models:

cogvideox-2b
cogvideox-5b
kandinsky5-t2v
kandinsky5-i2v
wan2.2-t2v-14b
wan2.2-i2v-14b
wan2.1-t2v-14b
wan2.1-i2v-14b
hunyuan-video
hunyuan-video-1.5
helios
sana-video
ltx-video
ltx2-i2v

Phase D: Controlled generation and editing¶

Goal:

add the higher-value editing and controlled-generation surfaces people expect from mature runtimes

Models:

flux-depth
flux-canny
flux-kontext
chronoedit
qwen-image-edit-plus
qwen-image-layered
animate-diff-sdxl

Detailed target list¶

Priority	Registry key	Task	CUDA	Ascend	Notes
P0	`sdxl-base-1.0`	text2image	required	required	current baseline
P0	`svd`	image2video	required	required	add the 14-frame variant
P0	`svd-xt`	image2video	required	required	current video baseline
P0	`flux2.dev`	text2image	required	recommended	already implemented; newer Flux generation path
P0	`wan2.2-t2v-14b`	text2video	required	watch	already implemented; strong current open video target
P0	`wan2.2-i2v-14b`	image2video	required	watch	already implemented; first-frame-guided video path
P1	`sd15`	text2image, image2image, inpaint	required	recommended	widest legacy ecosystem reach
P1	`sd21`	text2image, depth2image	recommended	optional	useful for older SD2 workflows
P1	`sdxl-refiner-1.0`	image refinement	required	recommended	completes two-stage SDXL
P1	`sdxl-turbo`	text2image	required	optional	low-latency generation
P1	`sd3-medium`	text2image	required	watch	practical SD3 entry point
P1	`sd3.5-large`	text2image	required	watch	strong modern SD family target
P1	`sd3.5-large-turbo`	text2image	required	watch	speed-oriented SD3.5 path
P1	`flux-dev`	text2image	required	recommended	major ecosystem priority
P1	`flux-schnell`	text2image	required	recommended	low-step Flux variant
P1	`flux-fill`	inpaint, outpaint	required	optional	high-value editing path
P1	`glm-image`	text2image, image2image	required	watch	strong text rendering and instruction-following image generation
P1	`hunyuan-image-2.1`	text2image	required	recommended	strong Chinese-language image generation target
P1	`omnigen`	multimodal-to-image	required	watch	unified instruction, editing, and conditional image generation path
P1	`qwen-image`	text2image	required	recommended	especially valuable for multilingual text rendering
P1	`qwen-image-edit`	image editing	required	recommended	editing path for Qwen image family
P1	`sana-1.6b`	text2image	recommended	optional	efficient high-res image generation
P1	`ovis-image`	text2image	recommended	watch	compact model with strong text rendering focus
P1	`hidream-i1`	text2image	watch	watch	newer modern image family worth tracking
P1	`cogvideox-2b`	text2video	required	watch	lower barrier video entry point
P1	`cogvideox-5b`	text2video	required	watch	stronger open video baseline
P1	`kandinsky5-t2v`	text2video	required	watch	high-quality open video family with lite and pro variants
P1	`kandinsky5-i2v`	image2video	required	watch	paired image-to-video path in the same family
P1	`wan2.1-t2v-14b`	text2video	required	watch	one of the most important current video targets
P1	`wan2.1-i2v-14b`	image2video	required	watch	especially aligned with OmniRT's video focus
P1	`hunyuan-video`	text2video	required	watch	strong open video family
P1	`hunyuan-video-1.5`	text2video, image2video	required	watch	newer family version worth tracking
P1	`helios`	text2video, image2video, video2video	required	watch	long-video and real-time generation candidate
P1	`sana-video`	text2video	recommended	watch	efficient small-model video option
P1	`ltx-video`	text2video	required	watch	attractive long-video and efficient inference path
P1	`ltx2-i2v`	image2video	required	watch	strong fit for OmniRT's video roadmap
P2	`flux-depth`	controlled text2image	required	optional	structure conditioning
P2	`flux-canny`	controlled text2image	required	optional	edge-conditioned generation
P2	`flux-kontext`	image editing	required	watch	next-generation Flux editing path
P2	`chronoedit`	physically consistent image editing	recommended	watch	video-backed image editing with temporal reasoning
P2	`qwen-image-edit-plus`	image editing	required	watch	more advanced Qwen editing
P2	`qwen-image-layered`	layered image editing	recommended	watch	useful for compositing workflows
P2	`kolors`	text2image	recommended	optional	optional multilingual image model add-on
P2	`pixart-sigma`	text2image	recommended	optional	additional DiT image family
P2	`animate-diff-sdxl`	text2video	recommended	watch	SDXL-adjacent motion support
P3	`bria-3.2`	text2image	watch	watch	monitor enterprise/commercial demand
P3	`lumina-t2x`	text2image	watch	watch	keep under observation
P3	`mochi`	text2video	watch	watch	monitor maturity and demand
P3	`skyreels-v2`	video	watch	watch	monitor maturity and demand

Capability roadmap by model family¶

Supporting a base model family usually is not enough. The following capability layers should be tracked explicitly.