Support Status
This document tracks the models already integrated into omnirt, the ones that have completed real hardware smoke tests, and the high-priority targets that are still pending.
Last updated: 2026-04-28
Current public task surfaces
- text2image
- image2image
- text2audio
- text2video
- image2video
- audio2video
Integrated models
The full list is generated from the live registry: Supported Models. This page only tracks real-hardware smoke status and partial-support notes.
Real hardware smoke completed
The following models have completed real hardware smoke tests using local model directories:
- sdxl-base-1.0
  - CUDA: validated
  - Ascend: validated
- svd-xt
  - CUDA: validated
  - Ascend: validated
- soulx-flashtalk-14b
  - Ascend: validated
  - Notes: persistent_worker on 8-card Ascend 910B2 has completed real-hardware validation.
- soulx-liveact-14b
  - Ascend: validated
  - Notes: the external SoulX-LiveAct generate.py path has been aligned to the 4-card Ascend 910B official case; OmniRT exposes it through a script-backed wrapper. By default it prepares text context on one NPU before the 4-card inference job. Use --text-cache-visible-devices <single-card> --visible-devices <four-cards> --sample-steps 1 for quick smoke.
- soulx-flashhead-1.3b
  - Ascend: validated
  - Notes: the external SoulX-FlashHead checkout has completed 910B NPU adaptation and quality-profile validation; OmniRT currently exposes it through a script-backed cold-start wrapper with 2-step + 2D VAE split + latent_carry off defaults. Real-hardware OmniRT cold-start benchmark: 2 NPU 82.96s, 4 NPU 84.08s, both producing 512x512 / 10s / 250 frames.
- cosyvoice3-triton-trtllm
  - CUDA: validated
  - Notes: the official runtime/triton_trtllm service has completed real benchmark runs. The stable profile is token2wav=2, vocoder=2, and kv_cache_free_gpu_memory_fraction=0.2. The OmniRT wrapper generated a real 2.92s / 24kHz wav with denoise_loop_ms=1969.611; the official 26-sample streaming benchmark measured RTF=0.1303 and 699.13ms average first-chunk latency. Client-side seed is forwarded, but the server-side BLS still needs to consume that parameter for fully deterministic sampling.
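The benchmark figures in the notes above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical and not part of OmniRT, and it uses the common definition of real-time factor (synthesis time divided by audio duration, lower is faster). Treating denoise_loop_ms as the whole synthesis time is an assumption, so the resulting ratio is not comparable to the official streaming RTF.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def frames_per_second(frame_count: int, duration_seconds: float) -> float:
    """Return the frame rate implied by a frame count and a clip duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return frame_count / duration_seconds


# soulx-flashhead-1.3b output reported above: 250 frames over 10 s.
flashhead_fps = frames_per_second(250, 10.0)

# Assumption: denoise_loop_ms is used here as a stand-in for total synthesis
# time of the 2.92 s wav. It may cover only one stage of the pipeline.
cosyvoice_ratio = real_time_factor(1969.611 / 1000.0, 2.92)

if __name__ == "__main__":
    print(f"FlashHead frame rate: {flashhead_fps} fps")
    print(f"CosyVoice denoise-loop ratio: {cosyvoice_ratio:.4f}")
```

A ratio below 1.0 means the measured stage finished faster than the audio it produced would take to play.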
Integrated but still waiting for real hardware smoke
These models already have registry entries, request-surface integration, and local unit coverage, but they do not yet have repository-tracked local model directories plus verified dual-backend smoke results:
- sdxl-refiner-1.0
- flux-fill
- flux-kontext
- qwen-image-edit
- qwen-image-edit-plus
- qwen-image-layered
- animate-diff-sdxl
- kolors
- pixart-sigma
- bria-3.2
- lumina-t2x
- mochi
- skyreels-v2
For the now-public image2image surface, the recommended starting models are sdxl-base-1.0, sdxl-refiner-1.0, sd15, and sd21. Relevant smoke tests already exist:
- tests/integration/test_sdxl_refiner_cuda.py
- tests/integration/test_sdxl_refiner_ascend.py
- tests/integration/test_flux_fill_cuda.py
- tests/integration/test_flux_fill_ascend.py
- tests/integration/test_image_edit_cuda.py
- tests/integration/test_image_edit_ascend.py
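One way to run the listed smoke tests per backend is a small launcher like the sketch below. It assumes pytest is installed, that the script is run from the repository root, and that the tests find their local model directories without extra options. The helper names are hypothetical, and any environment variables or markers the real tests require are not covered here.

```python
import subprocess
import sys

# Test paths copied from the list above, grouped by backend.
SMOKE_TESTS = {
    "cuda": [
        "tests/integration/test_sdxl_refiner_cuda.py",
        "tests/integration/test_flux_fill_cuda.py",
        "tests/integration/test_image_edit_cuda.py",
    ],
    "ascend": [
        "tests/integration/test_sdxl_refiner_ascend.py",
        "tests/integration/test_flux_fill_ascend.py",
        "tests/integration/test_image_edit_ascend.py",
    ],
}


def build_pytest_command(backend: str) -> list[str]:
    """Return the pytest invocation for one backend's smoke tests."""
    if backend not in SMOKE_TESTS:
        raise ValueError(f"unknown backend: {backend!r}")
    return [sys.executable, "-m", "pytest", "-v", *SMOKE_TESTS[backend]]


def run_smoke(backend: str) -> int:
    """Run the smoke tests for one backend and return pytest's exit code."""
    return subprocess.run(build_pytest_command(backend), check=False).returncode


if __name__ == "__main__":
    # Usage (hypothetical script name): python run_smoke.py cuda
    sys.exit(run_smoke(sys.argv[1] if len(sys.argv) > 1 else "cuda"))
```

Keeping the command construction separate from the subprocess call makes it easy to inspect what would run on a machine without the target hardware.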
Partial support
- helios
  - Currently exposed as two registry keys: helios-t2v and helios-i2v.
- hunyuan-video-1.5
  - Currently exposed as two registry keys: hunyuan-video-1.5-t2v and hunyuan-video-1.5-i2v.
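Because each of these families is split across two registry keys, callers need to pick the key that matches the task. The sketch below shows one possible lookup. The table and function are hypothetical and not part of the OmniRT API, and reading the -t2v and -i2v suffixes as the text2video and image2video task surfaces is an assumption based on the naming.

```python
# Hypothetical lookup table built from the registry keys listed above.
# Assumption: "-t2v" serves text2video and "-i2v" serves image2video.
PARTIAL_SUPPORT_KEYS = {
    "helios": {
        "text2video": "helios-t2v",
        "image2video": "helios-i2v",
    },
    "hunyuan-video-1.5": {
        "text2video": "hunyuan-video-1.5-t2v",
        "image2video": "hunyuan-video-1.5-i2v",
    },
}


def resolve_registry_key(family: str, task: str) -> str:
    """Map a model family and task surface to its task-specific registry key."""
    try:
        return PARTIAL_SUPPORT_KEYS[family][task]
    except KeyError:
        raise ValueError(f"no registry key for {family!r} with task {task!r}") from None


if __name__ == "__main__":
    print(resolve_registry_key("helios", "image2video"))
```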
High-priority targets not completed yet
- flux-depth
- flux-canny
- chronoedit