Support Status

This document tracks which models are already integrated into OmniRT, which of them have completed real-hardware smoke tests, and which high-priority targets are still pending.

Last updated: 2026-04-28

Current public task surfaces

  • text2image
  • image2image
  • text2audio
  • text2video
  • image2video
  • audio2video

Integrated models

The full list is generated from the live registry; see Supported Models. This page only tracks real-hardware smoke status and partial-support notes.

Real hardware smoke completed

The following models have completed real hardware smoke tests using local model directories:

  • sdxl-base-1.0
    CUDA: validated
    Ascend: validated
  • svd-xt
    CUDA: validated
    Ascend: validated
  • soulx-flashtalk-14b
    Ascend: validated
    Notes: persistent_worker on 8-card Ascend 910B2 has completed real-hardware validation.
  • soulx-liveact-14b
    Ascend: validated
    Notes: the external SoulX-LiveAct generate.py path has been aligned to the 4-card Ascend 910B official case; OmniRT exposes it through a script-backed wrapper. By default it prepares text context on one NPU before the 4-card inference job. Use --text-cache-visible-devices <single-card> --visible-devices <four-cards> --sample-steps 1 for quick smoke.
  • soulx-flashhead-1.3b
    Ascend: validated
    Notes: the external SoulX-FlashHead checkout has completed 910B NPU adaptation and quality-profile validation; OmniRT currently exposes it through a script-backed cold-start wrapper with 2-step + 2D VAE split + latent_carry off defaults. Real-hardware OmniRT cold-start benchmark: 2 NPU 82.96s, 4 NPU 84.08s, both producing 512x512 / 10s / 250 frames.
  • cosyvoice3-triton-trtllm
    CUDA: validated
    Notes: the official runtime/triton_trtllm service has completed real benchmark runs. The stable profile is token2wav=2, vocoder=2, and kv_cache_free_gpu_memory_fraction=0.2. The OmniRT wrapper generated a real 2.92s / 24kHz wav with denoise_loop_ms=1969.611; the official 26-sample streaming benchmark measured RTF=0.1303 and 699.13ms average first-chunk latency. Client-side seed is forwarded, but the server-side BLS still needs to consume that parameter for fully deterministic sampling.
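
The benchmark figures in the notes above are easier to compare once they are reduced to a few derived numbers. The following is a minimal sketch, not part of OmniRT: the function names are hypothetical, and it assumes the conventional definition of real-time factor (processing time divided by audio duration), which this page does not spell out. Applying the streaming-benchmark RTF to the 2.92s wav is purely illustrative and is not a measured result.

```python
# Figures copied from the notes above; everything else is illustrative.
FLASHHEAD_FRAMES = 250
FLASHHEAD_CLIP_SECONDS = 10.0
FLASHHEAD_COLD_START_SECONDS = {2: 82.96, 4: 84.08}  # NPU count -> wall clock

COSYVOICE_STREAMING_RTF = 0.1303  # official 26-sample streaming benchmark
COSYVOICE_WAV_SECONDS = 2.92      # clip produced through the OmniRT wrapper


def frames_per_second(frames: int, clip_seconds: float) -> float:
    """Frame rate implied by a frame count and a clip duration."""
    return frames / clip_seconds


def wall_clock_per_output_second(wall_seconds: float, clip_seconds: float) -> float:
    """Seconds of wall clock spent per second of generated media."""
    return wall_seconds / clip_seconds


def processing_seconds_at_rtf(rtf: float, audio_seconds: float) -> float:
    """Assumes RTF = processing time / audio duration, so time = RTF * duration."""
    return rtf * audio_seconds


if __name__ == "__main__":
    fps = frames_per_second(FLASHHEAD_FRAMES, FLASHHEAD_CLIP_SECONDS)
    print(f"flashhead frame rate: {fps:.1f} fps")
    for npus, wall in FLASHHEAD_COLD_START_SECONDS.items():
        ratio = wall_clock_per_output_second(wall, FLASHHEAD_CLIP_SECONDS)
        print(f"{npus} NPU cold start: {ratio:.3f}s wall clock per output second")
    est = processing_seconds_at_rtf(COSYVOICE_STREAMING_RTF, COSYVOICE_WAV_SECONDS)
    print(f"illustrative processing time at streaming RTF: {est:.3f}s")
```

Read this way, the 2-NPU and 4-NPU cold-start runs differ by only about 1%, which suggests that the cold-start path, rather than per-card compute, accounts for most of the wall clock in that benchmark.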

Integrated but still waiting for real hardware smoke

These models already have registry entries, request-surface integration, and local unit coverage, but they do not yet have repository-tracked local model directories plus verified dual-backend smoke results:

  • sdxl-refiner-1.0
  • flux-fill
  • flux-kontext
  • qwen-image-edit
  • qwen-image-edit-plus
  • qwen-image-layered
  • animate-diff-sdxl
  • kolors
  • pixart-sigma
  • bria-3.2
  • lumina-t2x
  • mochi
  • skyreels-v2

For the now-public image2image surface, the recommended starting models are sdxl-base-1.0, sdxl-refiner-1.0, sd15, and sd21. Relevant smoke tests already exist:

  • tests/integration/test_sdxl_refiner_cuda.py
  • tests/integration/test_sdxl_refiner_ascend.py
  • tests/integration/test_flux_fill_cuda.py
  • tests/integration/test_flux_fill_ascend.py
  • tests/integration/test_image_edit_cuda.py
  • tests/integration/test_image_edit_ascend.py

Partial support

  • helios: currently exposed as two registry keys, helios-t2v and helios-i2v.
  • hunyuan-video-1.5: currently exposed as two registry keys, hunyuan-video-1.5-t2v and hunyuan-video-1.5-i2v.
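
Both entries above split one model family into a `-t2v` key and an `-i2v` key. As a rough illustration of that naming scheme, the sketch below expands a family name into its registry keys. The mapping from suffix to task surface (t2v to text2video, i2v to image2video) is an assumption inferred from the names and is not stated on this page; `registry_keys` is a hypothetical helper, not an OmniRT API.

```python
# Families and suffixes taken from the list above.
PARTIAL_FAMILIES = {
    "helios": ("t2v", "i2v"),
    "hunyuan-video-1.5": ("t2v", "i2v"),
}

# Assumption: each suffix corresponds to one public task surface.
SUFFIX_TO_TASK = {
    "t2v": "text2video",
    "i2v": "image2video",
}


def registry_keys(family: str) -> dict[str, str]:
    """Map each registry key of a family to its assumed task surface."""
    return {
        f"{family}-{suffix}": SUFFIX_TO_TASK[suffix]
        for suffix in PARTIAL_FAMILIES[family]
    }


if __name__ == "__main__":
    for family in PARTIAL_FAMILIES:
        print(family, registry_keys(family))
```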

High-priority targets not completed yet

  • flux-depth
  • flux-canny
  • chronoedit