Support Status
This document tracks OmniRT's digital-human model priorities, real-hardware smoke coverage, and the general-purpose models that are being scaled back into the Experimental tier.
Last updated: 2026-06-15
Current public task surfaces
- text2image
- image2image
- text2audio
- audio2text
- text2video
- image2video
- audio2video
Model Maintenance Tiers
The full list is generated from the live registry: Supported Models. This page is no longer organized by model count; it is organized by digital-human maintenance priority:
| Tier | Maintenance promise | Current models |
|---|---|---|
| Core | Digital-human main path; requires registry, unit tests, real-hardware smoke, benchmark, and deployment docs | soulx-flashtalk-14b, soulx-liveact-14b, soulx-flashhead-1.3b, cosyvoice3-triton-trtllm, sensevoice-small, soulx-podcast-1.7b |
| Adjacent | Avatar assets, backgrounds, idle video material, and post-processing; smoke tests are added by digital-human scenario | sdxl-base-1.0, svd-xt, flux2.dev, qwen-image, wan2.2-* |
| Experimental | Integrated, but no longer a main investment line; keeps registry and basic tests, without a dual-backend smoke promise | kolors, pixart-sigma, bria-3.2, lumina-t2x, mochi, skyreels-v2, and similar general models |
Existing general image / video integrations are not being removed immediately, but README, roadmap, CI, and benchmarks should prioritize Core and Adjacent tiers.
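The tier table above can be read as a lookup from model ID to maintenance tier. The following is a minimal sketch of that lookup, not OmniRT's registry code: `TIER_PATTERNS`, `tier_of`, and `prioritized` are hypothetical names, and treating `wan2.2-*` as a shell-style wildcard is an assumption. The Experimental entry lists only the models named in the table, although the table also covers "similar general models".

```python
from fnmatch import fnmatchcase

# Tier -> model ID patterns, copied from the table above.
# Assumption: "wan2.2-*" is a shell-style wildcard over registry keys.
TIER_PATTERNS = {
    "Core": [
        "soulx-flashtalk-14b",
        "soulx-liveact-14b",
        "soulx-flashhead-1.3b",
        "cosyvoice3-triton-trtllm",
        "sensevoice-small",
        "soulx-podcast-1.7b",
    ],
    "Adjacent": ["sdxl-base-1.0", "svd-xt", "flux2.dev", "qwen-image", "wan2.2-*"],
    "Experimental": [
        "kolors",
        "pixart-sigma",
        "bria-3.2",
        "lumina-t2x",
        "mochi",
        "skyreels-v2",
    ],
}


def tier_of(model_id):
    """Return the maintenance tier for a model ID, or None if it is not listed."""
    for tier, patterns in TIER_PATTERNS.items():
        if any(fnmatchcase(model_id, pattern) for pattern in patterns):
            return tier
    return None


def prioritized(model_ids):
    """Order model IDs Core first, then Adjacent, Experimental, and unlisted."""
    order = {"Core": 0, "Adjacent": 1, "Experimental": 2, None: 3}
    return sorted(model_ids, key=lambda model_id: order[tier_of(model_id)])


print(tier_of("sensevoice-small"))
print(prioritized(["kolors", "svd-xt", "sensevoice-small"]))
```

A helper like this could decide which models a CI or benchmark job picks up first, which is the prioritization the paragraph above asks for.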
Real hardware smoke completed
The following models have completed real hardware smoke tests using local model directories:
- sdxl-base-1.0
  - CUDA: validated
  - Ascend: validated
- svd-xt
  - CUDA: validated
  - Ascend: validated
- soulx-flashtalk-14b
  - Ascend: validated
  - Notes: `persistent_worker` on 8-card Ascend 910B2 has completed real-hardware validation.
- soulx-liveact-14b
  - Ascend: validated
  - Notes: the external SoulX-LiveAct `generate.py` path has been aligned to the official 4-card Ascend 910B case; OmniRT now exposes it through the `persistent_worker` execution surface while retaining the script-backed generation path inside the worker. By default it prepares text context on one NPU before the 4-card inference job. Use `--text-cache-visible-devices <single-card> --visible-devices <four-cards> --sample-steps 1` for quick smoke.
- soulx-flashhead-1.3b
  - Ascend: validated
  - Notes: the external SoulX-FlashHead checkout has completed 910B NPU adaptation and quality-profile validation; OmniRT now exposes it through the `persistent_worker` execution surface while retaining the script-backed generation path inside the worker, with `2-step + 2D VAE split + latent_carry off` defaults. Historical real-hardware OmniRT cold-start benchmark: 2 NPU `82.96s`, 4 NPU `84.08s`, both producing `512x512 / 10s / 250 frames`.
- cosyvoice3-triton-trtllm
  - CUDA: validated
  - Ascend: wrapper-ready
  - Notes: the official `runtime/triton_trtllm` service has completed real CUDA benchmark runs. The stable profile is `token2wav=2`, `vocoder=2`, and `kv_cache_free_gpu_memory_fraction=0.2`. The OmniRT wrapper generated a real `2.92s / 24kHz` wav with `denoise_loop_ms=1969.611`; the official 26-sample streaming benchmark measured `RTF=0.1303` and `699.13ms` average first-chunk latency. The Ascend path is service-endpoint adaptation: `--backend ascend` records `service_accelerator=ascend`, but an external Triton-compatible service must already be deployed on NPU.
- sensevoice-small
  - Ascend: runtime-ready
  - Notes: the `audio2text` task surface, registry entry, CLI/Python API, and unit tests are integrated. With `--backend ascend`, `device=auto` resolves to FunASR `npu:0`, and a skippable Ascend smoke is available. Real generation still depends on FunASR, `torch_npu`, and a local audio fixture.
- indextts
  - Ascend: runtime-ready
  - Notes: the resident `serve-text2audio` runtime supports `OMNIRT_INDEXTTS_DEVICE=ascend|npu|npu:0`, defaults NPU to fp16, checks `torch_npu` before loading, and disables CUDA-kernel mode on NPU; Ascend env / load smoke coverage is available.
- soulx-podcast-1.7b
  - Ascend: wrapper-ready
  - Notes: the OmniRT FastAPI wrapper can target an external Ascend-hosted SoulX-Podcast service with `--backend ascend` and `service_accelerator=ascend`; actual NPU inference support remains the responsibility of that external service.
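Two of the figures reported above lend themselves to a quick sanity check: the FlashHead output of 250 frames over 10 s, and the CosyVoice streaming `RTF=0.1303`. The sketch below assumes the common definition of real-time factor (synthesis time divided by audio duration); the source does not define RTF, and the helper names are hypothetical.

```python
def frames_per_second(frames, seconds):
    """Frame rate implied by a clip's frame count and duration."""
    return frames / seconds


def synthesis_seconds(rtf, audio_seconds):
    """Compute time implied by an RTF, assuming RTF = synthesis time / audio duration."""
    return rtf * audio_seconds


def realtime_speedup(rtf):
    """How many times faster than real time a given RTF is (same assumption)."""
    return 1.0 / rtf


# FlashHead: 250 frames over 10 s of video.
print(frames_per_second(250, 10))                 # 25.0
# CosyVoice: at RTF=0.1303, 10 s of audio implies about 1.303 s of synthesis.
print(round(synthesis_seconds(0.1303, 10.0), 3))  # 1.303
print(round(realtime_speedup(0.1303), 2))         # 7.67
```

Under that definition, an RTF well below 1 means synthesis stays ahead of playback, which is why it is reported alongside first-chunk latency for streaming.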
Adjacent: Smoke by Digital-Human Scenario
These models already have registry entries, request-surface integration, and local unit coverage, but future investment depends on whether they serve the digital-human product path:
- sdxl-refiner-1.0
- flux-fill
- flux-kontext
- qwen-image-edit
- qwen-image-edit-plus
- qwen-image-layered
- animate-diff-sdxl
Some smoke tests already exist. The next validation criterion is not model popularity; it is whether the model helps avatar assets, backgrounds, controlled edits, idle material, or digital-human video post-processing:
- tests/integration/test_sdxl_refiner_cuda.py
- tests/integration/test_sdxl_refiner_ascend.py
- tests/integration/test_flux_fill_cuda.py
- tests/integration/test_flux_fill_ascend.py
- tests/integration/test_image_edit_cuda.py
- tests/integration/test_image_edit_ascend.py
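The per-backend file split above suggests that each smoke test is gated on its backend being present, so it skips rather than fails on machines without the hardware. Below is a minimal sketch of such gating, not the project's actual test code. It assumes an importable `torch_npu` package is a reasonable proxy for an Ascend runtime; the class and test names are hypothetical.

```python
import importlib.util
import unittest

# Assumption: an importable torch_npu package stands in for "Ascend runtime present".
HAS_TORCH_NPU = importlib.util.find_spec("torch_npu") is not None


class AscendSmokeSketch(unittest.TestCase):
    """Hypothetical skippable smoke test; a real one would run a model end to end."""

    @unittest.skipUnless(HAS_TORCH_NPU, "torch_npu is not installed; skipping Ascend smoke")
    def test_backend_imports(self):
        # Import lazily so collecting this file never fails on CUDA-only or CPU hosts.
        import torch_npu

        self.assertIsNotNone(torch_npu)


if __name__ == "__main__":
    unittest.main()
```

Because this is a plain `unittest.TestCase`, pytest would also collect it; on a host without `torch_npu` the test is reported as skipped, not failed.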
Experimental: Contract General-Model Investment
The following models keep registry entries, generated docs, and basic unit coverage, but are no longer primary smoke / benchmark targets unless a concrete digital-human use case appears:
- kolors
- pixart-sigma
- bria-3.2
- lumina-t2x
- mochi
- skyreels-v2
- Other models that only serve general image / video generation
Partial support
- helios: currently exposed as two registry keys, `helios-t2v` and `helios-i2v`.
- hunyuan-video-1.5: currently exposed as two registry keys, `hunyuan-video-1.5-t2v` and `hunyuan-video-1.5-i2v`.
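Because these families are split across two registry keys, callers have to pick the key that matches the task. The sketch below shows one way to derive it. It assumes the `-t2v` and `-i2v` suffixes correspond to the `text2video` and `image2video` task surfaces, which the source implies but does not state; `registry_key` is a hypothetical helper, not an OmniRT API.

```python
# Assumption: key suffixes map onto the public task surfaces like this.
TASK_SUFFIX = {"text2video": "t2v", "image2video": "i2v"}

# Families that the section above lists as split into two registry keys.
SPLIT_FAMILIES = ("helios", "hunyuan-video-1.5")


def registry_key(family, task):
    """Build the per-task registry key for a split model family."""
    if family not in SPLIT_FAMILIES:
        raise KeyError(f"{family!r} is not listed as a split family")
    if task not in TASK_SUFFIX:
        raise ValueError(f"{task!r} has no registry key for split families")
    return f"{family}-{TASK_SUFFIX[task]}"


print(registry_key("helios", "text2video"))              # helios-t2v
print(registry_key("hunyuan-video-1.5", "image2video"))  # hunyuan-video-1.5-i2v
```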
Digital-Human Targets Not Completed Yet
- ASR / speech understanding: `sensevoice-small` is the first integrated entrypoint and now has Ascend NPU device resolution; Whisper and Paraformer remain follow-up candidates
- TTS and voice reuse: external Ascend service implementations for CosyVoice / SoulX-Podcast, CosyVoice profile caching, stable seed behavior, streaming first-chunk metrics
- Realtime avatars: resident workers, restart behavior, and hot-path benchmarks for FlashTalk / FlashHead / LiveAct
- Post-processing: GFPGAN / CodeFormer / Real-ESRGAN / RIFE / matting for digital-human enhancement