Welcome to OmniRT
Digital-human multimodal runtime with deployable Ascend / 910B adaptation and CUDA-compatible paths.
OmniRT is a unified generation runtime for digital-human pipelines. Voice generation, audio-driven avatars, avatar assets, idle video material, and post-processing share the same GenerateRequest / GenerateResult / RunReport contract, CLI / Python API, request validation flow, and hardware backend abstraction.
General image and video models that are already integrated remain available, but the project no longer grows by model count. The main line is a deployable, reproducible, benchmarkable digital-human vertical loop. Ascend / 910B is the priority path for private deployment adaptation, while CUDA remains the mainstream development, validation, and compatibility backend.
Where you start depends on what you want to do:
- 🚀 Run the avatar path: the shortest path from install to validating TTS / talking-avatar requests.
- 📘 Build an application: CLI / Python API, presets, service schema, deployment guides.
OmniRT is stable with
- Clear digital-human line: TTS, talking avatars, avatar assets, idle video, and post-processing are the highest-priority path
- Reproducible Ascend / 910B path: runtime profiles, resident workers, real-hardware smoke tests, benchmarks, and deployment notes move together
- One request contract: `GenerateRequest` / `GenerateResult` / `RunReport` cover batch generation surfaces
- Backend-neutral runtime: the same request validates and runs on `ascend`, `cuda`, and `cpu-stub`; CUDA stays the mainstream compatibility path
- Clear task surfaces: `text2audio`, `audio2video`, and asset / material generation share the same API shape
- Standardized artifacts: images export as `PNG`, audio as `WAV`, videos as `MP4`, and every run ships a `RunReport`
- Self-describing models: the registry exposes `min_vram_gb`, recommended presets, etc. via `omnirt models`
- Offline friendly: local model directories, HF repo ids, and single-file weights are all first-class
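To make the "one request contract" and "backend-neutral runtime" points concrete, here is a minimal standalone sketch. It does not import OmniRT. The three class names come from the list above, but every field name, the `validate` and `generate` helpers, and the `out/` path layout are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration; these are not OmniRT's real classes.
BACKENDS = {"ascend", "cuda", "cpu-stub"}
ARTIFACT_EXT = {"image": "png", "audio": "wav", "video": "mp4"}


@dataclass
class GenerateRequest:
    task: str
    model: str
    backend: str = "cpu-stub"
    inputs: dict = field(default_factory=dict)


@dataclass
class RunReport:
    backend: str
    model: str
    dry_run: bool


@dataclass
class GenerateResult:
    artifact_path: str
    report: RunReport


def validate(request: GenerateRequest) -> list[str]:
    """Return a list of problems; an empty list means the request is valid."""
    problems = []
    if request.backend not in BACKENDS:
        problems.append(f"unknown backend: {request.backend}")
    if not request.model:
        problems.append("model must not be empty")
    return problems


def generate(request: GenerateRequest, kind: str, dry_run: bool = True) -> GenerateResult:
    """Validate first, then return a result that always carries a report."""
    problems = validate(request)
    if problems:
        raise ValueError("; ".join(problems))
    # No model is executed here; the path only shows the standardized extension.
    path = f"out/{request.task}.{ARTIFACT_EXT[kind]}"
    return GenerateResult(path, RunReport(request.backend, request.model, dry_run))
```

In this sketch the same `GenerateRequest` passes through identical validation whichever backend string it carries, which is the property the list describes. Check the project's API reference for the real signatures.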
OmniRT is flexible with
- Three entry points: Python API, CLI (`omnirt generate / validate / models`), and FastAPI server
- Focused core models: FlashTalk / FlashHead / LiveAct / CosyVoice / SenseVoice / SoulX-Podcast are the current validation line
- China-region friendly: ModelScope, HF-Mirror, offline snapshots, and internal mirrors work out of the box
- Async dispatch: `queue` / `worker` / `policies` for batched requests and multi-model queues
- Pluggable telemetry: `middleware.telemetry` plugs into your observability stack
- Safe defaults: `--dry-run` and `validate` catch misconfigurations before you burn GPU time
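The async dispatch idea, one queue per model drained by its own worker, can be sketched with the standard library alone. This is an illustration of the general pattern, not OmniRT's `queue` / `worker` / `policies` implementation. The `worker` and `dispatch` functions and the `(model, prompt)` tuple shape are hypothetical.

```python
import asyncio


async def worker(name: str, queue: asyncio.Queue, results: list) -> None:
    """Drain one model's queue; a real worker would keep the model resident."""
    while True:
        request = await queue.get()
        try:
            # Placeholder for the actual generation call.
            results.append((name, request))
        finally:
            queue.task_done()


async def dispatch(requests: list[tuple[str, str]]) -> list:
    """Route (model, prompt) pairs to one queue per model, then wait for all."""
    queues: dict[str, asyncio.Queue] = {}
    workers = []
    results: list = []
    for model, prompt in requests:
        if model not in queues:
            queues[model] = asyncio.Queue()
            workers.append(asyncio.create_task(worker(model, queues[model], results)))
        queues[model].put_nowait(prompt)
    # Wait until every queued request has been marked done.
    await asyncio.gather(*(q.join() for q in queues.values()))
    # Workers loop forever, so stop them explicitly once the queues are empty.
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Calling `asyncio.run(dispatch([("model-a", "hello"), ("model-b", "world")]))` starts two workers, one per model name. Requests for the same model are processed in the order they were queued.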
Model Maintenance Boundary
OmniRT now maintains models in three tiers:
- Core: the digital-human path. Requires real smoke, benchmarks, and deployment docs, for example `soulx-flashtalk-14b`, `soulx-liveact-14b`, `soulx-flashhead-1.3b`, `cosyvoice3-triton-trtllm`, `sensevoice-small`, and `soulx-podcast-1.7b`.
- Adjacent: avatar assets, backgrounds, idle video, and other digital-human production inputs, for example `sdxl-base-1.0`, `flux2.dev`, `qwen-image`, `svd-xt`, and `wan2.2-*`.
- Experimental: existing general image / video integrations that are no longer headline promises. They keep registry entries, basic tests, and opportunistic maintenance.
See the full registry at Supported Models, and the digital-human priority boundary at Support Status.
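As a rough illustration of the three tiers, the example model ids listed above can be treated as patterns and matched against a model id. The `TIERS` table and `tier_of` helper are hypothetical, and falling back to `experimental` for unlisted ids is an assumption of this sketch, not documented registry behavior.

```python
from fnmatch import fnmatchcase

# Patterns copied from the tier examples above; "wan2.2-*" is a glob.
TIERS = {
    "core": [
        "soulx-flashtalk-14b", "soulx-liveact-14b", "soulx-flashhead-1.3b",
        "cosyvoice3-triton-trtllm", "sensevoice-small", "soulx-podcast-1.7b",
    ],
    "adjacent": ["sdxl-base-1.0", "flux2.dev", "qwen-image", "svd-xt", "wan2.2-*"],
}


def tier_of(model_id: str) -> str:
    """Return the first tier whose patterns match, else 'experimental' (assumed)."""
    for tier, patterns in TIERS.items():
        if any(fnmatchcase(model_id, pattern) for pattern in patterns):
            return tier
    return "experimental"
```

For example, `tier_of("wan2.2-i2v-14b")` matches the `wan2.2-*` glob and returns `"adjacent"`. The Supported Models page remains the authoritative source for which tier a model belongs to.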
Public task surfaces today
| Task | Inputs | Output | Representative models |
|---|---|---|---|
| `text2image` | prompt | PNG | sdxl-base-1.0, flux2.dev, qwen-image |
| `image2image` | prompt + image | PNG | sdxl-base-1.0, sdxl-refiner-1.0 |
| `text2audio` | prompt | WAV | cosyvoice3-triton-trtllm, indextts, soulx-podcast-1.7b |
| `audio2text` | audio | TXT | sensevoice-small |
| `text2video` | prompt | MP4 | wan2.2-t2v-14b, animate-diff-sdxl |
| `image2video` | prompt + first-frame | MP4 | svd, svd-xt, wan2.2-i2v-14b |
| `audio2video` | audio + portrait | MP4 | soulx-flashtalk-14b, soulx-flashhead-1.3b, soulx-liveact-14b |
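The table can also be read as data: each task has a set of required inputs and one output format. The sketch below encodes it that way and reports which inputs a request is missing. The input key names (`first_frame`, `portrait`, and so on) and the `missing_inputs` helper are hypothetical and may not match the real request schema.

```python
# Task -> (required input keys, output format), transcribed from the table.
# The key spellings are assumptions made for this sketch.
TASKS = {
    "text2image": ({"prompt"}, "PNG"),
    "image2image": ({"prompt", "image"}, "PNG"),
    "text2audio": ({"prompt"}, "WAV"),
    "audio2text": ({"audio"}, "TXT"),
    "text2video": ({"prompt"}, "MP4"),
    "image2video": ({"prompt", "first_frame"}, "MP4"),
    "audio2video": ({"audio", "portrait"}, "MP4"),
}


def missing_inputs(task: str, provided: dict) -> list[str]:
    """Return the required input keys that `provided` lacks, sorted by name."""
    if task not in TASKS:
        raise KeyError(f"unknown task: {task}")
    required, _output_format = TASKS[task]
    return sorted(required - set(provided))


def output_format(task: str) -> str:
    """Return the artifact format the table lists for a task."""
    return TASKS[task][1]
```

A check like `missing_inputs("audio2video", {"audio": "speech.wav"})` would flag the absent portrait before any model is loaded, which is the kind of early failure that request validation is meant to provide.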
Stable boundary
inpaint, edit, and video2video have runtime plumbing in place but are still evolving as public task surfaces. See support status.
Dig deeper
- Roadmap — digital-human priorities and general-model contraction boundaries
- Architecture — how the interface, engine, executors, and telemetry layers fit together
- Domestic deployment — ModelScope / HF-Mirror / offline snapshots