# Ascend Backend Deployment
OmniRT natively supports Huawei's Ascend Atlas / 910 / 910B series. The Ascend backend shares the same external contract as CUDA (GenerateRequest / GenerateResult / RunReport), but its compile path is more conservative: compile failures automatically fall back to eager execution.
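As a rough illustration of that shared contract, a caller might look like the sketch below. The import path, `get_backend`, and the `generate` signature are assumptions for illustration only, not OmniRT's published API; the task, model, and preset values match the CLI examples later on this page.

```python
# Hypothetical sketch: import path and call signatures are assumed, not documented.
from omnirt import GenerateRequest, get_backend

request = GenerateRequest(task="text2image", model="sd15",
                          prompt="a lighthouse", preset="fast")
backend = get_backend("ascend")             # same entry point as the CUDA backend
result, report = backend.generate(request)  # GenerateResult + RunReport
print(report.backend_timeline)              # records any graph-mode -> eager fallbacks
```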
## Hardware and system requirements
| Item | Requirement |
|---|---|
| Device | Atlas 300I Pro / 800I / 800T / 910 / 910B |
| CANN | 8.0.RC2+, matched with driver / firmware |
| torch_npu | version-matched with CANN; torch==2.1.0 + torch_npu==2.1.0.post6 is the currently validated combo |
| Driver / firmware | installed via Ascend-hdk-* packages; must share a major version with CANN |
| System tools | source the set_env.sh from Ascend-toolkit-* before launch |
| Python | 3.10+; CI currently uses 3.11 |
## Install
```bash
# 0. CANN should already be on the machine (usually preinstalled by ops)
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# 1. Install the matching torch + torch_npu
python -m pip install torch==2.1.0 torchvision==0.16.0
python -m pip install torch_npu==2.1.0.post6

# 2. Install OmniRT plus runtime extras
python -m pip install -e '.[runtime,dev]'

# 3. Smoke test
python -c "import torch, torch_npu; print(torch_npu.npu.is_available(), torch.npu.device_count())"
omnirt generate --task text2image --model sd15 \
    --prompt "a lighthouse" --backend ascend --preset fast
```
```bash
# Air-gapped: download on a connected box, copy to the target host
python -m pip download torch==2.1.0 torchvision==0.16.0 \
    torch_npu==2.1.0.post6 -d ./wheels

# On the target host:
python -m pip install --no-index --find-links ./wheels \
    torch torchvision torch_npu
python -m pip install -e '.[runtime,dev]'
```
## Execution model

- Backend name: `ascend`
- Device name: `npu`
- Compile attempt: `BackendRuntime.wrap_module(...)` tries `torch_npu.npu.graph_mode()` first
- Fallback: if graph mode init fails or a module isn't compilable, a `backend_timeline` entry is recorded and the eager module is kept (a minimal sketch of this pattern follows the list)
- Memory management: `torch_npu.npu.empty_cache()` fires at the end of each pipeline stage
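A minimal sketch of the compile-or-fallback pattern, assuming hypothetical helper shapes: the timeline entry fields and the exact body of `wrap_module` are illustrative, not OmniRT's internals. Only `torch_npu.npu.graph_mode()` is taken from the list above.

```python
import torch

def wrap_module(module: torch.nn.Module, timeline: list) -> torch.nn.Module:
    """Sketch of the compile-or-fallback contract described above."""
    try:
        import torch_npu
        torch_npu.npu.graph_mode()  # graph-mode entry point named above; depends on CANN support
        # ...graph capture / compilation of `module` would happen here...
        return module
    except Exception as exc:
        # Graph init failed or the module isn't compilable: record a
        # timeline entry and keep the eager module unchanged.
        timeline.append({"stage": "wrap_module", "fallback": "eager", "reason": str(exc)})
        return module
```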
## Device visibility

```bash
ASCEND_RT_VISIBLE_DEVICES=0 omnirt generate ...    # single card
ASCEND_RT_VISIBLE_DEVICES=0,1 omnirt generate ...  # multi-card (public API still uses the first; multi-NPU parallelism is not a public feature yet)
```

`ASCEND_RT_VISIBLE_DEVICES` is the Ascend analog of `CUDA_VISIBLE_DEVICES`.
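If you set visibility from Python instead of the shell, set it before the NPU runtime initializes, as with `CUDA_VISIBLE_DEVICES`. A small sketch:

```python
import os

# Must be set before torch_npu initializes the Ascend runtime;
# once devices are enumerated, changing it has no effect.
os.environ["ASCEND_RT_VISIBLE_DEVICES"] = "0"

import torch
import torch_npu  # noqa: F401  (registers the npu device type on torch)

print(torch.npu.device_count())  # reports only the visible card(s)
```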
## Validated models
The table below reflects the most recent Ascend smoke coverage. The source of truth is Support Status.
| Model | Task | CANN | Notes |
|---|---|---|---|
| `sd15` | `text2image` | 8.0.RC2 | stable |
| `sdxl-base-1.0` | `text2image` | 8.0.RC2 | stable |
| `svd-xt` | `image2video` | 8.0.RC2 | some ops fall back to eager |
| `wan2.2-t2v-14b` | `text2video` | 8.0.RC2+ | initial validation; `preset=balanced` recommended |
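For example, invoking the newest entry with the recommended preset (flags as in the install smoke test above; the prompt is illustrative):

```bash
omnirt generate --task text2video --model wan2.2-t2v-14b \
    --prompt "a lighthouse at dusk" --backend ascend --preset balanced
```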
## Smoke testing

The repo includes Ascend smoke tests. They run only when:

- `torch_npu` is installed
- diffusers runtime deps are installed (`pip install '.[runtime]'`)
- model sources are supplied via `OMNIRT_SDXL_MODEL_SOURCE` and `OMNIRT_SVD_MODEL_SOURCE`
- execution happens on an Ascend-capable host

If any prerequisite is missing, the tests skip instead of failing noisily.
```bash
# Trigger Ascend smoke locally (when prerequisites are satisfied)
pytest tests/integration/test_ascend_smoke.py -q
```
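A skip guard along these lines is the usual way to implement that behavior. This is a sketch, not the repo's actual test code; only the environment variable names and prerequisites come from the list above.

```python
import importlib.util
import os

import pytest

def _has_npu() -> bool:
    """True only when torch_npu is installed and a device is reachable."""
    if importlib.util.find_spec("torch_npu") is None:
        return False
    import torch
    import torch_npu  # noqa: F401  (registers torch.npu)
    return torch.npu.is_available()

requires_ascend = pytest.mark.skipif(
    not _has_npu()
    or importlib.util.find_spec("diffusers") is None
    or "OMNIRT_SDXL_MODEL_SOURCE" not in os.environ
    or "OMNIRT_SVD_MODEL_SOURCE" not in os.environ,
    reason="Ascend smoke prerequisites not met",
)

@requires_ascend
def test_sdxl_smoke():
    ...
```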
## Known issues

**Warning**

- `RuntimeError: graph mode init failed` means the current CANN version lacks support for a specific op. OmniRT has already fallen back to eager; the entry in `RunReport.backend_timeline` tells you which op. No action required, but it is worth confirming the fallback matches your expectation.
- Memory not released: when a pipeline is reused across requests (e.g. a FastAPI service), Ascend's `empty_cache` does not immediately return memory to the OS. Force release with `max_concurrency=1` + `pipeline_cache_size=1`; see HTTP Server.
- `torch==2.1.0` conflicts with the latest diffusers: pin `diffusers==0.37.x` (already declared in the runtime extras).
- Precision: Ascend defaults to `bf16`, which is less numerically stable than CUDA for some models (FlashTalk, Flux2). Force `--dtype fp16` or `--dtype fp32` when you see artifacts (example below).
- Unable to fetch models from HuggingFace in China: see Domestic Deployment for ModelScope / HF-Mirror / offline snapshot workflows.
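For instance, forcing half precision on a run that shows artifacts (reusing the flags from the install example; `--dtype` is taken from the precision note above):

```bash
omnirt generate --task text2image --model sdxl-base-1.0 \
    --prompt "a lighthouse" --backend ascend --dtype fp16
```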
## Related
- CUDA Deployment — contrast between the two backends
- Domestic Deployment — mirrors and offline workflow
- Docker Deployment — Ascend image template
- Architecture — backend layer details, `backend_timeline` fields