# Ascend Backend Deployment
OmniRT natively supports Huawei's Ascend Atlas / 910 / 910B series. The Ascend backend shares the same external contract as CUDA (GenerateRequest / GenerateResult / RunReport), but its compile path is more conservative: compile failures automatically fall back to eager execution.
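As a rough illustration of that shared contract, a caller might look like the sketch below. The import path, `get_backend`, and the `generate` signature are assumptions for illustration only, not OmniRT's published API; the task, model, and preset values match the CLI examples later on this page.

```python
# Hypothetical sketch: import path and call signatures are assumed, not documented.
from omnirt import GenerateRequest, get_backend

request = GenerateRequest(task="text2image", model="sd15",
                          prompt="a lighthouse", preset="fast")
backend = get_backend("ascend")             # same entry point as the CUDA backend
result, report = backend.generate(request)  # GenerateResult + RunReport
print(report.backend_timeline)              # records any graph-mode -> eager fallbacks
```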
## Hardware and system requirements
| Item | Requirement |
|---|---|
| Device | Atlas 300I Pro / 800I / 800T / 910 / 910B |
| CANN | 8.0.RC2+, matched with driver / firmware |
| torch_npu | version-matched with CANN; torch==2.1.0 + torch_npu==2.1.0.post6 is the currently validated combo |
| Driver / firmware | installed via Ascend-hdk-* packages; must share a major version with CANN |
| System tools | source the set_env.sh from Ascend-toolkit-* before launch |
| Python | 3.10+; CI currently uses 3.11 |
## Install
```bash
# 0. CANN should already be on the machine (usually preinstalled by ops)
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# 1. Install the matching torch + torch_npu
python -m pip install torch==2.1.0 torchvision==0.16.0
python -m pip install torch_npu==2.1.0.post6

# 2. Install OmniRT plus runtime extras
python -m pip install -e '.[runtime,dev]'

# 3. Smoke test
python -c "import torch, torch_npu; print(torch_npu.npu.is_available(), torch.npu.device_count())"
omnirt generate --task text2image --model sd15 \
    --prompt "a lighthouse" --backend ascend --preset fast
```
```bash
# Air-gapped: download on a connected box, copy to the target host
python -m pip download torch==2.1.0 torchvision==0.16.0 \
    torch_npu==2.1.0.post6 -d ./wheels

# On the target host:
python -m pip install --no-index --find-links ./wheels \
    torch torchvision torch_npu
python -m pip install -e '.[runtime,dev]'
```
## Execution model

- Backend name: `ascend`
- Device name: `npu`
- Compile attempt: `BackendRuntime.wrap_module(...)` tries `torch_npu.npu.graph_mode()` first
- Fallback: if graph mode init fails or a module isn't compilable, a `backend_timeline` entry is recorded and the eager module is kept (a minimal sketch of this pattern follows the list)
- Memory management: `torch_npu.npu.empty_cache()` fires at the end of each pipeline stage
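A minimal sketch of the compile-or-fallback pattern, assuming hypothetical helper shapes: the timeline entry fields and the exact body of `wrap_module` are illustrative, not OmniRT's internals. Only `torch_npu.npu.graph_mode()` is taken from the list above.

```python
import torch

def wrap_module(module: torch.nn.Module, timeline: list) -> torch.nn.Module:
    """Sketch of the compile-or-fallback contract described above."""
    try:
        import torch_npu
        torch_npu.npu.graph_mode()  # graph-mode entry point named above; depends on CANN support
        # ...graph capture / compilation of `module` would happen here...
        return module
    except Exception as exc:
        # Graph init failed or the module isn't compilable: record a
        # timeline entry and keep the eager module unchanged.
        timeline.append({"stage": "wrap_module", "fallback": "eager", "reason": str(exc)})
        return module
```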
## Device visibility

```bash
ASCEND_RT_VISIBLE_DEVICES=0 omnirt generate ...    # single card
ASCEND_RT_VISIBLE_DEVICES=0,1 omnirt generate ...  # multi-card (public API still uses the first; multi-NPU parallelism is not a public feature yet)
```

`ASCEND_RT_VISIBLE_DEVICES` is the Ascend analog of `CUDA_VISIBLE_DEVICES`.
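If you set visibility from Python instead of the shell, set it before the NPU runtime initializes, as with `CUDA_VISIBLE_DEVICES`. A small sketch:

```python
import os

# Must be set before torch_npu initializes the Ascend runtime;
# once devices are enumerated, changing it has no effect.
os.environ["ASCEND_RT_VISIBLE_DEVICES"] = "0"

import torch
import torch_npu  # noqa: F401  (registers the npu device type on torch)

print(torch.npu.device_count())  # reports only the visible card(s)
```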
## Validated models
The table below reflects the most recent Ascend smoke coverage. The source of truth is Support Status.
| Model | Task | CANN | Notes |
|---|---|---|---|
| `sd15` | `text2image` | 8.0.RC2 | stable |
| `sdxl-base-1.0` | `text2image` | 8.0.RC2 | stable |
| `svd-xt` | `image2video` | 8.0.RC2 | some ops fall back to eager |
| `wan2.2-t2v-14b` | `text2video` | 8.0.RC2+ | initial validation; `preset=balanced` recommended |
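For example, invoking the newest entry with the recommended preset (flags as in the install smoke test above; the prompt is illustrative):

```bash
omnirt generate --task text2video --model wan2.2-t2v-14b \
    --prompt "a lighthouse at dusk" --backend ascend --preset balanced
```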
## Smoke testing

The repo includes Ascend smoke tests. They run only when:

- `torch_npu` is installed
- diffusers runtime deps are installed (`pip install '.[runtime]'`)
- model sources are supplied via `OMNIRT_SDXL_MODEL_SOURCE` and `OMNIRT_SVD_MODEL_SOURCE`
- execution happens on an Ascend-capable host

If any prerequisite is missing, the tests skip instead of failing noisily.
```bash
# Trigger Ascend smoke locally (when prerequisites are satisfied)
pytest tests/integration/test_ascend_smoke.py -q
```
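A skip guard along these lines is the usual way to implement that behavior. This is a sketch, not the repo's actual test code; only the environment variable names and prerequisites come from the list above.

```python
import importlib.util
import os

import pytest

def _has_npu() -> bool:
    """True only when torch_npu is installed and a device is reachable."""
    if importlib.util.find_spec("torch_npu") is None:
        return False
    import torch
    import torch_npu  # noqa: F401  (registers torch.npu)
    return torch.npu.is_available()

requires_ascend = pytest.mark.skipif(
    not _has_npu()
    or importlib.util.find_spec("diffusers") is None
    or "OMNIRT_SDXL_MODEL_SOURCE" not in os.environ
    or "OMNIRT_SVD_MODEL_SOURCE" not in os.environ,
    reason="Ascend smoke prerequisites not met",
)

@requires_ascend
def test_sdxl_smoke():
    ...
```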
## Known issues

**Warning**

- `RuntimeError: graph mode init failed` means the current CANN version lacks support for a specific op. OmniRT has already fallen back to eager; the entry in `RunReport.backend_timeline` tells you which op. No action required, but it is worth confirming the fallback matches your expectation.
- Memory not released: when a pipeline is reused across requests (e.g. a FastAPI service), Ascend's `empty_cache` does not immediately return memory to the OS. Force release with `max_concurrency=1` + `pipeline_cache_size=1`; see HTTP Server.
- `torch==2.1.0` conflicts with the latest diffusers: pin `diffusers==0.37.x` (already declared in the runtime extras).
- Precision: Ascend defaults to `bf16`, which is less numerically stable than CUDA for some models (FlashTalk, Flux2). Force `--dtype fp16` or `--dtype fp32` when you see artifacts (example below).
- Unable to fetch models from HuggingFace in China: see Domestic Deployment for ModelScope / HF-Mirror / offline snapshot workflows.
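For instance, forcing half precision on a run that shows artifacts (reusing the flags from the install example; `--dtype` is taken from the precision note above):

```bash
omnirt generate --task text2image --model sdxl-base-1.0 \
    --prompt "a lighthouse" --backend ascend --dtype fp16
```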
## Related
- CUDA Deployment — contrast between the two backends
- Domestic Deployment — mirrors and offline workflow
- Docker Deployment — Ascend image template
- Architecture — backend layer details, `backend_timeline` fields