# Getting Started
This guide is the shortest path from a fresh checkout to a validated OmniRT request and a local docs preview.
## Environment requirements
| Item | Minimum | Notes |
|---|---|---|
| Python | 3.10+ | 3.11 / 3.12 recommended, matching the `python-version: "3.11"` used in CI |
| OS | Linux x86_64 | macOS / Windows are fine for development plus cpu-stub; real inference runs on Linux |
| Host RAM | ≥ 16 GB | Model memory lives on the GPU / NPU |
OmniRT itself declares only `torch>=2.1`, `diffusers`, and `safetensors` (see the runtime extras in pyproject.toml). You install CUDA and Ascend wheels and kernel-level drivers yourself, following the tables below; `pip install '.[runtime,dev]'` will not pull in a CUDA toolkit or CANN for you.
### CPU stub (local development, request validation, CI)
This is enough to run `omnirt validate`, `--dry-run`, and `pytest tests/unit tests/parity`, with no real inference. `--backend cpu-stub` already covers most unit-level code paths without a GPU or NPU.
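For example, the following validates a request end to end with no accelerator attached (the model id and flag layout mirror the CUDA example later in this guide):

```bash
# No GPU/NPU needed: the cpu-stub backend validates the request without running inference.
omnirt validate \
  --task text2image \
  --model sd15 \
  --prompt "a lighthouse in fog" \
  --backend cpu-stub
```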
### CUDA environment (NVIDIA GPU)
| Item | Requirement |
|---|---|
| GPU | NVIDIA Ampere or newer (A100 / L40S / RTX 3090 / 4090, etc.) |
| VRAM | Follow each model's resource_hint.min_vram_gb — SDXL ≥ 12 GB, SVD ≥ 14 GB, Flux2 ≥ 24 GB (omnirt models <id> prints the exact value) |
| NVIDIA driver | ≥ 535 (matches CUDA 12.1) |
| CUDA Toolkit | 12.1 or 12.4 |
| PyTorch | 2.1+, prefer the official CUDA wheel, e.g. torch==2.5.1+cu121 |
Recommended install order:
```bash
# 1. Install a CUDA-matched PyTorch wheel from pytorch.org
python -m pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu121

# 2. Install OmniRT plus the rest of the runtime extras
python -m pip install -e '.[runtime,dev]'

# 3. Smoke test
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
omnirt generate --task text2image --model sd15 --prompt "a lighthouse" --backend cuda --preset fast
```
Tips:
- `torch.compile` is only stable on Ampere+; on older cards set `OMNIRT_DISABLE_COMPILE=1` to skip compile and stay on eager (see the snippet after this list).
- Multi-GPU parallelism, USP, and CFG sharding are not public capabilities today (see PLAN.md).
- Compile and memory events are captured in `RunReport.backend_timeline`.
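A minimal sketch of the pre-Ampere workaround (the environment variable comes from the tip above; the generate invocation mirrors the smoke test):

```bash
# On pre-Ampere cards, skip torch.compile and run eager only.
export OMNIRT_DISABLE_COMPILE=1
omnirt generate --task text2image --model sd15 --prompt "a lighthouse" --backend cuda --preset fast
```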
### Ascend environment (Huawei Atlas / 910 / 910B)
| Item | Requirement |
|---|---|
| Device | Atlas 300I Pro / 800I / 800T / 910 / 910B family |
| CANN | 8.0.RC2 or newer, aligned with the machine's driver/firmware bundle |
| torch_npu | Wheel matching the CANN version; torch==2.1.0 + torch_npu==2.1.0.post6 is the currently validated combination |
| Driver / firmware | Shipped by the Ascend-hdk-* package; the major version must match CANN |
| Setup script | source the set_env.sh that ships with Ascend-toolkit-* before launch |
Recommended install order:
```bash
# 0. Make sure CANN is already installed on the host (ops usually does this)
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# 1. Install the matching torch + torch_npu wheels
python -m pip install torch==2.1.0 torchvision==0.16.0
python -m pip install torch_npu==2.1.0.post6

# 2. OmniRT plus the rest of the runtime extras
python -m pip install -e '.[runtime,dev]'

# 3. Smoke test
python -c "import torch, torch_npu; print(torch_npu.npu.is_available(), torch.npu.device_count())"
omnirt generate --task text2image --model sd15 --prompt "a lighthouse" --backend ascend --preset fast
```
Notes:
- `torch_npu` graph mode still fails on some ops; OmniRT automatically falls back to eager and records the attempt in `RunReport.backend_timeline` (see Architecture).
- Deeper Ascend-specific guidance, known limitations, and smoke entrypoints live in Ascend Backend.
- Reaching huggingface.co is usually not an option on internal Ascend hosts; the full mirror / offline-snapshot workflow is in Domestic Deployment.
- Ascend device visibility is controlled by `ASCEND_RT_VISIBLE_DEVICES` (the Ascend analog of `CUDA_VISIBLE_DEVICES`); see the snippet after this list.
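A minimal sketch of device pinning (the variable is named in the note above; the device indices are illustrative):

```bash
# Expose only NPUs 0 and 1 to the process, analogous to CUDA_VISIBLE_DEVICES.
export ASCEND_RT_VISIBLE_DEVICES=0,1
omnirt generate --task text2image --model sd15 --prompt "a lighthouse" --backend ascend --preset fast
```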
## Install
For general development:
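A sketch, assuming the `dev` extra from pyproject.toml carries the test and lint tooling:

```bash
# Editable install with development tooling only (no heavy runtime deps).
python -m pip install -e '.[dev]'
```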
To run real model pipelines, install runtime dependencies too (after the CUDA / Ascend wheels above):
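This is the same command the CUDA and Ascend tables above end with:

```bash
# Install the matching CUDA/Ascend PyTorch wheel first (see above), then:
python -m pip install -e '.[runtime,dev]'
```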
To work on the documentation site:
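A sketch, assuming the docs tooling is published as a `docs` extra in pyproject.toml (the extra name is an assumption):

```bash
python -m pip install -e '.[docs]'   # 'docs' extra name is an assumption
```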
If you want one environment for code, runtime, and docs:
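Assuming the same extra names as above:

```bash
# One environment for code, runtime, and docs ('docs' extra name is an assumption).
python -m pip install -e '.[runtime,dev,docs]'
```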
## Inspect the CLI
`omnirt models` is the quickest way to inspect the live registry without reading source code.
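For example (the per-model detail view is the one the CUDA table above points at; `sd15` is one of the ids used in this guide):

```bash
omnirt models          # list every registered model
omnirt models sd15     # print one model's metadata, including resource_hint.min_vram_gb
```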
## Validate the first request
Start with a dry validation pass before using accelerator hardware:
```bash
omnirt validate \
  --task text2image \
  --model qwen-image \
  --prompt "a poster with a bold title" \
  --backend cpu-stub
```
You can validate a config file too:
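A sketch, assuming the CLI accepts a request file via a `--file` flag (the flag name is an assumption; check `omnirt validate --help` for the exact spelling):

```bash
# Hypothetical flag name; the YAML schema matches the example request below.
omnirt validate --file request.yaml
```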
## Run the first generation
```bash
omnirt generate \
  --task text2image \
  --model sd15 \
  --prompt "a lighthouse in fog" \
  --backend cuda \
  --preset fast
```
Example YAML request:
```yaml
task: text2image
model: flux2.dev
backend: auto
inputs:
  prompt: "a cinematic sci-fi city at sunrise"
config:
  preset: balanced
  width: 1024
  height: 1024
```
Run it with:
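A sketch, again assuming a `--file` flag (an assumption; consult `omnirt generate --help`):

```bash
# Hypothetical flag name; request.yaml is the example above.
omnirt generate --file request.yaml
```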
## Run tests
Fast local coverage:
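These are the same suites named in the CPU-stub section above; they run without a GPU or NPU:

```bash
python -m pytest tests/unit tests/parity
```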
Error-path integration coverage:
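A sketch, assuming the error-path suite lives under `tests/integration` (the path is an assumption; adjust to the repo layout):

```bash
python -m pytest tests/integration   # path is an assumption
```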
Hardware-backed CUDA and Ascend smoke tests are available in CI and skip locally unless the expected runtime packages and model directories are present.
## Preview the docs site
Build the static site with strict link checking:
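A sketch, assuming the site is built with MkDocs (an assumption about this repo's tooling; `--strict` is MkDocs' real flag that turns warnings, including broken internal links, into build failures):

```bash
mkdocs build --strict   # fail the build on warnings and broken links
mkdocs serve            # live-reload preview at http://127.0.0.1:8000
```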
The GitHub Pages deployment guide lives in Publishing Docs.