CUDA Deployment (NVIDIA GPU)
OmniRT's public baseline on NVIDIA GPUs: single-card inference on Ampere and newer.
Hardware requirements
| Item | Requirement | Notes |
|---|---|---|
| GPU | Ampere+ | A100 / L40S / RTX 3090 / 4090; torch.compile is only stable on Ampere+ |
| VRAM | per-model `resource_hint.min_vram_gb` | check exact values with `omnirt models <id>`: SD1.5 ≥ 8 GB, SDXL ≥ 12 GB, SVD ≥ 14 GB, Flux2 / Wan2.2 ≥ 24 GB |
| Driver | ≥ 535 (pairs with CUDA 12.1) | verify with nvidia-smi |
| CUDA Toolkit | 12.1 or 12.4 | must match the PyTorch wheel |
| PyTorch | 2.1+ official CUDA wheel | e.g. torch==2.5.1+cu121 |
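
The table's GPU and VRAM rows can be checked programmatically. The following is a minimal sketch, not part of OmniRT: `meets_baseline` and `probe_device` are hypothetical helper names, and the check assumes that "Ampere+" corresponds to a CUDA compute capability of 8.0 or higher. The `torch.cuda` calls are standard PyTorch APIs.

```python
from typing import Tuple

AMPERE_MAJOR = 8  # Ampere GPUs report compute capability 8.x


def meets_baseline(capability: Tuple[int, int], total_vram_gb: float,
                   min_vram_gb: float) -> bool:
    """Return True when the card is Ampere or newer and has enough VRAM.

    Hypothetical helper: min_vram_gb is the per-model figure from the table.
    """
    major, _minor = capability
    return major >= AMPERE_MAJOR and total_vram_gb >= min_vram_gb


def probe_device(index: int = 0):
    """Read capability, VRAM in GiB, and the wheel's CUDA version from PyTorch."""
    import torch  # imported lazily so meets_baseline works on hosts without torch

    props = torch.cuda.get_device_properties(index)
    capability = torch.cuda.get_device_capability(index)
    return capability, props.total_memory / 1024**3, torch.version.cuda
```

On a GPU host, `probe_device(0)` returns the three values to feed into `meets_baseline`. The memory PyTorch reports is usually slightly below the card's nominal size, so a card sitting exactly at a model's minimum may need a small tolerance in the comparison.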
Install
Smoke test
# Confirm CUDA is available
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# Dry-run validate the request contract against cpu-stub
omnirt validate --task text2image --model sd15 --prompt "a lighthouse" --backend cpu-stub
# Run a real generation
omnirt generate --task text2image --model sd15 \
--prompt "a lighthouse in fog" --backend cuda --preset fast
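
For CI, the three commands above can be chained so that the run stops at the first failure. This is a rough sketch under one assumption that the source does not confirm: that the `omnirt` CLI exits with a nonzero status when a step fails. `SMOKE_STEPS` and `run_smoke` are hypothetical names; the command arguments are copied from the smoke test.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# The three smoke-test commands, in the order shown above.
SMOKE_STEPS: List[List[str]] = [
    [sys.executable, "-c",
     "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"],
    ["omnirt", "validate", "--task", "text2image", "--model", "sd15",
     "--prompt", "a lighthouse", "--backend", "cpu-stub"],
    ["omnirt", "generate", "--task", "text2image", "--model", "sd15",
     "--prompt", "a lighthouse in fog", "--backend", "cuda", "--preset", "fast"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(cmd).returncode


def run_smoke(steps: Sequence[Sequence[str]] = SMOKE_STEPS,
              runner: Callable[[Sequence[str]], int] = _run) -> int:
    """Run steps in order; return the index of the first failing step, or -1."""
    for i, cmd in enumerate(steps):
        if runner(cmd) != 0:  # assumption: nonzero exit status means failure
            return i
    return -1
```

Calling `run_smoke()` with no arguments executes the real commands; passing a custom `runner` keeps the function testable on machines without a GPU.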
Production tuning
- `torch.compile`: on by default, stable on Ampere+. If compilation fails, set `OMNIRT_DISABLE_COMPILE=1` to skip it; failures are recorded in `RunReport.backend_timeline`.
- Device visibility: `CUDA_VISIBLE_DEVICES=0` (single card) or `0,1` (multi-card). Note that multi-GPU parallelism, USP, and CFG sharding are not yet public features; see PLAN.md.
- VRAM peak: inspect `RunReport.memory`; on OOM, switch to `--preset low-vram` or reduce `width` / `height` / `num_frames`.
- Telemetry: `omnirt.middleware.telemetry` emits stage timings, peak memory, and fallback events; see Telemetry.
- Serving: for FastAPI deployment see HTTP Server.
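
When OmniRT is launched from a supervisor or job script, the two environment variables above can be assembled in one place. The sketch below is illustrative only: `build_env` is a hypothetical helper, not an OmniRT API, and it uses only the variable names given in the list.

```python
import os
from typing import Dict, Mapping, Optional, Sequence


def build_env(device_ids: Sequence[int] = (0,), disable_compile: bool = False,
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the tuning variables applied."""
    env = dict(os.environ if base is None else base)
    # Restrict which GPUs the process can see, e.g. "0" or "0,1".
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    if disable_compile:
        # Skip torch.compile, as described in the first bullet.
        env["OMNIRT_DISABLE_COMPILE"] = "1"
    else:
        # Drop any inherited value so compile stays on by default.
        env.pop("OMNIRT_DISABLE_COMPILE", None)
    return env
```

The result can be passed as `env=build_env(...)` to `subprocess.run` when invoking the `omnirt` commands, which keeps the settings scoped to the child process instead of the whole shell session.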
Known issues
Warning
- `torch.compile` crashes on older cards: `OMNIRT_DISABLE_COMPILE=1` falls back to eager; each fallback is tracked.
- `flashinfer` / custom attention kernel miss: failed kernel overrides automatically fall back to eager attention; check `kernel_override` entries in `RunReport.backend_timeline`.
- Slow Triton compile with older versions: upgrade to the `triton` version PyTorch recommends.