
# Deployment

OmniRT's deployment surface splits along two axes: hardware backend and network environment.

| Scenario | Start here |
| --- | --- |
| NVIDIA GPU production | CUDA deployment |
| Ascend Atlas / 910 / 910B | Ascend backend |
| Domestic / intranet / offline | Domestic deployment |
| Containerized (Docker / k8s) | Docker & containers |
| Gateway + workers + Redis / OTLP | Distributed serving |

## Validate before deploying

Before touching real hardware, run `omnirt validate` and `omnirt generate --dry-run` against `--backend cpu-stub` to confirm your request contract and model registry. See Quickstart.
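The check above can be scripted as a pre-deployment gate. This is a minimal sketch that assumes the `omnirt` CLI is on `PATH`; it uses only the commands and flags named in this page and degrades gracefully when the CLI is not installed.

```shell
# Pre-deployment smoke check (sketch): exercise the request contract and
# model registry against the CPU stub backend, with no real hardware.
if command -v omnirt >/dev/null 2>&1; then
  # Commands as documented above; cpu-stub avoids touching GPUs/NPUs.
  omnirt validate --backend cpu-stub &&
    omnirt generate --dry-run --backend cpu-stub
  status="validated"
else
  echo "omnirt CLI not found on PATH; skipping pre-deployment check"
  status="skipped"
fi
```

Wiring this into CI means a deployment image is never built from a registry or contract that the stub backend rejects.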

If your target deployment needs async jobs, cross-process job state, Prometheus scraping, or remote workers, continue with Distributed serving.