Distributed Serving¶
When a single-process omnirt serve is no longer enough, split OmniRT into one HTTP gateway plus one or more gRPC workers, then add Redis and OTLP as needed.
Topologies¶
| Topology | Best for | Key components |
|---|---|---|
| Single-process server | development, local validation, light internal services | omnirt serve |
| Gateway + remote workers | separate heavy inference from HTTP ingress | omnirt serve --remote-worker ... + omnirt worker |
| Gateway + Redis + OTLP | async jobs, cross-process state, external observability | --redis-url + --otlp-endpoint |
Minimal distributed example¶
Start one worker:
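A minimal invocation, mirroring the worker flags used in the Redis/OTLP example later on this page (swap `--backend cuda` for your target backend):

```bash
omnirt worker \
  --host 0.0.0.0 \
  --port 50061 \
  --worker-id sdxl-a \
  --backend cuda
```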
Then start the gateway:
```bash
omnirt serve \
  --host 0.0.0.0 \
  --port 8000 \
  --backend auto \
  --remote-worker 'sdxl-a=127.0.0.1:50061@sdxl-base-1.0,sdxl-refiner-1.0'
```
`--remote-worker` uses this format:

- `@model1,model2` declares which models the worker should prefer
- `#tag1,tag2` is an optional routing tag set
- `serve` probes worker health before startup and fails fast if the target is unreachable
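As a sketch, a spec in this format can be split with plain shell parameter expansion. The spec string below is hypothetical (the `#fast,gpu` tag set is invented for illustration); the field names follow the format described above:

```bash
spec='sdxl-a=127.0.0.1:50061@sdxl-base-1.0,sdxl-refiner-1.0#fast,gpu'

name=${spec%%=*}            # worker id before '='
rest=${spec#*=}             # host:port@models#tags
endpoint=${rest%%@*}        # host:port before '@'
models_tags=${rest#*@}      # models plus optional '#tags'
models=${models_tags%%#*}   # comma-separated preferred models
tags=''
case $models_tags in
  *'#'*) tags=${models_tags#*#} ;;  # optional routing tag set
esac

echo "name=$name endpoint=$endpoint models=$models tags=$tags"
# → name=sdxl-a endpoint=127.0.0.1:50061 models=sdxl-base-1.0,sdxl-refiner-1.0 tags=fast,gpu
```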
Adding Redis and OTLP¶
Gateway:

```bash
omnirt serve \
  --host 0.0.0.0 \
  --port 8000 \
  --redis-url redis://127.0.0.1:6379/0 \
  --otlp-endpoint http://127.0.0.1:4318/v1/traces \
  --remote-worker 'sdxl-a=10.0.0.21:50061@sdxl-base-1.0'
```

Worker:

```bash
omnirt worker \
  --host 0.0.0.0 \
  --port 50061 \
  --worker-id sdxl-a \
  --backend cuda \
  --redis-url redis://127.0.0.1:6379/0 \
  --otlp-endpoint http://127.0.0.1:4318/v1/traces
```
Recommended conventions:
- use the same Redis deployment for the gateway and workers so job state and event streams stay consistent
- export both gateway and worker traces to the same OTLP endpoint so `worker_id` appears in the full trace view
- keep gRPC on a private network, service mesh, or reverse proxy; the current transport uses plain `grpc.insecure_channel`
Validation checklist¶
After startup, these probes quickly tell you whether the deployment is wired correctly:
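For example, assuming the gateway from the examples above is listening on `127.0.0.1:8000` (`$JOB_ID` stands in for a real job id):

```bash
# readiness: reports the job store backend and remote worker count
curl -s http://127.0.0.1:8000/readyz

# Prometheus metrics
curl -s http://127.0.0.1:8000/metrics

# status and event history for an async job
curl -s http://127.0.0.1:8000/v1/jobs/$JOB_ID
curl -s http://127.0.0.1:8000/v1/jobs/$JOB_ID/events
```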
Expected signals:
- `/readyz` returns `job_store_backend` and `remote_worker_count`
- `/metrics` includes `omnirt_jobs_total`, `omnirt_stage_duration_seconds`, and `omnirt_queue_depth`
- async jobs can be observed through `/v1/jobs/{id}`, `/v1/jobs/{id}/events`, and `/v1/jobs/{id}/stream`
- with OTLP enabled, `/v1/jobs/{id}/trace` returns the trace view for the same job
Recommended rollout order¶
1. Start with single-process `omnirt serve`
2. Add Redis and stabilize async jobs plus event streaming
3. Add one remote worker and verify routing plus `/readyz`
4. Finally add Prometheus scraping and OTLP export
Known boundaries¶
- the current worker transport is a minimal unary gRPC transport, not a full multi-tenant RPC framework
- backend support is intentionally focused on `CUDA / Ascend / cpu-stub`; there is no `ROCm / XPU` support plan
- real multi-host load testing and GPU/NPU baselines should still be run in your target environment