
# Dispatch & Queue

OmniRT's async execution core lives in `OmniEngine`. It takes validated `GenerateRequest` objects and routes them through the queue, batcher, executors, and optional remote workers.

## Core components

| Component | Role |
| --- | --- |
| `OmniEngine` | unified sync and async execution entry point |
| `JobQueue` | queueing and priority management |
| `RequestBatcher` | merges compatible requests inside a short time window |
| `InMemoryJobStore` / `RedisJobStore` | persists job state and events |
| `Controller` | decides between local execution and remote workers |
| `GrpcWorkerClient` / `GrpcWorkerServer` | current remote-worker transport |

## Server-side entry point

Async submission does not go through `POST /v1/jobs`. It goes through:

```
POST /v1/generate
```

with a body like:

```json
{
  "task": "text2image",
  "model": "sdxl-base-1.0",
  "inputs": {"prompt": "a lighthouse"},
  "config": {},
  "async_run": true
}
```
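A minimal client-side sketch of this call, assuming the server listens on `http://localhost:8000` and using the third-party `requests` library (both are deployment assumptions, not part of OmniRT):

```python
import requests

BASE_URL = "http://localhost:8000"  # hypothetical serve address

payload = {
    "task": "text2image",
    "model": "sdxl-base-1.0",
    "inputs": {"prompt": "a lighthouse"},
    "config": {},
    "async_run": True,  # enqueue and return a job_id instead of blocking
}

resp = requests.post(f"{BASE_URL}/v1/generate", json=payload, timeout=30)
resp.raise_for_status()
job_id = resp.json()["job_id"]
```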

The response includes `job_id`, and you can then observe the job through:

| Route | Purpose |
| --- | --- |
| `GET /v1/jobs/{job_id}` | fetch job state and result |
| `DELETE /v1/jobs/{job_id}` | cancel a job |
| `GET /v1/jobs/{job_id}/events` | SSE event stream |
| `WS /v1/jobs/{job_id}/stream` | WebSocket event stream plus cancel control |
| `GET /v1/jobs/{job_id}/trace` | per-job trace view |

`POST /v1/jobs` is currently reserved; clients should use `POST /v1/generate` directly.
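The simplest way to observe a job is to poll `GET /v1/jobs/{job_id}`. In this sketch the terminal state names (`succeeded`, `failed`, `cancelled`) are illustrative assumptions; check your deployment's actual state vocabulary:

```python
import time

import requests

BASE_URL = "http://localhost:8000"  # hypothetical serve address

def wait_for_job(job_id: str, interval: float = 0.5) -> dict:
    """Poll the job endpoint until the job reaches a terminal state."""
    while True:
        resp = requests.get(f"{BASE_URL}/v1/jobs/{job_id}", timeout=10)
        resp.raise_for_status()
        job = resp.json()
        # Assumed terminal-state names; adjust to your schema.
        if job.get("state") in ("succeeded", "failed", "cancelled"):
            return job
        time.sleep(interval)
```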

## When batching applies

Batching is intentionally narrow today. It applies only when all of the following are true:

- `execution_mode="modular"`
- `task="text2image"`
- `prompt` is a non-empty string
- there is no `image`, `mask`, or `audio` input
- `num_images_per_prompt=1`

In other words, batching currently focuses on concurrent text-to-image throughput for the same model, not every task surface.
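These conditions can be read as a single predicate over the request payload. The sketch below is illustrative only; it assumes `execution_mode` and `num_images_per_prompt` live in `config` while media fields live in `inputs`, and it is not the engine's internal check:

```python
def is_batchable(req: dict) -> bool:
    """Mirror the documented batching conditions for a request payload."""
    inputs = req.get("inputs", {})
    config = req.get("config", {})
    prompt = inputs.get("prompt")
    return (
        config.get("execution_mode") == "modular"
        and req.get("task") == "text2image"
        and isinstance(prompt, str)
        and prompt != ""
        and not any(key in inputs for key in ("image", "mask", "audio"))
        and config.get("num_images_per_prompt", 1) == 1
    )
```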

## Direct Python usage

```python
from omnirt.engine import OmniEngine
from omnirt.requests import text2image

# Allow up to 2 concurrent executions; hold requests for up to 50 ms so
# compatible ones can merge into batches of at most 4.
engine = OmniEngine(max_concurrency=2, batch_window_ms=50, max_batch_size=4)

# submit() enqueues the job and returns a handle; wait() blocks until
# the job resolves.
job = engine.submit(text2image(model="sdxl-base-1.0", prompt="a lighthouse"))
resolved = engine.wait(job.id)
print(resolved.result.metadata.batch_size)
```
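Using only the calls shown above, several compatible jobs submitted inside the batch window may be merged into one execution. Whether they actually batch depends on timing, so treat this as a sketch rather than a guarantee:

```python
# Submit several compatible jobs quickly so they land in the same
# 50 ms batch window.
prompts = ("a lighthouse", "a harbor at dusk", "a fishing boat")
jobs = [
    engine.submit(text2image(model="sdxl-base-1.0", prompt=p))
    for p in prompts
]
for job in jobs:
    resolved = engine.wait(job.id)
    # batch_size > 1 indicates the job was merged with others.
    print(resolved.result.metadata.batch_size)
```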

## Distributed execution

When `serve` is configured with `--remote-worker`, requests go through `Controller` and may be forwarded to a gRPC worker:

```bash
omnirt worker --host 0.0.0.0 --port 50061 --worker-id sdxl-a
omnirt serve --remote-worker 'sdxl-a=127.0.0.1:50061@sdxl-base-1.0'
```

When `--redis-url` is also set, job state and event streams are stored in `RedisJobStore` so they can be observed across processes.

## Tuning guidance

- low latency: keep `batch_window_ms=0` to disable batching
- same-model throughput: raise `batch_window_ms` and `max_batch_size` gradually
- memory-sensitive hosts: reduce `pipeline_cache_size` and `max_concurrency`
- remote workers: start with one worker and confirm `remote_worker_count` from `/readyz` (see the sketch below)
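A quick readiness check, assuming the serve process runs at `http://localhost:8000` and that `/readyz` returns JSON containing the `remote_worker_count` field mentioned above (the rest of the response shape is an assumption):

```python
import requests

resp = requests.get("http://localhost:8000/readyz", timeout=5)
resp.raise_for_status()
print(resp.json().get("remote_worker_count"))
```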