CLI Reference

Complete command reference for the omnirt CLI. For task-oriented examples see CLI.

Top-level commands

| Command | Purpose |
| --- | --- |
| `generate` | run one generation |
| `validate` | validate a request only |
| `models` | query the model registry |
| `serve` | start the HTTP server |
| `bench` | run a benchmark |
| `worker` | start a gRPC worker |

Shared request arguments

`generate`, `validate`, and `bench` accept the same family of request arguments, including:

--task / --model / --backend
--prompt / --negative-prompt
--image / --mask / --audio
--width / --height / --num-frames / --fps
--num-inference-steps / --guidance-scale / --scheduler / --preset
--model-path / --repo-path
--device-map / --devices
--quantization / --quantization-backend
--enable-layerwise-casting
--cache tea_cache or --enable-tea-cache
--config <yaml_or_json>

Supported backends currently include:

auto
cuda
ascend
cpu-stub
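
Because the request arguments are shared, one set of values can be checked with `validate` and then reused unchanged with `generate`. The sketch below only builds and prints the two command lines; `request_args` is a hypothetical helper written for this illustration and is not part of omnirt. It assumes each flag takes a single value and that a `True` value marks a bare switch.

```python
import shlex


def request_args(**fields):
    """Turn keyword fields into '--flag value' pairs (hypothetical helper)."""
    args = []
    for name, value in fields.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Assumption: True marks a switch that takes no value.
            args.append(flag)
        elif value is not None and value is not False:
            args.extend([flag, str(value)])
    return args


# Flags and values taken from the `generate` example in this reference.
shared = request_args(
    task="text2image",
    model="flux2.dev",
    prompt="a cinematic city at sunrise",
    preset="balanced",
)

# The same request arguments are reused under two different subcommands.
for subcommand in ("validate", "generate"):
    print(shlex.join(["omnirt", subcommand, *shared]))
```

Building the argument list once keeps the validated request and the executed request identical, which is the main benefit of the shared argument family.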

`generate`

Additional flags:

--dry-run
--json

Example:

omnirt generate \
  --task text2image \
  --model flux2.dev \
  --prompt "a cinematic city at sunrise" \
  --preset balanced \
  --json

`validate`

Additional flags:

--json

Example:

omnirt validate \
  --task text2video \
  --model wan2.2-t2v-14b \
  --prompt "a paper ship drifting on moonlit water"

`models`

Usage:

omnirt models
omnirt models sdxl-base-1.0
omnirt models --format markdown
omnirt models --json

`serve`

Key flags:

| Flag | Purpose |
| --- | --- |
| `--host` / `--port` | bind address |
| `--backend` | default backend |
| `--max-concurrency` | local concurrency |
| `--pipeline-cache-size` | executor / pipeline cache limit |
| `--api-key-file` | API-key file |
| `--model-aliases` | OpenAI model alias map |
| `--redis-url` | RedisJobStore URL |
| `--otlp-endpoint` | OTLP/HTTP endpoint |
| `--remote-worker` | remote worker spec, repeatable |
| `--device-map` / `--devices` | default request placement config |
| `--batch-window-ms` / `--max-batch-size` | batching config |

--remote-worker format:

worker_id=host:port@model1,model2#tag1,tag2
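
To make the pieces of this format concrete, here is a minimal parsing sketch. It is not omnirt's own parser: `RemoteWorkerSpec` and `parse_remote_worker` are hypothetical names, and treating the `@models` and `#tags` parts as optional is an assumption drawn from the example below, which omits tags.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RemoteWorkerSpec:
    worker_id: str
    host: str
    port: int
    models: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def parse_remote_worker(spec: str) -> RemoteWorkerSpec:
    """Parse 'worker_id=host:port@model1,model2#tag1,tag2' (illustrative only)."""
    worker_id, sep, rest = spec.partition("=")
    if not sep or not worker_id:
        raise ValueError(f"missing 'worker_id=' prefix: {spec!r}")
    # Split off the '#tags' suffix first, then the '@models' part.
    rest, _, tag_part = rest.partition("#")
    address, _, model_part = rest.partition("@")
    # Split on the last ':' so the port is always the final component.
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return RemoteWorkerSpec(
        worker_id=worker_id,
        host=host,
        port=int(port),
        models=[m for m in model_part.split(",") if m],
        tags=[t for t in tag_part.split(",") if t],
    )


spec = parse_remote_worker("sdxl-a=127.0.0.1:50061@sdxl-base-1.0")
print(spec.worker_id, spec.host, spec.port, spec.models, spec.tags)
```

Read left to right, the spec names the worker, gives its gRPC address, lists the models it serves after `@`, and attaches tags after `#`.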

Example:

omnirt serve \
  --port 8000 \
  --redis-url redis://127.0.0.1:6379/0 \
  --otlp-endpoint http://127.0.0.1:4318/v1/traces \
  --remote-worker 'sdxl-a=127.0.0.1:50061@sdxl-base-1.0'

`bench`

Additional flags:

| Flag | Purpose |
| --- | --- |
| `--scenario` | built-in scenario name |
| `--concurrency` | request concurrency |
| `--total` | total measured requests |
| `--warmup` | warmup requests |
| `--batch-window-ms` / `--max-batch-size` | batching config |
| `--output` | JSON output path |
| `--json` | print JSON to stdout |

Current built-in scenarios:

text2image_sdxl_concurrent4

`worker`

Key flags:

| Flag | Purpose |
| --- | --- |
| `--host` / `--port` | gRPC bind address |
| `--worker-id` | stable worker id |
| `--backend` | default backend |
| `--max-concurrency` | local execution concurrency |
| `--pipeline-cache-size` | executor / pipeline cache limit |
| `--redis-url` | optional Redis URL |
| `--otlp-endpoint` | optional OTLP endpoint |

Environment variables

| Variable | Purpose |
| --- | --- |
| `OMNIRT_LOG_LEVEL` | log level |
| `OMNIRT_DISABLE_COMPILE` | disable compile paths |
| `CUDA_VISIBLE_DEVICES` | visible CUDA devices |
| `ASCEND_RT_VISIBLE_DEVICES` | visible Ascend devices |
| `HF_ENDPOINT` | Hugging Face mirror endpoint |
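
These variables are read from the process environment, so a launcher script can set them per child process instead of exporting them globally. The sketch below is one way to do that; `omnirt_env` is a hypothetical helper, and `"DEBUG"` is an assumed value for `OMNIRT_LOG_LEVEL`, since the accepted levels are not listed here.

```python
from __future__ import annotations

import os


def omnirt_env(devices: list[int], log_level: str = "DEBUG") -> dict[str, str]:
    """Copy the current environment and overlay device and logging variables."""
    env = dict(os.environ)
    # CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)
    # Assumed value: the set of accepted log levels is not documented here.
    env["OMNIRT_LOG_LEVEL"] = log_level
    return env


env = omnirt_env([0, 1])
print(env["CUDA_VISIBLE_DEVICES"])  # prints 0,1
# Pass it to a child process, e.g. subprocess.run(["omnirt", "serve"], env=env)
```

Copying `os.environ` first keeps unrelated settings such as `PATH` or `HF_ENDPOINT` intact in the child process.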
