CLI Reference
Complete command reference for the omnirt CLI. For task-oriented examples see CLI.
Top-level commands
| Command | Purpose |
|---|---|
| `generate` | run one generation |
| `validate` | validate a request only |
| `models` | query the model registry |
| `serve` | start the HTTP server |
| `bench` | run a benchmark |
| `worker` | start a gRPC worker |
Shared request arguments
generate, validate, and bench share the same request argument family, including:
- `--task` / `--model` / `--backend`
- `--prompt` / `--negative-prompt`
- `--image` / `--mask` / `--audio`
- `--width` / `--height` / `--num-frames` / `--fps`
- `--num-inference-steps` / `--guidance-scale` / `--scheduler` / `--preset`
- `--model-path` / `--repo-path`
- `--device-map` / `--devices`
- `--quantization` / `--quantization-backend`
- `--enable-layerwise-casting`
- `--cache tea_cache` or `--enable-tea-cache`
- `--config <yaml_or_json>`

Supported backends currently include:
- `auto`
- `cuda`
- `ascend`
- `cpu-stub`
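Because the three subcommands accept the same request arguments, one argument list can be validated before it is run. The sketch below is a minimal illustration, not a documented workflow: the `run_request` function name and the argument values are assumptions chosen for the example, and it assumes `omnirt validate` exits with a non-zero status when a request is rejected, which this reference does not state.

```shell
# Hypothetical helper: validate a request, then run it with the same arguments.
# Assumes `omnirt validate` returns non-zero for an invalid request.
run_request() {
  # Shared request arguments; the values are illustrative only.
  set -- \
    --task text2image \
    --model flux2.dev \
    --backend auto \
    --prompt "a cinematic city at sunrise" \
    --preset balanced

  # Both subcommands receive the identical argument list.
  omnirt validate "$@" && omnirt generate "$@" --json
}
```

Sourcing the snippet and calling `run_request` runs `generate` only if `validate` succeeds. Keeping the arguments in the positional parameters preserves the quoting of the prompt without relying on shell arrays.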
generate
Additional flags:
- `--dry-run`
- `--json`
Example:
omnirt generate \
--task text2image \
--model flux2.dev \
--prompt "a cinematic city at sunrise" \
--preset balanced \
--json
validate
Additional flags:
- `--json`
Example:
omnirt validate \
--task text2video \
--model wan2.2-t2v-14b \
--prompt "a paper ship drifting on moonlit water"
models
Usage:
serve
Key flags:
| Flag | Purpose |
|---|---|
| `--host` / `--port` | bind address |
| `--backend` | default backend |
| `--max-concurrency` | local concurrency |
| `--pipeline-cache-size` | executor / pipeline cache limit |
| `--api-key-file` | API-key file |
| `--model-aliases` | OpenAI model alias map |
| `--redis-url` | RedisJobStore URL |
| `--otlp-endpoint` | OTLP/HTTP endpoint |
| `--remote-worker` | remote worker spec, repeatable |
| `--device-map` / `--devices` | default request placement config |
| `--batch-window-ms` / `--max-batch-size` | batching config |
--remote-worker format:
Example:
omnirt serve \
--port 8000 \
--redis-url redis://127.0.0.1:6379/0 \
--otlp-endpoint http://127.0.0.1:4318/v1/traces \
--remote-worker 'sdxl-a=127.0.0.1:50061@sdxl-base-1.0'
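The `--remote-worker` value in the example above appears to pack several fields into one string. The sketch below splits it under the assumption that the layout is `<worker-id>=<host>:<port>@<model>`; that layout is inferred from this single example only, and the `parse_remote_worker` helper and its field names are hypothetical, not part of omnirt.

```shell
# Hypothetical helper: split a spec assumed to look like
#   <worker-id>=<host>:<port>@<model>
# The layout is inferred from the example value, not from a documented grammar.
parse_remote_worker() {
  spec=$1
  worker_id=${spec%%=*}   # text before the first '='
  rest=${spec#*=}         # text after the first '='
  address=${rest%%@*}     # host:port, before the first '@'
  model=${rest#*@}        # text after the first '@'
  host=${address%:*}      # text before the last ':'
  port=${address##*:}     # text after the last ':'
  printf 'id=%s host=%s port=%s model=%s\n' "$worker_id" "$host" "$port" "$model"
}
```

Calling `parse_remote_worker 'sdxl-a=127.0.0.1:50061@sdxl-base-1.0'` prints `id=sdxl-a host=127.0.0.1 port=50061 model=sdxl-base-1.0`, which can be handy for checking a spec before passing it to `omnirt serve`.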
bench
Additional flags:
| Flag | Purpose |
|---|---|
| `--scenario` | built-in scenario name |
| `--concurrency` | request concurrency |
| `--total` | total measured requests |
| `--warmup` | warmup requests |
| `--batch-window-ms` / `--max-batch-size` | batching config |
| `--output` | JSON output path |
| `--json` | print JSON to stdout |

Current built-in scenarios:
- `text2image_sdxl_concurrent4`
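A benchmark run that combines the flags above might look like the following sketch. The `run_sdxl_bench` wrapper, the default output file name, and the numeric values are assumptions for illustration; whether explicit `--concurrency`, `--total`, and `--warmup` values override the scenario's own defaults is also assumed rather than confirmed here.

```shell
# Hypothetical wrapper around the built-in scenario; the numeric values are
# illustrative, not recommended settings.
run_sdxl_bench() {
  out_file=${1:-bench-result.json}   # JSON report path passed to --output
  omnirt bench \
    --scenario text2image_sdxl_concurrent4 \
    --concurrency 4 \
    --total 32 \
    --warmup 4 \
    --output "$out_file"
}
```

For example, `run_sdxl_bench results/sdxl.json` would write the report to `results/sdxl.json`, while calling it with no argument falls back to `bench-result.json`.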
worker
Key flags:
| Flag | Purpose |
|---|---|
| `--host` / `--port` | gRPC bind address |
| `--worker-id` | stable worker id |
| `--backend` | default backend |
| `--max-concurrency` | local execution concurrency |
| `--pipeline-cache-size` | executor / pipeline cache limit |
| `--redis-url` | optional Redis URL |
| `--otlp-endpoint` | optional OTLP endpoint |
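A worker is typically paired with a server that points at it through `--remote-worker`. The sketch below shows one possible pairing using only flags listed in this reference. The function names, addresses, and ids are hypothetical, and it assumes that the name before `=` in the remote worker spec corresponds to the worker's `--worker-id`, which this page does not confirm.

```shell
# Hypothetical two-process setup; host, port, and worker id are examples.
start_worker() {
  omnirt worker \
    --host 127.0.0.1 \
    --port 50061 \
    --worker-id sdxl-a \
    --backend cuda
}

start_server() {
  # Assumes the name before '=' matches the worker's --worker-id.
  omnirt serve \
    --port 8000 \
    --remote-worker 'sdxl-a=127.0.0.1:50061@sdxl-base-1.0'
}
```

Each function would normally run in its own terminal or under a process supervisor, with the worker started before the server.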
Environment variables
| Variable | Purpose |
|---|---|
| `OMNIRT_LOG_LEVEL` | log level |
| `OMNIRT_DISABLE_COMPILE` | disable compile paths |
| `CUDA_VISIBLE_DEVICES` | visible CUDA devices |
| `ASCEND_RT_VISIBLE_DEVICES` | visible Ascend devices |
| `HF_ENDPOINT` | Hugging Face mirror endpoint |
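These variables can be set for a single launch without changing the calling shell. The sketch below is one way to do that. The `serve_on_gpu0` name is hypothetical, and `debug` is an assumed value for `OMNIRT_LOG_LEVEL`, since the accepted levels are not listed in this reference.

```shell
# Hypothetical launcher: the subshell body "( ... )" keeps the exports local,
# so the caller's environment is left unchanged.
serve_on_gpu0() (
  export OMNIRT_LOG_LEVEL=debug      # assumed value; accepted levels are not listed here
  export CUDA_VISIBLE_DEVICES=0      # expose only the first CUDA device
  omnirt serve --backend cuda --port 8000
)
```

Using a subshell as the function body means a later command in the same session still sees every device and the original log level.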