CLI¶
OmniRT currently exposes six top-level CLI commands:

- `omnirt generate`
- `omnirt validate`
- `omnirt models`
- `omnirt serve`
- `omnirt bench`
- `omnirt worker`
For the full flag matrix see CLI Reference. This page focuses on task-oriented examples.
Request shape¶
The CLI shares the same request shape as GenerateRequest:
```yaml
task: text2image
model: flux2.dev
backend: auto
inputs:
  prompt: "a cinematic sci-fi city at sunrise"
config:
  preset: balanced
  width: 1024
  height: 1024
```
Rule of thumb:
- `inputs` contains semantic inputs such as `prompt`, `image`, `mask`, and `audio`
- `config` contains execution settings such as `preset`, `scheduler`, `device_map`, and `quantization`
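The inputs/config split above can be sketched as a small helper. This is an illustrative sketch, not OmniRT's real client code: the field lists mirror the rule of thumb (plus `width`/`height`, which the example request places under `config`), and a real client should follow the GenerateRequest schema.

```python
# Route flat keyword fields into the two halves of a GenerateRequest-shaped
# dict, following the inputs-vs-config rule of thumb from the docs.
SEMANTIC_FIELDS = {"prompt", "image", "mask", "audio"}
EXECUTION_FIELDS = {"preset", "scheduler", "device_map", "quantization",
                    "width", "height"}

def build_request(task: str, model: str, **fields) -> dict:
    """Assemble a request dict, sending each field to inputs or config."""
    inputs = {k: v for k, v in fields.items() if k in SEMANTIC_FIELDS}
    config = {k: v for k, v in fields.items() if k in EXECUTION_FIELDS}
    return {"task": task, "model": model, "backend": "auto",
            "inputs": inputs, "config": config}

req = build_request("text2image", "flux2.dev",
                    prompt="a cinematic sci-fi city at sunrise",
                    preset="balanced", width=1024, height=1024)
```

The resulting `req` matches the YAML example above field for field.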
omnirt generate¶
Run one request directly:
```bash
omnirt generate \
  --task text2image \
  --model sdxl-base-1.0 \
  --prompt "a lighthouse in fog" \
  --preset fast
```
Run from a file:
omnirt validate¶
Validate and resolve defaults without actually running inference:
```bash
omnirt validate \
  --task text2image \
  --model flux2.dev \
  --prompt "a poster with bold typography" \
  --backend cpu-stub
```
Useful before touching real hardware:
- confirm task and model compatibility
- confirm required inputs are present
- confirm resolved backend and config defaults
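The compatibility checks listed above can be illustrated with a minimal sketch. The task-to-required-inputs table below is an assumption for illustration only, not OmniRT's real registry; `omnirt validate` is the authoritative check.

```python
# Hypothetical required-inputs table: text2image needs a prompt; the
# image2image entry is an assumed example, not a confirmed OmniRT task.
REQUIRED_INPUTS = {
    "text2image": {"prompt"},
    "image2image": {"prompt", "image"},
}

def validate_request(task: str, inputs: dict) -> list[str]:
    """Return human-readable problems; an empty list means the request passes."""
    required = REQUIRED_INPUTS.get(task)
    if required is None:
        return [f"unknown task: {task}"]
    return [f"missing required input: {name}"
            for name in sorted(required - inputs.keys())]

print(validate_request("text2image", {"prompt": "a poster"}))   # → []
print(validate_request("image2image", {"prompt": "restyle"}))
# → ['missing required input: image']
```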
omnirt models¶
omnirt serve¶
```bash
omnirt serve \
  --host 0.0.0.0 \
  --port 8000 \
  --backend auto \
  --redis-url redis://127.0.0.1:6379/0 \
  --otlp-endpoint http://127.0.0.1:4318/v1/traces
```
To push a default placement policy into all service requests:
omnirt serve --protocol flashtalk-ws¶
Start the FlashTalk-compatible WebSocket service for OpenTalking-style realtime avatar clients:
```bash
omnirt serve \
  --protocol flashtalk-ws \
  --host 0.0.0.0 \
  --port 8765 \
  --repo-path .omnirt/model-repos/SoulX-FlashTalk \
  --server-path model_backends/flashtalk/flashtalk_ws_server.py \
  --ckpt-dir models/SoulX-FlashTalk-14B \
  --wav2vec-dir models/chinese-wav2vec2-base
```
If the model environment does not have the full OmniRT dependencies installed, install the FlashTalk runtime first and prefer the lightweight helper-script entrypoint:
```bash
python -m omnirt.cli.main runtime install flashtalk --device ascend
bash scripts/start_flashtalk_ws.sh
```
See FlashTalk-compatible WebSocket for the full configuration.
omnirt worker¶
Pair it with serve:
omnirt bench¶
Built-in scenario:
```bash
omnirt bench \
  --scenario text2image_sdxl_concurrent4 \
  --total 100 \
  --warmup 2 \
  --output bench.json
```
Custom request:
```bash
omnirt bench \
  --task text2image \
  --model sdxl-base-1.0 \
  --prompt "a cinematic portrait under neon rain" \
  --concurrency 4 \
  --total 20 \
  --batch-window-ms 50 \
  --max-batch-size 4
```
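To clarify what `--batch-window-ms` and `--max-batch-size` control, here is an illustrative sketch of server-side micro-batching (assumed semantics, not OmniRT's actual implementation): the server holds the first request of a batch open for up to the window, and closes the batch early once the size cap is reached.

```python
import asyncio

async def batcher(queue: asyncio.Queue, window_ms: float, max_batch: int):
    """Group queued requests into batches bounded by a time window and a size cap."""
    batches = []
    while True:
        first = await queue.get()
        if first is None:                      # sentinel: no more requests
            return batches
        batch = [first]
        deadline = asyncio.get_running_loop().time() + window_ms / 1000
        while len(batch) < max_batch:          # close early when the cap is hit
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:                   # window expired
                break
            try:
                item = await asyncio.wait_for(queue.get(), timeout)
            except asyncio.TimeoutError:
                break
            if item is None:
                batches.append(batch)
                return batches
            batch.append(item)
        batches.append(batch)

async def demo():
    q = asyncio.Queue()
    for i in range(10):                        # 10 requests arrive back-to-back
        q.put_nowait(i)
    q.put_nowait(None)
    return await batcher(q, window_ms=50, max_batch=4)

result = asyncio.run(demo())
print(result)  # typically [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With requests arriving faster than the window, the size cap dominates and batches fill to `max_batch`; under light load, the window bounds how long any request waits.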
Legacy / runtime optimization example¶
```bash
omnirt generate \
  --task text2image \
  --model sd15 \
  --prompt "a lighthouse" \
  --enable-layerwise-casting \
  --quantization int8 \
  --quantization-backend torchao \
  --enable-tea-cache
```
To confirm whether these knobs are actually taking effect, inspect RunReport and /metrics. See Legacy Optimization Guide.