Text to Image¶

Generate a PNG from a text prompt. The cheapest and most mature task to run end-to-end on OmniRT.

Minimal example¶

PythonCLIYAML + CLIHTTP

from omnirt import generate
from omnirt.requests import text2image

result = generate(text2image(
    model="sd15",
    prompt="a lighthouse in fog, cinematic, 35mm film",
    preset="fast",
))
print(result.artifacts[0].path)

omnirt generate \
  --task text2image \
  --model sd15 \
  --prompt "a lighthouse in fog, cinematic, 35mm film" \
  --preset fast \
  --backend auto

# request.yaml
task: text2image
model: flux2.dev
backend: auto
inputs:
  prompt: "a cinematic sci-fi city at sunrise"
config:
  preset: balanced
  width: 1024
  height: 1024

omnirt generate --config request.yaml --json

curl -sS http://localhost:8000/v1/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "task": "text2image",
    "model": "sd15",
    "inputs": {"prompt": "a lighthouse in fog"},
    "config": {"preset": "fast"}
  }'

Key parameters¶

Parameter	Type	Default	Notes
`prompt`	`str`	required	text prompt
`negative_prompt`	`str?`	`None`	negative prompt; honored by SD / SDXL / SD3
`width` / `height`	`int?`	model default	output size; models enforce 8/16/32-multiple constraints
`preset`	`fast` / `balanced` / `quality` / `low-vram`	`balanced`	bundled steps / precision / guidance; see Presets
`num_inference_steps`	`int?`	preset	explicit denoise step override
`guidance_scale`	`float?`	preset	classifier-free guidance
`num_images_per_prompt`	`int?`	`1`	batch images per prompt
`seed`	`int?`	random	fix randomness for reproducibility
`scheduler`	`str?`	model default	see Architecture → Scheduler layer
`dtype`	`fp16` / `bf16` / `fp32`	`fp16`	compute dtype; Ascend defaults to `bf16`
`adapters`	`list[AdapterRef]?`	`[]`	LoRA / ControlNet adapters

Supported models¶

Typical quality / speed tradeoffs:

Highest quality: flux2.dev (≥ 24 GB VRAM), sdxl-base-1.0 + sdxl-refiner-1.0
Balanced: sdxl-base-1.0, sd3-medium, qwen-image
Low-resource: sd15 (12 GB OK), sd21

Full list: omnirt models or Supported Models.

Common recipes¶

Speed-firstQuality-firstLow-VRAM

omnirt generate --task text2image --model sd15 \
  --prompt "..." --preset fast --backend cuda

omnirt generate --task text2image --model flux2.dev \
  --prompt "..." --preset quality --width 1024 --height 1024 \
  --backend cuda --dtype bf16

omnirt generate --task text2image --model sd15 \
  --prompt "..." --preset low-vram --dtype fp16

Troubleshooting¶

Common issues

ValidationError: width must be multiple of 8 — most SD-family models require multiples of 8; Flux2 is stricter (16 / 32)
CUDA out of memory — switch to --preset low-vram or reduce width/height; or set OMNIRT_DISABLE_COMPILE=1 to skip torch.compile
adapter not supported for this model — check omnirt models <model_id>'s adapters field; LoRA / ControlNet compatibility is declared in ModelCapabilities

Running omnirt validate catches the first two without touching a GPU.