Developer Guide

For developers contributing to OmniRT — adding models, adding backends, or understanding how the runtime fits together.

  • Contributing — dev setup, tests, PR workflow, documentation conventions
  • Architecture — how the interface layer, engine, executors, middleware, observability, and distributed extensions fit together
  • Legacy Optimization Guide — offload, layout, quantization, and TeaCache knobs for legacy_call families
  • Benchmark Baseline — bench scenarios, JSON metrics, and release acceptance guidance
  • FlashTalk Resident Benchmark — first real-hardware resident benchmark on Ascend 910B2 x8
  • FlashHead Benchmark — first real-hardware result for soulx-flashhead-1.3b through OmniRT's subprocess wrapper
  • Model Onboarding — how to register a new model family and pass validation
  • Backend Onboarding — how to implement BackendRuntime and wire in a new hardware backend

First contribution?

Start with Contributing and Architecture, then pick Model Onboarding or Backend Onboarding based on your goal.