Developer Guide
For developers contributing to OmniRT — adding models, adding backends, or understanding how the runtime fits together.
- Contributing — dev setup, tests, PR workflow, documentation conventions
- Architecture — how the interface layer, engine, executors, middleware, observability, and distributed extensions fit together
- Legacy Optimization Guide — offload, layout, quantization, and TeaCache knobs for legacy_call families
- Benchmark Baseline — bench scenarios, JSON metrics, and release acceptance guidance
- FlashTalk Resident Benchmark — first real-hardware resident benchmark on Ascend 910B2 x8
- FlashHead Benchmark — first real-hardware result for soulx-flashhead-1.3b through OmniRT's subprocess wrapper
- Model onboarding — how to register a new model family and pass validation
- Backend onboarding — how to implement BackendRuntime and wire in a new hardware backend
First contribution?
Start with Contributing and Architecture, then pick Model Onboarding or Backend Onboarding based on your goal.