Serving
OmniRT offers batch generation entry points and realtime avatar entry points. All batch entry points share the same GenerateRequest contract; the realtime entry points handle the audio chunk -> video frames path, both for OpenTalking and for new clients.
| Entry point | Best for | Page |
|---|---|---|
| Python API | embedding in existing Python apps, notebook experiments | Python API |
| CLI | scripted batches, one-off validate / generate | CLI |
| HTTP server | microservice, multi-tenant, OpenAI-compatible API, Prometheus / OTLP hooks | HTTP Server |
| FlashTalk WS | compatibility for existing OpenTalking clients, using AUDI / VIDX binary frames | FlashTalk WebSocket |
| Realtime Avatar WS | recommended OmniRT-native realtime avatar protocol for new integrations | Realtime Avatar WebSocket |
| Worker server | gRPC execution node used by serve --remote-worker | Distributed Serving |
Recommended order
For offline generation, start in Python or the CLI to validate the contract, then deploy the HTTP server for concurrency, batching, and policy tuning. For an existing realtime avatar frontend, connect through the FlashTalk-compatible WebSocket path first; for new integrations, the table above recommends the Realtime Avatar WebSocket protocol.
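The point of validating the contract first is that one request object can then be carried unchanged from a script to a server. The sketch below illustrates only that idea and does not use the OmniRT API: `SketchRequest`, its fields (`task`, `model`, `prompt`), and the helpers `to_json_body` and `to_cli_args` are hypothetical stand-ins, and the real GenerateRequest schema is documented on the entry-point pages.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical stand-in for a shared request contract. The field names are
# assumptions for illustration, not the real GenerateRequest schema.
@dataclass(frozen=True)
class SketchRequest:
    task: str
    model: str
    prompt: str

    def validate(self) -> None:
        """Reject empty fields before the request reaches any entry point."""
        for name, value in asdict(self).items():
            if not value:
                raise ValueError(f"{name} must be non-empty")


def to_json_body(req: SketchRequest) -> str:
    """Serialize the request as a JSON body, as an HTTP client might."""
    req.validate()
    return json.dumps(asdict(req), sort_keys=True)


def to_cli_args(req: SketchRequest) -> list[str]:
    """Render the same request as hypothetical --key value arguments."""
    req.validate()
    args: list[str] = []
    for name, value in sorted(asdict(req).items()):
        args += [f"--{name}", value]
    return args


if __name__ == "__main__":
    req = SketchRequest(task="text2image", model="demo-model", prompt="a red fox")
    print(to_json_body(req))
    print(to_cli_args(req))
```

Because both renderings come from the same validated object, a request that passes in a notebook or one-off script should need no reshaping when the caller later moves to the HTTP server.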