Docker & Container Deployment¶
OmniRT is distributed as a Python package; there is no official Docker image. Use the templates below to layer it on top of any CUDA- or Ascend-capable base image.
CUDA image template¶
# Dockerfile.cuda
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
ARG PYTORCH_INDEX=https://download.pytorch.org/whl/cu121
RUN apt-get update && apt-get install -y --no-install-recommends \
python3 python3-pip git ffmpeg \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /opt/omnirt
COPY . /opt/omnirt
RUN python3 -m pip install --no-cache-dir \
torch==2.5.1 torchvision==0.20.1 --index-url $PYTORCH_INDEX \
&& python3 -m pip install --no-cache-dir -e '.[runtime,server]'
EXPOSE 8000
CMD ["uvicorn", "omnirt.server.app:create_app", "--factory", "--host", "0.0.0.0", "--port", "8000"]
Build and run:
docker build -t omnirt:cuda -f Dockerfile.cuda .
docker run --gpus all -p 8000:8000 \
-v $HOME/.cache/huggingface:/root/.cache/huggingface \
omnirt:cuda
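
Before pointing clients at the container, it is worth confirming that the GPU is actually visible inside it. A minimal smoke test, assuming the image was built as above (it overrides the default CMD and uses only standard PyTorch calls); note that --gpus requires the NVIDIA Container Toolkit on the host:

docker run --rm --gpus all omnirt:cuda \
    python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"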
Ascend image template¶
# Dockerfile.ascend
FROM ascendhub.huawei.com/public-ascendhub/cann:8.0.RC2-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
python3 python3-pip git ffmpeg \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /opt/omnirt
COPY . /opt/omnirt
RUN python3 -m pip install --no-cache-dir torch==2.1.0 torchvision==0.16.0 \
&& python3 -m pip install --no-cache-dir torch_npu==2.1.0.post6 \
&& python3 -m pip install --no-cache-dir -e '.[runtime,server]'
ENV ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ENV LD_LIBRARY_PATH=$ASCEND_TOOLKIT_HOME/lib64:$LD_LIBRARY_PATH
ENV PATH=$ASCEND_TOOLKIT_HOME/bin:$PATH
EXPOSE 8000
CMD ["bash", "-c", "source $ASCEND_TOOLKIT_HOME/set_env.sh && \
uvicorn omnirt.server.app:create_app --factory --host 0.0.0.0 --port 8000"]
Build the image the same way (docker build -t omnirt:ascend -f Dockerfile.ascend .), then run:
docker run --device=/dev/davinci0 --device=/dev/davinci_manager \
--device=/dev/hisi_hdc --device=/dev/devmm_svm \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-p 8000:8000 omnirt:ascend
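
The same kind of smoke test helps on Ascend. A minimal sketch, assuming torch_npu was installed as in the Dockerfile above and reusing the device and driver mounts from the run command; it should print True when the NPU is reachable:

docker run --rm --device=/dev/davinci0 --device=/dev/davinci_manager \
    --device=/dev/hisi_hdc --device=/dev/devmm_svm \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    omnirt:ascend bash -c \
    'source $ASCEND_TOOLKIT_HOME/set_env.sh && python3 -c "import torch, torch_npu; print(torch_npu.npu.is_available())"'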
Docker Compose (dev)¶
# docker-compose.yml
services:
  omnirt:
    build:
      context: .
      dockerfile: Dockerfile.cuda
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["gpu"]
    volumes:
      - ${HOME}/.cache/huggingface:/root/.cache/huggingface
      - ${HOME}/.cache/omnirt:/root/.cache/omnirt
    environment:
      - OMNIRT_LOG_LEVEL=INFO
      - HF_ENDPOINT=${HF_ENDPOINT:-} # e.g. https://hf-mirror.com behind the GFW
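
A typical dev loop with this file uses the standard Compose commands; HF_ENDPOINT is read from the host environment (or a .env file next to docker-compose.yml) thanks to the ${HF_ENDPOINT:-} default above:

# optional: route HuggingFace downloads through a mirror for this shell
export HF_ENDPOINT=https://hf-mirror.com
docker compose up --build -d
docker compose logs -f omnirt
docker compose down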
Slim-image tips¶
- Use -runtime base images for production, not -devel
- Combine installs under a single RUN with --no-cache-dir to shrink layers
- Don't ship .[dev] extras in production images (keep those in CI only)
- Mount model weights as a volume instead of COPYing them in; weight directories are often tens of GB (see the sketch below)
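
For that last tip, a rough sketch of what a weights volume looks like at run time. The host path /data/models and the in-container path /models are placeholders, and how OmniRT is pointed at that directory (flag, environment variable, or config file) is not covered here:

docker run --gpus all -p 8000:8000 \
    -v /data/models:/models:ro \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    omnirt:cuda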
Related¶
- HTTP Server — FastAPI serving, API keys, concurrency, batching
- Domestic Deployment — mirror strategy when HuggingFace is unreachable at build time
- Ascend Backend — Ascend driver / firmware / CANN version constraints