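V100 + FasterLivePortrait + FlashHead deployment recipe
Deploy a real-time conversational digital-human system from scratch on an NVIDIA V100 (32GB). The recipe covers two inference paths: FasterLivePortrait (driven by real-person footage) and FlashHead (AI-generated). Target environment: Ubuntu 22.04 + NVIDIA Driver 580 + CUDA 12.x.
This page is a hands-on deployment recipe for a single V100 machine. For general model background, read the FasterLivePortrait, FlashHead, and OmniRT deployment pages first.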
1. System architecture
User browser (5173)
        │
        ▼
OpenTalking backend (8000) ── LLM + TTS + WebRTC + session management
        │
        ├── OmniRT (9000) ── FasterLivePortrait (real-person video driving)
        │
        └── FlashHead Server (8766) ── FlashHead 1.3B (AI-generated)
Components:
| Component | Port | Role |
|---|---|---|
| OpenTalking frontend | 5173 | Browser UI (Vue + Vite) |
| OpenTalking backend | 8000 | Orchestration, LLM dialogue, TTS synthesis, WebRTC transport |
| OmniRT | 9000 | FasterLivePortrait TRT inference engine |
| FlashHead Server | 8766 | FlashHead WebSocket inference service |
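2. Base server environment
2.1 Hardware requirements
- GPU: NVIDIA V100 32GB (or another Volta-generation card)
- RAM: 32GB+
- Disk: 200GB+ (model files take about 50GB)
- Network: a public IP with TCP ports 5173, 8000, 8766, 9000 and UDP ports 40000-60000 open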
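2.2 V100 hardware characteristics (important)
| Feature | Support | Impact |
|---|---|---|
| FP16 | ✅ Supported (Tensor Cores) | Use FP16; BF16 is not an option |
| BF16 | ❌ Not supported | Newer models default to BF16 and must be switched to FP16 by hand |
| FP8 | ❌ Not supported | N/A |
| FlashAttention 2 | ❌ Not supported (needs SM 8.0+) | Some models need it replaced with standard attention |
| TensorRT | 8.x only (10.x needs SM 7.5+) | Use TRT 8.6 |
| torch.compile | ⚠️ Limited | Varying input shapes trigger recompilation; disabling it is recommended |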
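The table above boils down to one decision per GPU: which precision to run and whether FlashAttention 2 is available. A minimal sketch of that decision follows, keyed on the CUDA compute capability. The function names are hypothetical and not part of any project used on this page.

```python
def pick_inference_dtype(compute_capability: tuple[int, int]) -> str:
    """Return the dtype name to use for a GPU with the given (major, minor) capability."""
    major, _minor = compute_capability
    # BF16 is assumed usable from SM 8.0 (Ampere) onward; Volta (SM 7.0) falls back to FP16.
    return "bfloat16" if major >= 8 else "float16"


def supports_flash_attention_2(compute_capability: tuple[int, int]) -> bool:
    """FlashAttention 2 needs SM 8.0+, per the table above."""
    return compute_capability >= (8, 0)


# A V100 reports compute capability (7, 0).
print(pick_inference_dtype((7, 0)), supports_flash_attention_2((7, 0)))
```

On the target machine the capability tuple can be read with `torch.cuda.get_device_capability(0)` once PyTorch is installed (section 3.3).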
2.3 Installing the NVIDIA driver
# Check the current driver
nvidia-smi
# If the driver version is below 535, upgrade it
# Add the graphics-drivers PPA
apt install -y software-properties-common
add-apt-repository -y ppa:graphics-drivers/ppa
apt update
# Install driver 580 (recommended)
apt install -y nvidia-driver-580
reboot
# Verify
nvidia-smi # should show Driver Version: 580.xx
2.4 Installing base software
apt update && apt install -y \
  python3.10 python3.10-venv python3-pip \
  git git-lfs cmake build-essential \
  ffmpeg libgl1-mesa-glx libglib2.0-0 \
  redis-server
# Configure a pip mirror (required when the server is in mainland China)
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# Configure the HuggingFace mirror
echo 'export HF_ENDPOINT=https://hf-mirror.com' >> ~/.bashrc
source ~/.bashrc
2.5 Project directory layout
mkdir -p /opt/digital-human
cd /opt/digital-human
# Final directory layout
# /opt/digital-human/
# ├── omnirt/                  # OmniRT inference engine
# ├── FasterLivePortrait/      # FLP model code + weights
# ├── models/                  # Shared model weights
# ├── SoulX-FlashHead/         # FlashHead model code
# ├── SoulX-FlashHead-WEB/     # FlashHead frontend and backend
# │   └── models/              # FlashHead weights
# ├── opentalking/             # OpenTalking frontend and backend
# ├── flashhead-env/           # FlashHead Python environment
# ├── flashhead_server.py      # FlashHead WebSocket server
# └── start_omnirt_trt.sh      # OmniRT startup script
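3. FasterLivePortrait deployment (real-person driving)
3.1 Characteristics
- How it works: record a clip of a real person as the base video, then drive lip shape, expression, and head motion from audio
- Pros: the most realistic result, since the output is the real person
- Cons: slower inference; long sentences stutter slightly
- Performance: about 15-20fps after TRT optimization
- Source material: a pre-recorded frontal, half-body video (512×512 or larger, good lighting, clean background)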
3.2 Getting the code
cd /opt/digital-human
# Clone FasterLivePortrait
# Later steps use scripts/onnx2trt.py (3.6) and configs/trt_infer.yaml (3.9); confirm both exist in this checkout before continuing
git clone https://github.com/KwaiVGI/LivePortrait.git FasterLivePortrait
cd FasterLivePortrait
# Download the ONNX models (9 models, including warping_spade)
# from HuggingFace
hf download warmshao/FasterLivePortrait \
  --local-dir ./checkpoints/liveportrait_onnx
# Download the other required weights
hf download KwaiVGI/LivePortrait \
  --include "pretrained_weights/*" \
  --local-dir ./checkpoints
3.3 Creating the Python environment
python3.10 -m venv /opt/digital-human/omnirt/.venv310
source /opt/digital-human/omnirt/.venv310/bin/activate
# PyTorch (use the cu124 build on V100)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \
  --index-url https://download.pytorch.org/whl/cu124
# Verify CUDA
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# Output: True Tesla V100-PCIE-32GB
# If you hit an nvjitlink symbol error, set:
export LD_LIBRARY_PATH=$(python -c "import nvidia.nvjitlink; import os; print(os.path.dirname(nvidia.nvjitlink.__file__))")/lib:$LD_LIBRARY_PATH
3.4 Installing TensorRT 8.6
# Option A: install the system packages (recommended)
# Add the NVIDIA apt repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt update
apt install -y tensorrt=8.6.1.6-1+cuda12.0
# Python bindings
pip install tensorrt==8.6.1
# Option B: install directly with pip (if Option A fails)
pip install tensorrt==8.6.1
# Verify
python -c "import tensorrt; print(tensorrt.__version__)"
# Output: 8.6.1
3.5 Installing the remaining dependencies
pip install onnxruntime-gpu==1.17.0
pip install numpy==1.26.4 opencv-python-headless==4.8.1.78
pip install librosa soundfile scipy
3.6 Converting ONNX to TRT engines
Convert the 9 ONNX models into TRT FP16 engines:
cd /opt/digital-human/FasterLivePortrait
# Convert one model at a time (each takes 1-5 minutes)
for model in \
    retinaface_det_static \
    face_2dpose_106_static \
    landmark \
    motion_extractor \
    appearance_feature_extractor \
    stitching \
    stitching_lip \
    stitching_eye \
    warping_spade-fix; do
  echo "Converting $model ..."
  python scripts/onnx2trt.py \
    -o "./checkpoints/liveportrait_onnx/${model}.onnx" \
    -p fp16
done
# Verify that every .trt file was generated
ls -lh ./checkpoints/liveportrait_onnx/*.trt
# Expect 9 .trt files
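Counting files by eye is easy to get wrong when one conversion fails silently. The sketch below lists which engines are missing or empty. It assumes each engine is written next to its ONNX file as `<model>.trt`; that naming is an assumption, so adjust the suffix if the conversion script names its output differently. `missing_engines` is a hypothetical helper.

```python
from pathlib import Path

# The nine models converted in the loop above.
MODEL_NAMES = [
    "retinaface_det_static",
    "face_2dpose_106_static",
    "landmark",
    "motion_extractor",
    "appearance_feature_extractor",
    "stitching",
    "stitching_lip",
    "stitching_eye",
    "warping_spade-fix",
]


def missing_engines(onnx_dir, names=MODEL_NAMES, suffix=".trt"):
    """Return the model names whose engine file is absent or zero bytes."""
    root = Path(onnx_dir)
    missing = []
    for name in names:
        engine = root / f"{name}{suffix}"
        if not engine.is_file() or engine.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    todo = missing_engines("/opt/digital-human/FasterLivePortrait/checkpoints/liveportrait_onnx")
    print("All engines present" if not todo else f"Missing engines: {todo}")
```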
3.7 Building the GridSample3D plugin
The warping_spade model needs a custom TRT plugin:
# Get the plugin source
git clone https://github.com/NVIDIA/TensorRT.git /tmp/TensorRT
cd /tmp/TensorRT/plugin/gridSamplePlugin
# Build
mkdir build && cd build
cmake .. -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu \
  -DTRT_INCLUDE_DIR=/usr/include/x86_64-linux-gnu \
  -DCUDA_VERSION=12.0
make -j$(nproc)
# Copy the plugin into the model directory
cp libgrid_sample_3d_plugin.so \
  /opt/digital-human/FasterLivePortrait/checkpoints/liveportrait_onnx/
3.8 Installing OmniRT
cd /opt/digital-human
git clone https://github.com/datascale-ai/omnirt.git
cd omnirt
source /opt/digital-human/omnirt/.venv310/bin/activate
pip install -e .
3.9 Starting OmniRT
Create the startup script /opt/digital-human/start_omnirt_trt.sh:
#!/bin/bash
source /opt/digital-human/omnirt/.venv310/bin/activate
export LD_LIBRARY_PATH=/opt/digital-human/omnirt/.venv310/lib/python3.10/site-packages/nvidia/cuda_runtime/lib
export OMNIRT_FASTLIVEPORTRAIT_RUNTIME=true
export OMNIRT_FASTLIVEPORTRAIT_ROOT=/opt/digital-human/FasterLivePortrait
export OMNIRT_FASTLIVEPORTRAIT_CHECKPOINTS_DIR=/opt/digital-human/FasterLivePortrait/checkpoints
export OMNIRT_FASTLIVEPORTRAIT_CFG=configs/trt_infer.yaml
export OMNIRT_FASTLIVEPORTRAIT_LOAD_MODELS=true
cd /opt/digital-human/omnirt
python -c "from omnirt.server.avatar_app import create_avatar_app; import uvicorn; app = create_avatar_app(default_backend='cuda'); uvicorn.run(app, host='0.0.0.0', port=9000)"
Verify:
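One generic way to verify is to confirm that something is accepting TCP connections on port 9000. The sketch below does only that; it does not call any OmniRT endpoint, since none is documented on this page. `tcp_port_open` is a hypothetical helper.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("OmniRT listening on 9000:", tcp_port_open("127.0.0.1", 9000))
```

The same function works for the other service ports (8000, 8766, 5173) once those services are up.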
3.10 Tuning FasterLivePortrait parameters
File: opentalking/configs/synthesis/fasterliveportrait.yaml
# Resolution (match the base video)
width: 448
height: 900
fps: 25
# Animation region: all = whole face, lip = mouth only
animation_region: all
# Motion multipliers (larger values mean larger motion)
head_motion_multiplier: 0.8 # head motion
expression_multiplier: 1.5 # expression
mouth_open_multiplier: 1.25 # mouth opening
mouth_corner_multiplier: 0.85 # mouth corners
# Audio chunking
chunk_samples: 16000
emit_frames_per_chunk: 25
# Frame interpolation (can make stutter worse on V100; disabling is recommended)
disable_frame_interpolation: true
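The chunking values are linked: each audio chunk should cover exactly as much time as the frames emitted for it. A small sketch of that arithmetic follows, assuming 16 kHz audio. The sample rate is an assumption for this config; it matches what the FlashHead server on this page advertises. `frames_for_chunk` is a hypothetical helper.

```python
def frames_for_chunk(chunk_samples: int, fps: int, sample_rate: int = 16000) -> float:
    """Number of video frames that span the same duration as one audio chunk."""
    chunk_seconds = chunk_samples / sample_rate
    return chunk_seconds * fps


# With the values above: 16000 samples at 16 kHz is 1 s, which is 25 frames at 25 fps.
print(frames_for_chunk(chunk_samples=16000, fps=25))
```

If `chunk_samples` or `fps` changes, `emit_frames_per_chunk` would need to follow this value for audio and video to stay aligned.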
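4. FlashHead deployment (AI-generated)
4.1 Characteristics
- How it works: a diffusion model generates talking video in real time from one reference image plus audio
- Pros: very fast (69fps), no stutter, needs only a single photo
- Cons: visibly AI-generated; less realistic than real-person driving
- Performance: V100 FP16, 0.48s per inference call for 33 frames, peak VRAM 4.94GB
- Source material: a single frontal photo (PNG, 512×512 or larger, good lighting)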
4.2 Creating the Python environment
python3.10 -m venv /opt/digital-human/flashhead-env
source /opt/digital-human/flashhead-env/bin/activate
# PyTorch
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \
  --index-url https://download.pytorch.org/whl/cu124
# Work around the nvjitlink symbol problem
export LD_LIBRARY_PATH=/opt/digital-human/flashhead-env/lib/python3.10/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH
# Verify
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# Install dependencies
pip install fastapi uvicorn websockets librosa soundfile \
  opencv-python-headless imageio pillow transformers accelerate \
  openai edge-tts einops diffusers av xfuser easydict \
  scikit-image loguru mediapipe==0.10.9 pydantic-settings \
  -i https://pypi.tuna.tsinghua.edu.cn/simple
4.3 Getting the model code
4.4 Downloading the model weights
# FlashHead 1.3B (~15GB, includes both the Lite and Pro variants)
hf download Soul-AILab/SoulX-FlashHead-1_3B \
  --local-dir /opt/digital-human/SoulX-FlashHead-WEB/models/SoulX-FlashHead-1_3B
# wav2vec2 audio encoder (~1.1GB)
hf download facebook/wav2vec2-base-960h \
  --local-dir /opt/digital-human/SoulX-FlashHead-WEB/models/wav2vec2-base-960h
Verify:
ls -lh /opt/digital-human/SoulX-FlashHead-WEB/models/SoulX-FlashHead-1_3B/
# Expect directories such as Model_Lite/ Model_Pro/ VAE_LTX/ VAE_Wan/
du -sh /opt/digital-human/SoulX-FlashHead-WEB/models/SoulX-FlashHead-1_3B/
# About 15GB
4.5 Required patches for V100
Patch 1: BF16 → FP16 (critical; inference is extremely slow without it)
# Default pipeline precision
sed -i 's/param_dtype=torch.bfloat16/param_dtype=torch.float16/' \
  /opt/digital-human/SoulX-FlashHead/flash_head/src/pipeline/flash_head_pipeline.py
# Default LTX VAE precision
sed -i 's/dtype = torch.bfloat16/dtype = torch.float16/' \
  /opt/digital-human/SoulX-FlashHead/flash_head/ltx_video/ltx_vae.py
Before and after:
| Metric | Before (FP32 fallback) | After (FP16) |
|---|---|---|
| FPS | 9.7 | 69 |
| Inference time | 3.41s | 0.48s |
| Peak VRAM | 8.26 GB | 4.94 GB |
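The FPS figures follow directly from the per-call timings, and the same arithmetic shows why only the FP16 build keeps up with playback. The sketch below uses 33 frames per call and 25fps playback, both taken from this page. The function names are hypothetical.

```python
FRAMES_PER_CALL = 33   # frames produced by one inference call
PLAYBACK_FPS = 25      # rate at which the frames are played back


def throughput_fps(seconds_per_call: float, frames: int = FRAMES_PER_CALL) -> float:
    """Frames generated per second of wall-clock time."""
    return frames / seconds_per_call


def realtime_factor(seconds_per_call: float, frames: int = FRAMES_PER_CALL,
                    playback_fps: int = PLAYBACK_FPS) -> float:
    """Seconds of video produced per second of compute; above 1 keeps up with playback."""
    return (frames / playback_fps) / seconds_per_call


for label, seconds in (("before (FP32 fallback)", 3.41), ("after (FP16)", 0.48)):
    print(f"{label}: {throughput_fps(seconds):.1f} fps, "
          f"realtime factor {realtime_factor(seconds):.2f}")
```

A realtime factor below 1, as in the unpatched case, means the generator falls behind playback regardless of buffering.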
Patch 2: disable torch.compile (avoids einops shape errors)
sed -i 's/COMPILE_MODEL = True/COMPILE_MODEL = False/' \
  /opt/digital-human/SoulX-FlashHead/flash_head/src/pipeline/flash_head_pipeline.py
sed -i 's/COMPILE_VAE = True/COMPILE_VAE = False/' \
  /opt/digital-human/SoulX-FlashHead/flash_head/src/pipeline/flash_head_pipeline.py
Reason: torch.compile caches compiled artifacts for specific input shapes. Audio of a different length triggers recompilation, and the einops rearrange operation then fails with a shape-mismatch error.
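Because `sed` exits successfully even when nothing matches, it is worth confirming that both patches took effect. The sketch below scans a file for leftover BF16 references and enabled compile flags. The patterns mirror the `sed` expressions above, and `unpatched_lines` is a hypothetical helper. A remaining `torch.bfloat16` is not necessarily a problem, so treat hits as lines to inspect.

```python
import re
from pathlib import Path

# Patterns for code that the two patches are meant to change.
PATTERNS = {
    "bf16 still referenced": re.compile(r"torch\.bfloat16"),
    "torch.compile still enabled": re.compile(r"COMPILE_(MODEL|VAE)\s*=\s*True"),
}


def unpatched_lines(path):
    """Return (line_number, label, text) for every line matching a pattern."""
    hits = []
    text = Path(path).read_text(encoding="utf-8")
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits


if __name__ == "__main__":
    base = Path("/opt/digital-human/SoulX-FlashHead/flash_head")
    for rel in ("src/pipeline/flash_head_pipeline.py", "ltx_video/ltx_vae.py"):
        target = base / rel
        if target.is_file():
            for hit in unpatched_lines(target):
                print(rel, hit)
```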
4.6 FlashHead WebSocket server
OpenTalking talks to FlashHead over a binary WebSocket protocol, so a small bridge server is needed:
Create /opt/digital-human/flashhead_server.py:
"""
FlashHead WebSocket server.
Implements OpenTalking's /v1/avatar/realtime protocol.
Protocol: the client sends AUDI + PCM int16; the server replies with VIDX + a sequence of JPEG frames.
"""
import asyncio, base64, json, struct, time, tempfile, os, sys
import numpy as np, torch, cv2

# The FlashHead package is imported from its source checkout and expects to run from there.
sys.path.insert(0, '/opt/digital-human/SoulX-FlashHead')
os.chdir('/opt/digital-human/SoulX-FlashHead')
from flash_head.inference import get_pipeline, get_base_data, get_audio_embedding, run_pipeline
import websockets

MAGIC_AUDIO = b"AUDI"
MAGIC_VIDEO = b"VIDX"
MODEL_DIR = "/opt/digital-human/SoulX-FlashHead-WEB/models/SoulX-FlashHead-1_3B"
WAV2VEC_DIR = "/opt/digital-human/SoulX-FlashHead-WEB/models/wav2vec2-base-960h"
pipeline = None


def load_pipeline():
    global pipeline
    print("[FlashHead] Loading model...")
    pipeline = get_pipeline(world_size=1, ckpt_dir=MODEL_DIR, model_type="lite", wav2vec_dir=WAV2VEC_DIR)
    print(f"[FlashHead] Model loaded, VRAM: {torch.cuda.memory_allocated()/1024**3:.2f} GB")


def pcm_to_float(pcm_bytes):
    """Convert int16 PCM to float32 in [-1, 1] and pad or trim to the fixed input length."""
    audio = np.frombuffer(pcm_bytes, dtype=np.int16).astype(np.float32) / 32768.0
    target = 21120  # 33 frames x 640 samples (16 kHz / 25 fps)
    if len(audio) < target:
        audio = np.pad(audio, (0, target - len(audio)))
    elif len(audio) > target:
        audio = audio[:target]
    return audio


def frames_to_jpeg_response(frames_np):
    """Pack frames as: VIDX | uint32 frame count | (uint32 size | JPEG bytes) per frame."""
    parts = []
    for i in range(frames_np.shape[0]):
        bgr = cv2.cvtColor(frames_np[i], cv2.COLOR_RGB2BGR)
        _, buf = cv2.imencode('.jpg', bgr, [cv2.IMWRITE_JPEG_QUALITY, 85])
        parts.append(struct.pack('<I', len(buf)) + buf.tobytes())
    return MAGIC_VIDEO + struct.pack('<I', frames_np.shape[0]) + b''.join(parts)


async def handle_connection(websocket):
    sid = str(int(time.time() * 1000))
    ref_path = None
    print(f"[{sid}] Connected")
    try:
        async for msg in websocket:
            if isinstance(msg, str):
                # Text frames carry JSON control messages.
                data = json.loads(msg)
                if data.get("type") == "session.create":
                    img_bytes = base64.b64decode(data["inputs"]["image_b64"])
                    ref_path = tempfile.NamedTemporaryFile(suffix='.png', delete=False).name
                    with open(ref_path, 'wb') as f:
                        f.write(img_bytes)
                    get_base_data(pipeline, cond_image_path_or_dir=ref_path, base_seed=9999, use_face_crop=False)
                    await websocket.send(json.dumps({
                        "type": "session.created", "session_id": sid,
                        "audio": {"sample_rate": 16000, "chunk_samples": 17920},
                        "video": {"fps": 25, "width": 512, "height": 512, "frame_count": 29}}))
                    print(f"[{sid}] Session created")
                elif data.get("type") == "session.close":
                    await websocket.send(json.dumps({"type": "session.closed"}))
                    break
            elif isinstance(msg, bytes) and msg[:4] == MAGIC_AUDIO:
                # Binary frames carry one audio chunk; reply with the generated frames.
                audio = pcm_to_float(msg[4:])
                t0 = time.time()
                emb = get_audio_embedding(pipeline, audio)
                frames = run_pipeline(pipeline, emb)
                if frames is not None:
                    arr = frames.cpu().numpy()
                    if arr.ndim == 4 and arr.shape[-1] != 3:
                        arr = arr.transpose(0, 2, 3, 1)  # NCHW -> NHWC
                    if arr.max() <= 1.0:
                        arr = arr * 255  # scale [0, 1] floats to [0, 255]
                    # cv2.imencode expects contiguous uint8 data.
                    arr = np.ascontiguousarray(np.clip(arr, 0, 255).astype(np.uint8))
                    await websocket.send(frames_to_jpeg_response(arr))
                    print(f"[{sid}] {arr.shape[0]} frames in {time.time()-t0:.2f}s")
    except websockets.exceptions.ConnectionClosed:
        pass
    finally:
        if ref_path and os.path.exists(ref_path):
            os.unlink(ref_path)
        print(f"[{sid}] Disconnected")


async def main():
    load_pipeline()
    print("[FlashHead] Server started: ws://0.0.0.0:8766/v1/avatar/realtime")
    async with websockets.serve(handle_connection, "0.0.0.0", 8766, max_size=50*1024*1024):
        await asyncio.Future()  # run until cancelled


if __name__ == "__main__":
    asyncio.run(main())
Start it:
source /opt/digital-human/flashhead-env/bin/activate
export LD_LIBRARY_PATH=/opt/digital-human/flashhead-env/lib/python3.10/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH
python /opt/digital-human/flashhead_server.py
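When debugging the bridge it helps to decode replies without the full OpenTalking stack. The sketch below is a client-side counterpart to the framing used by `frames_to_jpeg_response` in the server above: the 4-byte magic, a little-endian uint32 frame count, then a uint32 size plus JPEG bytes per frame. `build_audio_message` and `parse_video_message` are hypothetical helpers written for this page.

```python
import struct

MAGIC_AUDIO = b"AUDI"
MAGIC_VIDEO = b"VIDX"


def build_audio_message(pcm_int16: bytes) -> bytes:
    """Prefix raw int16 PCM with the AUDI magic, as the server expects."""
    return MAGIC_AUDIO + pcm_int16


def parse_video_message(payload: bytes) -> list[bytes]:
    """Split a VIDX reply into its individual JPEG byte strings."""
    if payload[:4] != MAGIC_VIDEO:
        raise ValueError("not a VIDX message")
    (count,) = struct.unpack_from("<I", payload, 4)
    offset = 8
    frames = []
    for _ in range(count):
        (size,) = struct.unpack_from("<I", payload, offset)
        offset += 4
        frame = payload[offset:offset + size]
        if len(frame) != size:
            raise ValueError("truncated frame")
        frames.append(frame)
        offset += size
    return frames
```

Each returned element can be written to disk as a `.jpg` file or decoded with `cv2.imdecode` to inspect the output.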
5. OpenTalking deployment
5.1 Getting the code
5.2 Backend environment
# Reuse OmniRT's Python 3.10 environment, or create a new one
python3.10 -m venv .venv310
source .venv310/bin/activate
# Install dependencies
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
5.3 Frontend environment
5.4 Configuration file
Key settings in configs/default.yaml:
# Service port
api:
  host: 0.0.0.0
  port: 8000
# FlashHead settings (point at the local server)
flashhead:
  ws_url: ws://127.0.0.1:8766/v1/avatar/realtime
  base_url: http://127.0.0.1:8766
  model: soulx-flashhead-1.3b
  fps: 25
  width: 512
  height: 512
# Default model selection
models:
  fasterliveportrait:
    backend: omnirt
  flashhead:
    backend: direct_ws
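A mistyped `ws_url` tends to surface only as a connection failure at session start. The sketch below is a quick sanity check that the URL matches the port and path the FlashHead server on this page listens on. `check_ws_url` is a hypothetical helper and not part of OpenTalking.

```python
from urllib.parse import urlparse


def check_ws_url(ws_url: str, expected_port: int = 8766,
                 expected_path: str = "/v1/avatar/realtime") -> list[str]:
    """Return a list of human-readable problems; an empty list means the URL looks right."""
    parts = urlparse(ws_url)
    problems = []
    if parts.scheme not in ("ws", "wss"):
        problems.append(f"scheme is {parts.scheme!r}, expected ws or wss")
    if parts.port != expected_port:
        problems.append(f"port is {parts.port}, expected {expected_port}")
    if parts.path != expected_path:
        problems.append(f"path is {parts.path!r}, expected {expected_path!r}")
    return problems


print(check_ws_url("ws://127.0.0.1:8766/v1/avatar/realtime"))
```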
5.5 LLM configuration
In opentalking/backend/.env:
# DeepSeek API (or any other OpenAI-compatible API)
OPEN_AI_API_KEY=your-api-key
OPEN_AI_URL=https://api.deepseek.com/v1
LLM_MODEL=deepseek-chat
# TTS
TTS_TYPE=edge
EDGE_TTS_VOICE=zh-CN-XiaoxiaoNeural
5.6 Starting the backend
cd /opt/digital-human/opentalking
source .venv310/bin/activate
export LD_LIBRARY_PATH=/opt/digital-human/omnirt/.venv310/lib/python3.10/site-packages/nvidia/cuda_runtime/lib
python -m apps.unified.main
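5.7 Starting the frontend
5.8 WebRTC public-IP fix
If the server sits behind NAT (as on Huawei Cloud), the public IP has to be injected into the SDP:
Find the handle_offer method in adapter.py (around line 334) and inject the public IP where the SDP is processed: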
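The exact patch depends on how adapter.py builds its answer, so the following is only an illustrative sketch of the idea and not the OpenTalking code. It rewrites the private address to the public one in the connection line and the ICE candidate lines of an SDP string. `inject_public_ip` is a hypothetical helper, and both addresses in the usage example are placeholders.

```python
def inject_public_ip(sdp: str, private_ip: str, public_ip: str) -> str:
    """Replace private_ip with public_ip in c= and a=candidate lines of an SDP body."""
    rewritten = []
    for line in sdp.splitlines():
        if line.startswith("a=candidate:") or line.startswith("c=IN IP4 "):
            # Compare whole tokens so 192.168.0.10 does not match 192.168.0.100.
            tokens = [public_ip if tok == private_ip else tok for tok in line.split(" ")]
            line = " ".join(tokens)
        rewritten.append(line)
    # SDP lines are terminated with CRLF.
    return "\r\n".join(rewritten) + "\r\n"


example = (
    "v=0\r\n"
    "c=IN IP4 192.168.0.10\r\n"
    "a=candidate:1 1 udp 2130706431 192.168.0.10 40000 typ host\r\n"
)
print(inject_public_ip(example, "192.168.0.10", "203.0.113.5"))
```

In a real handler the rewritten string would replace the SDP of the answer before it is returned to the browser.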
5.9 Huawei Cloud security group
Ports that need to be open:
| Port | Protocol | Purpose |
|---|---|---|
| 5173 | TCP | Frontend page |
| 8000 | TCP | Backend API |
| 8766 | TCP | FlashHead WebSocket |
| 9000 | TCP | OmniRT |
| 40000-60000 | UDP | WebRTC video transport |
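6. Full startup sequence
Start the 4 services in the following order (one terminal each):
Terminal 1: OmniRT (FasterLivePortrait engine)
bash /opt/digital-human/start_omnirt_trt.sh
Wait for "Application startup complete" before continuing.
Terminal 2: FlashHead server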
source /opt/digital-human/flashhead-env/bin/activate
export LD_LIBRARY_PATH=/opt/digital-human/flashhead-env/lib/python3.10/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH
python /opt/digital-human/flashhead_server.py
Wait for "[FlashHead] Server started" before continuing.
Terminal 3: OpenTalking backend
cd /opt/digital-human/opentalking
source .venv310/bin/activate
export LD_LIBRARY_PATH=/opt/digital-human/omnirt/.venv310/lib/python3.10/site-packages/nvidia/cuda_runtime/lib
python -m apps.unified.main
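Terminal 4: frontend
Access
Open http://<server-ip>:5173 in a browser.
Select a model (FasterLivePortrait or FlashHead), upload the reference material, and start the conversation.
7. Recording video (with audio)
By default the recording feature saves only video frames, not audio, so a patch is needed.
7.1 Modifying recording.py
File: opentalking/pipeline/recording/recording.py
Add the following before the export_flashtalk_recording function: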
import wave
import subprocess

_audio_buffers: dict[str, bytearray] = {}
_audio_sample_rates: dict[str, int] = {}


def flashtalk_recording_audio_path(session_id: str) -> Path:
    return flashtalk_recording_session_dir(session_id) / "audio.wav"


def append_flashtalk_audio(session_id: str, pcm_int16, sample_rate: int = 16000) -> None:
    pcm = np.asarray(pcm_int16, dtype=np.int16).reshape(-1)
    if session_id not in _audio_buffers:
        _audio_buffers[session_id] = bytearray()
        _audio_sample_rates[session_id] = sample_rate
    _audio_buffers[session_id].extend(pcm.tobytes())


def flush_audio_buffer(session_id: str) -> None:
    buf = _audio_buffers.pop(session_id, None)
    sr = _audio_sample_rates.pop(session_id, 16000)
    if not buf:
        return
    audio_path = flashtalk_recording_audio_path(session_id)
    audio_path.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(audio_path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sr)
        wf.writeframes(bytes(buf))
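Add at the start of the export_flashtalk_recording function (the snippet is not shown on this page; given the helpers above, it is presumably the call that writes the buffered audio to disk):
    flush_audio_buffer(session_id)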
Add at the end of the export_flashtalk_recording function (before return output):
    # Merge the audio track with ffmpeg
    audio_path = flashtalk_recording_audio_path(session_id)
    if audio_path.is_file():
        final = flashtalk_recording_session_dir(session_id) / "flashtalk_with_audio.mp4"
        try:
            subprocess.run(
                ["ffmpeg", "-y", "-i", str(output), "-i", str(audio_path),
                 "-c:v", "copy", "-c:a", "aac", "-shortest", str(final)],
                capture_output=True, timeout=60)
            if final.is_file() and final.stat().st_size > 0:
                # Replace the silent recording with the muxed file.
                output.unlink(missing_ok=True)
                final.rename(output)
        except Exception:
            pass
7.2 Modifying synthesis_runner.py
File: opentalking/pipeline/speak/synthesis_runner.py
Add the following after the _append_recording_frames_if_enabled method:
    async def _append_recording_audio_if_enabled(self, pcm_int16) -> None:
        try:
            recording = await self.redis.hget(
                session_key(self.session_id),
                FLASHTALK_DISK_RECORDING_FIELD,
            )
        except Exception:
            return
        if str(recording or "").strip() != "1":
            return
        try:
            from opentalking.pipeline.recording.recording import append_flashtalk_audio
            append_flashtalk_audio(self.session_id, pcm_int16)
        except Exception:
            log.exception("Recording audio append failed: session=%s", self.session_id)
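Find the line await self._append_recording_frames_if_enabled(frames) and add a call to _append_recording_audio_if_enabled right after it, passing the PCM chunk that belongs to those frames:
Note: Python's wave module has no "ab" append mode, so the audio has to be buffered in memory and written in a single pass at export time.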
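The limitation is easy to confirm, and the workaround is short. The sketch below shows that `wave.open` rejects `"ab"`, then accumulates PCM chunks in a `bytearray` and writes them once. It is a standalone illustration of the same approach as the patch above; both function names are hypothetical.

```python
import wave


def wave_supports_append(path: str) -> bool:
    """Return False if wave.open rejects the 'ab' mode."""
    try:
        wave.open(path, "ab")
    except wave.Error:
        return False
    return True


def write_buffered_pcm(path: str, chunks, sample_rate: int = 16000) -> int:
    """Concatenate int16 mono PCM chunks in memory, write one WAV file, return its frame count."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(buf))
    with wave.open(path, "rb") as rf:
        return rf.getnframes()
```

The trade-off is memory: at 16 kHz mono int16, the buffer grows by about 32 KB per second of audio until the session is exported.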
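8. Comparing the two models
| Aspect | FasterLivePortrait | FlashHead |
|---|---|---|
| How it works | Driven by real-person video | Generated by an AI diffusion model |
| Lip sync | ✅ Precise | ✅ Mostly accurate |
| Head motion | ✅ Actively driven | ✅ Naturally generated |
| Expression | ✅ Rich | ✅ Rich |
| Realism | ✅ Good (real-person base video) | ⚠️ Noticeably AI-generated |
| Speed | 15-20fps (TRT) | 69fps (FP16) |
| Peak VRAM | ~10GB | ~5GB |
| Source material | A recorded real-person video | A single photo |
| Long sentences | Slight stutter | Smooth, no stutter |
| Best for | High-realism showcases | Real-time conversation, quick deployment |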
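9. Lessons learned deploying on V100
- BF16 must become FP16: newer models default to BF16, which V100 does not support. Changing two lines of code gives a roughly 7x speedup.
- Disable torch.compile or use it with care: with variable input shapes it recompiles repeatedly and raises errors.
- TRT 8.x only: TRT 10 does not support SM 7.0. Python 3.10 + TRT 8.6 is the most stable combination.
- Use the cu124 build of PyTorch: torch==2.5.1+cu124 is stable on V100.
- LD_LIBRARY_PATH must be set: nvjitlink symbol conflicts are a common problem on V100.
- Configure mirrors when the server is in mainland China: the Tsinghua index for pip, hf-mirror.com for HuggingFace.
- Align the audio length: FlashHead expects fixed-length audio input (21120 samples = 33 frames); pad shorter input.
- The wave module cannot append: the standard-library wave module supports only "w"/"wb"/"r"/"rb", not "ab". Buffer in memory first, then write once.
- WebRTC NAT traversal: on Huawei Cloud the public IP must be injected into the SDP.
- Open the security-group ports ahead of time: 5173, 8000, 8766, 9000, and UDP 40000-60000.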
10. Key file index
| File | Path | Notes |
|---|---|---|
| OmniRT startup script | /opt/digital-human/start_omnirt_trt.sh | FasterLivePortrait TRT engine |
| FlashHead server | /opt/digital-human/flashhead_server.py | WebSocket bridge service |
| FlashHead model code | /opt/digital-human/SoulX-FlashHead/ | Needs the BF16→FP16 patch |
| FlashHead model weights | /opt/digital-human/SoulX-FlashHead-WEB/models/ | 1.3B + wav2vec2 |
| FlashHead Python environment | /opt/digital-human/flashhead-env/ | Separate venv |
| OmniRT | /opt/digital-human/omnirt/ | Inference engine |
| FasterLivePortrait | /opt/digital-human/FasterLivePortrait/ | Model code + TRT engines |
| OpenTalking backend | /opt/digital-human/opentalking/ | Orchestration + API |
| OpenTalking frontend | /opt/digital-human/opentalking/apps/web/ | Vue frontend |
| FLP synthesis config | opentalking/configs/synthesis/fasterliveportrait.yaml | Motion multipliers and related parameters |
| FlashHead config | opentalking/configs/default.yaml | ws_url and related settings |
| Recording module | opentalking/pipeline/recording/recording.py | Needs the audio patch |
| TRT inference config | FasterLivePortrait/configs/trt_infer.yaml | TRT engine paths |