4 changes: 3 additions & 1 deletion docs/CN/source/tutorial/deepseek_deployment.rst
@@ -175,6 +175,7 @@ PD (Prefill-Decode) disaggregation mode deploys the prefill and decode stages separately, which can

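# PD prefill mode for DeepSeek-R1 (DP+EP) on H200
# Usage: sh pd_prefill.sh <host> <pd_master_ip>
# NIXL transport is used by default; to use NCCL as the data plane, set LIGHTLLM_PD_KV_TRANSPORT_BACKEND=nccl
# nvidia-cuda-mps-control -d: run MPS (optional). MPS can improve performance substantially, but enabling it on some GPU and driver combinations can cause errors; upgrading to a recent driver is recommended, especially on H-series cards.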

export host=$1
@@ -201,6 +202,7 @@ PD (Prefill-Decode) disaggregation mode deploys the prefill and decode stages separately, which can

# PD decode mode for DeepSeek-R1 (DP+EP) on H200
# Usage: sh pd_decode.sh <host> <pd_master_ip>
# NIXL transport is used by default; to use NCCL as the data plane, set LIGHTLLM_PD_KV_TRANSPORT_BACKEND=nccl
export host=$1
export pd_master_ip=$2
nvidia-cuda-mps-control -d
@@ -336,4 +338,4 @@ PD (Prefill-Decode) disaggregation mode deploys the prefill and decode stages separately, which can
--tokenizer_path /path/DeepSeek-R1/ \
--url http://127.0.0.1:8088/generate_stream

Reference versions of all the scripts above can be found in the `test/start_scripts/multi_pd_master/` directory.
4 changes: 3 additions & 1 deletion docs/EN/source/tutorial/deepseek_deployment.rst
@@ -175,6 +175,7 @@ PD (Prefill-Decode) disaggregation mode separates prefill and decode stages for

# PD prefill mode for DeepSeek-R1 (DP+EP) on H200
# Usage: sh pd_prefill.sh <host> <pd_master_ip>
# NIXL is used by default. To use NCCL as the data-plane backend, set LIGHTLLM_PD_KV_TRANSPORT_BACKEND=nccl.
# nvidia-cuda-mps-control -d: run MPS (optional). MPS can improve performance substantially, but enabling it on some GPU and driver combinations can cause errors; upgrading to a recent driver is recommended, especially on H-series cards.

export host=$1
@@ -198,6 +199,7 @@ PD (Prefill-Decode) disaggregation mode separates prefill and decode stages for

# PD decode mode for DeepSeek-R1 (DP+EP) on H200
# Usage: sh pd_decode.sh <host> <pd_master_ip>
# NIXL is used by default. To use NCCL as the data-plane backend, set LIGHTLLM_PD_KV_TRANSPORT_BACKEND=nccl.
export host=$1
export pd_master_ip=$2
nvidia-cuda-mps-control -d
@@ -333,4 +335,4 @@ Supports multiple PD Master nodes, providing better load balancing and high avai
--tokenizer_path /path/DeepSeek-R1/ \
--url http://127.0.0.1:8088/generate_stream

Reference versions of all the scripts above can be found in the `test/start_scripts/multi_pd_master/` directory.
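The script comments say NIXL is the default KV transport and that setting `LIGHTLLM_PD_KV_TRANSPORT_BACKEND=nccl` switches the data plane to NCCL. A minimal Python sketch of that selection rule follows. It is not LightLLM's actual code: the helper name `select_kv_transport_backend` is hypothetical, and lower-casing the value and rejecting anything other than `nixl`/`nccl` are assumptions made for illustration.

```python
import os

# Assumption: only these two values are accepted; the docs name just NIXL (default) and NCCL.
SUPPORTED_BACKENDS = {"nixl", "nccl"}


def select_kv_transport_backend(env=None):
    """Hypothetical helper: return the PD KV transport backend, defaulting to NIXL."""
    env = os.environ if env is None else env
    # Fall back to "nixl" when the variable is unset, as the script comments describe.
    backend = env.get("LIGHTLLM_PD_KV_TRANSPORT_BACKEND", "nixl").strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported KV transport backend: {backend!r}")
    return backend


if __name__ == "__main__":
    print(select_kv_transport_backend())
```

In the launch scripts themselves, the switch amounts to exporting the variable before the server starts, e.g. `export LIGHTLLM_PD_KV_TRANSPORT_BACKEND=nccl`.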
6 changes: 0 additions & 6 deletions lightllm/common/basemodel/basemodel.py
@@ -110,7 +110,6 @@ def __init__(self, kvargs):
        # This may take a lot of GPU memory, so the mem_manager held by req_manager is assigned only after the mem manager has been initialized
        self.req_manager.mem_manager = self.mem_manager

        self._init_kv_move_buffer()
        self._check_mem_size()
        self._init_infer_layer()
        self._init_some_value()
@@ -197,11 +196,6 @@ def _init_mem_manager(self):
        )
        return

    def _init_kv_move_buffer(self):
        # This initialization is only needed in PD-disaggregated (prefill/decode) inference mode
        if self.run_mode in ["prefill", "decode"]:
            self.mem_manager.alloc_kv_move_buffer(self.mem_manager.size)

    def _check_mem_size(self):
        self.max_total_token_num = self.mem_manager.size
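For reference, the run-mode gate performed by the removed `_init_kv_move_buffer` can be reproduced in isolation. In this sketch `FakeMemManager` and the free function `init_kv_move_buffer` are hypothetical stand-ins; only the `run_mode in ["prefill", "decode"]` check and the `alloc_kv_move_buffer(size)` call sized to `mem_manager.size` mirror the deleted code, and `"normal"` is used only as an example of a non-PD mode.

```python
class FakeMemManager:
    """Hypothetical stand-in for the real memory manager."""

    def __init__(self, size):
        self.size = size
        self.kv_move_buffer_size = None  # None means no buffer has been allocated

    def alloc_kv_move_buffer(self, size):
        # Record the requested size instead of allocating GPU memory.
        self.kv_move_buffer_size = size


def init_kv_move_buffer(run_mode, mem_manager):
    # Mirrors the removed method: allocate only for PD-disaggregated prefill/decode nodes.
    if run_mode in ["prefill", "decode"]:
        mem_manager.alloc_kv_move_buffer(mem_manager.size)


if __name__ == "__main__":
    for mode in ["normal", "prefill", "decode"]:
        mm = FakeMemManager(size=1024)
        init_kv_move_buffer(mode, mm)
        print(mode, mm.kv_move_buffer_size)
```

Because the buffer was sized to the full KV cache (`mem_manager.size`), it only existed on PD nodes; other run modes never paid for it.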

138 changes: 0 additions & 138 deletions lightllm/common/basemodel/infer_lock.py

This file was deleted.
