Support platform server#1143
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the distributed manager to dynamically load platform-specific devices via a registry and introduces server startup and task posting scripts for the hidream_o1_image model. The review feedback highlights several critical improvements: ensuring platform modules are correctly registered upon import, replacing hardcoded absolute paths in shell scripts with dynamic resolution to improve portability, avoiding trailing colons in environment variables (PYTHONPATH and LD_LIBRARY_PATH) to mitigate security risks, and implementing robust retry logic for transient network errors during task polling.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| import lightx2v_platform # noqa: F401 | ||
| from lightx2v_platform.registry_factory import PLATFORM_DEVICE_REGISTER |
There was a problem hiding this comment.
Importing lightx2v_platform directly might not automatically load and register specific platform modules (such as lightx2v_platform.base.cambricon_mlu) unless they are explicitly imported in lightx2v_platform/__init__.py. If they are not imported, PLATFORM_DEVICE_REGISTER will be empty at runtime, causing a RuntimeError. Please ensure that lightx2v_platform/__init__.py automatically imports all available platform modules, or explicitly import them here.
| @@ -0,0 +1,34 @@ | |||
| #!/bin/bash | |||
|
|
|||
| lightx2v_path=/root/yongyang3/LightX2V | |||
There was a problem hiding this comment.
Hardcoding absolute paths like /root/yongyang3/... makes the script non-portable and prone to failure on other environments. Consider dynamically resolving the repository root path relative to the script's location.
| lightx2v_path=/root/yongyang3/LightX2V | |
| lightx2v_path=$(cd "$(dirname "$0")/../.." && pwd) |
| timeout_seconds=1200 | ||
| poll_interval=2.0 | ||
|
|
||
| export PYTHONPATH="${lightx2v_path}:${PYTHONPATH:-}" |
There was a problem hiding this comment.
When PYTHONPATH is empty or unset, appending :${PYTHONPATH:-} results in a trailing colon (e.g., /path/to/lightx2v:). In Python, a trailing colon in PYTHONPATH implicitly adds the current working directory (.) to the module search path, which can lead to unexpected import behaviors or security risks. Use the ${VAR:+:${VAR}} syntax to safely append only when the variable is non-empty.
| export PYTHONPATH="${lightx2v_path}:${PYTHONPATH:-}" | |
| export PYTHONPATH="${lightx2v_path}${PYTHONPATH:+:${PYTHONPATH}}" |
| lightx2v_path=/root/yongyang3/LightX2V | ||
| test_json=${TEST_JSON:-/root/test.json} |
There was a problem hiding this comment.
Hardcoding absolute paths like /root/yongyang3/... and /root/test.json makes the script non-portable. Consider dynamically resolving the repository root path relative to the script's location, and using a relative or configurable path for the test JSON.
| lightx2v_path=/root/yongyang3/LightX2V | |
| test_json=${TEST_JSON:-/root/test.json} | |
| lightx2v_path=$(cd "$(dirname "$0")/../.." && pwd) | |
| test_json=${TEST_JSON:-${lightx2v_path}/test.json} |
| port=${PORT:-8000} | ||
| server_url=${SERVER_URL:-http://127.0.0.1:${port}} | ||
|
|
||
| export PYTHONPATH="${lightx2v_path}:${PYTHONPATH:-}" |
There was a problem hiding this comment.
When PYTHONPATH is empty or unset, appending :${PYTHONPATH:-} results in a trailing colon. This implicitly adds the current working directory (.) to the module search path, which can lead to unexpected import behaviors or security risks. Use the ${VAR:+:${VAR}} syntax to safely append only when the variable is non-empty.
| export PYTHONPATH="${lightx2v_path}:${PYTHONPATH:-}" | |
| export PYTHONPATH="${lightx2v_path}${PYTHONPATH:+:${PYTHONPATH}}" |
| while time.time() < deadline: | ||
| response = requests.get(status_url, timeout=15) | ||
| if response.status_code != 200: | ||
| raise RuntimeError(f"Get task status failed ({response.status_code}): {response.text}") | ||
|
|
||
| status = response.json() | ||
| task_status = status.get("status") | ||
| print(f"[poll] task_id={task_id}, status={task_status}") | ||
|
|
||
| if task_status == "completed": | ||
| return status | ||
| if task_status in ("failed", "cancelled"): | ||
| raise RuntimeError(f"Task ended with status={task_status}, detail={status.get('error')}") | ||
|
|
||
| time.sleep(poll_interval) |
There was a problem hiding this comment.
During long-running polling operations, transient network issues or temporary server restarts can cause requests.get to raise a RequestException or return a non-200 status code. Crashing immediately with an unhandled exception or RuntimeError will fail the entire batch run. It is much more robust to catch requests.exceptions.RequestException, log a warning, and retry after the poll interval.
while time.time() < deadline:
try:
response = requests.get(status_url, timeout=15)
if response.status_code != 200:
print(f"[poll] Get task status failed ({response.status_code}): {response.text}. Retrying...")
time.sleep(poll_interval)
continue
status = response.json()
except requests.exceptions.RequestException as e:
print(f"[poll] Connection error: {e}. Retrying...")
time.sleep(poll_interval)
continue
task_status = status.get("status")
print(f"[poll] task_id={task_id}, status={task_status}")
if task_status == "completed":
return status
if task_status in ("failed", "cancelled"):
raise RuntimeError(f"Task ended with status={task_status}, detail={status.get('error')}")
time.sleep(poll_interval)| lightx2v_path=/root/yongyang3/LightX2V | ||
| model_path=/root/wushuo/models/HiDream-ai/HiDream-O1-Image-Dev-2604 | ||
| config_json=/root/yongyang3/LightX2V/configs/hidream_o1_image/mlu/hidream_o1_image_t2i_dev_2604_dist.json |
There was a problem hiding this comment.
Hardcoding absolute paths like /root/yongyang3/... and /root/wushuo/... makes the script non-portable and prone to failure on other environments or for non-root users. Consider dynamically resolving the repository root path relative to the script's location, and allowing overrides via environment variables.
| lightx2v_path=/root/yongyang3/LightX2V | |
| model_path=/root/wushuo/models/HiDream-ai/HiDream-O1-Image-Dev-2604 | |
| config_json=/root/yongyang3/LightX2V/configs/hidream_o1_image/mlu/hidream_o1_image_t2i_dev_2604_dist.json | |
| lightx2v_path=$(cd "$(dirname "$0")/../.." && pwd) | |
| model_path=${MODEL_PATH:-/root/wushuo/models/HiDream-ai/HiDream-O1-Image-Dev-2604} | |
| config_json=${CONFIG_JSON:-${lightx2v_path}/configs/hidream_o1_image/mlu/hidream_o1_image_t2i_dev_2604_dist.json} |
| export PLATFORM=cambricon_mlu | ||
| export MLU_VISIBLE_DEVICES=0,1,2,3 | ||
| export PYTORCH_MLU_ALLOC_CONF=expandable_segments:True | ||
| export LD_LIBRARY_PATH=/usr/local/neuware/lib64:${LD_LIBRARY_PATH} |
There was a problem hiding this comment.
If LD_LIBRARY_PATH is empty or unset, appending :${LD_LIBRARY_PATH} results in a trailing colon (e.g., /usr/local/neuware/lib64:). In Linux, a trailing colon in LD_LIBRARY_PATH implicitly includes the current directory (.) in the shared library search path, which is a security risk. Use the ${VAR:+:${VAR}} syntax to safely append only when the variable is non-empty.
| export LD_LIBRARY_PATH=/usr/local/neuware/lib64:${LD_LIBRARY_PATH} | |
| export LD_LIBRARY_PATH="/usr/local/neuware/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" |
No description provided.