
Commit 566215f

Merge branch 'main' into dependabot/github_actions/docker/login-action-3.7.0

2 parents b2b454c + e2555a0

18 files changed: 4,496 additions & 2,519 deletions

.github/workflows/code_checks.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -28,7 +28,7 @@ jobs:
   run-code-check:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v6.0.1
+      - uses: actions/checkout@v6.0.2
       - name: Install uv
         uses: astral-sh/setup-uv@v7
         with:
```

.github/workflows/docker.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -31,7 +31,7 @@ jobs:
         backend: [vllm, sglang]
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v6.0.1
+        uses: actions/checkout@v6.0.2

       - name: Extract backend version
         id: backend-version
```

.github/workflows/docs.yml

Lines changed: 2 additions & 2 deletions

```diff
@@ -51,7 +51,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout code
-        uses: actions/checkout@v6.0.1
+        uses: actions/checkout@v6.0.2
         with:
           fetch-depth: 0 # Fetch all history for proper versioning

@@ -88,7 +88,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout code
-        uses: actions/checkout@v6.0.1
+        uses: actions/checkout@v6.0.2
         with:
           fetch-depth: 0 # Fetch all history for proper versioning

```

.github/workflows/publish.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -13,7 +13,7 @@ jobs:
           sudo apt-get update
           sudo apt-get install libcurl4-openssl-dev libssl-dev

-      - uses: actions/checkout@v6.0.1
+      - uses: actions/checkout@v6.0.2

       - name: Install uv
         uses: astral-sh/setup-uv@v7
```

.github/workflows/unit_tests.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -43,7 +43,7 @@ jobs:
       matrix:
         python-version: ["3.10", "3.11", "3.12"]
     steps:
-      - uses: actions/checkout@v6.0.1
+      - uses: actions/checkout@v6.0.2

       - name: Install uv
         uses: astral-sh/setup-uv@v7
```
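The five identical `actions/checkout` bumps above are the kind of change Dependabot automates (the branch name in the commit title is a Dependabot branch). A minimal `dependabot.yml` that would produce such PRs might look like this — a sketch under standard GitHub conventions, not this repository's actual configuration:

```yaml
# .github/dependabot.yml — hypothetical sketch, not taken from this repo
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"        # scan workflows under .github/workflows
    schedule:
      interval: "weekly"  # open version-bump PRs (e.g. checkout v6.0.1 -> v6.0.2) weekly
```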

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -17,7 +17,7 @@ repos:
       - id: check-toml

   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: 'v0.14.10'
+    rev: 'v0.14.14'
    hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]
```
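Hook-revision bumps like the `rev` change above are usually generated rather than hand-edited. Assuming a standard pre-commit setup, the tool's built-in updater rewrites each hook's `rev` in place:

```shell
# Run from the repository root (requires pre-commit to be installed).
# Rewrites every `rev:` in .pre-commit-config.yaml to the latest tagged
# release of each hook repo, e.g. ruff-pre-commit v0.14.10 -> v0.14.14.
pre-commit autoupdate
```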

MODEL_TRACKING.md

Lines changed: 4 additions & 2 deletions

```diff
@@ -175,8 +175,9 @@ This document tracks all model weights available in the `/model-weights` directo
 ### Qwen: Qwen3
 | Model | Configuration |
 |:------|:-------------|
-| `Qwen3-14B` ||
+| `Qwen3-0.6B` ||
 | `Qwen3-8B` ||
+| `Qwen3-14B` ||
 | `Qwen3-32B` ||
 | `Qwen3-235B-A22B` ||
 | `Qwen3-Embedding-8B` ||
@@ -233,7 +234,8 @@ This document tracks all model weights available in the `/model-weights` directo
 #### Moonshot AI: Kimi
 | Model | Configuration |
 |:------|:-------------|
-| `Kimi-K2-Instruct` ||
+| `Kimi-K2-Instruct` ||
+| `Kimi-K2.5` ||

 #### Mistral AI: Ministral
 | Model | Configuration |
```

README.md

Lines changed: 5 additions & 5 deletions

````diff
@@ -7,11 +7,11 @@
 [![code checks](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml)
 [![docs](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml)
 [![codecov](https://codecov.io/github/VectorInstitute/vector-inference/branch/main/graph/badge.svg?token=NI88QSIGAC)](https://app.codecov.io/github/VectorInstitute/vector-inference/tree/main)
-[![vLLM](https://img.shields.io/badge/vLLM-0.12.0-blue)](https://docs.vllm.ai/en/v0.12.0/)
-[![SGLang](https://img.shields.io/badge/SGLang-0.5.5.post3-blue)](https://docs.sglang.io/index.html)
+[![vLLM](https://img.shields.io/badge/vLLM-0.15.0-blue)](https://docs.vllm.ai/en/v0.15.0/)
+[![SGLang](https://img.shields.io/badge/SGLang-0.5.8-blue)](https://docs.sglang.io/index.html)
 ![GitHub License](https://img.shields.io/github/license/VectorInstitute/vector-inference)

-This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using open-source inference engines ([vLLM](https://docs.vllm.ai/en/v0.12.0/), [SGLang](https://docs.sglang.io/index.html)). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).
+This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using open-source inference engines ([vLLM](https://docs.vllm.ai/en/v0.15.0/), [SGLang](https://docs.sglang.io/index.html)). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).

 **NOTE**: Supported models on Killarney are tracked [here](./MODEL_TRACKING.md)

@@ -49,7 +49,7 @@ You should see an output like the following:
 * `--account`, `-A`: The Slurm account, this argument can be set to default by setting environment variable `VEC_INF_ACCOUNT`.
 * `--work-dir`, `-D`: A working directory other than your home directory, this argument can be set to default by seeting environment variable `VEC_INF_WORK_DIR`.

-Models that are already supported by `vec-inf` would be launched using the cached configuration (set in [slurm_vars.py](vec_inf/client/slurm_vars.py)) or [default configuration](vec_inf/config/models.yaml). You can override these values by providing additional parameters. Use `vec-inf launch --help` to see the full list of parameters that can be overriden. You can also launch your own custom model as long as the model architecture is supported by the underlying inference engine. For detailed instructions on how to customize your model launch, check out the [`launch` command section in User Guide](https://vectorinstitute.github.io/vector-inference/latest/user_guide/#launch-command)
+Models that are already supported by `vec-inf` would be launched using the cached configuration (set in [slurm_vars.py](vec_inf/client/slurm_vars.py)) or [default configuration](vec_inf/config/models.yaml). You can override these values by providing additional parameters. Use `vec-inf launch --help` to see the full list of parameters that can be overriden. You can also launch your own custom model as long as the model architecture is supported by the underlying inference engine. For detailed instructions on how to customize your model launch, check out the [`launch` command section in User Guide](https://vectorinstitute.github.io/vector-inference/latest/user_guide/#launch-command). During the launch process, relevant log files and scripts will be written to a log directory (default to `.vec-inf-logs` in your home directory), and a cache directory (`.vec-inf-cache`) will be created in your working directory (defaults to your home directory if not specified or required) for torch compile cache.

 #### Other commands

@@ -138,7 +138,7 @@ The example provided above is for the Vector Killarney cluster, change the varia
 If you found Vector Inference useful in your research or applications, please cite using the following BibTeX template:
 ```
 @software{vector_inference,
-  title = {Vector Inference: Efficient LLM inference on Slurm clusters using vLLM},
+  title = {Vector Inference: Efficient LLM inference on Slurm clusters},
   author = {Wang, Marshall},
   organization = {Vector Institute},
   year = {<YEAR_OF_RELEASE>},
````
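The launch flow the README hunk above describes can be illustrated with a hedged example. The flags and environment variable come from the diff itself; the model name and account are placeholders:

```shell
# Hypothetical invocation sketch (requires vec-inf installed on the cluster).
export VEC_INF_ACCOUNT=my-slurm-account   # or pass --account / -A explicitly
vec-inf launch Qwen3-8B                   # a model tracked in MODEL_TRACKING.md
vec-inf launch --help                     # full list of overridable parameters
```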

docs/index.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,6 +1,6 @@
 # Vector Inference: Easy inference on Slurm clusters

-This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using open-source inference engines ([vLLM](https://docs.vllm.ai/en/v0.12.0/), [SGLang](https://docs.sglang.io/index.html)). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).
+This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using open-source inference engines ([vLLM](https://docs.vllm.ai/en/v0.15.0/), [SGLang](https://docs.sglang.io/index.html)). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).


 **NOTE**: Supported models on Killarney are tracked [here](https://github.com/VectorInstitute/vector-inference/blob/main/MODEL_TRACKING.md)
```

docs/user_guide.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -110,6 +110,8 @@ export VEC_INF_MODEL_CONFIG=/h/<username>/my-model-config.yaml

 **NOTE**: There are other parameters that can also be added to the config but not shown in this example, check the [`ModelConfig`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/client/config.py) for details.

+During the launch process, relevant log files and scripts will be written to a log directory (default to `.vec-inf-logs` in your home directory), and a cache directory (`.vec-inf-cache`) will be created in your working directory (defaults to your home directory if not specified or required) for torch compile cache.
+
 ### `batch-launch` command

 The `batch-launch` command allows users to launch multiple inference servers at once, here is an example of launching 2 models:
```
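The directory behaviour the added note describes can be sketched as a tiny helper. This is hypothetical illustration code, not the package's actual implementation (the real path logic lives inside `vec_inf`):

```python
from pathlib import Path

def default_dirs(work_dir=None):
    """Sketch of the documented defaults: logs under ~/.vec-inf-logs,
    torch compile cache under .vec-inf-cache in the working directory
    (falling back to the home directory when no working directory is set)."""
    home = Path.home()
    log_dir = home / ".vec-inf-logs"             # launch log files and scripts
    base = Path(work_dir) if work_dir else home  # --work-dir / VEC_INF_WORK_DIR
    cache_dir = base / ".vec-inf-cache"          # torch compile cache
    return log_dir, cache_dir

logs, cache = default_dirs("/scratch/me")
print(logs.name)   # .vec-inf-logs
print(cache)       # /scratch/me/.vec-inf-cache
```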
