 [](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml)
 [](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml)
 [](https://app.codecov.io/github/VectorInstitute/vector-inference/tree/main)
-[](https://docs.vllm.ai/en/v0.12.0/)
-[](https://docs.sglang.io/index.html)
+[](https://docs.vllm.ai/en/v0.15.0/)
+[](https://docs.sglang.io/index.html)
 
 
-This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using open-source inference engines ([vLLM](https://docs.vllm.ai/en/v0.12.0/), [SGLang](https://docs.sglang.io/index.html)). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).
+This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using open-source inference engines ([vLLM](https://docs.vllm.ai/en/v0.15.0/), [SGLang](https://docs.sglang.io/index.html)). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).
 
 **NOTE**: Supported models on Killarney are tracked [here](./MODEL_TRACKING.md).
 
@@ -49,7 +49,7 @@ You should see an output like the following: |
 * `--account`, `-A`: The Slurm account; this argument can be given a default by setting the environment variable `VEC_INF_ACCOUNT`.
 * `--work-dir`, `-D`: A working directory other than your home directory; this argument can be given a default by setting the environment variable `VEC_INF_WORK_DIR`.
 
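Both flags can be preset through the environment variables named above. A minimal sketch (the account name and work directory below are placeholders, not real values from this cluster):

```shell
# Preset defaults so later `vec-inf launch ...` invocations can omit the flags.
# "my-slurm-account" and the path are placeholders -- substitute your own.
export VEC_INF_ACCOUNT=my-slurm-account
export VEC_INF_WORK_DIR="$HOME/scratch/vec-inf"

# With these set, a plain `vec-inf launch <model>` behaves like:
#   vec-inf launch <model> -A my-slurm-account -D "$HOME/scratch/vec-inf"
echo "$VEC_INF_ACCOUNT"
```

Putting the two `export` lines in your shell profile keeps the defaults across sessions.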
-Models that are already supported by `vec-inf` would be launched using the cached configuration (set in [slurm_vars.py](vec_inf/client/slurm_vars.py)) or [default configuration](vec_inf/config/models.yaml). You can override these values by providing additional parameters. Use `vec-inf launch --help` to see the full list of parameters that can be overriden. You can also launch your own custom model as long as the model architecture is supported by the underlying inference engine. For detailed instructions on how to customize your model launch, check out the [`launch` command section in User Guide](https://vectorinstitute.github.io/vector-inference/latest/user_guide/#launch-command)
+Models that are already supported by `vec-inf` will be launched using the cached configuration (set in [slurm_vars.py](vec_inf/client/slurm_vars.py)) or the [default configuration](vec_inf/config/models.yaml). You can override these values by providing additional parameters; use `vec-inf launch --help` to see the full list of parameters that can be overridden. You can also launch your own custom model as long as the model architecture is supported by the underlying inference engine. For detailed instructions on how to customize your model launch, check out the [`launch` command section in the User Guide](https://vectorinstitute.github.io/vector-inference/latest/user_guide/#launch-command). During the launch process, relevant log files and scripts are written to a log directory (defaulting to `.vec-inf-logs` in your home directory), and a cache directory (`.vec-inf-cache`) is created in your working directory (which defaults to your home directory if not specified) for the torch compile cache.
 
 #### Other commands
 
|