From c4d71232770f7c7b57e255196b4010480226eeed Mon Sep 17 00:00:00 2001 From: Wiktor Iwaszko Date: Tue, 16 Jun 2026 13:31:28 +0200 Subject: [PATCH 1/2] [DOCS] Update docs to sync with OA repo --- .../order-accuracy/dine-in/get-started.md | 13 ++++- .../dine-in/get-started/build-from-source.md | 2 +- .../get-started/system-requirements.md | 56 +++++++++++++------ .../order-accuracy/dine-in/how-to-use.md | 4 +- .../order-accuracy/take-away/get-started.md | 15 ++++- .../get-started/system-requirements.md | 50 ++++++++++++----- .../take-away/ta-benchmarking.md | 2 +- 7 files changed, 99 insertions(+), 43 deletions(-) diff --git a/docs/user-guide/order-accuracy/dine-in/get-started.md b/docs/user-guide/order-accuracy/dine-in/get-started.md index 9a8a75a..2b6c19f 100644 --- a/docs/user-guide/order-accuracy/dine-in/get-started.md +++ b/docs/user-guide/order-accuracy/dine-in/get-started.md @@ -8,9 +8,16 @@ This guide walks you through installation, configuration, and first run of the D - Docker 24.0+ with Compose V2 - Intel GPU with drivers installed -- 16 GB RAM minimum (32 GB recommended) +- 16 GB RAM minimum (64 GB recommended for production) - 50 GB free disk space +> **Notes:** +> **KV Cache on iGPU / low-RAM systems:** 16 GB RAM is sufficient for **inference**. +> For first-time model export, a higher-memory host (48–64 GB) is recommended. +> On iGPU platforms, the KV cache is allocated from **system RAM** — set `export CACHE_SIZE=2` +> before running `setup_models.sh` to reduce KV cache to 2 GB (default is 4 GB). +> See [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size) for a full per-platform guide. 
+ ```bash docker --version docker compose version @@ -154,7 +161,7 @@ Key variables: | `BENCHMARK_WORKERS` | 1 | Concurrent workers | | `BENCHMARK_DURATION` | 180 | Duration (seconds) | | `BENCHMARK_TARGET_LATENCY_MS` | 25000 | Latency threshold (ms) | -| `TARGET_DEVICE` | GPU | Device: CPU, GPU, NPU | +| `TARGET_DEVICE` | GPU | Device: CPU, GPU | ### Stream Density Test @@ -178,7 +185,7 @@ make plot-metrics # Generate visualisation plots ## Changing Inference Device -To switch between GPU, CPU, or NPU, update `TARGET_DEVICE` in `.env` and re-run model setup: +To switch between GPU and CPU, update `TARGET_DEVICE` in `.env` and re-run model setup: ```bash # In .env diff --git a/docs/user-guide/order-accuracy/dine-in/get-started/build-from-source.md b/docs/user-guide/order-accuracy/dine-in/get-started/build-from-source.md index 8f97552..b981be5 100644 --- a/docs/user-guide/order-accuracy/dine-in/get-started/build-from-source.md +++ b/docs/user-guide/order-accuracy/dine-in/get-started/build-from-source.md @@ -35,7 +35,7 @@ dine-in/ │ └── inventory.json # Known menu items ├── images/ # Test plate images (user-supplied) ├── results/ # Benchmark output -├── docker-compose.yml +├── docker-compose.yaml ├── Dockerfile # python:3.13-slim base ├── Makefile └── requirements.txt diff --git a/docs/user-guide/order-accuracy/dine-in/get-started/system-requirements.md b/docs/user-guide/order-accuracy/dine-in/get-started/system-requirements.md index 3f193eb..2a16316 100644 --- a/docs/user-guide/order-accuracy/dine-in/get-started/system-requirements.md +++ b/docs/user-guide/order-accuracy/dine-in/get-started/system-requirements.md @@ -8,23 +8,35 @@ Hardware, software, and network requirements for deploying Dine-In Order Accurac ### Development / Single Station -| Component | Requirement | -| --------- | ---------------------------------------------- | -| CPU | 8+ cores | -| RAM | 16 GB | -| GPU | Intel Arc A770 (16 GB) or equivalent Intel GPU | -| Storage | 50 GB SSD | - -### 
Production - -| Component | Requirement | -| --------- | ------------------------------------------------- | -| CPU | 16+ cores | -| RAM | 32 GB | -| GPU | Intel Data Center GPU (for concurrent validation) | -| Storage | 200 GB NVMe SSD | - -**GPU VRAM guidance:** The Qwen2.5-VL-7B INT8 model requires ~6–8 GB of VRAM. +| Component | Specification | +| --------- | -------------------------------------------------------------------------- | +| CPU | 8+ cores | +| RAM | 16 GB min; 64 GB recommended for production / heavy model export workloads | +| GPU | Intel® Arc™ A770 (16 GB) or equivalent Intel GPU | +| Storage | 50 GB SSD | + +### Production / Multi-Station + +| Component | Specification | +| --------- | -------------------------------------------------- | +| CPU | 16+ cores | +| RAM | 64 GB | +| GPU | Intel® Data Center GPU (for concurrent validation) | +| Storage | 200 GB NVMe SSD | + +**GPU VRAM guidance:** The Qwen2.5-VL-7B INT8 model requires ~8 GB of VRAM. +The default `cache_size=4` reserves an additional 4 GB VRAM for the KV cache. Total VRAM needed +is around 12 GB, which fits in an Intel® Arc™ A770 16 GB. On **integrated GPU** (iGPU) +platforms such as Wildcat Lake and Meteor Lake, the KV cache is drawn from **system RAM** +instead of dedicated VRAM; in such a case, use a smaller value (e.g. `CACHE_SIZE=2`) to avoid +exhausting system RAM. Set `export CACHE_SIZE=` before running `setup_models.sh`. For a +full per-platform sizing table and step-by-step instructions see [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size). + +> **Model Export RAM Note:** 16 GB system RAM is sufficient for **inference-only** +> deployments. For first-time model export (`setup_models.sh` INT8 quantization), a +> higher-memory host (48–64 GB recommended) avoids potential OOM and corrupt IR files — export +> once there and copy `ovms-service/models/` to the target system. 
If you must export on 16 GB, +> set `export CACHE_SIZE=2` first. See [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size) for details. ## Software Requirements @@ -41,7 +53,7 @@ Ubuntu 22.04 LTS is the validated platform (matches the `python:3.13-slim` base ### GPU Drivers -Intel GPU drivers must be installed from . Verify the GPU is accessible to Docker: +Intel GPU drivers must be installed from [packages.intel.com](https://packages.intel.com). Verify the GPU is accessible to Docker: ```bash ls /dev/dri/ @@ -64,3 +76,11 @@ Expected output includes `GPU`. | OVMS VLM | 8002 | Model inference (external) | | Semantic Service | 8081 | Semantic matching (external) | | Metrics Collector | 8084 | System metrics | + +--- + +## Next Steps + +- [Get Started](../get-started.md) - Set up and run the application +- [API Reference](../api-reference.md) - REST endpoint documentation +- [How to Build](./build-from-source.md) - Build from source code diff --git a/docs/user-guide/order-accuracy/dine-in/how-to-use.md b/docs/user-guide/order-accuracy/dine-in/how-to-use.md index 25ca6e3..422eb5f 100644 --- a/docs/user-guide/order-accuracy/dine-in/how-to-use.md +++ b/docs/user-guide/order-accuracy/dine-in/how-to-use.md @@ -2,7 +2,7 @@ Guide to using the Dine-In Order Accuracy application features. 
-> **Note — `TARGET_DEVICE`**: To change the inference device, set `TARGET_DEVICE` in `.env` to `GPU`, `CPU`, or `NPU`, then re-run setup: +> **Note — `TARGET_DEVICE`**: To change the inference device, set `TARGET_DEVICE` in `.env` to `GPU` or `CPU`, then re-run setup: > > ```bash > cd ../ovms-service && ./setup_models.sh --app dine-in && cd ../dine-in @@ -257,7 +257,7 @@ Configuration options: | `BENCHMARK_INIT_DURATION` | 60 | Warmup time (seconds) | | `BENCHMARK_MIN_REQUESTS` | 3 | Min requests before measuring | | `BENCHMARK_REQUEST_TIMEOUT` | 300 | Request timeout (seconds) | -| `TARGET_DEVICE` | GPU | Target device: CPU, GPU, NPU | +| `TARGET_DEVICE` | GPU | Target device: CPU, GPU | | `RESULTS_DIR` | results | Output directory | | `REGISTRY` | false | Use registry images (true/false) | diff --git a/docs/user-guide/order-accuracy/take-away/get-started.md b/docs/user-guide/order-accuracy/take-away/get-started.md index 6051177..b537887 100644 --- a/docs/user-guide/order-accuracy/take-away/get-started.md +++ b/docs/user-guide/order-accuracy/take-away/get-started.md @@ -25,10 +25,19 @@ For detailed hardware and software requirements, see the [System Requirements](. | Component | Minimum | Recommended | | --------- | -------------------- | -------------------- | | CPU | Intel Xeon 8 cores | Intel Xeon 16+ cores | -| RAM | 16GB | 32GB+ | +| RAM | 16GB | 64GB+ | | GPU | Intel Arc A770 (8GB) | Intel Arc | | Storage | 50GB SSD | 200GB NVMe | +> **RAM note:** 16 GB system RAM is sufficient for **inference**. For first-time model +> export (`setup_models.sh`), a higher-memory host (48–64 GB recommended) avoids potential OOM +> — export there and copy `ovms-service/models/` to the target system. 64 GB+ is recommended +> for production or multi-station deployments. + +> **KV Cache on iGPU / low-RAM systems:** On iGPU platforms the KV cache is allocated from +> **system RAM**.
Set `export CACHE_SIZE=2` before running `setup_models.sh` to reduce KV cache +> to 2 GB (default is 4 GB). See [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size) for a full per-platform guide. + ### Software Requirements | Software | Version | Purpose | @@ -112,7 +121,7 @@ make up VLM_BACKEND=ovms OVMS_ENDPOINT=http://ovms-vlm:8000 OVMS_MODEL_NAME=Qwen/Qwen2.5-VL-7B-Instruct -TARGET_DEVICE=GPU # 'GPU', 'CPU', or 'NPU' — also set OPENVINO_DEVICE to match +TARGET_DEVICE=GPU # 'GPU' or 'CPU' — also set OPENVINO_DEVICE to match # ============================================================================= # Inference Device (must match TARGET_DEVICE) @@ -135,7 +144,7 @@ MINIO_ROOT_PASSWORD= MINIO_ENDPOINT=minio:9000 ``` -> **Changing the inference device:** Set both `TARGET_DEVICE` and `OPENVINO_DEVICE` to the same value (`GPU`, `CPU`, or `NPU`), then re-run `./setup_models.sh` to re-export the model for that device. +> **Changing the inference device:** Set both `TARGET_DEVICE` and `OPENVINO_DEVICE` to the same value (`GPU` or `CPU`), then re-run `./setup_models.sh` to re-export the model for that device. 
### Validate Configuration diff --git a/docs/user-guide/order-accuracy/take-away/get-started/system-requirements.md b/docs/user-guide/order-accuracy/take-away/get-started/system-requirements.md index 8e9b0c5..c008bba 100644 --- a/docs/user-guide/order-accuracy/take-away/get-started/system-requirements.md +++ b/docs/user-guide/order-accuracy/take-away/get-started/system-requirements.md @@ -6,25 +6,37 @@ Hardware, software, and network requirements for deploying Take-Away Order Accur ## Hardware Requirements -### Development / Single-Station +### Development / Single Station -| Component | Specification | -| ----------- | ---------------------------------------------- | -| **CPU** | 8+ cores | -| **RAM** | 16 GB | -| **GPU** | Intel Arc A770 (16 GB) or equivalent Intel GPU | -| **Storage** | 50 GB SSD | +| Component | Specification | +| ----------- | -------------------------------------------------------------------------- | +| **CPU** | 8+ cores | +| **RAM** | 16 GB min; 64 GB recommended for production / heavy model export workloads | +| **GPU** | Intel® Arc™ A770 (16 GB) or equivalent Intel GPU | +| **Storage** | 50 GB SSD | ### Production / Multi-Station -| Component | Specification | -| ----------- | -------------------------------------------------------------- | -| **CPU** | 16+ cores | -| **RAM** | 32 GB | -| **GPU** | Intel Data Center GPU Max (48 GB) — for 4+ concurrent stations | -| **Storage** | 200 GB NVMe SSD | - -**GPU VRAM guidance:** The Qwen2.5-VL-7B INT8 model requires ~6–8 GB of VRAM. Reserve at least 8 GB for the VLM; additional VRAM headroom allows more concurrent requests. +| Component | Specification | +| ----------- | --------------------------------------------------------------- | +| **CPU** | 16+ cores | +| **RAM** | 64 GB | +| **GPU** | Intel® Data Center GPU Max (48 GB) — for 4+ concurrent stations | +| **Storage** | 200 GB NVMe SSD | + +**GPU VRAM guidance:** The Qwen2.5-VL-7B INT8 model requires ~8 GB of VRAM. 
+The default `cache_size=4` reserves an additional 4 GB VRAM for the KV cache. Total VRAM needed +is around 12 GB, which fits in an Intel® Arc™ A770 16 GB. On **integrated GPU** (iGPU) +platforms such as Wildcat Lake and Meteor Lake, the KV cache is drawn from **system RAM** +instead of dedicated VRAM; in such a case, use a smaller value (e.g. `CACHE_SIZE=2`) to avoid +exhausting system RAM. Set `export CACHE_SIZE=` before running `setup_models.sh`. For a +full per-platform sizing table and step-by-step instructions see [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size). + +> **Model Export RAM Note:** 16 GB system RAM is sufficient for **inference-only** +> deployments. For first-time model export (`setup_models.sh` INT8 quantization), a +> higher-memory host (48–64 GB recommended) avoids potential OOM and corrupt IR files — export +> once there and copy `ovms-service/models/` to the target system. If you must export on 16 GB, +> set `export CACHE_SIZE=2` first. See [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size) for details. --- @@ -77,3 +89,11 @@ Expected output includes `GPU`. 
| Codec | H.264 | | Resolution | 720p–1080p | | Frame Rate | 15–30 FPS | + +--- + +## Next Steps + +- [Get Started](../get-started.md) - Set up and run the application +- [API Reference](../api-reference.md) - REST endpoint documentation +- [How to Build](./build-from-source.md) - Build from source code diff --git a/docs/user-guide/order-accuracy/take-away/ta-benchmarking.md b/docs/user-guide/order-accuracy/take-away/ta-benchmarking.md index 87a3f36..6cf9a2c 100644 --- a/docs/user-guide/order-accuracy/take-away/ta-benchmarking.md +++ b/docs/user-guide/order-accuracy/take-away/ta-benchmarking.md @@ -2,7 +2,7 @@ This guide covers performance testing, stream density benchmarking, and metrics collection for the Take-Away Order Accuracy system. -> **Note — Inference Device**: The default device is `GPU`. To switch to a different device (`CPU` or `NPU`), you must do **both** steps below, otherwise the model will be exported for the wrong device: +> **Note — Inference Device:** The default device is `GPU`. To switch to `CPU`, you must do **both** steps below, otherwise the model will be exported for the wrong device: > > 1. 
Set **both** variables in your `.env` file: > From 2235470dc5ba28001c86582777db791f526d8a40 Mon Sep 17 00:00:00 2001 From: Wiktor Iwaszko Date: Wed, 17 Jun 2026 07:22:04 +0200 Subject: [PATCH 2/2] Add pre-deployment checklist from arch-and-sys-req to GS --- docs/user-guide/order-accuracy/dine-in/get-started.md | 11 +++++++++++ .../order-accuracy/take-away/get-started.md | 11 +++++++++++ 2 files changed, 22 insertions(+) diff --git a/docs/user-guide/order-accuracy/dine-in/get-started.md b/docs/user-guide/order-accuracy/dine-in/get-started.md index 2b6c19f..1061a85 100644 --- a/docs/user-guide/order-accuracy/dine-in/get-started.md +++ b/docs/user-guide/order-accuracy/dine-in/get-started.md @@ -238,6 +238,17 @@ make help # All commands --- +## Pre-Deployment Checklist + +- [ ] Docker and Docker Compose installed and working +- [ ] Intel GPU drivers installed and GPU visible to Docker +- [ ] Required ports available (7861, 8083, 8002, 8081, 8084) +- [ ] At least 50 GB free disk space +- [ ] **16 GB+ RAM available** (sufficient for inference; for first-time model export 48–64 GB recommended — export on a high-RAM host and copy `ovms-service/models/` to the target system) +- [ ] VLM model downloaded (`setup_models.sh` completed) +- [ ] `.env` file created (`make init-env`) +- [ ] Plate images placed in `images/` and `configs/orders.json` updated + ## Next Steps - [System Requirements](./get-started/system-requirements.md) - Check the requirements diff --git a/docs/user-guide/order-accuracy/take-away/get-started.md b/docs/user-guide/order-accuracy/take-away/get-started.md index b537887..e584943 100644 --- a/docs/user-guide/order-accuracy/take-away/get-started.md +++ b/docs/user-guide/order-accuracy/take-away/get-started.md @@ -321,6 +321,17 @@ make benchmark-stream-density # Run stream density benchmark --- +## Pre-Deployment Checklist + +- [ ] Docker and Docker Compose installed and working +- [ ] Intel GPU drivers installed and GPU visible to Docker +- [ ] Required 
ports available (8000, 7860, 8001, 9000, 9001, 8080) +- [ ] At least 50 GB free disk space +- [ ] **16 GB+ RAM available** (sufficient for inference; for first-time model export 48–64 GB recommended — export on a high-RAM host and copy `ovms-service/models/` to the target system) +- [ ] VLM model downloaded (`setup_models.sh` completed) +- [ ] `.env` file configured +- [ ] Camera RTSP URLs accessible from host (parallel mode) + ## Next Steps - [System Requirements](./get-started/system-requirements.md) - Check the detailed requirements