24 changes: 21 additions & 3 deletions docs/user-guide/order-accuracy/dine-in/get-started.md
@@ -8,9 +8,16 @@ This guide walks you through installation, configuration, and first run of the D

- Docker 24.0+ with Compose V2
- Intel GPU with drivers installed
- 16 GB RAM minimum (32 GB recommended)
- 16 GB RAM minimum (64 GB recommended for production)
- 50 GB free disk space

> **Note: KV cache on iGPU / low-RAM systems.** 16 GB RAM is sufficient for **inference**.
> For first-time model export, a higher-memory host (48–64 GB) is recommended.
> On iGPU platforms, the KV cache is allocated from **system RAM**. Set `export CACHE_SIZE=2`
> before running `setup_models.sh` to reduce the KV cache to 2 GB (the default is 4 GB).
> See [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size) for a full per-platform guide.
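
As a rough illustration of the note above, the sketch below suggests a `CACHE_SIZE` from total system RAM before model setup. The `pick_cache_size` and `total_ram_gb` helpers and the 32 GB cut-off are assumptions made for this example, not values taken from the project; the linked README remains the reference for per-platform sizing.

```bash
# Hypothetical helper: suggest a KV cache size (GB) from total system RAM (GB).
# The 32 GB cut-off is an illustrative assumption, not a project default.
pick_cache_size() {
  if [ "$1" -lt 32 ]; then
    echo 2   # low-RAM / iGPU host: shrink the KV cache to 2 GB
  else
    echo 4   # keep the documented default of 4 GB
  fi
}

# Total RAM in whole GB, read from /proc/meminfo (Linux only).
total_ram_gb() {
  awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# Usage (before running setup_models.sh):
#   export CACHE_SIZE="$(pick_cache_size "$(total_ram_gb)")"
```

Keeping the decision in a small function makes the threshold easy to adjust once the per-platform guide has been consulted.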

```bash
docker --version
docker compose version
@@ -154,7 +161,7 @@ Key variables:
| `BENCHMARK_WORKERS` | 1 | Concurrent workers |
| `BENCHMARK_DURATION` | 180 | Duration (seconds) |
| `BENCHMARK_TARGET_LATENCY_MS` | 25000 | Latency threshold (ms) |
| `TARGET_DEVICE` | GPU | Device: CPU, GPU, NPU |
| `TARGET_DEVICE` | GPU | Device: CPU, GPU |

### Stream Density Test

@@ -178,7 +185,7 @@ make plot-metrics # Generate visualisation plots

## Changing Inference Device

To switch between GPU, CPU, or NPU, update `TARGET_DEVICE` in `.env` and re-run model setup:
To switch between GPU and CPU, update `TARGET_DEVICE` in `.env` and re-run model setup:

```bash
# In .env
@@ -231,6 +238,17 @@ make help # All commands

---

## Pre-Deployment Checklist

- [ ] Docker and Docker Compose installed and working
- [ ] Intel GPU drivers installed and GPU visible to Docker
- [ ] Required ports available (7861, 8083, 8002, 8081, 8084)
- [ ] At least 50 GB free disk space
- [ ] **16 GB+ RAM available** (sufficient for inference; for first-time model export 48–64 GB recommended — export on a high-RAM host and copy `ovms-service/models/` to the target system)
- [ ] VLM model downloaded (`setup_models.sh` completed)
- [ ] `.env` file created (`make init-env`)
- [ ] Plate images placed in `images/` and `configs/orders.json` updated
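
The "required ports available" item in the checklist above can be checked with a short script. This is a minimal sketch, assuming the `ss` tool from iproute2 is installed on the host; `REQUIRED_PORTS` and `busy_ports` are hypothetical names introduced for this example.

```bash
# Ports listed in the dine-in pre-deployment checklist.
REQUIRED_PORTS="7861 8083 8002 8081 8084"

# Hypothetical helper: read `ss -ltn`-style output on stdin and print every
# required port that already has a listener (column 4 is "address:port").
busy_ports() {
  awk -v ports="$REQUIRED_PORTS" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    NR > 1 { k = split($4, a, ":"); if (a[k] in want) print a[k] }
  ' | sort -un
}

# Usage on a live host; empty output means all required ports are free:
#   ss -ltn | busy_ports
```

Reading the listener table from stdin keeps the helper independent of the host it runs on, so it can also be pointed at saved `ss` output from the target system.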

## Next Steps

- [System Requirements](./get-started/system-requirements.md) - Check the requirements
@@ -35,7 +35,7 @@ dine-in/
│ └── inventory.json # Known menu items
├── images/ # Test plate images (user-supplied)
├── results/ # Benchmark output
├── docker-compose.yml
├── docker-compose.yaml
├── Dockerfile # python:3.13-slim base
├── Makefile
└── requirements.txt
@@ -8,23 +8,35 @@ Hardware, software, and network requirements for deploying Dine-In Order Accurac

### Development / Single Station

| Component | Requirement |
| --------- | ---------------------------------------------- |
| CPU | 8+ cores |
| RAM | 16 GB |
| GPU | Intel Arc A770 (16 GB) or equivalent Intel GPU |
| Storage | 50 GB SSD |

### Production

| Component | Requirement |
| --------- | ------------------------------------------------- |
| CPU | 16+ cores |
| RAM | 32 GB |
| GPU | Intel Data Center GPU (for concurrent validation) |
| Storage | 200 GB NVMe SSD |

**GPU VRAM guidance:** The Qwen2.5-VL-7B INT8 model requires ~6–8 GB of VRAM.
| Component | Specification |
| --------- | -------------------------------------------------------------------------- |
| CPU | 8+ cores |
| RAM | 16 GB min; 64 GB recommended for production / heavy model export workloads |
| GPU | Intel® Arc™ A770 (16 GB) or equivalent Intel GPU |
| Storage | 50 GB SSD |

### Production / Multi-Station

| Component | Specification |
| --------- | -------------------------------------------------- |
| CPU | 16+ cores |
| RAM | 64 GB |
| GPU | Intel® Data Center GPU (for concurrent validation) |
| Storage | 200 GB NVMe SSD |

**GPU VRAM guidance:** The Qwen2.5-VL-7B INT8 model requires ~8 GB of VRAM.
The default `cache_size=4` reserves an additional 4 GB VRAM for the KV cache. Total VRAM needed
is around 12 GB, which fits in an Intel® Arc™ A770 16 GB. On **integrated GPU** (iGPU)
platforms such as Wildcat Lake and Meteor Lake, the KV cache is drawn from **system RAM**
instead of dedicated VRAM; in such a case, use a smaller value (e.g. `CACHE_SIZE=2`) to avoid
exhausting system RAM. Set `export CACHE_SIZE=<N>` before running `setup_models.sh`. For a
full per-platform sizing table and step-by-step instructions see [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size).
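
The VRAM arithmetic in the guidance above can be written out as a quick check. This is a sketch only: `vram_fits` is a hypothetical helper, it works in whole GB, and it ignores any other VRAM consumers on the card.

```bash
# Hypothetical helper: succeed when model weights plus KV cache fit in VRAM.
# Arguments are whole GB: vram_fits MODEL_GB CACHE_GB VRAM_GB
vram_fits() {
  [ $(( $1 + $2 )) -le "$3" ]
}

# Figures from the guidance above: ~8 GB model, 4 GB default cache, 16 GB Arc A770.
if vram_fits 8 4 16; then
  echo "fits: $(( 8 + 4 )) GB needed of 16 GB"
fi
```

The same check fails for a card with only 8 GB of VRAM, which is the situation where lowering `CACHE_SIZE` becomes relevant.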

> **Model Export RAM Note:** 16 GB system RAM is sufficient for **inference-only**
> deployments. For first-time model export (`setup_models.sh` INT8 quantization), a
> higher-memory host (48–64 GB recommended) avoids potential OOM and corrupt IR files — export
> once there and copy `ovms-service/models/` to the target system. If you must export on 16 GB,
> set `export CACHE_SIZE=2` first. See [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size) for details.

## Software Requirements

@@ -41,7 +53,7 @@ Ubuntu 22.04 LTS is the validated platform (matches the `python:3.13-slim` base

### GPU Drivers

Intel GPU drivers must be installed from <https://dgpu-docs.intel.com/driver/installation.html>. Verify the GPU is accessible to Docker:
Intel GPU drivers must be installed from [packages.intel.com](https://packages.intel.com). Verify the GPU is accessible to Docker:

```bash
ls /dev/dri/
@@ -64,3 +76,11 @@ Expected output includes `GPU`.
| OVMS VLM | 8002 | Model inference (external) |
| Semantic Service | 8081 | Semantic matching (external) |
| Metrics Collector | 8084 | System metrics |

---

## Next Steps

- [Get Started](../get-started.md) - Set up and run the application
- [API Reference](../api-reference.md) - REST endpoint documentation
- [How to Build](./build-from-source.md) - Build from source code
4 changes: 2 additions & 2 deletions docs/user-guide/order-accuracy/dine-in/how-to-use.md
@@ -2,7 +2,7 @@

Guide to using the Dine-In Order Accuracy application features.

> **Note — `TARGET_DEVICE`**: To change the inference device, set `TARGET_DEVICE` in `.env` to `GPU`, `CPU`, or `NPU`, then re-run setup:
> **Note — `TARGET_DEVICE`**: To change the inference device, set `TARGET_DEVICE` in `.env` to `GPU` or `CPU`, then re-run setup:
>
> ```bash
> cd ../ovms-service && ./setup_models.sh --app dine-in && cd ../dine-in
@@ -257,7 +257,7 @@ Configuration options:
| `BENCHMARK_INIT_DURATION` | 60 | Warmup time (seconds) |
| `BENCHMARK_MIN_REQUESTS` | 3 | Min requests before measuring |
| `BENCHMARK_REQUEST_TIMEOUT` | 300 | Request timeout (seconds) |
| `TARGET_DEVICE` | GPU | Target device: CPU, GPU, NPU |
| `TARGET_DEVICE` | GPU | Target device: CPU, GPU |
| `RESULTS_DIR` | results | Output directory |
| `REGISTRY` | false | Use registry images (true/false) |

26 changes: 23 additions & 3 deletions docs/user-guide/order-accuracy/take-away/get-started.md
@@ -25,10 +25,19 @@ For detailed hardware and software requirements, see the [System Requirements](.
| Component | Minimum | Recommended |
| --------- | -------------------- | -------------------- |
| CPU | Intel Xeon 8 cores | Intel Xeon 16+ cores |
| RAM | 16GB | 32GB+ |
| RAM | 16GB | 64GB+ |
| GPU | Intel Arc A770 (8GB) | Intel Arc |
| Storage | 50GB SSD | 200GB NVMe |

> **RAM note:** 16 GB system RAM is sufficient for **inference**. For first-time model
> export (`setup_models.sh`), a higher-memory host (48–64 GB recommended) avoids potential
> out-of-memory (OOM) failures; export there and copy `ovms-service/models/` to the target
> system. 64 GB+ is recommended for production or multi-station deployments.

> **KV Cache on iGPU / low-RAM systems:** On iGPU platforms the KV cache is allocated from
> **system RAM**. Set `export CACHE_SIZE=2` before running `setup_models.sh` to reduce KV cache
> to 2 GB (default is 4 GB). See [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size) for a full per-platform guide.
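
The RAM guidance in the notes above can be summarised as a small classifier. This is a sketch under the thresholds stated there (16 GB for inference, 48 GB and up recommended for model export); `ram_tier` and its labels are hypothetical names introduced for illustration.

```bash
# Hypothetical helper: classify a host by nominal system RAM (whole GB).
# Thresholds follow the notes above: 16 GB for inference, 48 GB+ for export.
ram_tier() {
  if [ "$1" -lt 16 ]; then
    echo "below-minimum"
  elif [ "$1" -lt 48 ]; then
    echo "inference"            # export only with a reduced CACHE_SIZE
  else
    echo "export-recommended"
  fi
}

# Usage:
#   ram_tier 16
#   ram_tier 64
```

Pass the nominal installed RAM; the total reported by the kernel is usually slightly lower than the installed amount.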

### Software Requirements

| Software | Version | Purpose |
@@ -112,7 +121,7 @@ make up
VLM_BACKEND=ovms
OVMS_ENDPOINT=http://ovms-vlm:8000
OVMS_MODEL_NAME=Qwen/Qwen2.5-VL-7B-Instruct
TARGET_DEVICE=GPU # 'GPU', 'CPU', or 'NPU' — also set OPENVINO_DEVICE to match
TARGET_DEVICE=GPU # 'GPU' or 'CPU' — also set OPENVINO_DEVICE to match

# =============================================================================
# Inference Device (must match TARGET_DEVICE)
@@ -135,7 +144,7 @@ MINIO_ROOT_PASSWORD=<your-minio-password>
MINIO_ENDPOINT=minio:9000
```

> **Changing the inference device:** Set both `TARGET_DEVICE` and `OPENVINO_DEVICE` to the same value (`GPU`, `CPU`, or `NPU`), then re-run `./setup_models.sh` to re-export the model for that device.
> **Changing the inference device:** Set both `TARGET_DEVICE` and `OPENVINO_DEVICE` to the same value (`GPU` or `CPU`), then re-run `./setup_models.sh` to re-export the model for that device.
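
Because both variables must agree, a quick consistency check on `.env` before re-running `setup_models.sh` can catch a mismatch early. The sketch below assumes plain `KEY=value` lines with optional inline `#` comments; `env_value` and `devices_match` are hypothetical helpers, not part of the project.

```bash
# Hypothetical helper: print the value of KEY ($1) from an env file ($2),
# dropping inline comments and whitespace. The last definition wins.
env_value() {
  sed -n "s/^$1=\([^#]*\).*/\1/p" "$2" | tr -d ' \t' | tail -n 1
}

# Succeed only when TARGET_DEVICE is GPU or CPU and OPENVINO_DEVICE matches it.
devices_match() {
  target=$(env_value TARGET_DEVICE "$1")
  openvino=$(env_value OPENVINO_DEVICE "$1")
  case "$target" in
    GPU|CPU) ;;
    *) return 1 ;;
  esac
  [ "$target" = "$openvino" ]
}

# Usage:
#   devices_match .env || echo "TARGET_DEVICE and OPENVINO_DEVICE differ" >&2
```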

### Validate Configuration

@@ -312,6 +321,17 @@ make benchmark-stream-density # Run stream density benchmark

---

## Pre-Deployment Checklist

- [ ] Docker and Docker Compose installed and working
- [ ] Intel GPU drivers installed and GPU visible to Docker
- [ ] Required ports available (8000, 7860, 8001, 9000, 9001, 8080)
- [ ] At least 50 GB free disk space
- [ ] **16 GB+ RAM available** (sufficient for inference; for first-time model export 48–64 GB recommended — export on a high-RAM host and copy `ovms-service/models/` to the target system)
- [ ] VLM model downloaded (`setup_models.sh` completed)
- [ ] `.env` file configured
- [ ] Camera RTSP URLs accessible from host (parallel mode)
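
The disk-space item in the checklist above can be verified from `df` output. This is a minimal sketch, assuming POSIX `df -Pk` (sizes in 1024-byte blocks, available space in column 4); `free_gb` and `enough_disk` are hypothetical helpers.

```bash
# Hypothetical helper: read `df -Pk <path>` output on stdin and print the
# available space in whole GB (second line, fourth column, in KB).
free_gb() {
  awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed when the available space read from stdin is at least $1 GB.
enough_disk() {
  [ "$(free_gb)" -ge "$1" ]
}

# Usage, checking for the 50 GB named in the checklist:
#   df -Pk . | enough_disk 50 || echo "less than 50 GB free" >&2
```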

## Next Steps

- [System Requirements](./get-started/system-requirements.md) - Check the detailed requirements
@@ -6,25 +6,37 @@ Hardware, software, and network requirements for deploying Take-Away Order Accur

## Hardware Requirements

### Development / Single-Station
### Development / Single Station

| Component | Specification |
| ----------- | ---------------------------------------------- |
| **CPU** | 8+ cores |
| **RAM** | 16 GB |
| **GPU** | Intel Arc A770 (16 GB) or equivalent Intel GPU |
| **Storage** | 50 GB SSD |
| Component | Specification |
| ----------- | -------------------------------------------------------------------------- |
| **CPU** | 8+ cores |
| **RAM** | 16 GB min; 64 GB recommended for production / heavy model export workloads |
| **GPU**     | Intel® Arc™ A770 (16 GB) or equivalent Intel GPU                           |
| **Storage** | 50 GB SSD |

### Production / Multi-Station

| Component | Specification |
| ----------- | -------------------------------------------------------------- |
| **CPU** | 16+ cores |
| **RAM** | 32 GB |
| **GPU** | Intel Data Center GPU Max (48 GB) — for 4+ concurrent stations |
| **Storage** | 200 GB NVMe SSD |

**GPU VRAM guidance:** The Qwen2.5-VL-7B INT8 model requires ~6–8 GB of VRAM. Reserve at least 8 GB for the VLM; additional VRAM headroom allows more concurrent requests.
| Component | Specification |
| ----------- | --------------------------------------------------------------- |
| **CPU** | 16+ cores |
| **RAM** | 64 GB |
| **GPU** | Intel® Data Center GPU Max (48 GB) — for 4+ concurrent stations |
| **Storage** | 200 GB NVMe SSD |

**GPU VRAM guidance:** The Qwen2.5-VL-7B INT8 model requires ~8 GB of VRAM.
The default `cache_size=4` reserves an additional 4 GB VRAM for the KV cache. Total VRAM needed
is around 12 GB, which fits in an Intel® Arc™ A770 16 GB. On **integrated GPU** (iGPU)
platforms such as Wildcat Lake and Meteor Lake, the KV cache is drawn from **system RAM**
instead of dedicated VRAM; in such a case, use a smaller value (e.g. `CACHE_SIZE=2`) to avoid
exhausting system RAM. Set `export CACHE_SIZE=<N>` before running `setup_models.sh`. For a
full per-platform sizing table and step-by-step instructions see [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size).

> **Model Export RAM Note:** 16 GB system RAM is sufficient for **inference-only**
> deployments. For first-time model export (`setup_models.sh` INT8 quantization), a
> higher-memory host (48–64 GB recommended) avoids potential OOM and corrupt IR files — export
> once there and copy `ovms-service/models/` to the target system. If you must export on 16 GB,
> set `export CACHE_SIZE=2` first. See [ovms-service/README.md — Tuning the KV Cache Size](https://github.com/intel-retail/order-accuracy/blob/main/ovms-service/README.md#tuning-the-kv-cache-size) for details.

---

@@ -77,3 +89,11 @@ Expected output includes `GPU`.
| Codec | H.264 |
| Resolution | 720p–1080p |
| Frame Rate | 15–30 FPS |

---

## Next Steps

- [Get Started](../get-started.md) - Set up and run the application
- [API Reference](../api-reference.md) - REST endpoint documentation
- [How to Build](./build-from-source.md) - Build from source code
@@ -2,7 +2,7 @@

This guide covers performance testing, stream density benchmarking, and metrics collection for the Take-Away Order Accuracy system.

> **Note — Inference Device**: The default device is `GPU`. To switch to a different device (`CPU` or `NPU`), you must do **both** steps below, otherwise the model will be exported for the wrong device:
> **Note — Inference Device:** The default device is `GPU`. To switch to `CPU`, you must do **both** steps below, otherwise the model will be exported for the wrong device:
>
> 1. Set **both** variables in your `.env` file:
>