31 changes: 31 additions & 0 deletions content/patterns/rag-quickstart/_index.adoc
@@ -0,0 +1,31 @@
---
title: RAG AI Quickstart
date: 2026-05-13
tier: sandbox
summary: This pattern deploys the RAG AI Quickstart with test pipelines on CPU or GPU.
rh_products:
- Red Hat OpenShift Container Platform
- Red Hat OpenShift GitOps
- Red Hat OpenShift AI
industries:
- General
aliases: /rag-quickstart/
links:
github: https://github.com/validatedpatterns-sandbox/ai-quickstart-rag
install: getting-started
bugs: https://github.com/validatedpatterns-sandbox/ai-quickstart-rag/issues
feedback: https://docs.google.com/forms/d/e/1FAIpQLScI76b6tD1WyPu2-d_9CCVDr3Fu5jYERthqLKJDUGwqBg7Vcg/viewform
---
:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]

include::modules/rag-quickstart-about.adoc[leveloffset=+1]

include::modules/rag-quickstart-architecture.adoc[leveloffset=+1]

[id="next-steps-rag-quickstart"]
== Next steps

* link:rag-quickstart-getting-started[Install this pattern.]
13 changes: 13 additions & 0 deletions content/patterns/rag-quickstart/cluster-sizing.adoc
@@ -0,0 +1,13 @@
---
title: Cluster sizing
weight: 30
aliases: /rag-quickstart/cluster-sizing/
---

:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]
include::modules/ai-quickstart-rag/metadata-ai-quickstart-rag.adoc[]

include::modules/cluster-sizing-template.adoc[]
126 changes: 126 additions & 0 deletions content/patterns/rag-quickstart/customizing-this-pattern.adoc
@@ -0,0 +1,126 @@
---
title: Customizing this pattern
weight: 20
aliases: /rag-quickstart/customizing/
---

:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]

[id="customizing-rag-quickstart"]
== Customizing the RAG AI Quickstart pattern

Without any changes, this pattern runs a CPU-backed LLM and does not require a GPU. This limits both the range of usable models and inference speed, so you might want to use a GPU instead.

[id="enabling-gpu"]
=== Enabling GPU support

To enable GPU support, set `global.device` to `gpu` in `values-global.yaml` and push your changes to GitHub. This adds NFD and the NVIDIA GPU Operator to the pattern installation and enables the models to run using an NVIDIA accelerator.

[NOTE]
====
If you are running this pattern on an OpenShift cluster on AWS, setting `global.device` to `gpu` automatically creates a GPU machine (`g6.2xlarge`) and adds it as a worker node to your cluster.
====
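For example, the relevant change in `values-global.yaml` looks like this (a minimal sketch; any other keys under `global` in your file stay unchanged):

[source,yaml]
----
# values-global.yaml (excerpt)
global:
  device: gpu  # default is cpu
----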

[id="changing-models"]
=== Changing models

To update the models, edit `overrides/values-cpu.yaml` (if `global.device` is set to `cpu`) or `overrides/values-gpu.yaml` (if set to `gpu`).

The default CPU-based model is defined as follows:

[source,yaml]
----
global:
  models:
    llama-3-2-3b-instruct-cpu:
      id: meta-llama/Llama-3.2-3B-Instruct
      enabled: true
      resources:
        limits:
          cpu: "6"
          memory: 48Gi
        requests:
          cpu: "2"
          memory: 24Gi
      args:
        - --enable-auto-tool-choice
        - --chat-template
        - /chat-templates/tool_chat_template_llama3.2_json.jinja
        - --tool-call-parser
        - llama3_json
        - --dtype
        - auto
        - --max-model-len
        - "16384"
        - --max-num-seqs
        - "1"
----

You can change this to any vLLM-compatible model whose terms and conditions you have accepted with the HuggingFace account that your API token belongs to. You can also adjust the resource parameters as needed for your environment.
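For example, swapping in a different model only requires replacing the entry under `global.models` (a hypothetical sketch: the model name, `id`, resource values, and arguments here are illustrative only; verify the requirements of any model you choose before deploying):

[source,yaml]
----
global:
  models:
    qwen-2-5-7b-instruct:            # illustrative model, not a tested default
      id: Qwen/Qwen2.5-7B-Instruct
      enabled: true
      resources:
        limits:
          cpu: "8"
          memory: 64Gi
        requests:
          cpu: "4"
          memory: 32Gi
      args:
        - --dtype
        - auto
        - --max-model-len
        - "8192"
----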

The runtime defaults to `vllm/vllm-openai:v0.11.1`. If you need a later version, you can override the image:

[source,yaml]
----
llm-service:
  deviceConfigs:
    gpu:
      image: vllm/vllm-openai:nightly
----

[NOTE]
====
The example above sets a GPU-specific container image. To override the CPU-based image instead, use the key `llm-service.deviceConfigs.cpu.image`.
====

[id="multiple-models"]
=== Defining multiple models

You can define multiple LLM models to be served simultaneously. For example:

[source,yaml]
----
global:
  models:
    deepseek-r1:
      id: Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ
      enabled: true
      resources:
        limits:
          cpu: "32"
          memory: 200Gi
        requests:
          cpu: "24"
          memory: 150Gi
      args:
        - --reasoning-parser
        - deepseek_r1
        - --tool-call-parser
        - llama3_json
        - --enable-auto-tool-choice
        - --quantization
        - awq_marlin
        - --dtype
        - float16
        - --max-model-len
        - "65536"
    gpt-oss-120b:
      id: openai/gpt-oss-120b
      enabled: true
      resources:
        limits:
          cpu: "32"
          memory: 200Gi
        requests:
          cpu: "24"
          memory: 150Gi
      args:
        - --tool-call-parser
        - openai
        - --enable-auto-tool-choice
----

For a complete list of customizable values, see the link:https://github.com/rh-ai-quickstart/ai-architecture-charts[AI Architecture charts] repository.
167 changes: 167 additions & 0 deletions content/patterns/rag-quickstart/getting-started.adoc
@@ -0,0 +1,167 @@
---
title: Getting started
weight: 10
aliases: /rag-quickstart/getting-started/
---

:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]

[id="deploying-rag-quickstart-pattern"]
== Deploying the RAG AI Quickstart pattern

.Prerequisites

* An OpenShift cluster (version 4.18 or later)
** To create an OpenShift cluster, go to the https://console.redhat.com/[Red Hat Hybrid Cloud console].
** Select *OpenShift \-> Red Hat OpenShift Container Platform \-> Create cluster*.
* A https://huggingface.co/[HuggingFace] account with an API token that has read permissions.
** You must accept the terms and conditions for the https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct[meta-llama/Llama-3.2-3B-Instruct] model with the account that the API token belongs to.
* The Helm binary. For instructions, see link:https://helm.sh/docs/intro/install/[Installing Helm].
* Additional installation tool dependencies. For details, see link:https://validatedpatterns.io/learn/quickstart/[Patterns quick start].

[id="preparing-for-deployment"]
== Preparing for deployment
.Procedure

. Fork the link:https://github.com/validatedpatterns-sandbox/ai-quickstart-rag[ai-quickstart-rag] repository on GitHub. You must fork the repository to customize this pattern.

. Clone the forked copy of this repository.
+
[source,terminal]
----
$ git clone git@github.com:your-username/ai-quickstart-rag.git
----

. Go to the root directory of your Git repository:
+
[source,terminal]
----
$ cd ai-quickstart-rag
----

. Run the following command to set the upstream repository:
+
[source,terminal]
----
$ git remote add -f upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git
----

. Verify the setup of your remote repositories by running the following command:
+
[source,terminal]
----
$ git remote -v
----
+
.Example output
+
[source,terminal]
----
origin git@github.com:your-username/ai-quickstart-rag.git (fetch)
origin git@github.com:your-username/ai-quickstart-rag.git (push)
upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git (fetch)
upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git (push)
----

. Make a local copy of the secrets template outside of your repository to hold credentials for the pattern.
+
[WARNING]
====
Do not add, commit, or push this file to your repository. Doing so may expose personal credentials to GitHub.
====
+
Run the following command:
+
[source,terminal]
----
$ cp values-secret.yaml.template ~/values-secret-ai-quickstart-rag.yaml
----

. Populate this file with secrets, or credentials, that are needed to deploy the pattern successfully:
+
[source,terminal]
----
$ vim ~/values-secret-ai-quickstart-rag.yaml
----

.. Edit the `llm-service` section to use your HuggingFace API token:
+
[source,yaml]
----
- name: llm-service
  fields:
    - name: hf_token
      value: hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
----

. Optional: To customize the deployment, create and switch to a new branch by running the following command:
+
[source,terminal]
----
$ git checkout -b my-branch
----
+
Make your changes, then stage and commit them:
+
[source,terminal]
----
$ git add <changed-files>
$ git commit -m "Customize deployment"
----
+
Push the changes to your forked repository:
+
[source,terminal]
----
$ git push origin my-branch
----

[id="deploying-cluster-using-patternsh-file"]
== Deploying the pattern by using the pattern.sh file

To deploy the pattern by using the `pattern.sh` file, complete the following steps:

. Log in to your cluster by following this procedure:

.. Obtain an API token by visiting link:https://oauth-openshift.apps.<your_cluster>.<domain>/oauth/token/request[https://oauth-openshift.apps.<your_cluster>.<domain>/oauth/token/request].

.. Log in to the cluster by running the following command:
+
[source,terminal]
----
$ oc login --token=<retrieved-token> --server=https://api.<your_cluster>.<domain>:6443
----
+
Alternatively, configure cluster access by exporting the path to your kubeconfig file:
+
[source,terminal]
----
$ export KUBECONFIG=~/<path_to_kubeconfig>
----

. Deploy the pattern to your cluster. Run the following command:
+
[source,terminal]
----
$ ./pattern.sh make install
----

.Verification

To verify a successful installation, check the health of the ArgoCD applications:

. Run the following command:
+
[source,terminal]
----
$ ./pattern.sh make argo-healthcheck
----
+
It might take several minutes for all applications to synchronize and reach a healthy state. This includes downloading the LLM models and populating the vector database.

. Verify that the Operators are installed by navigating to *Operators -> Installed Operators* in the {ocp} web console.

. After all applications are healthy, open the RAG chatbot UI by clicking the route link in the *Networking -> Routes* page of the `ai-quickstart-rag-prod` namespace.
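+
Alternatively, you can list the routes from the command line (this assumes the UI route is exposed in the `ai-quickstart-rag-prod` namespace as described above; the exact route name depends on the chart):
+
[source,terminal]
----
$ oc get routes -n ai-quickstart-rag-prod
----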