31 changes: 31 additions & 0 deletions content/patterns/rag-quickstart/_index.adoc
@@ -0,0 +1,31 @@
---
title: RAG AI Quickstart
date: 2026-05-13
tier: sandbox
summary: This pattern deploys the RAG AI Quickstart with test pipelines on CPU or GPU.
rh_products:
- Red Hat OpenShift Container Platform
- Red Hat OpenShift GitOps
- Red Hat OpenShift AI
industries:
- General
aliases: /rag-quickstart/
links:
github: https://github.com/validatedpatterns-sandbox/ai-quickstart-rag
install: getting-started
bugs: https://github.com/validatedpatterns-sandbox/ai-quickstart-rag/issues
feedback: https://docs.google.com/forms/d/e/1FAIpQLScI76b6tD1WyPu2-d_9CCVDr3Fu5jYERthqLKJDUGwqBg7Vcg/viewform
---
:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]

include::modules/rag-quickstart-about.adoc[leveloffset=+1]

include::modules/rag-quickstart-architecture.adoc[leveloffset=+1]

[id="next-steps-rag-quickstart"]
== Next steps

* link:rag-quickstart-getting-started[Install this pattern.]
13 changes: 13 additions & 0 deletions content/patterns/rag-quickstart/cluster-sizing.adoc
@@ -0,0 +1,13 @@
---
title: Cluster sizing
weight: 30
aliases: /rag-quickstart/cluster-sizing/
---

:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]
include::modules/ai-quickstart-rag/metadata-ai-quickstart-rag.adoc[]

include::modules/cluster-sizing-template.adoc[]
126 changes: 126 additions & 0 deletions content/patterns/rag-quickstart/customizing-this-pattern.adoc
@@ -0,0 +1,126 @@
---
title: Customizing this pattern
weight: 20
aliases: /rag-quickstart/customizing/
---

:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]

[id="customizing-rag-quickstart"]
== Customizing the RAG AI Quickstart pattern

Without any changes, this pattern runs a CPU-backed LLM and does not require a GPU. This limits both the range of usable models and inference speed, so you might want to use a GPU instead.

[id="enabling-gpu"]
=== Enabling GPU support

To enable GPU support, set `global.device` to `gpu` in `values-global.yaml` and push your changes to GitHub. This adds NFD and the NVIDIA GPU Operator to the pattern installation and enables the models to run using an NVIDIA accelerator.

[NOTE]
====
If you are running this pattern on an OpenShift cluster on AWS, setting `global.device` to `gpu` automatically creates a GPU machine (`g6.2xlarge`) and adds it as a worker node to your cluster.
====
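For example, the relevant change in `values-global.yaml` looks like this (a minimal sketch; any other keys under `global` in your file stay unchanged):

[source,yaml]
----
# values-global.yaml (excerpt)
global:
  device: gpu  # default is cpu
----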

[id="changing-models"]
=== Changing models

To update the models, edit `overrides/values-cpu.yaml` (if `global.device` is set to `cpu`) or `overrides/values-gpu.yaml` (if set to `gpu`).

The default CPU-based model is defined as follows:

[source,yaml]
----
global:
  models:
    llama-3-2-3b-instruct-cpu:
      id: meta-llama/Llama-3.2-3B-Instruct
      enabled: true
      resources:
        limits:
          cpu: "6"
          memory: 48Gi
        requests:
          cpu: "2"
          memory: 24Gi
      args:
        - --enable-auto-tool-choice
        - --chat-template
        - /chat-templates/tool_chat_template_llama3.2_json.jinja
        - --tool-call-parser
        - llama3_json
        - --dtype
        - auto
        - --max-model-len
        - "16384"
        - --max-num-seqs
        - "1"
----

You can change this to any vLLM-compatible model whose terms and conditions you have accepted with the HuggingFace account that your API token belongs to. You can also adjust the resource parameters as needed for your environment.
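For example, swapping in a different model only requires replacing the entry under `global.models` (a hypothetical sketch: the model name, `id`, resource values, and arguments here are illustrative only; verify the requirements of any model you choose before deploying):

[source,yaml]
----
global:
  models:
    qwen-2-5-7b-instruct:            # illustrative model, not a tested default
      id: Qwen/Qwen2.5-7B-Instruct
      enabled: true
      resources:
        limits:
          cpu: "8"
          memory: 64Gi
        requests:
          cpu: "4"
          memory: 32Gi
      args:
        - --dtype
        - auto
        - --max-model-len
        - "8192"
----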

The runtime defaults to `vllm/vllm-openai:v0.11.1`. If you need a later version, you can override the image:

[source,yaml]
----
llm-service:
  deviceConfigs:
    gpu:
      image: vllm/vllm-openai:nightly
----

[NOTE]
====
The example above sets a GPU-specific container image. To override the CPU-based image instead, use the key `llm-service.deviceConfigs.cpu.image`.
====

[id="multiple-models"]
=== Defining multiple models

You can define multiple LLM models to be served simultaneously. For example:

[source,yaml]
----
global:
  models:
    deepseek-r1:
      id: Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ
      enabled: true
      resources:
        limits:
          cpu: "32"
          memory: 200Gi
        requests:
          cpu: "24"
          memory: 150Gi
      args:
        - --reasoning-parser
        - deepseek_r1
        - --tool-call-parser
        - llama3_json
        - --enable-auto-tool-choice
        - --quantization
        - awq_marlin
        - --dtype
        - float16
        - --max-model-len
        - "65536"
    gpt-oss-120b:
      id: openai/gpt-oss-120b
      enabled: true
      resources:
        limits:
          cpu: "32"
          memory: 200Gi
        requests:
          cpu: "24"
          memory: 150Gi
      args:
        - --tool-call-parser
        - openai
        - --enable-auto-tool-choice
----

For a complete list of customizable values, see the link:https://github.com/rh-ai-quickstart/ai-architecture-charts[AI Architecture charts] repository.
167 changes: 167 additions & 0 deletions content/patterns/rag-quickstart/getting-started.adoc
@@ -0,0 +1,167 @@
---
title: Getting started
weight: 10
aliases: /rag-quickstart/getting-started/
---

:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]

[id="deploying-rag-quickstart-pattern"]
== Deploying the RAG AI Quickstart pattern

.Prerequisites

* An OpenShift cluster (version 4.18 or later)
** To create an OpenShift cluster, go to the https://console.redhat.com/[Red Hat Hybrid Cloud console].
** Select *OpenShift \-> Red Hat OpenShift Container Platform \-> Create cluster*.
* A https://huggingface.co/[HuggingFace] account with an API token that has read permissions.
** You must accept the terms and conditions for the https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct[meta-llama/Llama-3.2-3B-Instruct] model with the account that the API token belongs to.
* The Helm binary. For instructions, see link:https://helm.sh/docs/intro/install/[Installing Helm].
* Additional installation tool dependencies. For details, see link:https://validatedpatterns.io/learn/quickstart/[Patterns quick start].

[id="preparing-for-deployment"]
== Preparing for deployment
.Procedure

. Fork the link:https://github.com/validatedpatterns-sandbox/ai-quickstart-rag[ai-quickstart-rag] repository on GitHub. You must fork the repository to customize this pattern.

. Clone the forked copy of this repository.
+
[source,terminal]
----
$ git clone git@github.com:your-username/ai-quickstart-rag.git
----

. Go to the root directory of your Git repository:
+
[source,terminal]
----
$ cd ai-quickstart-rag
----

. Run the following command to set the upstream repository:
+
[source,terminal]
----
$ git remote add -f upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git
----

. Verify the setup of your remote repositories by running the following command:
+
[source,terminal]
----
$ git remote -v
----
+
.Example output
+
[source,terminal]
----
origin git@github.com:your-username/ai-quickstart-rag.git (fetch)
origin git@github.com:your-username/ai-quickstart-rag.git (push)
upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git (fetch)
upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git (push)
----

. Make a local copy of the secrets template outside of your repository to hold credentials for the pattern.
+
[WARNING]
====
Do not add, commit, or push this file to your repository. Doing so may expose personal credentials to GitHub.
====
+
Run the following command:
+
[source,terminal]
----
$ cp values-secret.yaml.template ~/values-secret-ai-quickstart-rag.yaml
----

. Populate this file with secrets, or credentials, that are needed to deploy the pattern successfully:
+
[source,terminal]
----
$ vim ~/values-secret-ai-quickstart-rag.yaml
----

.. Edit the `llm-service` section to use your HuggingFace API token:
+
[source,yaml]
----
- name: llm-service
  fields:
    - name: hf_token
      value: hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
----

. Optional: To customize the deployment, create and switch to a new branch by running the following command:
+
[source,terminal]
----
$ git checkout -b my-branch
----
+
Make your changes, then stage and commit them:
+
[source,terminal]
----
$ git add <changed-files>
$ git commit -m "Customize deployment"
----
+
Push the changes to your forked repository:
+
[source,terminal]
----
$ git push origin my-branch
----

[id="deploying-cluster-using-patternsh-file"]
== Deploying the pattern by using the pattern.sh file

To deploy the pattern by using the `pattern.sh` file, complete the following steps:

. Log in to your cluster by following this procedure:

.. Obtain an API token by visiting link:https://oauth-openshift.apps.<your_cluster>.<domain>/oauth/token/request[https://oauth-openshift.apps.<your_cluster>.<domain>/oauth/token/request].

.. Log in to the cluster by running the following command:
+
[source,terminal]
----
$ oc login --token=<retrieved-token> --server=https://api.<your_cluster>.<domain>:6443
----
+
Alternatively, configure cluster access by exporting the path to your kubeconfig file:
+
[source,terminal]
----
$ export KUBECONFIG=~/<path_to_kubeconfig>
----

. Deploy the pattern to your cluster. Run the following command:
+
[source,terminal]
----
$ ./pattern.sh make install
----

.Verification

To verify a successful installation, check the health of the ArgoCD applications:

. Run the following command:
+
[source,terminal]
----
$ ./pattern.sh make argo-healthcheck
----
+
It might take several minutes for all applications to synchronize and reach a healthy state. This includes downloading the LLM models and populating the vector database.

. Verify that the Operators are installed by navigating to *Operators -> Installed Operators* in the {ocp} web console.

. After all applications are healthy, open the RAG chatbot UI by clicking the route link in the *Networking -> Routes* page of the `ai-quickstart-rag-prod` namespace.
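+
Alternatively, you can list the routes from the command line (this assumes the UI route is exposed in the `ai-quickstart-rag-prod` namespace as described above; the exact route name depends on the chart):
+
[source,terminal]
----
$ oc get routes -n ai-quickstart-rag-prod
----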