From dd02eabb98e3e1b1a4460081ace9a623c1ab419c Mon Sep 17 00:00:00 2001 From: Nitin Kanukolanu Date: Tue, 9 Jun 2026 19:20:09 -0400 Subject: [PATCH 1/4] docs: add MCP user-guide notebook for search, upsert, and ADK agent Add docs/user_guide/15_mcp.ipynb, a hands-on guide that creates and loads a Redis index, writes and validates an MCP config, starts the RedisVL MCP server over Streamable HTTP, exercises the search-records and upsert-records tools from an MCP client, and wires the same server to a Google ADK agent. Register the notebook in the how-to guides index (card, quick reference, and toctree). --- docs/user_guide/15_mcp.ipynb | 657 +++++++++++++++++++++++++ docs/user_guide/how_to_guides/index.md | 3 + 2 files changed, 660 insertions(+) create mode 100644 docs/user_guide/15_mcp.ipynb diff --git a/docs/user_guide/15_mcp.ipynb b/docs/user_guide/15_mcp.ipynb new file mode 100644 index 00000000..9f390039 --- /dev/null +++ b/docs/user_guide/15_mcp.ipynb @@ -0,0 +1,657 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Serve an Index over MCP\n", + "\n", + "The [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) is an open standard that lets AI agents discover and call external tools through one uniform interface. RedisVL ships an MCP server, the `rvl mcp` command, that exposes a single existing Redis index to any MCP client through two tools:\n", + "\n", + "- **`search-records`** : semantic, full-text, or hybrid retrieval over the index.\n", + "- **`upsert-records`** : add or overwrite records in the index.\n", + "\n", + "The server owns the embedding model, so clients only ever send **text**. No raw vectors cross the client boundary, retrieval behavior lives entirely in one config file, and the same index can be shared with ADK, Claude Desktop, Cursor, or any other MCP client with zero custom code.\n", + "\n", + "This guide walks the full loop end to end:\n", + "\n", + "1. Create and load a Redis index.\n", + "2. 
Write the MCP config that binds the server to that index.\n", + "3. Start the RedisVL MCP server over Streamable HTTP.\n", + "4. Call `search-records` and `upsert-records` from a plain MCP client.\n", + "5. Wire the same server to a [Google ADK](https://google.github.io/adk-docs/) agent so a model can retrieve and write knowledge through MCP.\n", + "\n", + "## Prerequisites\n", + "\n", + "Before you begin, ensure you have:\n", + "- A running Redis instance ([Redis 8+](https://redis.io/downloads/) or [Redis Cloud](https://redis.io/cloud)) with the Search capability.\n", + "- [`uv`](https://docs.astral.sh/uv/) on your `PATH` (the server is launched with `uvx`/`rvl`).\n", + "\n", + "For the complete config schema and tool contracts, see the [Run RedisVL MCP](how_to_guides/mcp.md) how-to guide.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install Packages\n", + "\n", + "The MCP server lives behind the `mcp` extra. We also install the `sentence-transformers` extra so query and record embedding can run locally with no API key.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -q \"redisvl[mcp,sentence-transformers]>=0.20.0\" nest_asyncio pandas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Connect to Redis\n", + "\n", + "The MCP server reads from a normal Redis URL. 
We use the same URL here to create the index it will serve.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import warnings\n", + "\n", + "import nest_asyncio\n", + "import pandas as pd\n", + "\n", + "# Notebook event loops are already running; this lets the MCP client and the\n", + "# ADK runner use top-level await cleanly.\n", + "warnings.filterwarnings(\"ignore\")\n", + "nest_asyncio.apply()\n", + "\n", + "REDIS_URL = os.environ.get(\"REDIS_URL\", \"redis://localhost:6379\")\n", + "\n", + "from redis import Redis\n", + "\n", + "Redis.from_url(REDIS_URL).ping()\n", + "print(\"Connected to\", REDIS_URL)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Create and Load an Index\n", + "\n", + "The MCP server only ever **binds to an index that already exists**; it never creates one. So the first step is the ordinary RedisVL flow: define a schema, embed some records, and load them.\n", + "\n", + "We use a tiny Redis knowledge corpus and embed each record's `text` field locally with `HFTextVectorizer` (`all-MiniLM-L6-v2`, 384 dimensions). The `doc_id` tag gives every record a stable identifier so upserts can update in place.\n", + "\n", + "> **Reserved field names:** RedisVL MCP uses `id`, `score`, and a few others in its response envelope. 
Name your identifier field something else (here, `doc_id`) so it does not collide.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from redisvl.index import SearchIndex\n", + "from redisvl.utils.vectorize import HFTextVectorizer\n", + "\n", + "INDEX_NAME = \"redisvl_mcp_guide\"\n", + "INDEX_PREFIX = \"mcp_guide\"\n", + "EMBEDDING_MODEL = \"sentence-transformers/all-MiniLM-L6-v2\"\n", + "\n", + "documents = [\n", + " {\"doc_id\": \"redisvl-intro\", \"title\": \"What is RedisVL\",\n", + " \"text\": \"RedisVL is a Python client for building AI applications on Redis, \"\n", + " \"with vector search, semantic caching, and semantic routing.\"},\n", + " {\"doc_id\": \"vector-search\", \"title\": \"Vector search\",\n", + " \"text\": \"Redis stores embeddings and runs k-nearest-neighbor vector similarity \"\n", + " \"search using HNSW or FLAT indexes.\"},\n", + " {\"doc_id\": \"semantic-cache\", \"title\": \"Semantic caching\",\n", + " \"text\": \"SemanticCache stores past LLM responses and returns a cached answer when \"\n", + " \"a new prompt is semantically similar to an earlier one.\"},\n", + " {\"doc_id\": \"mcp-server\", \"title\": \"RedisVL MCP server\",\n", + " \"text\": \"The rvl mcp command serves an existing Redis index to MCP clients through \"\n", + " \"the search-records and upsert-records tools.\"},\n", + "]\n", + "\n", + "vectorizer = HFTextVectorizer(model=EMBEDDING_MODEL)\n", + "\n", + "schema = {\n", + " \"index\": {\"name\": INDEX_NAME, \"prefix\": INDEX_PREFIX, \"storage_type\": \"hash\"},\n", + " \"fields\": [\n", + " {\"name\": \"doc_id\", \"type\": \"tag\"},\n", + " {\"name\": \"title\", \"type\": \"text\"},\n", + " {\"name\": \"text\", \"type\": \"text\"},\n", + " {\n", + " \"name\": \"embedding\",\n", + " \"type\": \"vector\",\n", + " \"attrs\": {\n", + " \"algorithm\": \"hnsw\",\n", + " \"dims\": vectorizer.dims,\n", + " \"distance_metric\": \"cosine\",\n", + " \"datatype\": 
\"float32\",\n", + " },\n", + " },\n", + " ],\n", + "}\n", + "\n", + "index = SearchIndex.from_dict(schema, redis_url=REDIS_URL)\n", + "index.create(overwrite=True, drop=True)\n", + "\n", + "records = []\n", + "for doc in documents:\n", + " record = dict(doc)\n", + " record[\"embedding\"] = vectorizer.embed(doc[\"text\"], as_buffer=True)\n", + " records.append(record)\n", + "\n", + "keys = index.load(records, id_field=\"doc_id\")\n", + "print(f\"Loaded {len(keys)} records into index '{INDEX_NAME}'\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Write the MCP Config\n", + "\n", + "The config binds **one logical MCP server to one existing index**. It has three parts:\n", + "\n", + "- **`server`** : how to reach Redis (`redis_url`).\n", + "- **`indexes..search`** : how the `search-records` tool queries. `type: vector` embeds the query text and runs vector similarity. (`fulltext` and `hybrid` are also supported, hybrid needs native Redis support.)\n", + "- **`indexes..runtime`** : field mappings and guardrails.\n", + " - `vector_field_name` / `text_field_name` : which fields to search.\n", + " - `default_embed_text_field` : the field the server embeds, both for incoming queries and for new records on upsert. This is what makes the server **embed text itself** so clients never send vectors.\n", + " - `default_limit` / `max_limit` / `max_result_window` : cap result sizes.\n", + "\n", + "We point the same `HFTextVectorizer` model at the server so its query embeddings match the vectors we stored. 
Because we do **not** pass `--read-only` when launching, both `search-records` and `upsert-records` are exposed.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "\n", + "import yaml\n", + "\n", + "from redisvl.mcp import load_mcp_config\n", + "\n", + "mcp_config_path = (Path.cwd() / \"redisvl_mcp_guide.yaml\").resolve()\n", + "\n", + "mcp_config = {\n", + " \"server\": {\"redis_url\": REDIS_URL},\n", + " \"indexes\": {\n", + " INDEX_NAME: {\n", + " \"redis_name\": INDEX_NAME,\n", + " \"vectorizer\": {\"class\": \"HFTextVectorizer\", \"model\": EMBEDDING_MODEL},\n", + " \"search\": {\"type\": \"vector\"},\n", + " \"runtime\": {\n", + " \"text_field_name\": \"text\",\n", + " \"vector_field_name\": \"embedding\",\n", + " \"default_embed_text_field\": \"text\",\n", + " \"default_limit\": 3,\n", + " \"max_limit\": 10,\n", + " \"max_result_window\": 100,\n", + " # The first call loads the embedding model into memory, which can\n", + " # take 30+ seconds. Give startup and requests plenty of room.\n", + " \"request_timeout_seconds\": 120,\n", + " \"startup_timeout_seconds\": 120,\n", + " },\n", + " }\n", + " },\n", + "}\n", + "\n", + "mcp_config_path.write_text(yaml.safe_dump(mcp_config, sort_keys=False), encoding=\"utf-8\")\n", + "\n", + "# load_mcp_config validates the file the way the server will at startup.\n", + "validated = load_mcp_config(str(mcp_config_path))\n", + "print(\"Wrote and validated\", mcp_config_path)\n", + "print({\n", + " \"binding_id\": validated.binding_id,\n", + " \"redis_name\": validated.redis_name,\n", + " \"search_type\": validated.binding.search.type,\n", + "})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Start the RedisVL MCP Server\n", + "\n", + "The server must be running before any client can connect. 
We use **Streamable HTTP** transport, which works reliably in notebooks (stdio transport breaks under Jupyter/Colab because they wrap `stdout`/`stderr`).\n", + "\n", + "**Option A : from a terminal (recommended for local development).** Open a separate terminal and run:\n", + "\n", + "```bash\n", + "uvx --from \"redisvl[mcp,sentence-transformers]\" rvl mcp \\\n", + " --config redisvl_mcp_guide.yaml \\\n", + " --transport streamable-http --host 127.0.0.1 --port 8000\n", + "```\n", + "\n", + "Then skip the next code cell and set `MCP_URL = \"http://127.0.0.1:8000/mcp\"`.\n", + "\n", + "**Option B : from the notebook (required for Colab).** The next cell launches the server as a background subprocess. It first refuses to continue if the port is already taken, then waits until the server answers a real MCP handshake (confirming it is our server and the embedding model has finished loading), not merely until the socket opens.\n", + "\n", + "```{warning}\n", + "Streamable HTTP is **unauthenticated by default**. Only bind to public interfaces (`--host 0.0.0.0`) on trusted networks or behind an authenticating proxy. Without `--read-only`, the `upsert-records` write tool is exposed to any client that can reach the server.\n", + "```\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# NBVAL_SKIP\n", + "import socket\n", + "import subprocess\n", + "import time\n", + "\n", + "from fastmcp import Client\n", + "\n", + "MCP_HOST, MCP_PORT = \"127.0.0.1\", 8000\n", + "MCP_URL = f\"http://{MCP_HOST}:{MCP_PORT}/mcp\"\n", + "\n", + "\n", + "def port_in_use(host: str, port: int) -> bool:\n", + " \"\"\"Return True if something is already listening on host:port.\"\"\"\n", + " with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:\n", + " sock.settimeout(0.5)\n", + " return sock.connect_ex((host, port)) == 0\n", + "\n", + "\n", + "# Fail loudly on a port clash. 
A bare TCP probe would otherwise treat an\n", + "# unrelated server already on this port as \"ready\" and surface a confusing\n", + "# \"Session terminated\" error on the first tool call.\n", + "if port_in_use(MCP_HOST, MCP_PORT):\n", + " raise RuntimeError(\n", + " f\"Port {MCP_PORT} is already in use. Stop whatever is bound to it, or pick a free \"\n", + " f\"port by setting MCP_PORT and MCP_URL above before launching the server.\"\n", + " )\n", + "\n", + "# A clearer search-tool description steers the model toward real field names.\n", + "# RedisVL still appends schema-derived filter and return_fields hints.\n", + "os.environ[\"REDISVL_MCP_TOOL_SEARCH_DESCRIPTION\"] = (\n", + " \"Search the Redis knowledge base. Fields: doc_id (tag), title (text), text (text). \"\n", + " \"Only use these field names in return_fields and filters.\"\n", + ")\n", + "\n", + "mcp_process = subprocess.Popen(\n", + " [\n", + " \"rvl\", \"mcp\",\n", + " \"--config\", str(mcp_config_path),\n", + " \"--transport\", \"streamable-http\",\n", + " \"--host\", MCP_HOST,\n", + " \"--port\", str(MCP_PORT),\n", + " ],\n", + " stdin=subprocess.PIPE,\n", + " stdout=subprocess.DEVNULL,\n", + " stderr=subprocess.DEVNULL,\n", + " env=os.environ.copy(),\n", + " start_new_session=True, # own process group, so we can stop the whole tree\n", + ")\n", + "\n", + "\n", + "async def wait_until_ready(url: str, timeout: float = 120.0) -> set[str]:\n", + " \"\"\"Wait for a real MCP handshake, not just an open socket.\n", + "\n", + " Connecting with an MCP client and confirming ``search-records`` is exposed\n", + " proves the process is actually our RedisVL server (not some other listener\n", + " on this port) and that it has finished loading the embedding model, so the\n", + " first real call will not cold-start.\n", + " \"\"\"\n", + " deadline = time.time() + timeout\n", + " last_error = None\n", + " while time.time() < deadline:\n", + " if mcp_process.poll() is not None:\n", + " raise RuntimeError(f\"MCP 
server exited early (code {mcp_process.returncode})\")\n", + " try:\n", + " async with Client(url) as client:\n", + " tool_names = {t.name for t in await client.list_tools()}\n", + " if \"search-records\" in tool_names:\n", + " return tool_names\n", + " except Exception as exc: # not ready yet, keep polling\n", + " last_error = exc\n", + " time.sleep(1.0)\n", + " raise RuntimeError(f\"MCP server not ready after {timeout:.0f}s (last error: {last_error})\")\n", + "\n", + "\n", + "tool_names = await wait_until_ready(MCP_URL)\n", + "print(f\"MCP server ready (PID {mcp_process.pid}); tools exposed: {sorted(tool_names)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Call the Tools from an MCP Client\n", + "\n", + "Any MCP client can now connect. We use [`fastmcp`](https://gofastmcp.com/)'s `Client` (installed with the `mcp` extra) to show exactly what crosses the wire. Notice the request is **plain text**: the server embeds it before searching.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# NBVAL_SKIP\n", + "from fastmcp import Client\n", + "\n", + "async with Client(MCP_URL) as client:\n", + " tools = await client.list_tools()\n", + " print(\"Tools exposed:\", [t.name for t in tools])\n", + "\n", + " result = await client.call_tool(\n", + " \"search-records\",\n", + " {\"query\": \"How do I cache LLM responses?\", \"limit\": 3},\n", + " )\n", + "\n", + "# result.data is the structured JSON response from the server.\n", + "search_payload = result.data\n", + "\n", + "# Verify the tool actually returned grounded results.\n", + "assert search_payload[\"results\"], \"search-records returned no results\"\n", + "print(f\"search_type={search_payload['search_type']} | {len(search_payload['results'])} results returned\")\n", + "\n", + "rows = [\n", + " {**hit[\"record\"], \"score\": round(hit[\"score\"], 3), \"score_type\": hit[\"score_type\"]}\n", + " for hit in 
search_payload[\"results\"]\n", + "]\n", + "pd.DataFrame(rows)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Upsert a New Record\n", + "\n", + "`upsert-records` takes a list of records, each carrying the `default_embed_text_field` (`text`) the server will embed. Passing `id_field=\"doc_id\"` makes writes idempotent: the same `doc_id` overwrites in place rather than creating a duplicate. We send no vector; the server generates it.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# NBVAL_SKIP\n", + "async with Client(MCP_URL) as client:\n", + " upsert_result = await client.call_tool(\n", + " \"upsert-records\",\n", + " {\n", + " \"records\": [\n", + " {\n", + " \"doc_id\": \"semantic-router\",\n", + " \"title\": \"Semantic routing\",\n", + " \"text\": \"SemanticRouter classifies an incoming query and routes it to the \"\n", + " \"best matching route using vector similarity over route references.\",\n", + " }\n", + " ],\n", + " \"id_field\": \"doc_id\",\n", + " },\n", + " )\n", + " print(\"Upsert:\", upsert_result.data)\n", + "\n", + " # The new record is immediately searchable.\n", + " verify = await client.call_tool(\n", + " \"search-records\",\n", + " {\"query\": \"route queries to the right topic\", \"limit\": 2},\n", + " )\n", + "\n", + "# Verify the write landed and is retrievable.\n", + "assert upsert_result.data[\"keys_upserted\"] == 1, upsert_result.data\n", + "assert any(\n", + " hit[\"record\"][\"doc_id\"] == \"semantic-router\" for hit in verify.data[\"results\"]\n", + "), \"the upserted record was not found by a follow-up search\"\n", + "print(\"Verified: upserted\", upsert_result.data[\"keys\"], \"and found it via search\")\n", + "\n", + "pd.DataFrame(\n", + " {**h[\"record\"], \"score\": round(h[\"score\"], 3)} for h in verify.data[\"results\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. 
Use the Index from a Google ADK Agent\n", + "\n", + "The same server drops straight into an agent framework. This is the pattern from the [`redisvl-mcp-rag-agent`](https://github.com/redis-applied-ai/redisvl-mcp-rag-agent) demo, reduced to its core: a [Google ADK](https://google.github.io/adk-docs/) `LlmAgent` whose only tools are the RedisVL MCP `search-records` and `upsert-records`, reached over the same Streamable HTTP endpoint.\n", + "\n", + "The agent orchestrates with a chat model (here OpenAI's `gpt-4o` via [LiteLLM](https://docs.litellm.ai/)) but **retrieval and writes go through MCP**, so the model only ever sends text.\n", + "\n", + "This section needs `google-adk`, `litellm`, and an `OPENAI_API_KEY`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# NBVAL_SKIP\n", + "%pip install -q \"google-adk>=1.0.0\" litellm" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# NBVAL_SKIP\n", + "from getpass import getpass\n", + "\n", + "if not os.environ.get(\"OPENAI_API_KEY\"):\n", + " os.environ[\"OPENAI_API_KEY\"] = getpass(\"OPENAI_API_KEY: \")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Build the Agent\n", + "\n", + "`McpToolset` connects ADK to the running server. The `tool_filter` lists exactly which MCP tools the agent may call, here both `search-records` and `upsert-records`. 
The `instruction` tells the model to ground every answer in retrieved records and to write only when explicitly asked.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# NBVAL_SKIP\n", + "import uuid\n", + "\n", + "from google.adk.agents import LlmAgent\n", + "from google.adk.models.lite_llm import LiteLlm\n", + "from google.adk.runners import Runner\n", + "from google.adk.sessions import InMemorySessionService\n", + "from google.adk.tools.mcp_tool import McpToolset\n", + "from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPConnectionParams\n", + "from google.genai import types as genai_types\n", + "\n", + "APP_NAME, USER_ID = \"redisvl_mcp_guide\", \"notebook_user\"\n", + "\n", + "INSTRUCTION = (\n", + " \"You are a Redis knowledge assistant backed by one RedisVL index. \"\n", + " \"Both your tools take plain TEXT and the server embeds it, so never construct vectors.\\n\"\n", + " \"- To answer, call search-records first, then ground your answer in the results and \"\n", + " \"name the titles you used. If nothing relevant comes back, say so; do not invent an answer.\\n\"\n", + " \"- Only call upsert-records when the user clearly asks to save or correct knowledge. 
Put the \"\n", + " \"content in the `text` field, pass id_field='doc_id' with a `doc_id`, and confirm what you wrote.\"\n", + ")\n", + "\n", + "toolset = McpToolset(\n", + " connection_params=StreamableHTTPConnectionParams(url=MCP_URL),\n", + " tool_filter=[\"search-records\", \"upsert-records\"],\n", + ")\n", + "\n", + "agent = LlmAgent(\n", + " name=\"redis_mcp_agent\",\n", + " model=LiteLlm(model=\"openai/gpt-4o\"),\n", + " instruction=INSTRUCTION,\n", + " tools=[toolset],\n", + ")\n", + "\n", + "session_service = InMemorySessionService()\n", + "runner = Runner(app_name=APP_NAME, agent=agent, session_service=session_service)\n", + "\n", + "\n", + "async def ask_agent(query: str) -> dict:\n", + " \"\"\"Run one turn; return the final answer plus the MCP tool calls it made.\"\"\"\n", + " session_id = f\"session-{uuid.uuid4().hex[:8]}\"\n", + " await session_service.create_session(\n", + " app_name=APP_NAME, user_id=USER_ID, session_id=session_id, state={}\n", + " )\n", + " message = genai_types.Content(role=\"user\", parts=[genai_types.Part(text=query)])\n", + "\n", + " answer, tool_calls = \"\", []\n", + " async for event in runner.run_async(\n", + " user_id=USER_ID, session_id=session_id, new_message=message\n", + " ):\n", + " for call in event.get_function_calls() or []:\n", + " tool_calls.append({\"name\": call.name, \"args\": dict(call.args or {})})\n", + " if event.is_final_response() and event.content and event.content.parts:\n", + " answer = \"\".join(part.text or \"\" for part in event.content.parts)\n", + " return {\"answer\": answer, \"tool_calls\": tool_calls}\n", + "\n", + "\n", + "print(\"Agent ready, wired to\", MCP_URL)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Ask a Grounded Question\n", + "\n", + "The agent calls `search-records` and answers from what Redis returns.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# NBVAL_SKIP\n", + "result = 
await ask_agent(\"What does RedisVL offer for semantic caching?\")\n", + "print(result[\"answer\"])\n", + "print(\"\\nTool calls:\", result[\"tool_calls\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Ask the Agent to Save Knowledge\n", + "\n", + "Because `upsert-records` is exposed, the agent can add to the corpus mid-conversation, then retrieve it.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# NBVAL_SKIP\n", + "save = await ask_agent(\n", + " \"Remember this: RedisVL EmbeddingsCache stores embeddings keyed by text so you do not \"\n", + " \"re-embed the same input twice. Save it with doc_id 'embeddings-cache'.\"\n", + ")\n", + "print(save[\"answer\"])\n", + "print(\"\\nTool calls:\", save[\"tool_calls\"])\n", + "\n", + "check = await ask_agent(\"How can I avoid re-embedding the same text?\")\n", + "print(\"\\n---\\n\", check[\"answer\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cleanup\n", + "\n", + "Close the toolset, stop the background server (skip if you started it from a terminal, stop it there with Ctrl-C), and drop the index.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import signal\n", + "\n", + "try:\n", + " await toolset.close()\n", + "except NameError:\n", + " pass # agent section was skipped\n", + "\n", + "if \"mcp_process\" in dir() and mcp_process.poll() is None:\n", + " os.killpg(os.getpgid(mcp_process.pid), signal.SIGTERM)\n", + " mcp_process.wait(timeout=10)\n", + " print(\"MCP server stopped.\")\n", + "\n", + "index.delete(drop=True)\n", + "mcp_config_path.unlink(missing_ok=True)\n", + "print(\"Index dropped and config removed.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Recap\n", + "\n", + "You served a single Redis index over MCP and used it two ways:\n", + "\n", + "1. 
**Index + config** : created a RedisVL index, then bound it to the `rvl mcp` server with a small YAML config. `default_embed_text_field` makes the server embed text itself, so clients send only text.\n", + "2. **Direct client** : connected with `fastmcp` and called `search-records` and `upsert-records` over Streamable HTTP.\n", + "3. **ADK agent** : pointed a Google ADK `LlmAgent` at the same endpoint with `McpToolset`, so the model retrieves and writes knowledge through MCP with no Redis-specific code.\n", + "\n", + "Any MCP-compatible client (ADK, Claude Desktop, Cursor) can reuse this exact server and index. For the full config schema, tool contracts, transports, and read-only mode, see the [Run RedisVL MCP](how_to_guides/mcp.md) how-to guide.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/user_guide/how_to_guides/index.md b/docs/user_guide/how_to_guides/index.md index 91d15b78..027bf71e 100644 --- a/docs/user_guide/how_to_guides/index.md +++ b/docs/user_guide/how_to_guides/index.md @@ -44,6 +44,7 @@ How-to guides are **task-oriented** recipes that help you accomplish specific go - [Manage Indices with the CLI](../cli.ipynb): create, inspect, and delete indices from your terminal - [Run RedisVL MCP](mcp.md): expose an existing Redis index to MCP clients - [Authenticate RedisVL MCP](mcp_authentication.md): require JWT bearer tokens and gate read vs write +- [Serve an Index over MCP](../15_mcp.ipynb): hands-on notebook for the search and upsert tools and a Google ADK agent ::: :::: @@ -67,6 +68,7 @@ How-to guides are **task-oriented** recipes that help you accomplish 
specific go | Manage indices from terminal | [Manage Indices with the CLI](../cli.ipynb) | | Expose an index through MCP | [Run RedisVL MCP](mcp.md) | | Authenticate the MCP server | [Authenticate RedisVL MCP](mcp_authentication.md) | +| Run a hands-on MCP notebook with search, upsert, and an ADK agent | [Serve an Index over MCP](../15_mcp.ipynb) | | Plan and run a supported index migration | [Migrate an Index](migrate-indexes.md) | | Quantize vectors with resume, rollback, and the wizard | [Migrate an Index: Quantization, Resume, Backup, Wizard](../14_index_migration.ipynb) | @@ -87,6 +89,7 @@ Use Advanced Query Types <../11_advanced_queries> Write SQL Queries for Redis <../12_sql_to_redis_queries> Run RedisVL MCP <mcp> Authenticate RedisVL MCP <mcp_authentication> +Serve an Index over MCP <../15_mcp> Migrate an Index <migrate-indexes> Migrate an Index: Quantization, Resume, Backup, Wizard <../14_index_migration> ``` From e47d2b7b81bf5e4458bdcf20d05e927a97c7fd76 Mon Sep 17 00:00:00 2001 From: Nitin Kanukolanu Date: Mon, 15 Jun 2026 16:20:01 -0400 Subject: [PATCH 2/4] docs: add JWT authentication section to MCP notebook Add Section 6 to the MCP user-guide notebook demonstrating optional JWT bearer authentication for the HTTP transports (PR #628): - Start a second, authenticated server with a server.auth block, minting a throwaway RSAKeyPair and validating against its static public key so the demo needs no external IdP. - Show a no-token request rejected (401), a read-only token that can search but is forbidden from upsert, and a read+write token that can upsert (coarse read/write scope gating). - Update the Section 3 warning, cleanup, and recap to cover auth and link to the new Authenticate RedisVL MCP how-to guide. 
--- docs/user_guide/15_mcp.ipynb | 188 ++++++++++++++++++++++++++++++++++- 1 file changed, 185 insertions(+), 3 deletions(-) diff --git a/docs/user_guide/15_mcp.ipynb b/docs/user_guide/15_mcp.ipynb index 9f390039..919c2dc2 100644 --- a/docs/user_guide/15_mcp.ipynb +++ b/docs/user_guide/15_mcp.ipynb @@ -245,7 +245,7 @@ "**Option B : from the notebook (required for Colab).** The next cell launches the server as a background subprocess. It first refuses to continue if the port is already taken, then waits until the server answers a real MCP handshake (confirming it is our server and the embedding model has finished loading), not merely until the socket opens.\n", "\n", "```{warning}\n", - "Streamable HTTP is **unauthenticated by default**. Only bind to public interfaces (`--host 0.0.0.0`) on trusted networks or behind an authenticating proxy. Without `--read-only`, the `upsert-records` write tool is exposed to any client that can reach the server.\n", + "Streamable HTTP and SSE are **unauthenticated by default**. Binding to a non-loopback host (`--host 0.0.0.0`) without auth now fails closed unless you pass `--allow-unauthenticated`; binding to loopback (`127.0.0.1`, as here) without auth only warns. For any networked deployment, enable JWT authentication (see Section 6 below and the [Authenticate RedisVL MCP](how_to_guides/mcp_authentication.md) guide) rather than relying on `--allow-unauthenticated`. Without `--read-only`, the `upsert-records` write tool is exposed to any client that can reach the server.\n", "```\n" ] }, @@ -584,6 +584,180 @@ "print(\"\\n---\\n\", check[\"answer\"])" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Secure the Server with JWT Authentication\n", + "\n", + "Everything so far ran **unauthenticated**, which is fine for local development on `127.0.0.1`. For any networked deployment you should require a token. 
The HTTP transports (`streamable-http`, `sse`) support optional **JWT bearer authentication**; `stdio` is a local subprocess and is never authenticated.\n", + "\n", + "In OAuth terms, RedisVL MCP is a **resource server**: it validates access tokens that *your* identity provider issued; it does not run a login flow or mint tokens. On each request it checks the token **signature** (against a JWKS endpoint or a static public key), **issuer**, **audience**, and **expiry**, then gates `search-records` behind a **read scope** and `upsert-records` behind a **write scope**.\n", + "\n", + "> In production you point `jwks_uri` at your IdP (Auth0, Okta, Entra, Cognito, Keycloak). To keep this notebook self-contained, we mint a throwaway RSA keypair with FastMCP's `RSAKeyPair` and validate tokens against its static `public_key`, so no external IdP is required.\n", + "\n", + "We start a **second** server on port 8001 with a `server.auth` block, leaving the unauthenticated server above untouched. For the full configuration reference and the gateway boundary, see the [Authenticate RedisVL MCP](how_to_guides/mcp_authentication.md) guide." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# NBVAL_SKIP\n", + "import copy\n", + "\n", + "from fastmcp.client.auth import BearerAuth\n", + "from fastmcp.server.auth.providers.jwt import RSAKeyPair\n", + "\n", + "# In production you point `jwks_uri` at your identity provider and validate the\n", + "# tokens it issues. 
Here we mint a throwaway RSA keypair locally and validate\n", + "# against its static public key, so the demo needs no external IdP.\n", + "ISSUER = \"https://auth.redis.example/notebook/v2.0\"\n", + "AUDIENCE = \"api://redisvl-mcp\"\n", + "READ_SCOPE = \"kb.search.read\" # required to connect and to call search-records\n", + "WRITE_SCOPE = \"kb.search.write\" # additionally required to call upsert-records\n", + "\n", + "key = RSAKeyPair.generate()\n", + "\n", + "# Reuse the same index and search settings, but add a server.auth block.\n", + "auth_config = copy.deepcopy(mcp_config)\n", + "auth_config[\"server\"][\"auth\"] = {\n", + " \"type\": \"jwt\",\n", + " \"public_key\": key.public_key,\n", + " \"issuer\": ISSUER,\n", + " \"audience\": AUDIENCE,\n", + " \"required_scopes\": [READ_SCOPE], # token must carry this scope to connect\n", + " \"read_scope\": READ_SCOPE, # gate for search-records\n", + " \"write_scope\": WRITE_SCOPE, # gate for upsert-records\n", + "}\n", + "\n", + "auth_config_path = (Path.cwd() / \"redisvl_mcp_guide_auth.yaml\").resolve()\n", + "auth_config_path.write_text(yaml.safe_dump(auth_config, sort_keys=False), encoding=\"utf-8\")\n", + "load_mcp_config(str(auth_config_path)) # validate the auth block the way startup will\n", + "\n", + "AUTH_HOST, AUTH_PORT = \"127.0.0.1\", 8001\n", + "AUTH_URL = f\"http://{AUTH_HOST}:{AUTH_PORT}/mcp\"\n", + "\n", + "if port_in_use(AUTH_HOST, AUTH_PORT):\n", + " raise RuntimeError(\n", + " f\"Port {AUTH_PORT} is already in use. 
Free it or pick another port for the \"\n", + " f\"authenticated server before re-running this cell.\"\n", + " )\n", + "\n", + "auth_process = subprocess.Popen(\n", + " [\n", + " \"rvl\", \"mcp\",\n", + " \"--config\", str(auth_config_path),\n", + " \"--transport\", \"streamable-http\",\n", + " \"--host\", AUTH_HOST,\n", + " \"--port\", str(AUTH_PORT),\n", + " ],\n", + " stdin=subprocess.PIPE,\n", + " stdout=subprocess.DEVNULL,\n", + " stderr=subprocess.DEVNULL,\n", + " env=os.environ.copy(),\n", + " start_new_session=True,\n", + ")\n", + "\n", + "\n", + "async def wait_until_ready_auth(url, auth, timeout=120.0):\n", + " # Readiness probe that sends a valid token, since unauthenticated handshakes\n", + " # are now rejected with 401.\n", + " deadline = time.time() + timeout\n", + " last_error = None\n", + " while time.time() < deadline:\n", + " if auth_process.poll() is not None:\n", + " raise RuntimeError(\n", + " f\"Authenticated MCP server exited early (code {auth_process.returncode})\"\n", + " )\n", + " try:\n", + " async with Client(url, auth=auth) as client:\n", + " tool_names = {t.name for t in await client.list_tools()}\n", + " if \"search-records\" in tool_names:\n", + " return tool_names\n", + " except Exception as exc: # not ready yet, keep polling\n", + " last_error = exc\n", + " time.sleep(1.0)\n", + " raise RuntimeError(\n", + " f\"Authenticated MCP server not ready after {timeout:.0f}s (last error: {last_error})\"\n", + " )\n", + "\n", + "\n", + "connect_token = key.create_token(\n", + " subject=\"notebook\", issuer=ISSUER, audience=AUDIENCE, scopes=[READ_SCOPE]\n", + ")\n", + "await wait_until_ready_auth(AUTH_URL, BearerAuth(connect_token))\n", + "print(f\"Authenticated MCP server ready (PID {auth_process.pid}) at {AUTH_URL}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Connect With and Without a Token\n", + "\n", + "The same `fastmcp` client now sends a **bearer token**. 
We exercise three callers against the authenticated server:\n", + "\n", + "- **No token** is rejected at the transport with HTTP 401.\n", + "- A **read-only token** (carries the read scope) can `search-records` but is forbidden from `upsert-records`.\n", + "- A **read+write token** (carries both scopes) can also `upsert-records`.\n", + "\n", + "This is *coarse* authorization: RedisVL authenticates the caller and gates read vs write. Mapping a token's identity to a specific Redis ACL user or per-tenant query filters is a gateway concern, covered in the [Authenticate RedisVL MCP](how_to_guides/mcp_authentication.md) guide." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# NBVAL_SKIP\n", + "from fastmcp.exceptions import ToolError\n", + "\n", + "# 1. No token at all: rejected at the transport with HTTP 401.\n", + "try:\n", + " async with Client(AUTH_URL) as client:\n", + " await client.list_tools()\n", + " raise AssertionError(\"expected the unauthenticated request to be rejected\")\n", + "except Exception as exc:\n", + " assert \"401\" in str(exc), f\"expected a 401, got {exc!r}\"\n", + " print(\"No token -> rejected (401)\")\n", + "\n", + "# 2. 
Read-only token: holds the read scope, so search works but upsert is forbidden.\n", + "read_token = key.create_token(\n", + " subject=\"reader\", issuer=ISSUER, audience=AUDIENCE, scopes=[READ_SCOPE]\n", + ")\n", + "async with Client(AUTH_URL, auth=BearerAuth(read_token)) as client:\n", + " found = await client.call_tool(\"search-records\", {\"query\": \"vector search\", \"limit\": 1})\n", + " assert found.data[\"results\"], \"the read token should be able to search\"\n", + " print(f\"Read token -> search OK ({len(found.data['results'])} result)\")\n", + " try:\n", + " await client.call_tool(\n", + " \"upsert-records\",\n", + " {\"records\": [{\"doc_id\": \"blocked\", \"title\": \"x\", \"text\": \"blocked write\"}],\n", + " \"id_field\": \"doc_id\"},\n", + " )\n", + " raise AssertionError(\"expected the read-only token to be blocked from upsert\")\n", + " except ToolError:\n", + " print(\"Read token -> upsert FORBIDDEN (missing write scope)\")\n", + "\n", + "# 3. Read+write token: holds both scopes, so upsert succeeds.\n", + "rw_token = key.create_token(\n", + " subject=\"editor\", issuer=ISSUER, audience=AUDIENCE, scopes=[READ_SCOPE, WRITE_SCOPE]\n", + ")\n", + "async with Client(AUTH_URL, auth=BearerAuth(rw_token)) as client:\n", + " written = await client.call_tool(\n", + " \"upsert-records\",\n", + " {\"records\": [{\"doc_id\": \"auth-demo\", \"title\": \"Authenticated write\",\n", + " \"text\": \"This record was written through an authenticated MCP call.\"}],\n", + " \"id_field\": \"doc_id\"},\n", + " )\n", + " assert written.data[\"keys_upserted\"] == 1, written.data\n", + " print(f\"Read+write token -> upsert OK ({written.data['keys_upserted']} record)\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -612,8 +786,15 @@ " mcp_process.wait(timeout=10)\n", " print(\"MCP server stopped.\")\n", "\n", + "if \"auth_process\" in dir() and auth_process.poll() is None:\n", + " os.killpg(os.getpgid(auth_process.pid), signal.SIGTERM)\n", + " 
auth_process.wait(timeout=10)\n", + " print(\"Authenticated MCP server stopped.\")\n", + "\n", "index.delete(drop=True)\n", "mcp_config_path.unlink(missing_ok=True)\n", + "if \"auth_config_path\" in dir():\n", + " auth_config_path.unlink(missing_ok=True)\n", "print(\"Index dropped and config removed.\")" ] }, @@ -623,13 +804,14 @@ "source": [ "## Recap\n", "\n", - "You served a single Redis index over MCP and used it two ways:\n", + "You served a single Redis index over MCP and used it several ways:\n", "\n", "1. **Index + config** : created a RedisVL index, then bound it to the `rvl mcp` server with a small YAML config. `default_embed_text_field` makes the server embed text itself, so clients send only text.\n", "2. **Direct client** : connected with `fastmcp` and called `search-records` and `upsert-records` over Streamable HTTP.\n", "3. **ADK agent** : pointed a Google ADK `LlmAgent` at the same endpoint with `McpToolset`, so the model retrieves and writes knowledge through MCP with no Redis-specific code.\n", + "4. **Authenticated access** : started a second server with a `server.auth` JWT block, then watched a no-token request get rejected, a read-only token search but fail to write, and a read+write token upsert successfully.\n", "\n", - "Any MCP-compatible client (ADK, Claude Desktop, Cursor) can reuse this exact server and index. For the full config schema, tool contracts, transports, and read-only mode, see the [Run RedisVL MCP](how_to_guides/mcp.md) how-to guide.\n" + "Any MCP-compatible client (ADK, Claude Desktop, Cursor) can reuse this exact server and index. 
For the full config schema, tool contracts, transports, and read-only mode, see the [Run RedisVL MCP](how_to_guides/mcp.md) and [Authenticate RedisVL MCP](how_to_guides/mcp_authentication.md) how-to guides.\n" ] } ], From 0738063c7c7a91a69a60fcc987e04a288131b245 Mon Sep 17 00:00:00 2001 From: Nitin Kanukolanu Date: Mon, 15 Jun 2026 18:39:51 -0400 Subject: [PATCH 3/4] docs: make MCP notebook nbval-safe per review Address Copilot review on PR #626: - Mark the %pip install cell with # NBVAL_SKIP so notebook tests use the already-synced uv env instead of overriding editable redisvl from PyPI. - Guard the pandas import (optional, only used to render result tables) so make test-notebooks does not fail when pandas is absent. - Import asyncio and use await asyncio.sleep(1.0) in both readiness pollers instead of time.sleep(1.0), which blocked the notebook event loop. --- docs/user_guide/15_mcp.ipynb | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/docs/user_guide/15_mcp.ipynb b/docs/user_guide/15_mcp.ipynb index 919c2dc2..34ebf4e8 100644 --- a/docs/user_guide/15_mcp.ipynb +++ b/docs/user_guide/15_mcp.ipynb @@ -45,6 +45,7 @@ "metadata": {}, "outputs": [], "source": [ + "# NBVAL_SKIP\n", "%pip install -q \"redisvl[mcp,sentence-transformers]>=0.20.0\" nest_asyncio pandas" ] }, @@ -67,7 +68,10 @@ "import warnings\n", "\n", "import nest_asyncio\n", - "import pandas as pd\n", + "try:\n", + " import pandas as pd # optional: only used to render result tables\n", + "except ImportError: # keep notebook tests working without pandas installed\n", + " pd = None\n", "\n", "# Notebook event loops are already running; this lets the MCP client and the\n", "# ADK runner use top-level await cleanly.\n", @@ -256,6 +260,7 @@ "outputs": [], "source": [ "# NBVAL_SKIP\n", + "import asyncio\n", "import socket\n", "import subprocess\n", "import time\n", @@ -325,7 +330,7 @@ " return tool_names\n", " except Exception as exc: # not ready yet, keep polling\n", " last_error = 
exc\n", - " time.sleep(1.0)\n", + " await asyncio.sleep(1.0)\n", " raise RuntimeError(f\"MCP server not ready after {timeout:.0f}s (last error: {last_error})\")\n", "\n", "\n", @@ -679,7 +684,7 @@ " return tool_names\n", " except Exception as exc: # not ready yet, keep polling\n", " last_error = exc\n", - " time.sleep(1.0)\n", + " await asyncio.sleep(1.0)\n", " raise RuntimeError(\n", " f\"Authenticated MCP server not ready after {timeout:.0f}s (last error: {last_error})\"\n", " )\n", From 8c703e4bc004dc76e163409a3315be07431d7f37 Mon Sep 17 00:00:00 2001 From: Nitin Kanukolanu Date: Mon, 15 Jun 2026 18:48:33 -0400 Subject: [PATCH 4/4] docs: address bot review on MCP notebook - Bump notebook install pin to redisvl>=0.21.0 since Section 6 (JWT auth) requires the auth feature introduced in v0.21.0. - Make Section 6 self-contained: import asyncio/socket/subprocess/time and Client, and define port_in_use locally, so the JWT demo runs even when Section 3 used Option A (terminal-launched server) and its cell was skipped. 
--- docs/user_guide/15_mcp.ipynb | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/docs/user_guide/15_mcp.ipynb b/docs/user_guide/15_mcp.ipynb index 34ebf4e8..013df851 100644 --- a/docs/user_guide/15_mcp.ipynb +++ b/docs/user_guide/15_mcp.ipynb @@ -46,7 +46,7 @@ "outputs": [], "source": [ "# NBVAL_SKIP\n", - "%pip install -q \"redisvl[mcp,sentence-transformers]>=0.20.0\" nest_asyncio pandas" + "%pip install -q \"redisvl[mcp,sentence-transformers]>=0.21.0\" nest_asyncio pandas" ] }, { @@ -611,8 +611,13 @@ "outputs": [], "source": [ "# NBVAL_SKIP\n", + "import asyncio\n", "import copy\n", + "import socket\n", + "import subprocess\n", + "import time\n", "\n", + "from fastmcp import Client\n", "from fastmcp.client.auth import BearerAuth\n", "from fastmcp.server.auth.providers.jwt import RSAKeyPair\n", "\n", @@ -645,6 +650,14 @@ "AUTH_HOST, AUTH_PORT = \"127.0.0.1\", 8001\n", "AUTH_URL = f\"http://{AUTH_HOST}:{AUTH_PORT}/mcp\"\n", "\n", + "# Section 6 is self-contained: define the port probe here so the cell runs even\n", + "# if you used Option A in Section 3 (terminal-launched server) and skipped its cell.\n", + "def port_in_use(host: str, port: int) -> bool:\n", + " with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:\n", + " sock.settimeout(0.5)\n", + " return sock.connect_ex((host, port)) == 0\n", + "\n", + "\n", "if port_in_use(AUTH_HOST, AUTH_PORT):\n", " raise RuntimeError(\n", " f\"Port {AUTH_PORT} is already in use. Free it or pick another port for the \"\n",