{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IR5wivW8THt7"
   },
   "source": [
    "# Tutorial: Building an Agentic RAG with Fallback to Websearch\n",
    "\n",
    "- **Level**: Intermediate\n",
    "- **Time to complete**: 10 minutes\n",
    "- **Components Used**: [`ConditionalRouter`](https://docs.haystack.deepset.ai/docs/conditionalrouter), [`SerperDevWebSearch`](https://docs.haystack.deepset.ai/docs/serperdevwebsearch), [`ChatPromptBuilder`](https://docs.haystack.deepset.ai/docs/chatpromptbuilder), [`OpenAIChatGenerator`](https://docs.haystack.deepset.ai/docs/openaichatgenerator)\n",
    "- **Prerequisites**: You must have an [OpenAI API Key](https://platform.openai.com/api-keys) and a [Serper API Key](https://serper.dev/api-key) for this tutorial\n",
    "- **Goal**: After completing this tutorial, you'll have learned how to create an agentic RAG pipeline with conditional routing that can fallback to websearch if the answer is not present in your dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "F-a-MAMVat-o"
   },
   "source": [
    "## Overview\n",
    "\n",
    "When developing applications using **retrieval augmented generation ([RAG](https://www.deepset.ai/blog/llms-retrieval-augmentation))**, the retrieval step plays a critical role. It serves as the primary information source for **large language models (LLMs)** to generate responses. However, if your database lacks the necessary information, the retrieval step's effectiveness is limited. In such scenarios, it may be practical incorporate **agentic behavior** and use the web as a fallback data source for your RAG application. By implementing a conditional routing mechanism in your system, you gain complete control over the data flow, enabling you to design a system that can leverage the web as its data source under some conditions.\n",
    "\n",
    "In this tutorial, you will learn how to create an agentic RAG pipeline with conditional routing that directs the query to a **web-based RAG** route if the answer is not found in the initially given documents."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LSwNKkeKeq0f"
   },
   "source": [
    "## Development Environment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "eGJ7GmCBas4R"
   },
   "source": [
    "### Prepare the Colab Environment\n",
    "\n",
    "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration)\n",
    "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/setting-the-log-level)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FwIgIpE2XqpO"
   },
   "source": [
    "### Install Haystack\n",
    "\n",
    "Install Haystack with `pip`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "uba0mntlqs_O"
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "pip install haystack-ai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QfECEAy2Jdqs"
   },
   "source": [
    "### Enter API Keys\n",
    "\n",
    "Enter API keys required for this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "13U7Z_k3yE-F",
    "outputId": "53eea993-2c78-4fce-d4d2-f63034940fad"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Enter OpenAI API key:··········\n",
      "Enter Serper Api key: ··········\n"
     ]
    }
   ],
   "source": [
    "from getpass import getpass\n",
    "import os\n",
    "\n",
    "if \"OPENAI_API_KEY\" not in os.environ:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")\n",
    "if \"SERPERDEV_API_KEY\" not in os.environ:\n",
    "    os.environ[\"SERPERDEV_API_KEY\"] = getpass(\"Enter Serper Api key: \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "i_AlhPv1T-4t"
   },
   "source": [
    "## Populate a Document Store\n",
    "\n",
    "Create a Document about Munich, where the answer to your question will be initially searched and write it to [InMemoryDocumentStore](https://docs.haystack.deepset.ai/docs/inmemorydocumentstore)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "5CHbQlLMyVbg",
    "outputId": "3421301e-af07-4f46-f0e3-c71e0ced379b"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from haystack.dataclasses import Document\n",
    "from haystack.document_stores.in_memory import InMemoryDocumentStore\n",
    "\n",
    "document_store = InMemoryDocumentStore()\n",
    "\n",
    "documents = [\n",
    "    Document(\n",
    "        content=\"\"\"Munich, the vibrant capital of Bavaria in southern Germany, exudes a perfect blend of rich cultural\n",
    "heritage and modern urban sophistication. Nestled along the banks of the Isar River, Munich is renowned\n",
    "for its splendid architecture, including the iconic Neues Rathaus (New Town Hall) at Marienplatz and\n",
    "the grandeur of Nymphenburg Palace. The city is a haven for art enthusiasts, with world-class museums like the\n",
    "Alte Pinakothek housing masterpieces by renowned artists. Munich is also famous for its lively beer gardens, where\n",
    "locals and tourists gather to enjoy the city's famed beers and traditional Bavarian cuisine. The city's annual\n",
    "Oktoberfest celebration, the world's largest beer festival, attracts millions of visitors from around the globe.\n",
    "Beyond its cultural and culinary delights, Munich offers picturesque parks like the English Garden, providing a\n",
    "serene escape within the heart of the bustling metropolis. Visitors are charmed by Munich's warm hospitality,\n",
    "making it a must-visit destination for travelers seeking a taste of both old-world charm and contemporary allure.\"\"\"\n",
    "    )\n",
    "]\n",
    "\n",
    "document_store.write_documents(documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zMNy0tjtUh_L"
   },
   "source": [
    "## Creating the Initial RAG Pipeline Components\n",
    "\n",
    "First, you need to initalize components for a [RAG pipeline](https://haystack.deepset.ai/tutorials/27_first_rag_pipeline). For that, define a prompt instructing the LLM to respond with the text `\"no_answer\"` if the provided documents do not offer enough context to answer the query. Next, initialize a [ChatPromptBuilder](https://docs.haystack.deepset.ai/docs/chatpromptbuilder) with that prompt. `ChatPromptBuilder` accepts prompts in the form of `ChatMessage`. It's crucial that the LLM replies with `\"no_answer\"` as you will use this keyword to indicate that the query should be directed to the fallback web search route.\n",
    "\n",
    "As the LLM, you will use an [OpenAIChatGenerator](https://docs.haystack.deepset.ai/docs/openaichatgenerator) with the `gpt-4o-mini` model.\n",
    "\n",
    "> The provided prompt works effectively with the `gpt-4o-mini` model. If you prefer to use a different [ChatGenerator](https://docs.haystack.deepset.ai/docs/generators), you may need to update the prompt to provide clear instructions to your model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "nzhn2kDfqvbs"
   },
   "outputs": [],
   "source": [
    "from haystack.components.builders import ChatPromptBuilder\n",
    "from haystack.dataclasses import ChatMessage\n",
    "from haystack.components.generators.chat import OpenAIChatGenerator\n",
    "from haystack.components.retrievers.in_memory import InMemoryBM25Retriever\n",
    "\n",
    "retriever = InMemoryBM25Retriever(document_store)\n",
    "\n",
    "prompt_template = [\n",
    "    ChatMessage.from_user(\n",
    "        \"\"\"\n",
    "Answer the following query given the documents.\n",
    "If the answer is not contained within the documents reply with 'no_answer'\n",
    "\n",
    "Documents:\n",
    "{% for document in documents %}\n",
    "  {{document.content}}\n",
    "{% endfor %}\n",
    "Query: {{query}}\n",
    "\"\"\"\n",
    "    )\n",
    "]\n",
    "\n",
    "prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables=\"*\")\n",
    "llm = OpenAIChatGenerator(model=\"gpt-4o-mini\")"
   ]
  },
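  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the template mechanics concrete, here is a minimal sketch that renders a prompt without calling the LLM. It assumes a Haystack version where `ChatPromptBuilder` accepts `required_variables` and `ChatMessage` exposes `.text`, as the rest of this tutorial does. The shortened template and the one-sentence Document are made up for illustration; they are not the tutorial's own data.\n",
    "\n",
    "```python\n",
    "from haystack.components.builders import ChatPromptBuilder\n",
    "from haystack.dataclasses import ChatMessage, Document\n",
    "\n",
    "# Shortened, illustrative version of the tutorial's prompt template\n",
    "template = [\n",
    "    ChatMessage.from_user(\n",
    "        'Documents: {% for document in documents %}{{ document.content }} {% endfor %}'\n",
    "        'Query: {{ query }}'\n",
    "    )\n",
    "]\n",
    "builder = ChatPromptBuilder(template=template, required_variables='*')\n",
    "\n",
    "# Render the template with one hand-written Document (made up for this sketch)\n",
    "rendered = builder.run(\n",
    "    documents=[Document(content='Munich is the capital of Bavaria.')],\n",
    "    query='What region of Germany is Munich in?',\n",
    ")\n",
    "\n",
    "# `prompt` holds the list of rendered ChatMessage objects\n",
    "print(rendered['prompt'][0].text)\n",
    "```\n",
    "\n",
    "Printing the rendered message is a quick way to check that the documents and the query land where you expect before spending any API calls."
   ]
  },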
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LepACkkWPsBx"
   },
   "source": [
    "## Initializing the Web-RAG Components\n",
    "\n",
    "Initialize the necessary components for a web-based RAG application. Along with a `ChatPromptBuilder` and an `OpenAIChatGenerator`, you will need a [SerperDevWebSearch](https://docs.haystack.deepset.ai/docs/serperdevwebsearch) to retrieve relevant documents for the query from the web.\n",
    "\n",
    "> If desired, you can use a different [ChatGenerator](https://docs.haystack.deepset.ai/docs/generators) for the web-based RAG branch of the pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "VEYchFgQPxZ_"
   },
   "outputs": [],
   "source": [
    "from haystack.components.websearch.serper_dev import SerperDevWebSearch\n",
    "\n",
    "prompt_for_websearch = [\n",
    "    ChatMessage.from_user(\n",
    "        \"\"\"\n",
    "Answer the following query given the documents retrieved from the web.\n",
    "Your answer should indicate that your answer was generated from websearch.\n",
    "\n",
    "Documents:\n",
    "{% for document in documents %}\n",
    "  {{document.content}}\n",
    "{% endfor %}\n",
    "\n",
    "Query: {{query}}\n",
    "\"\"\"\n",
    "    )\n",
    "]\n",
    "\n",
    "websearch = SerperDevWebSearch()\n",
    "prompt_builder_for_websearch = ChatPromptBuilder(template=prompt_for_websearch, required_variables=\"*\")\n",
    "llm_for_websearch = OpenAIChatGenerator(model=\"gpt-4o-mini\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "vnacak_tVWqv"
   },
   "source": [
    "## Creating the ConditionalRouter\n",
    "\n",
    "[ConditionalRouter](https://docs.haystack.deepset.ai/docs/conditionalrouter) is the key component that enables agentic behavior and handles data routing on specific conditions. You need to define a `condition`, an `output`, an `output_name` and an `output_type` for each route. Each route that the `ConditionalRouter` creates acts as the output of this component and can be connected to other components in the same pipeline.  \n",
    "\n",
    "In this case, you need to define two routes:\n",
    "- If the LLM replies with the `\"no_answer\"` keyword, the pipeline should perform web search. It means that you will put the original `query` in the output value to pass to the next component (in this case the next component will be the `SerperDevWebSearch`) and the output name will be `go_to_websearch`.\n",
    "- Otherwise, the given documents are enough for an answer and pipeline execution ends here. Return the LLM reply in the output named `answer`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "qyE9rGcawX3F"
   },
   "outputs": [],
   "source": [
    "from haystack.components.routers import ConditionalRouter\n",
    "\n",
    "routes = [\n",
    "    {\n",
    "        \"condition\": \"{{'no_answer' in replies[0].text}}\",\n",
    "        \"output\": \"{{query}}\",\n",
    "        \"output_name\": \"go_to_websearch\",\n",
    "        \"output_type\": str,\n",
    "    },\n",
    "    {\n",
    "        \"condition\": \"{{'no_answer' not in replies[0].text}}\",\n",
    "        \"output\": \"{{replies[0].text}}\",\n",
    "        \"output_name\": \"answer\",\n",
    "        \"output_type\": str,\n",
    "    },\n",
    "]\n",
    "\n",
    "router = ConditionalRouter(routes)"
   ]
  },
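  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two routes can be tried out on their own, without calling an LLM. The following is a minimal sketch, assuming that `ConditionalRouter.run()` takes the template variables as keyword arguments and that `ChatMessage.from_assistant()` is available in your Haystack version. The replies are hand-written stand-ins, not real model output.\n",
    "\n",
    "```python\n",
    "from haystack.components.routers import ConditionalRouter\n",
    "from haystack.dataclasses import ChatMessage\n",
    "\n",
    "# Same two routes as in the tutorial\n",
    "routes = [\n",
    "    {\n",
    "        'condition': \"{{'no_answer' in replies[0].text}}\",\n",
    "        'output': '{{query}}',\n",
    "        'output_name': 'go_to_websearch',\n",
    "        'output_type': str,\n",
    "    },\n",
    "    {\n",
    "        'condition': \"{{'no_answer' not in replies[0].text}}\",\n",
    "        'output': '{{replies[0].text}}',\n",
    "        'output_name': 'answer',\n",
    "        'output_type': str,\n",
    "    },\n",
    "]\n",
    "router = ConditionalRouter(routes)\n",
    "query = 'How many people live in Munich?'\n",
    "\n",
    "# Simulated LLM replies (hand-written stand-ins for this sketch)\n",
    "fallback = router.run(replies=[ChatMessage.from_assistant('no_answer')], query=query)\n",
    "answered = router.run(replies=[ChatMessage.from_assistant('Munich is in Bavaria.')], query=query)\n",
    "\n",
    "print(fallback)  # should only contain the `go_to_websearch` output\n",
    "print(answered)  # should only contain the `answer` output\n",
    "```\n",
    "\n",
    "Because the first condition is a substring check, any reply that merely contains `no_answer` takes the web search route. Keep this in mind if you change the prompt or the keyword."
   ]
  },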
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Wdyko78oXb5a"
   },
   "source": [
    "## Building the Agentic RAG Pipeline\n",
    "\n",
    "Add all components to your pipeline and connect them. `go_to_websearch` output of the `router` should be connected to the `websearch` to retrieve documents from the web and also to `prompt_builder_for_websearch` to use in the prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "4sCyBwc0oTVs",
    "outputId": "d7572db1-e2c6-4d2d-ecbd-9733edfc1eea"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<haystack.core.pipeline.pipeline.Pipeline object at 0x7c78c79cdfd0>\n",
       "🚅 Components\n",
       "  - retriever: InMemoryBM25Retriever\n",
       "  - prompt_builder: ChatPromptBuilder\n",
       "  - llm: OpenAIChatGenerator\n",
       "  - router: ConditionalRouter\n",
       "  - websearch: SerperDevWebSearch\n",
       "  - prompt_builder_for_websearch: ChatPromptBuilder\n",
       "  - llm_for_websearch: OpenAIChatGenerator\n",
       "🛤️ Connections\n",
       "  - retriever.documents -> prompt_builder.documents (List[Document])\n",
       "  - prompt_builder.prompt -> llm.messages (List[ChatMessage])\n",
       "  - llm.replies -> router.replies (List[ChatMessage])\n",
       "  - router.go_to_websearch -> websearch.query (str)\n",
       "  - router.go_to_websearch -> prompt_builder_for_websearch.query (str)\n",
       "  - websearch.documents -> prompt_builder_for_websearch.documents (List[Document])\n",
       "  - prompt_builder_for_websearch.prompt -> llm_for_websearch.messages (List[ChatMessage])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from haystack import Pipeline\n",
    "\n",
    "agentic_rag_pipe = Pipeline()\n",
    "agentic_rag_pipe.add_component(\"retriever\", retriever)\n",
    "agentic_rag_pipe.add_component(\"prompt_builder\", prompt_builder)\n",
    "agentic_rag_pipe.add_component(\"llm\", llm)\n",
    "agentic_rag_pipe.add_component(\"router\", router)\n",
    "agentic_rag_pipe.add_component(\"websearch\", websearch)\n",
    "agentic_rag_pipe.add_component(\"prompt_builder_for_websearch\", prompt_builder_for_websearch)\n",
    "agentic_rag_pipe.add_component(\"llm_for_websearch\", llm_for_websearch)\n",
    "\n",
    "agentic_rag_pipe.connect(\"retriever\", \"prompt_builder.documents\")\n",
    "agentic_rag_pipe.connect(\"prompt_builder.prompt\", \"llm.messages\")\n",
    "agentic_rag_pipe.connect(\"llm.replies\", \"router.replies\")\n",
    "agentic_rag_pipe.connect(\"router.go_to_websearch\", \"websearch.query\")\n",
    "agentic_rag_pipe.connect(\"router.go_to_websearch\", \"prompt_builder_for_websearch.query\")\n",
    "agentic_rag_pipe.connect(\"websearch.documents\", \"prompt_builder_for_websearch.documents\")\n",
    "agentic_rag_pipe.connect(\"prompt_builder_for_websearch\", \"llm_for_websearch\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d0HmdbUJKJ_9"
   },
   "source": [
    "### Visualize the Pipeline\n",
    "\n",
    "To understand how you formed this pipeline with conditional routing, use [show()](https://docs.haystack.deepset.ai/docs/visualizing-pipelines) method of the pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "svF_SUK4rFwv"
   },
   "outputs": [],
   "source": [
    "# agentic_rag_pipe.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jgk1z6GGYH6J"
   },
   "source": [
    "## Running the Pipeline!\n",
    "\n",
    "In the `run()`, pass the query to `retriever`, `prompt_builder`, and the `router`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "d_l4rYmCoVki",
    "outputId": "3e5671fe-d992-4a65-810c-7b216d58a2e7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Munich is the vibrant capital of Bavaria in southern Germany.\n"
     ]
    }
   ],
   "source": [
    "query = \"What region of Germany is Munich in?\"\n",
    "\n",
    "result = agentic_rag_pipe.run(\n",
    "    {\"retriever\": {\"query\": query}, \"prompt_builder\": {\"query\": query}, \"router\": {\"query\": query}}\n",
    ")\n",
    "\n",
    "# Print the `answer` coming from the ConditionalRouter\n",
    "print(result[\"router\"][\"answer\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dBN8eLSKgb16"
   },
   "source": [
    "✅ The answer to this query can be found in the defined document.\n",
    "\n",
    "Now, try a different query that doesn't have an answer in the given document and test if the web search works as expected:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "_v-WdlSy365M",
    "outputId": "fd2080db-ee20-494d-ba16-79c0ad4e073b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on recent estimates, the population of Munich is approximately 1.51 million people as of 2023. This makes it the third-largest city in Germany, following Berlin and Hamburg. Additionally, different estimates suggest the population could be around 1,456,039 or even higher depending on the source, but the figure of 1.51 million is widely recognized at this time. \n",
      "\n",
      "(Source: Web search results)\n"
     ]
    }
   ],
   "source": [
    "query = \"How many people live in Munich?\"\n",
    "\n",
    "result = agentic_rag_pipe.run(\n",
    "    {\"retriever\": {\"query\": query}, \"prompt_builder\": {\"query\": query}, \"router\": {\"query\": query}}\n",
    ")\n",
    "\n",
    "# Print the `replies` generated using the web searched Documents\n",
    "print(result[\"llm_for_websearch\"][\"replies\"][0].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wUkuXoWnHa5c"
   },
   "source": [
    "If you check the whole result, you will see that `websearch` component also provides links to Documents retrieved from the web:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "_EYLZguZGznY",
    "outputId": "0e9831c9-a5cd-4807-b341-308f184d0fce"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'websearch': {'links': ['https://www.britannica.com/place/Munich-Bavaria-Germany',\n",
       "   'https://en.wikipedia.org/wiki/Munich',\n",
       "   'https://worldpopulationreview.com/cities/germany/munich',\n",
       "   'https://en.wikipedia.org/wiki/Demographics_of_Munich',\n",
       "   'https://www.macrotrends.net/cities/204371/munich/population',\n",
       "   'https://www.statista.com/statistics/505774/munich-population/',\n",
       "   'https://www.citypopulation.de/en/germany/bayern/m%C3%BCnchen_stadt/09162000__m%C3%BCnchen/',\n",
       "   'https://eurocities.eu/cities/munich/',\n",
       "   'https://www.youtube.com/watch?v=Yw1G6a9rqus',\n",
       "   'https://www.statista.com/statistics/519723/munich-population-by-age-group/']},\n",
       " 'llm_for_websearch': {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text='Based on recent estimates, the population of Munich is approximately 1.51 million people as of 2023. This makes it the third-largest city in Germany, following Berlin and Hamburg. Additionally, different estimates suggest the population could be around 1,456,039 or even higher depending on the source, but the figure of 1.51 million is widely recognized at this time. \\n\\n(Source: Web search results)')], _name=None, _meta={'model': 'gpt-4o-mini-2024-07-18', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 86, 'prompt_tokens': 455, 'total_tokens': 541, 'completion_tokens_details': CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), 'prompt_tokens_details': PromptTokensDetails(audio_tokens=0, cached_tokens=0)}})]}}"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result"
   ]
  },
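  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the two runs show, the final answer sits under a different key depending on which branch produced it: `result[\"router\"][\"answer\"]` for the local route and `result[\"llm_for_websearch\"][\"replies\"]` for the web search route. A small helper can hide that difference. This is only a sketch: `extract_answer` is a hypothetical name, not part of Haystack, and the example dictionaries are hand-built to mimic the two result shapes, with `SimpleNamespace` standing in for a `ChatMessage`.\n",
    "\n",
    "```python\n",
    "from types import SimpleNamespace\n",
    "\n",
    "\n",
    "def extract_answer(result: dict) -> str:\n",
    "    '''Return the final answer regardless of which branch produced it.'''\n",
    "    # Local branch: the router emitted the LLM reply under `answer`\n",
    "    router_output = result.get('router', {})\n",
    "    if 'answer' in router_output:\n",
    "        return router_output['answer']\n",
    "    # Fallback branch: the web search LLM produced the reply\n",
    "    return result['llm_for_websearch']['replies'][0].text\n",
    "\n",
    "\n",
    "# Hand-built stand-ins for the two result shapes (not real pipeline output)\n",
    "local_result = {'router': {'answer': 'Answer from the document store.'}}\n",
    "web_result = {'llm_for_websearch': {'replies': [SimpleNamespace(text='Answer from web search.')]}}\n",
    "\n",
    "print(extract_answer(local_result))\n",
    "print(extract_answer(web_result))\n",
    "```\n",
    "\n",
    "With a real pipeline result, you would call `extract_answer(result)` directly after `agentic_rag_pipe.run(...)`."
   ]
  },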
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6nhdYK-vHpNM"
   },
   "source": [
    "## What's next\n",
    "\n",
    "🎉 Congratulations! You've built an agentic RAG pipeline with conditional routing! You can now customize the condition for your specific use case and create a custom Haystack pipeline to meet your needs.\n",
    "\n",
    "If you liked this tutorial, there's more to learn about Haystack:\n",
    "- [Build a Tool-Calling Agent](https://haystack.deepset.ai/tutorials/43_building_a_tool_calling_agent)\n",
    "- [Evaluating RAG Pipelines](https://haystack.deepset.ai/tutorials/35_evaluating_rag_pipelines)\n",
    "\n",
    "To stay up to date on the latest Haystack developments, you can [sign up for our newsletter](https://landing.deepset.ai/haystack-community-updates) or [join Haystack discord community](https://discord.gg/Dr63fr9NDS).\n",
    "\n",
    "Thanks for reading!"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
