{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b2s1IpH6OGZQ"
      },
      "source": [
        "# Haystack 💙 Google Gemini\n",
        "\n",
        "*by Tuana Celik: [Twitter](https://twitter.com/tuanacelik), [LinkedIn](https://www.linkedin.com/in/tuanacelik/), Tilde Thurium: [Twitter](https://twitter.com/annthurium), [LinkedIn](https://www.linkedin.com/in/annthurium/) and Silvano Cerza: [LinkedIn](https://www.linkedin.com/in/silvanocerza/)*\n",
        "\n",
        "This is a notebook showing how you can use Gemini + Vertex AI with Haystack.\n",
        "\n",
        "To use Gemini models on the Gemini Developer API with Haystack, check out our [documentation](https://docs.haystack.deepset.ai/docs/googlegenaichatgenerator).\n",
        "\n",
        "\n",
        "\n",
        "Gemini is Google's newest model. You can read more about its capabilities [here](https://deepmind.google/technologies/gemini/#capabilities).\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XromVwB1nQ76"
      },
      "source": [
        "## Install dependencies\n",
        "\n",
        "As a prerequisite, you need to have a Google Cloud Project set up that has access to Vertex AI and Gemini.\n",
        "\n",
        "Useful resources:\n",
        "- [Vertex AI quick start](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)\n",
        "- [Gemini API in Vertex AI quickstart](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart)\n",
        "\n",
        "Following that, you'll only need to authenticate yourself in this Colab.\n",
        "\n",
        "First thing first we need to install our dependencies including [Google Gen AI](https://haystack.deepset.ai/integrations/google-genai) integration:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "wxGffegfOGZR"
      },
      "outputs": [],
      "source": [
        "! pip install haystack-ai google-genai-haystack trafilatura"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tHDiEzI2OGZU"
      },
      "source": [
        "Let's login using Application Default Credentials (ADCs). For more info see the [official documentation](https://cloud.google.com/docs/authentication/provide-credentials-adc)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "iKvKRuRXOGZU"
      },
      "outputs": [],
      "source": [
        "from google.colab import auth\n",
        "\n",
        "auth.authenticate_user()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5uw5F0M_OGZT"
      },
      "source": [
        "Remember to set the `project_id` variable to a valid project ID that you have enough authorization to use for Gemini.\n",
        "We're going to use this one throughout the example!\n",
        "\n",
        "To find your project ID you can find it in the [GCP resource manager](https://console.cloud.google.com/cloud-resource-manager) or locally by running `gcloud projects list` in your terminal. For more info on the gcloud CLI see the [official documentation](https://cloud.google.com/cli)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VzA_x7iFOGZT"
      },
      "outputs": [],
      "source": [
        "project_id = input(\"Enter your project ID:\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "08TA9zAQlqy6"
      },
      "source": [
        "## Use `gemini-2.5-flash`"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c1ynAXT5mI1s"
      },
      "source": [
        "### Answer Questions\n",
        "\n",
        "Now that we setup everything we can create an instance of our [`GoogleGenAIChatGenerator`](https://docs.haystack.deepset.ai/docs/googlegenaichatgenerator). This component supports both Gemini and Vertex AI. For this demo, we will set `api=\"vertex\"`, and pass our project_id as vertex_ai_project."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "f6Ql3qSlOGZV"
      },
      "outputs": [],
      "source": [
        "from haystack_integrations.components.generators.google_genai import GoogleGenAIChatGenerator\n",
        "\n",
        "gemini = GoogleGenAIChatGenerator(model=\"gemini-2.5-flash\", api=\"vertex\", vertex_ai_project=project_id, vertex_ai_location=\"europe-west1\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0qRBY_hGOGZV"
      },
      "source": [
        "Let's start by asking something simple.\n",
        "\n",
        "This component expects a list of `ChatMessage` as input to the `run()` method. You can pass text or function calls through the messages."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "QbqFt4IiOGZV",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "7e7bcd01-56b9-4727-aa9f-c9558f580c37"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The most interesting thing I know, and one of the most profound mysteries in all of science, is that **about 95% of the universe is made of something we cannot see or directly detect: dark energy and dark matter.**\n",
            "\n",
            "Imagine if 95% of the world around you was completely invisible and unknown, yet it fundamentally shaped everything you *could* see. That's our current situation with the cosmos.\n",
            "\n",
            "*   **Dark Matter** makes up about 27% of the universe. We know it exists because of its gravitational effects – it holds galaxies together, prevents clusters from flying apart, and influenced the large-scale structure of the early universe. But it doesn't absorb, reflect, or emit light, making it \"dark.\" We don't know what particles it's made of.\n",
            "*   **Dark Energy** makes up about 68% of the universe. It's an even bigger enigma. We infer its existence because it's responsible for the accelerated expansion of the universe. It's essentially pushing the cosmos apart, overcoming the attractive force of gravity. Its nature is one of the biggest unsolved problems in physics.\n",
            "\n",
            "This means that all the stars, planets, galaxies, gas, and dust – everything we can observe with telescopes – makes up only about 5% of the universe's total mass-energy content. The vast majority of reality is utterly mysterious, and understanding it is one of the greatest scientific quests of our time. It dictates the fate of the cosmos itself.\n"
          ]
        }
      ],
      "source": [
        "from haystack.dataclasses import ChatMessage\n",
        "\n",
        "messages = [ChatMessage.from_user(\"What is the most interesting thing you know?\")]\n",
        "result = gemini.run(messages = messages)\n",
        "for answer in result[\"replies\"]:\n",
        "    print(answer.text)"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Answer Questions about Images\n",
        "\n",
        "Let's try something a bit different! `gemini-2.5-flash` can also work with images, let's see if we can have it answer questions about some robots 👇\n",
        "\n",
        "We're going to download some images for this example. 🤖"
      ],
      "metadata": {
        "id": "VjFgF37tcKB_"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from haystack.dataclasses import ImageContent\n",
        "\n",
        "urls = [\n",
        "    \"https://upload.wikimedia.org/wikipedia/en/5/5c/C-3PO_droid.png\",\n",
        "    \"https://platform.theverge.com/wp-content/uploads/sites/2/chorus/assets/4658579/terminator_endoskeleton_1020.jpg\",\n",
        "    \"https://upload.wikimedia.org/wikipedia/en/3/39/R2-D2_Droid.png\",\n",
        "]\n",
        "\n",
        "images = [ImageContent.from_url(url) for url in urls]\n",
        "\n",
        "messages = [ChatMessage.from_user(content_parts=[\"What can you tell me about these robots? Be short and graceful.\", *images])]\n",
        "result = gemini.run(messages = messages)\n",
        "for answer in result[\"replies\"]:\n",
        "    print(answer.text)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "HNLuSPXKcUoi",
        "outputId": "fdd4df25-8628-4162-9e0e-228ca21f3a69"
      },
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "These are iconic robots from popular culture:\n",
            "\n",
            "1.  **C-3PO:** A refined protocol droid, fluent in countless languages, known for his golden appearance and nervous demeanor.\n",
            "2.  **T-800 Endoskeleton:** A formidable, relentless combat machine, skeletal and chilling, from a dystopian future.\n",
            "3.  **R2-D2:** A courageous and resourceful astromech, full of personality, who communicates in beeps and whistles.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gLIIQ4PZmX-H"
      },
      "source": [
        "## Function Calling with `gemini-2.5-flash`"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iCiSz840mfME"
      },
      "source": [
        "\n",
        "With `gemini-2.5-flash`, we can also use function calling!\n",
        "So let's see how we can do that 👇\n",
        "\n",
        "Let's see if we can build a system that can run a `get_current_weather` function, based on a question asked in natural language.\n",
        "\n",
        "First we create our function definition and tool (learn more about [Tools](https://docs.haystack.deepset.ai/docs/tool) in the docs).\n",
        "\n",
        "For demonstration purposes, we're simply creating a `get_current_weather` function that returns an object which will _always_ tell us it's 'Sunny, and 21.8 degrees'... If it's Celsius, that's a good day! ☀️"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "id": "4Wa_IoDDNg9V"
      },
      "outputs": [],
      "source": [
        "from haystack.components.tools import ToolInvoker\n",
        "from haystack.tools import tool\n",
        "from typing import Annotated\n",
        "\n",
        "@tool\n",
        "def get_current_weather(\n",
        "    location: Annotated[str, \"The city for which to get the weather, e.g. 'San Francisco'\"] = \"Munich\",\n",
        "    unit: Annotated[str, \"The unit for the temperature, e.g. 'celsius'\"] = \"celsius\",\n",
        "):\n",
        "  return {\"weather\": \"sunny\", \"temperature\": 21.8, \"unit\": unit}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "id": "HD4G61Z0OGZX",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "dd732a1d-cd76-4504-b7f1-e91be9c0bb68"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=''), ToolCall(tool_name='get_current_weather', arguments={'unit': 'celsius', 'location': 'Berlin'}, id=None)], _name=None, _meta={'model': 'gemini-2.5-flash', 'finish_reason': 'stop', 'usage': {'prompt_tokens': 53, 'completion_tokens': 10, 'total_tokens': 126}})]\n"
          ]
        }
      ],
      "source": [
        "user_message = [ChatMessage.from_user(\"What is the temperature in celsius in Berlin?\")]\n",
        "replies = gemini.run(messages=user_message, tools=[get_current_weather])[\"replies\"]\n",
        "print(replies)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bbEpArmfOGZX"
      },
      "source": [
        "Look at that! We go a message with some interesting information now.\n",
        "We can use that information to call a real function locally.\n",
        "\n",
        "Let's do exactly that and pass the result back to Gemini."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "id": "c-DurWKOOSOk",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "e43f370d-84c5-4d1c-9e97-89f8d26c3824"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[ChatMessage(_role=<ChatRole.TOOL: 'tool'>, _content=[ToolCallResult(result=\"{'weather': 'sunny', 'temperature': 21.8, 'unit': 'celsius'}\", origin=ToolCall(tool_name='get_current_weather', arguments={'unit': 'celsius', 'location': 'Berlin'}, id=None), error=False)], _name=None, _meta={})]\n",
            "The temperature in Berlin is 21.8°C and it's sunny.\n"
          ]
        }
      ],
      "source": [
        "tool_invoker = ToolInvoker(tools=[get_current_weather])\n",
        "tool_messages = tool_invoker.run(messages=replies)[\"tool_messages\"]\n",
        "print(tool_messages)\n",
        "\n",
        "messages = user_message + replies + tool_messages\n",
        "\n",
        "res = gemini.run(messages = messages)\n",
        "print(res[\"replies\"][0].text)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YsiwN7dbOwSW"
      },
      "source": [
        "Seems like the weather is nice and sunny, remember to put on your sunglasses. 😎"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Pi37EVlDenPw"
      },
      "source": [
        "## Build a full Retrieval-Augmented Generation Pipeline with `gemini-2.5-flash`"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fQz9_N46hniU"
      },
      "source": [
        "As a final exercise, let's add the `GoogleGenAIChatGenerator` to a full RAG pipeline. In the example below, we are building a RAG pipeline that does question answering on the web, using `gemini-2.5-flash`"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "43QF7y1PhRQn"
      },
      "outputs": [],
      "source": [
        "from haystack.components.fetchers.link_content import LinkContentFetcher\n",
        "from haystack.components.converters import HTMLToDocument\n",
        "from haystack.components.preprocessors import DocumentSplitter\n",
        "from haystack.components.rankers import SentenceTransformersSimilarityRanker\n",
        "from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder\n",
        "from haystack import Pipeline\n",
        "\n",
        "fetcher = LinkContentFetcher()\n",
        "converter = HTMLToDocument()\n",
        "document_splitter = DocumentSplitter(split_by=\"word\", split_length=50)\n",
        "similarity_ranker = SentenceTransformersSimilarityRanker(top_k=3)\n",
        "gemini = GoogleGenAIChatGenerator(model=\"gemini-2.5-flash\", api=\"vertex\", vertex_ai_project=project_id, vertex_ai_location=\"europe-west1\")\n",
        "\n",
        "prompt_template = [ChatMessage.from_user(\"\"\"\n",
        "According to these documents:\n",
        "\n",
        "{% for doc in documents %}\n",
        "  {{ doc.content }}\n",
        "{% endfor %}\n",
        "\n",
        "Answer the given question: {{question}}\n",
        "Answer:\n",
        "\"\"\")]\n",
        "prompt_builder = ChatPromptBuilder(template=prompt_template)\n",
        "\n",
        "pipeline = Pipeline()\n",
        "pipeline.add_component(\"fetcher\", fetcher)\n",
        "pipeline.add_component(\"converter\", converter)\n",
        "pipeline.add_component(\"splitter\", document_splitter)\n",
        "pipeline.add_component(\"ranker\", similarity_ranker)\n",
        "pipeline.add_component(\"prompt_builder\", prompt_builder)\n",
        "pipeline.add_component(\"gemini\", gemini)\n",
        "\n",
        "pipeline.connect(\"fetcher.streams\", \"converter.sources\")\n",
        "pipeline.connect(\"converter.documents\", \"splitter.documents\")\n",
        "pipeline.connect(\"splitter.documents\", \"ranker.documents\")\n",
        "pipeline.connect(\"ranker.documents\", \"prompt_builder.documents\")\n",
        "pipeline.connect(\"prompt_builder.prompt\", \"gemini\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cGqt94B0fZi1"
      },
      "source": [
        "Let's try asking Gemini to tell us about Haystack and how to use it."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "id": "EhEx8xO7jMf9",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "6d09c658-b29b-42c8-a156-a1208d381217"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "In Haystack, pipelines are structured as graphs. Specifically, Haystack 1.x pipelines were based on Directed Acyclic Graphs (DAGs). In Haystack 2.0, the \"A\" (acyclic) is being removed from DAG, meaning pipelines can now branch out, join, and cycle back to other components, allowing for more complex graph structures that can retry or loop.\n"
          ]
        }
      ],
      "source": [
        "question = \"What do graphs have to do with Haystack?\"\n",
        "result = pipeline.run({\"prompt_builder\": {\"question\": question},\n",
        "                   \"ranker\": {\"query\": question},\n",
        "                   \"fetcher\": {\"urls\": [\"https://haystack.deepset.ai/blog/introducing-haystack-2-beta-and-advent\"]}})\n",
        "\n",
        "for message in result[\"gemini\"][\"replies\"]:\n",
        "  print(message.text)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xk2SAGT_l25K"
      },
      "source": [
        "Now you've seen some of what Gemini can do, as well as how to integrate it with Haystack. If you want to learn more, check out the Haystack [docs](https://docs.haystack.deepset.ai/docs) or [tutorials](https://haystack.deepset.ai/tutorials)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": ".venv",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.13.5"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}