{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cFFW8D-weE2S"
   },
   "source": [
    "# Tutorial: Serializing LLM Pipelines\n",
    "\n",
    "- **Level**: Beginner\n",
    "- **Time to complete**: 10 minutes\n",
    "- **Components Used**: [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator), [`ChatPromptBuilder`](https://docs.haystack.deepset.ai/docs/chatpromptbuilder)\n",
    "- **Prerequisites**: None\n",
    "- **Goal**: After completing this tutorial, you'll understand how to serialize and deserialize between YAML and Python code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DxhqjpHfenQl"
   },
   "source": [
    "## Overview\n",
    "\n",
    "**📚 Useful Documentation:** [Serialization](https://docs.haystack.deepset.ai/docs/serialization)\n",
    "\n",
    "Serialization means converting a pipeline to a format that you can save on your disk and load later. It's especially useful because a serialized pipeline can be saved on disk or a database, get sent over a network and more. \n",
    "\n",
    "Although it's possible to serialize into other formats too, Haystack supports YAML out of the box to make it easy for humans to make changes without the need to go back and forth with Python code. In this tutorial, we will create a very simple pipeline in Python code, serialize it into YAML, make changes to it, and deserialize it back into a Haystack `Pipeline`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TLaHxdJcfWtI"
   },
   "source": [
    "## Installing Haystack\n",
    "\n",
    "Install Haystack with `pip`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "CagzMFdkeBBp",
    "outputId": "e304450a-24e3-4ef8-e642-1fbb573e7d55"
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "pip install haystack-ai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kS8rz9gGgMBb"
   },
   "source": [
    "## Creating a Simple Pipeline\n",
    "\n",
    "First, let's create a very simple pipeline that expects a `topic` from the user, and generates a summary about the topic with `Qwen/Qwen2.5-1.5B-Instruct`. Feel free to modify the pipeline as you wish. Note that in this pipeline we are using a local model that we're getting from Hugging Face. We're using a relatively small, open-source LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "odZJjD7KgO1g"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<haystack.core.pipeline.pipeline.Pipeline object at 0x13cc77370>\n",
       "🚅 Components\n",
       "  - builder: ChatPromptBuilder\n",
       "  - llm: HuggingFaceLocalChatGenerator\n",
       "🛤️ Connections\n",
       "  - builder.prompt -> llm.messages (List[ChatMessage])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from haystack import Pipeline\n",
    "from haystack.components.builders import ChatPromptBuilder\n",
    "from haystack.dataclasses import ChatMessage\n",
    "from haystack.components.generators.chat import HuggingFaceLocalChatGenerator\n",
    "\n",
    "template = [\n",
    "    ChatMessage.from_user(\n",
    "        \"\"\"\n",
    "Please create a summary about the following topic:\n",
    "{{ topic }}\n",
    "\"\"\"\n",
    "    )\n",
    "]\n",
    "\n",
    "builder = ChatPromptBuilder(template=template)\n",
    "llm = HuggingFaceLocalChatGenerator(model=\"Qwen/Qwen2.5-1.5B-Instruct\", generation_kwargs={\"max_new_tokens\": 150})\n",
    "\n",
    "pipeline = Pipeline()\n",
    "pipeline.add_component(name=\"builder\", instance=builder)\n",
    "pipeline.add_component(name=\"llm\", instance=llm)\n",
    "\n",
    "pipeline.connect(\"builder.prompt\", \"llm.messages\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "W-onTCXfqFjG",
    "outputId": "e81cd5ea-db66-4f0e-f787-5aed7a7b4692"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Climate change is a global issue that has been on the rise in recent years due to various factors such as human activities, natural disasters, and climate variability. The impacts of climate change can be seen in extreme weather events, rising sea levels, melting ice caps, and changing ecosystems.\n",
      "There are different types of climate change, including global warming, local climate change, and regional climate change. Global warming refers to changes in the Earth's temperature caused by greenhouse gas emissions from human activities, while local climate change refers to changes in the climate of specific areas or regions.\n",
      "To address climate change, there needs to be a concerted effort from governments, businesses, and individuals around the world to reduce greenhouse gas emissions, protect natural habitats, and invest in renewable energy sources\n"
     ]
    }
   ],
   "source": [
    "topic = \"Climate change\"\n",
    "result = pipeline.run(data={\"builder\": {\"topic\": topic}})\n",
    "print(result[\"llm\"][\"replies\"][0].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "61r7hc1vuUMH"
   },
   "source": [
    "## Serialize the Pipeline to YAML\n",
    "\n",
    "Out of the box, Haystack supports YAML. Use `dumps()` to convert the pipeline to YAML:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "vYOEAesbrn4w",
    "outputId": "ef037904-79f4-46a4-c8e7-d03ea8dcb6c2"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "components:\n",
      "  builder:\n",
      "    init_parameters:\n",
      "      required_variables: null\n",
      "      template:\n",
      "      - _content:\n",
      "        - text: '\n",
      "\n",
      "          Please create a summary about the following topic:\n",
      "\n",
      "          {{ topic }}\n",
      "\n",
      "          '\n",
      "        _meta: {}\n",
      "        _name: null\n",
      "        _role: user\n",
      "      variables: null\n",
      "    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder\n",
      "  llm:\n",
      "    init_parameters:\n",
      "      generation_kwargs:\n",
      "        max_new_tokens: 150\n",
      "        stop_sequences: []\n",
      "      huggingface_pipeline_kwargs:\n",
      "        device: cpu\n",
      "        model: Qwen/Qwen2.5-1.5B-Instruct\n",
      "        task: text-generation\n",
      "      streaming_callback: null\n",
      "      token:\n",
      "        env_vars:\n",
      "        - HF_API_TOKEN\n",
      "        - HF_TOKEN\n",
      "        strict: false\n",
      "        type: env_var\n",
      "    type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator\n",
      "connections:\n",
      "- receiver: llm.messages\n",
      "  sender: builder.prompt\n",
      "max_runs_per_component: 100\n",
      "metadata: {}\n",
      "\n"
     ]
    }
   ],
   "source": [
    "yaml_pipeline = pipeline.dumps()\n",
    "\n",
    "print(yaml_pipeline)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0C7zGsUCGszq"
   },
   "source": [
    "You should get a pipeline YAML that looks like the following:\n",
    "\n",
    "```yaml\n",
    "components:\n",
    "  builder:\n",
    "    init_parameters:\n",
    "      required_variables: null\n",
    "      template:\n",
    "      - _content:\n",
    "        - text: '\n",
    "\n",
    "            Please create a summary about the following topic:\n",
    "\n",
    "            {{ topic }}\n",
    "\n",
    "            '\n",
    "        _meta: {}\n",
    "        _name: null\n",
    "        _role: user\n",
    "      variables: null\n",
    "    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder\n",
    "  llm:\n",
    "    init_parameters:\n",
    "      init_parameters:\n",
    "      generation_kwargs:\n",
    "        max_new_tokens: 150\n",
    "        stop_sequences: []\n",
    "      huggingface_pipeline_kwargs:\n",
    "        device: cpu\n",
    "        model: Qwen/Qwen2.5-1.5B-Instruct\n",
    "        task: text-generation\n",
    "      streaming_callback: null\n",
    "      token:\n",
    "        env_vars:\n",
    "        - HF_API_TOKEN\n",
    "        - HF_TOKEN\n",
    "        strict: false\n",
    "        type: env_var\n",
    "    type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator\n",
    "connections:\n",
    "- receiver: llm.messages\n",
    "  sender: builder.prompt\n",
    "max_runs_per_component: 100\n",
    "metadata: {}\n",
    "\n",
    "```"
   ]
  },
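  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `dumps()` returns a plain string, saving the serialized pipeline to disk can be as simple as writing a text file. The cell below is a minimal sketch that uses only the standard library: `yaml_text` is a shortened stand-in for the `dumps()` output, and `pipeline.yaml` is a hypothetical file name. In this notebook you could write `yaml_pipeline` instead and later pass the text you read back to `Pipeline.loads()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "# Shortened stand-in for the string returned by pipeline.dumps() (assumption for illustration).\n",
    "yaml_text = \"\"\"components: {}\n",
    "connections: []\n",
    "max_runs_per_component: 100\n",
    "metadata: {}\n",
    "\"\"\"\n",
    "\n",
    "with tempfile.TemporaryDirectory() as tmp_dir:\n",
    "    # Hypothetical file name; any writable path works.\n",
    "    yaml_path = Path(tmp_dir) / \"pipeline.yaml\"\n",
    "    yaml_path.write_text(yaml_text, encoding=\"utf-8\")\n",
    "\n",
    "    # Later, or in another process, read the same text back.\n",
    "    restored_text = yaml_path.read_text(encoding=\"utf-8\")\n",
    "\n",
    "print(restored_text == yaml_text)"
   ]
  },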
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "f9MknQ-1vQ8r"
   },
   "source": [
    "## Editing a Pipeline in YAML\n",
    "\n",
    "Let's see how we can make changes to serialized pipelines. For example, below, let's modify the `ChatPromptBuilder`'s template to translate provided `sentence` to French:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "U332-VjovFfn"
   },
   "outputs": [],
   "source": [
    "yaml_pipeline = \"\"\"\n",
    "components:\n",
    "  builder:\n",
    "    init_parameters:\n",
    "      template:\n",
    "      - _content:\n",
    "        - text: 'Please translate the following to French: \\n{{ sentence }}\\n'\n",
    "        _meta: {}\n",
    "        _name: null\n",
    "        _role: user\n",
    "      variables: null\n",
    "    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder\n",
    "  llm:\n",
    "    init_parameters:\n",
    "      generation_kwargs:\n",
    "        max_new_tokens: 150\n",
    "        stop_sequences: []\n",
    "      huggingface_pipeline_kwargs:\n",
    "        device: cpu\n",
    "        model: Qwen/Qwen2.5-1.5B-Instruct\n",
    "        task: text-generation\n",
    "      streaming_callback: null\n",
    "      chat_template : \"{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ '  ' }}{% endif %}{% endfor %}{{ eos_token }}\"\n",
    "      token:\n",
    "        env_vars:\n",
    "        - HF_API_TOKEN\n",
    "        - HF_TOKEN\n",
    "        strict: false\n",
    "        type: env_var\n",
    "    type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator\n",
    "connections:\n",
    "- receiver: llm.messages\n",
    "  sender: builder.prompt\n",
    "max_runs_per_component: 100\n",
    "metadata: {}\n",
    "\"\"\""
   ]
  },
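  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The edited template replaces `{{ topic }}` with `{{ sentence }}`, so the input passed to `run()` has to use the new name. One quick way to check which variables a template expects is a regular-expression scan. This is a rough sketch: `template_variables` is a hypothetical helper (not part of Haystack), and it only recognizes plain `{{ name }}` placeholders, not Jinja filters or expressions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "\n",
    "def template_variables(template_text: str) -> list[str]:\n",
    "    \"\"\"Return plain {{ name }} placeholders in order of first appearance.\"\"\"\n",
    "    names = re.findall(r\"\\{\\{\\s*([A-Za-z_]\\w*)\\s*\\}\\}\", template_text)\n",
    "    # dict.fromkeys drops duplicates while keeping the original order.\n",
    "    return list(dict.fromkeys(names))\n",
    "\n",
    "\n",
    "original_template = \"Please create a summary about the following topic:\\n{{ topic }}\\n\"\n",
    "edited_template = \"Please translate the following to French: \\n{{ sentence }}\\n\"\n",
    "\n",
    "print(template_variables(original_template))\n",
    "print(template_variables(edited_template))"
   ]
  },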
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xLBtgY0Ov8nX"
   },
   "source": [
    "## Deseriazling a YAML Pipeline back to Python\n",
    "\n",
    "You can deserialize a pipeline by calling `loads()`. Below, we're deserializing our edited `yaml_pipeline`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "OdlLnw-9wVN-"
   },
   "outputs": [],
   "source": [
    "from haystack import Pipeline\n",
    "\n",
    "new_pipeline = Pipeline.loads(yaml_pipeline)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "eVPh2cV6wcu9"
   },
   "source": [
    "Now we can run the new pipeline we defined in YAML. We had changed it so that the `ChatPromptBuilder` expects a `sentence` and translates the sentence to French:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "oGLi3EB_wbu6",
    "outputId": "ec6eae9f-a7ea-401d-c0ab-792748f6db6f"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'llm': {'replies': [ChatMessage(content='J'aime les capybaras', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'finish_reason': 'stop', 'index': 0, 'model': 'Qwen/Qwen2.5-1.5B-Instruct', 'usage': {'completion_tokens': 13, 'prompt_tokens': 16, 'total_tokens': 29}})]}}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_pipeline.run(data={\"builder\": {\"sentence\": \"I love capybaras\"}})"
   ]
  },
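  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "YAML is the default, but the overview notes that other formats are possible. As far as the Serialization documentation describes it, `dumps()` and `loads()` accept a `marshaller` argument: an object with a `marshal()` method that turns a dictionary into a string and an `unmarshal()` method that does the reverse. Below is a minimal sketch of a JSON marshaller built on the standard library. `JsonMarshaller` is a hypothetical name, the dictionary is a trimmed stand-in for a pipeline's dictionary form, and the method signatures are an assumption to check against your installed Haystack version. If they match, the same object could be passed as `pipeline.dumps(marshaller=JsonMarshaller())`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from typing import Any, Dict, Union\n",
    "\n",
    "\n",
    "class JsonMarshaller:\n",
    "    \"\"\"Hypothetical marshaller that uses JSON instead of YAML.\"\"\"\n",
    "\n",
    "    def marshal(self, dict_: Dict[str, Any]) -> str:\n",
    "        # Turn the pipeline dictionary into an indented JSON string.\n",
    "        return json.dumps(dict_, indent=2)\n",
    "\n",
    "    def unmarshal(self, data_: Union[str, bytes, bytearray]) -> Dict[str, Any]:\n",
    "        # Parse JSON text back into a dictionary.\n",
    "        return json.loads(data_)\n",
    "\n",
    "\n",
    "# Trimmed stand-in for a pipeline's dictionary form (assumption for illustration).\n",
    "pipeline_dict = {\n",
    "    \"components\": {},\n",
    "    \"connections\": [{\"sender\": \"builder.prompt\", \"receiver\": \"llm.messages\"}],\n",
    "    \"max_runs_per_component\": 100,\n",
    "    \"metadata\": {},\n",
    "}\n",
    "\n",
    "marshaller = JsonMarshaller()\n",
    "json_text = marshaller.marshal(pipeline_dict)\n",
    "print(json_text)\n",
    "\n",
    "# Round trip: the parsed JSON should equal the dictionary we started with.\n",
    "print(marshaller.unmarshal(json_text) == pipeline_dict)"
   ]
  },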
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What's next\n",
    "\n",
    "🎉 Congratulations! You've serialized a pipeline into YAML, edited it and ran it again!\n",
    "\n",
    "If you liked this tutorial, you may also enjoy:\n",
    "-  [Creating Your First QA Pipeline with Retrieval-Augmentation](https://haystack.deepset.ai/tutorials/27_first_rag_pipeline)\n",
    "\n",
    "To stay up to date on the latest Haystack developments, you can [sign up for our newsletter](https://landing.deepset.ai/haystack-community-updates). Thanks for reading!"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
