{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2ErVy6A2NisJ"
   },
   "source": [
     "## Tutorial: Build an Extractive QA Pipeline\n",
    "\n",
    "- **Level**: Beginner\n",
    "- **Time to complete**: 15 minutes\n",
    "- **Components Used**: [`ExtractiveReader`](https://docs.haystack.deepset.ai/docs/extractivereader), [`InMemoryDocumentStore`](https://docs.haystack.deepset.ai/docs/inmemorydocumentstore), [`InMemoryEmbeddingRetriever`](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever), [`DocumentWriter`](https://docs.haystack.deepset.ai/docs/documentwriter), [`SentenceTransformersDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder), [`SentenceTransformersTextEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder)\n",
     "- **Goal**: After completing this tutorial, you'll have learned how to build a Haystack pipeline that uses an extractive model to find answers to your queries and show exactly where in the source text they appear."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uClfTB7jN6g-"
   },
   "source": [
    "## Overview\n",
    "\n",
     "What is extractive question answering? So glad you asked! The short answer is that extractive models pull verbatim answers out of text. This makes them a good fit for use cases where accuracy is paramount and you need to know exactly where in the text the answer came from. If you want additional context, here's [a deep dive on extractive versus generative language models](https://haystack.deepset.ai/blog/generative-vs-extractive-models).\n",
     "\n",
     "In this tutorial, you'll create a Haystack pipeline that extracts answers to questions based on the provided documents.\n",
    "\n",
    "To get data into the extractive pipeline, you'll also build an indexing pipeline to ingest the [Wikipedia pages of Seven Wonders of the Ancient World dataset](https://en.wikipedia.org/wiki/Wonders_of_the_World)."
   ]
  },
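  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The core idea is easy to show without any model: an extractive answer is a verbatim span of the source text, identified by its position. Here's a toy sketch (plain Python, no Haystack involved; the example sentence is made up for illustration):\n",
    "\n",
    "```python\n",
    "context = 'The Great Pyramid of Giza was built for the pharaoh Khufu.'\n",
    "answer = 'Khufu'\n",
    "\n",
    "# An extractive answer is a literal substring of the context,\n",
    "# so we can always point to exactly where it came from.\n",
    "start = context.find(answer)\n",
    "end = start + len(answer)\n",
    "print((start, end), context[start:end])\n",
    "```\n",
    "\n",
    "An extractive model does the hard part of deciding *which* span answers the question, but what it returns is always traceable back to the text in this way."
   ]
  },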
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zQnSZtyyUJVF"
   },
   "source": [
     "## Installation\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "rwgpwV4eHVoo"
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "pip install haystack-ai accelerate \"sentence-transformers>=4.1.0\" \"datasets>=2.6.1\" \"transformers<5\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "U8I641xobh_w"
   },
   "source": [
    "## Load data into the `DocumentStore`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b2HaHlBrSvLa"
   },
   "source": [
    "Before you can use this data in the extractive pipeline, you'll use an indexing pipeline to fetch it, process it, and load it into the document store.\n",
    "\n",
    "\n",
     "The data has already been cleaned and preprocessed, so turning it into Haystack `Documents` is fairly straightforward.\n",
    "\n",
    "Using an `InMemoryDocumentStore` here keeps things simple. However, this general approach would work with [any document store that Haystack supports](https://docs.haystack.deepset.ai/docs/document-store).\n",
    "\n",
    "The `SentenceTransformersDocumentEmbedder` transforms each `Document` into a vector. Here we've used [`sentence-transformers/multi-qa-mpnet-base-dot-v1`](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1). You can substitute any embedding model you like, as long as you use the same one in your extractive pipeline.\n",
    "\n",
    "Lastly, the `DocumentWriter` writes the vectorized documents to the `DocumentStore`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ttuq7kLtaV5b",
    "outputId": "01877b76-f083-4a94-a90e-6717bcecc3d3"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'writer': {'documents_written': 151}}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from datasets import load_dataset\n",
    "from haystack import Document\n",
    "from haystack import Pipeline\n",
    "from haystack.document_stores.in_memory import InMemoryDocumentStore\n",
    "from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever\n",
    "from haystack.components.readers import ExtractiveReader\n",
    "from haystack.components.embedders import SentenceTransformersDocumentEmbedder\n",
    "from haystack.components.writers import DocumentWriter\n",
    "\n",
    "\n",
    "dataset = load_dataset(\"bilgeyucel/seven-wonders\", split=\"train\")\n",
    "\n",
    "documents = [Document(content=doc[\"content\"], meta=doc[\"meta\"]) for doc in dataset]\n",
    "\n",
    "model = \"sentence-transformers/multi-qa-mpnet-base-dot-v1\"\n",
    "\n",
    "document_store = InMemoryDocumentStore()\n",
    "\n",
    "indexing_pipeline = Pipeline()\n",
    "\n",
    "indexing_pipeline.add_component(instance=SentenceTransformersDocumentEmbedder(model=model), name=\"embedder\")\n",
    "indexing_pipeline.add_component(instance=DocumentWriter(document_store=document_store), name=\"writer\")\n",
    "indexing_pipeline.connect(\"embedder.documents\", \"writer.documents\")\n",
    "\n",
    "indexing_pipeline.run({\"documents\": documents})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "r5CL5VXaVQqE"
   },
   "source": [
    "## Build an Extractive QA Pipeline\n",
    "\n",
     "Your extractive QA pipeline will consist of three components: an embedder, a retriever, and a reader.\n",
     "\n",
     "- The `SentenceTransformersTextEmbedder` turns a query into a vector, using the same embedding model defined above.\n",
     "\n",
     "- Vector search allows the retriever to efficiently return relevant documents from the document store. Retrievers are tightly coupled with document stores, so you'll use an `InMemoryEmbeddingRetriever` to go with the `InMemoryDocumentStore`.\n",
     "\n",
     "- The `ExtractiveReader` returns answers to the query, along with their exact locations in the source documents and confidence scores.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "xZGGv8yaHZtV"
   },
   "outputs": [],
   "source": [
     "from haystack import Pipeline\n",
     "from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever\n",
     "from haystack.components.readers import ExtractiveReader\n",
     "from haystack.components.embedders import SentenceTransformersTextEmbedder\n",
    "\n",
    "\n",
    "retriever = InMemoryEmbeddingRetriever(document_store=document_store)\n",
    "reader = ExtractiveReader()\n",
    "reader.warm_up()\n",
    "\n",
    "extractive_qa_pipeline = Pipeline()\n",
    "\n",
    "extractive_qa_pipeline.add_component(instance=SentenceTransformersTextEmbedder(model=model), name=\"embedder\")\n",
    "extractive_qa_pipeline.add_component(instance=retriever, name=\"retriever\")\n",
    "extractive_qa_pipeline.add_component(instance=reader, name=\"reader\")\n",
    "\n",
    "extractive_qa_pipeline.connect(\"embedder.embedding\", \"retriever.query_embedding\")\n",
    "extractive_qa_pipeline.connect(\"retriever.documents\", \"reader.documents\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Yxi6IYXPZMFw"
   },
   "source": [
    "Try extracting some answers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "mvG3XJOtZR79"
   },
   "outputs": [],
   "source": [
    "query = \"Who was Pliny the Elder?\"\n",
    "extractive_qa_pipeline.run(\n",
    "    data={\"embedder\": {\"text\": query}, \"retriever\": {\"top_k\": 3}, \"reader\": {\"query\": query, \"top_k\": 2}}\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GOKWgMDCWGRd"
   },
   "source": [
    "## `ExtractiveReader`: a closer look\n",
    "\n",
    "Here's an example answer:\n",
    "```python\n",
     "[ExtractedAnswer(query='Who was Pliny the Elder?', score=0.8306006193161011, data='Roman writer', document=Document(id=bb2c5f3d2e2e2bf28d599c7b686ab47ba10fbc13c07279e612d8632af81e5d71, content: 'The Roman writer Pliny the Elder, writing in the first century AD, argued that the Great Pyramid had...', meta: {'url': 'https://en.wikipedia.org/wiki/Great_Pyramid_of_Giza', '_split_id': 16}))]\n",
    "```\n",
    "\n",
     "The confidence score ranges from 0 to 1. Higher scores mean the model is more confident in the answer's relevance.\n",
     "\n",
     "The Reader sorts the answers by probability score, with the highest listed first. You can limit the number of answers the Reader returns with the optional `top_k` parameter.\n",
     "\n",
     "By default, the Reader is initialized with `no_answer=True`. This setting adds an extra `ExtractedAnswer` with no text, whose score is the probability that none of the returned answers is correct:\n",
    "\n",
    "```python\n",
     "ExtractedAnswer(query='Who was Pliny the Elder?', score=0.04606167031102615, data=None, document=None, context=None, document_offset=None, context_offset=None, meta={})\n",
    "```\n",
    "\n",
     "A score of `0.04606167031102615` means the model is fairly confident that the provided answers are correct in this case. You can disable this behavior and return only extracted answers by setting `no_answer=False` when initializing your `ExtractiveReader`.\n"
   ]
  },
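  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how you might use these scores downstream, here's a minimal sketch of threshold-based filtering. It uses plain dicts standing in for `ExtractedAnswer` objects, and the `filter_answers` helper is a hypothetical name for illustration, not part of Haystack:\n",
    "\n",
    "```python\n",
    "# Answers as the Reader would rank them: highest confidence first,\n",
    "# with the optional no-answer entry (data=None) at the end.\n",
    "answers = [\n",
    "    {'data': 'Roman writer', 'score': 0.8306006193161011},\n",
    "    {'data': None, 'score': 0.04606167031102615},\n",
    "]\n",
    "\n",
    "def filter_answers(answers, min_score=0.5):\n",
    "    # Drop the no-answer entry and anything below the confidence threshold.\n",
    "    return [a for a in answers if a['data'] is not None and a['score'] >= min_score]\n",
    "\n",
    "print(filter_answers(answers))\n",
    "```\n",
    "\n",
    "Picking a sensible `min_score` depends on your use case: a higher threshold trades recall for precision."
   ]
  },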
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e8EQJuGVY_GG"
   },
   "source": [
    "## Wrapping it up\n",
    "\n",
    "If you've been following along, now you know how to build an extractive question answering pipeline with Haystack. 🎉 Thanks for reading!\n",
    "\n",
    "\n",
    "If you liked this tutorial, there's more to learn about Haystack:\n",
    "- [Classifying Documents & Queries by Language](https://haystack.deepset.ai/tutorials/32_classifying_documents_and_queries_by_language)\n",
     "- [Generating Structured Output with Loop-Based Auto-Correction](https://haystack.deepset.ai/tutorials/28_structured_output_with_loop)\n",
    "- [Preprocessing Different File Types](https://haystack.deepset.ai/tutorials/30_file_type_preprocessing_index_pipeline)\n",
    "\n",
    "To stay up to date on the latest Haystack developments, you can [sign up for our newsletter](https://landing.deepset.ai/haystack-community-updates)."
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
