{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "3b0435cb", "metadata": {}, "source": [ "# Question answering using embeddings-based search\n", "\n", "GPT excels at answering questions, but only on topics it remembers from its training data.\n", "\n", "What should you do if you want GPT to answer questions about unfamiliar topics? E.g.,\n", "- Recent events after Sep 2021\n", "- Your non-public documents\n", "- Information from past conversations\n", "- etc.\n", "\n", "This notebook demonstrates a two-step Search-Ask method for enabling GPT to answer questions using a library of reference text.\n", "\n", "1. **Search:** search your library of text for relevant text sections\n", "2. **Ask:** insert the retrieved text sections into a message to GPT and ask it the question" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e6e01be1", "metadata": {}, "source": [ "## Why search is better than fine-tuning\n", "\n", "GPT can learn knowledge in two ways:\n", "\n", "- Via model weights (i.e., fine-tune the model on a training set)\n", "- Via model inputs (i.e., insert the knowledge into an input message)\n", "\n", "Although fine-tuning can feel like the more natural option—training on data is how GPT learned all of its other knowledge, after all—we generally do not recommend it as a way to teach the model knowledge. Fine-tuning is better suited to teaching specialized tasks or styles, and is less reliable for factual recall.\n", "\n", "As an analogy, model weights are like long-term memory. When you fine-tune a model, it's like studying for an exam a week away. When the exam arrives, the model may forget details, or misremember facts it never read.\n", "\n", "In contrast, message inputs are like short-term memory. When you insert knowledge into a message, it's like taking an exam with open notes. 
With notes in hand, the model is more likely to arrive at correct answers.\n", "\n", "One downside of text search relative to fine-tuning is that each model is limited by a maximum amount of text it can read at once:\n", "\n", "| Model | Maximum text length |\n", "|-----------------|---------------------------|\n", "| `gpt-3.5-turbo` | 4,096 tokens (~5 pages) |\n", "| `gpt-4` | 8,192 tokens (~10 pages) |\n", "| `gpt-4-32k` | 32,768 tokens (~40 pages) |\n", "\n", "Continuing the analogy, you can think of the model as a student who can only look at a few pages of notes at a time, despite potentially having shelves of textbooks to draw upon.\n", "\n", "Therefore, to build a system capable of drawing upon large quantities of text to answer questions, we recommend using a Search-Ask approach.\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "78fba1de", "metadata": {}, "source": [ "## Search\n", "\n", "Text can be searched in many ways. E.g.,\n", "\n", "- Lexical-based search\n", "- Graph-based search\n", "- Embedding-based search\n", "\n", "This example notebook uses embedding-based search. [Embeddings](https://platform.openai.com/docs/guides/embeddings) are simple to implement and work especially well with questions, as questions often don't lexically overlap with their answers.\n", "\n", "Consider embeddings-only search as a starting point for your own system. Better search systems might combine multiple search methods, along with features like popularity, recency, user history, redundancy with prior search results, click rate data, etc. Q&A retrieval performance may also be improved with techniques like [HyDE](https://arxiv.org/abs/2212.10496), in which questions are first transformed into hypothetical answers before being embedded. Similarly, GPT can also potentially improve search results by automatically transforming questions into sets of keywords or search terms."
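, "\n", "\n", "As a rough illustration of how embedding-based ranking can work, one common choice (an assumption here; other similarity or distance measures are possible) is to score each text section $s$ against the question $q$ by the cosine similarity of their embedding vectors:\n", "\n", "$$\\operatorname{sim}(q, s) = \\frac{q \\cdot s}{\\lVert q \\rVert \\, \\lVert s \\rVert}$$\n", "\n", "Sections with the highest similarity to the question would then be ranked first."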
] }, { "attachments": {}, "cell_type": "markdown", "id": "c4ca8276-e829-4cff-8905-47534e4b4d4e", "metadata": {}, "source": [ "## Full procedure\n", "\n", "Specifically, this notebook demonstrates the following procedure:\n", "\n", "1. Prepare search data (once)\n", "    1. Collect: We'll download a few hundred Wikipedia articles about the 2022 Winter Olympics\n", "    2. Chunk: Documents are split into short, mostly self-contained sections to be embedded\n", "    3. Embed: Each section is embedded with the OpenAI API\n", "    4. Store: Embeddings are saved (for large datasets, use a vector database)\n", "2. Search (once per query)\n", "    1. Given a user question, generate an embedding for the query from the OpenAI API\n", "    2. Using the embeddings, rank the text sections by relevance to the query\n", "3. Ask (once per query)\n", "    1. Insert the question and the most relevant sections into a message to GPT\n", "    2. Return GPT's answer\n", "\n", "### Costs\n", "\n", "Because GPT is more expensive than embeddings search, a system with a high volume of queries will have its costs dominated by step 3.\n", "\n", "- For `gpt-3.5-turbo` using ~1,000 tokens per query, it costs ~$0.002 per query, or ~500 queries per dollar (as of Apr 2023)\n", "- For `gpt-4`, again assuming ~1,000 tokens per query, it costs ~$0.03 per query, or ~30 queries per dollar (as of Apr 2023)\n", "\n", "Of course, exact costs will depend on the system specifics and usage patterns."
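, "\n", "\n", "As a rough check of the arithmetic above, assuming a flat rate per 1,000 tokens equal to the per-query figures quoted (0.002 USD for `gpt-3.5-turbo`, 0.03 USD for `gpt-4`; real pricing may distinguish prompt and completion tokens), the number of queries per dollar is approximately:\n", "\n", "$$\\frac{1\\ \\text{USD}}{0.002\\ \\text{USD per query}} = 500\\ \\text{queries}, \\qquad \\frac{1\\ \\text{USD}}{0.03\\ \\text{USD per query}} \\approx 33\\ \\text{queries}$$\n", "\n", "The second figure is consistent with the ~30 queries per dollar estimate given for `gpt-4`."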
] }, { "attachments": {}, "cell_type": "markdown", "id": "9ebd41d8", "metadata": {}, "source": [ "## Preamble\n", "\n", "We'll begin by:\n", "- Importing the necessary libraries\n", "- Selecting models for embeddings search and question answering\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "9e3839a6-9146-4f60-b74b-19abbc24278d", "metadata": {}, "outputs": [], "source": [ "# imports\n", "import ast # for converting embeddings saved as strings back to arrays\n", "import openai # for calling the OpenAI API\n", "import pandas as pd # for storing text and embeddings data\n", "import tiktoken # for counting tokens\n", "from scipy import spatial # for calculating vector similarities for search\n", "\n", "\n", "# models\n", "EMBEDDING_MODEL = \"text-embedding-ada-002\"\n", "GPT_MODEL = \"gpt-3.5-turbo\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8fcace0f", "metadata": {}, "source": [ "#### Troubleshooting: Installing libraries\n", "\n", "If you need to install any of the libraries above, run `pip install {library_name}` in your terminal.\n", "\n", "For example, to install the `openai` library, run:\n", "```zsh\n", "pip install openai\n", "```\n", "\n", "(You can also do this in a notebook cell with `!pip install openai` or `%pip install openai`.)\n", "\n", "After installing, restart the notebook kernel so the libraries can be loaded.\n", "\n", "#### Troubleshooting: Setting your API key\n", "\n", "The OpenAI library will try to read your API key from the `OPENAI_API_KEY` environment variable. If you haven't already, you can set this environment variable by following [these instructions](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety)." 
] }, { "attachments": {}, "cell_type": "markdown", "id": "9312f62f-e208-4030-a648-71ad97aee74f", "metadata": {}, "source": [ "### Motivating example: GPT cannot answer questions about current events\n", "\n", "Because the training data for `gpt-3.5-turbo` and `gpt-4` mostly ends in September 2021, the models cannot answer questions about more recent events, such as the 2022 Winter Olympics.\n", "\n", "For example, let's try asking 'Which athletes won the gold medal in curling at the 2022 Winter Olympics?':" ] }, { "cell_type": "code", "execution_count": 2, "id": "a167516c-7c19-4bda-afa5-031aa0ae13bb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I'm sorry, but as an AI language model, I don't have information about the future events. The 2022 Winter Olympics will be held in Beijing, China, from February 4 to 20, 2022. The curling events will take place during the games, and the winners of the gold medal in curling will be determined at that time.\n" ] } ], "source": [ "# an example question about the 2022 Olympics\n", "query = 'Which athletes won the gold medal in curling at the 2022 Winter Olympics?'\n", "\n", "response = openai.ChatCompletion.create(\n", "    messages=[\n", "        {'role': 'system', 'content': 'You answer questions about the 2022 Winter Olympics.'},\n", "        {'role': 'user', 'content': query},\n", "    ],\n", "    model=GPT_MODEL,\n", "    temperature=0,\n", ")\n", "\n", "print(response['choices'][0]['message']['content'])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1af18d66-d47a-496d-ae5f-4c5d53caa434", "metadata": {}, "source": [ "In this case, the model has no knowledge of 2022 and is unable to answer the question.\n", "\n", "### You can give GPT knowledge about a topic by inserting it into an input message\n", "\n", "To help give the model knowledge of curling at the 2022 Winter Olympics, we can copy and paste the top half of a relevant Wikipedia article into our message:" ] }, { "cell_type": "code", "execution_count": 3, "id": 
"02e7281d", "metadata": {}, "outputs": [], "source": [ "# text copied and pasted from: https://en.wikipedia.org/wiki/Curling_at_the_2022_Winter_Olympics\n", "# I didn't bother to format or clean the text, but GPT will still understand it\n", "# the entire article is too long for gpt-3.5-turbo, so I only included the top few sections\n", "\n", "wikipedia_article_on_curling = \"\"\"Curling at the 2022 Winter Olympics\n", "\n", "Article\n", "Talk\n", "Read\n", "Edit\n", "View history\n", "From Wikipedia, the free encyclopedia\n", "Curling\n", "at the XXIV Olympic Winter Games\n", "Curling pictogram.svg\n", "Curling pictogram\n", "Venue\tBeijing National Aquatics Centre\n", "Dates\t2–20 February 2022\n", "No. of events\t3 (1 men, 1 women, 1 mixed)\n", "Competitors\t114 from 14 nations\n", "← 20182026 →\n", "Men's curling\n", "at the XXIV Olympic Winter Games\n", "Medalists\n", "1st place, gold medalist(s)\t\t Sweden\n", "2nd place, silver medalist(s)\t\t Great Britain\n", "3rd place, bronze medalist(s)\t\t Canada\n", "Women's curling\n", "at the XXIV Olympic Winter Games\n", "Medalists\n", "1st place, gold medalist(s)\t\t Great Britain\n", "2nd place, silver medalist(s)\t\t Japan\n", "3rd place, bronze medalist(s)\t\t Sweden\n", "Mixed doubles's curling\n", "at the XXIV Olympic Winter Games\n", "Medalists\n", "1st place, gold medalist(s)\t\t Italy\n", "2nd place, silver medalist(s)\t\t Norway\n", "3rd place, bronze medalist(s)\t\t Sweden\n", "Curling at the\n", "2022 Winter Olympics\n", "Curling pictogram.svg\n", "Qualification\n", "Statistics\n", "Tournament\n", "Men\n", "Women\n", "Mixed doubles\n", "vte\n", "The curling competitions of the 2022 Winter Olympics were held at the Beijing National Aquatics Centre, one of the Olympic Green venues. 
Curling competitions were scheduled for every day of the games, from February 2 to February 20.[1] This was the eighth time that curling was part of the Olympic program.\n", "\n", "In each of the men's, women's, and mixed doubles competitions, 10 nations competed. The mixed doubles competition was expanded for its second appearance in the Olympics.[2] A total of 120 quota spots (60 per sex) were distributed to the sport of curling, an increase of four from the 2018 Winter Olympics.[3] A total of 3 events were contested, one for men, one for women, and one mixed.[4]\n", "\n", "Qualification\n", "Main article: Curling at the 2022 Winter Olympics – Qualification\n", "Qualification to the Men's and Women's curling tournaments at the Winter Olympics was determined through two methods (in addition to the host nation). Nations qualified teams by placing in the top six at the 2021 World Curling Championships. Teams could also qualify through Olympic qualification events which were held in 2021. Six nations qualified via World Championship qualification placement, while three nations qualified through qualification events. In men's and women's play, a host will be selected for the Olympic Qualification Event (OQE). They would be joined by the teams which competed at the 2021 World Championships but did not qualify for the Olympics, and two qualifiers from the Pre-Olympic Qualification Event (Pre-OQE). The Pre-OQE was open to all member associations.[5]\n", "\n", "For the mixed doubles competition in 2022, the tournament field was expanded from eight competitor nations to ten.[2] The top seven ranked teams at the 2021 World Mixed Doubles Curling Championship qualified, along with two teams from the Olympic Qualification Event (OQE) – Mixed Doubles. This OQE was open to a nominated host and the fifteen nations with the highest qualification points not already qualified to the Olympics. 
As the host nation, China qualified teams automatically, thus making a total of ten teams per event in the curling tournaments.[6]\n", "\n", "Summary\n", "Nations\tMen\tWomen\tMixed doubles\tAthletes\n", " Australia\t\t\tYes\t2\n", " Canada\tYes\tYes\tYes\t12\n", " China\tYes\tYes\tYes\t12\n", " Czech Republic\t\t\tYes\t2\n", " Denmark\tYes\tYes\t\t10\n", " Great Britain\tYes\tYes\tYes\t10\n", " Italy\tYes\t\tYes\t6\n", " Japan\t\tYes\t\t5\n", " Norway\tYes\t\tYes\t6\n", " ROC\tYes\tYes\t\t10\n", " South Korea\t\tYes\t\t5\n", " Sweden\tYes\tYes\tYes\t11\n", " Switzerland\tYes\tYes\tYes\t12\n", " United States\tYes\tYes\tYes\t11\n", "Total: 14 NOCs\t10\t10\t10\t114\n", "Competition schedule\n", "\n", "The Beijing National Aquatics Centre served as the venue of the curling competitions.\n", "Curling competitions started two days before the Opening Ceremony and finished on the last day of the games, meaning the sport was the only one to have had a competition every day of the games. The following was the competition schedule for the curling competitions:\n", "\n", "RR\tRound robin\tSF\tSemifinals\tB\t3rd place play-off\tF\tFinal\n", "Date\n", "Event\n", "Wed 2\tThu 3\tFri 4\tSat 5\tSun 6\tMon 7\tTue 8\tWed 9\tThu 10\tFri 11\tSat 12\tSun 13\tMon 14\tTue 15\tWed 16\tThu 17\tFri 18\tSat 19\tSun 20\n", "Men's tournament\t\t\t\t\t\t\t\tRR\tRR\tRR\tRR\tRR\tRR\tRR\tRR\tRR\tSF\tB\tF\t\n", "Women's tournament\t\t\t\t\t\t\t\t\tRR\tRR\tRR\tRR\tRR\tRR\tRR\tRR\tSF\tB\tF\n", "Mixed doubles\tRR\tRR\tRR\tRR\tRR\tRR\tSF\tB\tF\t\t\t\t\t\t\t\t\t\t\t\t\n", "Medal summary\n", "Medal table\n", "Rank\tNation\tGold\tSilver\tBronze\tTotal\n", "1\t Great Britain\t1\t1\t0\t2\n", "2\t Sweden\t1\t0\t2\t3\n", "3\t Italy\t1\t0\t0\t1\n", "4\t Japan\t0\t1\t0\t1\n", " Norway\t0\t1\t0\t1\n", "6\t Canada\t0\t0\t1\t1\n", "Totals (6 entries)\t3\t3\t3\t9\n", "Medalists\n", "Event\tGold\tSilver\tBronze\n", "Men\n", "details\t Sweden\n", "Niklas Edin\n", "Oskar Eriksson\n", "Rasmus Wranå\n", "Christoffer 
Sundgren\n", "Daniel Magnusson\t Great Britain\n", "Bruce Mouat\n", "Grant Hardie\n", "Bobby Lammie\n", "Hammy McMillan Jr.\n", "Ross Whyte\t Canada\n", "Brad Gushue\n", "Mark Nichols\n", "Brett Gallant\n", "Geoff Walker\n", "Marc Kennedy\n", "Women\n", "details\t Great Britain\n", "Eve Muirhead\n", "Vicky Wright\n", "Jennifer Dodds\n", "Hailey Duff\n", "Mili Smith\t Japan\n", "Satsuki Fujisawa\n", "Chinami Yoshida\n", "Yumi Suzuki\n", "Yurika Yoshida\n", "Kotomi Ishizaki\t Sweden\n", "Anna Hasselborg\n", "Sara McManus\n", "Agnes Knochenhauer\n", "Sofia Mabergs\n", "Johanna Heldin\n", "Mixed doubles\n", "details\t Italy\n", "Stefania Constantini\n", "Amos Mosaner\t Norway\n", "Kristin Skaslien\n", "Magnus Nedregotten\t Sweden\n", "Almida de Val\n", "Oskar Eriksson\n", "Teams\n", "Men\n", " Canada\t China\t Denmark\t Great Britain\t Italy\n", "Skip: Brad Gushue\n", "Third: Mark Nichols\n", "Second: Brett Gallant\n", "Lead: Geoff Walker\n", "Alternate: Marc Kennedy\n", "\n", "Skip: Ma Xiuyue\n", "Third: Zou Qiang\n", "Second: Wang Zhiyu\n", "Lead: Xu Jingtao\n", "Alternate: Jiang Dongxu\n", "\n", "Skip: Mikkel Krause\n", "Third: Mads Nørgård\n", "Second: Henrik Holtermann\n", "Lead: Kasper Wiksten\n", "Alternate: Tobias Thune\n", "\n", "Skip: Bruce Mouat\n", "Third: Grant Hardie\n", "Second: Bobby Lammie\n", "Lead: Hammy McMillan Jr.\n", "Alternate: Ross Whyte\n", "\n", "Skip: Joël Retornaz\n", "Third: Amos Mosaner\n", "Second: Sebastiano Arman\n", "Lead: Simone Gonin\n", "Alternate: Mattia Giovanella\n", "\n", " Norway\t ROC\t Sweden\t Switzerland\t United States\n", "Skip: Steffen Walstad\n", "Third: Torger Nergård\n", "Second: Markus Høiberg\n", "Lead: Magnus Vågberg\n", "Alternate: Magnus Nedregotten\n", "\n", "Skip: Sergey Glukhov\n", "Third: Evgeny Klimov\n", "Second: Dmitry Mironov\n", "Lead: Anton Kalalb\n", "Alternate: Daniil Goriachev\n", "\n", "Skip: Niklas Edin\n", "Third: Oskar Eriksson\n", "Second: Rasmus Wranå\n", "Lead: Christoffer Sundgren\n", 
"Alternate: Daniel Magnusson\n", "\n", "Fourth: Benoît Schwarz\n", "Third: Sven Michel\n", "Skip: Peter de Cruz\n", "Lead: Valentin Tanner\n", "Alternate: Pablo Lachat\n", "\n", "Skip: John Shuster\n", "Third: Chris Plys\n", "Second: Matt Hamilton\n", "Lead: John Landsteiner\n", "Alternate: Colin Hufman\n", "\n", "Women\n", " Canada\t China\t Denmark\t Great Britain\t Japan\n", "Skip: Jennifer Jones\n", "Third: Kaitlyn Lawes\n", "Second: Jocelyn Peterman\n", "Lead: Dawn McEwen\n", "Alternate: Lisa Weagle\n", "\n", "Skip: Han Yu\n", "Third: Wang Rui\n", "Second: Dong Ziqi\n", "Lead: Zhang Lijun\n", "Alternate: Jiang Xindi\n", "\n", "Skip: Madeleine Dupont\n", "Third: Mathilde Halse\n", "Second: Denise Dupont\n", "Lead: My Larsen\n", "Alternate: Jasmin Lander\n", "\n", "Skip: Eve Muirhead\n", "Third: Vicky Wright\n", "Second: Jennifer Dodds\n", "Lead: Hailey Duff\n", "Alternate: Mili Smith\n", "\n", "Skip: Satsuki Fujisawa\n", "Third: Chinami Yoshida\n", "Second: Yumi Suzuki\n", "Lead: Yurika Yoshida\n", "Alternate: Kotomi Ishizaki\n", "\n", " ROC\t South Korea\t Sweden\t Switzerland\t United States\n", "Skip: Alina Kovaleva\n", "Third: Yulia Portunova\n", "Second: Galina Arsenkina\n", "Lead: Ekaterina Kuzmina\n", "Alternate: Maria Komarova\n", "\n", "Skip: Kim Eun-jung\n", "Third: Kim Kyeong-ae\n", "Second: Kim Cho-hi\n", "Lead: Kim Seon-yeong\n", "Alternate: Kim Yeong-mi\n", "\n", "Skip: Anna Hasselborg\n", "Third: Sara McManus\n", "Second: Agnes Knochenhauer\n", "Lead: Sofia Mabergs\n", "Alternate: Johanna Heldin\n", "\n", "Fourth: Alina Pätz\n", "Skip: Silvana Tirinzoni\n", "Second: Esther Neuenschwander\n", "Lead: Melanie Barbezat\n", "Alternate: Carole Howald\n", "\n", "Skip: Tabitha Peterson\n", "Third: Nina Roth\n", "Second: Becca Hamilton\n", "Lead: Tara Peterson\n", "Alternate: Aileen Geving\n", "\n", "Mixed doubles\n", " Australia\t Canada\t China\t Czech Republic\t Great Britain\n", "Female: Tahli Gill\n", "Male: Dean Hewitt\n", "\n", "Female: Rachel 
Homan\n", "Male: John Morris\n", "\n", "Female: Fan Suyuan\n", "Male: Ling Zhi\n", "\n", "Female: Zuzana Paulová\n", "Male: Tomáš Paul\n", "\n", "Female: Jennifer Dodds\n", "Male: Bruce Mouat\n", "\n", " Italy\t Norway\t Sweden\t Switzerland\t United States\n", "Female: Stefania Constantini\n", "Male: Amos Mosaner\n", "\n", "Female: Kristin Skaslien\n", "Male: Magnus Nedregotten\n", "\n", "Female: Almida de Val\n", "Male: Oskar Eriksson\n", "\n", "Female: Jenny Perret\n", "Male: Martin Rios\n", "\n", "Female: Vicky Persinger\n", "Male: Chris Plys\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 4, "id": "fceaf665-2602-4788-bc44-9eb256a6f955", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There were three events in curling at the 2022 Winter Olympics, so there were three sets of athletes who won gold medals. The gold medalists in men's curling were Sweden's Niklas Edin, Oskar Eriksson, Rasmus Wranå, Christoffer Sundgren, and Daniel Magnusson. The gold medalists in women's curling were Great Britain's Eve Muirhead, Vicky Wright, Jennifer Dodds, Hailey Duff, and Mili Smith. The gold medalists in mixed doubles curling were Italy's Stefania Constantini and Amos Mosaner.\n" ] } ], "source": [ "query = f\"\"\"Use the below article on the 2022 Winter Olympics to answer the subsequent question. 
If the answer cannot be found, write \"I don't know.\"\n", "\n", "Article:\n", "\\\"\\\"\\\"\n", "{wikipedia_article_on_curling}\n", "\\\"\\\"\\\"\n", "\n", "Question: Which athletes won the gold medal in curling at the 2022 Winter Olympics?\"\"\"\n", "\n", "response = openai.ChatCompletion.create(\n", "    messages=[\n", "        {'role': 'system', 'content': 'You answer questions about the 2022 Winter Olympics.'},\n", "        {'role': 'user', 'content': query},\n", "    ],\n", "    model=GPT_MODEL,\n", "    temperature=0,\n", ")\n", "\n", "print(response['choices'][0]['message']['content'])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ee85ee77-d8d2-4788-b57e-0785f2d7e2e3", "metadata": {}, "source": [ "Thanks to the Wikipedia article included in the input message, GPT answers correctly.\n", "\n", "In this particular case, GPT was intelligent enough to realize that the original question was underspecified, as there were three curling gold medals, not just one.\n", "\n", "Of course, this example partly relied on human intelligence. We knew the question was about curling, so we inserted a Wikipedia article on curling.\n", "\n", "The rest of this notebook shows how to automate this knowledge insertion with embeddings-based search." ] }, { "attachments": {}, "cell_type": "markdown", "id": "ccc2d8de", "metadata": {}, "source": [ "## 1. Prepare search data\n", "\n", "To save you time and expense, we've prepared a pre-embedded dataset of a few hundred Wikipedia articles about the 2022 Winter Olympics.\n", "\n", "To see how we constructed this dataset, or to modify it, see [Embedding Wikipedia articles for search](Embedding_Wikipedia_articles_for_search.ipynb)."
] }, { "cell_type": "code", "execution_count": 5, "id": "46d50792", "metadata": {}, "outputs": [], "source": [ "# download pre-chunked text and pre-computed embeddings\n", "# this file is ~200 MB, so may take a minute depending on your connection speed\n", "embeddings_path = \"https://cdn.openai.com/API/examples/data/winter_olympics_2022.csv\"\n", "\n", "df = pd.read_csv(embeddings_path)" ] }, { "cell_type": "code", "execution_count": 6, "id": "70307f8e", "metadata": {}, "outputs": [], "source": [ "# convert embeddings from CSV str type back to list type\n", "df['embedding'] = df['embedding'].apply(ast.literal_eval)" ] }, { "cell_type": "code", "execution_count": 7, "id": "424162c2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n", "  <thead>\n", "    <tr><th></th><th>text</th><th>embedding</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>Lviv bid for the 2022 Winter Olympics\\n\\n{{Oly...</td><td>[-0.005021067801862955, 0.00026050032465718687...</td></tr>\n",
"    <tr><th>1</th><td>Lviv bid for the 2022 Winter Olympics\\n\\n==His...</td><td>[0.0033927420154213905, -0.007447326090186834,...</td></tr>\n",
"    <tr><th>2</th><td>Lviv bid for the 2022 Winter Olympics\\n\\n==Ven...</td><td>[-0.00915789045393467, -0.008366798982024193, ...</td></tr>\n",
"    <tr><th>3</th><td>Lviv bid for the 2022 Winter Olympics\\n\\n==Ven...</td><td>[0.0030951891094446182, -0.006064314860850573,...</td></tr>\n",
"    <tr><th>4</th><td>Lviv bid for the 2022 Winter Olympics\\n\\n==Ven...</td><td>[-0.002936174161732197, -0.006185177247971296,...</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td></tr>\n",
"    <tr><th>6054</th><td>Anaïs Chevalier-Bouchet\\n\\n==Personal life==\\n...</td><td>[-0.027750400826334953, 0.001746018067933619, ...</td></tr>\n",
"    <tr><th>6055</th><td>Uliana Nigmatullina\\n\\n{{short description|Rus...</td><td>[-0.021714167669415474, 0.016001321375370026, ...</td></tr>\n",
"    <tr><th>6056</th><td>Uliana Nigmatullina\\n\\n==Biathlon results==\\n\\...</td><td>[-0.029143543913960457, 0.014654331840574741, ...</td></tr>\n",
"    <tr><th>6057</th><td>Uliana Nigmatullina\\n\\n==Biathlon results==\\n\\...</td><td>[-0.024266039952635765, 0.011665306985378265, ...</td></tr>\n",
"    <tr><th>6058</th><td>Uliana Nigmatullina\\n\\n==Biathlon results==\\n\\...</td><td>[-0.021818075329065323, 0.005420385394245386, ...</td></tr>\n",
"  </tbody>\n", "</table>\n", "<p>6059 rows × 2 columns</p>\n", "