Refine Weaviate docs and add RAG example (#13057)

- **Description:** Refine Weaviate tutorial and add an example for
Retrieval-Augmented Generation (RAG)
  - **Issue:** (not applicable)
  - **Dependencies:** none
  - **Tag maintainer:** @baskaryan
  - **Twitter handle:** @helloiamleonie

Co-authored-by: Leonie <leonie@Leonies-MBP-2.fritz.box>
pull/13301/head
Leonie 8 months ago committed by GitHub
parent f22f273f93
commit 32c493e3df

@ -10,9 +10,19 @@
"\n",
">[Weaviate](https://weaviate.io/) is an open-source vector database. It allows you to store data objects and vector embeddings from your favorite ML-models, and scale seamlessly into billions of data objects.\n",
"\n",
"This notebook shows how to use functionality related to the `Weaviate`vector database.\n",
"This notebook shows how to use the functionality related to the `Weaviate` vector database.\n",
"\n",
"See the `Weaviate` [installation instructions](https://weaviate.io/developers/weaviate/installation)."
"`Weaviate` can be deployed in many different ways depending on your requirements. For example, you can either connect to a [Weaviate Cloud Services](https://console.weaviate.cloud) instance or a [local Docker instance](https://weaviate.io/developers/weaviate/installation/docker-compose). \n",
"See the `Weaviate` [installation instructions](https://weaviate.io/developers/weaviate/installation) for more information."
]
},
{
"cell_type": "markdown",
"id": "5fb59dec",
"metadata": {},
"source": [
"## Prerequisites\n",
"Install the `weaviate-client` package and set the relevant environment variables."
]
},
{
@ -27,19 +37,21 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: weaviate-client in /workspaces/langchain/.venv/lib/python3.9/site-packages (3.19.1)\n",
"Requirement already satisfied: requests<2.29.0,>=2.28.0 in /workspaces/langchain/.venv/lib/python3.9/site-packages (from weaviate-client) (2.28.2)\n",
"Requirement already satisfied: validators<=0.21.0,>=0.18.2 in /workspaces/langchain/.venv/lib/python3.9/site-packages (from weaviate-client) (0.20.0)\n",
"Requirement already satisfied: tqdm<5.0.0,>=4.59.0 in /workspaces/langchain/.venv/lib/python3.9/site-packages (from weaviate-client) (4.65.0)\n",
"Requirement already satisfied: authlib>=1.1.0 in /workspaces/langchain/.venv/lib/python3.9/site-packages (from weaviate-client) (1.2.0)\n",
"Requirement already satisfied: cryptography>=3.2 in /workspaces/langchain/.venv/lib/python3.9/site-packages (from authlib>=1.1.0->weaviate-client) (40.0.2)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /workspaces/langchain/.venv/lib/python3.9/site-packages (from requests<2.29.0,>=2.28.0->weaviate-client) (3.1.0)\n",
"Requirement already satisfied: idna<4,>=2.5 in /workspaces/langchain/.venv/lib/python3.9/site-packages (from requests<2.29.0,>=2.28.0->weaviate-client) (3.4)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /workspaces/langchain/.venv/lib/python3.9/site-packages (from requests<2.29.0,>=2.28.0->weaviate-client) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /workspaces/langchain/.venv/lib/python3.9/site-packages (from requests<2.29.0,>=2.28.0->weaviate-client) (2023.5.7)\n",
"Requirement already satisfied: decorator>=3.4.0 in /workspaces/langchain/.venv/lib/python3.9/site-packages (from validators<=0.21.0,>=0.18.2->weaviate-client) (5.1.1)\n",
"Requirement already satisfied: cffi>=1.12 in /workspaces/langchain/.venv/lib/python3.9/site-packages (from cryptography>=3.2->authlib>=1.1.0->weaviate-client) (1.15.1)\n",
"Requirement already satisfied: pycparser in /workspaces/langchain/.venv/lib/python3.9/site-packages (from cffi>=1.12->cryptography>=3.2->authlib>=1.1.0->weaviate-client) (2.21)\n"
"Requirement already satisfied: weaviate-client in /opt/homebrew/lib/python3.11/site-packages (3.23.1)\n",
"Requirement already satisfied: requests<=2.31.0,>=2.28.0 in /opt/homebrew/lib/python3.11/site-packages (from weaviate-client) (2.31.0)\n",
"Requirement already satisfied: validators<=0.21.0,>=0.18.2 in /opt/homebrew/lib/python3.11/site-packages (from weaviate-client) (0.21.0)\n",
"Requirement already satisfied: tqdm<5.0.0,>=4.59.0 in /opt/homebrew/lib/python3.11/site-packages (from weaviate-client) (4.66.1)\n",
"Requirement already satisfied: authlib>=1.1.0 in /opt/homebrew/lib/python3.11/site-packages (from weaviate-client) (1.2.1)\n",
"Requirement already satisfied: cryptography>=3.2 in /opt/homebrew/lib/python3.11/site-packages (from authlib>=1.1.0->weaviate-client) (41.0.4)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /opt/homebrew/lib/python3.11/site-packages (from requests<=2.31.0,>=2.28.0->weaviate-client) (2.0.12)\n",
"Requirement already satisfied: idna<4,>=2.5 in /opt/homebrew/lib/python3.11/site-packages (from requests<=2.31.0,>=2.28.0->weaviate-client) (3.4)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/homebrew/lib/python3.11/site-packages (from requests<=2.31.0,>=2.28.0->weaviate-client) (1.26.17)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /opt/homebrew/lib/python3.11/site-packages (from requests<=2.31.0,>=2.28.0->weaviate-client) (2023.7.22)\n",
"Requirement already satisfied: cffi>=1.12 in /opt/homebrew/lib/python3.11/site-packages (from cryptography>=3.2->authlib>=1.1.0->weaviate-client) (1.16.0)\n",
"Requirement already satisfied: pycparser in /opt/homebrew/lib/python3.11/site-packages (from cffi>=1.12->cryptography>=3.2->authlib>=1.1.0->weaviate-client) (2.21)\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3.11 -m pip install --upgrade pip\u001b[0m\n"
]
}
],
@ -48,7 +60,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6b34828d-e627-4d85-aabd-eeb15d9f4b00",
"metadata": {},
@ -81,7 +92,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 4,
"id": "53b7ce2d-3c09-4d1c-b66b-5769ce6746ae",
"metadata": {},
"outputs": [],
@ -90,9 +101,18 @@
"WEAVIATE_API_KEY = os.environ[\"WEAVIATE_API_KEY\"]"
]
},
{
"cell_type": "markdown",
"id": "b867eb31",
"metadata": {},
"source": [
"## Similarity search\n",
"Below you can see a minimal example of how to approach a simple similarity search."
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 5,
"id": "aac9563e",
"metadata": {
"tags": []
@ -107,7 +127,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 6,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
@ -124,17 +144,22 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 7,
"id": "21e9e528",
"metadata": {},
"outputs": [],
"source": [
"db = Weaviate.from_documents(docs, embeddings, weaviate_url=WEAVIATE_URL, by_text=False)"
"db = Weaviate.from_documents(\n",
" docs, \n",
" embeddings, \n",
" weaviate_url=WEAVIATE_URL, \n",
" by_text=False\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 8,
"id": "b4170176",
"metadata": {},
"outputs": [],
@ -145,7 +170,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 9,
"id": "ecf3b890",
"metadata": {},
"outputs": [
@ -186,7 +211,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 10,
"id": "f6604f1d",
"metadata": {},
"outputs": [
@ -202,7 +227,8 @@
"import weaviate\n",
"\n",
"client = weaviate.Client(\n",
" url=WEAVIATE_URL, auth_client_secret=weaviate.AuthApiKey(WEAVIATE_API_KEY)\n",
" url=WEAVIATE_URL, \n",
" auth_client_secret=weaviate.AuthApiKey(WEAVIATE_API_KEY)\n",
")\n",
"\n",
"# client = weaviate.Client(\n",
@ -214,7 +240,10 @@
"# )\n",
"\n",
"vectorstore = Weaviate.from_documents(\n",
" documents, embeddings, client=client, by_text=False\n",
" documents, \n",
" embeddings, \n",
" client=client, \n",
" by_text=False\n",
")"
]
},
@ -239,7 +268,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 11,
"id": "102105a1",
"metadata": {},
"outputs": [
@ -265,7 +294,7 @@
"id": "8fc3487b",
"metadata": {},
"source": [
"# Persistence"
"## Persistence"
]
},
{
@ -273,7 +302,7 @@
"id": "281c0fcc",
"metadata": {},
"source": [
"Anything uploaded to weaviate is automatically persistent into the database. You do not need to call any specific method or pass any param for this to happen."
"Anything uploaded to Weaviate is automatically persisted to the database. You do not need to call any specific method or pass any parameters for this to happen."
]
},
{
@ -285,14 +314,14 @@
"\n",
"This section goes over different options for how to use Weaviate as a retriever.\n",
"\n",
"### MMR\n",
"### Maximal marginal relevance search (MMR)\n",
"\n",
"In addition to using similarity search in the retriever object, you can also use `mmr`."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 12,
"id": "8b7df7ae",
"metadata": {},
"outputs": [
@ -312,12 +341,53 @@
"retriever.get_relevant_documents(query)[0]"
]
},
{
"cell_type": "markdown",
"id": "4b14a3a5",
"metadata": {},
"source": [
"### Hybrid search\n",
"Weaviate also offers hybrid search. See [`WeaviateHybridSearchRetriever`](https://python.langchain.com/docs/integrations/retrievers/weaviate-hybrid) for reference."
]
},
{
"cell_type": "markdown",
"id": "508016e8",
"metadata": {},
"source": [
"## Use cases\n",
"As the following example shows, LLMs don't have access to knowledge outside of their training data. Thus, vector stores come in handy to provide LLMs with additional context."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "5299b13b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"As an AI language model, I don't have real-time information or the ability to browse the internet. Therefore, I cannot provide you with the most recent statements made by the president about Justice Breyer. However, it's worth noting that the president's opinions on Justice Breyer may vary depending on the specific context and time period. It would be best to refer to reliable news sources or official statements to get the most accurate and up-to-date information on this topic.\""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n",
"llm.predict(\"What did the president say about Justice Breyer\")"
]
},
{
"cell_type": "markdown",
"id": "fbd7a6cb",
"metadata": {},
"source": [
"## Question Answering with Sources"
"### Question Answering with Sources"
]
},
{
@ -330,7 +400,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 14,
"id": "5e824f3b",
"metadata": {},
"outputs": [],
@ -341,7 +411,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 15,
"id": "61209cc3",
"metadata": {},
"outputs": [],
@ -354,7 +424,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 16,
"id": "4abc3d37",
"metadata": {},
"outputs": [],
@ -370,7 +440,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 17,
"id": "c7062393",
"metadata": {},
"outputs": [],
@ -382,7 +452,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 18,
"id": "7e41b773",
"metadata": {},
"outputs": [
@ -404,6 +474,115 @@
" return_only_outputs=True,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "05007f8a",
"metadata": {},
"source": [
"### Retrieval-Augmented Generation"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "30f285a1",
"metadata": {},
"outputs": [],
"source": [
"with open(\"../../modules/state_of_the_union.txt\") as f:\n",
" state_of_the_union = f.read()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "08490f15",
"metadata": {},
"outputs": [],
"source": [
"docsearch = Weaviate.from_texts(\n",
" texts,\n",
" embeddings,\n",
" weaviate_url=WEAVIATE_URL,\n",
" by_text=False,\n",
" metadatas=[{\"source\": f\"{i}-pl\"} for i in range(len(texts))],\n",
")\n",
"\n",
"retriever = docsearch.as_retriever()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "499cb1f5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"input_variables=['context', 'question'] messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template=\"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\\nQuestion: {question} \\nContext: {context} \\nAnswer:\\n\"))]\n"
]
}
],
"source": [
"from langchain.prompts import ChatPromptTemplate\n",
"\n",
"template = \"\"\"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\n",
"Question: {question} \n",
"Context: {context} \n",
"Answer:\n",
"\"\"\"\n",
"prompt = ChatPromptTemplate.from_template(template)\n",
"\n",
"print(prompt)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "28d95686",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "c697d0cd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The president thanked Justice Breyer for his service and dedication to the country.'"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.schema.runnable import RunnablePassthrough\n",
"from langchain.schema.output_parser import StrOutputParser\n",
"\n",
"rag_chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
" | prompt \n",
" | llm\n",
" | StrOutputParser() \n",
")\n",
"\n",
"rag_chain.invoke(\"What did the president say about Justice Breyer\")"
]
}
],
"metadata": {
@ -422,7 +601,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.11.4"
}
},
"nbformat": 4,
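
The notebook's retriever section names maximal marginal relevance (MMR) without showing what the selection does. The following is a minimal, library-free sketch of the general idea, not Weaviate's or LangChain's implementation: each step picks the candidate that best balances similarity to the query against similarity to documents already chosen. The function names, the `lambda_mult` weighting parameter, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of up to k documents chosen greedily by MMR.

    lambda_mult=1.0 ranks purely by relevance; lower values penalize
    candidates that resemble documents already selected.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to anything already picked (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy data (hypothetical): documents 0 and 1 are near-duplicates.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]

diverse = mmr_select(query, docs, k=2, lambda_mult=0.5)
relevance_only = mmr_select(query, docs, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With the diversity penalty active, the near-duplicate at index 1 is skipped in favor of the less similar document at index 2; with `lambda_mult=1.0` the selection falls back to plain similarity ranking.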

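
The Retrieval-Augmented Generation cells added in this diff pipe a retriever and the question into a prompt before calling the model. As a rough sketch of that data flow without any external service, the example below replaces the vector store with a hypothetical word-overlap retriever and only assembles the prompt text; `toy_retriever`, `build_prompt`, the shortened template, and the two-sentence corpus are assumptions for illustration, not LangChain or Weaviate APIs.

```python
# Shortened variant of the notebook's prompt template (assumption: abbreviated here).
TEMPLATE = (
    "Use the following retrieved context to answer the question.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:\n"
)


def toy_retriever(question, corpus, k=1):
    """Hypothetical stand-in for a vector-store retriever.

    Ranks texts by the number of lowercase words shared with the question
    instead of by embedding similarity.
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda text: len(q_words & set(text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, corpus):
    """Fill the template with the question and the retrieved context."""
    context = "\n".join(toy_retriever(question, corpus))
    return TEMPLATE.format(question=question, context=context)


# Toy corpus (hypothetical), standing in for the split State of the Union text.
corpus = [
    "The president thanked Justice Breyer for his service.",
    "The economy added jobs last year.",
]
question = "What did the president say about Justice Breyer"

rag_prompt = build_prompt(question, corpus)
print(rag_prompt)
```

In the notebook, the equivalent of `rag_prompt` is what reaches the chat model, which is why the model can answer the Justice Breyer question only once a retriever supplies the context.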