docs `retriever` improvements (#4430)

# Docs: improvements in the `retrievers/examples/` notebooks

The primary purpose of this PR is to make the Jupyter notebook examples
**consistent** and more suitable for first-time viewers:
- added links to the integration source (where applicable) with a short
description of that source;
- removed the `_retriever` suffix from file names (where it existed) for
consistency;
- removed ` retriever` from notebook titles (where it existed) for
consistency;
- added code to install the necessary Python package(s);
- added code to set up the necessary API key (see the pattern sketched
after this list);
- made very small fixes in notebooks from other folders (for consistency):
  - docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb
  - docs/modules/indexes/vectorstores/examples/pinecone.ipynb
  - docs/modules/models/llms/integrations/cohere.ipynb
- fixed a misspelling in a langchain/retrievers/time_weighted_retriever.py
comment (sorry about this change in a .py file).
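
For reference, the install and API-key cells added across these notebooks all follow one pattern (a minimal sketch of the repeated cells; the package and the environment-variable name vary per integration):

```python
# Install the integration's package (commented out so the notebook
# doesn't re-install on every run; uncomment on first use).
#!pip install lark

# Prompt for the API key instead of hard-coding it, then expose it via
# the environment variable the integration reads.
import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
```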

## Who can review
@dev2049

@ -357,7 +357,7 @@ Proprietary
Articles on **Google Scholar**
-----------------------------
------------------------------
LangChain is used in many scientific and research projects.

@ -19,7 +19,7 @@ LangChain provides integration with many LLMs and systems:
- `Toolkit Integrations <./modules/agents/toolkits.html>`_
Companies / Products
----------
--------------------
.. toctree::

@ -1,7 +1,6 @@
<!DOCTYPE html>
<html>
<head>
<title>Test Title</title>
<head><title>Test Title</title>
</head>
<body>

@ -5,30 +5,25 @@
"id": "1edb9e6b",
"metadata": {},
"source": [
"# ChatGPT Plugin Retriever\n",
"# ChatGPT Plugin\n",
"\n",
"This notebook shows how to use the ChatGPT Retriever Plugin within LangChain."
]
},
{
"cell_type": "markdown",
"id": "074b0004",
"metadata": {},
"source": [
"## Create\n",
">[OpenAI plugins](https://platform.openai.com/docs/plugins/introduction) connect ChatGPT to third-party applications. These plugins enable ChatGPT to interact with APIs defined by developers, enhancing ChatGPT's capabilities and allowing it to perform a wide range of actions.\n",
"\n",
"First, let's go over how to create the ChatGPT Retriever Plugin.\n",
">Plugins can allow ChatGPT to do things like:\n",
">- Retrieve real-time information; e.g., sports scores, stock prices, the latest news, etc.\n",
">- Retrieve knowledge-base information; e.g., company docs, personal notes, etc.\n",
">- Perform actions on behalf of the user; e.g., booking a flight, ordering food, etc.\n",
"\n",
"To set up the ChatGPT Retriever Plugin, please follow instructions [here](https://github.com/openai/chatgpt-retrieval-plugin).\n",
"\n",
"You can also create the ChatGPT Retriever Plugin from LangChain document loaders. The below code walks through how to do that."
"This notebook shows how to use the ChatGPT Retriever Plugin within LangChain."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "bbe89ca0",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# STEP 1: Load\n",
@ -72,11 +67,44 @@
"The below code walks through how to do that."
]
},
{
"cell_type": "markdown",
"id": "fb27da9f-d574-425d-8fab-92b03b997568",
"metadata": {},
"source": [
"We want to use `ChatGPTPluginRetriever` so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 6,
"id": "b5d8c9e9-839f-42e9-933a-08195797dd4c",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"OpenAI API Key: ········\n"
]
}
],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "39d6074e",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.retrievers import ChatGPTPluginRetriever"
@ -84,9 +112,11 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 10,
"id": "33fd23d1",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"retriever = ChatGPTPluginRetriever(url=\"http://0.0.0.0:8000\", bearer_token=\"foo\")"
@ -140,7 +170,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -5,7 +5,10 @@
"id": "13afcae7",
"metadata": {},
"source": [
"# Self-querying retriever with Chroma\n",
"# Self-querying with Chroma\n",
"\n",
">[Chroma](https://docs.trychroma.com/getting-started) is a database for building AI applications with embeddings.\n",
"\n",
"In the notebook we'll demo the `SelfQueryRetriever` wrapped around a Chroma vector store. "
]
},
@ -17,24 +20,71 @@
"## Creating a Chroma vectorstore\n",
"First we'll want to create a Chroma VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
"\n",
"NOTE: The self-query retriever requires you to have `lark` installed (`pip install lark`)"
"NOTE: The self-query retriever requires you to have `lark` installed (`pip install lark`). We also need the `chromadb` package."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "63a8af5b",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# !pip install lark"
"#!pip install lark"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "cb4a5787",
"execution_count": null,
"id": "22431060-52c4-48a7-a97b-9f542b8b0928",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#!pip install chromadb"
]
},
{
"cell_type": "markdown",
"id": "83811610-7df3-4ede-b268-68a6a83ba9e2",
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "dd01b61b-7d32-4a55-85d6-b2d2d4f18840",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"OpenAI API Key: ········\n"
]
}
],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "cb4a5787",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.schema import Document\n",
@ -46,9 +96,11 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 5,
"id": "bcbe04d9",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
@ -83,9 +135,11 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 6,
"id": "86e34dbf",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
@ -138,7 +192,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"query='dinosaur' filter=None limit=None\n"
"query='dinosaur' filter=None\n"
]
},
{
@ -170,7 +224,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"query=' ' filter=Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5) limit=None\n"
"query=' ' filter=Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5)\n"
]
},
{
@ -200,7 +254,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"query='women' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='director', value='Greta Gerwig') limit=None\n"
"query='women' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='director', value='Greta Gerwig')\n"
]
},
{
@ -229,7 +283,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"query=' ' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='science fiction')]) limit=None\n"
"query=' ' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='science fiction'), Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5)])\n"
]
},
{
@ -258,7 +312,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"query='toys' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.GT: 'gt'>, attribute='year', value=1990), Comparison(comparator=<Comparator.LT: 'lt'>, attribute='year', value=2005), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='animated')]) limit=None\n"
"query='toys' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.GT: 'gt'>, attribute='year', value=1990), Comparison(comparator=<Comparator.LT: 'lt'>, attribute='year', value=2005), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='animated')])\n"
]
},
{
@ -279,7 +333,7 @@
},
{
"cell_type": "markdown",
"id": "87513116",
"id": "39bd1de1-b9fe-4a98-89da-58d8a7a6ae51",
"metadata": {},
"source": [
"## Filter k\n",
@ -291,9 +345,11 @@
},
{
"cell_type": "code",
"execution_count": 4,
"id": "73cfca56",
"metadata": {},
"execution_count": 7,
"id": "bff36b88-b506-4877-9c63-e5a1a8d78e64",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"retriever = SelfQueryRetriever.from_llm(\n",
@ -308,25 +364,29 @@
},
{
"cell_type": "code",
"execution_count": 6,
"id": "60110338",
"metadata": {},
"execution_count": 9,
"id": "2758d229-4f97-499c-819f-888acaf8ee10",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"query='dinosaur' filter=None limit=2\n"
"query='dinosaur' filter=None\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction'}),\n",
" Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]"
" Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'}),\n",
" Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6}),\n",
" Document(page_content='Leo DiCaprio gets lost in a dream within a dream within a dream within a ...', metadata={'year': 2010, 'director': 'Christopher Nolan', 'rating': 8.2})]"
]
},
"execution_count": 6,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@ -339,7 +399,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f15d84b3",
"id": "c93f0847-cbd9-4c25-aed1-91588e856b5c",
"metadata": {},
"outputs": [],
"source": []
@ -361,7 +421,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -7,9 +7,62 @@
"source": [
"# Cohere Reranker\n",
"\n",
">[Cohere](https://cohere.ai/about) is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.\n",
"\n",
"This notebook shows how to use [Cohere's rerank endpoint](https://docs.cohere.com/docs/reranking) in a retriever. This builds on top of ideas in the [ContextualCompressionRetriever](contextual-compression.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f5973bb-7897-4340-a8ce-c3365ee73b2f",
"metadata": {},
"outputs": [],
"source": [
"#!pip install cohere"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b37bd138-4f3c-4d2c-bc4b-be705ce27a09",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#!pip install faiss\n",
"\n",
"# OR (depending on Python version)\n",
"\n",
"#!pip install faiss-cpu"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c47b0b26-6d51-4beb-aedb-ad09740a9a2b",
"metadata": {},
"outputs": [],
"source": [
"# get a new token: https://dashboard.cohere.ai/\n",
"\n",
"import os\n",
"import getpass\n",
"\n",
"os.environ['COHERE_API_KEY'] = getpass.getpass('Cohere API Key:')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2268c17f-5cc3-457b-928b-0d470154c3a8",
"metadata": {},
"outputs": [],
"source": [
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
]
},
{
"cell_type": "code",
"execution_count": 2,
@ -26,7 +79,10 @@
{
"cell_type": "markdown",
"id": "6fa3d916",
"metadata": {},
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"## Set up the base vector store retriever\n",
"Let's start by initializing a simple vector store retriever and storing the 2023 State of the Union speech (in chunks). We can set up the retriever to retrieve a high number (20) of docs."
@ -410,7 +466,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -5,7 +5,7 @@
"id": "fc0db1bc",
"metadata": {},
"source": [
"# Contextual Compression Retriever\n",
"# Contextual Compression\n",
"\n",
"This notebook introduces the concept of DocumentCompressors and the ContextualCompressionRetriever. The core idea is simple: given a specific query, we should be able to return only the documents relevant to that query, and only the parts of those documents that are relevant. The ContextualCompressionsRetriever is a wrapper for another retriever that iterates over the initial output of the base retriever and filters and compresses those initial documents, so that only the most relevant information is returned."
]
@ -363,7 +363,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -1,16 +1,18 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "9fc6205b",
"metadata": {},
"source": [
"# Databerry\n",
"\n",
">[Databerry platform](https://docs.databerry.ai/introduction) brings data from anywhere (Datasources: Text, PDF, Word, PowerPoint, Excel, Notion, Airtable, Google Sheets, etc.) into Datastores (containers of multiple Datasources).\n",
"Then your Datastores can be connected to ChatGPT via Plugins or to any other Large Language Model (LLM) via the `Databerry API`.\n",
"\n",
"This notebook shows how to use [Databerry's](https://www.databerry.ai/) retriever.\n",
"\n",
"First, you will need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url"
"First, you will need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url. You need the [API Key](https://docs.databerry.ai/api-reference/authentication)."
]
},
{
@ -27,7 +29,9 @@
"cell_type": "code",
"execution_count": 1,
"id": "d0e6f506",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.retrievers import DataberryRetriever"
@ -35,9 +39,11 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 2,
"id": "f381f642",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"retriever = DataberryRetriever(\n",
@ -87,7 +93,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -7,16 +7,36 @@
"source": [
"# ElasticSearch BM25\n",
"\n",
"This notebook goes over how to use a retriever that under the hood uses ElasticSearcha and BM25.\n",
">[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.\n",
"\n",
">In information retrieval, [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) (BM is an abbreviation of best matching) is a ranking function used by search engines to estimate the relevance of documents to a given search query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others.\n",
"\n",
">The name of the actual ranking function is BM25. The fuller name, Okapi BM25, includes the name of the first system to use it, which was the Okapi information retrieval system, implemented at London's City University in the 1980s and 1990s. BM25 and its newer variants, e.g. BM25F (a version of BM25 that can take document structure and anchor text into account), represent TF-IDF-like retrieval functions used in document retrieval.\n",
"\n",
"This notebook shows how to use a retriever that uses `ElasticSearch` and `BM25`.\n",
"\n",
"For more information on the details of BM25 see [this blog post](https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "51b49135-a61a-49e8-869d-7c1d76794cd7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#!pip install elasticsearch"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "393ac030",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.retrievers import ElasticSearchBM25Retriever"
@ -32,9 +52,11 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"id": "bcb3c8c2",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"elasticsearch_url=\"http://localhost:9200\"\n",
@ -156,7 +178,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -1,12 +1,13 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "ab66dd43",
"metadata": {},
"source": [
"# kNN Retriever\n",
"# kNN\n",
"\n",
">In statistics, the [k-nearest neighbors algorithm (k-NN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression.\n",
"\n",
"This notebook goes over how to use a retriever that under the hood uses an kNN.\n",
"\n",
@ -103,7 +104,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -7,6 +7,8 @@
"source": [
"# Metal\n",
"\n",
">[Metal](https://github.com/getmetal/metal-python) is a managed service for ML Embeddings.\n",
"\n",
"This notebook shows how to use [Metal's](https://docs.getmetal.io/introduction) retriever.\n",
"\n",
"First, you will need to sign up for Metal and get an API key. You can do so [here](https://docs.getmetal.io/misc-create-app)"
@ -148,7 +150,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -1,16 +1,43 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "ab66dd43",
"metadata": {},
"source": [
"# Pinecone Hybrid Search\n",
"\n",
">[Pinecone](https://docs.pinecone.io/docs/overview) is a vector database with broad functionality.\n",
"\n",
"This notebook goes over how to use a retriever that under the hood uses Pinecone and Hybrid Search.\n",
"\n",
"The logic of this retriever is taken from [this documentaion](https://docs.pinecone.io/docs/hybrid-search)"
"The logic of this retriever is taken from [this documentaion](https://docs.pinecone.io/docs/hybrid-search)\n",
"\n",
"To use Pinecone, you must have an API key and an Environment. \n",
"Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ab4ab62-9bb2-4ecf-9fbf-1af7f0be558b",
"metadata": {},
"outputs": [],
"source": [
"#!pip install pinecone-client"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf0cf405-451d-4f87-94b1-2b7d65f1e1be",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ['PINECONE_API_KEY'] = getpass.getpass('Pinecone API Key:')"
]
},
{
@ -23,6 +50,34 @@
"from langchain.retrievers import PineconeHybridSearchRetriever"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4577fea1-05e7-47a0-8173-56b0ddaa22bf",
"metadata": {},
"outputs": [],
"source": [
"os.environ['PINECONE_ENVIRONMENT'] = getpass.getpass('Pinecone Environment:')"
]
},
{
"cell_type": "markdown",
"id": "80e2e8e3-0fb5-4bd9-9196-9eada3439a61",
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "314a7ee5-f498-45f6-8fdb-81428730083e",
"metadata": {},
"outputs": [],
"source": [
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
]
},
{
"cell_type": "markdown",
"id": "aaf80e7f",
@ -32,7 +87,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "95d5d7f9",
"metadata": {},
@ -109,7 +163,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "dbc025d6",
"metadata": {},
@ -131,7 +184,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "96bf8879",
"metadata": {},
@ -156,7 +208,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "23601ddb",
"metadata": {},
@ -269,7 +320,7 @@
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -283,7 +334,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
"version": "3.10.6"
},
"vscode": {
"interpreter": {

@ -5,8 +5,9 @@
"id": "13afcae7",
"metadata": {},
"source": [
"# Self-querying retriever\n",
"In the notebook we'll demo the `SelfQueryRetriever`, which, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to it's underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documented, but to also extract filters from the user query on the metadata of stored documents and to execute those filter."
"# Self-querying\n",
"\n",
"In the notebook we'll demo the `SelfQueryRetriever`, which, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to it's underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documented, but to also extract filters from the user query on the metadata of stored documents and to execute those filters."
]
},
{
@ -15,9 +16,11 @@
"metadata": {},
"source": [
"## Creating a Pinecone index\n",
"First we'll want to create a Pinecone VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
"First we'll want to create a `Pinecone` VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
"\n",
"To use Pinecone, you to have `pinecone` package installed and you must have an API key and an Environment. Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart).\n",
"\n",
"NOTE: The self-query retriever requires you to have `lark` installed (`pip install lark`)"
"NOTE: The self-query retriever requires you to have `lark` package installed."
]
},
{
@ -30,6 +33,16 @@
"# !pip install lark"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f633445-57fe-45f3-84f7-80d3941b9e53",
"metadata": {},
"outputs": [],
"source": [
"#!pip install pinecone-client"
]
},
{
"cell_type": "code",
"execution_count": 1,
@ -352,7 +365,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -5,32 +5,83 @@
"id": "ab66dd43",
"metadata": {},
"source": [
"# SVM Retriever\n",
"# SVM\n",
"\n",
"This notebook goes over how to use a retriever that under the hood uses an SVM using scikit-learn.\n",
">[Support vector machines (SVMs)](https://scikit-learn.org/stable/modules/svm.html#support-vector-machines) are a set of supervised learning methods used for classification, regression and outliers detection.\n",
"\n",
"This notebook goes over how to use a retriever that under the hood uses an `SVM` using `scikit-learn` package.\n",
"\n",
"Largely based on https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.ipynb"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "393ac030",
"metadata": {},
"execution_count": null,
"id": "a801b57c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.retrievers import SVMRetriever\n",
"from langchain.embeddings import OpenAIEmbeddings"
"#!pip install scikit-learn"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a801b57c",
"metadata": {},
"execution_count": null,
"id": "05b33419-fd3e-49c6-bae3-f20195d09c0c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# !pip install scikit-learn"
"#!pip install lark"
]
},
{
"cell_type": "markdown",
"id": "cc5e2d59-9510-40b2-a810-74af28e5a5e8",
"metadata": {
"tags": []
},
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f9936d67-0471-4a82-954b-033c46ddb303",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"OpenAI API Key: ········\n"
]
}
],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "393ac030",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.retrievers import SVMRetriever\n",
"from langchain.embeddings import OpenAIEmbeddings"
]
},
{
@ -43,9 +94,11 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 6,
"id": "98b1c017",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"retriever = SVMRetriever.from_texts([\"foo\", \"bar\", \"world\", \"hello\", \"foo bar\"], OpenAIEmbeddings())"
@ -63,9 +116,11 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 9,
"id": "c0455218",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"result = retriever.get_relevant_documents(\"foo\")"
@ -73,9 +128,11 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 10,
"id": "7dfa5c29",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
@ -86,7 +143,7 @@
" Document(page_content='world', metadata={})]"
]
},
"execution_count": 5,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
@ -120,7 +177,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -5,31 +5,35 @@
"id": "ab66dd43",
"metadata": {},
"source": [
"# TF-IDF Retriever\n",
"# TF-IDF\n",
"\n",
"This notebook goes over how to use a retriever that under the hood uses TF-IDF using scikit-learn.\n",
">[TF-IDF](https://scikit-learn.org/stable/modules/feature_extraction.html#tfidf-term-weighting) means term-frequency times inverse document-frequency.\n",
"\n",
"This notebook goes over how to use a retriever that under the hood uses [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) using `scikit-learn` package.\n",
"\n",
"For more information on the details of TF-IDF see [this blog post](https://medium.com/data-science-bootcamp/tf-idf-basics-of-information-retrieval-48de122b2a4c)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "393ac030",
"execution_count": null,
"id": "a801b57c",
"metadata": {},
"outputs": [],
"source": [
"from langchain.retrievers import TFIDFRetriever"
"# !pip install scikit-learn"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a801b57c",
"metadata": {},
"execution_count": 3,
"id": "393ac030",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# !pip install scikit-learn"
"from langchain.retrievers import TFIDFRetriever"
]
},
{
@ -42,9 +46,11 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "98b1c017",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"retriever = TFIDFRetriever.from_texts([\"foo\", \"bar\", \"world\", \"hello\", \"foo bar\"])"
@ -62,9 +68,11 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"id": "c0455218",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"result = retriever.get_relevant_documents(\"foo\")"
@ -72,9 +80,11 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"id": "7dfa5c29",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
@ -85,7 +95,7 @@
" Document(page_content='world', metadata={})]"
]
},
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@ -119,7 +129,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -5,17 +5,17 @@
"id": "a90b7557",
"metadata": {},
"source": [
"# Time Weighted VectorStore Retriever\n",
"# Time Weighted VectorStore\n",
"\n",
"This retriever uses a combination of semantic similarity and recency.\n",
"This retriever uses a combination of semantic similarity and a time decay.\n",
"\n",
"The algorithm for scoring them is:\n",
"\n",
"```\n",
"```python\n",
"semantic_similarity + (1.0 - decay_rate) ** hours_passed\n",
"```\n",
"\n",
"Notably, hours_passed refers to the hours passed since the object in the retriever **was last accessed**, not since it was created. This means that frequently accessed objects remain \"fresh.\""
"Notably, `hours_passed` refers to the hours passed since the object in the retriever **was last accessed**, not since it was created. This means that frequently accessed objects remain \"fresh.\""
]
},
{
@ -42,7 +42,7 @@
"source": [
"## Low Decay Rate\n",
"\n",
"A low decay rate (in this, to be extreme, we will set close to 0) means memories will be \"remembered\" for longer. A decay rate of 0 means memories never be forgotten, making this retriever equivalent to the vector lookup."
"A low `decay rate` (in this, to be extreme, we will set close to 0) means memories will be \"remembered\" for longer. A `decay rate` of 0 means memories never be forgotten, making this retriever equivalent to the vector lookup."
]
},
{
@ -113,7 +113,7 @@
"source": [
"## High Decay Rate\n",
"\n",
"With a high decay factor (e.g., several 9's), the recency score quickly goes to 0! If you set this all the way to 1, recency is 0 for all objects, once again making this equivalent to a vector lookup.\n"
"With a high `decay rate` (e.g., several 9's), the `recency score` quickly goes to 0! If you set this all the way to 1, `recency` is 0 for all objects, once again making this equivalent to a vector lookup.\n"
]
},
{
@ -243,7 +243,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -5,9 +5,9 @@
"id": "fc0db1bc",
"metadata": {},
"source": [
"# VectorStore Retriever\n",
"# VectorStore\n",
"\n",
"The index - and therefore the retriever - that LangChain has the most support for is a VectorStoreRetriever. As the name suggests, this retriever is backed heavily by a VectorStore.\n",
"The index - and therefore the retriever - that LangChain has the most support for is the `VectorStoreRetriever`. As the name suggests, this retriever is backed heavily by a VectorStore.\n",
"\n",
"Once you construct a VectorStore, its very easy to construct a retriever. Let's walk through an example."
]
@ -25,10 +25,18 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 9,
"id": "9fbcc58f",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Exiting: Cleaning up .chroma directory\n"
]
}
],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import FAISS\n",
@ -195,7 +203,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -0,0 +1,138 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ce0f17b9",
"metadata": {},
"source": [
"# Vespa\n",
"\n",
">[Vespa](https://vespa.ai/) is a fully featured search engine and vector database. It supports vector search (ANN), lexical search, and search in structured data, all in the same query.\n",
"\n",
"This notebook shows how to use `Vespa.ai` as a LangChain retriever.\n",
"\n",
"In order to create a retriever, we use [pyvespa](https://pyvespa.readthedocs.io/en/latest/index.html) to\n",
"create a connection a `Vespa` service."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e6a11ab-38bd-4920-ba11-60cb2f075754",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#!pip install pyvespa"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c10dd962",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from vespa.application import Vespa\n",
"\n",
"vespa_app = Vespa(url=\"https://doc-search.vespa.oath.cloud\")"
]
},
{
"cell_type": "markdown",
"id": "3df4ce53",
"metadata": {},
"source": [
"This creates a connection to a `Vespa` service, here the Vespa documentation search service.\n",
"Using `pyvespa` package, you can also connect to a\n",
"[Vespa Cloud instance](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-cloud.html)\n",
"or a local\n",
"[Docker instance](https://pyvespa.readthedocs.io/en/latest/deploy-docker.html).\n",
"\n",
"\n",
"After connecting to the service, you can set up the retriever:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ccca1f4",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from langchain.retrievers.vespa_retriever import VespaRetriever\n",
"\n",
"vespa_query_body = {\n",
" \"yql\": \"select content from paragraph where userQuery()\",\n",
" \"hits\": 5,\n",
" \"ranking\": \"documentation\",\n",
" \"locale\": \"en-us\"\n",
"}\n",
"vespa_content_field = \"content\"\n",
"retriever = VespaRetriever(vespa_app, vespa_query_body, vespa_content_field)"
]
},
{
"cell_type": "markdown",
"id": "1e7e34e1",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"This sets up a LangChain retriever that fetches documents from the Vespa application.\n",
"Here, up to 5 results are retrieved from the `content` field in the `paragraph` document type,\n",
"using `doumentation` as the ranking method. The `userQuery()` is replaced with the actual query\n",
"passed from LangChain.\n",
"\n",
"Please refer to the [pyvespa documentation](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa.html#Query)\n",
"for more information.\n",
"\n",
"Now you can return the results and continue using the results in LangChain."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f47a2bfe",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"retriever.get_relevant_documents(\"what is vespa?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -1,226 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ce0f17b9",
"metadata": {},
"source": [
"# Vespa retriever\n",
"\n",
"This notebook shows how to use Vespa.ai as a LangChain retriever.\n",
"Vespa.ai is a platform for highly efficient structured text and vector search.\n",
"Please refer to [Vespa.ai](https://vespa.ai) for more information.\n",
"\n",
"In this example we'll work with the public [cord-19-search](https://github.com/vespa-cloud/cord-19-search) app which serves an index for the [CORD-19](https://allenai.org/data/cord-19) dataset containing Covid-19 research papers.\n",
"\n",
"In order to create a retriever, we use [pyvespa](https://pyvespa.readthedocs.io/en/latest/index.html) to\n",
"create a connection a Vespa service."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "101c8eb3",
"metadata": {},
"outputs": [],
"source": [
"# Uncomment below if you haven't install pyvespa\n",
"\n",
"# !pip install pyvespa"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9f0406d2",
"metadata": {},
"outputs": [],
"source": [
"def _pretty_print(docs):\n",
" for doc in docs:\n",
" print(\"-\" * 80)\n",
" print(\"CONTENT: \" + doc.page_content + \"\\n\")\n",
" print(\"METADATA: \" + str(doc.metadata))\n",
" print(\"-\" * 80)"
]
},
{
"cell_type": "markdown",
"id": "3db3bfea",
"metadata": {},
"source": [
"## Retrieving documents"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d83331fa",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from langchain.retrievers import VespaRetriever\n",
"\n",
"# Retrieve the abstracts of the top 2 papers that best match the user query.\n",
"retriever = VespaRetriever.from_params(\n",
" 'https://api.cord19.vespa.ai', \n",
" \"abstract\",\n",
" k=2,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f47a2bfe",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------------------------------------\n",
"CONTENT: <sep />and peak hospitalizations by 4-96x, without contact tracing. Although contact tracing was highly <hi>effective</hi> at reducing spread, it was insufficient to stop outbreaks caused by <hi>travellers</hi> in even the best-case scenario, and the likelihood of exceeding contact tracing capacity was a concern in most scenarios. Quarantine compliance had only a small impact on <hi>COVID</hi> spread; <hi>travel</hi> volume and infection rate drove spread. Interpretation: NL's <hi>travel</hi> <hi>ban</hi> was likely a critically important intervention to prevent <hi>COVID</hi> spread. Even a small number<sep />\n",
"\n",
"METADATA: {'id': 'index:content/1/544bbfee3466d2c126719d5f'}\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"CONTENT: How <hi>effective</hi> are restrictions on mobility in limiting <hi>COVID</hi>-19 spread? Using zip code data across five U.S. cities, we estimate that total cases per capita decrease by 20% for every ten percentage point fall in mobility. Addressing endogeneity concerns, we instrument for <hi>travel</hi> by residential teleworkable and essential shares and find a 27% decline in cases per capita. Using panel data for NYC with week and zip code fixed effects, we estimate a decline of 17%. We find substantial spatial and temporal heterogeneity;east coast cities have stronger effects, with the largest for NYC<sep />\n",
"\n",
"METADATA: {'id': 'index:content/0/911dfc6986f1c8bc15fc3a26'}\n",
"--------------------------------------------------------------------------------\n"
]
}
],
"source": [
"docs = retriever.get_relevant_documents(\"How effective are covid travel bans?\")\n",
"_pretty_print(docs)"
]
},
{
"cell_type": "markdown",
"id": "4a158b8e",
"metadata": {},
"source": [
"## Configuring the retriever\n",
"We can further configure our results by specifying metadata fields to retrieve, specifying sources to pull from, adding filters and adding index-specific parameters."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "dc6be773",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------------------------------------\n",
"CONTENT: ...and peak hospitalizations by 4-96x, without contact tracing. Although contact tracing was highly effective at reducing spread, it was insufficient to stop outbreaks caused by travellers in even the best-case scenario, and the likelihood of exceeding contact tracing capacity was a concern in most scenarios. Quarantine compliance had only a small impact on COVID spread; travel volume and infection rate drove spread. Interpretation: NL's travel ban was likely a critically important intervention to prevent COVID spread. Even a small number...\n",
"\n",
"METADATA: {'matchfeatures': {'bm25': 35.5404665009022, 'colbert_maxsim': 78.48671418428421}, 'sddocname': 'doc', 'title': \"How effective was Newfoundland & Labrador's travel ban to prevent the spread of COVID-19? An agent-based analysis\", 'id': 'index:content/1/544bbfee3466d2c126719d5f', 'timestamp': 1612738800, 'license': 'medrxiv', 'doi': 'https://doi.org/10.1101/2021.02.05.21251157', 'authors': [{'first': ' D. M.', 'name': ' D. M. Aleman', 'last': 'Aleman'}, {'first': ' B. Z.', 'name': ' B. Z. Tham', 'last': ' Tham'}, {'first': ' S. J.', 'name': ' S. J. Wagner', 'last': ' Wagner'}, {'first': ' J.', 'name': ' J. Semelhago', 'last': ' Semelhago'}, {'first': ' A.', 'name': ' A. Mohammadi', 'last': ' Mohammadi'}, {'first': ' P.', 'name': ' P. Price', 'last': ' Price'}, {'first': ' R.', 'name': ' R. Giffen', 'last': ' Giffen'}, {'first': ' P.', 'name': ' P. Rahman', 'last': ' Rahman'}], 'source': 'MedRxiv; WHO', 'cord_uid': '9b9kt4sp'}\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"CONTENT: ...reduction in COVID-19 importation and a delay of the COVID-19 outbreak in Australia by approximately one month. Further projection of COVID-19 to May 2020 showed spread patterns depending on the basic reproduction number. CONCLUSION: Imposing the travel ban was effective in delaying widespread transmission of COVID-19. However, strengthening of the domestic control measures is needed to prevent Australia from becoming another epicentre. Implications for public health: This report has shown the importance of border closure to pandemic control.\n",
"\n",
"METADATA: {'matchfeatures': {'bm25': 32.398379319326295, 'colbert_maxsim': 73.91238763928413}, 'sddocname': 'doc', 'title': 'Delaying the COVID-19 epidemic in Australia: evaluating the effectiveness of international travel bans', 'id': 'index:content/1/decd6a8642418607b0d7dff9', 'timestamp': 0, 'license': 'unk', 'authors': [{'first': ' Adeshina', 'name': ' Adeshina Adekunle', 'last': 'Adekunle'}, {'first': ' Michael', 'name': ' Michael Meehan', 'last': ' Meehan'}, {'first': ' Diana', 'name': ' Diana Rojas-Alvarez', 'last': ' Rojas-Alvarez'}, {'first': ' James', 'name': ' James Trauer', 'last': ' Trauer'}, {'first': ' Emma', 'name': ' Emma McBryde', 'last': ' McBryde'}], 'source': 'WHO', 'cord_uid': 'jdh33itm', 'journal': 'Aust N Z J Public Health'}\n",
"--------------------------------------------------------------------------------\n"
]
}
],
"source": [
"retriever = VespaRetriever.from_params(\n",
" 'https://api.cord19.vespa.ai', \n",
" \"abstract\",\n",
" k=2,\n",
" metadata_fields=\"*\", # return all data fields and store as metadata\n",
" ranking=\"hybrid-colbert\", # other valid values: colbert, bm25\n",
" bolding=False,\n",
")\n",
"docs = retriever.get_relevant_documents(\"How effective are covid travel bans?\")\n",
"_pretty_print(docs)"
]
},
{
"cell_type": "markdown",
"id": "11242e84",
"metadata": {},
"source": [
"# Querying with filtering conditions\n",
"\n",
"Vespa has powerful querying abilities, and lets you specify many different conditions in YQL. You can add these filtering conditions using the `get_relevant_documents_with_filter` function.\n",
"\n",
"Read more on the Vespa query language here: https://docs.vespa.ai/en/query-language.html"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "223aeaa9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------------------------------------\n",
"CONTENT: Importance: As countermeasures against the economic downturn caused by the coronavirus 2019 (COVID-19) pandemic, many countries have introduced or considering financial incentives for people to engage in economic activities such as travel and use restaurants. Japan has implemented a large-scale, nationwide government-funded program that subsidizes up to 50% of all travel expenses since July 2020 with the aim of reviving the travel industry. However, it remains unknown as to how such provision of government subsidies for travel impacted the COVID-19 pandemic...\n",
"\n",
"METADATA: {'matchfeatures': {'bm25': 22.54935242101209, 'colbert_maxsim': 55.04242363572121}, 'sddocname': 'doc', 'title': 'Association between Participation in Government Subsidy Program for Domestic Travel and Symptoms Indicative of COVID-19 Infection', 'journal': 'medRxiv : the preprint server for health sciences', 'id': 'index:content/0/d88422d1d176ab0a854caccc', 'timestamp': 1607036400, 'license': 'medrxiv', 'doi': 'https://doi.org/10.1101/2020.12.03.20243352', 'authors': [{'first': ' A.', 'name': ' A. Miyawaki', 'last': 'Miyawaki'}, {'first': ' T.', 'name': ' T. Tabuchi', 'last': ' Tabuchi'}, {'first': ' Y.', 'name': ' Y. Tomata', 'last': ' Tomata'}, {'first': ' Y.', 'name': ' Y. Tsugawa', 'last': ' Tsugawa'}], 'source': 'MedRxiv; Medline; WHO', 'cord_uid': '0isi7yd4'}\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"CONTENT: The Japanese government has declared a national emergency and travel entry ban since the coronavirus disease 2019 (COVID-19) pandemic began. As of June 19, 2020, there have been no confirmed cases of COVID-19 in Iwate, a prefecture of Japan. Here, we analyzed the excess deaths as well as the number of patients and medical earnings due to the pandemic from prefectural ...\n",
"\n",
"METADATA: {'matchfeatures': {'bm25': 19.348708049098548, 'colbert_maxsim': 58.35367426276207}, 'sddocname': 'doc', 'title': 'Affected medical services in Iwate prefecture in the absence of a COVID-19 outbreak', 'id': 'index:content/1/9f27176791532b37ef8e4a24', 'timestamp': 1592604000, 'license': 'medrxiv', 'doi': 'https://doi.org/10.1101/2020.06.19.20135269', 'authors': [{'first': ' N.', 'name': ' N. Sasaki', 'last': 'Sasaki'}, {'first': ' S. S.', 'name': ' S. S. Nishizuka', 'last': ' Nishizuka'}], 'source': 'MedRxiv; WHO', 'cord_uid': '7egroqb1'}\n",
"--------------------------------------------------------------------------------\n"
]
}
],
"source": [
"docs = retriever.get_relevant_documents_with_filter(\n",
" \"How effective are covid travel bans?\", \n",
" _filter='abstract contains \"Japan\" and license matches \"medrxiv\"'\n",
")\n",
"_pretty_print(docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13039caf",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -7,7 +7,25 @@
"source": [
"# Weaviate Hybrid Search\n",
"\n",
"This notebook shows how to use [Weaviate hybrid search](https://weaviate.io/blog/hybrid-search-explained) as a LangChain retriever."
">[Weaviate](https://weaviate.io/developers/weaviate) is an open source vector database.\n",
"\n",
">[Hybrid search](https://weaviate.io/blog/hybrid-search-explained) is a technique that combines multiple search algorithms to improve the accuracy and relevance of search results. It brings together the best features of keyword-based search algorithms and vector search techniques.\n",
"\n",
">The `Hybrid search in Weaviate` uses sparse and dense vectors to represent the meaning and context of search queries and documents.\n",
"\n",
"This notebook shows how to use `Weaviate hybrid search` as a LangChain retriever."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bba863a2-977c-4add-b5f4-bfc33a80eae5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#!pip install weaviate-client"
]
},
{
@ -124,7 +142,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

@ -7,7 +7,7 @@
"source": [
"# ElasticSearch\n",
"\n",
"[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.\n",
">[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.\n",
"\n",
"This notebook shows how to use functionality related to the `Elasticsearch` database."
]

@ -7,7 +7,7 @@
"source": [
"# Pinecone\n",
"\n",
"[Pinecone](https://docs.pinecone.io/docs/overview) is a vector database with broad functionality.\n",
">[Pinecone](https://docs.pinecone.io/docs/overview) is a vector database with broad functionality.\n",
"\n",
"This notebook shows how to use functionality related to the `Pinecone` vector database.\n",
"\n",

@ -7,7 +7,7 @@
"source": [
"# Cohere\n",
"\n",
"[Cohere](https://cohere.ai/about) is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.\n",
">[Cohere](https://cohere.ai/about) is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.\n",
"\n",
"This example goes over how to use LangChain to interact with `Cohere` [models](https://docs.cohere.ai/docs/generation-card)."
]

@ -13,7 +13,7 @@ class BaseLoader(ABC):
Implementations should implement the lazy-loading method using generators
to avoid loading all documents into memory at once.
The `load` method will remain as is for backwards compatibility, but it's
The `load` method will remain as is for backwards compatibility, but its
implementation should be just `list(self.lazy_load())`.
"""

@ -15,7 +15,7 @@ def _get_hours_passed(time: datetime.datetime, ref_time: datetime.datetime) -> f
class TimeWeightedVectorStoreRetriever(BaseRetriever, BaseModel):
"""Retriever combining embededing similarity with recency."""
"""Retriever combining embedding similarity with recency."""
vectorstore: VectorStore
"""The vectorstore to store documents and determine salience."""
