Merge pull request #238 from liuliuOD/fix/vector_databases

fix: spelling and commands
colin-openai authored 1 year ago, committed by GitHub
commit f9e2dbd5a9

@@ -13,7 +13,7 @@
"\n",
"In this notebook we will learn how to query relevant contexts to our queries from Pinecone, and pass these to a generative OpenAI model to generate an answer backed by real data sources.\n",
"\n",
"A common problem with using GPT-3 to factually answer questions is that GPT-3 can sometimes make things up. The GPT models have a broad range of general knowledge, but this does not necessarily apply to more specific information. For that we use the Pinecone vector database as our _\"external knowledge base\"_ — like *long-term memory for GPT-3.\n",
"A common problem with using GPT-3 to factually answer questions is that GPT-3 can sometimes make things up. The GPT models have a broad range of general knowledge, but this does not necessarily apply to more specific information. For that we use the Pinecone vector database as our _\"external knowledge base\"_ — like *long-term memory* for GPT-3.\n",
"\n",
"Required installs for this notebook are:"
]
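For context, the retrieve-then-generate flow this notebook builds can be sketched as below. This is a minimal illustration, assuming the era-appropriate `openai` (<1.0) and `pinecone-client` (2.x) packages; the API keys, environment, index name, and query are hypothetical placeholders.

```python
import openai
import pinecone

# Assumed setup: keys, environment, and index name are placeholders.
openai.api_key = "OPENAI_API_KEY"
pinecone.init(api_key="PINECONE_API_KEY", environment="us-east1-gcp")
index = pinecone.Index("openai-qa")  # hypothetical existing index

query = "Which training method should I use for sentence transformers?"

# 1. Embed the query with the same model used to embed the documents.
xq = openai.Embedding.create(
    input=query, model="text-embedding-ada-002"
)["data"][0]["embedding"]

# 2. Retrieve the most relevant contexts from Pinecone.
res = index.query(vector=xq, top_k=3, include_metadata=True)
contexts = [m["metadata"]["text"] for m in res["matches"]]

# 3. Answer the question grounded in the retrieved contexts.
prompt = (
    "Answer the question based on the context below.\n\n"
    "Context:\n" + "\n---\n".join(contexts) +
    f"\n\nQuestion: {query}\nAnswer:"
)
answer = openai.Completion.create(
    model="text-davinci-003", prompt=prompt, max_tokens=200
)
print(answer["choices"][0]["text"].strip())
```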
@@ -212,7 +212,7 @@
"The best training method to use for fine-tuning a pre-trained model with sentence transformers is the Masked Language Model (MLM) training. MLM training involves randomly masking some of the words in a sentence and then training the model to predict the masked words. This helps the model to learn the context of the sentence and better understand the relationships between words.\n",
"```\n",
"\n",
"This answer seems pretty convincing right? Yet, it's wrong. MLM is typically used in the pretraining step of a transformer model but *cannot* be used to fine-tune a sentence-transformer, and has nothing to do with having _\"pairs of related sentences\"_.\n",
"This answer seems pretty convincing right? Yet, it's wrong. MLM is typically used in the pretraining step of a transformer model but *\"cannot\"* be used to fine-tune a sentence-transformer, and has nothing to do with having _\"pairs of related sentences\"_.\n",
"\n",
"An alternative answer we receive (and the one we returned above) is about `supervised learning approach` being the most suitable. This is completely true, but it's not specific and doesn't answer the question.\n",
"\n",
@@ -555,7 +555,7 @@
"id": "VMyJjt1cnwcH"
},
"source": [
"Now we need a place to store these embeddings and enable a efficient _vector search_ through them all. To do that we use Pinecone, we can get a [free API key](https://app.pinecone.io) and enter it below where we will initialize our connection to Pinecone and create a new index."
"Now we need a place to store these embeddings and enable a efficient _vector search_ through them all. To do that we use **`Pinecone`**, we can get a [free API key](https://app.pinecone.io) and enter it below where we will initialize our connection to `Pinecone` and create a new index."
]
},
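A minimal sketch of that initialization step, assuming pinecone-client 2.x (the API key, environment, and index name are illustrative):

```python
import pinecone

pinecone.init(api_key="PINECONE_API_KEY", environment="us-east1-gcp")

index_name = "openai-qa"  # hypothetical name
if index_name not in pinecone.list_indexes():
    # 1536 matches the output dimension of text-embedding-ada-002
    pinecone.create_index(index_name, dimension=1536, metric="cosine")
index = pinecone.Index(index_name)
```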
{
@@ -660,7 +660,6 @@
],
"source": [
"from tqdm.auto import tqdm\n",
"import datetime\n",
"from time import sleep\n",
"\n",
"batch_size = 100 # how many embeddings we create and insert at once\n",

@@ -6,7 +6,7 @@
"source": [
"# Using Qdrant as a vector database for OpenAI embeddings\n",
"\n",
"This notebook guides you step by step on using Qdrant as a vector database for OpenAI embeddings. [Qdrant](https://qdrant.tech) is a high-performant vector search database written in Rust. It offers REST and gRPC APIs to manage your embeddings. There is an official Python [qdrant-client](https://github.com/qdrant/qdrant_client) that eases the integration with your apps.\n",
"This notebook guides you step by step on using **`Qdrant`** as a vector database for OpenAI embeddings. [Qdrant](https://qdrant.tech) is a high-performant vector search database written in Rust. It offers RESTful and gRPC APIs to manage your embeddings. There is an official Python [qdrant-client](https://github.com/qdrant/qdrant_client) that eases the integration with your apps.\n",
"\n",
"This notebook presents an end-to-end process of:\n",
"1. Using precomputed embeddings created by OpenAI API.\n",
@@ -28,7 +28,7 @@
"\n",
"### Integration\n",
"\n",
"[Qdrant](https://qdrant.tech) provides both REST and gRPC APIs which makes integration easy, no matter the programming language you use. However, there are some official clients for the most popular languages available, and if you use Python then the [Python Qdrant client library](https://github.com/qdrant/qdrant_client) might be the best choice."
"[Qdrant](https://qdrant.tech) provides both RESTful and gRPC APIs which makes integration easy, no matter the programming language you use. However, there are some official clients for the most popular languages available, and if you use Python then the [Python Qdrant client library](https://github.com/qdrant/qdrant_client) might be the best choice."
]
},
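As a quick illustration of that client, connecting to a local instance takes a couple of lines (assuming Qdrant is running on its default port):

```python
from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)
print(client.get_collections())  # sanity check: lists existing collections
```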
{
@@ -132,7 +132,16 @@
"\n",
"If you don't have an OpenAI API key, you can get one from [https://beta.openai.com/account/api-keys](https://beta.openai.com/account/api-keys).\n",
"\n",
"Once you get your key, please add it to your environment variables as `OPENAI_API_KEY`."
"Once you get your key, please add it to your environment variables as `OPENAI_API_KEY` by running following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! export OPENAI_API_KEY=\"your API key\""
]
},
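One caveat worth noting: `! export` runs in a throwaway subshell, so the variable does not persist for the notebook kernel itself. A sketch of setting it from Python instead (the value is a placeholder):

```python
import os

# Persists for the running kernel, unlike `! export` in a subshell.
os.environ["OPENAI_API_KEY"] = "your API key"
```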
{

@@ -122,7 +122,16 @@
"\n",
"If you don't have an OpenAI API key, you can get one from [https://beta.openai.com/account/api-keys](https://beta.openai.com/account/api-keys).\n",
"\n",
"Once you get your key, please add it to your environment variables as `OPENAI_API_KEY`."
"Once you get your key, please add it to your environment variables as `OPENAI_API_KEY` by running following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! export OPENAI_API_KEY=\"your API key\""
]
},
{

@@ -38,7 +38,7 @@
"source": [
"## Prerequisites\n",
"\n",
"Before we start this project, we need setup the following:\n",
"Before we start this project, we need to set up the following:\n",
"\n",
"* start a Redis database with RediSearch (redis-stack)\n",
"* install libraries\n",
@@ -52,7 +52,7 @@
"To keep this example simple, we will use the Redis Stack docker container which we can start as follows\n",
"\n",
"```bash\n",
"$ docker compose up -d\n",
"$ docker-compose up -d\n",
"```\n",
"\n",
"This also includes the [RedisInsight](https://redis.com/redis-enterprise/redis-insight/) GUI for managing your Redis database which you can view at [http://localhost:8001](http://localhost:8001) once you start the docker container.\n",
@@ -78,7 +78,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install redis wget pandas openai"
"! pip install redis wget pandas openai"
]
},
{
@@ -94,7 +94,17 @@
"\n",
"If you don't have an OpenAI API key, you can get one from [https://beta.openai.com/account/api-keys](https://beta.openai.com/account/api-keys).\n",
"\n",
"Once you get your key, please add it to your environment variables as `OPENAI_API_KEY`."
"Once you get your key, please add it to your environment variables as `OPENAI_API_KEY` by using following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "be28faa6",
"metadata": {},
"outputs": [],
"source": [
"! export OPENAI_API_KEY=\"your API key\""
]
},
{
@@ -351,7 +361,7 @@
"source": [
"## Creating a Search Index in Redis\n",
"\n",
"The below cells will show how to specify and create a search index in Redis. We will\n",
"The below cells will show how to specify and create a search index in Redis. We will:\n",
"\n",
"1. Set some constants for defining our index like the distance metric and the index name\n",
"2. Define the index schema with RediSearch fields\n",
@@ -432,7 +442,7 @@
"source": [
"## Load Documents into the Index\n",
"\n",
"Now that we have a search index, we can load documents into it. We will use the same documents we used in the previous examples. In Redis, either the Hash or JSON (if using RedisJSON in addition to RediSearch) data types can be used to store documents. We will use the HASH data type in this example. The below cells will show how to load documents into the index."
"Now that we have a search index, we can load documents into it. We will use the same documents we used in the previous examples. In Redis, either the HASH or JSON (if using RedisJSON in addition to RediSearch) data types can be used to store documents. We will use the HASH data type in this example. The below cells will show how to load documents into the index."
]
},
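A sketch of writing a single document as a HASH, assuming `r` from earlier and `numpy` for packing the embedding into the raw FLOAT32 bytes RediSearch expects (the key and field values are illustrative):

```python
import numpy as np

embedding = np.random.rand(1536).astype(np.float32)  # stand-in vector
r.hset("doc:0", mapping={
    "text": "Example passage about vector databases.",
    "embedding": embedding.tobytes(),  # raw bytes for the VectorField
})
```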
{
