{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "683953b3", "metadata": {}, "source": [ "# Pinecone\n", "\n", ">[Pinecone](https://docs.pinecone.io/docs/overview) is a vector database with broad functionality.\n", "\n", "This notebook shows how to use functionality related to the `Pinecone` vector database.\n", "\n", "To use Pinecone, you must have an API key. \n", "Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart)." ] }, { "cell_type": "code", "execution_count": null, "id": "b4c41cad-08ef-4f72-a545-2151e4598efe", "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install pinecone-client openai tiktoken langchain" ] }, { "cell_type": "code", "execution_count": null, "id": "c1e38361-c1fe-4ac6-86e9-c90ebaf7ae87", "metadata": {}, "outputs": [], "source": [ "import os\n", "import getpass\n", "\n", "os.environ[\"PINECONE_API_KEY\"] = getpass.getpass(\"Pinecone API Key:\")" ] }, { "cell_type": "code", "execution_count": null, "id": "02a536e0-d603-4d79-b18b-1ed562977b40", "metadata": {}, "outputs": [], "source": [ "os.environ[\"PINECONE_ENV\"] = getpass.getpass(\"Pinecone Environment:\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "320af802-9271-46ee-948f-d2453933d44b", "metadata": {}, "source": [ "We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key." ] }, { "cell_type": "code", "execution_count": null, "id": "ffea66e4-bc23-46a9-9580-b348dfe7b7a7", "metadata": {}, "outputs": [], "source": [ "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" ] }, { "cell_type": "code", "execution_count": null, "id": "aac9563e", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.embeddings.openai import OpenAIEmbeddings\n", "from langchain.text_splitter import CharacterTextSplitter\n", "from langchain.vectorstores import Pinecone\n", "from langchain.document_loaders import TextLoader" ] }, { "cell_type": "code", "execution_count": null, "id": "a3c3999a", "metadata": {}, "outputs": [], "source": [ "from langchain.document_loaders import TextLoader\n", "\n", "loader = TextLoader(\"../../../state_of_the_union.txt\")\n", "documents = loader.load()\n", "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", "docs = text_splitter.split_documents(documents)\n", "\n", "embeddings = OpenAIEmbeddings()" ] }, { "cell_type": "code", "execution_count": null, "id": "6e104aee", "metadata": {}, "outputs": [], "source": [ "import pinecone\n", "\n", "# initialize pinecone\n", "pinecone.init(\n", " api_key=os.getenv(\"PINECONE_API_KEY\"), # find at app.pinecone.io\n", " environment=os.getenv(\"PINECONE_ENV\"), # next to api key in console\n", ")\n", "\n", "index_name = \"langchain-demo\"\n", "\n", "# First, check if our index already exists. If it doesn't, we create it\n", "if index_name not in pinecone.list_indexes():\n", " # we create a new index\n", " pinecone.create_index(\n", " name=index_name,\n", " metric='cosine',\n", " dimension=1536 \n", ")\n", "# The OpenAI embedding model `text-embedding-ada-002 uses 1536 dimensions`\n", "docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)\n", "\n", "# if you already have an index, you can load it like this\n", "# docsearch = Pinecone.from_existing_index(index_name, embeddings)\n", "\n", "query = \"What did the president say about Ketanji Brown Jackson\"\n", "docs = docsearch.similarity_search(query)" ] }, { "cell_type": "code", "execution_count": null, "id": "9c608226", "metadata": {}, "outputs": [], "source": [ "print(docs[0].page_content)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "86a4b96b", "metadata": {}, "source": [ "### Adding More Text to an Existing Index\n", "\n", "More text can embedded and upserted to an existing Pinecone index using the `add_texts` function\n" ] }, { "cell_type": "code", "execution_count": null, "id": "38a7a60e", "metadata": {}, "outputs": [], "source": [ "index = pinecone.Index(\"langchain-demo\")\n", "vectorstore = Pinecone(index, embeddings.embed_query, \"text\")\n", "\n", "vectorstore.add_texts(\"More text!\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d46d1452", "metadata": {}, "source": [ "### Maximal Marginal Relevance Searches\n", "\n", "In addition to using similarity search in the retriever object, you can also use `mmr` as retriever.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a359ed74", "metadata": {}, "outputs": [], "source": [ "retriever = docsearch.as_retriever(search_type=\"mmr\")\n", "matched_docs = retriever.get_relevant_documents(query)\n", "for i, d in enumerate(matched_docs):\n", " print(f\"\\n## Document {i}\\n\")\n", " print(d.page_content)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7c477287", "metadata": {}, "source": [ "Or use `max_marginal_relevance_search` directly:" ] }, { "cell_type": "code", "execution_count": null, "id": "9ca82740", "metadata": {}, "outputs": [], "source": [ "found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n", "for i, doc in enumerate(found_docs):\n", " print(f\"{i + 1}.\", doc.page_content, \"\\n\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }