langchain/docs/extras/integrations/vectorstores/pinecone.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "683953b3",
   "metadata": {},
   "source": [
    "# Pinecone\n",
    "\n",
    ">[Pinecone](https://docs.pinecone.io/docs/overview) is a vector database with broad functionality.\n",
    "\n",
    "This notebook shows how to use functionality related to the `Pinecone` vector database.\n",
    "\n",
    "To use Pinecone, you must have an API key. \n",
    "Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4c41cad-08ef-4f72-a545-2151e4598efe",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install pinecone-client openai tiktoken langchain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1e38361-c1fe-4ac6-86e9-c90ebaf7ae87",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import getpass\n",
    "\n",
    "os.environ[\"PINECONE_API_KEY\"] = getpass.getpass(\"Pinecone API Key:\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02a536e0-d603-4d79-b18b-1ed562977b40",
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"PINECONE_ENV\"] = getpass.getpass(\"Pinecone Environment:\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "320af802-9271-46ee-948f-d2453933d44b",
   "metadata": {},
   "source": [
    "We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ffea66e4-bc23-46a9-9580-b348dfe7b7a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aac9563e",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores import Pinecone\n",
    "from langchain.document_loaders import TextLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3c3999a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "\n",
    "loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e104aee",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pinecone\n",
    "\n",
    "# initialize pinecone\n",
    "pinecone.init(\n",
    "    api_key=os.getenv(\"PINECONE_API_KEY\"),  # find at app.pinecone.io\n",
    "    environment=os.getenv(\"PINECONE_ENV\"),  # next to api key in console\n",
    ")\n",
    "\n",
    "index_name = \"langchain-demo\"\n",
    "\n",
    "# First, check if our index already exists. If it doesn't, we create it\n",
    "if index_name not in pinecone.list_indexes():\n",
    "    # we create a new index\n",
    "    pinecone.create_index(\n",
    "      name=index_name,\n",
    "      metric='cosine',\n",
    "      dimension=1536  \n",
    ")\n",
    "# The OpenAI embedding model `text-embedding-ada-002 uses 1536 dimensions`\n",

    "docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)\n",
    "\n",
    "# if you already have an index, you can load it like this\n",
    "# docsearch = Pinecone.from_existing_index(index_name, embeddings)\n",
    "\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "docs = docsearch.similarity_search(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c608226",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(docs[0].page_content)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "86a4b96b",
   "metadata": {},
   "source": [
    "### Adding More Text to an Existing Index\n",
    "\n",
    "More text can embedded and upserted to an existing Pinecone index using the `add_texts` function\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38a7a60e",
   "metadata": {},
   "outputs": [],
   "source": [
    "index = pinecone.Index(\"langchain-demo\")\n",
    "vectorstore = Pinecone(index, embeddings.embed_query, \"text\")\n",
    "\n",
    "vectorstore.add_texts(\"More text!\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d46d1452",
   "metadata": {},
   "source": [
    "### Maximal Marginal Relevance Searches\n",
    "\n",
    "In addition to using similarity search in the retriever object, you can also use `mmr` as retriever.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a359ed74",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = docsearch.as_retriever(search_type=\"mmr\")\n",
    "matched_docs = retriever.get_relevant_documents(query)\n",
    "for i, d in enumerate(matched_docs):\n",
    "    print(f\"\\n## Document {i}\\n\")\n",
    "    print(d.page_content)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7c477287",
   "metadata": {},
   "source": [
    "Or use `max_marginal_relevance_search` directly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ca82740",
   "metadata": {},
   "outputs": [],
   "source": [
    "found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n",
    "for i, doc in enumerate(found_docs):\n",
    "    print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}