langchain/docs/modules/indexes/vectorstores/examples/typesense.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Typesense\n",
    "\n",
    "> [Typesense](https://typesense.org) is an open source, in-memory search engine, that you can either [self-host](https://typesense.org/docs/guide/install-typesense.html#option-2-local-machine-self-hosting) or run on [Typesense Cloud](https://cloud.typesense.org/).\n",
    ">\n",
    "> Typesense focuses on performance by storing the entire index in RAM (with a backup on disk) and also focuses on providing an out-of-the-box developer experience by simplifying available options and setting good defaults.\n",
    ">\n",
    "> It also lets you combine attribute-based filtering together with vector queries, to fetch the most relevant documents."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "This notebook shows you how to use Typesense as your VectorStore."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Let's first install our dependencies:"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "!pip install typesense openapi-schema-pydantic openai tiktoken"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "outputs": [],
   "source": [
    "import os\n",
    "import getpass\n",
    "\n",
    "os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-05-23T22:48:02.968822Z",
     "start_time": "2023-05-23T22:47:48.574094Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores import Typesense\n",
    "from langchain.document_loaders import TextLoader"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-05-23T22:50:34.775893Z",
     "start_time": "2023-05-23T22:50:34.771889Z"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Let's import our test dataset:"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "outputs": [],
   "source": [
    "loader = TextLoader('../../../state_of_the_union.txt')\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "embeddings = OpenAIEmbeddings()"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-05-23T22:56:19.093489Z",
     "start_time": "2023-05-23T22:56:19.089Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "docsearch = Typesense.from_documents(docs,\n",
    "                                     embeddings,\n",
    "                                     typesense_client_params={\n",
    "                                         'host': 'localhost',   # Use xxx.a1.typesense.net for Typesense Cloud\n",
    "                                         'port': '8108',        # Use 443 for Typesense Cloud\n",
    "                                         'protocol': 'http',    # Use https for Typesense Cloud\n",
    "                                         'typesense_api_key': 'xyz',\n",
    "                                         'typesense_collection_name': 'lang-chain'\n",
    "                                     })"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Similarity Search"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "found_docs = docsearch.similarity_search(query)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "print(found_docs[0].page_content)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Typesense as a Retriever\n",
    "\n",
    "Typesense, as all the other vector stores, is a LangChain Retriever, by using cosine similarity."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "retriever = docsearch.as_retriever()\n",
    "retriever"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "retriever.get_relevant_documents(query)[0]"
   ],
   "metadata": {
    "collapsed": false
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}