diff --git a/docs/ecosystem/chroma.md b/docs/ecosystem/chroma.md index 39ece51a..a8d46be5 100644 --- a/docs/ecosystem/chroma.md +++ b/docs/ecosystem/chroma.md @@ -17,4 +17,4 @@ To import this vectorstore: from langchain.vectorstores import Chroma ``` -For a more detailed walkthrough of the Chroma wrapper, see [this notebook](../modules/utils/combine_docs_examples/vectorstores.ipynb) +For a more detailed walkthrough of the Chroma wrapper, see [this notebook](../modules/indexes/examples/vectorstores.ipynb) diff --git a/docs/ecosystem/cohere.md b/docs/ecosystem/cohere.md index 682a44f1..fc696893 100644 --- a/docs/ecosystem/cohere.md +++ b/docs/ecosystem/cohere.md @@ -22,4 +22,4 @@ There exists an Cohere Embeddings wrapper, which you can access with ```python from langchain.embeddings import CohereEmbeddings ``` -For a more detailed walkthrough of this, see [this notebook](../modules/utils/combine_docs_examples/embeddings.ipynb) +For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/embeddings.ipynb) diff --git a/docs/ecosystem/huggingface.md b/docs/ecosystem/huggingface.md index c1284014..dccceea3 100644 --- a/docs/ecosystem/huggingface.md +++ b/docs/ecosystem/huggingface.md @@ -47,7 +47,7 @@ To use a the wrapper for a model hosted on Hugging Face Hub: ```python from langchain.embeddings import HuggingFaceHubEmbeddings ``` -For a more detailed walkthrough of this, see [this notebook](../modules/utils/combine_docs_examples/embeddings.ipynb) +For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/embeddings.ipynb) ### Tokenizer @@ -59,7 +59,7 @@ You can also use it to count tokens when splitting documents with from langchain.text_splitter import CharacterTextSplitter CharacterTextSplitter.from_huggingface_tokenizer(...) 
``` -For a more detailed walkthrough of this, see [this notebook](../modules/utils/combine_docs_examples/textsplitter.ipynb) +For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/textsplitter.ipynb) ### Datasets diff --git a/docs/ecosystem/openai.md b/docs/ecosystem/openai.md index 829dc9d3..2d4fa583 100644 --- a/docs/ecosystem/openai.md +++ b/docs/ecosystem/openai.md @@ -31,7 +31,7 @@ There exists an OpenAI Embeddings wrapper, which you can access with ```python from langchain.embeddings import OpenAIEmbeddings ``` -For a more detailed walkthrough of this, see [this notebook](../modules/utils/combine_docs_examples/embeddings.ipynb) +For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/embeddings.ipynb) ### Tokenizer @@ -44,7 +44,7 @@ You can also use it to count tokens when splitting documents with from langchain.text_splitter import CharacterTextSplitter CharacterTextSplitter.from_tiktoken_encoder(...) ``` -For a more detailed walkthrough of this, see [this notebook](../modules/utils/combine_docs_examples/textsplitter.ipynb) +For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/textsplitter.ipynb) ### Moderation You can also access the OpenAI content moderation endpoint with diff --git a/docs/ecosystem/pinecone.md b/docs/ecosystem/pinecone.md index e56940b3..8b9fc9d4 100644 --- a/docs/ecosystem/pinecone.md +++ b/docs/ecosystem/pinecone.md @@ -17,4 +17,4 @@ To import this vectorstore: from langchain.vectorstores import Pinecone ``` -For a more detailed walkthrough of the Pinecone wrapper, see [this notebook](../modules/utils/combine_docs_examples/vectorstores.ipynb) +For a more detailed walkthrough of the Pinecone wrapper, see [this notebook](../modules/indexes/examples/vectorstores.ipynb) diff --git a/docs/ecosystem/weaviate.md b/docs/ecosystem/weaviate.md index 9c5b163d..3fab349a 100644 --- a/docs/ecosystem/weaviate.md +++ b/docs/ecosystem/weaviate.md @@ 
-30,4 +30,4 @@ To import this vectorstore: from langchain.vectorstores import Weaviate ``` -For a more detailed walkthrough of the Weaviate wrapper, see [this notebook](../modules/utils/combine_docs_examples/vectorstores.ipynb) +For a more detailed walkthrough of the Weaviate wrapper, see [this notebook](../modules/indexes/examples/vectorstores.ipynb) diff --git a/docs/index.rst b/docs/index.rst index b1eef577..ccf2024f 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -42,7 +42,7 @@ Checkout the below guide for a walkthrough of how to get started using LangChain Modules ----------- -There are six main modules that LangChain provides support for. +There are several main modules that LangChain provides support for. For each module we provide some examples to get started, how-to guides, reference docs, and conceptual guides. These modules are, in increasing order of complexity: @@ -57,6 +57,8 @@ These modules are, in increasing order of complexity: - `Chains <./modules/chains.html>`_: Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications. +- `Indexes <./modules/indexes.html>`_: Language models are often more powerful when combined with your own text data - this module covers best practices for doing exactly that. + - `Agents <./modules/agents.html>`_: Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents. - `Memory <./modules/memory.html>`_: Memory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory. 
@@ -72,6 +74,7 @@ These modules are, in increasing order of complexity: ./modules/llms.md ./modules/document_loaders.md ./modules/utils.md + ./modules/indexes.md ./modules/chains.md ./modules/agents.md ./modules/memory.md diff --git a/docs/modules/chains/combine_docs_how_to.rst b/docs/modules/chains/combine_docs_how_to.rst deleted file mode 100644 index 2eef128c..00000000 --- a/docs/modules/chains/combine_docs_how_to.rst +++ /dev/null @@ -1,34 +0,0 @@ -CombineDocuments Chains ------------------------ - -A chain is made up of links, which can be either primitives or other chains. -Primitives can be either `prompts <../prompts.html>`_, `llms <../llms.html>`_, `utils <../utils.html>`_, or other chains. -The examples here are all end-to-end chains for working with documents. - -`Question Answering <./combine_docs_examples/question_answering.html>`_: A walkthrough of how to use LangChain for question answering over specific documents. - -`Question Answering with Sources <./combine_docs_examples/qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over specific documents. - -`Summarization <./combine_docs_examples/summarize.html>`_: A walkthrough of how to use LangChain for summarization over specific documents. - -`Vector DB Text Generation <./combine_docs_examples/vector_db_text_generation.html>`_: A walkthrough of how to use LangChain for text generation over a vector database. - -`Vector DB Question Answering <./combine_docs_examples/vector_db_qa.html>`_: A walkthrough of how to use LangChain for question answering over a vector database. - -`Vector DB Question Answering with Sources <./combine_docs_examples/vector_db_qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a vector database. - -`Graph Question Answering <./combine_docs_examples/graph_qa.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a graph database. 
- -`Chat Vector DB <./combine_docs_examples/chat_vector_db.html>`_: A walkthrough of how to use LangChain as a chatbot over a vector database. - -`Analyze Document <./combine_docs_examples/analyze_document.html>`_: A walkthrough of how to use LangChain to analyze long documents. - - -.. toctree:: - :maxdepth: 1 - :glob: - :caption: CombineDocument Chains - :name: combine_docs - :hidden: - - ./combine_docs_examples/* diff --git a/docs/modules/chains/how_to_guides.rst b/docs/modules/chains/how_to_guides.rst index 44815539..6fad1fcc 100644 --- a/docs/modules/chains/how_to_guides.rst +++ b/docs/modules/chains/how_to_guides.rst @@ -4,12 +4,11 @@ How-To Guides A chain is made up of links, which can be either primitives or other chains. Primitives can be either `prompts <../prompts.html>`_, `llms <../llms.html>`_, `utils <../utils.html>`_, or other chains. The examples here are all end-to-end chains for specific applications. -They are broken up into four categories: +They are broken up into three categories: 1. `Generic Chains <./generic_how_to.html>`_: Generic chains, that are meant to help build other chains rather than serve a particular purpose. -2. `CombineDocuments Chains <./combine_docs_how_to.html>`_: Chains aimed at making it easy to work with documents (question answering, summarization, etc). -3. `Utility Chains <./utility_how_to.html>`_: Chains consisting of an LLMChain interacting with a specific util. -4. `Asynchronous <./async_chain.html>`_: Covering asynchronous functionality. +2. `Utility Chains <./utility_how_to.html>`_: Chains consisting of an LLMChain interacting with a specific util. +3. `Asynchronous <./async_chain.html>`_: Covering asynchronous functionality. .. 
toctree:: :maxdepth: 1 @@ -17,7 +16,6 @@ They are broken up into four categories: :hidden: ./generic_how_to.rst - ./combine_docs_how_to.rst ./utility_how_to.rst ./async_chain.ipynb diff --git a/docs/modules/chains/key_concepts.md b/docs/modules/chains/key_concepts.md index 1cb04026..cd97a734 100644 --- a/docs/modules/chains/key_concepts.md +++ b/docs/modules/chains/key_concepts.md @@ -9,13 +9,3 @@ This is a specific type of chain where multiple other chains are run in sequence to the next. A subtype of this type of chain is the `SimpleSequentialChain`, where all subchains have only one input and one output, and the output of one is therefore used as sole input to the next chain. -## CombineDocuments Chains -These are a subset of chains designed to work with documents. There are two pieces to consider: - -1. The underlying chain method (eg, how the documents are combined) -2. Use cases for these types of chains. - -For the first, please see [this documentation](combine_docs.md) for more detailed information on the types of chains LangChain supports. -For the second, please see the Use Cases section for more information on [question answering](/use_cases/question_answering.md), -[question answering with sources](/use_cases/qa_with_sources.md), and [summarization](/use_cases/summarization.md). - diff --git a/docs/modules/indexes.rst b/docs/modules/indexes.rst new file mode 100644 index 00000000..eb44c8e8 --- /dev/null +++ b/docs/modules/indexes.rst @@ -0,0 +1,25 @@ +Indexes +========================== + +Indexes refer to ways to structure documents so that LLMs can best interact with them. +This module contains utility functions for working with documents, different types of indexes, and then examples for using those indexes in chains. +LangChain provides common indices for working with data (most prominently support for vector databases). +For more complicated index structures, it is worth checking out `GPTIndex `_. 
+ +The following sections of documentation are provided: + +- `Getting Started <./indexes/getting_started.html>`_: An overview of all the functionality LangChain provides for working with indexes. + +- `Key Concepts <./indexes/key_concepts.html>`_: A conceptual guide going over the various concepts related to indexes and the tools needed to create them. + +- `How-To Guides <./indexes/how_to_guides.html>`_: A collection of how-to guides. These highlight how to use all the relevant tools, the different types of vector databases, and how to use indexes in chains. + + +.. toctree:: + :maxdepth: 1 + :name: LLMs + :hidden: + + ./indexes/getting_started.ipynb + ./indexes/key_concepts.md + ./indexes/how_to_guides.rst diff --git a/docs/modules/chains/combine_docs_examples/analyze_document.ipynb b/docs/modules/indexes/chain_examples/analyze_document.ipynb similarity index 100% rename from docs/modules/chains/combine_docs_examples/analyze_document.ipynb rename to docs/modules/indexes/chain_examples/analyze_document.ipynb diff --git a/docs/modules/chains/combine_docs_examples/chat_vector_db.ipynb b/docs/modules/indexes/chain_examples/chat_vector_db.ipynb similarity index 99% rename from docs/modules/chains/combine_docs_examples/chat_vector_db.ipynb rename to docs/modules/indexes/chain_examples/chat_vector_db.ipynb index cd466243..9f2158eb 100644 --- a/docs/modules/chains/combine_docs_examples/chat_vector_db.ipynb +++ b/docs/modules/indexes/chain_examples/chat_vector_db.ipynb @@ -506,7 +506,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.9" + "version": "3.9.1" } }, "nbformat": 4, diff --git a/docs/modules/chains/combine_docs_examples/graph_qa.ipynb b/docs/modules/indexes/chain_examples/graph_qa.ipynb similarity index 100% rename from docs/modules/chains/combine_docs_examples/graph_qa.ipynb rename to docs/modules/indexes/chain_examples/graph_qa.ipynb diff --git 
a/docs/modules/chains/combine_docs_examples/qa_with_sources.ipynb b/docs/modules/indexes/chain_examples/qa_with_sources.ipynb similarity index 100% rename from docs/modules/chains/combine_docs_examples/qa_with_sources.ipynb rename to docs/modules/indexes/chain_examples/qa_with_sources.ipynb diff --git a/docs/modules/chains/combine_docs_examples/question_answering.ipynb b/docs/modules/indexes/chain_examples/question_answering.ipynb similarity index 100% rename from docs/modules/chains/combine_docs_examples/question_answering.ipynb rename to docs/modules/indexes/chain_examples/question_answering.ipynb diff --git a/docs/modules/chains/combine_docs_examples/summarize.ipynb b/docs/modules/indexes/chain_examples/summarize.ipynb similarity index 100% rename from docs/modules/chains/combine_docs_examples/summarize.ipynb rename to docs/modules/indexes/chain_examples/summarize.ipynb diff --git a/docs/modules/chains/combine_docs_examples/vector_db_qa.ipynb b/docs/modules/indexes/chain_examples/vector_db_qa.ipynb similarity index 100% rename from docs/modules/chains/combine_docs_examples/vector_db_qa.ipynb rename to docs/modules/indexes/chain_examples/vector_db_qa.ipynb diff --git a/docs/modules/chains/combine_docs_examples/vector_db_qa_with_sources.ipynb b/docs/modules/indexes/chain_examples/vector_db_qa_with_sources.ipynb similarity index 100% rename from docs/modules/chains/combine_docs_examples/vector_db_qa_with_sources.ipynb rename to docs/modules/indexes/chain_examples/vector_db_qa_with_sources.ipynb diff --git a/docs/modules/chains/combine_docs_examples/vector_db_text_generation.ipynb b/docs/modules/indexes/chain_examples/vector_db_text_generation.ipynb similarity index 100% rename from docs/modules/chains/combine_docs_examples/vector_db_text_generation.ipynb rename to docs/modules/indexes/chain_examples/vector_db_text_generation.ipynb diff --git a/docs/modules/chains/combine_docs.md b/docs/modules/indexes/combine_docs.md similarity index 100% rename from 
docs/modules/chains/combine_docs.md rename to docs/modules/indexes/combine_docs.md diff --git a/docs/modules/utils/combine_docs_examples/embeddings.ipynb b/docs/modules/indexes/examples/embeddings.ipynb similarity index 100% rename from docs/modules/utils/combine_docs_examples/embeddings.ipynb rename to docs/modules/indexes/examples/embeddings.ipynb diff --git a/docs/modules/utils/combine_docs_examples/hyde.ipynb b/docs/modules/indexes/examples/hyde.ipynb similarity index 100% rename from docs/modules/utils/combine_docs_examples/hyde.ipynb rename to docs/modules/indexes/examples/hyde.ipynb diff --git a/docs/modules/utils/combine_docs_examples/textsplitter.ipynb b/docs/modules/indexes/examples/textsplitter.ipynb similarity index 100% rename from docs/modules/utils/combine_docs_examples/textsplitter.ipynb rename to docs/modules/indexes/examples/textsplitter.ipynb diff --git a/docs/modules/indexes/examples/vectorstores.ipynb b/docs/modules/indexes/examples/vectorstores.ipynb new file mode 100644 index 00000000..8f1191ba --- /dev/null +++ b/docs/modules/indexes/examples/vectorstores.ipynb @@ -0,0 +1,273 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7ef4d402-6662-4a26-b612-35b542066487", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "# VectorStores\n", + "\n", + "This notebook showcases basic functionality related to VectorStores. A key part of working with vectorstores is creating the vector to put in them, which is usually created via embeddings. Therefor, it is recommended that you familiarize yourself with the [embedding notebook](embeddings.ipynb) before diving into this.\n", + "\n", + "This covers generic high level functionality related to all vector stores. 
For guides on specific vectorstores, please see the how-to guides [here](../how_to_guides.rst)" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "965eecee", + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.vectorstores import Chroma" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "68481687", + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "with open('../../state_of_the_union.txt') as f:\n", + " state_of_the_union = f.read()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "texts = text_splitter.split_text(state_of_the_union)\n", + "\n", + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "015f4ff5", + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Running Chroma using direct local API.\n", + "Using DuckDB in-memory for database. Data will be transient.\n" + ] + } + ], + "source": [ + "docsearch = Chroma.from_texts(texts, embeddings)\n", + "\n", + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "docs = docsearch.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "67baf32e", + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n", + "\n", + "We cannot let this happen. \n", + "\n", + "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. 
And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", + "\n", + "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", + "\n", + "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", + "\n", + "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n" + ] + } + ], + "source": [ + "print(docs[0].page_content)" + ] + }, + { + "cell_type": "markdown", + "id": "fb6baaf8", + "metadata": {}, + "source": [ + "## Add texts\n", + "You can easily add text to a vectorstore with the `add_texts` method. It will return a list of document IDs (in case you need to use them downstream)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "70758e4f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['a05e3d0c-ab40-11ed-a853-e65801318981']" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "docsearch.add_texts([\"Ankush went to Princeton\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "4edeb88f", + "metadata": {}, + "outputs": [], + "source": [ + "query = \"Where did Ankush go to college?\"\n", + "docs = docsearch.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "1cba64a2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Document(page_content='Ankush went to Princeton', lookup_str='', metadata={}, lookup_index=0)" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "docs[0]" + ] + }, + { + "cell_type": "markdown", + "id": "bbf5ec44", + "metadata": {}, + "source": [ + "## From Documents\n", + "We can also initialize a vectorstore from documents directly. This is useful when we use the method on the text splitter to get documents directly (handy when the original documents have associated metadata)." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "df4a459c", + "metadata": {}, + "outputs": [], + "source": [ + "documents = text_splitter.create_documents([state_of_the_union], metadatas=[{\"source\": \"State of the Union\"}])" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "4b480245", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Running Chroma using direct local API.\n", + "Using DuckDB in-memory for database. 
Data will be transient.\n" + ] + } + ], + "source": [ + "docsearch = Chroma.from_documents(documents, embeddings)\n", + "\n", + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "docs = docsearch.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "86aa4cda", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n", + "\n", + "We cannot let this happen. \n", + "\n", + "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", + "\n", + "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", + "\n", + "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", + "\n", + "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. 
One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n" + ] + } + ], + "source": [ + "print(docs[0].page_content)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4af5a071", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/modules/indexes/getting_started.ipynb b/docs/modules/indexes/getting_started.ipynb new file mode 100644 index 00000000..cbe047a1 --- /dev/null +++ b/docs/modules/indexes/getting_started.ipynb @@ -0,0 +1,186 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "07c1e3b9", + "metadata": {}, + "source": [ + "# Getting Started\n", + "\n", + "This example showcases question answering over a vector database.\n", + "We have chosen this as the example for getting started because it nicely combines a lot of different elements (Text splitters, embeddings, vectorstores) and then also shows how to use them in a chain." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "82525493", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.vectorstores import Chroma\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain import OpenAI, VectorDBQA" + ] + }, + { + "cell_type": "markdown", + "id": "0b7adc54", + "metadata": {}, + "source": [ + "Here we load in the documents we want to use to create our index." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "611e0c19", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.document_loaders import TextLoader\n", + "loader = TextLoader('../state_of_the_union.txt')\n", + "documents = loader.load()" + ] + }, + { + "cell_type": "markdown", + "id": "9fdc0fc2", + "metadata": {}, + "source": [ + "Next, we will split the documents into chunks." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "afecb8cf", + "metadata": {}, + "outputs": [], + "source": [ + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "texts = text_splitter.split_documents(documents)" + ] + }, + { + "cell_type": "markdown", + "id": "4bebc041", + "metadata": {}, + "source": [ + "We will then select which embeddings we want to use." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "9eaaa735", + "metadata": {}, + "outputs": [], + "source": [ + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "cell_type": "markdown", + "id": "24612905", + "metadata": {}, + "source": [ + "We now create the vectorstore to use as the index." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "5c7049db", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Running Chroma using direct local API.\n", + "Using DuckDB in-memory for database. Data will be transient.\n" + ] + } + ], + "source": [ + "db = Chroma.from_documents(texts, embeddings)" + ] + }, + { + "cell_type": "markdown", + "id": "30c4e5c6", + "metadata": {}, + "source": [ + "Finally, we create a chain and use it to answer questions!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "3018f865", + "metadata": {}, + "outputs": [], + "source": [ + "qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", vectorstore=db)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "032a47f8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\" The President said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\"" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "qa.run(query)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8b403637", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + }, + "vscode": { + "interpreter": { + "hash": "b1677b440931f40d89ef8be7bf03acb108ce003de0ac9b18e8d43753ea2e7103" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/modules/indexes/how_to_guides.rst b/docs/modules/indexes/how_to_guides.rst new file mode 100644 index 00000000..a68c991a --- /dev/null +++ b/docs/modules/indexes/how_to_guides.rst @@ -0,0 +1,93 @@ +How To Guides +==================================== + + +Utils +----- + +There are a lot of different utilities that 
LangChain provides integrations for +These guides go over how to use them. +The utilities here are all utilities that make it easier to work with documents. + +`Text Splitters <./examples/textsplitter.html>`_: A walkthrough of how to split large documents up into smaller, more manageable pieces of text. + +`VectorStores <./examples/vectorstores.html>`_: A walkthrough of the vectorstore abstraction that LangChain supports. + +`Embeddings <./examples/embeddings.html>`_: A walkthrough of embedding functionalities, and different types of embeddings, that LangChain supports. + +`HyDE <./examples/hyde.html>`_: How to use Hypothetical Document Embeddings, a novel way of constructing embeddings for document retrieval systems. + +.. toctree:: + :maxdepth: 1 + :glob: + :caption: Utils + :name: utils + :hidden: + + examples/* + + +Vectorstores +------------ + + +Vectorstores are one of the most important components of building indexes. +In the below guides, we cover different types of vectorstores and how to use them. + +`Chroma <./vectorstore_examples/chroma.html>`_: A walkthrough of how to use the Chroma vectorstore wrapper. + +`FAISS <./vectorstore_examples/faiss.html>`_: A walkthrough of how to use the FAISS vectorstore wrapper. + +`Elastic Search <./vectorstore_examples/elasticsearch.html>`_: A walkthrough of how to use the ElasticSearch wrapper. + +`Milvus <./vectorstore_examples/milvus.html>`_: A walkthrough of how to use the Milvus vectorstore wrapper. + +`Pinecone <./vectorstore_examples/pinecone.html>`_: A walkthrough of how to use the Pinecone vectorstore wrapper. + +`Qdrant <./vectorstore_examples/qdrant.html>`_: A walkthrough of how to use the Qdrant vectorstore wrapper. + +`Weaviate <./vectorstore_examples/weaviate.html>`_: A walkthrough of how to use the Weaviate vectorstore wrapper. + + +.. 
toctree:: + :maxdepth: 1 + :glob: + :caption: Vectorstores + :name: vectorstores + :hidden: + + vectorstore_examples/* + + +Chains +------ + +The examples here are all end-to-end chains that use indexes or utils covered above. + +`Question Answering <./chain_examples/question_answering.html>`_: A walkthrough of how to use LangChain for question answering over specific documents. + +`Question Answering with Sources <./chain_examples/qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over specific documents. + +`Summarization <./chain_examples/summarize.html>`_: A walkthrough of how to use LangChain for summarization over specific documents. + +`Vector DB Text Generation <./chain_examples/vector_db_text_generation.html>`_: A walkthrough of how to use LangChain for text generation over a vector database. + +`Vector DB Question Answering <./chain_examples/vector_db_qa.html>`_: A walkthrough of how to use LangChain for question answering over a vector database. + +`Vector DB Question Answering with Sources <./chain_examples/vector_db_qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a vector database. + +`Graph Question Answering <./chain_examples/graph_qa.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a graph database. + +`Chat Vector DB <./chain_examples/chat_vector_db.html>`_: A walkthrough of how to use LangChain as a chatbot over a vector database. + +`Analyze Document <./chain_examples/analyze_document.html>`_: A walkthrough of how to use LangChain to analyze long documents. + + +.. 
toctree:: + :maxdepth: 1 + :glob: + :caption: With Chains + :name: chains + :hidden: + + ./chain_examples/* \ No newline at end of file diff --git a/docs/modules/indexes/key_concepts.md b/docs/modules/indexes/key_concepts.md new file mode 100644 index 00000000..8e20ee10 --- /dev/null +++ b/docs/modules/indexes/key_concepts.md @@ -0,0 +1,27 @@ +# Key Concepts + +## Text Splitter +This class is responsible for splitting long pieces of text into smaller components. +It contains different ways of splitting text (on characters, using spaCy, etc.) +as well as different ways of measuring length (token-based, character-based, etc.). + +## Embeddings +These classes are very similar to the LLM classes in that they are wrappers around models, +but rather than returning a string, they return an embedding (a list of floats). These are particularly useful when +implementing semantic search functionality. They expose separate methods for embedding queries versus embedding documents. + +## Vectorstores +These are datastores that store embeddings of documents in vector form. +They expose a method for passing in a string and finding similar documents. + + +## CombineDocuments Chains +These are a subset of chains designed to work with documents. There are two pieces to consider: + +1. The underlying chain method (e.g., how the documents are combined). +2. Use cases for these types of chains. + +For the first, please see [this documentation](combine_docs.md) for more detailed information on the types of chains LangChain supports. +For the second, please see the Use Cases section for more information on [question answering](/use_cases/question_answering.md), +[question answering with sources](/use_cases/qa_with_sources.md), and [summarization](/use_cases/summarization.md). 
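To make the Text Splitter, Embeddings, and Vectorstores concepts above concrete, here is a minimal, dependency-free sketch of how the three pieces fit together. Everything in it (`split_text`, `embed`, `cosine`, `ToyVectorStore`) is a hypothetical stand-in written for illustration and is not LangChain's API; the "embedding" is a plain bag-of-words count rather than a model-produced list of floats.

```python
import math
from collections import Counter


def split_text(text, chunk_size=40):
    """Toy text splitter: pack whitespace-separated words into chunks.

    Each chunk is at most chunk_size characters, except when a single
    word is longer than chunk_size (it then becomes its own chunk).
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > chunk_size and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text):
    """Toy embedding (assumption): sparse word counts instead of a dense vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs in a list."""

    def __init__(self):
        self._items = []

    def add_texts(self, texts):
        # Embed each text once at insertion time.
        self._items.extend((text, embed(text)) for text in texts)

    def similarity_search(self, query, k=1):
        # Embed the query string, then rank stored texts by cosine similarity.
        query_vector = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts(split_text("cats purr softly at night while dogs bark loudly at strangers"))
print(store.similarity_search("dogs bark"))
```

The flow mirrors the concepts: split long text into chunks, embed each chunk, store the vectors, then pass in a query string and retrieve the most similar chunks. A real setup would swap the toy pieces for an actual splitter, embedding model, and vectorstore.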
+ diff --git a/docs/modules/indexes/vectorstore_examples/chroma.ipynb b/docs/modules/indexes/vectorstore_examples/chroma.ipynb new file mode 100644 index 00000000..3a793a4a --- /dev/null +++ b/docs/modules/indexes/vectorstore_examples/chroma.ipynb @@ -0,0 +1,122 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "683953b3", + "metadata": {}, + "source": [ + "# Chroma\n", + "\n", + "This notebook shows how to use functionality related to the Chroma vector database." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "aac9563e", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.vectorstores import Chroma\n", + "from langchain.document_loaders import TextLoader" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "a3c3999a", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.document_loaders import TextLoader\n", + "loader = TextLoader('../../state_of_the_union.txt')\n", + "documents = loader.load()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "docs = text_splitter.split_documents(documents)\n", + "\n", + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "5eabdb75", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Running Chroma using direct local API.\n", + "Using DuckDB in-memory for database. 
Data will be transient.\n" + ] + } + ], + "source": [ + "db = Chroma.from_documents(docs, embeddings)\n", + "\n", + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "docs = db.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "4b172de8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n", + "\n", + "We cannot let this happen. \n", + "\n", + "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", + "\n", + "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", + "\n", + "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", + "\n", + "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. 
One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n" + ] + } + ], + "source": [ + "print(docs[0].page_content)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a359ed74", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/modules/indexes/vectorstore_examples/elasticsearch.ipynb b/docs/modules/indexes/vectorstore_examples/elasticsearch.ipynb new file mode 100644 index 00000000..d60f0500 --- /dev/null +++ b/docs/modules/indexes/vectorstore_examples/elasticsearch.ipynb @@ -0,0 +1,113 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "683953b3", + "metadata": {}, + "source": [ + "# ElasticSearch\n", + "\n", + "This notebook shows how to use functionality related to the ElasticSearch database." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "aac9563e", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.vectorstores import ElasticVectorSearch\n", + "from langchain.document_loaders import TextLoader" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a3c3999a", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.document_loaders import TextLoader\n", + "loader = TextLoader('../../state_of_the_union.txt')\n", + "documents = loader.load()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "docs = text_splitter.split_documents(documents)\n", + "\n", + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12eb86d8", + "metadata": {}, + "outputs": [], + "source": [ + "db = ElasticVectorSearch.from_documents(docs, embeddings, elasticsearch_url=\"http://localhost:9200\")\n", + "\n", + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "docs = db.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "4b172de8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n", + "\n", + "We cannot let this happen. \n", + "\n", + "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", + "\n", + "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. 
Justice Breyer, thank you for your service. \n", + "\n", + "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", + "\n", + "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n" + ] + } + ], + "source": [ + "print(docs[0].page_content)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a359ed74", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/modules/indexes/vectorstore_examples/faiss.ipynb b/docs/modules/indexes/vectorstore_examples/faiss.ipynb new file mode 100644 index 00000000..cce8cdba --- /dev/null +++ b/docs/modules/indexes/vectorstore_examples/faiss.ipynb @@ -0,0 +1,233 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "683953b3", + "metadata": {}, + "source": [ + "# FAISS\n", + "\n", + "This notebook shows how to use functionality related to the FAISS vector database." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "aac9563e", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.vectorstores import FAISS\n", + "from langchain.document_loaders import TextLoader" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a3c3999a", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.document_loaders import TextLoader\n", + "loader = TextLoader('../../state_of_the_union.txt')\n", + "documents = loader.load()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "docs = text_splitter.split_documents(documents)\n", + "\n", + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5eabdb75", + "metadata": {}, + "outputs": [], + "source": [ + "db = FAISS.from_documents(docs, embeddings)\n", + "\n", + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "docs = db.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "4b172de8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n", + "\n", + "We cannot let this happen. \n", + "\n", + "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", + "\n", + "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 
\n", + "\n", + "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", + "\n", + "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n" + ] + } + ], + "source": [ + "print(docs[0].page_content)" + ] + }, + { + "cell_type": "markdown", + "id": "f13473b5", + "metadata": {}, + "source": [ + "## Similarity Search with score\n", + "There are some FAISS specific methods. One of them is `similarity_search_with_score`, which allows you to return not only the documents but also the similarity score of the query to them." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "186ee1d8", + "metadata": {}, + "outputs": [], + "source": [ + "docs_and_scores = db.similarity_search_with_score(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "284e04b5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. 
One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0),\n", + " 0.3914415)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "docs_and_scores[0]" + ] + }, + { + "cell_type": "markdown", + "id": "f34420cf", + "metadata": {}, + "source": [ + "It is also possible to do a search for documents similar to a given embedding vector using `similarity_search_by_vector`, which accepts an embedding vector as a parameter instead of a string." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "b558ebb7", + "metadata": {}, + "outputs": [], + "source": [ + "embedding_vector = embeddings.embed_query(query)\n", + "docs_and_scores = db.similarity_search_by_vector(embedding_vector)" + ] + }, + { + "cell_type": "markdown", + "id": "31bda7fd", + "metadata": {}, + "source": [ + "## Saving and loading\n", + "You can also save and load a FAISS index. This is useful so you don't have to recreate it every time you use it." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "428a6816", + "metadata": {}, + "outputs": [], + "source": [ + "db.save_local(\"faiss_index\")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "56d1841c", + "metadata": {}, + "outputs": [], + "source": [ + "new_db = FAISS.load_local(\"faiss_index\", embeddings)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "39055525", + "metadata": {}, + "outputs": [], + "source": [ + "docs = new_db.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "98378c4e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. 
I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0)" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "docs[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bc8b71f7", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/modules/indexes/vectorstore_examples/milvus.ipynb b/docs/modules/indexes/vectorstore_examples/milvus.ipynb new file mode 100644 index 00000000..d1f3110d --- /dev/null +++ b/docs/modules/indexes/vectorstore_examples/milvus.ipynb @@ -0,0 +1,108 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "683953b3", + "metadata": {}, + "source": [ + "# Milvus\n", + "\n", + "This notebook shows how to use 
functionality related to the Milvus vector database.\n", + "\n", + "To run, you should have a Milvus instance up and running: https://milvus.io/docs/install_standalone-docker.md" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "aac9563e", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.vectorstores import Milvus\n", + "from langchain.document_loaders import TextLoader" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a3c3999a", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.document_loaders import TextLoader\n", + "loader = TextLoader('../../state_of_the_union.txt')\n", + "documents = loader.load()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "docs = text_splitter.split_documents(documents)\n", + "\n", + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dcf88bdf", + "metadata": {}, + "outputs": [], + "source": [ + "vector_db = Milvus.from_documents(\n", + " docs,\n", + " embeddings,\n", + " connection_args={\"host\": \"127.0.0.1\", \"port\": \"19530\"},\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8c513ab", + "metadata": {}, + "outputs": [], + "source": [ + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "docs = vector_db.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc516993", + "metadata": {}, + "outputs": [], + "source": [ + "docs[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a359ed74", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", 
+ "version": 3 + }, + "file_extension": ".py", + "mimetype": 
"text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/modules/indexes/vectorstore_examples/pinecone.ipynb b/docs/modules/indexes/vectorstore_examples/pinecone.ipynb new file mode 100644 index 00000000..95c4e029 --- /dev/null +++ b/docs/modules/indexes/vectorstore_examples/pinecone.ipynb @@ -0,0 +1,105 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "683953b3", + "metadata": {}, + "source": [ + "# Pinecone\n", + "\n", + "This notebook shows how to use functionality related to the Pinecone vector database." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "aac9563e", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.vectorstores import Pinecone\n", + "from langchain.document_loaders import TextLoader" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a3c3999a", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.document_loaders import TextLoader\n", + "loader = TextLoader('../../state_of_the_union.txt')\n", + "documents = loader.load()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "docs = text_splitter.split_documents(documents)\n", + "\n", + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e104aee", + "metadata": {}, + "outputs": [], + "source": [ + "import pinecone \n", + "\n", + "# initialize pinecone\n", + "pinecone.init(\n", + " api_key=\"YOUR_API_KEY\", # find at app.pinecone.io\n", + " environment=\"YOUR_ENV\" # next to api key in console\n", + ")\n", + "\n", + "index_name = \"langchain-demo\"\n", + "\n", + "docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)\n", + "\n", + "query = \"What did the 
president say about Ketanji Brown Jackson\"\n", + "docs = docsearch.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9c608226", + "metadata": {}, + "outputs": [], + "source": [ + "print(docs[0].page_content)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a359ed74", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/modules/indexes/vectorstore_examples/qdrant.ipynb b/docs/modules/indexes/vectorstore_examples/qdrant.ipynb new file mode 100644 index 00000000..250a88b8 --- /dev/null +++ b/docs/modules/indexes/vectorstore_examples/qdrant.ipynb @@ -0,0 +1,105 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "683953b3", + "metadata": {}, + "source": [ + "# Qdrant\n", + "\n", + "This notebook shows how to use functionality related to the Qdrant vector database." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "aac9563e", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.vectorstores import Qdrant\n", + "from langchain.document_loaders import TextLoader" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a3c3999a", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.document_loaders import TextLoader\n", + "loader = TextLoader('../../state_of_the_union.txt')\n", + "documents = loader.load()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "docs = text_splitter.split_documents(documents)\n", + "\n", + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dcf88bdf", + "metadata": {}, + "outputs": [], + "source": [ + "host = \"<---host name here --->\"\n", + "api_key = \"<---api key here--->\"\n", + "qdrant = Qdrant.from_documents(docs, embeddings, host=host, prefer_grpc=True, api_key=api_key)\n", + "query = \"What did the president say about Ketanji Brown Jackson\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8c513ab", + "metadata": {}, + "outputs": [], + "source": [ + "docs = qdrant.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc516993", + "metadata": {}, + "outputs": [], + "source": [ + "docs[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a359ed74", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + 
"pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/modules/indexes/vectorstore_examples/weaviate.ipynb b/docs/modules/indexes/vectorstore_examples/weaviate.ipynb new file mode 100644 index 00000000..9ae4abe8 --- /dev/null +++ b/docs/modules/indexes/vectorstore_examples/weaviate.ipynb @@ -0,0 +1,163 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "683953b3", + "metadata": {}, + "source": [ + "# Weaviate\n", + "\n", + "This notebook shows how to use functionality related to the Weaviate vector database." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "aac9563e", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.vectorstores import Weaviate\n", + "from langchain.document_loaders import TextLoader" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a3c3999a", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.document_loaders import TextLoader\n", + "loader = TextLoader('../../state_of_the_union.txt')\n", + "documents = loader.load()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "docs = text_splitter.split_documents(documents)\n", + "\n", + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5888dcc7", + "metadata": {}, + "outputs": [], + "source": [ + "import weaviate\n", + "import os\n", + "\n", + "WEAVIATE_URL = \"\"\n", + "client = weaviate.Client(\n", + " url=WEAVIATE_URL,\n", + " additional_headers={\n", + " 'X-OpenAI-Api-Key': os.environ[\"OPENAI_API_KEY\"]\n", + " }\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f004e8ee", + "metadata": {}, + "outputs": [], + "source": [ + "client.schema.delete_all()\n", + "client.schema.get()\n", + "schema = {\n", + " 
\"classes\": [\n", + " {\n", + " \"class\": \"Paragraph\",\n", + " \"description\": \"A written paragraph\",\n", + " \"vectorizer\": \"text2vec-openai\",\n", + " \"moduleConfig\": {\n", + " \"text2vec-openai\": {\n", + " \"model\": \"babbage\",\n", + " \"type\": \"text\"\n", + " }\n", + " },\n", + " \"properties\": [\n", + " {\n", + " \"dataType\": [\"text\"],\n", + " \"description\": \"The content of the paragraph\",\n", + " \"moduleConfig\": {\n", + " \"text2vec-openai\": {\n", + " \"skip\": False,\n", + " \"vectorizePropertyName\": False\n", + " }\n", + " },\n", + " \"name\": \"content\",\n", + " },\n", + " ],\n", + " },\n", + " ]\n", + "}\n", + "\n", + "client.schema.create(schema)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef6d5d04", + "metadata": {}, + "outputs": [], + "source": [ + "vectorstore = Weaviate(client, \"Paragraph\", \"content\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "06e8c1ed", + "metadata": {}, + "outputs": [], + "source": [ + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "docs = vectorstore.similarity_search(query)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "38b86be6", + "metadata": {}, + "outputs": [], + "source": [ + "print(docs[0].page_content)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a359ed74", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/modules/utils/combine_docs_examples/vectorstores.ipynb 
b/docs/modules/utils/combine_docs_examples/vectorstores.ipynb deleted file mode 100644 index 04d8073e..00000000 --- a/docs/modules/utils/combine_docs_examples/vectorstores.ipynb +++ /dev/null @@ -1,772 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "7ef4d402-6662-4a26-b612-35b542066487", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "# VectorStores\n", - "\n", - "This notebook show cases how to use VectorStores. A key part of working with vectorstores is creating the vector to put in them, which is usually created via embeddings. Therefor, it is recommended that you familiarize yourself with the [embedding notebook](embeddings.ipynb) before diving into this." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "965eecee", - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "from langchain.embeddings.openai import OpenAIEmbeddings\n", - "from langchain.text_splitter import CharacterTextSplitter\n", - "from langchain.vectorstores import ElasticVectorSearch, Pinecone, Weaviate, FAISS, Qdrant, Chroma" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "68481687", - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "with open('../../state_of_the_union.txt') as f:\n", - " state_of_the_union = f.read()\n", - "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", - "texts = text_splitter.split_text(state_of_the_union)\n", - "\n", - "embeddings = OpenAIEmbeddings()" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "015f4ff5", - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Running Chroma using direct local API.\n", - "Using DuckDB in-memory for database. 
Data will be transient.\n" - ] - } - ], - "source": [ - "docsearch = Chroma.from_texts(texts, embeddings)\n", - "\n", - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "docs = docsearch.similarity_search(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "67baf32e", - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n", - "\n", - "We cannot let this happen. \n", - "\n", - "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", - "\n", - "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", - "\n", - "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", - "\n", - "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n" - ] - } - ], - "source": [ - "print(docs[0].page_content)" - ] - }, - { - "cell_type": "markdown", - "id": "fb6baaf8", - "metadata": {}, - "source": [ - "## Add texts\n", - "You can easily add text to a vectorstore with the `add_texts` method. It will return a list of document IDs (in case you need to use them downstream)." 
- ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "70758e4f", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['a05e3d0c-ab40-11ed-a853-e65801318981']" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "docsearch.add_texts([\"Ankush went to Princeton\"])" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "4edeb88f", - "metadata": {}, - "outputs": [], - "source": [ - "query = \"Where did Ankush go to college?\"\n", - "docs = docsearch.similarity_search(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "1cba64a2", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Document(page_content='Ankush went to Princeton', lookup_str='', metadata={}, lookup_index=0)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "docs[0]" - ] - }, - { - "cell_type": "markdown", - "id": "bbf5ec44", - "metadata": {}, - "source": [ - "## From Documents\n", - "We can also initialize a vectorstore from documents directly. This is useful when we use the method on the text splitter to get documents directly (handy when the original documents have associated metadata)." - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "df4a459c", - "metadata": {}, - "outputs": [], - "source": [ - "documents = text_splitter.create_documents([state_of_the_union], metadatas=[{\"source\": \"State of the Union\"}])" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "4b480245", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Running Chroma using direct local API.\n", - "Using DuckDB in-memory for database. 
Data will be transient.\n" - ] - } - ], - "source": [ - "docsearch = Chroma.from_documents(documents, embeddings)\n", - "\n", - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "docs = docsearch.similarity_search(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "86aa4cda", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n", - "\n", - "We cannot let this happen. \n", - "\n", - "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", - "\n", - "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", - "\n", - "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", - "\n", - "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n" - ] - } - ], - "source": [ - "print(docs[0].page_content)" - ] - }, - { - "cell_type": "markdown", - "id": "2445a5e6", - "metadata": {}, - "source": [ - "## FAISS\n", - "There are some FAISS specific methods. One of them is `similarity_search_with_score`, which allows you to return not only the documents but also the similarity score of the query to them." 
- ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "479e22ce", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Exiting: Cleaning up .chroma directory\n" - ] - } - ], - "source": [ - "docsearch = FAISS.from_texts(texts, embeddings)" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "b4f49314", - "metadata": {}, - "outputs": [], - "source": [ - "docs_and_scores = docsearch.similarity_search_with_score(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "86f78ab1", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. 
One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', lookup_str='', metadata={}, lookup_index=0),\n", - " 0.40834612)" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "docs_and_scores[0]" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "d5170563", - "metadata": {}, - "source": [ - "It is also possible to do a search for documents similar to a given embedding vector using `similarity_search_by_vector` which accepts an embedding vector as a parameter instead of a string." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7675b0aa", - "metadata": {}, - "outputs": [], - "source": [ - "embedding_vector = embeddings.embed_query(query)\n", - "docs_and_scores = docsearch.similarity_search_by_vector(embedding_vector)" - ] - }, - { - "cell_type": "markdown", - "id": "b386dbb8", - "metadata": {}, - "source": [ - "### Saving and loading\n", - "You can also save and load a FAISS index. This is useful so you don't have to recreate it everytime you use it." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "b58b3955", - "metadata": {}, - "outputs": [], - "source": [ - "docsearch.save_local(\"faiss_index\")" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "ca72c650", - "metadata": {}, - "outputs": [], - "source": [ - "new_docsearch = FAISS.load_local(\"faiss_index\", embeddings)" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "5bf2ee24", - "metadata": {}, - "outputs": [], - "source": [ - "docs = new_docsearch.similarity_search(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "edc2aad1", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. 
I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', lookup_str='', metadata={}, lookup_index=0)" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "docs[0]" - ] - }, - { - "cell_type": "markdown", - "id": "eea6e627", - "metadata": {}, - "source": [ - "## Requires having ElasticSearch setup" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "4906b8a3", - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "docsearch = ElasticVectorSearch.from_texts(texts, embeddings, elasticsearch_url=\"http://localhost:9200\")\n", - "\n", - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "docs = docsearch.similarity_search(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "95f9eee9", - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 
\n", - "\n", - "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", - "\n", - "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. \n", - "\n", - "A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n", - "\n", - "And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n" - ] - } - ], - "source": [ - "print(docs[0].page_content)" - ] - }, - { - "cell_type": "markdown", - "id": "7f9cb9e7", - "metadata": {}, - "source": [ - "## Weaviate" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "1037a85e", - "metadata": {}, - "outputs": [], - "source": [ - "import weaviate\n", - "import os\n", - "\n", - "WEAVIATE_URL = \"\"\n", - "client = weaviate.Client(\n", - " url=WEAVIATE_URL,\n", - " additional_headers={\n", - " 'X-OpenAI-Api-Key': os.environ[\"OPENAI_API_KEY\"]\n", - " }\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "b9043766", - "metadata": {}, - "outputs": [], - "source": [ - "client.schema.delete_all()\n", - "client.schema.get()\n", - "schema = {\n", - " \"classes\": [\n", - " {\n", - " \"class\": \"Paragraph\",\n", - " \"description\": \"A written paragraph\",\n", - " \"vectorizer\": \"text2vec-openai\",\n", - " \"moduleConfig\": {\n", - " \"text2vec-openai\": {\n", - " \"model\": \"babbage\",\n", - " \"type\": \"text\"\n", - " }\n", - " },\n", - " \"properties\": [\n", - " {\n", - " \"dataType\": [\"text\"],\n", - " \"description\": \"The content of the 
paragraph\",\n", - " \"moduleConfig\": {\n", - " \"text2vec-openai\": {\n", - " \"skip\": False,\n", - " \"vectorizePropertyName\": False\n", - " }\n", - " },\n", - " \"name\": \"content\",\n", - " },\n", - " ],\n", - " },\n", - " ]\n", - "}\n", - "\n", - "client.schema.create(schema)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "ac20d99c", - "metadata": {}, - "outputs": [], - "source": [ - "with client.batch as batch:\n", - " for text in texts:\n", - " batch.add_data_object({\"content\": text}, \"Paragraph\")" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "01645d61", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.vectorstores.weaviate import Weaviate" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "bdd97d29", - "metadata": {}, - "outputs": [], - "source": [ - "vectorstore = Weaviate(client, \"Paragraph\", \"content\")" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "b70c0f98", - "metadata": {}, - "outputs": [], - "source": [ - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "docs = vectorstore.similarity_search(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "id": "07533e40", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n", - "\n", - "We cannot let this happen. \n", - "\n", - "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", - "\n", - "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 
\n", - "\n", - "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", - "\n", - "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. \n" - ] - } - ], - "source": [ - "print(docs[0].page_content)" - ] - }, - { - "cell_type": "markdown", - "id": "007f3102", - "metadata": {}, - "source": [ - "## Pinecone" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "7f6047e5", - "metadata": {}, - "outputs": [], - "source": [ - "import pinecone \n", - "\n", - "# initialize pinecone\n", - "pinecone.init(\n", - " api_key=\"YOUR_API_KEY\", # find at app.pinecone.io\n", - " environment=\"YOUR_ENV\" # next to api key in console\n", - ")\n", - "\n", - "index_name = \"langchain-demo\"\n", - "\n", - "docsearch = Pinecone.from_texts(texts, embeddings, index_name=index_name)\n", - "\n", - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "docs = docsearch.similarity_search(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "8e81f1f0", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. 
\\n\\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders. ', lookup_str='', metadata={}, lookup_index=0)" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "docs[0]" - ] - }, - { - "cell_type": "markdown", - "id": "9b852079", - "metadata": {}, - "source": [ - "## Qdrant" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e5ec70ce", - "metadata": {}, - "outputs": [], - "source": [ - "host = \"<---host name here --->\"\n", - "api_key = \"<---api key here--->\"\n", - "qdrant = Qdrant.from_texts(texts, embeddings, host=host, prefer_grpc=True, api_key=api_key)\n", - "query = \"What did the president say about Ketanji Brown Jackson\"" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "id": "9805ad1f", - "metadata": {}, - "outputs": [], - "source": [ - "docs = qdrant.similarity_search(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "id": "bd097a0e", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 
\\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', lookup_str='', metadata={}, lookup_index=0)" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "docs[0]" - ] - }, - { - "cell_type": "markdown", - "id": "6c3ec797", - "metadata": {}, - "source": [ - "## Milvus\n", - "To run, you should have a Milvus instance up and running: https://milvus.io/docs/install_standalone-docker.md" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "be347313", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.vectorstores import Milvus" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "f2eee23f", - "metadata": {}, - "outputs": [], - "source": [ - "vector_db = Milvus.from_texts(\n", - " texts,\n", - " embeddings,\n", - " connection_args={\"host\": \"127.0.0.1\", \"port\": \"19530\"},\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "06bdb701", - "metadata": {}, - "outputs": [], - "source": [ - "docs = vector_db.similarity_search(query)" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "7b3e94aa", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 
\\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', lookup_str='', metadata={}, lookup_index=0)" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "docs[0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "4af5a071", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.1" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/docs/modules/utils/combine_docs_how_to.rst b/docs/modules/utils/combine_docs_how_to.rst deleted file mode 100644 index 398a5195..00000000 --- a/docs/modules/utils/combine_docs_how_to.rst +++ /dev/null @@ -1,21 +0,0 @@ -Utilities for working with Documents -==================================== - -There are a lot of different utilities that LangChain provides integrations for -These guides go over how to use them. -The utilities here are all utilities that make it easier to work with documents. 
-
-`Text Splitters <./combine_docs_examples/textsplitter.html>`_: A walkthrough of how to split large documents up into smaller, more manageable pieces of text.
-
-`VectorStores <./combine_docs_examples/vectorstores.html>`_: A walkthrough of vectorstore functionalities, and different types of vectorstores, that LangChain supports.
-
-`Embeddings <./combine_docs_examples/embeddings.html>`_: A walkthrough of embedding functionalities, and different types of embeddings, that LangChain supports.
-
-`HyDE <./combine_docs_examples/hyde.html>`_: How to use Hypothetical Document Embeddings, a novel way of constructing embeddings for document retrieval systems.
-
-.. toctree::
-   :maxdepth: 1
-   :glob:
-   :hidden:
-
-   combine_docs_examples/*
diff --git a/docs/modules/utils/generic_how_to.rst b/docs/modules/utils/generic_how_to.rst
deleted file mode 100644
index b7bf53c6..00000000
--- a/docs/modules/utils/generic_how_to.rst
+++ /dev/null
@@ -1,30 +0,0 @@
-Generic Utilities
-=================
-
-There are a lot of different utilities that LangChain provides integrations for
-These guides go over how to use them.
-The utilities listed here are all generic utilities.
-
-`Bash <./examples/bash.html>`_: How to use a bash wrapper to execute bash commands.
-
-`Python REPL <./examples/python.html>`_: How to use a Python wrapper to execute python commands.
-
-`Requests <./examples/requests.html>`_: How to use a requests wrapper to interact with the web.
-
-`Google Search <./examples/google_search.html>`_: How to use the google search wrapper to search the web.
-
-`SerpAPI <./examples/serpapi.html>`_: How to use the SerpAPI wrapper to search the web.
-
-`SearxNG Search API <./examples/searx_search.html>`_: Hot to use the SearxNG meta search wrapper to search the web.
-
-`Bing Search <./examples/bing_search.html>`_: How to use the Bing search wrapper to search the web.
-
-`Wolfram Alpha <./examples/wolfram_alpha.html>`_: How to use the Wolfram Alpha wrapper to interact with Wolfram Alpha.
-
-
-.. toctree::
-   :maxdepth: 1
-   :glob:
-   :hidden:
-
-   ./examples/*
diff --git a/docs/modules/utils/how_to_guides.rst b/docs/modules/utils/how_to_guides.rst
index eb9d520b..3fa1ccb8 100644
--- a/docs/modules/utils/how_to_guides.rst
+++ b/docs/modules/utils/how_to_guides.rst
@@ -1,17 +1,30 @@
-How-To Guides
-=============
+Generic Utilities
+=================
 
 There are a lot of different utilities that LangChain provides integrations for
 These guides go over how to use them.
-These can largely be grouped into two categories:
+The utilities listed here are all generic utilities.
+
+`Bash <./examples/bash.html>`_: How to use a bash wrapper to execute bash commands.
+
+`Python REPL <./examples/python.html>`_: How to use a Python wrapper to execute python commands.
+
+`Requests <./examples/requests.html>`_: How to use a requests wrapper to interact with the web.
+
+`Google Search <./examples/google_search.html>`_: How to use the google search wrapper to search the web.
+
+`SerpAPI <./examples/serpapi.html>`_: How to use the SerpAPI wrapper to search the web.
+
+`SearxNG Search API <./examples/searx_search.html>`_: Hot to use the SearxNG meta search wrapper to search the web.
+
+`Bing Search <./examples/bing_search.html>`_: How to use the Bing search wrapper to search the web.
+
+`Wolfram Alpha <./examples/wolfram_alpha.html>`_: How to use the Wolfram Alpha wrapper to interact with Wolfram Alpha.
 
-1. `Generic Utilities <./generic_how_to.html>`_: Generic utilities, including search, python REPLs, etc.
-2. `Utilities for working with Documents <./combine_docs_how_to.html>`_: Utilities aimed at making it easy to work with documents (text splitting, embeddings, vectorstores, etc).
 
 .. toctree::
    :maxdepth: 1
    :glob:
    :hidden:
 
-   ./generic_how_to.rst
-   ./combine_docs_how_to.rst
+   ./examples/*
\ No newline at end of file
diff --git a/docs/modules/utils/key_concepts.md b/docs/modules/utils/key_concepts.md
index 5713d28f..cbb0af9c 100644
--- a/docs/modules/utils/key_concepts.md
+++ b/docs/modules/utils/key_concepts.md
@@ -1,19 +1,5 @@
 # Key Concepts
 
-## Text Splitter
-This class is responsible for splitting long pieces of text into smaller components.
-It contains different ways for splitting text (on characters, using Spacy, etc)
-as well as different ways for measuring length (token based, character based, etc).
-
-## Embeddings
-These classes are very similar to the LLM classes in that they are wrappers around models,
-but rather than return a string they return an embedding (list of floats). These are particularly useful when
-implementing semantic search functionality. They expose separate methods for embedding queries versus embedding documents.
-
-## Vectorstores
-These are datastores that store embeddings of documents in vector form.
-They expose a method for passing in a string and finding similar documents.
-
 ## Python REPL
 Sometimes, for complex calculations, rather than have an LLM generate the answer directly,
 it can be better to have the LLM generate code to calculate the answer, and then run that code to get the answer.
diff --git a/docs/use_cases/combine_docs.md b/docs/use_cases/combine_docs.md
index 5b2067bc..d67e9781 100644
--- a/docs/use_cases/combine_docs.md
+++ b/docs/use_cases/combine_docs.md
@@ -61,7 +61,7 @@ small enough chunks.
 LangChain provides some utilities to help with splitting up larger pieces of data. This comes in the form of the TextSplitter class. The class takes in a document and splits it up into chunks, with several parameters that control the size of the chunks as well as the overlap in the chunks (important for maintaining context).
-See [this walkthrough](../modules/utils/combine_docs_examples/textsplitter.ipynb) for more information.
+See [this walkthrough](../modules/indexes/examples/textsplitter.ipynb) for more information.
 
 ### Relevant Documents
 A second large issue related fetching data is to make sure you are not fetching too many documents, and are only fetching