improve docs for indexes (#1146)

searx-api
Harrison Chase 1 year ago committed by GitHub
parent 28781a6213
commit 4f3fbd7267

@ -17,4 +17,4 @@ To import this vectorstore:
from langchain.vectorstores import Chroma
```
For a more detailed walkthrough of the Chroma wrapper, see [this notebook](../modules/utils/combine_docs_examples/vectorstores.ipynb)
For a more detailed walkthrough of the Chroma wrapper, see [this notebook](../modules/indexes/examples/vectorstores.ipynb)

@ -22,4 +22,4 @@ There exists a Cohere Embeddings wrapper, which you can access with
```python
from langchain.embeddings import CohereEmbeddings
```
For a more detailed walkthrough of this, see [this notebook](../modules/utils/combine_docs_examples/embeddings.ipynb)
For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/embeddings.ipynb)

@ -47,7 +47,7 @@ To use the wrapper for a model hosted on Hugging Face Hub:
```python
from langchain.embeddings import HuggingFaceHubEmbeddings
```
For a more detailed walkthrough of this, see [this notebook](../modules/utils/combine_docs_examples/embeddings.ipynb)
For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/embeddings.ipynb)
### Tokenizer
@ -59,7 +59,7 @@ You can also use it to count tokens when splitting documents with
from langchain.text_splitter import CharacterTextSplitter
CharacterTextSplitter.from_huggingface_tokenizer(...)
```
For a more detailed walkthrough of this, see [this notebook](../modules/utils/combine_docs_examples/textsplitter.ipynb)
For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/textsplitter.ipynb)
### Datasets

@ -31,7 +31,7 @@ There exists an OpenAI Embeddings wrapper, which you can access with
```python
from langchain.embeddings import OpenAIEmbeddings
```
For a more detailed walkthrough of this, see [this notebook](../modules/utils/combine_docs_examples/embeddings.ipynb)
For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/embeddings.ipynb)
### Tokenizer
@ -44,7 +44,7 @@ You can also use it to count tokens when splitting documents with
from langchain.text_splitter import CharacterTextSplitter
CharacterTextSplitter.from_tiktoken_encoder(...)
```
For a more detailed walkthrough of this, see [this notebook](../modules/utils/combine_docs_examples/textsplitter.ipynb)
For a more detailed walkthrough of this, see [this notebook](../modules/indexes/examples/textsplitter.ipynb)
### Moderation
You can also access the OpenAI content moderation endpoint with

@ -17,4 +17,4 @@ To import this vectorstore:
from langchain.vectorstores import Pinecone
```
For a more detailed walkthrough of the Pinecone wrapper, see [this notebook](../modules/utils/combine_docs_examples/vectorstores.ipynb)
For a more detailed walkthrough of the Pinecone wrapper, see [this notebook](../modules/indexes/examples/vectorstores.ipynb)

@ -30,4 +30,4 @@ To import this vectorstore:
from langchain.vectorstores import Weaviate
```
For a more detailed walkthrough of the Weaviate wrapper, see [this notebook](../modules/utils/combine_docs_examples/vectorstores.ipynb)
For a more detailed walkthrough of the Weaviate wrapper, see [this notebook](../modules/indexes/examples/vectorstores.ipynb)

@ -42,7 +42,7 @@ Check out the guide below for a walkthrough of how to get started using LangChain
Modules
-----------
There are six main modules that LangChain provides support for.
There are several main modules that LangChain provides support for.
For each module we provide some examples to get started, how-to guides, reference docs, and conceptual guides.
These modules are, in increasing order of complexity:
@ -57,6 +57,8 @@ These modules are, in increasing order of complexity:
- `Chains <./modules/chains.html>`_: Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
- `Indexes <./modules/indexes.html>`_: Language models are often more powerful when combined with your own text data - this module covers best practices for doing exactly that.
- `Agents <./modules/agents.html>`_: Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents.
- `Memory <./modules/memory.html>`_: Memory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
@ -72,6 +74,7 @@ These modules are, in increasing order of complexity:
./modules/llms.md
./modules/document_loaders.md
./modules/utils.md
./modules/indexes.md
./modules/chains.md
./modules/agents.md
./modules/memory.md

@ -1,34 +0,0 @@
CombineDocuments Chains
-----------------------
A chain is made up of links, which can be either primitives or other chains.
Primitives can be either `prompts <../prompts.html>`_, `llms <../llms.html>`_, `utils <../utils.html>`_, or other chains.
The examples here are all end-to-end chains for working with documents.
`Question Answering <./combine_docs_examples/question_answering.html>`_: A walkthrough of how to use LangChain for question answering over specific documents.
`Question Answering with Sources <./combine_docs_examples/qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over specific documents.
`Summarization <./combine_docs_examples/summarize.html>`_: A walkthrough of how to use LangChain for summarization over specific documents.
`Vector DB Text Generation <./combine_docs_examples/vector_db_text_generation.html>`_: A walkthrough of how to use LangChain for text generation over a vector database.
`Vector DB Question Answering <./combine_docs_examples/vector_db_qa.html>`_: A walkthrough of how to use LangChain for question answering over a vector database.
`Vector DB Question Answering with Sources <./combine_docs_examples/vector_db_qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a vector database.
`Graph Question Answering <./combine_docs_examples/graph_qa.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a graph database.
`Chat Vector DB <./combine_docs_examples/chat_vector_db.html>`_: A walkthrough of how to use LangChain as a chatbot over a vector database.
`Analyze Document <./combine_docs_examples/analyze_document.html>`_: A walkthrough of how to use LangChain to analyze long documents.
.. toctree::
:maxdepth: 1
:glob:
:caption: CombineDocument Chains
:name: combine_docs
:hidden:
./combine_docs_examples/*

@ -4,12 +4,11 @@ How-To Guides
A chain is made up of links, which can be either primitives or other chains.
Primitives can be either `prompts <../prompts.html>`_, `llms <../llms.html>`_, `utils <../utils.html>`_, or other chains.
The examples here are all end-to-end chains for specific applications.
They are broken up into four categories:
They are broken up into three categories:
1. `Generic Chains <./generic_how_to.html>`_: Generic chains, that are meant to help build other chains rather than serve a particular purpose.
2. `CombineDocuments Chains <./combine_docs_how_to.html>`_: Chains aimed at making it easy to work with documents (question answering, summarization, etc).
3. `Utility Chains <./utility_how_to.html>`_: Chains consisting of an LLMChain interacting with a specific util.
4. `Asynchronous <./async_chain.html>`_: Covering asynchronous functionality.
2. `Utility Chains <./utility_how_to.html>`_: Chains consisting of an LLMChain interacting with a specific util.
3. `Asynchronous <./async_chain.html>`_: Covering asynchronous functionality.
.. toctree::
:maxdepth: 1
@ -17,7 +16,6 @@ They are broken up into four categories:
:hidden:
./generic_how_to.rst
./combine_docs_how_to.rst
./utility_how_to.rst
./async_chain.ipynb

@ -9,13 +9,3 @@ This is a specific type of chain where multiple other chains are run in sequence
to the next. A subtype of this type of chain is the `SimpleSequentialChain`, where all subchains have only one input and one output,
and the output of one is therefore used as sole input to the next chain.
## CombineDocuments Chains
These are a subset of chains designed to work with documents. There are two pieces to consider:
1. The underlying chain method (eg, how the documents are combined)
2. Use cases for these types of chains.
For the first, please see [this documentation](combine_docs.md) for more detailed information on the types of chains LangChain supports.
For the second, please see the Use Cases section for more information on [question answering](/use_cases/question_answering.md),
[question answering with sources](/use_cases/qa_with_sources.md), and [summarization](/use_cases/summarization.md).

@ -0,0 +1,25 @@
Indexes
==========================
Indexes refer to ways to structure documents so that LLMs can best interact with them.
This module contains utility functions for working with documents, different types of indexes, and then examples for using those indexes in chains.
LangChain provides common indexes for working with data, most prominently support for vector databases.
For more complicated index structures, it is worth checking out `GPTIndex <https://gpt-index.readthedocs.io/en/latest/index.html>`_.
The following sections of documentation are provided:
- `Getting Started <./indexes/getting_started.html>`_: An overview of all the functionality LangChain provides for working with indexes.
- `Key Concepts <./indexes/key_concepts.html>`_: A conceptual guide going over the various concepts related to indexes and the tools needed to create them.
- `How-To Guides <./indexes/how_to_guides.html>`_: A collection of how-to guides. These highlight how to use all the relevant tools, the different types of vector databases, and how to use indexes in chains.
.. toctree::
:maxdepth: 1
:name: LLMs
:hidden:
./indexes/getting_started.ipynb
./indexes/key_concepts.md
./indexes/how_to_guides.rst

@ -506,7 +506,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
}
},
"nbformat": 4,

@ -0,0 +1,273 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7ef4d402-6662-4a26-b612-35b542066487",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# VectorStores\n",
"\n",
"This notebook showcases basic functionality related to VectorStores. A key part of working with vectorstores is creating the vectors to put in them, which are usually created via embeddings. Therefore, it is recommended that you familiarize yourself with the [embedding notebook](embeddings.ipynb) before diving into this.\n",
"\n",
"This covers generic high-level functionality related to all vector stores. For guides on specific vectorstores, please see the how-to guides [here](../how_to_guides.rst)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "965eecee",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Chroma"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "68481687",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"with open('../../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "015f4ff5",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"docsearch = Chroma.from_texts(texts, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "67baf32e",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "fb6baaf8",
"metadata": {},
"source": [
"## Add texts\n",
"You can easily add text to a vectorstore with the `add_texts` method. It will return a list of document IDs (in case you need to use them downstream)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "70758e4f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['a05e3d0c-ab40-11ed-a853-e65801318981']"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docsearch.add_texts([\"Ankush went to Princeton\"])"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4edeb88f",
"metadata": {},
"outputs": [],
"source": [
"query = \"Where did Ankush go to college?\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "1cba64a2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Ankush went to Princeton', lookup_str='', metadata={}, lookup_index=0)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "markdown",
"id": "bbf5ec44",
"metadata": {},
"source": [
"## From Documents\n",
"We can also initialize a vectorstore from documents directly. This is useful when we use the text splitter's `create_documents` method to get documents directly (handy when the original documents have associated metadata)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "df4a459c",
"metadata": {},
"outputs": [],
"source": [
"documents = text_splitter.create_documents([state_of_the_union], metadatas=[{\"source\": \"State of the Union\"}])"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "4b480245",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"docsearch = Chroma.from_documents(documents, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "86aa4cda",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4af5a071",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,186 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "07c1e3b9",
"metadata": {},
"source": [
"# Getting Started\n",
"\n",
"This example showcases question answering over a vector database.\n",
"We have chosen this as the example for getting started because it combines several different elements (text splitters, embeddings, vectorstores) and then shows how to use them in a chain."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "82525493",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain import OpenAI, VectorDBQA"
]
},
{
"cell_type": "markdown",
"id": "0b7adc54",
"metadata": {},
"source": [
"Here we load in the documents we want to use to create our index."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "611e0c19",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../state_of_the_union.txt')\n",
"documents = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "9fdc0fc2",
"metadata": {},
"source": [
"Next, we will split the documents into chunks."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "afecb8cf",
"metadata": {},
"outputs": [],
"source": [
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)"
]
},
{
"cell_type": "markdown",
"id": "4bebc041",
"metadata": {},
"source": [
"We will then select which embeddings we want to use."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9eaaa735",
"metadata": {},
"outputs": [],
"source": [
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"id": "24612905",
"metadata": {},
"source": [
"We now create the vectorstore to use as the index."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "5c7049db",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"db = Chroma.from_documents(texts, embeddings)"
]
},
{
"cell_type": "markdown",
"id": "30c4e5c6",
"metadata": {},
"source": [
"Finally, we create a chain and use it to answer questions!"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3018f865",
"metadata": {},
"outputs": [],
"source": [
"qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", vectorstore=db)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "032a47f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" The President said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"qa.run(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b403637",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"vscode": {
"interpreter": {
"hash": "b1677b440931f40d89ef8be7bf03acb108ce003de0ac9b18e8d43753ea2e7103"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,93 @@
How To Guides
====================================
Utils
-----
There are a lot of different utilities that LangChain provides integrations for.
These guides go over how to use them.
The utilities here are all utilities that make it easier to work with documents.
`Text Splitters <./examples/textsplitter.html>`_: A walkthrough of how to split large documents up into smaller, more manageable pieces of text.
`VectorStores <./examples/vectorstores.html>`_: A walkthrough of the vectorstore abstraction that LangChain supports.
`Embeddings <./examples/embeddings.html>`_: A walkthrough of embedding functionalities, and different types of embeddings, that LangChain supports.
`HyDE <./examples/hyde.html>`_: How to use Hypothetical Document Embeddings, a novel way of constructing embeddings for document retrieval systems.
.. toctree::
:maxdepth: 1
:glob:
:caption: Utils
:name: utils
:hidden:
examples/*
Vectorstores
------------
Vectorstores are one of the most important components for building indexes.
In the guides below, we cover different types of vectorstores and how to use them.
`Chroma <./vectorstore_examples/chroma.html>`_: A walkthrough of how to use the Chroma vectorstore wrapper.
`FAISS <./vectorstore_examples/faiss.html>`_: A walkthrough of how to use the FAISS vectorstore wrapper.
`Elastic Search <./vectorstore_examples/elasticsearch.html>`_: A walkthrough of how to use the ElasticSearch wrapper.
`Milvus <./vectorstore_examples/milvus.html>`_: A walkthrough of how to use the Milvus vectorstore wrapper.
`Pinecone <./vectorstore_examples/pinecone.html>`_: A walkthrough of how to use the Pinecone vectorstore wrapper.
`Qdrant <./vectorstore_examples/qdrant.html>`_: A walkthrough of how to use the Qdrant vectorstore wrapper.
`Weaviate <./vectorstore_examples/weaviate.html>`_: A walkthrough of how to use the Weaviate vectorstore wrapper.
.. toctree::
:maxdepth: 1
:glob:
:caption: Vectorstores
:name: vectorstores
:hidden:
vectorstore_examples/*
Chains
------
The examples here are all end-to-end chains that use indexes or utils covered above.
`Question Answering <./chain_examples/question_answering.html>`_: A walkthrough of how to use LangChain for question answering over specific documents.
`Question Answering with Sources <./chain_examples/qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over specific documents.
`Summarization <./chain_examples/summarize.html>`_: A walkthrough of how to use LangChain for summarization over specific documents.
`Vector DB Text Generation <./chain_examples/vector_db_text_generation.html>`_: A walkthrough of how to use LangChain for text generation over a vector database.
`Vector DB Question Answering <./chain_examples/vector_db_qa.html>`_: A walkthrough of how to use LangChain for question answering over a vector database.
`Vector DB Question Answering with Sources <./chain_examples/vector_db_qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a vector database.
`Graph Question Answering <./chain_examples/graph_qa.html>`_: A walkthrough of how to use LangChain for question answering over a graph database.
`Chat Vector DB <./chain_examples/chat_vector_db.html>`_: A walkthrough of how to use LangChain as a chatbot over a vector database.
`Analyze Document <./chain_examples/analyze_document.html>`_: A walkthrough of how to use LangChain to analyze long documents.
.. toctree::
:maxdepth: 1
:glob:
:caption: With Chains
:name: chains
:hidden:
./chain_examples/*

@ -0,0 +1,27 @@
# Key Concepts
## Text Splitter
This class is responsible for splitting long pieces of text into smaller components.
It contains different ways for splitting text (on characters, using spaCy, etc.)
as well as different ways for measuring length (token based, character based, etc).
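
As a rough illustration of how the splitting strategy and the length measure can vary independently, here is a minimal sketch. It is not LangChain's implementation: `split_text`, `count_words`, and the `separator` and `length_function` parameters are hypothetical names chosen for this example, and word counting is only a crude stand-in for a real tokenizer.

```python
from typing import Callable, List


def split_text(
    text: str,
    chunk_size: int,
    separator: str = "\n\n",
    length_function: Callable[[str], int] = len,
) -> List[str]:
    """Greedily merge separator-delimited pieces into chunks.

    A chunk is closed as soon as adding the next piece would push its
    measured length above chunk_size. A single piece that is already too
    long is kept whole rather than cut.
    """
    chunks: List[str] = []
    current: List[str] = []
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = separator.join(current + [piece])
        if current and length_function(candidate) > chunk_size:
            chunks.append(separator.join(current))
            current = [piece]
        else:
            current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


def count_words(s: str) -> int:
    # Hypothetical length measure: whitespace-separated words, standing in
    # for a token count.
    return len(s.split())


text = "alpha beta\n\ngamma\n\ndelta epsilon zeta"

# Same splitting strategy, two different ways of measuring length.
by_chars = split_text(text, chunk_size=12)
by_words = split_text(text, chunk_size=3, length_function=count_words)
```

With the character measure each paragraph ends up in its own chunk, while the word measure lets the first two paragraphs share one.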
## Embeddings
These classes are very similar to the LLM classes in that they are wrappers around models,
but rather than return a string they return an embedding (list of floats). These are particularly useful when
implementing semantic search functionality. They expose separate methods for embedding queries versus embedding documents.
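
The shape of that interface can be sketched without calling any model. The class below is a hypothetical stand-in (`ToyHashEmbeddings` is not part of LangChain) that hashes words into a fixed-size vector. It only illustrates the two entry points, `embed_documents` for a batch of texts and `embed_query` for a single string; the vectors carry no real semantic meaning.

```python
import math
import zlib
from typing import List


class ToyHashEmbeddings:
    """Hypothetical embedding class: hashed bag-of-words, unit-normalized."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            # crc32 is stable across runs, unlike Python's built-in hash().
            vec[zlib.crc32(word.encode("utf-8")) % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One vector per document.
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        # A real model may treat queries differently from documents;
        # this toy version uses the same function for both.
        return self._embed(text)


emb = ToyHashEmbeddings(dim=16)
doc_vectors = emb.embed_documents(["hello world", "vector stores"])
query_vector = emb.embed_query("hello world")
```

Keeping the two methods separate leaves room for models that embed queries and documents differently, even though this sketch does not.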
## Vectorstores
These are datastores that store embeddings of documents in vector form.
They expose a method for passing in a string and finding similar documents.
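
To make the "pass in a string, get similar documents back" idea concrete, here is a minimal in-memory sketch. `ToyVectorStore`, `embed`, and `cosine` are hypothetical names for this example only, and sparse word counts stand in for learned embeddings. The method names `add_texts` and `similarity_search` follow the ones used in the vectorstore notebooks, but this toy version returns plain strings rather than document objects.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a sparse word-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical store: keeps (text, vector) pairs in a list."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, Counter]] = []

    def add_texts(self, texts: List[str]) -> None:
        for text in texts:
            self._rows.append((text, embed(text)))

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        # Embed the query, score every stored vector, return the top k texts.
        query_vector = embed(query)
        ranked = sorted(
            self._rows,
            key=lambda row: cosine(query_vector, row[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "the cat sat on the mat",
    "stock prices fell sharply",
    "a dog chased the ball",
])
results = store.similarity_search("cat mat", k=1)
```

A production vectorstore replaces the linear scan with an approximate nearest-neighbor index, but the interface stays the same: text in, ranked documents out.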
## CombineDocuments Chains
These are a subset of chains designed to work with documents. There are two pieces to consider:
1. The underlying chain method (e.g., how the documents are combined)
2. Use cases for these types of chains.
For the first, please see [this documentation](combine_docs.md) for more detailed information on the types of chains LangChain supports.
For the second, please see the Use Cases section for more information on [question answering](/use_cases/question_answering.md),
[question answering with sources](/use_cases/qa_with_sources.md), and [summarization](/use_cases/summarization.md).

@ -0,0 +1,122 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# Chroma\n",
"\n",
"This notebook shows how to use functionality related to the Chroma vector database."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "5eabdb75",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"db = Chroma.from_documents(docs, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "4b172de8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a359ed74",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,113 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# ElasticSearch\n",
"\n",
"This notebook shows how to use functionality related to the ElasticSearch database."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import ElasticVectorSearch\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12eb86d8",
"metadata": {},
"outputs": [],
"source": [
"db = ElasticVectorSearch.from_documents(docs, embeddings, elasticsearch_url=\"http://localhost:9200\"\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "4b172de8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a359ed74",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
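
The notebooks in this change all prepare their input with `CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)`. As a rough illustration of what that kind of splitting does, the sketch below greedily merges separator-delimited pieces into chunks of at most `chunk_size` characters. It is a simplified stand-in, not LangChain's implementation: the `split_into_chunks` helper, the `"\n\n"` separator default, and the sample text are assumptions made for this example, and overlap handling is left out.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> List[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    Hypothetical helper for illustration only. A single piece that is longer
    than chunk_size is kept whole rather than cut in the middle.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Still fits (or this is the first piece of a new chunk): keep growing.
            current = candidate
        else:
            # Adding the piece would exceed the limit: close the chunk, start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Made-up sample text with three short paragraphs.
sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunks = split_into_chunks(sample, chunk_size=40)
print(chunks)
```

With a 40-character limit the first two paragraphs fit into one chunk and the third starts a new one. In the notebooks, each resulting chunk is what gets embedded and stored in the vector store.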

@ -0,0 +1,233 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# FAISS\n",
"\n",
"This notebook shows how to use functionality related to the FAISS vector database."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import FAISS\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5eabdb75",
"metadata": {},
"outputs": [],
"source": [
"db = FAISS.from_documents(docs, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4b172de8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "f13473b5",
"metadata": {},
"source": [
"## Similarity Search with score\n",
"There are some FAISS specific methods. One of them is `similarity_search_with_score`, which allows you to return not only the documents but also the similarity score of the query to them."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "186ee1d8",
"metadata": {},
"outputs": [],
"source": [
"docs_and_scores = db.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "284e04b5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0),\n",
" 0.3914415)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_and_scores[0]"
]
},
{
"cell_type": "markdown",
"id": "f34420cf",
"metadata": {},
"source": [
"It is also possible to do a search for documents similar to a given embedding vector using `similarity_search_by_vector` which accepts an embedding vector as a parameter instead of a string."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b558ebb7",
"metadata": {},
"outputs": [],
"source": [
"embedding_vector = embeddings.embed_query(query)\n",
"docs_and_scores = db.similarity_search_by_vector(embedding_vector)"
]
},
{
"cell_type": "markdown",
"id": "31bda7fd",
"metadata": {},
"source": [
"## Saving and loading\n",
"You can also save and load a FAISS index. This is useful so you don't have to recreate it everytime you use it."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "428a6816",
"metadata": {},
"outputs": [],
"source": [
"db.save_local(\"faiss_index\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "56d1841c",
"metadata": {},
"outputs": [],
"source": [
"new_db = FAISS.load_local(\"faiss_index\", embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "39055525",
"metadata": {},
"outputs": [],
"source": [
"docs = new_db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "98378c4e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bc8b71f7",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
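
The `similarity_search_with_score` cell in the FAISS notebook above returns a `(Document, score)` tuple. As a rough illustration of how such a score can be read, here is a minimal pure-Python sketch that ranks a few made-up vectors by squared L2 distance to a query vector. It assumes the score is a distance (lower means more similar), as is the case for a flat L2 FAISS index; the toy vectors, the texts, and the `squared_l2` and `rank_by_l2` helpers are hypothetical and are not part of LangChain or FAISS.

```python
from typing import List, Tuple


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_by_l2(
    query_vec: List[float],
    items: List[Tuple[str, List[float]]],
    k: int = 4,
) -> List[Tuple[str, float]]:
    """Return the k (text, score) pairs with the smallest distance to query_vec."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in items]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy 2-dimensional "embeddings"; real embeddings have many more dimensions.
toy_items = [
    ("about the Supreme Court", [1.0, 0.0]),
    ("about the border", [0.0, 1.0]),
    ("about voting rights", [0.8, 0.2]),
]
results = rank_by_l2([1.0, 0.0], toy_items, k=2)
print(results[0])
```

The best match comes first and has the smallest score, which mirrors how the first element of `docs_and_scores` in the notebook is the closest document.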

@ -0,0 +1,108 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# Milvus\n",
"\n",
"This notebook shows how to use functionality related to the Milvus vector database.\n",
"\n",
"To run, you should have a Milvus instance up and running: https://milvus.io/docs/install_standalone-docker.md"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Milvus\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dcf88bdf",
"metadata": {},
"outputs": [],
"source": [
"vector_db = Milvus.from_documents(\n",
" docs,\n",
" embeddings,\n",
" connection_args={\"host\": \"127.0.0.1\", \"port\": \"19530\"},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8c513ab",
"metadata": {},
"outputs": [],
"source": [
"docs = vector_db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc516993",
"metadata": {},
"outputs": [],
"source": [
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a359ed74",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,105 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# Pinecone\n",
"\n",
"This notebook shows how to use functionality related to the Pinecone vector database."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Pinecone\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e104aee",
"metadata": {},
"outputs": [],
"source": [
"import pinecone \n",
"\n",
"# initialize pinecone\n",
"pinecone.init(\n",
" api_key=\"YOUR_API_KEY\", # find at app.pinecone.io\n",
" environment=\"YOUR_ENV\" # next to api key in console\n",
")\n",
"\n",
"index_name = \"langchain-demo\"\n",
"\n",
"docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c608226",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a359ed74",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,105 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# Qdrant\n",
"\n",
"This notebook shows how to use functionality related to the Qdrant vector database."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Qdrant\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dcf88bdf",
"metadata": {},
"outputs": [],
"source": [
"host = \"<---host name here --->\"\n",
"api_key = \"<---api key here--->\"\n",
"qdrant = Qdrant.from_documents(docs, embeddings, host=host, prefer_grpc=True, api_key=api_key)\n",
"query = \"What did the president say about Ketanji Brown Jackson\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8c513ab",
"metadata": {},
"outputs": [],
"source": [
"docs = qdrant.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc516993",
"metadata": {},
"outputs": [],
"source": [
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a359ed74",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,163 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# Weaviate\n",
"\n",
"This notebook shows how to use functionality related to the Weaviate vector database."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Weaviate\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5888dcc7",
"metadata": {},
"outputs": [],
"source": [
"import weaviate\n",
"import os\n",
"\n",
"WEAVIATE_URL = \"\"\n",
"client = weaviate.Client(\n",
" url=WEAVIATE_URL,\n",
" additional_headers={\n",
" 'X-OpenAI-Api-Key': os.environ[\"OPENAI_API_KEY\"]\n",
" }\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f004e8ee",
"metadata": {},
"outputs": [],
"source": [
"client.schema.delete_all()\n",
"client.schema.get()\n",
"schema = {\n",
" \"classes\": [\n",
" {\n",
" \"class\": \"Paragraph\",\n",
" \"description\": \"A written paragraph\",\n",
" \"vectorizer\": \"text2vec-openai\",\n",
" \"moduleConfig\": {\n",
" \"text2vec-openai\": {\n",
" \"model\": \"babbage\",\n",
" \"type\": \"text\"\n",
" }\n",
" },\n",
" \"properties\": [\n",
" {\n",
" \"dataType\": [\"text\"],\n",
" \"description\": \"The content of the paragraph\",\n",
" \"moduleConfig\": {\n",
" \"text2vec-openai\": {\n",
" \"skip\": False,\n",
" \"vectorizePropertyName\": False\n",
" }\n",
" },\n",
" \"name\": \"content\",\n",
" },\n",
" ],\n",
" },\n",
" ]\n",
"}\n",
"\n",
"client.schema.create(schema)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef6d5d04",
"metadata": {},
"outputs": [],
"source": [
"vectorstore = Weaviate(client, \"Paragraph\", \"content\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "06e8c1ed",
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = vectorstore.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38b86be6",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a359ed74",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -1,772 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7ef4d402-6662-4a26-b612-35b542066487",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# VectorStores\n",
"\n",
"This notebook show cases how to use VectorStores. A key part of working with vectorstores is creating the vector to put in them, which is usually created via embeddings. Therefor, it is recommended that you familiarize yourself with the [embedding notebook](embeddings.ipynb) before diving into this."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "965eecee",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import ElasticVectorSearch, Pinecone, Weaviate, FAISS, Qdrant, Chroma"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "68481687",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"with open('../../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "015f4ff5",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"docsearch = Chroma.from_texts(texts, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "67baf32e",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "fb6baaf8",
"metadata": {},
"source": [
"## Add texts\n",
"You can easily add text to a vectorstore with the `add_texts` method. It will return a list of document IDs (in case you need to use them downstream)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "70758e4f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['a05e3d0c-ab40-11ed-a853-e65801318981']"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docsearch.add_texts([\"Ankush went to Princeton\"])"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4edeb88f",
"metadata": {},
"outputs": [],
"source": [
"query = \"Where did Ankush go to college?\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "1cba64a2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Ankush went to Princeton', lookup_str='', metadata={}, lookup_index=0)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "markdown",
"id": "bbf5ec44",
"metadata": {},
"source": [
"## From Documents\n",
"We can also initialize a vectorstore from documents directly. This is useful when we use the method on the text splitter to get documents directly (handy when the original documents have associated metadata)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "df4a459c",
"metadata": {},
"outputs": [],
"source": [
"documents = text_splitter.create_documents([state_of_the_union], metadatas=[{\"source\": \"State of the Union\"}])"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "4b480245",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"docsearch = Chroma.from_documents(documents, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "86aa4cda",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "2445a5e6",
"metadata": {},
"source": [
"## FAISS\n",
"There are some FAISS specific methods. One of them is `similarity_search_with_score`, which allows you to return not only the documents but also the similarity score of the query to them."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "479e22ce",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Exiting: Cleaning up .chroma directory\n"
]
}
],
"source": [
"docsearch = FAISS.from_texts(texts, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b4f49314",
"metadata": {},
"outputs": [],
"source": [
"docs_and_scores = docsearch.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "86f78ab1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={}, lookup_index=0),\n",
" 0.40834612)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_and_scores[0]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d5170563",
"metadata": {},
"source": [
"It is also possible to do a search for documents similar to a given embedding vector using `similarity_search_by_vector` which accepts an embedding vector as a parameter instead of a string."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7675b0aa",
"metadata": {},
"outputs": [],
"source": [
"embedding_vector = embeddings.embed_query(query)\n",
"docs_and_scores = docsearch.similarity_search_by_vector(embedding_vector)"
]
},
{
"cell_type": "markdown",
"id": "b386dbb8",
"metadata": {},
"source": [
"### Saving and loading\n",
"You can also save and load a FAISS index. This is useful so you don't have to recreate it everytime you use it."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b58b3955",
"metadata": {},
"outputs": [],
"source": [
"docsearch.save_local(\"faiss_index\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ca72c650",
"metadata": {},
"outputs": [],
"source": [
"new_docsearch = FAISS.load_local(\"faiss_index\", embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "5bf2ee24",
"metadata": {},
"outputs": [],
"source": [
"docs = new_docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "edc2aad1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={}, lookup_index=0)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "markdown",
"id": "eea6e627",
"metadata": {},
"source": [
"## Requires having ElasticSearch setup"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4906b8a3",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"docsearch = ElasticVectorSearch.from_texts(texts, embeddings, elasticsearch_url=\"http://localhost:9200\")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "95f9eee9",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. \n",
"\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "7f9cb9e7",
"metadata": {},
"source": [
"## Weaviate"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "1037a85e",
"metadata": {},
"outputs": [],
"source": [
"import weaviate\n",
"import os\n",
"\n",
"WEAVIATE_URL = \"\"\n",
"client = weaviate.Client(\n",
" url=WEAVIATE_URL,\n",
" additional_headers={\n",
" 'X-OpenAI-Api-Key': os.environ[\"OPENAI_API_KEY\"]\n",
" }\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "b9043766",
"metadata": {},
"outputs": [],
"source": [
"client.schema.delete_all()\n",
"client.schema.get()\n",
"schema = {\n",
" \"classes\": [\n",
" {\n",
" \"class\": \"Paragraph\",\n",
" \"description\": \"A written paragraph\",\n",
" \"vectorizer\": \"text2vec-openai\",\n",
" \"moduleConfig\": {\n",
" \"text2vec-openai\": {\n",
" \"model\": \"babbage\",\n",
" \"type\": \"text\"\n",
" }\n",
" },\n",
" \"properties\": [\n",
" {\n",
" \"dataType\": [\"text\"],\n",
" \"description\": \"The content of the paragraph\",\n",
" \"moduleConfig\": {\n",
" \"text2vec-openai\": {\n",
" \"skip\": False,\n",
" \"vectorizePropertyName\": False\n",
" }\n",
" },\n",
" \"name\": \"content\",\n",
" },\n",
" ],\n",
" },\n",
" ]\n",
"}\n",
"\n",
"client.schema.create(schema)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "ac20d99c",
"metadata": {},
"outputs": [],
"source": [
"with client.batch as batch:\n",
" for text in texts:\n",
" batch.add_data_object({\"content\": text}, \"Paragraph\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "01645d61",
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores.weaviate import Weaviate"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "bdd97d29",
"metadata": {},
"outputs": [],
"source": [
"vectorstore = Weaviate(client, \"Paragraph\", \"content\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "b70c0f98",
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = vectorstore.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "07533e40",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. \n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "007f3102",
"metadata": {},
"source": [
"## Pinecone"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7f6047e5",
"metadata": {},
"outputs": [],
"source": [
"import pinecone \n",
"\n",
"# initialize pinecone\n",
"pinecone.init(\n",
" api_key=\"YOUR_API_KEY\", # find at app.pinecone.io\n",
" environment=\"YOUR_ENV\" # next to api key in console\n",
")\n",
"\n",
"index_name = \"langchain-demo\"\n",
"\n",
"docsearch = Pinecone.from_texts(texts, embeddings, index_name=index_name)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "8e81f1f0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders. ', lookup_str='', metadata={}, lookup_index=0)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "markdown",
"id": "9b852079",
"metadata": {},
"source": [
"## Qdrant"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5ec70ce",
"metadata": {},
"outputs": [],
"source": [
"host = \"<---host name here --->\"\n",
"api_key = \"<---api key here--->\"\n",
"qdrant = Qdrant.from_texts(texts, embeddings, host=host, prefer_grpc=True, api_key=api_key)\n",
"query = \"What did the president say about Ketanji Brown Jackson\""
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "9805ad1f",
"metadata": {},
"outputs": [],
"source": [
"docs = qdrant.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "bd097a0e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={}, lookup_index=0)"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "markdown",
"id": "6c3ec797",
"metadata": {},
"source": [
"## Milvus\n",
"To run, you should have a Milvus instance up and running: https://milvus.io/docs/install_standalone-docker.md"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "be347313",
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import Milvus"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f2eee23f",
"metadata": {},
"outputs": [],
"source": [
"vector_db = Milvus.from_texts(\n",
" texts,\n",
" embeddings,\n",
" connection_args={\"host\": \"127.0.0.1\", \"port\": \"19530\"},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "06bdb701",
"metadata": {},
"outputs": [],
"source": [
"docs = vector_db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "7b3e94aa",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={}, lookup_index=0)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4af5a071",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -1,21 +0,0 @@
Utilities for working with Documents
====================================
LangChain provides integrations for many different utilities.
These guides go over how to use them.
The utilities here are all utilities that make it easier to work with documents.
`Text Splitters <./combine_docs_examples/textsplitter.html>`_: A walkthrough of how to split large documents up into smaller, more manageable pieces of text.
`VectorStores <./combine_docs_examples/vectorstores.html>`_: A walkthrough of vectorstore functionality and the different types of vectorstores that LangChain supports.
`Embeddings <./combine_docs_examples/embeddings.html>`_: A walkthrough of embedding functionality and the different types of embeddings that LangChain supports.
`HyDE <./combine_docs_examples/hyde.html>`_: How to use Hypothetical Document Embeddings, a novel way of constructing embeddings for document retrieval systems.
.. toctree::
:maxdepth: 1
:glob:
:hidden:
combine_docs_examples/*

@ -1,30 +0,0 @@
Generic Utilities
=================
LangChain provides integrations for many different utilities.
These guides go over how to use them.
The utilities listed here are all generic utilities.
`Bash <./examples/bash.html>`_: How to use a bash wrapper to execute bash commands.
`Python REPL <./examples/python.html>`_: How to use a Python wrapper to execute Python commands.
`Requests <./examples/requests.html>`_: How to use a requests wrapper to interact with the web.
`Google Search <./examples/google_search.html>`_: How to use the Google search wrapper to search the web.
`SerpAPI <./examples/serpapi.html>`_: How to use the SerpAPI wrapper to search the web.
`SearxNG Search API <./examples/searx_search.html>`_: How to use the SearxNG meta search wrapper to search the web.
`Bing Search <./examples/bing_search.html>`_: How to use the Bing search wrapper to search the web.
`Wolfram Alpha <./examples/wolfram_alpha.html>`_: How to use the Wolfram Alpha wrapper to interact with Wolfram Alpha.
.. toctree::
:maxdepth: 1
:glob:
:hidden:
./examples/*

@ -1,17 +1,30 @@
How-To Guides
=============
Generic Utilities
=================
LangChain provides integrations for many different utilities.
These guides go over how to use them.
These can largely be grouped into two categories:
The utilities listed here are all generic utilities.
`Bash <./examples/bash.html>`_: How to use a bash wrapper to execute bash commands.
`Python REPL <./examples/python.html>`_: How to use a Python wrapper to execute Python commands.
`Requests <./examples/requests.html>`_: How to use a requests wrapper to interact with the web.
`Google Search <./examples/google_search.html>`_: How to use the Google search wrapper to search the web.
`SerpAPI <./examples/serpapi.html>`_: How to use the SerpAPI wrapper to search the web.
`SearxNG Search API <./examples/searx_search.html>`_: How to use the SearxNG meta search wrapper to search the web.
`Bing Search <./examples/bing_search.html>`_: How to use the Bing search wrapper to search the web.
`Wolfram Alpha <./examples/wolfram_alpha.html>`_: How to use the Wolfram Alpha wrapper to interact with Wolfram Alpha.
1. `Generic Utilities <./generic_how_to.html>`_: Generic utilities, including search, python REPLs, etc.
2. `Utilities for working with Documents <./combine_docs_how_to.html>`_: Utilities aimed at making it easy to work with documents (text splitting, embeddings, vectorstores, etc).
.. toctree::
:maxdepth: 1
:glob:
:hidden:
./generic_how_to.rst
./combine_docs_how_to.rst
./examples/*

@ -1,19 +1,5 @@
# Key Concepts
## Text Splitter
This class is responsible for splitting long pieces of text into smaller components.
It supports different ways of splitting text (on characters, using spaCy, etc.)
as well as different ways of measuring length (token-based, character-based, etc.).
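
The separation between how text is split and how length is measured can be illustrated with a small sketch. This is not LangChain's implementation; the function `split_text`, its parameters, and the helper `count_words` are hypothetical names chosen for illustration only.

```python
from typing import Callable, List


def split_text(
    text: str,
    separator: str = " ",
    chunk_size: int = 20,
    length_function: Callable[[str], int] = len,
) -> List[str]:
    """Greedily merge separator-delimited pieces into chunks of bounded length.

    `length_function` decides what "length" means (characters by default).
    """
    chunks: List[str] = []
    current: List[str] = []
    for piece in text.split(separator):
        candidate = separator.join(current + [piece])
        # Close the current chunk when adding the next piece would exceed the limit.
        if current and length_function(candidate) > chunk_size:
            chunks.append(separator.join(current))
            current = [piece]
        else:
            current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


def count_words(s: str) -> int:
    """Hypothetical token counter: whitespace-separated words."""
    return len(s.split())


text = "the quick brown fox jumps over the lazy dog"
by_chars = split_text(text, chunk_size=15)  # length measured in characters
by_words = split_text(text, chunk_size=4, length_function=count_words)  # length in words
```

Swapping `length_function` changes the chunk boundaries without touching the splitting logic, which is the design point the paragraph above describes.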
## Embeddings
These classes are very similar to the LLM classes in that they are wrappers around models,
but rather than return a string they return an embedding (list of floats). These are particularly useful when
implementing semantic search functionality. They expose separate methods for embedding queries versus embedding documents.
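
A rough sketch of that two-method interface follows. The class `ToyEmbeddings` and its hashed bag-of-words scheme are assumptions made up for this example (a real wrapper would call a model); only the idea of separate query and document methods comes from the text above.

```python
import hashlib
from typing import List


class ToyEmbeddings:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            # Hash each word into one of `dim` buckets and count it there.
            digest = hashlib.sha256(word.encode("utf-8")).digest()
            vec[digest[0] % self.dim] += 1.0
        return vec

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a batch of documents; returns one vector per document."""
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        """Embed a single query string; returns one vector."""
        return self._embed(text)


embedder = ToyEmbeddings(dim=8)
doc_vectors = embedder.embed_documents(["hello world", "goodbye world"])
query_vector = embedder.embed_query("hello world")
```

In this toy both methods share one code path; keeping them separate in the interface leaves room for models that treat queries and documents differently.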
## Vectorstores
These are datastores that store embeddings of documents in vector form.
They expose a method for passing in a string and finding similar documents.
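
To make the idea concrete, here is a minimal in-memory sketch, assuming a toy word-count embedding and cosine similarity. `InMemoryVectorStore`, `embed`, and `cosine` are hypothetical names for illustration and do not reflect how any of the supported vectorstores work internally.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy sparse embedding (an assumption): lower-cased word counts."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Stores (text, vector) pairs and ranks them against a query string."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Dict[str, float]]] = []

    def add_texts(self, texts: List[str]) -> None:
        for text in texts:
            self._items.append((text, embed(text)))

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        query_vec = embed(query)
        ranked = sorted(
            self._items, key=lambda item: cosine(query_vec, item[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


store = InMemoryVectorStore()
store.add_texts(
    [
        "the senate passed a voting rights act",
        "new scanners detect drug smuggling at the border",
        "the president nominated a judge to the supreme court",
    ]
)
results = store.similarity_search("who did the president nominate to the court", k=1)
```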
## Python REPL
Sometimes, for complex calculations, rather than have an LLM generate the answer directly,
it can be better to have the LLM generate code to calculate the answer, and then run that code to get the answer.
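
The pattern can be sketched as follows, with a hard-coded string standing in for LLM output. The helper `run_python` is a hypothetical name, not LangChain's wrapper, and the sketch assumes the generated code prints its answer.

```python
import contextlib
import io


def run_python(code: str) -> str:
    """Execute a code string in a fresh namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # fresh globals so runs do not share state
    return buffer.getvalue()


# Stand-in for code an LLM might generate for "What is the 10th Fibonacci number?"
generated_code = """
a, b = 0, 1
for _ in range(10):
    a, b = b, a + b
print(a)
"""

answer = run_python(generated_code).strip()
```

Because `exec` runs arbitrary code, a setup like this should only be used with code you are willing to execute on the host.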

@ -61,7 +61,7 @@ small enough chunks.
LangChain provides some utilities to help with splitting up larger pieces of data. This comes in the form of the TextSplitter class.
The class takes in a document and splits it up into chunks, with several parameters that control the
size of the chunks as well as the overlap in the chunks (important for maintaining context).
See [this walkthrough](../modules/utils/combine_docs_examples/textsplitter.ipynb) for more information.
See [this walkthrough](../modules/indexes/examples/textsplitter.ipynb) for more information.
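
As a rough illustration of how chunk size and overlap interact, the sketch below slides a fixed-size character window over the text. `chunk_with_overlap` is a hypothetical helper, not the TextSplitter implementation, and it measures length in characters only.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows where neighbours share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
```

Each chunk repeats the last two characters of the previous one, so context that straddles a boundary appears in both chunks.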
### Relevant Documents
A second large issue related to fetching data is making sure you are not fetching too many documents, and are only fetching
