mirror of https://github.com/hwchase17/langchain
improve docs for indexes (#1146)
parent
28781a6213
commit
4f3fbd7267
@@ -1,34 +0,0 @@
CombineDocuments Chains
-----------------------

A chain is made up of links, which can be either primitives or other chains.
Primitives can be either `prompts <../prompts.html>`_, `llms <../llms.html>`_, `utils <../utils.html>`_, or other chains.
The examples here are all end-to-end chains for working with documents.

`Question Answering <./combine_docs_examples/question_answering.html>`_: A walkthrough of how to use LangChain for question answering over specific documents.

`Question Answering with Sources <./combine_docs_examples/qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over specific documents.

`Summarization <./combine_docs_examples/summarize.html>`_: A walkthrough of how to use LangChain for summarization over specific documents.

`Vector DB Text Generation <./combine_docs_examples/vector_db_text_generation.html>`_: A walkthrough of how to use LangChain for text generation over a vector database.

`Vector DB Question Answering <./combine_docs_examples/vector_db_qa.html>`_: A walkthrough of how to use LangChain for question answering over a vector database.

`Vector DB Question Answering with Sources <./combine_docs_examples/vector_db_qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a vector database.

`Graph Question Answering <./combine_docs_examples/graph_qa.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a graph database.

`Chat Vector DB <./combine_docs_examples/chat_vector_db.html>`_: A walkthrough of how to use LangChain as a chatbot over a vector database.

`Analyze Document <./combine_docs_examples/analyze_document.html>`_: A walkthrough of how to use LangChain to analyze long documents.

.. toctree::
   :maxdepth: 1
   :glob:
   :caption: CombineDocument Chains
   :name: combine_docs
   :hidden:

   ./combine_docs_examples/*
@@ -0,0 +1,25 @@
Indexes
==========================

Indexes refer to ways to structure documents so that LLMs can best interact with them.
This module contains utility functions for working with documents, different types of indexes, and examples of using those indexes in chains.
LangChain provides common indexes for working with data (most prominently, support for vector databases).
For more complicated index structures, it is worth checking out `GPTIndex <https://gpt-index.readthedocs.io/en/latest/index.html>`_.

The following sections of documentation are provided:

- `Getting Started <./indexes/getting_started.html>`_: An overview of all the functionality LangChain provides for working with indexes.

- `Key Concepts <./indexes/key_concepts.html>`_: A conceptual guide going over the various concepts related to indexes and the tools needed to create them.

- `How-To Guides <./indexes/how_to_guides.html>`_: A collection of how-to guides. These highlight how to use all the relevant tools, the different types of vector databases, and how to use indexes in chains.


.. toctree::
   :maxdepth: 1
   :name: indexes
   :hidden:

   ./indexes/getting_started.ipynb
   ./indexes/key_concepts.md
   ./indexes/how_to_guides.rst
@@ -0,0 +1,186 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "07c1e3b9",
   "metadata": {},
   "source": [
    "# Getting Started\n",
    "\n",
    "This example showcases question answering over a vector database.\n",
    "We have chosen this as the example for getting started because it nicely combines a lot of different elements (Text splitters, embeddings, vectorstores) and then also shows how to use them in a chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "82525493",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.vectorstores import Chroma\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain import OpenAI, VectorDBQA"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b7adc54",
   "metadata": {},
   "source": [
    "Here we load in the documents we want to use to create our index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "611e0c19",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "loader = TextLoader('../state_of_the_union.txt')\n",
    "documents = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fdc0fc2",
   "metadata": {},
   "source": [
    "Next, we will split the documents into chunks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "afecb8cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "texts = text_splitter.split_documents(documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4bebc041",
   "metadata": {},
   "source": [
    "We will then select which embeddings we want to use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "9eaaa735",
   "metadata": {},
   "outputs": [],
   "source": [
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24612905",
   "metadata": {},
   "source": [
    "We now create the vectorstore to use as the index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "5c7049db",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running Chroma using direct local API.\n",
      "Using DuckDB in-memory for database. Data will be transient.\n"
     ]
    }
   ],
   "source": [
    "db = Chroma.from_documents(texts, embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30c4e5c6",
   "metadata": {},
   "source": [
    "Finally, we create a chain and use it to answer questions!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "3018f865",
   "metadata": {},
   "outputs": [],
   "source": [
    "qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", vectorstore=db)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "032a47f8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\" The President said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "qa.run(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b403637",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  },
  "vscode": {
   "interpreter": {
    "hash": "b1677b440931f40d89ef8be7bf03acb108ce003de0ac9b18e8d43753ea2e7103"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -0,0 +1,93 @@
How To Guides
====================================

Utils
-----

There are a lot of different utilities that LangChain provides integrations for.
These guides go over how to use them.
The utilities here all make it easier to work with documents.

`Text Splitters <./examples/textsplitter.html>`_: A walkthrough of how to split large documents up into smaller, more manageable pieces of text.

`VectorStores <./examples/vectorstores.html>`_: A walkthrough of the vectorstore abstraction that LangChain supports.

`Embeddings <./examples/embeddings.html>`_: A walkthrough of embedding functionalities, and different types of embeddings, that LangChain supports.

`HyDE <./examples/hyde.html>`_: How to use Hypothetical Document Embeddings, a novel way of constructing embeddings for document retrieval systems.

.. toctree::
   :maxdepth: 1
   :glob:
   :caption: Utils
   :name: utils
   :hidden:

   examples/*


Vectorstores
------------

Vectorstores are one of the most important components of building indexes.
In the guides below, we cover the different types of vectorstores and how to use them.

`Chroma <./vectorstore_examples/chroma.html>`_: A walkthrough of how to use the Chroma vectorstore wrapper.

`FAISS <./vectorstore_examples/faiss.html>`_: A walkthrough of how to use the FAISS vectorstore wrapper.

`Elasticsearch <./vectorstore_examples/elasticsearch.html>`_: A walkthrough of how to use the Elasticsearch wrapper.

`Milvus <./vectorstore_examples/milvus.html>`_: A walkthrough of how to use the Milvus vectorstore wrapper.

`Pinecone <./vectorstore_examples/pinecone.html>`_: A walkthrough of how to use the Pinecone vectorstore wrapper.

`Qdrant <./vectorstore_examples/qdrant.html>`_: A walkthrough of how to use the Qdrant vectorstore wrapper.

`Weaviate <./vectorstore_examples/weaviate.html>`_: A walkthrough of how to use the Weaviate vectorstore wrapper.


.. toctree::
   :maxdepth: 1
   :glob:
   :caption: Vectorstores
   :name: vectorstores
   :hidden:

   vectorstore_examples/*


Chains
------

The examples here are all end-to-end chains that use the indexes or utils covered above.

`Question Answering <./chain_examples/question_answering.html>`_: A walkthrough of how to use LangChain for question answering over specific documents.

`Question Answering with Sources <./chain_examples/qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over specific documents.

`Summarization <./chain_examples/summarize.html>`_: A walkthrough of how to use LangChain for summarization over specific documents.

`Vector DB Text Generation <./chain_examples/vector_db_text_generation.html>`_: A walkthrough of how to use LangChain for text generation over a vector database.

`Vector DB Question Answering <./chain_examples/vector_db_qa.html>`_: A walkthrough of how to use LangChain for question answering over a vector database.

`Vector DB Question Answering with Sources <./chain_examples/vector_db_qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a vector database.

`Graph Question Answering <./chain_examples/graph_qa.html>`_: A walkthrough of how to use LangChain for question answering over a graph database.

`Chat Vector DB <./chain_examples/chat_vector_db.html>`_: A walkthrough of how to use LangChain as a chatbot over a vector database.

`Analyze Document <./chain_examples/analyze_document.html>`_: A walkthrough of how to use LangChain to analyze long documents.


.. toctree::
   :maxdepth: 1
   :glob:
   :caption: With Chains
   :name: chains
   :hidden:

   ./chain_examples/*
@@ -0,0 +1,27 @@
# Key Concepts

## Text Splitter
This class is responsible for splitting long pieces of text into smaller components.
It contains different ways of splitting text (on characters, using spaCy, etc.)
as well as different ways of measuring length (token based, character based, etc.).
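The splitting idea can be sketched in plain Python. This is a toy illustration of fixed-size chunking with overlap, not LangChain's actual implementation, which also respects separators and can measure length in tokens:

```python
def split_text(text, chunk_size=1000, chunk_overlap=0):
    """Toy fixed-size splitter: emit chunks of at most chunk_size
    characters, with adjacent chunks sharing chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print([len(c) for c in split_text("a" * 2500)])  # [1000, 1000, 500]
```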

## Embeddings
These classes are very similar to the LLM classes in that they are wrappers around models,
but rather than returning a string they return an embedding (a list of floats). These are particularly useful when
implementing semantic search functionality. They expose separate methods for embedding queries versus embedding documents.
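The interface can be sketched as follows. The `ToyEmbeddings` class and its hash-derived vectors are invented for illustration and carry no semantic meaning; real classes such as `OpenAIEmbeddings` call a model:

```python
import hashlib

class ToyEmbeddings:
    """Illustrates the embedding interface only: queries and documents
    get separate methods, and each text maps to a list of floats."""

    def embed_query(self, text):
        # Derive a deterministic, fake 4-dimensional vector from a hash.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255 for byte in digest[:4]]

    def embed_documents(self, texts):
        return [self.embed_query(t) for t in texts]

emb = ToyEmbeddings()
print(len(emb.embed_query("hello")))         # 4
print(len(emb.embed_documents(["a", "b"])))  # 2
```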

## Vectorstores
These are datastores that store embeddings of documents in vector form.
They expose a method for passing in a string and finding similar documents.
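A minimal sketch of that idea, using an invented bag-of-words "embedding" and cosine similarity (real vectorstores such as Chroma or FAISS use model embeddings and approximate nearest-neighbor search):

```python
import math

def embed(text, vocab=("cat", "dog", "fish")):
    # Toy bag-of-words "embedding" over a fixed vocabulary.
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """Stores (embedding, text) pairs and returns the texts whose
    embeddings are closest to the query embedding."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.entries = []

    def add_texts(self, texts):
        self.entries += [(self.embed_fn(t), t) for t in texts]

    def similarity_search(self, query, k=1):
        q = self.embed_fn(query)
        ranked = sorted(self.entries, key=lambda e: cosine(e[0], q), reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore(embed)
store.add_texts(["the cat sat", "a dog barked", "fish swim"])
print(store.similarity_search("my dog is loud"))  # ['a dog barked']
```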

## CombineDocuments Chains
These are a subset of chains designed to work with documents. There are two pieces to consider:

1. The underlying chain method (e.g., how the documents are combined)
2. Use cases for these types of chains

For the first, please see [this documentation](combine_docs.md) for more detailed information on the types of chains LangChain supports.
For the second, please see the Use Cases section for more information on [question answering](/use_cases/question_answering.md),
[question answering with sources](/use_cases/qa_with_sources.md), and [summarization](/use_cases/summarization.md).
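The simplest combine method, "stuff" (the `chain_type` used in the getting-started notebook), just pastes every document into one prompt. A minimal sketch — the prompt wording here is hypothetical, not LangChain's actual template:

```python
def stuff_documents(docs, question):
    """'Stuff' combine method: concatenate all documents into a single
    prompt. Only viable while the combined text fits in the model's
    context window; other methods process documents in pieces."""
    context = "\n\n".join(docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = stuff_documents(["Doc one.", "Doc two."], "What do the docs say?")
print(prompt.endswith("Answer:"))  # True
```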
@@ -0,0 +1,108 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "683953b3",
   "metadata": {},
   "source": [
    "# Milvus\n",
    "\n",
    "This notebook shows how to use functionality related to the Milvus vector database.\n",
    "\n",
    "To run, you should have a Milvus instance up and running: https://milvus.io/docs/install_standalone-docker.md"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "aac9563e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores import Milvus\n",
    "from langchain.document_loaders import TextLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "a3c3999a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "loader = TextLoader('../../state_of_the_union.txt')\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dcf88bdf",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_db = Milvus.from_documents(\n",
    "    docs,\n",
    "    embeddings,\n",
    "    connection_args={\"host\": \"127.0.0.1\", \"port\": \"19530\"},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8c513ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "docs = vector_db.similarity_search(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc516993",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a359ed74",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -0,0 +1,105 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "683953b3",
   "metadata": {},
   "source": [
    "# Pinecone\n",
    "\n",
    "This notebook shows how to use functionality related to the Pinecone vector database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "aac9563e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores import Pinecone\n",
    "from langchain.document_loaders import TextLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "a3c3999a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "loader = TextLoader('../../state_of_the_union.txt')\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e104aee",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pinecone\n",
    "\n",
    "# initialize pinecone\n",
    "pinecone.init(\n",
    "    api_key=\"YOUR_API_KEY\",  # find at app.pinecone.io\n",
    "    environment=\"YOUR_ENV\"  # next to api key in console\n",
    ")\n",
    "\n",
    "index_name = \"langchain-demo\"\n",
    "\n",
    "docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)\n",
    "\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "docs = docsearch.similarity_search(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c608226",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(docs[0].page_content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a359ed74",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -0,0 +1,105 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "683953b3",
   "metadata": {},
   "source": [
    "# Qdrant\n",
    "\n",
    "This notebook shows how to use functionality related to the Qdrant vector database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "aac9563e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores import Qdrant\n",
    "from langchain.document_loaders import TextLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "a3c3999a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "loader = TextLoader('../../state_of_the_union.txt')\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dcf88bdf",
   "metadata": {},
   "outputs": [],
   "source": [
    "host = \"<---host name here --->\"\n",
    "api_key = \"<---api key here--->\"\n",
    "qdrant = Qdrant.from_documents(docs, embeddings, host=host, prefer_grpc=True, api_key=api_key)\n",
    "query = \"What did the president say about Ketanji Brown Jackson\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8c513ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = qdrant.similarity_search(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc516993",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a359ed74",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -0,0 +1,163 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "683953b3",
   "metadata": {},
   "source": [
    "# Weaviate\n",
    "\n",
    "This notebook shows how to use functionality related to the Weaviate vector database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "aac9563e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores import Weaviate\n",
    "from langchain.document_loaders import TextLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "a3c3999a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "loader = TextLoader('../../state_of_the_union.txt')\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5888dcc7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import weaviate\n",
    "import os\n",
    "\n",
    "WEAVIATE_URL = \"\"\n",
    "client = weaviate.Client(\n",
    "    url=WEAVIATE_URL,\n",
    "    additional_headers={\n",
    "        'X-OpenAI-Api-Key': os.environ[\"OPENAI_API_KEY\"]\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f004e8ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "client.schema.delete_all()\n",
    "client.schema.get()\n",
    "schema = {\n",
    "    \"classes\": [\n",
    "        {\n",
    "            \"class\": \"Paragraph\",\n",
    "            \"description\": \"A written paragraph\",\n",
    "            \"vectorizer\": \"text2vec-openai\",\n",
    "            \"moduleConfig\": {\n",
    "                \"text2vec-openai\": {\n",
    "                    \"model\": \"babbage\",\n",
    "                    \"type\": \"text\"\n",
    "                }\n",
    "            },\n",
    "            \"properties\": [\n",
    "                {\n",
    "                    \"dataType\": [\"text\"],\n",
    "                    \"description\": \"The content of the paragraph\",\n",
    "                    \"moduleConfig\": {\n",
    "                        \"text2vec-openai\": {\n",
    "                            \"skip\": False,\n",
    "                            \"vectorizePropertyName\": False\n",
    "                        }\n",
    "                    },\n",
    "                    \"name\": \"content\",\n",
    "                },\n",
    "            ],\n",
    "        },\n",
    "    ]\n",
    "}\n",
    "\n",
    "client.schema.create(schema)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef6d5d04",
   "metadata": {},
   "outputs": [],
   "source": [
    "vectorstore = Weaviate(client, \"Paragraph\", \"content\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06e8c1ed",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "docs = vectorstore.similarity_search(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38b86be6",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(docs[0].page_content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a359ed74",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -1,21 +0,0 @@
Utilities for working with Documents
====================================

There are a lot of different utilities that LangChain provides integrations for
These guides go over how to use them.
The utilities here are all utilities that make it easier to work with documents.

`Text Splitters <./combine_docs_examples/textsplitter.html>`_: A walkthrough of how to split large documents up into smaller, more manageable pieces of text.

`VectorStores <./combine_docs_examples/vectorstores.html>`_: A walkthrough of vectorstore functionalities, and different types of vectorstores, that LangChain supports.

`Embeddings <./combine_docs_examples/embeddings.html>`_: A walkthrough of embedding functionalities, and different types of embeddings, that LangChain supports.

`HyDE <./combine_docs_examples/hyde.html>`_: How to use Hypothetical Document Embeddings, a novel way of constructing embeddings for document retrieval systems.

.. toctree::
   :maxdepth: 1
   :glob:
   :hidden:

   combine_docs_examples/*
@@ -1,30 +0,0 @@
Generic Utilities
=================

There are a lot of different utilities that LangChain provides integrations for
These guides go over how to use them.
The utilities listed here are all generic utilities.

`Bash <./examples/bash.html>`_: How to use a bash wrapper to execute bash commands.

`Python REPL <./examples/python.html>`_: How to use a Python wrapper to execute python commands.

`Requests <./examples/requests.html>`_: How to use a requests wrapper to interact with the web.

`Google Search <./examples/google_search.html>`_: How to use the google search wrapper to search the web.

`SerpAPI <./examples/serpapi.html>`_: How to use the SerpAPI wrapper to search the web.

`SearxNG Search API <./examples/searx_search.html>`_: How to use the SearxNG meta search wrapper to search the web.

`Bing Search <./examples/bing_search.html>`_: How to use the Bing search wrapper to search the web.

`Wolfram Alpha <./examples/wolfram_alpha.html>`_: How to use the Wolfram Alpha wrapper to interact with Wolfram Alpha.


.. toctree::
   :maxdepth: 1
   :glob:
   :hidden:

   ./examples/*
@@ -1,17 +1,30 @@
-How-To Guides
-=============
+Generic Utilities
+=================

 There are a lot of different utilities that LangChain provides integrations for
 These guides go over how to use them.
-These can largely be grouped into two categories:
-
-1. `Generic Utilities <./generic_how_to.html>`_: Generic utilities, including search, python REPLs, etc.
-2. `Utilities for working with Documents <./combine_docs_how_to.html>`_: Utilities aimed at making it easy to work with documents (text splitting, embeddings, vectorstores, etc).
+The utilities listed here are all generic utilities.
+
+`Bash <./examples/bash.html>`_: How to use a bash wrapper to execute bash commands.
+
+`Python REPL <./examples/python.html>`_: How to use a Python wrapper to execute python commands.
+
+`Requests <./examples/requests.html>`_: How to use a requests wrapper to interact with the web.
+
+`Google Search <./examples/google_search.html>`_: How to use the google search wrapper to search the web.
+
+`SerpAPI <./examples/serpapi.html>`_: How to use the SerpAPI wrapper to search the web.
+
+`SearxNG Search API <./examples/searx_search.html>`_: How to use the SearxNG meta search wrapper to search the web.
+
+`Bing Search <./examples/bing_search.html>`_: How to use the Bing search wrapper to search the web.
+
+`Wolfram Alpha <./examples/wolfram_alpha.html>`_: How to use the Wolfram Alpha wrapper to interact with Wolfram Alpha.

 .. toctree::
    :maxdepth: 1
    :glob:
    :hidden:

-   ./generic_how_to.rst
-   ./combine_docs_how_to.rst
+   ./examples/*