pull/21623/head
Bagatur 4 weeks ago
commit 38774add64

@ -7,16 +7,7 @@ This section contains introductions to key parts of LangChain.
## Architecture
LangChain as a framework consists of several pieces. The below diagram shows how they relate.
<ThemedImage
alt="Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers."
sources={{
light: useBaseUrl('/svg/langchain_stack.svg'),
dark: useBaseUrl('/svg/langchain_stack_dark.svg'),
}}
title="LangChain Framework Overview"
/>
The framework is split into a number of packages.
### `langchain-core`
This package contains base abstractions of different components and ways to compose them together.
@ -24,13 +15,6 @@ The interfaces for core components like LLMs, vectorstores, retrievers and more
No third party integrations are defined here.
The dependencies are kept purposefully very lightweight.
### `langchain-community`
This package contains third-party integrations that are maintained by the LangChain community.
Key partner packages are separated out (see below).
This contains all integrations for various components (LLMs, vectorstores, retrievers).
All dependencies in this package are optional to keep the package as lightweight as possible.
### Partner packages
While the long tail of integrations is in `langchain-community`, we split popular integrations into their own packages (e.g. `langchain-openai`, `langchain-anthropic`, etc.).
@ -42,14 +26,21 @@ The main `langchain` package contains chains, agents, and retrieval strategies t
These are NOT third-party integrations.
All chains, agents, and retrieval strategies here are NOT specific to any one integration, but rather generic across all integrations.
### [`langgraph`](/docs/langgraph)
`langgraph` is an extension of `langchain` aimed at
building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
LangGraph exposes high level interfaces for creating common types of agents, as well as a low-level API for constructing more contr
### [`langserve`](/docs/langserve)
A package to deploy LangChain chains as REST APIs. It makes it easy to get a production-ready API up and running.
@ -57,28 +48,18 @@ A package to deploy LangChain chains as REST APIs. Makes it easy to get a produc
A developer platform that lets you debug, test, evaluate, and monitor LLM applications.
## Installation
If you want to work with high level abstractions, you should install the `langchain` package.
```shell
pip install langchain
```
If you want to work with specific integrations, you will need to install them separately.
See [here](/docs/integrations/platforms/) for a list of integrations and how to install them.
For working with LangSmith, you will need to set up a LangSmith developer account [here](https://smith.langchain.com) and get an API key.
After that, you can enable it by setting environment variables:
```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=ls__...
```
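If you would rather configure tracing from Python (for example, inside a notebook), a minimal sketch is to set the same two variables in `os.environ` before running any chains. This assumes, as the shell example implies, that these two variables are all that is needed; the key string is a placeholder, not a real key format.

```python
import os

# Same variables as the shell example; set them before running any chains.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# Placeholder only: substitute the API key from your LangSmith account.
os.environ.setdefault("LANGCHAIN_API_KEY", "<your-langsmith-api-key>")
```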
## LangChain Expression Language (LCEL)
LangChain Expression Language, or LCEL, is a declarative way to chain LangChain components.
LCEL was designed from day 1 to **support putting prototypes in production, with no code changes**, from the simplest “prompt + LLM” chain to the most complex chains (we've seen folks successfully run LCEL chains with 100s of steps in production). To highlight a few of the reasons you might want to use LCEL:
**First-class streaming support**
@ -106,7 +87,7 @@ With LCEL, **all** steps are automatically logged to [LangSmith](/docs/langsmith
[**Seamless LangServe deployment**](/docs/langserve)
Any chain created with LCEL can be easily deployed using [LangServe](/docs/langserve).
### Runnable interface
To make it as easy as possible to create custom chains, we've implemented a ["Runnable"](https://api.python.langchain.com/en/stable/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable) protocol. Many LangChain components implement the `Runnable` protocol, including chat models, LLMs, output parsers, retrievers, prompt templates, and more. There are also several useful primitives for working with runnables, which you can read about below.
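To make the shape of the protocol concrete, here is a toy sketch of a runnable-like object with `invoke`, `batch`, and `stream`, plus `|` composition (LCEL composes runnables with the `|` operator). `ToyRunnable` and the three steps are hypothetical stand-ins, not LangChain's classes; the real `Runnable` linked above does considerably more (async variants, configuration, schemas, true streaming).

```python
from typing import Any, Callable, Iterator, List


class ToyRunnable:
    """Hypothetical stand-in for a Runnable: wraps a plain function."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        # Process a single input.
        return self.func(value)

    def batch(self, values: List[Any]) -> List[Any]:
        # Process a list of inputs (sequentially here).
        return [self.invoke(v) for v in values]

    def stream(self, value: Any) -> Iterator[Any]:
        # Simplest fallback: yield the whole result as a single chunk.
        yield self.invoke(value)

    def __or__(self, other: "ToyRunnable") -> "ToyRunnable":
        # `a | b` builds a new runnable that feeds a's output into b.
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


# Hypothetical "prompt -> model -> parser" chain built from plain functions.
prompt = ToyRunnable(lambda topic: f"Tell me a joke about {topic}")
model = ToyRunnable(lambda text: text.upper())  # fake "model"
parser = ToyRunnable(lambda text: text.strip())

chain = prompt | model | parser
print(chain.invoke("bears"))  # TELL ME A JOKE ABOUT BEARS
print(chain.batch(["cats", "dogs"]))
```

Because every step exposes the same three methods, the composed `chain` does too, which is the property that lets one chain be invoked, batched, or streamed without changes.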
@ -146,16 +127,6 @@ All runnables expose input and output **schemas** to inspect the inputs and outp
LangChain provides standard, extendable interfaces and external integrations for various components useful for building with LLMs.
Some components LangChain implements, some components we rely on third-party integrations for, and others are a mix.
### LLMs
Language models that take a string as input and return a string.
These are traditionally older models (newer models generally are `ChatModels`, see below).
Although the underlying models are string in, string out, the LangChain wrappers also allow these models to take messages as input.
This makes them interchangeable with ChatModels.
When messages are passed in as input, they will be formatted into a string under the hood before being passed to the underlying model.
LangChain does not provide any LLMs; rather, we rely on third-party integrations.
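To make the "messages in, string under the hood" behavior concrete, here is a minimal sketch of flattening a message list into one prompt string. The `format_messages_as_prompt` helper and the `System:`/`Human:`/`AI:` prefixes are assumptions for illustration; the exact format LangChain uses internally may differ.

```python
from typing import List, Tuple

# Hypothetical mapping from message role to the prefix used in the flattened prompt.
ROLE_PREFIXES = {"system": "System", "human": "Human", "ai": "AI"}


def format_messages_as_prompt(messages: List[Tuple[str, str]]) -> str:
    """Flatten (role, content) pairs into one string for a string-in, string-out model."""
    lines = []
    for role, content in messages:
        prefix = ROLE_PREFIXES.get(role)
        if prefix is None:
            raise ValueError(f"Unknown message role: {role!r}")
        lines.append(f"{prefix}: {content}")
    return "\n".join(lines)


prompt = format_messages_as_prompt([
    ("system", "You are a helpful assistant."),
    ("human", "What is 2 + 2?"),
])
print(prompt)
# System: You are a helpful assistant.
# Human: What is 2 + 2?
```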
### Chat models
Language models that use a sequence of messages as inputs and return chat messages as outputs (as opposed to using plain text).
These are traditionally newer models (older models are generally `LLMs`, see above).
@ -172,45 +143,17 @@ We have some standardized parameters when constructing ChatModels:
ChatModels also accept other parameters that are specific to that integration.
### Messages
Some language models take a list of messages as input and return a message.
There are a few different types of messages.
@ -338,7 +281,7 @@ prompt_template = ChatPromptTemplate.from_messages([
])
```
### Example selectors
One common prompting technique for achieving better performance is to include examples as part of the prompt.
This gives the language model concrete examples of how it should behave.
Sometimes these examples are hardcoded into the prompt, but for more advanced situations it may be nice to dynamically select them.
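One simple dynamic strategy is to select examples by length, so that a long user input leaves room for fewer examples. The sketch below is hypothetical: the example pool, the word-count budget, and `select_examples` are illustrative assumptions, not a LangChain API.

```python
from typing import Dict, List

# Hypothetical pool of few-shot examples (input -> output pairs).
EXAMPLES: List[Dict[str, str]] = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
]


def select_examples(
    examples: List[Dict[str, str]], user_input: str, max_words: int
) -> List[Dict[str, str]]:
    """Greedily keep examples while the prompt stays within a word budget.

    The budget counts the words of the user input plus each formatted example.
    """
    used = len(user_input.split())
    selected = []
    for example in examples:
        cost = len(f"Input: {example['input']}\nOutput: {example['output']}".split())
        if used + cost > max_words:
            break
        selected.append(example)
        used += cost
    return selected


print(len(select_examples(EXAMPLES, "big", max_words=9)))  # 2
```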
@ -389,7 +332,7 @@ LangChain has lots of different types of output parsers. This is a list of outpu
| [Datetime](https://api.python.langchain.com/en/latest/output_parsers/langchain.output_parsers.datetime.DatetimeOutputParser.html#langchain.output_parsers.datetime.DatetimeOutputParser) | | ✅ | | `str` \| `Message` | `datetime.datetime` | Parses response into a datetime string. |
| [Structured](https://api.python.langchain.com/en/latest/output_parsers/langchain.output_parsers.structured.StructuredOutputParser.html#langchain.output_parsers.structured.StructuredOutputParser) | | ✅ | | `str` \| `Message` | `Dict[str, str]` | An output parser that returns structured information. It is less powerful than other output parsers since it only allows for fields to be strings. This can be useful when you are working with smaller LLMs. |
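The Datetime parser in the table above takes `str` input and returns a `datetime.datetime`. A minimal sketch of that idea follows; the `parse_datetime_output` name and the assumed ISO-8601-style format string are illustrative, not the library's implementation.

```python
from datetime import datetime

# Assumed format for illustration: UTC timestamp with microseconds.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime_output(text: str, fmt: str = DATETIME_FORMAT) -> datetime:
    """Parse a model's raw text response into a datetime, failing loudly on bad output."""
    cleaned = text.strip()  # models often add surrounding whitespace or newlines
    try:
        return datetime.strptime(cleaned, fmt)
    except ValueError as exc:
        raise ValueError(f"Could not parse {cleaned!r} with format {fmt!r}") from exc


parsed = parse_datetime_output("  2023-07-04T12:30:00.000000Z\n")
print(parsed)  # 2023-07-04 12:30:00
```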
### Chat history
Most LLM applications have a conversational interface.
An essential component of a conversation is being able to refer to information introduced earlier in the conversation.
At bare minimum, a conversational system should be able to access some window of past messages directly.
@ -398,7 +341,7 @@ The concept of `ChatHistory` refers to a class in LangChain which can be used to
This `ChatHistory` will keep track of inputs and outputs of the underlying chain, and append them as messages to a message database.
Future interactions will then load those messages and pass them into the chain as part of the input.
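The load-then-append loop described above can be sketched in a few lines. `InMemoryHistory`, `run_with_history`, and `echo_chain` are hypothetical names; a real application would use one of LangChain's history classes and a real chain instead.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


class InMemoryHistory:
    """Hypothetical message store: one list of (role, content) pairs per session."""

    def __init__(self) -> None:
        self._sessions: Dict[str, List[Message]] = {}

    def load(self, session_id: str) -> List[Message]:
        # Return a copy so callers cannot mutate the stored history by accident.
        return list(self._sessions.get(session_id, []))

    def append(self, session_id: str, role: str, content: str) -> None:
        self._sessions.setdefault(session_id, []).append((role, content))


def run_with_history(
    chain: Callable[[List[Message], str], str],
    history: InMemoryHistory,
    session_id: str,
    user_input: str,
) -> str:
    # 1. Load earlier messages and pass them to the chain with the new input.
    output = chain(history.load(session_id), user_input)
    # 2. Record this turn so the next call in the same session can see it.
    history.append(session_id, "human", user_input)
    history.append(session_id, "ai", output)
    return output


def echo_chain(past: List[Message], user_input: str) -> str:
    # Hypothetical "chain" that only reports how much history it received.
    return f"{user_input} (saw {len(past)} earlier messages)"


history = InMemoryHistory()
print(run_with_history(echo_chain, history, "s1", "hi"))     # hi (saw 0 earlier messages)
print(run_with_history(echo_chain, history, "s1", "again"))  # again (saw 2 earlier messages)
```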
### Documents
A Document object in LangChain contains information about some data. It has two attributes:
@ -445,12 +388,12 @@ Embeddings create a vector representation of a piece of text. This is useful bec
The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself).
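The two-method interface can be sketched with a toy embedder. The method names mirror LangChain's `embed_documents` and `embed_query`, but `ToyHashEmbeddings` and its word-hashing scheme are hypothetical stand-ins for a real embedding model.

```python
import hashlib
import math
from typing import List


class ToyHashEmbeddings:
    """Hypothetical embedder: hashes each lowercase word into one of `dim` buckets."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
        return [x / norm for x in vec]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Many texts in, one vector per text out.
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        # One query in, one vector out. Identical to document embedding here;
        # some providers embed queries differently, which is why the methods are separate.
        return self._embed(text)
```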
### Vector stores
One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors,
and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query.
A vector store takes care of storing embedded data and performing vector search for you.
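The embed-store-search cycle just described can be sketched with an in-memory store. `ToyVectorStore` is hypothetical (its `add_texts` and `similarity_search` methods are named after common vector store operations), and word counts stand in for a real embedding model.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Hypothetical sparse 'embedding': lowercase word counts."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs, searches by cosine similarity."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Dict[str, float]]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Embed once at write time and store the vector next to the text.
        self._items.extend((t, embed(t)) for t in texts)

    def similarity_search(self, query: str, k: int = 2) -> List[str]:
        # Embed the query, score every stored vector, return the top-k texts.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts(["cats purr", "dogs bark", "cats and dogs play"])
print(store.similarity_search("why do cats purr", k=1))  # ['cats purr']
```

A real store replaces the linear scan with an approximate nearest-neighbour index, but the interface idea is the same.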
Vector stores can be converted to the retriever interface by doing:
```python
vectorstore = MyVectorStore()
@ -465,31 +408,6 @@ Retrievers can be created from vectorstores, but are also broad enough to includ
Retrievers accept a string query as input and return a list of `Document`s as output.
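Because the contract is only "string in, list of documents out", a retriever does not need a vector store at all. The sketch below ranks documents by shared words; `Doc` and `KeywordRetriever` are hypothetical (the `page_content` and `metadata` field names follow LangChain's `Document`).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class KeywordRetriever:
    """Hypothetical retriever with no vector store: ranks documents by shared words."""

    def __init__(self, docs: List[Doc], k: int = 2) -> None:
        self.docs = docs
        self.k = k

    def invoke(self, query: str) -> List[Doc]:
        # String in, list of documents out.
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.page_content.lower().split())), d) for d in self.docs]
        scored = [pair for pair in scored if pair[0] > 0]  # drop documents with no overlap
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored[: self.k]]


docs = [
    Doc("LangChain retrievers return documents", {"source": "a"}),
    Doc("Vector stores hold embeddings", {"source": "b"}),
    Doc("Unrelated text", {"source": "c"}),
]
retriever = KeywordRetriever(docs, k=2)
print([d.metadata["source"] for d in retriever.invoke("what do retrievers return")])  # ['a']
```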
### Tools
Tools are interfaces that an agent, chain, or LLM can use to interact with the world.
They combine a few things:
@ -541,3 +459,94 @@ In order to solve that we built LangGraph to be this flexible, highly-controllab
If you are still using AgentExecutor, do not fear: we still have a guide on [how to use AgentExecutor](/docs/how_to/agent_executor).
It is recommended, however, that you start to transition to LangGraph.
To assist with this, we have put together a [transition guide on how to do so](/docs/how_to/migrate_agent).
## Techniques
### Function/tool calling
:::info
We use the term tool calling interchangeably with function calling. Although
function calling is sometimes meant to refer to invocations of a single function,
we treat all models as though they can return multiple tool or function calls in
each message.
:::
Tool calling allows a model to respond to a given prompt by generating output that
matches a user-defined schema. While the name implies that the model is performing
some action, this is not the case: the model only generates the arguments to a tool,
and whether to actually run the tool is up to the user. For example, if you want to
[extract output matching some schema](/docs/tutorials/extraction) from unstructured
text, you could give the model an "extraction" tool that takes parameters matching
the desired schema, then treat the generated output as your final result.
A tool call includes a name, an arguments dict, and an optional identifier. The
arguments dict is structured `{argument_name: argument_value}`.
Many LLM providers, including [Anthropic](https://www.anthropic.com/),
[Cohere](https://cohere.com/), [Google](https://cloud.google.com/vertex-ai),
[Mistral](https://mistral.ai/), [OpenAI](https://openai.com/), and others,
support variants of a tool calling feature. These features typically allow requests
to the LLM to include available tools and their schemas, and for responses to include
calls to these tools. For instance, given a search engine tool, an LLM might handle a
query by first issuing a call to the search engine. The system calling the LLM can
receive the tool call, execute it, and return the output to the LLM to inform its
response. LangChain includes a suite of [built-in tools](/docs/integrations/tools/)
and supports several methods for defining your own [custom tools](/docs/how_to/custom_tools).
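The "receive the tool call, execute it, return the output" step can be sketched as a small dispatcher. The tool registry, the `search` stand-in, and the exact dict keys (`name`, `args`, `id`, `tool_call_id`, `content`) are assumptions for illustration; the point is that the model only proposes the call and your code decides whether and how to run it.

```python
from typing import Any, Callable, Dict, List


def search(query: str) -> str:
    # Hypothetical stand-in for a real search engine tool.
    return f"Top result for {query!r}"


# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: Dict[str, Callable[..., Any]] = {"search": search}


def execute_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Run each tool call the model proposed and package the results for the model."""
    results = []
    for call in tool_calls:
        func = TOOLS.get(call["name"])
        if func is None:
            output = f"Error: unknown tool {call['name']!r}"
        else:
            output = func(**call["args"])  # args is {argument_name: argument_value}
        # Echo the optional identifier so the model can match output to call.
        results.append({"tool_call_id": call.get("id"), "content": output})
    return results


# A tool call as described above: a name, an arguments dict, and an optional id.
proposed = [{"name": "search", "args": {"query": "LangChain"}, "id": "call_1"}]
print(execute_tool_calls(proposed))
```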
There are two main use cases for function/tool calling:
- [How to return structured data from an LLM](/docs/how_to/structured_output/)
- [How to use a model to call tools](/docs/how_to/tool_calling/)
### Retrieval
LangChain provides several advanced retrieval types. A full list is below, along with the following information:
- **Name**: Name of the retrieval algorithm.
- **Index Type**: Which index type (if any) this relies on.
- **Uses an LLM**: Whether this retrieval method uses an LLM.
- **When to Use**: Our commentary on when you should consider using this retrieval method.
- **Description**: Description of what this retrieval algorithm is doing.
| Name | Index Type | Uses an LLM | When to Use | Description |
|---------------------------|------------------------------|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Vectorstore](/docs/how_to/vectorstore_retriever/) | Vectorstore | No | If you are just getting started and looking for something quick and easy. | This is the simplest method and the one that is easiest to get started with. It involves creating embeddings for each piece of text. |
| [ParentDocument](/docs/how_to/parent_document_retriever/) | Vectorstore + Document Store | No | If your pages have lots of smaller pieces of distinct information that are best indexed by themselves, but best retrieved all together. | This involves indexing multiple chunks for each document. Then you find the chunks that are most similar in embedding space, but you retrieve the whole parent document and return that (rather than individual chunks). |
| [Multi Vector](/docs/how_to/multi_vector/) | Vectorstore + Document Store | Sometimes during indexing | If you are able to extract information from documents that you think is more relevant to index than the text itself. | This involves creating multiple vectors for each document. Each vector could be created in a myriad of ways - examples include summaries of the text and hypothetical questions. |
| [Self Query](/docs/how_to/self_query/) | Vectorstore | Yes | If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text. | This uses an LLM to transform user input into two things: (1) a string to look up semantically, (2) a metadata filter to go along with it. This is useful because oftentimes questions are about the METADATA of documents (not the content itself). |
| [Contextual Compression](/docs/how_to/contextual_compression/) | Any | Sometimes | If you are finding that your retrieved documents contain too much irrelevant information and are distracting the LLM. | This puts a post-processing step on top of another retriever and extracts only the most relevant information from retrieved documents. This can be done with embeddings or an LLM. |
| [Time-Weighted Vectorstore](/docs/how_to/time_weighted_vectorstore/) | Vectorstore | No | If you have timestamps associated with your documents, and you want to retrieve the most recent ones | This fetches documents based on a combination of semantic similarity (as in normal vector retrieval) and recency (looking at timestamps of indexed documents) |
| [Multi-Query Retriever](/docs/how_to/MultiQueryRetriever/) | Any | Yes | If users are asking questions that are complex and require multiple pieces of distinct information to respond | This uses an LLM to generate multiple queries from the original one. This is useful when the original query needs pieces of information about multiple topics to be properly answered. By generating multiple queries, we can then fetch documents for each of them. |
| [Ensemble](/docs/how_to/ensemble_retriever/) | Any | No | If you have multiple retrieval methods and want to try combining them. | This fetches documents from multiple retrievers and then combines them. |
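The Ensemble row does not say how results from several retrievers are combined. One common choice, and the one this sketch assumes, is (weighted) Reciprocal Rank Fusion: each document scores the sum of `weight / (k + rank)` over the ranked lists it appears in. The function name and the `k=60` default are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[List[str]],
    weights: Optional[Sequence[float]] = None,
    k: int = 60,
) -> List[str]:
    """Merge several ranked lists of document ids into one ranking (ranks start at 1)."""
    if weights is None:
        weights = [1.0] * len(rankings)
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


keyword_hits = ["a", "b", "c"]  # hypothetical output of a keyword retriever
vector_hits = ["b", "d", "a"]   # hypothetical output of a vector store retriever
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # ['b', 'a', 'd', 'c']
```

Documents that rank well in both lists ("b", "a") rise to the top even though neither retriever agrees on the exact order.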
### Text splitting
LangChain offers many different types of `text splitters`.
These all live in the `langchain-text-splitters` package.
Table columns:
- **Name**: Name of the text splitter
- **Classes**: Classes that implement this text splitter
- **Splits On**: How this text splitter splits text
- **Adds Metadata**: Whether or not this text splitter adds metadata about where each chunk came from.
- **Description**: Description of the splitter, including recommendations on when to use it.
| Name | Classes | Splits On | Adds Metadata | Description |
|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Recursive | [RecursiveCharacterTextSplitter](/docs/how_to/recursive_text_splitter/), [RecursiveJsonSplitter](/docs/how_to/recursive_json_splitter/) | A list of user-defined characters | | Recursively splits text. This splitter tries to keep related pieces of text next to each other. This is the recommended way to start splitting text. |
| HTML | [HTMLHeaderTextSplitter](/docs/how_to/HTML_header_metadata_splitter/), [HTMLSectionSplitter](/docs/how_to/HTML_section_aware_splitter/) | HTML specific characters | ✅ | Splits text based on HTML-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the HTML) |
| Markdown | [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter/) | Markdown specific characters | ✅ | Splits text based on Markdown-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the Markdown) |
| Code | [many languages](/docs/how_to/code_splitter/) | Code (Python, JS) specific characters | | Splits text based on characters specific to coding languages. 15 different languages are available to choose from. |
| Token | [many classes](/docs/how_to/split_by_token/) | Tokens | | Splits text on tokens. There exist a few different ways to measure tokens. |
| Character | [CharacterTextSplitter](/docs/how_to/character_text_splitter/) | A user defined character | | Splits text based on a user defined character. One of the simpler methods. |
| Semantic Chunker (Experimental) | [SemanticChunker](/docs/how_to/semantic-chunker/) | Sentences | | First splits on sentences, then combines adjacent sentences if they are semantically similar enough. Taken from [Greg Kamradt](https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb) |
| Integration: AI21 Semantic | [AI21SemanticTextSplitter](/docs/integrations/document_transformers/ai21_semantic_text_splitter/) | | ✅ | Identifies distinct topics that form coherent pieces of text and splits along those. |
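The idea behind the Recursive row can be sketched as follows: try the coarsest separator first, merge neighbouring pieces while they fit, and re-split only the pieces that are still too long with finer separators. This is a simplified, hypothetical `recursive_split`, not `RecursiveCharacterTextSplitter` itself; the separator list is assumed, and it drops separators at chunk boundaries and has no chunk overlap.

```python
from typing import List, Optional

# Assumed separator order: paragraphs, then lines, then words, then characters.
DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""]


def recursive_split(text: str, chunk_size: int, separators: Optional[List[str]] = None) -> List[str]:
    """Split text into chunks of at most chunk_size characters."""
    if separators is None:
        separators = DEFAULT_SEPARATORS
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in (p for p in text.split(sep) if p):
        if len(piece) > chunk_size:
            # Flush the pending chunk, then re-split the oversized piece more finely.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep neighbouring pieces together while they fit
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "para one here\n\npara two is longer than ten"
print(recursive_split(text, chunk_size=15))
# ['para one here', 'para two is', 'longer than ten']
```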

@ -3,7 +3,7 @@ sidebar_position: 0
sidebar_class_name: hidden
---
# How-to guides
Here you'll find answers to “How do I...?” types of questions.
These guides are *goal-oriented* and *concrete*; they're meant to help you complete a specific task.
@ -15,9 +15,9 @@ For comprehensive descriptions of every class and function see [API Reference](h
This highlights functionality that is core to using LangChain.
- [How to: return structured data from an LLM](/docs/how_to/structured_output/)
- [How to: use a chat model to call tools](/docs/how_to/tool_calling/)
- [How to: stream runnables](/docs/how_to/streaming)
- [How to: debug your LLM apps](/docs/how_to/debugging/)
## LangChain Expression Language (LCEL)
@ -41,7 +41,7 @@ LangChain Expression Language is a way to create arbitrary custom chains. It is
These are the core building blocks you can use when building applications.
### Prompt templates
Prompt Templates are responsible for formatting user input into a format that can be passed to a language model.
@ -50,7 +50,7 @@ Prompt Templates are responsible for formatting user input into a format that ca
- [How to: partially format prompt templates](/docs/how_to/prompts_partial)
- [How to: compose prompts together](/docs/how_to/prompts_composition)
### Example selectors
Example Selectors are responsible for selecting the correct few-shot examples to pass to the prompt.
@ -60,7 +60,7 @@ Example Selectors are responsible for selecting the correct few shot examples to
- [How to: select examples by semantic ngram overlap](/docs/how_to/example_selectors_ngram)
- [How to: select examples by maximal marginal relevance](/docs/how_to/example_selectors_mmr)
### Chat models
Chat Models are newer forms of language models that take messages in and output a message.
@ -83,7 +83,7 @@ What LangChain calls LLMs are older forms of language models that take a string
- [How to: track token usage](/docs/how_to/llm_token_usage_tracking)
- [How to: work with local LLMs](/docs/how_to/local_llms)
### Output parsers
Output Parsers are responsible for taking the output of an LLM and parsing it into a more structured format.
@ -95,7 +95,7 @@ Output Parsers are responsible for taking the output of an LLM and parsing into
- [How to: try to fix errors in output parsing](/docs/how_to/output_parser_fixing)
- [How to: write a custom output parser class](/docs/how_to/output_parser_custom)
### Document loaders
Document Loaders are responsible for loading documents from a variety of sources.
@ -108,7 +108,7 @@ Document Loaders are responsible for loading documents from a variety of sources
- [How to: load PDF files](/docs/how_to/document_loader_pdf)
- [How to: write a custom document loader](/docs/how_to/document_loader_custom)
### Text splitters
Text Splitters take a document and split it into chunks that can be used for retrieval.
@ -122,16 +122,16 @@ Text Splitters take a document and split into chunks that can be used for retrie
- [How to: split text into semantic chunks](/docs/how_to/semantic-chunker)
- [How to: split by tokens](/docs/how_to/split_by_token)
### Embedding models
Embedding Models take a piece of text and create a numerical representation of it.
- [How to: embed text data](/docs/how_to/embed_text)
- [How to: cache embedding results](/docs/how_to/caching_embeddings)
### Vector stores
Vector stores are databases that can efficiently store and retrieve embeddings.
- [How to: use a vector store to retrieve data](/docs/how_to/vectorstores)
@ -194,7 +194,8 @@ All of LangChain components can easily be extended to support your own versions.
- [How to: write a custom output parser class](/docs/how_to/output_parser_custom)
- [How to: define a custom tool](/docs/how_to/custom_tools)
## Use cases
These guides cover use-case specific details.
@ -225,7 +226,7 @@ Chatbots involve using an LLM to have a conversation.
- [How to: do retrieval](/docs/how_to/chatbots_retrieval)
- [How to: use tools](/docs/how_to/chatbots_tools)
### Query Analysis
### Query analysis
Query Analysis is the task of using an LLM to generate a query to send to a retriever.
@ -245,7 +246,7 @@ You can use LLMs to do question answering over tabular data.
- [How to: deal with large databases](/docs/how_to/sql_large_db)
- [How to: deal with CSV files](/docs/how_to/sql_csv)
### Q&A over Graph Databases
### Q&A over graph databases
You can use an LLM to do question answering over graph databases.

@ -54,13 +54,13 @@ These are the best ones to get started with:
Explore the full list of tutorials [here](/docs/tutorials).
## [How-To Guides](/docs/how_to)
## [How-to guides](/docs/how_to)
[Here](/docs/how_to) you'll find short answers to “How do I…?” types of questions.
These how-to guides don't cover topics in depth; you'll find that material in the [Tutorials](/docs/tutorials) and the [API Reference](https://api.python.langchain.com/en/latest/).
However, these guides will help you quickly accomplish common tasks.
## [Conceptual Guide](/docs/concepts)
## [Conceptual guide](/docs/concepts)
Introductions to all the key parts of LangChain you'll need to know! [Here](/docs/concepts) you'll find high-level explanations of all LangChain concepts.

@ -2,7 +2,7 @@
LangChain has a large ecosystem of integrations with various external resources like local and remote file systems, APIs and databases. These integrations allow developers to create versatile applications that combine the power of LLMs with the ability to access, interact with and manipulate external resources.
## Best Practices
## Best practices
When building such applications, developers should remember to follow good security practices:
@ -25,6 +25,6 @@ If you're building applications that access external resources like file systems
or databases, consider speaking with your company's security team to determine how to best
design and secure your applications.
## Reporting a Vulnerability
## Reporting a vulnerability
Please report security vulnerabilities by email to security@langchain.dev. This will ensure the issue is promptly triaged and acted upon as needed.

@ -3,7 +3,7 @@ sidebar_position: 0
sidebar_label: Overview
---
# LangChain Over Time
# LangChain over time
## What's new in LangChain?
@ -45,7 +45,7 @@ This document serves to outline at a high level what has changed and why.
- `langchain` was split into the following component packages: `langchain-core`, `langchain`, `langchain-community`, `langchain-[partner]` to improve the usability of langchain code in production settings. You can read more about it on our [blog](https://blog.langchain.dev/langchain-v0-1-0/).
### Ecosystem Organization
### Ecosystem organization
By the release of 0.1.0, LangChain had grown to a large ecosystem with many integrations and a large community.

@ -3,7 +3,7 @@ sidebar_position: 3
sidebar_label: Packages
---
# 📕 Package Versioning
# 📕 Package versioning
As of now, LangChain has an ad hoc release process: releases are cut with high frequency by
a maintainer and published to [PyPI](https://pypi.org/).

@ -3,7 +3,7 @@ sidebar_position: 2
sidebar_label: Release Policy
---
# LangChain Releases
# LangChain releases
The LangChain ecosystem is composed of different component packages (e.g., `langchain-core`, `langchain`, `langchain-community`, `langgraph`, `langserve`, partner packages, etc.).
@ -32,13 +32,13 @@ From time to time, we will version packages as **release candidates**. These are
Other packages in the ecosystem (including user packages) can follow a different versioning scheme, but are generally expected to pin to specific minor versions of `langchain` and `langchain-core`.
## Release Cadence
## Release cadence
We expect to space out **minor** releases (e.g., from 0.2.0 to 0.3.0) of `langchain` and `langchain-core` by at least 2-3 months, as such releases may contain breaking changes.
Patch versions are released frequently as they contain bug fixes and new features.
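Pinning to a minor version is normally written as a dependency specifier; for example, the PEP 440 compatible-release clause `langchain-core~=0.2.0` admits 0.2.x releases at or above 0.2.0 and excludes 0.3.0. The sketch below spells out the same "same minor series" rule in plain Python. `parse_version` and `in_pinned_minor_series` are hypothetical helpers; real tooling should use the `packaging` library, which also understands pre-releases such as release candidates.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain "MAJOR.MINOR.PATCH" string; tags such as "rc1" are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def in_pinned_minor_series(installed: str, pinned: str) -> bool:
    """True if `installed` belongs to the pinned "MAJOR.MINOR" series, e.g. 0.2.x for "0.2"."""
    want_major, want_minor = (int(part) for part in pinned.split("."))
    major, minor, _patch = parse_version(installed)
    return (major, minor) == (want_major, want_minor)


print(in_pinned_minor_series("0.2.5", "0.2"))  # True: patch release in the same minor series
print(in_pinned_minor_series("0.3.0", "0.2"))  # False: new minor release, may contain breaking changes
```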
## API Stability
## API stability
The development of LLM applications is a rapidly evolving field, and we are constantly learning from our users and the community. As such, we expect that the APIs in `langchain` and `langchain-core` will continue to evolve to better serve the needs of our users.
@ -49,14 +49,14 @@ Even though both `langchain` and `langchain-core` are currently in a pre-1.0 sta
We will generally try to avoid making unnecessary changes, and will provide a deprecation policy for features that are being removed.
### Stability of Other Packages
### Stability of other packages
The stability of other packages in the LangChain ecosystem may vary:
- `langchain-community` is a community-maintained package that contains third-party integrations. While we do our best to review and test changes in `langchain-community`, `langchain-community` is expected to experience more breaking changes than `langchain` and `langchain-core` as it contains many community contributions.
- Partner packages may follow different stability and versioning policies, and users should refer to the documentation of those packages for more information; however, in general these packages are expected to be stable.
### What is a "API Stability"?
### What is "API stability"?
API stability means:
@ -72,7 +72,7 @@ Certain APIs are explicitly marked as “internal” in a couple of ways:
- Functions, methods, and other objects prefixed by a leading underscore (**`_`**). This is the standard Python convention of indicating that something is private; if any method starts with a single **`_`**, it's an internal API.
    - **Exception:** Certain methods are prefixed with `_`, but do not contain an implementation. These methods are *meant* to be overridden by sub-classes that provide the implementation. Such methods are generally part of the **Public API** of LangChain.
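The difference between the two kinds of underscore-prefixed methods can be shown with a small, hypothetical class pair (all names invented for illustration). `_decorate` is an implemented private helper that callers should not rely on, while `_generate` has no implementation in the base class and exists to be overridden; custom LangChain chat models, for instance, are written by implementing a `_generate` method in this spirit.

```python
from abc import ABC, abstractmethod


class BaseGreeter(ABC):
    """Hypothetical base class illustrating underscore naming conventions."""

    def greet(self, name: str) -> str:
        # Public API: callers use this method.
        return self._decorate(self._generate(name))

    @abstractmethod
    def _generate(self, name: str) -> str:
        # Underscore-prefixed but unimplemented: subclasses are meant to override it,
        # so it is effectively part of the public (subclassing) API.
        ...

    def _decorate(self, text: str) -> str:
        # Implemented private helper: internal, may change without notice.
        return text + "!"


class EnglishGreeter(BaseGreeter):
    def _generate(self, name: str) -> str:
        return f"Hello, {name}"


print(EnglishGreeter().greet("Ada"))  # Hello, Ada!
```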
## Deprecation Policy
## Deprecation policy
We will generally avoid deprecating features until a better alternative is available.

@ -41,7 +41,7 @@ Here is an example of the import changes that the migration script can help appl
| langchain | langchain-text-splitters | from langchain.text_splitter import RecursiveCharacterTextSplitter | from langchain_text_splitters import RecursiveCharacterTextSplitter |
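The table row above is a module move: the class keeps its name and only the import path changes. As a toy illustration of that kind of rewrite (it does not reflect how `langchain-cli migrate` is implemented), a hypothetical line-based rewriter could look like this, with its rule table seeded from that single row:

```python
import re
from typing import Dict

# Hypothetical rule table, seeded with the single move from the table row above;
# the real migration script maintains its own, much larger, set of rules.
IMPORT_MOVES: Dict[str, str] = {
    "langchain.text_splitter": "langchain_text_splitters",
}


def rewrite_imports(source: str, moves: Dict[str, str] = IMPORT_MOVES) -> str:
    """Rewrite `from OLD import ...` lines to `from NEW import ...` (toy, line-based)."""
    for old, new in moves.items():
        # Anchor at line start (keeping indentation) and require " import " right
        # after the module name, so longer module names are left untouched.
        pattern = re.compile(rf"^([ \t]*)from {re.escape(old)} import ", re.MULTILINE)
        source = pattern.sub(rf"\g<1>from {new} import ", source)
    return source


before = "from langchain.text_splitter import RecursiveCharacterTextSplitter\n"
print(rewrite_imports(before), end="")
# from langchain_text_splitters import RecursiveCharacterTextSplitter
```

A regex pass like this misses aliased or multi-line imports, which is why a dedicated tool with a preview (`--diff`) step is the safer route.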
#### Deprecation Timeline
#### Deprecation timeline
We have two main types of deprecations:
@ -102,7 +102,7 @@ langchain-cli migrate [path to code] --diff # Preview
langchain-cli migrate [path to code] # Apply
```
#### Other Options
#### Other options
```bash
# See help menu
@ -114,11 +114,11 @@ langchain-cli migrate --diff [path to code]
langchain-cli migrate --disable langchain_to_core --include-ipynb [path to code]
```
## Deprecations and Breaking Changes
## Deprecations and breaking changes
This section contains a list of deprecations and removals in the `langchain` and `langchain-core` packages.
### Breaking Changes in 0.2.0
### Breaking changes in 0.2.0
As of release 0.2.0, `langchain` is required to be integration-agnostic. This means that code in `langchain` should not by default instantiate any specific chat models, LLMs, embedding models, vector stores, etc.; instead, the user will be required to specify those explicitly.

@ -42,7 +42,7 @@ module.exports = {
{
type: "category",
link: {type: 'doc', id: 'how_to/index'},
label: "How-To Guides",
label: "How-to guides",
collapsible: false,
items: [{
type: 'autogenerated',

@ -10,6 +10,7 @@ from langchain_core.documents import Document
class TypeOption(str, Enum):
    FACTS = "facts"
    ENTITIES = "entities"
    SENTIMENT = "sentiment"
def format_property_key(s: str) -> str:
@ -148,6 +149,8 @@ class DiffbotGraphTransformer:
        include_evidence: bool = True,
        simplified_schema: bool = True,
        extract_types: List[TypeOption] = [TypeOption.FACTS],
        *,
        include_confidence: bool = False,
    ) -> None:
        """
        Initialize the graph transformer with various options.
@ -165,10 +168,12 @@ class DiffbotGraphTransformer:
            simplified_schema (bool):
                Whether to use a simplified schema for relationships.
            extract_types (List[TypeOption]):
                A list of data types to extract. Only facts or entities
                are supported. By default, the option is set to facts.
                A fact represents a combination of source and target
                nodes with a relationship type.
                A list of data types to extract. Facts, entities, and
                sentiment are supported. By default, the option is
                set to facts. A fact represents a combination of
                source and target nodes with a relationship type.
            include_confidence (bool):
                Whether to include confidence scores on nodes and rels
        """
        self.diffbot_api_key = diffbot_api_key or get_from_env(
            "diffbot_api_key", "DIFFBOT_API_KEY"
@ -176,6 +181,7 @@ class DiffbotGraphTransformer:
        self.fact_threshold_confidence = fact_confidence_threshold
        self.include_qualifiers = include_qualifiers
        self.include_evidence = include_evidence
        self.include_confidence = include_confidence
        self.simplified_schema = None
        if simplified_schema:
            self.simplified_schema = SimplifiedSchema()
@ -250,6 +256,17 @@ class DiffbotGraphTransformer:
                nodes_list.add_node_property(
                    (source_id, source_label), {"name": source_name}
                )
                if record.get("sentiment") is not None:
                    nodes_list.add_node_property(
                        (source_id, source_label),
                        {"sentiment": record.get("sentiment")},
                    )
                if self.include_confidence:
                    nodes_list.add_node_property(
                        (source_id, source_label),
                        {"confidence": record.get("confidence")},
                    )
        relationships = list()
        # Relationships are a list because we don't deduplicate nor anything else
        if "facts" in payload and payload["facts"]:
@ -307,6 +324,8 @@ class DiffbotGraphTransformer:
                ][0]
                if self.include_evidence:
                    rel_properties.update({"evidence": relationship_evidence})
                if self.include_confidence:
                    rel_properties.update({"confidence": record["confidence"]})
                if self.include_qualifiers and record.get("qualifiers"):
                    for property in record["qualifiers"]:
                        prop_key = format_property_key(property["property"]["name"])
