fmt

4 weeks ago · 38774add64
parent 5297e7aca6 b514a479c0
commit 38774add64
10 changed files with 184 additions and 155 deletions
--- a/docs/docs/concepts.mdx
+++ b/docs/docs/concepts.mdx
@@ -7,16 +7,7 @@ This section contains introductions to key parts of LangChain.

 ## Architecture

-LangChain as a framework consists of several pieces. The below diagram shows how they relate.
-
-<ThemedImage
-  alt="Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers."
-  sources={{
-    light: useBaseUrl('/svg/langchain_stack.svg'),
-    dark: useBaseUrl('/svg/langchain_stack_dark.svg'),
-  }}
-  title="LangChain Framework Overview"
-/>
+LangChain as a framework consists of a number of packages.

 ### `langchain-core`
 This package contains base abstractions of different components and ways to compose them together.
@@ -24,13 +15,6 @@ The interfaces for core components like LLMs, vectorstores, retrievers and more
 No third party integrations are defined here.
 The dependencies are kept purposefully very lightweight.

-### `langchain-community`
-
-This package contains third party integrations that are maintained by the LangChain community.
-Key partner packages are separated out (see below).
-This contains all integrations for various components (LLMs, vectorstores, retrievers).
-All dependencies in this package are optional to keep the package as lightweight as possible.
-
 ### Partner packages

 While the long tail of integrations are in `langchain-community`, we split popular integrations into their own packages (e.g. `langchain-openai`, `langchain-anthropic`, etc).
@@ -42,14 +26,21 @@ The main `langchain` package contains chains, agents, and retrieval strategies t
 These are NOT third party integrations.
 All chains, agents, and retrieval strategies here are NOT specific to any one integration, but rather generic across all integrations.

-### [LangGraph](/docs/langgraph)
+### `langchain-community`
+
+This package contains third party integrations that are maintained by the LangChain community.
+Key partner packages are separated out (see below).
+This contains all integrations for various components (LLMs, vectorstores, retrievers).
+All dependencies in this package are optional to keep the package as lightweight as possible.
+
+### [`langgraph`](/docs/langgraph)

-Not currently in this repo, `langgraph` is an extension of `langchain` aimed at
+`langgraph` is an extension of `langchain` aimed at
 building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.

 LangGraph exposes high level interfaces for creating common types of agents, as well as a low-level API for constructing more contr

-### [langserve](/docs/langserve)
+### [`langserve`](/docs/langserve)

 A package to deploy LangChain chains as REST APIs. Makes it easy to get a production ready API up and running.

@@ -57,28 +48,18 @@ A package to deploy LangChain chains as REST APIs. Makes it easy to get a produc

 A developer platform that lets you debug, test, evaluate, and monitor LLM applications.

-## Installation
-
-If you want to work with high level abstractions, you should install the `langchain` package.
-
-```shell
-pip install langchain
-```
-
-If you want to work with specific integrations, you will need to install them separately.
-See [here](/docs/integrations/platforms/) for a list of integrations and how to install them.
-
-For working with LangSmith, you will need to set up a LangSmith developer account [here](https://smith.langchain.com) and get an API key.
-After that, you can enable it by setting environment variables:
-
-```shell
-export LANGCHAIN_TRACING_V2=true
-export LANGCHAIN_API_KEY=ls__...
-```
+<ThemedImage
+  alt="Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers."
+  sources={{
+    light: useBaseUrl('/svg/langchain_stack.svg'),
+    dark: useBaseUrl('/svg/langchain_stack_dark.svg'),
+  }}
+  title="LangChain Framework Overview"
+/>

-## LangChain Expression Language
+## LangChain Expression Language (LCEL)

-LangChain Expression Language, or LCEL, is a declarative way to easily compose chains together.
+LangChain Expression Language, or LCEL, is a declarative way to chain LangChain components.
 LCEL was designed from day 1 to **support putting prototypes in production, with no code changes**, from the simplest “prompt + LLM” chain to the most complex chains (we’ve seen folks successfully run LCEL chains with 100s of steps in production). To highlight a few of the reasons you might want to use LCEL:

 **First-class streaming support**
@@ -106,7 +87,7 @@ With LCEL, **all** steps are automatically logged to [LangSmith](/docs/langsmith
 [**Seamless LangServe deployment**](/docs/langserve)
 Any chain created with LCEL can be easily deployed using [LangServe](/docs/langserve).

-### Interface
+### Runnable interface

 To make it as easy as possible to create custom chains, we've implemented a ["Runnable"](https://api.python.langchain.com/en/stable/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable) protocol. Many LangChain components implement the `Runnable` protocol, including chat models, LLMs, output parsers, retrievers, prompt templates, and more. There are also several useful primitives for working with runnables, which you can read about below.

@@ -146,16 +127,6 @@ All runnables expose input and output **schemas** to inspect the inputs and outp
 LangChain provides standard, extendable interfaces and external integrations for various components useful for building with LLMs.
 Some components LangChain implements, some components we rely on third-party integrations for, and others are a mix.

-### LLMs
-Language models that takes a string as input and returns a string.
-These are traditionally older models (newer models generally are `ChatModels`, see below).
-
-Although the underlying models are string in, string out, the LangChain wrappers also allow these models to take messages as input.
-This makes them interchangeable with ChatModels.
-When messages are passed in as input, they will be formatted into a string under the hood before being passed to the underlying model.
-
-LangChain does not provide any LLMs, rather we rely on third party integrations.
-
 ### Chat models
 Language models that use a sequence of messages as inputs and return chat messages as outputs (as opposed to using plain text).
 These are traditionally newer models (older models are generally `LLMs`, see above).
@@ -172,45 +143,17 @@ We have some standardized parameters when constructing ChatModels:

 ChatModels also accept other parameters that are specific to that integration.

-### Function/Tool Calling
-
-:::info
-We use the term tool calling interchangeably with function calling. Although
-function calling is sometimes meant to refer to invocations of a single function,
-we treat all models as though they can return multiple tool or function calls in
-each message.
-:::
-
-Tool calling allows a model to respond to a given prompt by generating output that
-matches a user-defined schema. While the name implies that the model is performing
-some action, this is actually not the case! The model is coming up with the
-arguments to a tool, and actually running the tool (or not) is up to the user -
-for example, if you want to [extract output matching some schema](/docs/tutorials/extraction)
-from unstructured text, you could give the model an "extraction" tool that takes
-parameters matching the desired schema, then treat the generated output as your final
-result.
-
-A tool call includes a name, arguments dict, and an optional identifier. The
-arguments dict is structured `{argument_name: argument_value}`.
-
-Many LLM providers, including [Anthropic](https://www.anthropic.com/),
-[Cohere](https://cohere.com/), [Google](https://cloud.google.com/vertex-ai),
-[Mistral](https://mistral.ai/), [OpenAI](https://openai.com/), and others,
-support variants of a tool calling feature. These features typically allow requests
-to the LLM to include available tools and their schemas, and for responses to include
-calls to these tools. For instance, given a search engine tool, an LLM might handle a
-query by first issuing a call to the search engine. The system calling the LLM can
-receive the tool call, execute it, and return the output to the LLM to inform its
-response. LangChain includes a suite of [built-in tools](/docs/integrations/tools/)
-and supports several methods for defining your own [custom tools](/docs/how_to/custom_tools).
-
-There are two main use cases for function/tool calling:
+### LLMs
+Language models that take a string as input and return a string.
+These are traditionally older models (newer models generally are `ChatModels`, see below).

- [How to return structured data from an LLM](/docs/how_to/structured_output/)
- [How to use a model to call tools](/docs/how_to/tool_calling/)
+Although the underlying models are string in, string out, the LangChain wrappers also allow these models to take messages as input.
+This makes them interchangeable with ChatModels.
+When messages are passed in as input, they will be formatted into a string under the hood before being passed to the underlying model.

+LangChain does not provide any LLMs, rather we rely on third party integrations.

-### Message types
+### Messages

 Some language models take a list of messages as input and return a message.
 There are a few different types of messages.
@@ -338,7 +281,7 @@ prompt_template = ChatPromptTemplate.from_messages([
 ])
 ```

-### Example Selectors
+### Example selectors
 One common prompting technique for achieving better performance is to include examples as part of the prompt.
 This gives the language model concrete examples of how it should behave.
 Sometimes these examples are hardcoded into the prompt, but for more advanced situations it may be nice to dynamically select them.
@@ -389,7 +332,7 @@ LangChain has lots of different types of output parsers. This is a list of outpu
 | [Datetime](https://api.python.langchain.com/en/latest/output_parsers/langchain.output_parsers.datetime.DatetimeOutputParser.html#langchain.output_parsers.datetime.DatetimeOutputParser)        |                    | ✅                             |           | `str` \| `Message`                 | `datetime.datetime`  | Parses response into a datetime string.                                                                                                                                                                                                                  |
 | [Structured](https://api.python.langchain.com/en/latest/output_parsers/langchain.output_parsers.structured.StructuredOutputParser.html#langchain.output_parsers.structured.StructuredOutputParser)      |                    | ✅                             |           | `str` \| `Message`                 | `Dict[str, str]`     | An output parser that returns structured information. It is less powerful than other output parsers since it only allows for fields to be strings. This can be useful when you are working with smaller LLMs.                                            |

-### Chat History
+### Chat history
 Most LLM applications have a conversational interface.
 An essential component of a conversation is being able to refer to information introduced earlier in the conversation.
 At bare minimum, a conversational system should be able to access some window of past messages directly.
@@ -398,7 +341,7 @@ The concept of `ChatHistory` refers to a class in LangChain which can be used to
 This `ChatHistory` will keep track of inputs and outputs of the underlying chain, and append them as messages to a message database
 Future interactions will then load those messages and pass them into the chain as part of the input.

-### Document
+### Documents

 A Document object in LangChain contains information about some data. It has two attributes:

@@ -445,12 +388,12 @@ Embeddings create a vector representation of a piece of text. This is useful bec

 The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself).

-### Vectorstores
+### Vector stores
 One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors,
 and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query.
 A vector store takes care of storing embedded data and performing vector search for you.

-Vectorstores can be converted to the retriever interface by doing:
+Vector stores can be converted to the retriever interface by doing:

 ```python
 vectorstore = MyVectorStore()
@@ -465,31 +408,6 @@ Retrievers can be created from vectorstores, but are also broad enough to includ

 Retrievers accept a string query as input and return a list of Document's as output.

-### Advanced Retrieval Types
-
-LangChain provides several advanced retrieval types. A full list is below, along with the following information:
-
-**Name**: Name of the retrieval algorithm.
-
-**Index Type**: Which index type (if any) this relies on.
-
-**Uses an LLM**: Whether this retrieval method uses an LLM.
-
-**When to Use**: Our commentary on when you should considering using this retrieval method.
-
-**Description**: Description of what this retrieval algorithm is doing.
-
-| Name                      | Index Type                   | Uses an LLM               | When to Use                                                                                                                                   | Description                                                                                                                                                                                                                                                                                      |
-|---------------------------|------------------------------|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| [Vectorstore](https://api.python.langchain.com/en/latest/vectorstores/langchain_core.vectorstores.VectorStoreRetriever.html#langchain_core.vectorstores.VectorStoreRetriever)               | Vectorstore                  | No                        | If you are just getting started and looking for something quick and easy.                                                                     | This is the simplest method and the one that is easiest to get started with. It involves creating embeddings for each piece of text.                                                                                                                                                             |
-| [ParentDocument](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.parent_document_retriever.ParentDocumentRetriever.html#langchain.retrievers.parent_document_retriever.ParentDocumentRetriever)            | Vectorstore + Document Store | No                        | If your pages have lots of smaller pieces of distinct information that are best indexed by themselves, but best retrieved all together.       | This involves indexing multiple chunks for each document. Then you find the chunks that are most similar in embedding space, but you retrieve the whole parent document and return that (rather than individual chunks).                                                                         |
-| [Multi Vector](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.multi_vector.MultiVectorRetriever.html#langchain.retrievers.multi_vector.MultiVectorRetriever)              | Vectorstore + Document Store | Sometimes during indexing | If you are able to extract information from documents that you think is more relevant to index than the text itself.                          | This involves creating multiple vectors for each document. Each vector could be created in a myriad of ways - examples include summaries of the text and hypothetical questions.                                                                                                                 |
-| [Self Query](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html#langchain.retrievers.self_query.base.SelfQueryRetriever)               | Vectorstore                  | Yes                       | If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text.          | This uses an LLM to transform user input into two things: (1) a string to look up semantically, (2) a metadata filer to go along with it. This is useful because oftentimes questions are about the METADATA of documents (not the content itself).                                              |
-| [Contextual Compression](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.contextual_compression.ContextualCompressionRetriever.html#langchain.retrievers.contextual_compression.ContextualCompressionRetriever)    | Any                          | Sometimes                 | If you are finding that your retrieved documents contain too much irrelevant information and are distracting the LLM.                         | This puts a post-processing step on top of another retriever and extracts only the most relevant information from retrieved documents. This can be done with embeddings or an LLM.                                                                                                               |
-| [Time-Weighted Vectorstore](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.time_weighted_retriever.TimeWeightedVectorStoreRetriever.html#langchain.retrievers.time_weighted_retriever.TimeWeightedVectorStoreRetriever) | Vectorstore                  | No                        | If you have timestamps associated with your documents, and you want to retrieve the most recent ones                                          | This fetches documents based on a combination of semantic similarity (as in normal vector retrieval) and recency (looking at timestamps of indexed documents)                                                                                                                                    |
-| [Multi-Query Retriever](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.multi_query.MultiQueryRetriever.html#langchain.retrievers.multi_query.MultiQueryRetriever)     | Any                          | Yes                       | If users are asking questions that are complex and require multiple pieces of distinct information to respond                                 | This uses an LLM to generate multiple queries from the original one. This is useful when the original query needs pieces of information about multiple topics to be properly answered. By generating multiple queries, we can then fetch documents for each of them.                             |
-| [Ensemble](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.ensemble.EnsembleRetriever.html#langchain.retrievers.ensemble.EnsembleRetriever)                  | Any                          | No                        | If you have multiple retrieval methods and want to try combining them.                                                                        | This fetches documents from multiple retrievers and then combines them.                                                                                                                                                                                                                          |
-
 ### Tools
 Tools are interfaces that an agent, chain, or LLM can use to interact with the world.
 They combine a few things:
@@ -541,3 +459,94 @@ In order to solve that we built LangGraph to be this flexible, highly-controllab
 If you are still using AgentExecutor, do not fear: we still have a guide on [how to use AgentExecutor](/docs/how_to/agent_executor).
 It is recommended, however, that you start to transition to LangGraph.
 In order to assist in this we have put together a [transition guide on how to do so](/docs/how_to/migrate_agent)
+
+## Techniques
+
+### Function/tool calling
+
+:::info
+We use the term tool calling interchangeably with function calling. Although
+function calling is sometimes meant to refer to invocations of a single function,
+we treat all models as though they can return multiple tool or function calls in
+each message.
+:::
+
+Tool calling allows a model to respond to a given prompt by generating output that
+matches a user-defined schema. While the name implies that the model is performing
+some action, this is actually not the case! The model is coming up with the
+arguments to a tool, and actually running the tool (or not) is up to the user -
+for example, if you want to [extract output matching some schema](/docs/tutorials/extraction)
+from unstructured text, you could give the model an "extraction" tool that takes
+parameters matching the desired schema, then treat the generated output as your final
+result.
+
+A tool call includes a name, arguments dict, and an optional identifier. The
+arguments dict is structured `{argument_name: argument_value}`.
+
+Many LLM providers, including [Anthropic](https://www.anthropic.com/),
+[Cohere](https://cohere.com/), [Google](https://cloud.google.com/vertex-ai),
+[Mistral](https://mistral.ai/), [OpenAI](https://openai.com/), and others,
+support variants of a tool calling feature. These features typically allow requests
+to the LLM to include available tools and their schemas, and for responses to include
+calls to these tools. For instance, given a search engine tool, an LLM might handle a
+query by first issuing a call to the search engine. The system calling the LLM can
+receive the tool call, execute it, and return the output to the LLM to inform its
+response. LangChain includes a suite of [built-in tools](/docs/integrations/tools/)
+and supports several methods for defining your own [custom tools](/docs/how_to/custom_tools).
+
+There are two main use cases for function/tool calling:
+
+- [How to return structured data from an LLM](/docs/how_to/structured_output/)
+- [How to use a model to call tools](/docs/how_to/tool_calling/)
+
+
+### Retrieval
+
+LangChain provides several advanced retrieval types. A full list is below, along with the following information:
+
+**Name**: Name of the retrieval algorithm.
+
+**Index Type**: Which index type (if any) this relies on.
+
+**Uses an LLM**: Whether this retrieval method uses an LLM.
+
+**When to Use**: Our commentary on when you should consider using this retrieval method.
+
+**Description**: Description of what this retrieval algorithm is doing.
+
+| Name                      | Index Type                   | Uses an LLM               | When to Use                                                                                                                                   | Description                                                                                                                                                                                                                                                                                      |
+|---------------------------|------------------------------|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [Vectorstore](/docs/how_to/vectorstore_retriever/)               | Vectorstore                  | No                        | If you are just getting started and looking for something quick and easy.                                                                     | This is the simplest method and the one that is easiest to get started with. It involves creating embeddings for each piece of text.                                                                                                                                                             |
+| [ParentDocument](/docs/how_to/parent_document_retriever/)            | Vectorstore + Document Store | No                        | If your pages have lots of smaller pieces of distinct information that are best indexed by themselves, but best retrieved all together.       | This involves indexing multiple chunks for each document. Then you find the chunks that are most similar in embedding space, but you retrieve the whole parent document and return that (rather than individual chunks).                                                                         |
+| [Multi Vector](/docs/how_to/multi_vector/)              | Vectorstore + Document Store | Sometimes during indexing | If you are able to extract information from documents that you think is more relevant to index than the text itself.                          | This involves creating multiple vectors for each document. Each vector could be created in a myriad of ways - examples include summaries of the text and hypothetical questions.                                                                                                                 |
+| [Self Query](/docs/how_to/self_query/)               | Vectorstore                  | Yes                       | If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text.          | This uses an LLM to transform user input into two things: (1) a string to look up semantically, (2) a metadata filter to go along with it. This is useful because oftentimes questions are about the METADATA of documents (not the content itself).                                              |
+| [Contextual Compression](/docs/how_to/contextual_compression/)    | Any                          | Sometimes                 | If you are finding that your retrieved documents contain too much irrelevant information and are distracting the LLM.                         | This puts a post-processing step on top of another retriever and extracts only the most relevant information from retrieved documents. This can be done with embeddings or an LLM.                                                                                                               |
+| [Time-Weighted Vectorstore](/docs/how_to/time_weighted_vectorstore/) | Vectorstore                  | No                        | If you have timestamps associated with your documents, and you want to retrieve the most recent ones                                          | This fetches documents based on a combination of semantic similarity (as in normal vector retrieval) and recency (looking at timestamps of indexed documents)                                                                                                                                    |
+| [Multi-Query Retriever](/docs/how_to/MultiQueryRetriever/)     | Any                          | Yes                       | If users are asking questions that are complex and require multiple pieces of distinct information to respond                                 | This uses an LLM to generate multiple queries from the original one. This is useful when the original query needs pieces of information about multiple topics to be properly answered. By generating multiple queries, we can then fetch documents for each of them.                             |
+| [Ensemble](/docs/how_to/ensemble_retriever/)                  | Any                          | No                        | If you have multiple retrieval methods and want to try combining them.                                                                        | This fetches documents from multiple retrievers and then combines them.                                                                                                                                                                                                                          |
+
+
+### Text splitting
+
+LangChain offers many different types of `text splitters`.
+These all live in the `langchain-text-splitters` package.
+
+Table columns:
+
+- **Name**: Name of the text splitter
+- **Classes**: Classes that implement this text splitter
+- **Splits On**: How this text splitter splits text
+- **Adds Metadata**: Whether or not this text splitter adds metadata about where each chunk came from.
+- **Description**: Description of the splitter, including recommendation on when to use it.
+
+
+| Name     | Classes                                                                                                                                                                                                             | Splits On                                                   | Adds Metadata | Description                                                                                                                                                                                                                                                                  |
+|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Recursive | [RecursiveCharacterTextSplitter](/docs/how_to/recursive_text_splitter/), [RecursiveJsonSplitter](/docs/how_to/recursive_json_splitter/) | A list of user defined characters     |               | Recursively splits text. This splitting is trying to keep related pieces of text next to each other. This is the `recommended way` to start splitting text.                                                                                                                    |
+| HTML      | [HTMLHeaderTextSplitter](/docs/how_to/HTML_header_metadata_splitter/), [HTMLSectionSplitter](/docs/how_to/HTML_section_aware_splitter/)          | HTML specific characters                                                                                                 | ✅             | Splits text based on HTML-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the HTML)                                                                                                                               |
+| Markdown  | [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter/),                                                                                                           | Markdown specific characters                                                                                    | ✅             | Splits text based on Markdown-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the Markdown)                                                                                                                       |
+| Code      | [many languages](/docs/how_to/code_splitter/)                                                                                                                                 | Code (Python, JS) specific characters                                                                           |               | Splits text based on characters specific to coding languages. 15 different languages are available to choose from.                                                                                                                                                           |
+| Token    | [many classes](/docs/how_to/split_by_token/)                                                                                                                                  | Tokens                                                                                                          |               | Splits text on tokens. There exist a few different ways to measure tokens.                                                                                                                                                                                                   |
+| Character  | [CharacterTextSplitter](/docs/how_to/character_text_splitter/)                                                                                                                | A user defined character                                                                                        |               | Splits text based on a user defined character. One of the simpler methods.                                                                                                                                                                                                   |
+| Semantic Chunker (Experimental) | [SemanticChunker](/docs/how_to/semantic-chunker/)                                                                                                                             | Sentences                                                                                                       |               | First splits on sentences. Then combines ones next to each other if they are semantically similar enough. Taken from [Greg Kamradt](https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb) |
+| Integration: AI21 Semantic | [AI21SemanticTextSplitter](/docs/integrations/document_transformers/ai21_semantic_text_splitter/)                                                                                                                    |               |    ✅           | Identifies distinct topics that form coherent pieces of text and splits along those.                                                                                                                                                                                         |
--- a/docs/docs/how_to/index.mdx
+++ b/docs/docs/how_to/index.mdx
@ -3,7 +3,7 @@ sidebar_position: 0
 sidebar_class_name: hidden
 ---

-# How-to Guides
+# How-to guides

 Here you’ll find answers to “How do I….?” types of questions.
 These guides are *goal-oriented* and *concrete*; they're meant to help you complete a specific task.
@ -15,9 +15,9 @@ For comprehensive descriptions of every class and function see [API Reference](h

 This highlights functionality that is core to using LangChain.

- [Chat Models — How to: return structured data from an LLM](/docs/how_to/structured_output/)
- [Chat Models — How to: use a chat model to call tools](/docs/how_to/tool_calling/)
- [LCEL — How to: stream runnables](/docs/how_to/streaming)
+- [How to: return structured data from an LLM](/docs/how_to/structured_output/)
+- [How to: use a chat model to call tools](/docs/how_to/tool_calling/)
+- [How to: stream runnables](/docs/how_to/streaming)
 - [How to: debug your LLM apps](/docs/how_to/debugging/)

 ## LangChain Expression Language (LCEL)
@ -41,7 +41,7 @@ LangChain Expression Language is a way to create arbitrary custom chains. It is

 These are the core building blocks you can use when building applications.

-### Prompt Templates
+### Prompt templates

 Prompt Templates are responsible for formatting user input into a format that can be passed to a language model.

@ -50,7 +50,7 @@ Prompt Templates are responsible for formatting user input into a format that ca
 - [How to: partially format prompt templates](/docs/how_to/prompts_partial)
 - [How to: compose prompts together](/docs/how_to/prompts_composition)

-### Example Selectors
+### Example selectors

 Example Selectors are responsible for selecting the correct few shot examples to pass to the prompt.

@ -60,7 +60,7 @@ Example Selectors are responsible for selecting the correct few shot examples to
 - [How to: select examples by semantic ngram overlap](/docs/how_to/example_selectors_ngram)
 - [How to: select examples by maximal marginal relevance](/docs/how_to/example_selectors_mmr)

-### Chat Models
+### Chat models

 Chat Models are newer forms of language models that take messages in and output a message.

@ -83,7 +83,7 @@ What LangChain calls LLMs are older forms of language models that take a string
 - [How to: track token usage](/docs/how_to/llm_token_usage_tracking)
 - [How to: work with local LLMs](/docs/how_to/local_llms)

-### Output Parsers
+### Output parsers

 Output Parsers are responsible for taking the output of an LLM and parsing into more structured format.

@ -95,7 +95,7 @@ Output Parsers are responsible for taking the output of an LLM and parsing into
 - [How to: try to fix errors in output parsing](/docs/how_to/output_parser_fixing)
 - [How to: write a custom output parser class](/docs/how_to/output_parser_custom)

-### Document Loaders
+### Document loaders

 Document Loaders are responsible for loading documents from a variety of sources.

@ -108,7 +108,7 @@ Document Loaders are responsible for loading documents from a variety of sources
 - [How to: load PDF files](/docs/how_to/document_loader_pdf)
 - [How to: write a custom document loader](/docs/how_to/document_loader_custom)

-### Text Splitters
+### Text splitters

 Text Splitters take a document and split into chunks that can be used for retrieval.

@ -122,16 +122,16 @@ Text Splitters take a document and split into chunks that can be used for retrie
 - [How to: split text into semantic chunks](/docs/how_to/semantic-chunker)
 - [How to: split by tokens](/docs/how_to/split_by_token)

-### Embedding Models
+### Embedding models

 Embedding Models take a piece of text and create a numerical representation of it.

 - [How to: embed text data](/docs/how_to/embed_text)
 - [How to: cache embedding results](/docs/how_to/caching_embeddings)

-### Vector Stores
+### Vector stores

-Vector Stores are databases that can efficiently store and retrieve embeddings.
+Vector stores are databases that can efficiently store and retrieve embeddings.

 - [How to: use a vector store to retrieve data](/docs/how_to/vectorstores)

@ -194,7 +194,8 @@ All of LangChain components can easily be extended to support your own versions.
 - [How to: write a custom output parser class](/docs/how_to/output_parser_custom)
 - [How to: define a custom tool](/docs/how_to/custom_tools)

-## Use Cases
+
+## Use cases

 These guides cover use-case specific details.

@ -225,7 +226,7 @@ Chatbots involve using an LLM to have a conversation.
 - [How to: do retrieval](/docs/how_to/chatbots_retrieval)
 - [How to: use tools](/docs/how_to/chatbots_tools)

-### Query Analysis
+### Query analysis

 Query Analysis is the task of using an LLM to generate a query to send to a retriever.

@ -245,7 +246,7 @@ You can use LLMs to do question answering over tabular data.
 - [How to: deal with large databases](/docs/how_to/sql_large_db)
 - [How to: deal with CSV files](/docs/how_to/sql_csv)

-### Q&A over Graph Databases
+### Q&A over graph databases

 You can use an LLM to do question answering over graph databases.

--- a/docs/docs/introduction.mdx
+++ b/docs/docs/introduction.mdx
@ -54,13 +54,13 @@ These are the best ones to get started with:
 Explore the full list of tutorials [here](/docs/tutorials).


-## [How-To Guides](/docs/how_to)
+## [How-to guides](/docs/how_to)

 [Here](/docs/how_to) you’ll find short answers to “How do I….?” types of questions.
 These how-to guides don’t cover topics in depth – you’ll find that material in the [Tutorials](/docs/tutorials) and the [API Reference](https://api.python.langchain.com/en/latest/).
 However, these guides will help you quickly accomplish common tasks.

-## [Conceptual Guide](/docs/concepts)
+## [Conceptual guide](/docs/concepts)

 Introductions to all the key parts of LangChain you’ll need to know! [Here](/docs/concepts) you'll find high level explanations of all LangChain concepts.

--- a/docs/docs/security.md
+++ b/docs/docs/security.md
@ -2,7 +2,7 @@

 LangChain has a large ecosystem of integrations with various external resources like local and remote file systems, APIs and databases. These integrations allow developers to create versatile applications that combine the power of LLMs with the ability to access, interact with and manipulate external resources.

-## Best Practices
+## Best practices

 When building such applications developers should remember to follow good security practices:

@ -25,6 +25,6 @@ If you're building applications that access external resources like file systems
 or databases, consider speaking with your company's security team to determine how to best
 design and secure your applications.

-## Reporting a Vulnerability
+## Reporting a vulnerability

 Please report security vulnerabilities by email to security@langchain.dev. This will ensure the issue is promptly triaged and acted upon as needed.
--- a/docs/docs/versions/overview.mdx
+++ b/docs/docs/versions/overview.mdx
@ -3,7 +3,7 @@ sidebar_position: 0
 sidebar_label: Overview
 ---

-# LangChain Over Time
+# LangChain over time

 ## What’s new in LangChain?

@ -45,7 +45,7 @@ This document serves to outline at a high level what has changed and why.

 - `langchain` was split into the following component packages: `langchain-core`, `langchain`, `langchain-community`, `langchain-[partner]` to improve the usability of langchain code in production settings. You can read more about it on our [blog](https://blog.langchain.dev/langchain-v0-1-0/).

-### Ecosystem Organization
+### Ecosystem organization

 By the release of 0.1.0, LangChain had grown to a large ecosystem with many integrations and a large community.

--- a/docs/docs/versions/packages.mdx
+++ b/docs/docs/versions/packages.mdx
@ -3,7 +3,7 @@ sidebar_position: 3
 sidebar_label: Packages
 ---

-# 📕 Package Versioning
+# 📕 Package versioning

 As of now, LangChain has an ad hoc release process: releases are cut with high frequency by
 a maintainer and published to [PyPI](https://pypi.org/).
--- a/docs/docs/versions/release_policy.mdx
+++ b/docs/docs/versions/release_policy.mdx
@ -3,7 +3,7 @@ sidebar_position: 2
 sidebar_label: Release Policy
 ---

-# LangChain Releases
+# LangChain releases

 The LangChain ecosystem is composed of different component packages (e.g., `langchain-core`, `langchain`, `langchain-community`, `langgraph`, `langserve`, partner packages etc.)

@ -32,13 +32,13 @@ From time to time, we will version packages as **release candidates**. These are

 Other packages in the ecosystem (including user packages) can follow a different versioning scheme, but are generally expected to pin to specific minor versions of `langchain` and `langchain-core`.

-## Release Cadence
+## Release cadence

 We expect to space out **minor** releases (e.g., from 0.2.0 to 0.3.0) of `langchain` and `langchain-core` by at least 2-3 months, as such releases may contain breaking changes.

 Patch versions are released frequently as they contain bug fixes and new features.

-## API Stability
+## API stability

 The development of LLM applications is a rapidly evolving field, and we are constantly learning from our users and the community. As such, we expect that the APIs in `langchain` and `langchain-core` will continue to evolve to better serve the needs of our users.

@ -49,14 +49,14 @@ Even though both `langchain` and `langchain-core` are currently in a pre-1.0 sta

 We will generally try to avoid making unnecessary changes, and will provide a deprecation policy for features that are being removed.

-### Stability of Other Packages
+### Stability of other packages

 The stability of other packages in the LangChain ecosystem may vary:

 - `langchain-community` is a community maintained package that contains 3rd party integrations. While we do our best to review and test changes in `langchain-community`, `langchain-community` is expected to experience more breaking changes than `langchain` and `langchain-core` as it contains many community contributions.
 - Partner packages may follow different stability and versioning policies, and users should refer to the documentation of those packages for more information; however, in general these packages are expected to be stable.

-### What is a "API Stability"?
+### What is "API stability"?

 API stability means:

@ -72,7 +72,7 @@ Certain APIs are explicitly marked as “internal” in a couple of ways:
 - Functions, methods, and other objects prefixed by a leading underscore (**`_`**). This is the standard Python convention of indicating that something is private; if any method starts with a single **`_`**, it’s an internal API.
    - **Exception:** Certain methods are prefixed with `_` , but do not contain an implementation. These methods are *meant* to be overridden by sub-classes that provide the implementation. Such methods are generally part of the **Public API** of LangChain.

-## Deprecation Policy
+## Deprecation policy

 We will generally avoid deprecating features until a better alternative is available.

--- a/docs/docs/versions/v0_2.mdx
+++ b/docs/docs/versions/v0_2.mdx
@ -41,7 +41,7 @@ Here is an example of the import changes that the migration script can help appl
 | langchain           | langchain-text-splitters | from langchain.text_splitter import RecursiveCharacterTextSplitter | from langchain_text_splitters import RecursiveCharacterTextSplitter |


-#### Deprecation Timeline
+#### Deprecation timeline

 We have two main types of deprecations:

@ -102,7 +102,7 @@ langchain-cli migrate [path to code] --diff # Preview
 langchain-cli migrate [path to code] # Apply
 ```

-#### Other Options
+#### Other options

 ```bash
 # See help menu
@ -114,11 +114,11 @@ langchain-cli migrate --diff [path to code]
 langchain-cli migrate --disable langchain_to_core --include-ipynb [path to code]
 ```

-## Deprecations and Breaking Changes
+## Deprecations and breaking changes

 This code contains a list of deprecations and removals in the `langchain` and `langchain-core` packages.

-### Breaking Changes in 0.2.0
+### Breaking changes in 0.2.0

 As of release 0.2.0, `langchain` is required to be integration-agnostic. This means that code in `langchain`  should not by default instantiate any specific chat models, llms, embedding models, vectorstores etc; instead, the user will be required to specify those explicitly.

--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@ -42,7 +42,7 @@ module.exports = {
    {
      type: "category",
      link: {type: 'doc', id: 'how_to/index'},
-      label: "How-To Guides",
+      label: "How-to guides",
      collapsible: false,
      items: [{
        type: 'autogenerated',
--- a/libs/experimental/langchain_experimental/graph_transformers/diffbot.py
+++ b/libs/experimental/langchain_experimental/graph_transformers/diffbot.py
@ -10,6 +10,7 @@ from langchain_core.documents import Document
 class TypeOption(str, Enum):
    FACTS = "facts"
    ENTITIES = "entities"
+    SENTIMENT = "sentiment"


 def format_property_key(s: str) -> str:
@ -148,6 +149,8 @@ class DiffbotGraphTransformer:
        include_evidence: bool = True,
        simplified_schema: bool = True,
        extract_types: List[TypeOption] = [TypeOption.FACTS],
+        *,
+        include_confidence: bool = False,
    ) -> None:
        """
        Initialize the graph transformer with various options.
@ -165,10 +168,12 @@ class DiffbotGraphTransformer:
            simplified_schema (bool):
                Whether to use a simplified schema for relationships.
            extract_types (List[TypeOption]):
-                A list of data types to extract. Only facts or entities
-                are supported. By default, the option is set to facts.
-                A fact represents a combination of source and target
-                nodes with a relationship type.
+                A list of data types to extract. Facts, entities, and
+                sentiment are supported. By default, the option is
+                set to facts. A fact represents a combination of
+                source and target nodes with a relationship type.
+            include_confidence (bool):
+                Whether to include confidence scores on nodes and relationships.
        """
        self.diffbot_api_key = diffbot_api_key or get_from_env(
            "diffbot_api_key", "DIFFBOT_API_KEY"
@ -176,6 +181,7 @@ class DiffbotGraphTransformer:
        self.fact_threshold_confidence = fact_confidence_threshold
        self.include_qualifiers = include_qualifiers
        self.include_evidence = include_evidence
+        self.include_confidence = include_confidence
        self.simplified_schema = None
        if simplified_schema:
            self.simplified_schema = SimplifiedSchema()
@ -250,6 +256,17 @@ class DiffbotGraphTransformer:
                nodes_list.add_node_property(
                    (source_id, source_label), {"name": source_name}
                )
+                if record.get("sentiment") is not None:
+                    nodes_list.add_node_property(
+                        (source_id, source_label),
+                        {"sentiment": record.get("sentiment")},
+                    )
+                if self.include_confidence:
+                    nodes_list.add_node_property(
+                        (source_id, source_label),
+                        {"confidence": record.get("confidence")},
+                    )
+
        relationships = list()
        # Relationships are a list because we don't deduplicate nor anything else
        if "facts" in payload and payload["facts"]:
@ -307,6 +324,8 @@ class DiffbotGraphTransformer:
                    ][0]
                    if self.include_evidence:
                        rel_properties.update({"evidence": relationship_evidence})
+                    if self.include_confidence:
+                        rel_properties.update({"confidence": record["confidence"]})
                    if self.include_qualifiers and record.get("qualifiers"):
                        for property in record["qualifiers"]:
                            prop_key = format_property_key(property["property"]["name"])
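
A note on the `diffbot.py` hunks above: the sketch below mirrors the node-property logic they add, as a rough illustration only. It does not call `DiffbotGraphTransformer` or the Diffbot API. The helper name `build_node_properties` and the record shape are assumptions made for this example. It shows that `sentiment` is attached only when present (a value of `0.0` still counts), while `confidence` is attached whenever the flag is set, falling back to `None` when the record has no score.

```python
from typing import Any, Dict


def build_node_properties(
    record: Dict[str, Any], *, include_confidence: bool = False
) -> Dict[str, Any]:
    """Hypothetical stand-in for the node-property logic in the hunk above."""
    properties: Dict[str, Any] = {"name": record["name"]}
    # Sentiment is attached only when the record carries one; the check is
    # "is not None", so a neutral score of 0.0 is still kept.
    if record.get("sentiment") is not None:
        properties["sentiment"] = record["sentiment"]
    # Confidence is attached whenever the flag is set. A record without a
    # score yields None, matching the record.get("confidence") call in the diff.
    if include_confidence:
        properties["confidence"] = record.get("confidence")
    return properties


# Assumed record shapes, for illustration only.
entity = {"name": "Ada Lovelace", "sentiment": 0.0, "confidence": 0.97}
no_scores = {"name": "London"}

with_confidence = build_node_properties(entity, include_confidence=True)
without_confidence = build_node_properties(entity)
missing = build_node_properties(no_scores, include_confidence=True)

print(with_confidence)
print(without_confidence)
print(missing)
```

Because `include_confidence` is keyword-only (the bare `*` in the signature), existing positional callers are unaffected by the new parameter. In the relationship branch the diff reads `record["confidence"]` directly instead of using `.get`, so a fact without a confidence value would raise `KeyError` there and would not store `None`.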