- Description: Add a BM25 Retriever that do not need Elastic search
- Dependencies: rank_bm25(if it is not installed it will be install by
using pip, just like TFIDFRetriever do)
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: DayuanJian21687
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description:
Add LLM for ChatGLM-6B & ChatGLM2-6B API
Related Issue:
Will the langchain support ChatGLM? #4766
Add support for selfhost models like ChatGLM or transformer models #1780
Dependencies:
No extra library install required.
It wraps api call to a ChatGLM(2)-6B server(start with api.py), so api
endpoint is required to run.
Tag maintainer: @mlot
Any comments on this PR would be appreciated.
---------
Co-authored-by: mlot <limpo2000@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
# Support Redis Sentinel database connections
This PR adds the support to connect not only to Redis standalone servers
but High Availability Replication sets too
(https://redis.io/docs/management/sentinel/)
Redis Replica Sets have on Master allowing to write data and 2+ replicas
with read-only access to the data. The additional Redis Sentinel
instances monitor all server and reconfigure the RW-Master on the fly if
it comes unavailable.
Therefore all connections must be made through the Sentinels the query
the current master for a read-write connection. This PR adds basic
support to also allow a redis connection url specifying a Sentinel as
Redis connection.
Redis documentation and Jupyter notebook with Redis examples are updated
to mention how to connect to a redis Replica Set with Sentinels
-
Remark - i did not found test cases for Redis server connections to add
new cases here. Therefor i tests the new utility class locally with
different kind of setups to make sure different connection urls are
working as expected. But no test case here as part of this PR.
- [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source
computing framework that makes it easy to scale data science and machine
learning workloads in parallel. Xorbits can leverage multi cores or GPUs
to accelerate computation on a single machine, or scale out up to
thousands of machines to support processing terabytes of data.
- This PR added support for the Xorbits agent, which allows langchain to
interact with Xorbits Pandas dataframe and Xorbits Numpy array.
- Dependencies: This change requires the Xorbits library to be installed
in order to be used.
`pip install xorbits`
- Request for review: @hinthornw
- Twitter handle: https://twitter.com/Xorbitsio
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
- New pin-to-side (button). This functionality allows you to search the
docs while asking the AI for questions
- Fixed the search bar in Firefox that won't detect a mouse click
- Fixes and improvements overall in the model's performance
Inspired by #5550, I implemented full async API support in Qdrant. The
docs were extended to mention the existence of asynchronous operations
in Langchain. I also used that chance to restructure the tests of Qdrant
and provided a suite of tests for the async version. Async API requires
the GRPC protocol to be enabled. Thus, it doesn't work on local mode
yet, but we're considering including the support to be consistent.
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
Integrate [Rockset](https://rockset.com/docs/) as a document loader.
Issue: None
Dependencies: Nothing new (rockset's dependency was already added
[here](https://github.com/hwchase17/langchain/pull/6216))
Tag maintainer: @rlancemartin
I have added a test for the integration and an example notebook showing
its use. I ran `make lint` and everything looks good.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This pull request adds a ElasticsearchDatabaseChain chain for
interacting with analytics database, in the manner of the
SQLDatabaseChain.
Maintainer: @samber
Twitter handler: samuelberthe
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Add langchain.llms.Tonyi for text completion, in examples into the
Tonyi Text API,
- Add system tests.
Note async completion for the Text API is not yet supported and will be
included in a future PR.
Dependencies: dashscope. It will be installed manually cause it is not
need by everyone.
Happy for feedback on any aspect of this PR @hwchase17 @baskaryan.
Multiple people have asked in #5081 for a way to limit the documents
returned from an AzureCognitiveSearchRetriever. This PR adds the `top_n`
parameter to allow that.
Twitter handle:
[@UmerHAdil](twitter.com/umerHAdil)
# Browserless
Added support for Browserless' `/content` endpoint as a document loader.
### About Browserless
Browserless is a cloud service that provides access to headless Chrome
browsers via a REST API. It allows developers to automate Chromium in a
serverless fashion without having to configure and maintain their own
Chrome infrastructure.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Lance Martin <lance@langchain.dev>
This PR is aimed at enhancing the clarity of the documentation in the
langchain project.
**Description**:
In the graphql.ipynb file, I have removed the unnecessary 'llm' argument
from the initialization process of the GraphQL tool (of type
_EXTRA_OPTIONAL_TOOLS). The 'llm' argument is not required for this
process. Its presence could potentially confuse users. This modification
simplifies the understanding of tool initialization and minimizes
potential confusion.
**Issue**: Not applicable, as this is a documentation improvement.
**Dependencies**: None.
**I kindly request a review from the following maintainer**: @hinthornw,
who is responsible for Agents / Tools / Toolkits.
No new integration is being added in this PR, hence no need for a test
or an example notebook.
Please see the changes for more detail and let me know if any further
modification is necessary.
- Migrate from deprecated langchainplus_sdk to `langsmith` package
- Update the `run_on_dataset()` API to use an eval config
- Update a number of evaluators, as well as the loading logic
- Update docstrings / reference docs
- Update tracer to share single HTTP session
Added fix to avoid irrelevant attributes being returned plus an example
of extracting unrelated entities and an exampe of using an 'extra_info'
attribute to extract unstructured data for an entity.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Adds a new document transformer that automatically extracts metadata for
a document based on an input schema. I also moved
`document_transformers.py` to `document_transformers/__init__.py` to
group it with this new transformer - it didn't seem to cause issues in
the notebook, but let me know if I've done something wrong there.
Also had a linter issue I couldn't figure out:
```
MacBook-Pro:langchain jacoblee$ make lint
poetry run mypy .
docs/dist/conf.py: error: Duplicate module named "conf" (also at "./docs/api_reference/conf.py")
docs/dist/conf.py: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#mapping-file-paths-to-modules for more info
docs/dist/conf.py: note: Common resolutions include: a) using `--exclude` to avoid checking one of them, b) adding `__init__.py` somewhere, c) using `--explicit-package-bases` or adjusting MYPYPATH
Found 1 error in 1 file (errors prevented further checking)
make: *** [lint] Error 2
```
@rlancemartin @baskaryan
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: Add two new document transformers that translates
documents into different languages and converts documents into q&a
format to improve vector search results. Uses OpenAI function calling
via the [doctran](https://github.com/psychic-api/doctran/tree/main)
library.
- Issue: N/A
- Dependencies: `doctran = "^0.0.5"`
- Tag maintainer: @rlancemartin @eyurtsev @hwchase17
- Twitter handle: @psychicapi or @jfan001
Notes
- Adheres to the `DocumentTransformer` abstraction set by @dev2049 in
#3182
- refactored `EmbeddingsRedundantFilter` to put it in a file under a new
`document_transformers` module
- Added basic docs for `DocumentInterrogator`, `DocumentTransformer` as
well as the existing `EmbeddingsRedundantFilter`
---------
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Updates to the WhyLabsCallbackHandler and example notebook
- Update dependency to langkit 0.0.6 which defines new helper methods
for callback integrations
- Update WhyLabsCallbackHandler to use the new `get_callback_instance`
so that the callback is mostly defined in langkit
- Remove much of the implementation of the WhyLabsCallbackHandler here
in favor of the callback instance
This does not change the behavior of the whylabs callback handler
implementation but is a reorganization that moves some of the
implementation externally to our optional dependency package, and should
make future updates easier.
@agola11
Probably the most boring PR to review ;)
Individual commits might be easier to digest
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- Description: Adds a new chain that acts as a wrapper around Sympy to
give LLMs the ability to do some symbolic math.
- Dependencies: SymPy
---------
Co-authored-by: sreiswig <sreiswig@github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This change makes the ecosystem integrations cnosdb documentation more
realistic and easy to understand.
- change examples of question and table
- modify typo and format
- Description: add wrapper that lets you use KoboldAI api in langchain
- Issue: n/a
- Dependencies: none extra, just what exists in lanchain
- Tag maintainer: @baskaryan
- Twitter handle: @zanzibased
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description: a description of the change**
Fixed `make docs_build` and related scripts which caused errors. There
are several changes.
First, I made the build of the documentation and the API Reference into
two separate commands. This is because it takes less time to build. The
commands for documents are `make docs_build`, `make docs_clean`, and
`make docs_linkcheck`. The commands for API Reference are `make
api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`.
It looked like `docs/.local_build.sh` could be used to build the
documentation, so I used that. Since `.local_build.sh` was also building
API Rerefence internally, I removed that process. `.local_build.sh` also
added some Bash options to stop in error or so. Futher more added `cd
"${SCRIPT_DIR}"` at the beginning so that the script will work no matter
which directory it is executed in.
`docs/api_reference/api_reference.rst` is removed, because which is
generated by `docs/api_reference/create_api_rst.py`, and added it to
.gitignore.
Finally, the description of CONTRIBUTING.md was modified.
**Issue: the issue # it fixes (if applicable)**
https://github.com/hwchase17/langchain/issues/6413
**Dependencies: any dependencies required for this change**
`nbdoc` was missing in group docs so it was added. I installed it with
the `poetry add --group docs nbdoc` command. I am concerned if any
modifications are needed to poetry.lock. I would greatly appreciate it
if you could pay close attention to this file during the review.
**Tag maintainer**
- General / Misc / if you don't know who to tag: @baskaryan
If this PR needs any additional changes, I'll be happy to make them!
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description: I added an example of how to reference the OpenAI API
Organization ID, because I couldn't find it before. In the example, it
is mentioned how to achieve this using environment variables as well as
parameters for the OpenAI()-class
Issue: -
Dependencies: -
Twitter @schop-rob
Small fix to a link from the Marqo page in the ecosystem.
The link was not updated correctly when the documentation structure
changed to html pages instead of links to notebooks.
This PR changes the behavior of `Qdrant.from_texts` so the collection is
reused if not requested to recreate it. Previously, calling
`Qdrant.from_texts` or `Qdrant.from_documents` resulted in removing the
old data which was confusing for many.
- Description: Added notebook to LangChain docs that explains how to use
Lemon AI NLP Workflow Automation tool with Langchain
- Issue: not applicable
- Dependencies: not applicable
- Tag maintainer: @agola11
- Twitter handle: felixbrockm
# Causal program-aided language (CPAL) chain
## Motivation
This builds on the recent [PAL](https://arxiv.org/abs/2211.10435) to
stop LLM hallucination. The problem with the
[PAL](https://arxiv.org/abs/2211.10435) approach is that it hallucinates
on a math problem with a nested chain of dependence. The innovation here
is that this new CPAL approach includes causal structure to fix
hallucination.
For example, using the below word problem, PAL answers with 5, and CPAL
answers with 13.
"Tim buys the same number of pets as Cindy and Boris."
"Cindy buys the same number of pets as Bill plus Bob."
"Boris buys the same number of pets as Ben plus Beth."
"Bill buys the same number of pets as Obama."
"Bob buys the same number of pets as Obama."
"Ben buys the same number of pets as Obama."
"Beth buys the same number of pets as Obama."
"If Obama buys one pet, how many pets total does everyone buy?"
The CPAL chain represents the causal structure of the above narrative as
a causal graph or DAG, which it can also plot, as shown below.
![complex-graph](https://github.com/hwchase17/langchain/assets/367522/d938db15-f941-493d-8605-536ad530f576)
.
The two major sections below are:
1. Technical overview
2. Future application
Also see [this jupyter
notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb)
doc.
## 1. Technical overview
### CPAL versus PAL
Like [PAL](https://arxiv.org/abs/2211.10435), CPAL intends to reduce
large language model (LLM) hallucination.
The CPAL chain is different from the PAL chain for a couple of reasons.
* CPAL adds a causal structure (or DAG) to link entity actions (or math
expressions).
* The CPAL math expressions are modeling a chain of cause and effect
relations, which can be intervened upon, whereas for the PAL chain math
expressions are projected math identities.
PAL's generated python code is wrong. It hallucinates when complexity
increases.
```python
def solution():
"""Tim buys the same number of pets as Cindy and Boris.Cindy buys the same number of pets as Bill plus Bob.Boris buys the same number of pets as Ben plus Beth.Bill buys the same number of pets as Obama.Bob buys the same number of pets as Obama.Ben buys the same number of pets as Obama.Beth buys the same number of pets as Obama.If Obama buys one pet, how many pets total does everyone buy?"""
obama_pets = 1
tim_pets = obama_pets
cindy_pets = obama_pets + obama_pets
boris_pets = obama_pets + obama_pets
total_pets = tim_pets + cindy_pets + boris_pets
result = total_pets
return result # math result is 5
```
CPAL's generated python code is correct.
```python
story outcome data
name code value depends_on
0 obama pass 1.0 []
1 bill bill.value = obama.value 1.0 [obama]
2 bob bob.value = obama.value 1.0 [obama]
3 ben ben.value = obama.value 1.0 [obama]
4 beth beth.value = obama.value 1.0 [obama]
5 cindy cindy.value = bill.value + bob.value 2.0 [bill, bob]
6 boris boris.value = ben.value + beth.value 2.0 [ben, beth]
7 tim tim.value = cindy.value + boris.value 4.0 [cindy, boris]
query data
{
"question": "how many pets total does everyone buy?",
"expression": "SELECT SUM(value) FROM df",
"llm_error_msg": ""
}
# query result is 13
```
Based on the comments below, CPAL's intended location in the library is
`experimental/chains/cpal` and PAL's location is`chains/pal`.
### CPAL vs Graph QA
Both the CPAL chain and the Graph QA chain extract entity-action-entity
relations into a DAG.
The CPAL chain is different from the Graph QA chain for a few reasons.
* Graph QA does not connect entities to math expressions
* Graph QA does not associate actions in a sequence of dependence.
* Graph QA does not decompose the narrative into these three parts:
1. Story plot or causal model
4. Hypothetical question
5. Hypothetical condition
### Evaluation
Preliminary evaluation on simple math word problems shows that this CPAL
chain generates less hallucination than the PAL chain on answering
questions about a causal narrative. Two examples are in [this jupyter
notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb)
doc.
## 2. Future application
### "Describe as Narrative, Test as Code"
The thesis here is that the Describe as Narrative, Test as Code approach
allows you to represent a causal mental model both as code and as a
narrative, giving you the best of both worlds.
#### Why describe a causal mental mode as a narrative?
The narrative form is quick. At a consensus building meeting, people use
narratives to persuade others of their causal mental model, aka. plan.
You can share, version control and index a narrative.
#### Why test a causal mental model as a code?
Code is testable, complex narratives are not. Though fast, narratives
are problematic as their complexity increases. The problem is LLMs and
humans are prone to hallucination when predicting the outcomes of a
narrative. The cost of building a consensus around the validity of a
narrative outcome grows as its narrative complexity increases. Code does
not require tribal knowledge or social power to validate.
Code is composable, complex narratives are not. The answer of one CPAL
chain can be the hypothetical conditions of another CPAL Chain. For
stochastic simulations, a composable plan can be integrated with the
[DoWhy library](https://github.com/py-why/dowhy). Lastly, for the
futuristic folk, a composable plan as code allows ordinary community
folk to design a plan that can be integrated with a blockchain for
funding.
An explanation of a dependency planning application is
[here.](https://github.com/borisdev/cpal-llm-chain-demo)
---
Twitter handle: @boris_dev
---------
Co-authored-by: Boris Dev <borisdev@Boriss-MacBook-Air.local>
This PR proposes an implementation to support `generate` as an
`early_stopping_method` for the new `OpenAIFunctionsAgent` class.
The motivation behind is to facilitate the user to set a maximum number
of actions the agent can take with `max_iterations` and force a final
response with this new agent (as with the `Agent` class).
The following changes were made:
- The `OpenAIFunctionsAgent.return_stopped_response` method was
overwritten to support `generate` as an `early_stopping_method`
- A boolean `with_functions` parameter was added to the
`OpenAIFunctionsAgent.plan` method
This way the `OpenAIFunctionsAgent.return_stopped_response` method can
call the `OpenAIFunctionsAgent.plan` method with `with_function=False`
when the `early_stopping_method` is set to `generate`, making a call to
the LLM with no functions and forcing a final response from the
`"assistant"`.
- Relevant maintainer: @hinthornw
- Twitter handle: @aledelunap
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Improve documentation for a central use-case, qa / chat over documents.
This will be merged as an update to `index.mdx`
[here](https://python.langchain.com/docs/use_cases/question_answering/).
Testing w/ local Docusaurus server:
```
From `docs` directory:
mkdir _dist
cp -r {docs_skeleton,snippets} _dist
cp -r extras/* _dist/docs_skeleton/docs
cd _dist/docs_skeleton
yarn install
yarn start
```
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.
Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.
After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->
<!-- Remove if not applicable -->
Fixes # (issue)
#### Before submitting
<!-- If you're adding a new integration, please include:
1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use
See contribution guidelines for more information on how to write tests,
lint
etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
#### Who can review?
Tag maintainers/contributors who might be interested:
<!-- For a quicker response, figure out the right person to tag with @
@hwchase17 - project lead
Tracing / Callbacks
- @agola11
Async
- @agola11
DataLoaders
- @eyurtsev
Models
- @hwchase17
- @agola11
Agents / Tools / Toolkits
- @hwchase17
VectorStores / Retrievers / Memory
- @dev2049
-->
1. Added use cases of the new features
2. Done some code refactoring
---------
Co-authored-by: Ivo Stranic <istranic@gmail.com>
### Description
Created a Loader to get a list of specific logs from Datadog Logs.
### Dependencies
`datadog_api_client` is required.
### Twitter handle
[kzk_maeda](https://twitter.com/kzk_maeda)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source
computing framework that makes it easy to scale data science and machine
learning workloads in parallel. Xorbits can leverage multi cores or GPUs
to accelerate computation on a single machine, or scale out up to
thousands of machines to support processing terabytes of data.
- This PR added support for the Xorbits document loader, which allows
langchain to leverage Xorbits to parallelize and distribute the loading
of data.
- Dependencies: This change requires the Xorbits library to be installed
in order to be used.
`pip install xorbits`
- Request for review: @rlancemartin, @eyurtsev
- Twitter handle: https://twitter.com/Xorbitsio
Co-authored-by: Bagatur <baskaryan@gmail.com>
Adding a maximal_marginal_relevance method to the
MongoDBAtlasVectorSearch vectorstore enhances the user experience by
providing more diverse search results
Issue: #7304
### Summary
Adds an `UnstructuredTSVLoader` for TSV files. Also updates the doc
strings for `UnstructuredCSV` and `UnstructuredExcel` loaders.
### Testing
```python
from langchain.document_loaders.tsv import UnstructuredTSVLoader
loader = UnstructuredTSVLoader(
file_path="example_data/mlb_teams_2012.csv", mode="elements"
)
docs = loader.load()
```
Hey @hwchase17 -
This PR adds a `ZepMemory` class, improves handling of Zep's message
metadata, and makes it easier for folks building custom chains to
persist metadata alongside their chat history.
We've had plenty confused users unfamiliar with ChatMessageHistory
classes and how to wrap the `ZepChatMessageHistory` in a
`ConversationBufferMemory`. So we've created the `ZepMemory` class as a
light wrapper for `ZepChatMessageHistory`.
Details:
- add ZepMemory, modify notebook to demo use of ZepMemory
- Modify summary to be SystemMessage
- add metadata argument to add_message; add Zep metadata to
Message.additional_kwargs
- support passing in metadata
### Description
Adding a callback handler for Context. Context is a product analytics
platform for AI chat experiences to help you understand how users are
interacting with your product.
I've added the callback library + an example notebook showing its use.
### Dependencies
Requires the user to install the `context-python` library. The library
is lazily-loaded when the callback is instantiated.
### Announcing the feature
We spoke with Harrison a few weeks ago about also doing a blog post
announcing our integration, so will coordinate this with him. Our
Twitter handle for the company is @getcontextai, and the founders are
@_agamble and @HenrySG.
Thanks in advance!
Continuing with Tolkien inspired series of langchain tools. I bring to
you:
**The Fellowship of the Vectors**, AKA EmbeddingsClusteringFilter.
This document filter uses embeddings to group vectors together into
clusters, then allows you to pick an arbitrary number of documents
vector based on proximity to the cluster centers. That's a
representative sample of the cluster.
The original idea is from [Greg Kamradt](https://github.com/gkamradt)
from this video (Level4):
https://www.youtube.com/watch?v=qaPMdcCqtWk&t=365s
I added few tricks to make it a bit more versatile, so you can
parametrize what to do with duplicate documents in case of cluster
overlap: replace the duplicates with the next closest document or remove
it. This allow you to use it as an special kind of redundant filter too.
Additionally you can choose 2 diff orders: grouped by cluster or
respecting the original retriever scores.
In my use case I was using the docs grouped by cluster to run refine
chains per cluster to generate summarization over a large corpus of
documents.
Let me know if you want to change anything!
@rlancemartin, @eyurtsev, @hwchase17,
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
This PR improves the example notebook for the Marqo vectorstore
implementation by adding a new RetrievalQAWithSourcesChain example. The
`embedding` parameter in `from_documents` has its type updated to
`Union[Embeddings, None]` and a default parameter of None because this
is ignored in Marqo.
This PR also upgrades the Marqo version to 0.11.0 to remove the device
parameter after a breaking change to the API.
Related to #7068 @tomhamer @hwchase17
---------
Co-authored-by: Tom Hamer <tom@marqo.ai>
This PR improves upon the Clarifai LangChain integration with improved docs, errors, args and the addition of embedding model support in LancChain for Clarifai's embedding models and an overview of the various ways you can integrate with Clarifai added to the docs.
---------
Co-authored-by: Matthew Zeiler <zeiler@clarifai.com>
updated `tutorials.mdx`:
- added a link to new `Deeplearning AI` course on LangChain
- added links to other tutorial videos
- fixed format
@baskaryan, @hwchase17
Based on user feedback, we have improved the Alibaba Cloud OpenSearch
vector store documentation.
Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>
Replace this comment with:
- Description: added documentation for a template repo that helps
dockerizing and deploying a LangChain using a Cloud Build CI/CD pipeline
to Google Cloud build serverless
- Issue: None,
- Dependencies: None,
- Tag maintainer: @baskaryan,
- Twitter handle: EdenEmarco177
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
**Description**
In the following page, "Wikipedia" tool is explained.
https://python.langchain.com/docs/modules/agents/tools/integrations/wikipedia
However, the WikipediaAPIWrapper being used is not a tool. This PR
updated the documentation to use a tool WikipediaQueryRun.
**Issue**
None
**Tag maintainer**
Agents / Tools / Toolkits: @hinthornw
- Description: This is a chat model equivalent of HumanInputLLM. An
example notebook is also added.
- Tag maintainer: @hwchase17, @baskaryan
- Twitter handle: N/A
Description: `flan-t5-xl` hangs, updated to `flan-t5-xxl`. Tested all
stabilityai LLMs- all hang so removed from tutorial. Temperature > 0 to
prevent unintended determinism.
Issue: #3275
Tag maintainer: @baskaryan
### Description
This pull request introduces the "Cube Semantic Layer" document loader,
which demonstrates the retrieval of Cube's data model metadata in a
format suitable for passing to LLMs as embeddings. This enhancement aims
to provide contextual information and improve the understanding of data.
Twitter handle:
@the_cube_dev
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
This PR brings in a vectorstore interface for
[Marqo](https://www.marqo.ai/).
The Marqo vectorstore exposes some of Marqo's functionality in addition
the the VectorStore base class. The Marqo vectorstore also makes the
embedding parameter optional because inference for embeddings is an
inherent part of Marqo.
Docs, notebook examples and integration tests included.
Related PR:
https://github.com/hwchase17/langchain/pull/2807
---------
Co-authored-by: Tom Hamer <tom@marqo.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
added tutorials.mdx; updated youtube.mdx
Rationale: the Tutorials section in the documentation is top-priority.
(for example, https://pytorch.org/docs/stable/index.html) Not every
project has resources to make tutorials. We have such a privilege.
Community experts created several tutorials on YouTube. But the tutorial
links are now hidden on the YouTube page and not easily discovered by
first-time visitors.
- Added new videos and tutorials that were created since the last
update.
- Made some reprioritization between videos on the base of the view
numbers.
#### Who can review?
- @hwchase17
- @dev2049
- Description: added some documentation to the Pinecone vector store
docs page.
- Issue: #7126
- Dependencies: None
- Tag maintainer: @baskaryan
I can add more documentation on the Pinecone integration functions as I
am going to go in great depth into this area. Just wanted to check with
the maintainers is if this is all good.
# [SPARQL](https://www.w3.org/TR/rdf-sparql-query/) for
[LangChain](https://github.com/hwchase17/langchain)
## Description
LangChain support for knowledge graphs relying on W3C standards using
RDFlib: SPARQL/ RDF(S)/ OWL with special focus on RDF \
* Works with local files, files from the web, and SPARQL endpoints
* Supports both SELECT and UPDATE queries
* Includes both a Jupyter notebook with an example and integration tests
## Contribution compared to related PRs and discussions
* [Wikibase agent](https://github.com/hwchase17/langchain/pull/2690) -
uses SPARQL, but specifically for wikibase querying
* [Cypher qa](https://github.com/hwchase17/langchain/pull/5078) - graph
DB question answering for Neo4J via Cypher
* [PR 6050](https://github.com/hwchase17/langchain/pull/6050) - tries
something similar, but does not cover UPDATE queries and supports only
RDF
* Discussions on [w3c mailing list](mailto:semantic-web@w3.org) related
to the combination of LLMs (specifically ChatGPT) and knowledge graphs
## Dependencies
* [RDFlib](https://github.com/RDFLib/rdflib)
## Tag maintainer
Graph database related to memory -> @hwchase17
Fix for typos in MongoDB Atlas Vector Search documentation
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
Hi @rlancemartin, @eyurtsev!
- Description: Adding HNSW extension support for Postgres. Similar to
pgvector vectorstore, with 3 differences
1. it uses HNSW extension for exact and ANN searches,
2. Vectors are of type array of real
3. Only supports L2
- Dependencies: [HNSW](https://github.com/knizhnik/hnsw) extension for
Postgres
- Example:
```python
db = HNSWVectoreStore.from_documents(
embedding=embeddings,
documents=docs,
collection_name=collection_name,
connection_string=connection_string
)
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score: List[Tuple[Document, float]] =
db.similarity_search_with_score(query)
```
The example notebook is in the PR too.
Documentation update for [Jina
ecosystem](https://python.langchain.com/docs/ecosystem/integrations/jina)
and `langchain-serve` in the deployments section to latest features.
@hwchase17
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
[Apache HugeGraph](https://github.com/apache/incubator-hugegraph) is a
convenient, efficient, and adaptable graph database, compatible with the
Apache TinkerPop3 framework and the Gremlin query language.
In this PR, the HugeGraph and HugeGraphQAChain provide the same
functionality as the existing integration with Neo4j and enables query
generation and question answering over HugeGraph database. The
difference is that the graph query language supported by HugeGraph is
not cypher but another very popular graph query language
[Gremlin](https://tinkerpop.apache.org/gremlin.html).
A notebook example and a simple test case have also been added.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This PR introduces a new Mendable UI tailored to a better search
experience.
We're more closely integrating our traditional search with our AI
generation.
With this change, you won't have to tab back and forth between the
mendable bot and the keyword search. Both types of search are handled in
the same bar. This should make the docs easier to navigate. while still
letting users get code generations or AI-summarized answers if they so
wish. Also, it should reduce the cost.
Would love to hear your feedback :)
Cc: @dev2049 @hwchase17
- Description: Added a new SpacyEmbeddings class for generating
embeddings using the Spacy library.
- Issue: Sentencebert/Bert/Spacy/Doc2vec embedding support #6952
- Dependencies: This change requires the Spacy library and the
'en_core_web_sm' Spacy model.
- Tag maintainer: @dev2049
- Twitter handle: N/A
This change includes a new SpacyEmbeddings class, but does not include a
test or an example notebook.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description**:
The JSON Lines format is used by some services such as OpenAI and
HuggingFace. It's also a convenient alternative to CSV.
This PR adds JSON Lines support to `JSONLoader` and also updates related
tests.
**Tag maintainer**: @rlancemartin, @eyurtsev.
PS I was not able to build docs locally so didn't update related
section.
Update to Vectara integration
- By user request added "add_files" to take advantage of Vectara
capabilities to process files on the backend, without the need for
separate loading of documents and chunking in the chain.
- Updated vectara.ipynb example notebook to be broader and added testing
of add_file()
@hwchase17 - project lead
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
Retrying with the same improvements as in #6772, this time trying not to
mess up with branches.
@rlancemartin doing a fresh new PR from a branch with a new name. This
should do. Thank you for your help!
---------
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
Co-authored-by: rlm <pexpresss31@gmail.com>
### Summary
Updates `UnstructuredEmailLoader` so that it can process attachments in
addition to the e-mail content. The loader will process attachments if
the `process_attachments` kwarg is passed when the loader is
instantiated.
### Testing
```python
file_path = "fake-email-attachment.eml"
loader = UnstructuredEmailLoader(
file_path, mode="elements", process_attachments=True
)
docs = loader.load()
docs[-1]
```
### Reviewers
- @rlancemartin
- @eyurtsev
- @hwchase17
Handle the new retriever events in a way that (I think) is entirely
backwards compatible? Needs more testing for some of the chain changes
and all.
This creates an entire new run type, however. We could also just treat
this as an event within a chain run presumably (same with memory)
Adds a subclass initializer that upgrades old retriever implementations
to the new schema, along with tests to ensure they work.
First commit doesn't upgrade any of our retriever implementations (to
show that we can pass the tests along with additional ones testing the
upgrade logic).
Second commit upgrades the known universe of retrievers in langchain.
- [X] Add callback handling methods for retriever start/end/error (open
to renaming to 'retrieval' if you want that)
- [X] Update BaseRetriever schema to support callbacks
- [X] Tests for upgrading old "v1" retrievers for backwards
compatibility
- [X] Update existing retriever implementations to implement the new
interface
- [X] Update calls within chains to .{a]get_relevant_documents to pass
the child callback manager
- [X] Update the notebooks/docs to reflect the new interface
- [X] Test notebooks thoroughly
Not handled:
- Memory pass throughs: retrieval memory doesn't have a parent callback
manager passed through the method
---------
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>