begining -> beginning
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
## Description
This commit introduces the `DropboxLoader` class, a new document loader
that allows loading files from Dropbox into the application. The loader
relies on a Dropbox app, which requires creating an app on Dropbox,
obtaining the necessary scope permissions, and generating an access
token. Additionally, the dropbox Python package is required.
The `DropboxLoader` class is designed to be used as a document loader
for processing various file types, including text files, PDFs, and
Dropbox Paper files.
## Dependencies
`pip install dropbox` and `pip install unstructured` for PDF reading.
## Tag maintainer
@rlancemartin, @eyurtsev (from Data Loaders). I'd appreciate some
feedback here 🙏 .
## Social Networks
https://github.com/rubenbarragan
https://www.linkedin.com/in/rgbarragan/
https://twitter.com/RubenBarraganP
---------
Co-authored-by: Ruben Barragan <rbarragan@Rubens-MacBook-Air.local>
# [WIP] Tree of Thought introducing a new ToTChain.
This PR adds a new chain called ToTChain that implements the ["Large
Language Model Guided
Tree-of-Thought"](https://arxiv.org/pdf/2305.08291.pdf) paper.
There's a notebook example `docs/modules/chains/examples/tot.ipynb` that
shows how to use it.
Implements #4975
## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
- @hwchase17
- @vowelparrot
---------
Co-authored-by: Vadim Gubergrits <vgubergrits@outbox.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
This PR introduces async API support for Cohere, both LLM and
embeddings. It requires updating `cohere` package to `^4`.
Tagging @hwchase17, @baskaryan, @agola11
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Given a user question, this will -
* Use LLM to generate a set of queries.
* Query for each.
* The URLs from search results are stored in self.urls.
* A check is performed for any new URLs that haven't been processed yet
(not in self.url_database).
* Only these new URLs are loaded, transformed, and added to the
vectorstore.
* The vectorstore is queried for relevant documents based on the
questions generated by the LLM.
* Only unique documents are returned as the final result.
This code will avoid reprocessing of URLs across multiple runs of
similar queries, which should improve the performance of the retriever.
It also keeps track of all URLs that have been processed, which could be
useful for debugging or understanding the retriever's behavior.
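
The URL bookkeeping described above can be sketched in plain Python. This is a minimal illustration, not the retriever's actual code: `UrlTracker`, `new_urls`, and `unique_documents` are hypothetical names, and documents are represented as plain strings for simplicity.

```python
from typing import Iterable, List, Set


class UrlTracker:
    """Hypothetical helper that remembers which URLs were already processed."""

    def __init__(self) -> None:
        # Plays the role of `self.url_database` in the description above.
        self.url_database: Set[str] = set()

    def new_urls(self, urls: Iterable[str]) -> List[str]:
        """Return unseen URLs in first-seen order and mark them as processed."""
        fresh: List[str] = []
        for url in urls:
            if url not in self.url_database:
                self.url_database.add(url)
                fresh.append(url)
        return fresh


def unique_documents(docs: Iterable[str]) -> List[str]:
    """Drop duplicate documents while keeping the original order."""
    return list(dict.fromkeys(docs))
```

Only the list returned by `new_urls` would need to be loaded, transformed, and added to the vectorstore; URLs seen in an earlier run are skipped.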
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- Until now, hybrid search was limited to modules requiring external
services, such as Weaviate/Pinecone Hybrid Search. However, I have
developed a hybrid retriever that can merge a list of retrievers using
the [Reciprocal Rank
Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf)
algorithm. This new approach, similar to Weaviate hybrid search, does
not require the initialization of any external service.
- Dependencies: None
- Twitter handle: dayuanjian21687
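
For reference, Reciprocal Rank Fusion itself is small enough to sketch in a few lines. The function below is an illustrative standalone version, not the retriever class added in this PR; the name `reciprocal_rank_fusion` is hypothetical, and the constant `k = 60` is the value suggested in the linked paper. Each input ranking is assumed to list a document at most once.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Merge several ranked lists into one using Reciprocal Rank Fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sort is stable).
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)
```

Because only ranks are used, the fused retrievers do not need comparable score scales, which is what makes the approach work across keyword and vector retrievers.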
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
New HTML loader that asynchronously loads a list of URLs.
New transformer using [HTML2Text](https://github.com/Alir3z4/html2text/)
to convert HTML to clean, easy-to-read plain ASCII text (valid Markdown).
In certain 0-shot scenarios, the existing stateful language model can
unintentionally send/accumulate the .history.
This commit adds the "with_history" option to chatglm, allowing users to
control the behavior of .history and prevent unintended accumulation.
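
The effect of such a flag can be illustrated with a toy client. This is a sketch under assumptions, not the chatglm wrapper's code: `StatefulChat`, `build_payload`, and `record` are hypothetical names, and the payload shape is invented for the example.

```python
from typing import Dict, List


class StatefulChat:
    """Toy stand-in for a stateful chat client (hypothetical)."""

    def __init__(self, with_history: bool = True) -> None:
        self.with_history = with_history
        self.history: List[List[str]] = []

    def build_payload(self, prompt: str) -> Dict[str, object]:
        """Send prior turns only when history is enabled."""
        history = list(self.history) if self.with_history else []
        return {"prompt": prompt, "history": history}

    def record(self, prompt: str, response: str) -> None:
        """Accumulate a finished turn only when history is enabled."""
        if self.with_history:
            self.history.append([prompt, response])
```

With `with_history=False`, every call is effectively zero-shot: nothing is sent from earlier turns and nothing accumulates.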
Possible reviewers @hwchase17 @baskaryan @mlot
Refer to discussion over this thread:
https://twitter.com/wey_gu/status/1681996149543276545?s=20
I've extended the support of async API to local Qdrant mode. It is faked
but allows prototyping without spinning a container. The tests are
improved to test the in-memory case as well.
@baskaryan @rlancemartin @eyurtsev @agola11
Streaming support is useful if you are doing long-running completions or
need interactivity e.g. for chat... adding it to replicate, using a
similar pattern to other LLMs that support streaming.
Housekeeping: I ran `make format` and `make lint`, no issues reported in
the files I touched.
I did update the replicate integration test but ran into some issues,
specifically:
1. The original test was failing for me due to the model argument not
being specified... perhaps this test is not regularly run? I fixed it by
adding a call to the lightweight hello world model which should not be
burdensome for replicate infra.
2. I couldn't get the `make integration_tests` command to pass... a lot
of failures in other integration tests due to missing dependencies...
however I did make sure the particular test file I updated does pass, by
running `poetry run pytest
tests/integration_tests/llms/test_replicate.py`
Finally, I am @tjaffri https://twitter.com/tjaffri for feature
announcement tweets... or if you could please tag @docugami
https://twitter.com/docugami we would really appreciate that :-)
Tagging model maintainers @hwchase17 @baskaryan
Thanks for all the awesome work you folks are doing.
---------
Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
## Description
This PR adds a graph class and an openCypher QA chain to work with the
Amazon Neptune database.
## Dependencies
`requests` which is included in the LangChain dependencies.
## Maintainers for Review
@krlawrence
@baskaryan
### Twitter handle
pjain7
BedrockEmbeddings does not have an `endpoint_url` parameter, so switching
to a custom endpoint is not possible. I have access to a Bedrock custom
endpoint and cannot use BedrockEmbeddings with it.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Golden Query is a wrapper on top of the [Golden Query
API](https://docs.golden.com/reference/query-api) which enables
programmatic access to query results on entities across Golden's
Knowledge Base. For more information about Golden API, please see the
[Golden API Getting
Started](https://docs.golden.com/reference/getting-started) page.
**Issue:** None
**Dependencies:** requests(already present in project)
**Tag maintainer:** @hinthornw
Signed-off-by: Constantin Musca <constantin.musca@gmail.com>
## Background
With the addition of email and calendar tools, LangChain is continuing
to complete its functionality to automate business processes.
## Challenge
One of the pieces of business functionality that LangChain currently
doesn't have is the ability to search for flights and travel in order to
book business travel.
## Changes
This PR implements an integration with the
[Amadeus](https://developers.amadeus.com/) travel search API for
LangChain, enabling seamless search for flights with a single
authentication process.
## Who can review?
@hinthornw
## Appendix
@tsolakoua and @minjikarin, I utilized your
[amadeus-python](https://github.com/amadeus4dev/amadeus-python) library
extensively. Given the rising popularity of LangChain and similar AI
frameworks, the convergence of libraries like amadeus-python and tools
like this one is likely. So, I wanted to keep you updated on our
progress.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Work in Progress.
WIP
Not ready...
Adds Document Loader support for
[Geopandas.GeoDataFrames](https://geopandas.org/)
Example:
- [x] stub out `GeoDataFrameLoader` class
- [x] stub out integration tests
- [ ] Experiment with different geometry text representations
- [ ] Verify CRS is successfully added in metadata
- [ ] Test effectiveness of searches on geometries
- [ ] Test with different geometry types (point, line, polygon with
multi-variants).
- [ ] Add documentation
---------
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com>
** This should land Monday the 17th **
Chroma is upgrading from `0.3.29` to `0.4.0`. `0.4.0` is easier to
build, more durable, faster, smaller, and more extensible. This comes
with a few changes:
1. A simplified and improved client setup. Instead of having to remember
weird settings, users can just do `EphemeralClient`, `PersistentClient`
or `HttpClient` (the underlying direct `Client` implementation is also
still accessible)
2. We migrated data stores away from `duckdb` and `clickhouse`. This
changes the api for the `PersistentClient` that used to reference
`chroma_db_impl="duckdb+parquet"`. Now we simply set
`is_persistent=true`. `is_persistent` is set for you to `true` if you
use `PersistentClient`.
3. Because we migrated away from `duckdb` and `clickhouse` - this also
means that users need to migrate their data into the new layout and
schema. Chroma is committed to providing extensive notification and
tooling around any schema and data migrations (for example - this PR!).
After upgrading to `0.4.0` - if users try to access their data that was
stored in the previous regime, the system will throw an `Exception` and
instruct them how to use the migration assistant to migrate their data.
The migration assistant is a pip-installable CLI: `pip install
chroma_migrate`. It is run by calling `chroma_migrate`
-- TODO ADD here is a short video demonstrating how it works.
Please reference the readme at
[chroma-core/chroma-migrate](https://github.com/chroma-core/chroma-migrate)
to see a full write-up of our philosophy on migrations as well as more
details about this particular migration.
Please direct any users facing issues upgrading to our Discord channel
called
[#get-help](https://discord.com/channels/1073293645303795742/1129200523111841883).
We have also created an [email
listserv](https://airtable.com/shrHaErIs1j9F97BE) to notify developers
directly in the future about breaking changes.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Moving to the latest non-preview Azure OpenAI API version=2023-05-15.
The previous 2023-03-15-preview doesn't have support, SLA etc. For
instance, OpenAI SDK has moved to this version
https://github.com/openai/openai-python/releases/tag/v0.27.7
@baskaryan
Description:
Currently, Zilliz only supports dedicated clusters, which connect using a
username and password pair. Serverless clusters, by contrast, are
accessed with API keys ([see the official
note](https://docs.zilliz.com/docs/manage-cluster-credentials)), so I
added a description of the API key (token) to the Zilliz docs to make
this more obvious and convenient for that group of users. No
code changes.
---------
Co-authored-by: Robin.Wang <3Jg$94sbQ@q1>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Azure GPT-4 models can't be accessed via LLM model. It's easy to miss
that and a lot of discussions about that are on the Internet. Therefore
I added a comment in Azure LLM docs that mentions that and points to
Azure Chat OpenAI docs.
@baskaryan
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description: This PR adds the option to retrieve scores and explanations
in the WeaviateHybridSearchRetriever. This feature improves the
usability of the retriever by allowing users to understand the scoring
logic behind the search results and further refine their search queries.
Issue: This PR is a solution to the issue #7855
Dependencies: This PR does not introduce any new dependencies.
Tag maintainer: @rlancemartin, @eyurtsev
I have included a unit test for the added feature, ensuring that it
retrieves scores and explanations correctly. I have also included an
example notebook demonstrating its use.
Here I am adding documentation for the `PromptLayerCallbackHandler`.
When we created the initial PR for the callback handler the docs were
causing issues, so we merged without the docs.
Motivation: it seems that when dealing with a long context and a large
number of relevant documents, we must avoid using the out-of-the-box score
ordering from vector stores.
See: https://arxiv.org/pdf/2306.01150.pdf
So, I added an additional parameter that allows you to reorder the
retrieved documents so we can work around this performance degradation.
The reordering respects the original search score but places the
least relevant documents in the middle of the context.
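
One way to realize such a reordering is sketched below. It is an illustration of the idea only and may differ from the implementation in this PR; the function name `reorder_for_long_context` is hypothetical, and the input is assumed to be sorted from most to least relevant.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_for_long_context(docs: Sequence[T]) -> List[T]:
    """Place the most relevant items at both ends and the least relevant in the middle.

    `docs` must be ordered from most to least relevant.
    """
    front: List[T] = []
    back: List[T] = []
    for index, doc in enumerate(docs):
        # Alternate between the start and the end of the final context.
        if index % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    # `back` is reversed so relevance increases again towards the end.
    return front + back[::-1]
```

For five documents ranked 1 (best) to 5 (worst), the result is `[1, 3, 5, 4, 2]`: the weakest document sits in the middle, where the paper reports that models attend least.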
Extract from the paper (one image speaks 1000 tokens):
![image](https://github.com/hwchase17/langchain/assets/1821407/fafe4843-6e18-4fa6-9416-50cc1d32e811)
This seems to be common to all the different architectures, so I think we need a
good generic way to implement this reordering and run some tests on our
already running retrievers.
It could be that my approach is not the best one from the architecture
point of view, happy to have a discussion about that.
For me this was the best place to introduce the change and start
retesting diff implementations.
@rlancemartin, @eyurtsev
---------
Co-authored-by: Lance Martin <lance@langchain.dev>
Still don't have good "how to's", and the guides / examples section
could be further pruned and improved, but this PR adds a couple examples
for each of the common evaluator interfaces.
- [x] Example docs for each implemented evaluator
- [x] "how to make a custom evaluator" notebook for each low-level API
(comparison, string, agent)
- [x] Move docs to modules area
- [x] Link to reference docs for more information
- [X] Still need to finish the evaluation index page
- ~[ ] Don't have good data generation section~
- ~[ ] Don't have good how to section for other common scenarios / FAQs
like regression testing, testing over similar inputs to measure
sensitivity, etc.~
- Description: Add a BM25 Retriever that does not need Elasticsearch
- Dependencies: rank_bm25 (if it is not installed, it will be installed
using pip, just as TFIDFRetriever does)
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: DayuanJian21687
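
To show what BM25 ranking computes without any search server, here is a self-contained sketch of Okapi BM25 scoring. It is not the code of this retriever or of `rank_bm25`; the function name `bm25_scores`, the defaults `k1 = 1.5` and `b = 0.75`, and the non-negative IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)` are choices made for the example.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], corpus: List[List[str]], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Score every tokenized document in `corpus` against a tokenized query."""
    if not corpus:
        return []
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))

    scores: List[float] = []
    for doc in corpus:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            n = doc_freq[term]
            idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)
            # Length normalization: long documents are penalized via `b`.
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Sorting documents by these scores in descending order gives the retrieval ranking; everything runs in memory, which is why no Elasticsearch instance is needed.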
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description:
Add LLM for ChatGLM-6B & ChatGLM2-6B API
Related Issue:
Will the langchain support ChatGLM? #4766
Add support for selfhost models like ChatGLM or transformer models #1780
Dependencies:
No extra library install required.
It wraps API calls to a ChatGLM(2)-6B server (started with api.py), so an API
endpoint is required to run.
Tag maintainer: @mlot
Any comments on this PR would be appreciated.
---------
Co-authored-by: mlot <limpo2000@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
# Support Redis Sentinel database connections
This PR adds support for connecting not only to standalone Redis servers
but also to high-availability replica sets
(https://redis.io/docs/management/sentinel/)
Redis replica sets have one master that accepts writes and 2+ replicas
with read-only access to the data. The additional Redis Sentinel
instances monitor all servers and reconfigure the read-write master on the
fly if it becomes unavailable.
Therefore, all connections must be made through the Sentinels, which are
queried for the current master to obtain a read-write connection. This PR
adds basic support for a Redis connection URL that specifies a Sentinel as
the Redis connection.
Redis documentation and Jupyter notebook with Redis examples are updated
to mention how to connect to a redis Replica Set with Sentinels
Remark: I did not find existing test cases for Redis server connections to
extend. Therefore, I tested the new utility class locally with
different kinds of setups to make sure the different connection URLs
work as expected. This PR does not include test cases.
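
As an illustration of what a Sentinel connection URL has to carry, the sketch below parses one with the standard library. The `redis+sentinel://host:port/service/db` layout, the defaults, and the names `SentinelTarget` and `parse_sentinel_url` are assumptions made for this example and are not taken from the PR.

```python
from typing import NamedTuple, Tuple
from urllib.parse import urlparse


class SentinelTarget(NamedTuple):
    """Pieces needed to ask a Sentinel for the current master (hypothetical)."""

    sentinel: Tuple[str, int]
    service_name: str
    db: int


def parse_sentinel_url(url: str) -> SentinelTarget:
    """Parse an assumed `redis+sentinel://host:port/service/db` URL."""
    parsed = urlparse(url)
    if parsed.scheme != "redis+sentinel":
        raise ValueError(f"not a Sentinel URL: {url!r}")
    host = parsed.hostname or "localhost"
    port = parsed.port or 26379  # conventional Sentinel port
    parts = [part for part in parsed.path.split("/") if part]
    service_name = parts[0] if parts else "mymaster"
    db = int(parts[1]) if len(parts) > 1 else 0
    return SentinelTarget((host, port), service_name, db)
```

With redis-py, the parsed values could then be handed to `redis.sentinel.Sentinel([target.sentinel]).master_for(target.service_name)`, which asks the Sentinel for the current master and returns a read-write client.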
- [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source
computing framework that makes it easy to scale data science and machine
learning workloads in parallel. Xorbits can leverage multi cores or GPUs
to accelerate computation on a single machine, or scale out up to
thousands of machines to support processing terabytes of data.
- This PR added support for the Xorbits agent, which allows langchain to
interact with Xorbits Pandas dataframe and Xorbits Numpy array.
- Dependencies: This change requires the Xorbits library to be installed
in order to be used.
`pip install xorbits`
- Request for review: @hinthornw
- Twitter handle: https://twitter.com/Xorbitsio
Integrate [Rockset](https://rockset.com/docs/) as a document loader.
Issue: None
Dependencies: Nothing new (rockset's dependency was already added
[here](https://github.com/hwchase17/langchain/pull/6216))
Tag maintainer: @rlancemartin
I have added a test for the integration and an example notebook showing
its use. I ran `make lint` and everything looks good.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This pull request adds an ElasticsearchDatabaseChain chain for
interacting with analytics databases, in the manner of the
SQLDatabaseChain.
Maintainer: @samber
Twitter handle: samuelberthe
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Add `langchain.llms.Tongyi` for text completion, with examples for the
Tongyi Text API,
- Add system tests.
Note async completion for the Text API is not yet supported and will be
included in a future PR.
Dependencies: dashscope. It must be installed manually because it is not
needed by everyone.
Happy for feedback on any aspect of this PR @hwchase17 @baskaryan.
Multiple people have asked in #5081 for a way to limit the documents
returned from an AzureCognitiveSearchRetriever. This PR adds the `top_n`
parameter to allow that.
Twitter handle:
[@UmerHAdil](twitter.com/umerHAdil)
# Browserless
Added support for Browserless' `/content` endpoint as a document loader.
### About Browserless
Browserless is a cloud service that provides access to headless Chrome
browsers via a REST API. It allows developers to automate Chromium in a
serverless fashion without having to configure and maintain their own
Chrome infrastructure.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Lance Martin <lance@langchain.dev>
This PR is aimed at enhancing the clarity of the documentation in the
langchain project.
**Description**:
In the graphql.ipynb file, I have removed the unnecessary 'llm' argument
from the initialization process of the GraphQL tool (of type
_EXTRA_OPTIONAL_TOOLS). The 'llm' argument is not required for this
process. Its presence could potentially confuse users. This modification
simplifies the understanding of tool initialization and minimizes
potential confusion.
**Issue**: Not applicable, as this is a documentation improvement.
**Dependencies**: None.
**I kindly request a review from the following maintainer**: @hinthornw,
who is responsible for Agents / Tools / Toolkits.
No new integration is being added in this PR, hence no need for a test
or an example notebook.
Please see the changes for more detail and let me know if any further
modification is necessary.
Added a fix to avoid irrelevant attributes being returned, plus an example
of extracting unrelated entities and an example of using an 'extra_info'
attribute to extract unstructured data for an entity.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: Add two new document transformers that translate
documents into different languages and convert documents into Q&A
format to improve vector search results. Uses OpenAI function calling
via the [doctran](https://github.com/psychic-api/doctran/tree/main)
library.
- Issue: N/A
- Dependencies: `doctran = "^0.0.5"`
- Tag maintainer: @rlancemartin @eyurtsev @hwchase17
- Twitter handle: @psychicapi or @jfan001
Notes
- Adheres to the `DocumentTransformer` abstraction set by @dev2049 in
#3182
- refactored `EmbeddingsRedundantFilter` to put it in a file under a new
`document_transformers` module
- Added basic docs for `DocumentInterrogator`, `DocumentTransformer` as
well as the existing `EmbeddingsRedundantFilter`
---------
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Probably the most boring PR to review ;)
Individual commits might be easier to digest
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- Description: Adds a new chain that acts as a wrapper around Sympy to
give LLMs the ability to do some symbolic math.
- Dependencies: SymPy
---------
Co-authored-by: sreiswig <sreiswig@github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: add wrapper that lets you use KoboldAI api in langchain
- Issue: n/a
- Dependencies: none extra, just what exists in langchain
- Tag maintainer: @baskaryan
- Twitter handle: @zanzibased
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description: a description of the change**
Fixed `make docs_build` and related scripts which caused errors. There
are several changes.
First, I made the build of the documentation and the API Reference into
two separate commands. This is because it takes less time to build. The
commands for documents are `make docs_build`, `make docs_clean`, and
`make docs_linkcheck`. The commands for API Reference are `make
api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`.
It looked like `docs/.local_build.sh` could be used to build the
documentation, so I used that. Since `.local_build.sh` was also building
API Reference internally, I removed that step. I also added some Bash
options to `.local_build.sh` so that it stops on errors. Furthermore, I
added `cd "${SCRIPT_DIR}"` at the beginning so that the script works no
matter which directory it is executed from.
`docs/api_reference/api_reference.rst` is removed because it is
generated by `docs/api_reference/create_api_rst.py`, and I added it to
.gitignore.
Finally, the description of CONTRIBUTING.md was modified.
**Issue: the issue # it fixes (if applicable)**
https://github.com/hwchase17/langchain/issues/6413
**Dependencies: any dependencies required for this change**
`nbdoc` was missing in group docs so it was added. I installed it with
the `poetry add --group docs nbdoc` command. I am concerned if any
modifications are needed to poetry.lock. I would greatly appreciate it
if you could pay close attention to this file during the review.
**Tag maintainer**
- General / Misc / if you don't know who to tag: @baskaryan
If this PR needs any additional changes, I'll be happy to make them!
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description: I added an example of how to reference the OpenAI API
Organization ID, because I couldn't find it before. The example shows
how to set it using environment variables as well as
parameters of the `OpenAI()` class.
Issue: -
Dependencies: -
Twitter @schop-rob
This PR changes the behavior of `Qdrant.from_texts` so that the existing
collection is reused unless recreation is explicitly requested. Previously,
calling `Qdrant.from_texts` or `Qdrant.from_documents` removed the
old data, which was confusing for many users.
- Description: Added notebook to LangChain docs that explains how to use
Lemon AI NLP Workflow Automation tool with Langchain
- Issue: not applicable
- Dependencies: not applicable
- Tag maintainer: @agola11
- Twitter handle: felixbrockm