`sql_database.py` is unnecessarily placed in the root code folder.
Similar code is usually placed in `utilities/`.
As a byproduct of this placement, `sql_database` is [placed at the top
level of classes in the API
Reference](https://api.python.langchain.com/en/latest/api_reference.html#module-langchain.sql_database),
which is confusing and incorrect.
- moved the `sql_database.py` from the root code folder to the
`utilities/`
@baskaryan
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Fixed the bug causing: `TypeError: generate() got multiple values for
keyword argument 'stop_sequences'`
```python
res = await self.async_client.generate(
prompt,
**self._default_params,
stop_sequences=stop,
**kwargs,
)
```
The above throws an error because `stop_sequences` is also in
`self._default_params`.
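The failure can be reproduced without the real client. The sketch below uses a stand-in `generate` function and a `build_params` helper (both hypothetical, not the actual client API) to show why the call raises and one possible way around it: merging everything into a single dict so each key appears only once. Letting an explicit `stop` override the default is an assumption here, not necessarily what the fix does.

```python
from typing import Any, Dict, List, Optional


def generate(prompt: str, **params: Any) -> Dict[str, Any]:
    """Stand-in for the client call; it just echoes the parameters it received."""
    return {"prompt": prompt, **params}


def build_params(
    default_params: Dict[str, Any],
    stop: Optional[List[str]] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Merge defaults, call-time kwargs, and `stop` so each key appears once."""
    params = {**default_params, **kwargs}
    if stop is not None:
        # Assumption: an explicit `stop` takes precedence over the default.
        params["stop_sequences"] = stop
    return params


default_params = {"temperature": 0.7, "stop_sequences": None}

# Passing the same keyword twice reproduces the reported TypeError.
try:
    generate("hi", **default_params, stop_sequences=["\n"])
    duplicate_error = None
except TypeError as exc:
    duplicate_error = str(exc)

# Merging first avoids the duplicate keyword.
result = generate("hi", **build_params(default_params, stop=["\n"]))
```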
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
I've extended async API support to local Qdrant mode. It is faked
but allows prototyping without spinning up a container. The tests are
improved to test the in-memory case as well.
@baskaryan @rlancemartin @eyurtsev @agola11
Redis cache currently stores model outputs as strings. Chat generations
have Messages which contain more information than just a string. Until
Redis cache supports fully storing messages, cache should not interact
with chat generations.
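A minimal sketch of the idea, using hypothetical stand-in classes rather than the real LangChain or Redis types: a cache that can only store strings refuses chat generations instead of silently dropping their message data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Generation:
    """Plain LLM output: just text (hypothetical stand-in)."""

    text: str


@dataclass
class ChatGeneration(Generation):
    """Chat output: text plus message data that a bare string cannot hold."""

    role: str = "ai"


class StringOnlyCache:
    """Toy cache that, like a string-valued store, keeps only the text."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], List[str]] = {}

    def update(self, prompt: str, llm_string: str, generations: List[Generation]) -> None:
        for gen in generations:
            if isinstance(gen, ChatGeneration):
                # Storing only the text would lose the message information.
                raise ValueError("This cache does not support chat generations.")
        self._store[(prompt, llm_string)] = [gen.text for gen in generations]

    def lookup(self, prompt: str, llm_string: str) -> Optional[List[Generation]]:
        texts = self._store.get((prompt, llm_string))
        return None if texts is None else [Generation(text=t) for t in texts]


cache = StringOnlyCache()
cache.update("prompt", "model-a", [Generation(text="hello")])
```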
Streaming support is useful if you are doing long-running completions or
need interactivity e.g. for chat... adding it to replicate, using a
similar pattern to other LLMs that support streaming.
Housekeeping: I ran `make format` and `make lint`, no issues reported in
the files I touched.
I did update the replicate integration test but ran into some issues,
specifically:
1. The original test was failing for me due to the model argument not
being specified... perhaps this test is not regularly run? I fixed it by
adding a call to the lightweight hello world model which should not be
burdensome for replicate infra.
2. I couldn't get the `make integration_tests` command to pass... a lot
of failures in other integration tests due to missing dependencies...
however I did make sure the particular test file I updated does pass, by
running `poetry run pytest tests/integration_tests/llms/test_replicate.py`
Finally, I am @tjaffri https://twitter.com/tjaffri for feature
announcement tweets... or if you could please tag @docugami
https://twitter.com/docugami we would really appreciate that :-)
Tagging model maintainers @hwchase17 @baskaryan
Thanks for all the awesome work you folks are doing.
---------
Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
`math_utils.py` is in the root code folder. This creates the
`langchain.math_utils: Math Utils` group on the API Reference navigation
ToC, on the same level with `Chains` and `Agents` which is not correct.
Refactoring:
- created the `utils/` folder
- moved `math_utils.py` to `utils/math.py`
- moved `utils.py` to `utils/utils.py`
- split `utils.py` into `utils.py, env.py, strings.py`
- added module description
@baskaryan
Integrating Portkey, which adds production features like caching,
tracing, tagging, retries, etc. to langchain apps.
- Dependencies: None
- Twitter handle: https://twitter.com/portkeyai
- test_portkey.py added for tests
- example notebook added in new utilities folder in modules
Also fixed a bug with OpenAIEmbeddings where headers weren't passing.
cc @baskaryan
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Golden Query is a wrapper on top of the [Golden Query
API](https://docs.golden.com/reference/query-api) which enables
programmatic access to query results on entities across Golden's
Knowledge Base. For more information about Golden API, please see the
[Golden API Getting
Started](https://docs.golden.com/reference/getting-started) page.
**Issue:** None
**Dependencies:** requests(already present in project)
**Tag maintainer:** @hinthornw
Signed-off-by: Constantin Musca <constantin.musca@gmail.com>
## Background
With the addition of email and calendar tools, LangChain is continuing
to complete its functionality to automate business processes.
## Challenge
One of the pieces of business functionality that LangChain currently
doesn't have is the ability to search for flights and travel in order to
book business travel.
## Changes
This PR implements an integration with the
[Amadeus](https://developers.amadeus.com/) travel search API for
LangChain, enabling seamless search for flights with a single
authentication process.
## Who can review?
@hinthornw
## Appendix
@tsolakoua and @minjikarin, I utilized your
[amadeus-python](https://github.com/amadeus4dev/amadeus-python) library
extensively. Given the rising popularity of LangChain and similar AI
frameworks, the convergence of libraries like amadeus-python and tools
like this one is likely. So, I wanted to keep you updated on our
progress.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Work in Progress.
WIP
Not ready...
Adds Document Loader support for
[Geopandas.GeoDataFrames](https://geopandas.org/)
Example:
- [x] stub out `GeoDataFrameLoader` class
- [x] stub out integration tests
- [ ] Experiment with different geometry text representations
- [ ] Verify CRS is successfully added in metadata
- [ ] Test effectiveness of searches on geometries
- [ ] Test with different geometry types (point, line, polygon with
multi-variants).
- [ ] Add documentation
---------
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com>
Removes the `**kwargs` argument from the `add_texts` method in the
DeepLake vectorstore, as it confuses users and does not fail when a user
passes incorrect parameters.
Also added a small test to ensure the change is applied correctly.
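To illustrate the problem in isolation, here is a small sketch with two hypothetical functions (not the DeepLake signatures): one accepts `**kwargs` and silently ignores a misspelled keyword, while the other has an explicit signature and fails immediately.

```python
from typing import Any, List, Optional


def add_texts_permissive(
    texts: List[str], metadatas: Optional[List[dict]] = None, **kwargs: Any
) -> int:
    """Accepts anything: a misspelled keyword lands in `kwargs` and is ignored."""
    return len(texts)


def add_texts_strict(texts: List[str], metadatas: Optional[List[dict]] = None) -> int:
    """Explicit signature: an unknown keyword raises TypeError at call time."""
    return len(texts)


# Note the typo: "metadata" instead of "metadatas".
permissive_result = add_texts_permissive(["a"], metadata=[{"k": 1}])

try:
    add_texts_strict(["a"], metadata=[{"k": 1}])
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)
```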
Guys could pls take a look: @rlancemartin, @eyurtsev, this is a small
PR.
Thx so much!
Description: This PR adds the option to retrieve scores and explanations
in the WeaviateHybridSearchRetriever. This feature improves the
usability of the retriever by allowing users to understand the scoring
logic behind the search results and further refine their search queries.
Issue: This PR is a solution to the issue #7855
Dependencies: This PR does not introduce any new dependencies.
Tag maintainer: @rlancemartin, @eyurtsev
I have included a unit test for the added feature, ensuring that it
retrieves scores and explanations correctly. I have also included an
example notebook demonstrating its use.
Motivation: it seems that when dealing with a long context and a "big"
number of relevant documents, we must avoid using the out-of-the-box
score ordering from vector stores.
See: https://arxiv.org/pdf/2306.01150.pdf
So, I added an additional parameter that allows you to reorder the
retrieved documents so we can work around this performance degradation.
The reordering respects the original search scores but places the
least relevant documents in the middle of the context.
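One way such a reordering could look is sketched below. The function name and the exact interleaving are assumptions for illustration, not necessarily the implementation in this PR; the input is assumed to be sorted from most to least relevant.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_for_long_context(docs_by_relevance: Sequence[T]) -> List[T]:
    """Put the most relevant items at both ends and the least relevant in the middle.

    `docs_by_relevance` is assumed to be sorted from most to least relevant.
    """
    front: List[T] = []
    back: List[T] = []
    for index, doc in enumerate(docs_by_relevance):
        # Alternate between the two ends, working inwards.
        (front if index % 2 == 0 else back).append(doc)
    return front + back[::-1]


ranked = ["doc1", "doc2", "doc3", "doc4", "doc5"]  # doc1 is the most relevant
reordered = reorder_for_long_context(ranked)
# reordered == ["doc1", "doc3", "doc5", "doc4", "doc2"]
```

The two best documents end up first and last, and the lowest-scoring one sits in the center of the context.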
Extract from the paper (one image speaks 1000 tokens):
![image](https://github.com/hwchase17/langchain/assets/1821407/fafe4843-6e18-4fa6-9416-50cc1d32e811)
This seems to be common to all the different architectures, so I think we need a
good generic way to implement this reordering and run some tests on our
already running retrievers.
It could be that my approach is not the best one from the architecture
point of view; happy to have a discussion about that.
For me this was the best place to introduce the change and start
retesting different implementations.
@rlancemartin, @eyurtsev
---------
Co-authored-by: Lance Martin <lance@langchain.dev>
Some docstring / small nits to #6003
---------
Co-authored-by: BoazWasserman <49598618+boazwasserman@users.noreply.github.com>
Co-authored-by: HippoTerrific <49598618+HippoTerrific@users.noreply.github.com>
Co-authored-by: Or Raz <orraz1994@gmail.com>
- Description: Add a BM25 retriever that does not need Elasticsearch
- Dependencies: rank_bm25 (if it is not installed, it will be installed
using pip, just as TFIDFRetriever does)
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: DayuanJian21687
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description:
Add LLM for ChatGLM-6B & ChatGLM2-6B API
Related Issue:
Will the langchain support ChatGLM? #4766
Add support for selfhost models like ChatGLM or transformer models #1780
Dependencies:
No extra library install required.
It wraps API calls to a ChatGLM(2)-6B server (started with api.py), so an
API endpoint is required to run it.
Tag maintainer: @mlot
Any comments on this PR would be appreciated.
---------
Co-authored-by: mlot <limpo2000@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source
computing framework that makes it easy to scale data science and machine
learning workloads in parallel. Xorbits can leverage multi cores or GPUs
to accelerate computation on a single machine, or scale out up to
thousands of machines to support processing terabytes of data.
- This PR added support for the Xorbits agent, which allows langchain to
interact with Xorbits Pandas dataframe and Xorbits Numpy array.
- Dependencies: This change requires the Xorbits library to be installed
in order to be used.
`pip install xorbits`
- Request for review: @hinthornw
- Twitter handle: https://twitter.com/Xorbitsio
Starting over from #5654 because I utterly borked the poetry.lock file.
Adds new parameters to the MWDumpLoader class:
* skip_redirects (bool) Tells the loader to skip articles that redirect
to other articles. False by default.
* stop_on_error (bool) Tells the parser to skip any page that causes a
parse error. True by default.
* namespaces (List[int]) Tells the parser which namespaces to parse.
Contains namespaces from -2 to 15 by default.
Default values are chosen to preserve backwards compatibility.
Sample dump XML and full unit test coverage (with extended tests that
pass!) also included!
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Inspired by #5550, I implemented full async API support in Qdrant. The
docs were extended to mention the existence of asynchronous operations
in Langchain. I also used that chance to restructure the tests of Qdrant
and provided a suite of tests for the async version. Async API requires
the GRPC protocol to be enabled. Thus, it doesn't work on local mode
yet, but we're considering including the support to be consistent.
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
Integrate [Rockset](https://rockset.com/docs/) as a document loader.
Issue: None
Dependencies: Nothing new (rockset's dependency was already added
[here](https://github.com/hwchase17/langchain/pull/6216))
Tag maintainer: @rlancemartin
I have added a test for the integration and an example notebook showing
its use. I ran `make lint` and everything looks good.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Add langchain.llms.Tongyi for text completion, with examples, using the
Tongyi Text API,
- Add system tests.
Note async completion for the Text API is not yet supported and will be
included in a future PR.
Dependencies: dashscope. It must be installed manually because it is not
needed by everyone.
Happy for feedback on any aspect of this PR @hwchase17 @baskaryan.
Multiple people have asked in #5081 for a way to limit the documents
returned from an AzureCognitiveSearchRetriever. This PR adds the `top_n`
parameter to allow that.
Twitter handle:
[@UmerHAdil](twitter.com/umerHAdil)
## Description
This PR addresses a bug in the RecursiveUrlLoader class where absolute
URLs were being treated as relative URLs, causing malformed URLs to be
produced. The fix involves using the urljoin function from the
urllib.parse module to correctly handle both absolute and relative URLs.
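The behavior of `urljoin` can be seen in a short standalone sketch. The URLs are made-up examples, and the naive concatenation is only an illustration of how a malformed URL could arise, not the loader's original code.

```python
from urllib.parse import urljoin

base = "https://example.com/docs/"

# Naive concatenation treats every link as relative and mangles absolute ones.
naive = base + "https://example.com/blog/post"

# urljoin resolves relative links against the base...
resolved_relative = urljoin(base, "guide/intro.html")
# ...resolves root-relative links against the host...
resolved_root = urljoin(base, "/about")
# ...and leaves absolute URLs untouched.
resolved_absolute = urljoin(base, "https://example.com/blog/post")
```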
@rlancemartin @eyurtsev
---------
Co-authored-by: Lance Martin <lance@langchain.dev>
The existing PlaywrightURLLoader load() function uses a synchronous
browser, which is not compatible with Jupyter.
This PR adds a sister function aload() which can be run inside a
notebook.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- Migrate from deprecated langchainplus_sdk to `langsmith` package
- Update the `run_on_dataset()` API to use an eval config
- Update a number of evaluators, as well as the loading logic
- Update docstrings / reference docs
- Update tracer to share single HTTP session
Sometimes the score returned by ChatGPT looks like 'Response
example\nScore: 90 (fully answers the question, but could provide more
detail on the specific error message)'.
Because the score then contains more than just digits, parsing it raises
a ValueError.
Updating the RegexParser pattern from `.*` to `\d*` lets us ignore the
text after the number.
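The difference between the two patterns can be checked with the standard `re` module. The full patterns below are an assumption built around the `.*` and `\d*` fragments mentioned above; `parse_score` is a hypothetical helper, not the RegexParser itself.

```python
import re

response = (
    "Response example\nScore: 90 (fully answers the question, "
    "but could provide more detail on the specific error message)"
)

# Assumed shape of the patterns; only the score group differs.
loose_pattern = re.compile(r"(.*?)\nScore: (.*)")
digits_pattern = re.compile(r"(.*?)\nScore: (\d*)")


def parse_score(pattern: "re.Pattern[str]", text: str) -> int:
    """Extract the score group and convert it to an integer."""
    match = pattern.search(text)
    if match is None:
        raise ValueError("no score found")
    return int(match.group(2))


# The loose pattern captures the trailing explanation, so int() fails.
try:
    parse_score(loose_pattern, response)
    loose_error = None
except ValueError as exc:
    loose_error = str(exc)

# The digits-only pattern stops at the end of the number.
score = parse_score(digits_pattern, response)
```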
Co-authored-by: Bagatur <baskaryan@gmail.com>
Fixed #6768.
This is a workaround only. I think a better longer-term solution is for
chains to declare how many input variables they *actually* need (as
opposed to ones that are in the prompt, where some may be satisfied by
the memory). Then, a wrapping chain can check the input match against
the actual input variables.
@hwchase17
- Description: Add two new document transformers that translate
documents into different languages and convert documents into Q&A
format to improve vector search results. Uses OpenAI function calling
via the [doctran](https://github.com/psychic-api/doctran/tree/main)
library.
- Issue: N/A
- Dependencies: `doctran = "^0.0.5"`
- Tag maintainer: @rlancemartin @eyurtsev @hwchase17
- Twitter handle: @psychicapi or @jfan001
Notes
- Adheres to the `DocumentTransformer` abstraction set by @dev2049 in
#3182
- refactored `EmbeddingsRedundantFilter` to put it in a file under a new
`document_transformers` module
- Added basic docs for `DocumentInterrogator`, `DocumentTransformer` as
well as the existing `EmbeddingsRedundantFilter`
---------
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Probably the most boring PR to review ;)
Individual commits might be easier to digest
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- Description: Adds a new chain that acts as a wrapper around Sympy to
give LLMs the ability to do some symbolic math.
- Dependencies: SymPy
---------
Co-authored-by: sreiswig <sreiswig@github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
When using callbacks, there are times when callbacks can be added
redundantly: for instance, sometimes you might need to create an LLM with
specific callbacks, but then also create an agent that uses a chain
that has those callbacks already set. This means that "callbacks" might
get passed down again to the LLM at predict() time, resulting in
duplicate calls to the `on_llm_start` callback.
For the sake of simplicity, I made it so that langchain never adds an
exact handler/callbacks object in `add_handler`, thus avoiding the
duplicate handler issue.
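A minimal sketch of the deduplication idea, using hypothetical `HandlerRegistry` and `CountingHandler` classes rather than the real callback manager: a handler object that is already registered is not added a second time, so it fires once per event.

```python
from typing import List


class CountingHandler:
    """Hypothetical handler that counts how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def on_llm_start(self) -> None:
        self.calls += 1


class HandlerRegistry:
    """Toy manager that never registers the exact same handler object twice."""

    def __init__(self) -> None:
        self.handlers: List[CountingHandler] = []

    def add_handler(self, handler: CountingHandler) -> None:
        # Identity check: only the very same object counts as a duplicate.
        if not any(existing is handler for existing in self.handlers):
            self.handlers.append(handler)

    def on_llm_start(self) -> None:
        for handler in self.handlers:
            handler.on_llm_start()


handler = CountingHandler()
registry = HandlerRegistry()
registry.add_handler(handler)  # set when the LLM is created
registry.add_handler(handler)  # passed down again by the chain or agent
registry.on_llm_start()
```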
Tagging @hwchase17 for callback review
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Currently `ChatOutputParser` extracts actions by splitting the text on
"```", and then loading the second part as a JSON string.
But sometimes the LLM will wrap the action in markdown code block like:
````markdown
```json
{
"action": "foo",
"action_input": "bar"
}
```
````
Splitting text on "```" will cause an `OutputParserException` in such a case.
This PR changes the behaviour to extract the `$JSON_BLOB` by regex, so
that it can handle both ` ``` ``` ` and ` ```json ``` `
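A regex-based extraction along these lines might look like the sketch below. The pattern and the `extract_action` helper are assumptions for illustration, not necessarily the exact code in this PR; the fence string is built from single backticks only to keep the example readable.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # a Markdown code fence

# Optional "json" language tag after the opening fence, then a lazy capture
# of everything up to the closing fence.
JSON_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_action(text: str) -> Dict[str, Any]:
    """Return the JSON object found in the first fenced block of `text`."""
    match = JSON_BLOCK.search(text)
    if match is None:
        raise ValueError(f"Could not find a fenced JSON block in: {text!r}")
    return json.loads(match.group(1).strip())


blob = json.dumps({"action": "foo", "action_input": "bar"})
plain_block = f"Thought: call the tool\n{FENCE}\n{blob}\n{FENCE}"
tagged_block = f"Thought: call the tool\n{FENCE}json\n{blob}\n{FENCE}"

plain_action = extract_action(plain_block)
tagged_action = extract_action(tagged_block)
```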
@hinthornw
---------
Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>
This PR changes the behavior of `Qdrant.from_texts` so that the existing
collection is reused unless recreating it is explicitly requested.
Previously, calling `Qdrant.from_texts` or `Qdrant.from_documents`
removed the old data, which was confusing for many.