Commit Graph

92 Commits (docker)

Author SHA1 Message Date
Eugene Yurtsev e46202829f
feat #4479: TextLoader auto detect encoding and improved exceptions (#4927)
# TextLoader auto detect encoding and enhanced exception handling

- Add an option to enable encoding detection on `TextLoader`.
- The detection is done using `chardet`.
- Loading tries each detected encoding in order of confidence and raises
an exception if none of them succeeds.
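
The fallback strategy described above can be sketched roughly as follows. This is a minimal illustration, not the loader's actual code: `decode_with_fallback` and the `(encoding, confidence)` candidate list are hypothetical names, and the candidates are assumed to come from an encoding detector such as `chardet`.

```python
from typing import Iterable, Tuple


def decode_with_fallback(raw: bytes, candidates: Iterable[Tuple[str, float]]) -> str:
    """Decode `raw` by trying candidate encodings, most confident first.

    `candidates` is a hypothetical list of (encoding, confidence) pairs,
    e.g. built from the output of an encoding detector.
    """
    ordered = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    for encoding, _confidence in ordered:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    tried = [encoding for encoding, _ in ordered]
    raise RuntimeError(f"Could not decode input with any of: {tried}")
```

Sorting by confidence first means the most likely encoding is tried before less likely ones, and the final exception reports every encoding that was attempted.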

### New Dependencies:
- `chardet`

Fixes #4479 

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

- @eyurtsev

---------

Co-authored-by: blob42 <spike@w530>
1 year ago
Davis Chase 8966f61ca5
Zep memory (#4898)
Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
1 year ago
Eugene Yurtsev c5ab9782c6
Add beautiful soup 4 to extended testing extra (#4869)
# Add bs4 to extended testing extra

Updating extended testing extra in preparation for more refactors.
1 year ago
Adam Quigley e78c9be312
Add Confluence Loader unit tests (#3333)
Adds some basic unit tests for the ConfluenceLoader that can be extended
later. Ports this [PR from
llama-hub](https://github.com/emptycrown/llama-hub/pull/208) and adapts
it to `langchain`.

@Jflick58 and @zywilliamli adding you here as potential reviewers

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Raduan Al-Shedivat 00c6ec8a2d
fix(document_loaders/telegram): fix pandas calls + add tests (#4806)
# Fix Telegram API loader + add tests.
I was testing this integration and it failed with the following error:
```python
message_threads = loader._get_message_threads(df)
KeyError: False
```
Also, this particular loader didn't have any tests / related group in
poetry, so I added those as well.

@hwchase17 / @eyurtsev please take a look at this fix PR.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Eugene Yurtsev c3b6129beb
Block sockets for unit-tests (#4803)
# Block usage of sockets during unit tests

Catch any tests that attempt to use the network.
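
The commit message does not show how sockets are blocked, so the following is only a standard-library sketch of the idea. `block_sockets` and `NetworkBlockedError` are hypothetical names introduced here for illustration.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code under test tries to open a socket."""


@contextmanager
def block_sockets():
    """Temporarily replace socket.socket so any attempt to create one fails."""
    original = socket.socket

    def guarded(*args, **kwargs):
        raise NetworkBlockedError("network access is disabled in unit tests")

    socket.socket = guarded
    try:
        yield
    finally:
        # Always restore the real class, even if the test body raised.
        socket.socket = original
```

Wrapping a test body in `with block_sockets():` makes any accidental network call fail loudly instead of silently depending on an external service.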
1 year ago
Eugene Yurtsev d403f659ea
Update google protobuf dep (#4798)
# Update google protobuf dep

Resolve: https://github.com/hwchase17/langchain/security/dependabot/11
1 year ago
Eugene Yurtsev 3ecd7c9641
Add check to verify poetry.toml (#4794)
# Add poetry check to github action

Check poetry toml file during tests for errors
1 year ago
Eugene Yurtsev 14bedf1cc5
Github Action: Fix poetry lock file checking (#4789)
Fix how poetry lock file is checked to avoid skipping caches silently.
1 year ago
Roma cb802edf75
[Feature] Add GraphQL Query Tool (#4409)
# Add GraphQL Query Support

This PR introduces a GraphQL API Wrapper tool that allows LLM agents to
query GraphQL databases. The tool utilizes the httpx and gql Python
packages to interact with GraphQL APIs and provides a simple interface
for running queries with LLM agents.

@vowelparrot

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Eugene Yurtsev 09587a3201
Clean up tests for pdf parsers (#4595)
# Organize tests for pdf parsers

Clean up tests for pdf parsers, remove duplicate tests, convert to unit
tests.
1 year ago
Eugene Yurtsev 3c490b5ba3
Docugami DataLoader (#4727)
### Adds a document loader for Docugami

Specifically:

1. Adds a data loader that talks to the [Docugami](http://docugami.com)
API to download processed documents as semantic XML
2. Parses the semantic XML into chunks, with additional metadata
capturing chunk semantics
3. Adds a detailed notebook showing how you can use additional metadata
returned by Docugami for techniques like the [self-querying
retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html)
4. Adds an integration test, and related documentation

Here is an example of a result that is not possible without the
capabilities added by Docugami (from the notebook):

<img width="1585" alt="image"
src="https://github.com/hwchase17/langchain/assets/749277/bb6c1ce3-13dc-4349-a53b-de16681fdd5b">

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>
1 year ago
Harrison Chase cdc20d1203
Harrison/json loader fix (#4686)
Co-authored-by: Triet Le <112841660+triet-lq-holistics@users.noreply.github.com>
1 year ago
Eugene Yurtsev 08ed927c32
Turn on extended tests (#4588)
# Turn on strict extended tests

This PR turns on strict testing for extended tests.
1 year ago
Zander Chase d96f6a106b
Add Steamship Image Generation Tool (#4580)
Co-authored-by: Enias Cailliau <enias@steamship.com>
1 year ago
Davis Chase 46b100ea63
Add DocArray vector stores (#4483)
Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made a
few small changes to get it across the finish line.

---------

Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Co-authored-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>
1 year ago
Eugene Yurtsev 80558b5b27
Add workflow for testing with all deps (#4410)
# Add action to test with all dependencies installed

PR adds a custom action for setting up poetry that allows specifying a
cache key:
https://github.com/actions/setup-python/issues/505#issuecomment-1273013236

This makes it possible to run 2 types of unit tests: 

(1) unit tests with only core dependencies
(2) unit tests with extended dependencies (e.g., those that rely on an
optional pdf parsing library)


As part of this PR, we're moving some pdf parsing tests into the
unit-tests section and making sure that these unit tests get executed
when running with extended dependencies.
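
One way to decide whether a test belongs to the core or the extended run is to check which optional packages are importable. The sketch below assumes that approach; `missing_packages` and `should_run_extended_test` are hypothetical helpers, not part of the workflow added by this PR.

```python
import importlib.util
from typing import List


def missing_packages(*names: str) -> List[str]:
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def should_run_extended_test(*required: str) -> bool:
    """An extended test runs only when every optional package is installed."""
    return not missing_packages(*required)
```

In a core-only environment such a check would cause tests that need an optional PDF library to be skipped, while the extended run would execute them.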
1 year ago
Aivin V. Solatorio 6567b73e1a
JSON loader (#4067)
This implements a loader of text passages in JSON format. The `jq`
syntax is used to define a schema for accessing the relevant contents
from the JSON file. This requires dependency on the `jq` package:
https://pypi.org/project/jq/.

---------

Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
1 year ago
Harrison Chase fba6921b50
Harrison/one drive loader (#4081)
Co-authored-by: José Ferraz Neto <netoferraz@gmail.com>
1 year ago
Harrison Chase bd7e0a534c
Harrison/csv loader (#3771)
Co-authored-by: mrT23 <tal.r@codium.ai>
1 year ago
Harrison Chase c55ba43093
Harrison/vespa (#3761)
Co-authored-by: Lester Solbakken <lesters@users.noreply.github.com>
1 year ago
Davis Chase b807a114e4
Add query parsing unit tests (#3672)
1 year ago
Davis Chase 3b609642ae
Self-query with generic query constructor (#3607)
Alternate implementation of #3452 that relies on a generic query
constructor chain and language and then has a vector store-specific
translation layer. Still refactoring and updating examples, but the general
structure is there and seems to work as well as #3452 on the examples.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Harrison Chase a35bbbfa9e
Harrison/lancedb (#3634)
Co-authored-by: Minh Le <minhle@canva.com>
1 year ago
Eduard van Valkenburg a3e3f26090
Some more PowerBI pydantic and import fixes (#3461)
1 year ago
Eduard van Valkenburg ba7a5ac9d7
Azure CosmosDB memory (#3434)
Still needs docs, otherwise works.
1 year ago
Davit Buniatyan 2c0023393b
Deep Lake mini upgrades (#3375)
Improvements
* set default num_workers for ingestion to 0
* upgraded notebooks for avoiding dataset creation ambiguity
* added `force_delete_dataset_by_path`
* bumped deeplake to 3.3.0
* creds arg passing to deeplake object that would allow custom S3

Notes
* please double check if poetry is not messed up (thanks!)

Asks
* Would be great to create a shared slack channel for quick questions

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
1 year ago
Harrison Chase a6664be79c
Harrison/myscale (#3352)
Co-authored-by: Fangrui Liu <fangruil@moqi.ai>
Co-authored-by: 刘 方瑞 <fangrui.liu@outlook.com>
Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>
1 year ago
Harrison Chase cc6fe18152
Harrison/power bi (#3205)
Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
1 year ago
Harrison Chase d2520a5f1e
Harrison/ddg (#3206)
Co-authored-by: itai <itai.marks@gmail.com>
Co-authored-by: Itai Marks <itaim@users.noreply.github.com>
Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com>
Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com>
Co-authored-by: Adilzhan Ismailov <13088690+aismlv@users.noreply.github.com>
Co-authored-by: Justin Flick <Justinjayflick@gmail.com>
Co-authored-by: Justin Flick <jflick@homesite.com>
1 year ago
Harrison Chase f19b3890c9
Harrison/site map tqdm (#3184)
Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com>
Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com>
1 year ago
Harrison Chase 68cd37175e
Harrison/arxiv tool (#3186)
Co-authored-by: leo-gan <leo.gan.57@gmail.com>
1 year ago
Harrison Chase afd3e70ae5
Harrison/confluent loader (#2994)
Co-authored-by: Justin Flick <Justinjayflick@gmail.com>
1 year ago
vowelparrot 5ca7ce77cd
Remove pythonrepl from LLM-MathChain (#2943)
Use numexpr evaluate instead of the python REPL to avoid malicious code
injection.

Tested against the (limited) math dataset and got the same score as
before.

For more permissive tools (like the REPL tool itself), other approaches
ought to be provided (some combination of Sanitizer + Restricted python
+ unprivileged-docker + ...), but for a calculator tool, only
mathematical expressions should be permitted.

See https://github.com/hwchase17/langchain/issues/814
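
To illustrate why restricting a calculator to mathematical expressions closes the injection hole, here is a dependency-free sketch. Note that it deliberately swaps in Python's `ast` module rather than `numexpr`, which is what the commit actually uses; `safe_eval_math` is a hypothetical name.

```python
import ast
import operator

# Whitelist of permitted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval_math(expression: str) -> float:
    """Evaluate a purely arithmetic expression without running arbitrary code."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant):
            # Accept plain numbers only (bool is a subclass of int, so exclude it).
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)
```

Function calls, attribute access, and names never match the whitelist, so an input such as `__import__('os').system('ls')` is rejected with a `ValueError` instead of being executed. A production version would likely also bound the size of exponents.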
1 year ago
Ankush Gola ec59e9d886
Fix ChatAnthropic stop_sequences error (#2919) (#2920)
Note to self: Always run integration tests, even on "that last minute
change you thought would be safe" :)

---------

Co-authored-by: Mike Lambert <mike.lambert@anthropic.com>
1 year ago
Harrison Chase 1e9378d0a8
Harrison/weaviate fixes (#2872)
Co-authored-by: cs0lar <cristiano.solarino@gmail.com>
Co-authored-by: cs0lar <cristiano.solarino@brightminded.com>
1 year ago
sergerdn 04c458a270
feat: improve pinecone tests (#2806)
Improve the integration tests for Pinecone by adding an `.env.example`
file for local testing. Additionally, add some dev dependencies
specifically for integration tests.

This change also helps me understand how Pinecone deals with certain
things, see related issues
https://github.com/hwchase17/langchain/issues/2484
https://github.com/hwchase17/langchain/issues/2816
1 year ago
Harrison Chase e49f1e628c
Harrison/gpt cache (#2744)
Co-authored-by: SimFG <bang.fu@zilliz.com>
1 year ago
Harrison Chase 507cee5ee5
Harrison/pinecone hybrid update (#2742)
Co-authored-by: acatav <39461369+acatav@users.noreply.github.com>
Co-authored-by: Amnon Catav <catav.amnon1@gmail.com>
1 year ago
sergerdn 4bdcedab54
fix: some imports for integration tests (#2612)
Add more missed imports for integration tests. Bump `pytest` to the
current latest version.
Fix `tests/integration_tests/vectorstores/test_elasticsearch.py` to
update its cassette (easy fix).

Related PR: https://github.com/hwchase17/langchain/pull/2560
1 year ago
sergerdn cd9336469e
fix: missed deps integrations tests (#2560)
Almost all integration tests have failed, but we haven't encountered any
import errors yet. Some tests failed due to lazy import issues. It
doesn't seem like a problem to resolve some of these errors in the next
PR.
I have a headache from resolving conflicts with `deeplake` and `boto3`,
so I will temporarily comment out `boto3`.


fix https://github.com/hwchase17/langchain/issues/2426
1 year ago
Kacper Łukawski d8967e28d0
Upgrade Qdrant to 1.1.2 (#2554)
This is a minor upgrade for Qdrant. We made a small bugfix in the local
mode, so it might also be good to upgrade Qdrant for LangChain users.
1 year ago
sergerdn 6dc86ad48f
feat: add pytest-vcr for recording HTTP interactions in integration tests (#2445)
Using `pytest-vcr` in integration tests has several benefits. Firstly,
it removes the need to mock external services, as VCR records and
replays HTTP interactions on the fly. Secondly, it simplifies the
integration test setup by eliminating the need to set up and tear down
external services in some cases. Finally, it allows for more reliable
and deterministic integration tests by ensuring that HTTP interactions
are always replayed with the same response.
Overall, `pytest-vcr` is a valuable tool for simplifying integration
test setup and improving their reliability.
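
The record-and-replay idea behind this can be shown with a small standard-library sketch. It is not how `pytest-vcr` is implemented; the `Cassette` class, its JSON file format, and the `fetch` callable are assumptions made purely for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Dict


class Cassette:
    """Record responses on first use and replay them from disk afterwards."""

    def __init__(self, path: Path, fetch: Callable[[str], str]) -> None:
        self.path = path
        self.fetch = fetch  # the real (slow, non-deterministic) call
        self.recorded: Dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def get(self, url: str) -> str:
        if url not in self.recorded:
            # First time this request is seen: hit the real service and save it.
            self.recorded[url] = self.fetch(url)
            self.path.write_text(json.dumps(self.recorded))
        return self.recorded[url]
```

Once a response is stored, later runs read it from the file and never call `fetch` again, which is what makes replayed tests deterministic.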

This commit adds the `pytest-vcr` package as a dependency for
integration tests in the `pyproject.toml` file. It also introduces two
new fixtures in `tests/integration_tests/conftest.py` files for managing
cassette directories and VCR configurations.

In addition, the
`tests/integration_tests/vectorstores/test_elasticsearch.py` file has
been updated to use the `@pytest.mark.vcr` decorator for recording and
replaying HTTP interactions.

Finally, this commit removes the `documents` fixture from the
`test_elasticsearch.py` file and replaces it with a new fixture defined
in `tests/integration_tests/vectorstores/conftest.py` that yields a list
of documents to use in any other tests.

This also includes my second attempt to fix issue:
https://github.com/hwchase17/langchain/issues/2386

Maybe related https://github.com/hwchase17/langchain/issues/2484
1 year ago
Harrison Chase 26314d7004
Harrison/openapi parser (#2461)
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
1 year ago
sergerdn b410dc76aa
fix: elasticsearch (#2402)
- Create a new docker-compose file to start an Elasticsearch instance
for integration tests.
- Add new tests to `test_elasticsearch.py` to verify Elasticsearch
functionality.
- Include an optional group `test_integration` in the `pyproject.toml`
file. This group should contain dependencies for integration tests and
can be installed using the command `poetry install --with
test_integration`. Any new dependencies should be added by running
`poetry add some_new_deps --group "test_integration"`.

Note:
The new tests run in live mode, which involves end-to-end testing of the
OpenAI API. In the future, adding `pytest-vcr` to record and replay all
API requests would be a nice feature for the testing process. More info:
https://pytest-vcr.readthedocs.io/en/latest/

Fixes https://github.com/hwchase17/langchain/issues/2386
1 year ago
Kacper Łukawski 585f60a5aa
Qdrant update to 1.1.1 & docs polishing (#2388)
This PR updates Qdrant to 1.1.1 and introduces local mode, so there is
no need to spin up the Qdrant server. At the same time, the Qdrant
example notebooks were updated to cover more cases and answer
some commonly asked questions. All of Qdrant's integration tests were
switched to local mode, so no Docker container is required to launch
them.
1 year ago
sergerdn 870cd33701
fix: testing in Windows and add missing dev dependency (#2340)
This change addresses two issues.

First, we add `setuptools` to the dev dependencies in order to debug
tests locally with an IDE, especially with PyCharm. All dev
dependencies should be installed with `poetry install --extras "dev"`.

Second, we use PurePosixPath instead of Path for URL paths to fix issues
with testing in Windows. This ensures that forward slashes are used as
the path separator regardless of the operating system.
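
The difference can be seen on any operating system by comparing the two pure path flavours. In this sketch `join_url_path` is a hypothetical helper, and `PureWindowsPath` stands in for what a platform-dependent `Path` would produce on Windows.

```python
from pathlib import PurePosixPath, PureWindowsPath


def join_url_path(*parts: str) -> str:
    """Join URL path segments with forward slashes on every operating system."""
    return str(PurePosixPath(*parts))


# PureWindowsPath shows what a platform-dependent Path would do on Windows.
windows_style = str(PureWindowsPath("api", "v1", "documents"))
posix_style = join_url_path("api", "v1", "documents")
```

Here `windows_style` contains backslashes, which are invalid as URL separators, while `posix_style` always uses forward slashes.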

Closes https://github.com/hwchase17/langchain/issues/2334
1 year ago
Mike Lambert 393cd3c796
Bump anthropic version (#2352)
Improves async support (and a few other bug fixes I'd prefer folks be
forced to grab)
1 year ago
Harrison Chase b35260ed47
Harrison/memory base (#2122)
@3coins + @zoltan-fedor.... here's the PR + some minor changes I made.
Thoughts? Can try to get it into tomorrow's release.

---------

Co-authored-by: Zoltan Fedor <zoltan.0.fedor@gmail.com>
Co-authored-by: Piyush Jain <piyushjain@duck.com>
1 year ago
Ankush Gola ccee1aedd2
add async support for anthropic (#2114)
should not be merged in before
https://github.com/anthropics/anthropic-sdk-python/pull/11 gets released
1 year ago