langchain

Author	SHA1	Message	Date
Michael Landis	6eacd88ae7	fix: revert docarray explicit transitive dependencies and use extras instead (#5015 ) tldr: The docarray [integration PR](https://github.com/hwchase17/langchain/pull/4483) introduced a pinned dependency to protobuf. This is a docarray dependency, not a langchain dependency. Since this is handled by the docarray dependencies, it is unnecessary here. Further, as a pinned dependency, this quickly leads to incompatibilities with application code that consumes the library. Much less with a heavily used library like protobuf. Detail: as we see in the [docarray integration](https://github.com/hwchase17/langchain/pull/4483/files#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R81-R83), the transitive dependencies of docarray were also listed as langchain dependencies. This is unnecessary as the docarray project has an appropriate [extras](`a01a05542d/pyproject.toml (L70)`). The docarray project also does not require this _pinned_ version of protobuf, rather [a minimum version](`a01a05542d/pyproject.toml (L41)`). So this pinned version was likely in error. To fix this, this PR reverts the explicit hnswlib and protobuf dependencies and adds the hnswlib extras install for docarray (which installs hnswlib and protobuf, as originally intended). Because version `0.32.0` of the docarray hnswlib extras added protobuf, we bump the docarray dependency from `^0.31.0` to `^0.32.0`. # revert docarray explicit transitive dependencies and use extras instead ## Who can review? @dev2049 -- reviewed the original PR @eyurtsev -- bumped the pinned protobuf dependency a few days ago --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 12:48:09 -04:00
Harrison Chase	10ba201d05	Harrison/neo4j (#5078 ) Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 07:31:48 -07:00
Harrison Chase	b0431c672b	Harrison/psychic (#5063 ) Co-authored-by: Ayan Bandyopadhyay <ayanb9440@gmail.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-21 09:13:20 -07:00
Davis Chase	080eb1b3fc	Fix graphql tool (#4984 ) Fix construction and add unit test.	2023-05-19 15:27:50 -07:00
Mike McGarry	ddd595fe81	feature/4493 Improve Evernote Document Loader (#4577 ) # Improve Evernote Document Loader When exporting from Evernote you may export more than one note. Currently the Evernote loader concatenates the content of all notes in the export into a single document and only attaches the name of the export file as metadata on the document. This change ensures that each note is loaded as an independent document and all available metadata on the note e.g. author, title, created, updated are added as metadata on each document. It also uses an existing optional dependency of `html2text` instead of `pypandoc` to remove the need to download the pandoc application via `download_pandoc()` to be able to use the `pypandoc` python bindings. Fixes #4493 Co-authored-by: Mike McGarry <mike.mcgarry@finbourne.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-19 14:28:17 -07:00
Eugene Yurtsev	e46202829f	feat #4479 : TextLoader auto detect encoding and improved exceptions (#4927 ) # TextLoader auto detect encoding and enhanced exception handling - Add an option to enable encoding detection on `TextLoader`. - The detection is done using `chardet` - The loading is done by trying all detected encodings by order of confidence or raise an exception otherwise. ### New Dependencies: - `chardet` Fixes #4479 ## Before submitting <!-- If you're adding a new integration, include an integration test and an example notebook showing its use! --> ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: - @eyurtsev --------- Co-authored-by: blob42 <spike@w530>	2023-05-18 09:55:14 -04:00
Davis Chase	8966f61ca5	Zep memory (#4898 ) Co-authored-by: Daniel Chalef <daniel.chalef@private.org> Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>	2023-05-17 20:01:01 -07:00
Eugene Yurtsev	c5ab9782c6	Add beautiful soup 4 to extended testing extra (#4869 ) # Add bs4 to extended testing extra Updating extended testing extra in preparation for more refactors.	2023-05-17 14:11:26 -04:00
Adam Quigley	e78c9be312	Add Confluence Loader unit tests (#3333 ) Adds some basic unit tests for the ConfluenceLoader that can be extended later. Ports this [PR from llama-hub](https://github.com/emptycrown/llama-hub/pull/208) and adapts it to `langchain`. @Jflick58 and @zywilliamli adding you here as potential reviewers --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-16 15:17:07 -07:00
Raduan Al-Shedivat	00c6ec8a2d	fix(document_loaders/telegram): fix pandas calls + add tests (#4806 ) # Fix Telegram API loader + add tests. I was testing this integration and it was broken with next error: ```python message_threads = loader._get_message_threads(df) KeyError: False ``` Also, this particular loader didn't have any tests / related group in poetry, so I added those as well. @hwchase17 / @eyurtsev please take a look on this fix PR. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-16 14:35:25 -07:00
Eugene Yurtsev	c3b6129beb	Block sockets for unit-tests (#4803 ) # Block usage of sockets during unit tests Catch any tests that attempt to use the network.	2023-05-16 14:41:24 -04:00
Eugene Yurtsev	d403f659ea	Update google protobuf dep (#4798 ) # Update google protobuf dep Resolve: https://github.com/hwchase17/langchain/security/dependabot/11	2023-05-16 12:25:07 -04:00
Eugene Yurtsev	3ecd7c9641	Add check to verify poetry.toml (#4794 ) # Add poetry check to github action Check poetry toml file during tests for errors	2023-05-16 11:53:06 -04:00
Eugene Yurtsev	14bedf1cc5	Github Action: Fix poetry lock file checking (#4789 ) Fix how poetry lock file is checked to avoid skipping caches silently.	2023-05-16 11:40:28 -04:00
Roma	cb802edf75	[Feature] Add GraphQL Query Tool (#4409 ) # Add GraphQL Query Support This PR introduces a GraphQL API Wrapper tool that allows LLM agents to query GraphQL databases. The tool utilizes the httpx and gql Python packages to interact with GraphQL APIs and provides a simple interface for running queries with LLM agents. @vowelparrot --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-15 14:06:12 -07:00
Eugene Yurtsev	09587a3201	Clean up tests for pdf parsers (#4595 ) # Organize tests for pdf parsers Clean up tests for pdf parsers, remove duplicate tests, convert to unit tests.	2023-05-15 14:21:05 -04:00
Eugene Yurtsev	3c490b5ba3	Docugami DataLoader (#4727 ) ### Adds a document loader for Docugami Specifically: 1. Adds a data loader that talks to the [Docugami](http://docugami.com) API to download processed documents as semantic XML 2. Parses the semantic XML into chunks, with additional metadata capturing chunk semantics 3. Adds a detailed notebook showing how you can use additional metadata returned by Docugami for techniques like the [self-querying retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html) 4. Adds an integration test, and related documentation Here is an example of a result that is not possible without the capabilities added by Docugami (from the notebook): <img width="1585" alt="image" src="https://github.com/hwchase17/langchain/assets/749277/bb6c1ce3-13dc-4349-a53b-de16681fdd5b"> --------- Co-authored-by: Taqi Jaffri <tjaffri@docugami.com> Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>	2023-05-15 10:53:00 -04:00
Harrison Chase	cdc20d1203	Harrison/json loader fix (#4686 ) Co-authored-by: Triet Le <112841660+triet-lq-holistics@users.noreply.github.com>	2023-05-14 18:25:59 -07:00
Eugene Yurtsev	08ed927c32	Turn on extended tests (#4588 ) # Turn on strict extended tests This PR turns on strict testing for extended tests.	2023-05-12 14:50:08 -04:00
Zander Chase	d96f6a106b	Add Steamship Image Generation Tool (#4580 ) Co-authored-by: Enias Cailliau <enias@steamship.com>	2023-05-12 10:35:01 -07:00
Davis Chase	46b100ea63	Add DocArray vector stores (#4483 ) Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made few small changes to get it across the finish line --------- Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> Signed-off-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai> Co-authored-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>	2023-05-10 15:22:16 -07:00
Eugene Yurtsev	80558b5b27	Add workflow for testing with all deps (#4410 ) # Add action to test with all dependencies installed PR adds a custom action for setting up poetry that allows specifying a cache key: https://github.com/actions/setup-python/issues/505#issuecomment-1273013236 This makes it possible to run 2 types of unit tests: (1) unit tests with only core dependencies (2) unit tests with extended dependencies (e.g., those that rely on an optional pdf parsing library) As part of this PR, we're moving some pdf parsing tests into the unit-tests section and making sure that these unit tests get executed when running with extended dependencies.	2023-05-10 09:35:07 -04:00
Aivin V. Solatorio	6567b73e1a	JSON loader (#4067 ) This implements a loader of text passages in JSON format. The `jq` syntax is used to define a schema for accessing the relevant contents from the JSON file. This requires dependency on the `jq` package: https://pypi.org/project/jq/. --------- Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>	2023-05-05 14:48:13 -07:00
Harrison Chase	fba6921b50	Harrison/one drive loader (#4081 ) Co-authored-by: José Ferraz Neto <netoferraz@gmail.com>	2023-05-03 22:55:34 -07:00
Harrison Chase	bd7e0a534c	Harrison/csv loader (#3771 ) Co-authored-by: mrT23 <tal.r@codium.ai>	2023-04-28 21:54:24 -07:00
Harrison Chase	c55ba43093	Harrison/vespa (#3761 ) Co-authored-by: Lester Solbakken <lesters@users.noreply.github.com>	2023-04-28 19:48:43 -07:00
Davis Chase	b807a114e4	Add query parsing unit tests (#3672 )	2023-04-27 13:42:12 -07:00
Davis Chase	3b609642ae	Self-query with generic query constructor (#3607 ) Alternate implementation of #3452 that relies on a generic query constructor chain and language and then has vector store-specific translation layer. Still refactoring and updating examples but general structure is there and seems to work s well as #3452 on exampels --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-27 08:36:00 -07:00
Harrison Chase	a35bbbfa9e	Harrison/lancedb (#3634 ) Co-authored-by: Minh Le <minhle@canva.com>	2023-04-27 08:14:36 -07:00
Eduard van Valkenburg	a3e3f26090	Some more PowerBI pydantic and import fixes (#3461 )	2023-04-26 22:09:12 -07:00
Eduard van Valkenburg	ba7a5ac9d7	Azure CosmosDB memory (#3434 ) Still needs docs, otherwise works.	2023-04-24 22:15:12 -07:00
Davit Buniatyan	2c0023393b	Deep Lake mini upgrades (#3375 ) Improvements * set default num_workers for ingestion to 0 * upgraded notebooks for avoiding dataset creation ambiguity * added `force_delete_dataset_by_path` * bumped deeplake to 3.3.0 * creds arg passing to deeplake object that would allow custom S3 Notes * please double check if poetry is not messed up (thanks!) Asks * Would be great to create a shared slack channel for quick questions --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-23 21:23:54 -07:00
Harrison Chase	a6664be79c	Harrison/myscale (#3352 ) Co-authored-by: Fangrui Liu <fangruil@moqi.ai> Co-authored-by: 刘方瑞 <fangrui.liu@outlook.com> Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>	2023-04-22 09:17:38 -07:00
Harrison Chase	cc6fe18152	Harrison/power bi (#3205 ) Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>	2023-04-22 08:24:48 -07:00
Harrison Chase	d2520a5f1e	Harrison/ddg (#3206 ) Co-authored-by: itai <itai.marks@gmail.com> Co-authored-by: Itai Marks <itaim@users.noreply.github.com> Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com> Co-authored-by: Adilzhan Ismailov <13088690+aismlv@users.noreply.github.com> Co-authored-by: Justin Flick <Justinjayflick@gmail.com> Co-authored-by: Justin Flick <jflick@homesite.com>	2023-04-19 21:32:26 -07:00
Harrison Chase	f19b3890c9	Harrison/site map tqdm (#3184 ) Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com>	2023-04-19 20:48:47 -07:00
Harrison Chase	68cd37175e	Harrison/arxiv tool (#3186 ) Co-authored-by: leo-gan <leo.gan.57@gmail.com>	2023-04-19 16:53:34 -07:00
Harrison Chase	afd3e70ae5	Harrison/confluent loader (#2994 ) Co-authored-by: Justin Flick <Justinjayflick@gmail.com>	2023-04-17 20:23:45 -07:00
vowelparrot	5ca7ce77cd	Remove pythonrepl from LLM-MathChain (#2943 ) Use numexpr evaluate instead of the python REPL to avoid malicious code injection. Tested against the (limited) math dataset and got the same score as before. For more permissive tools (like the REPL tool itself), other approaches ought to be provided (some combination of Sanitizer + Restricted python + unprivileged-docker + ...), but for a calculator tool, only mathematical expressions should be permitted. See https://github.com/hwchase17/langchain/issues/814	2023-04-16 08:50:32 -07:00
Ankush Gola	ec59e9d886	Fix ChatAnthropic stop_sequences error (#2919 ) (#2920 ) Note to self: Always run integration tests, even on "that last minute change you thought would be safe" :) --------- Co-authored-by: Mike Lambert <mike.lambert@anthropic.com>	2023-04-14 17:22:01 -07:00
Harrison Chase	1e9378d0a8	Harrison/weaviate fixes (#2872 ) Co-authored-by: cs0lar <cristiano.solarino@gmail.com> Co-authored-by: cs0lar <cristiano.solarino@brightminded.com>	2023-04-13 22:37:34 -07:00
sergerdn	04c458a270	feat: improve pinecone tests (#2806 ) Improve the integration tests for Pinecone by adding an `.env.example` file for local testing. Additionally, add some dev dependencies specifically for integration tests. This change also helps me understand how Pinecone deals with certain things, see related issues https://github.com/hwchase17/langchain/issues/2484 https://github.com/hwchase17/langchain/issues/2816	2023-04-13 21:49:31 -07:00
Harrison Chase	e49f1e628c	Harrison/gpt cache (#2744 ) Co-authored-by: SimFG <bang.fu@zilliz.com>	2023-04-12 14:16:58 -07:00
Harrison Chase	507cee5ee5	Harrison/pinecone hybrid update (#2742 ) Co-authored-by: acatav <39461369+acatav@users.noreply.github.com> Co-authored-by: Amnon Catav <catav.amnon1@gmail.com>	2023-04-11 21:32:17 -07:00
sergerdn	4bdcedab54	fix: some imports for integration tests (#2612 ) Add more missed imports for integration tests. Bump `pytest` to the current latest version. Fix `tests/integration_tests/vectorstores/test_elasticsearch.py` to update its cassette(easy fix). Related PR: https://github.com/hwchase17/langchain/pull/2560	2023-04-11 20:45:36 -07:00
sergerdn	cd9336469e	fix: missed deps integrations tests (#2560 ) Almost all integration tests have failed, but we haven't encountered any import errors yet. Some tests failed due to lazy import issues. It doesn't seem like a problem to resolve some of these errors in the next PR. I have a headache from resolving conflicts with `deeplake` and `boto3`, so I will temporarily comment out `boto3`. fix https://github.com/hwchase17/langchain/issues/2426	2023-04-07 20:43:53 -07:00
Kacper Łukawski	d8967e28d0	Upgrade Qdrant to 1.1.2 (#2554 ) This is a minor upgrade for Qdrant. We made a small bugfix in the local mode, so it might also be good to upgrade Qdrant for LangChain users.	2023-04-07 12:24:32 -07:00
sergerdn	6dc86ad48f	feat: add pytest-vcr for recording HTTP interactions in integration tests (#2445 ) Using `pytest-vcr` in integration tests has several benefits. Firstly, it removes the need to mock external services, as VCR records and replays HTTP interactions on the fly. Secondly, it simplifies the integration test setup by eliminating the need to set up and tear down external services in some cases. Finally, it allows for more reliable and deterministic integration tests by ensuring that HTTP interactions are always replayed with the same response. Overall, `pytest-vcr` is a valuable tool for simplifying integration test setup and improving their reliability This commit adds the `pytest-vcr` package as a dependency for integration tests in the `pyproject.toml` file. It also introduces two new fixtures in `tests/integration_tests/conftest.py` files for managing cassette directories and VCR configurations. In addition, the `tests/integration_tests/vectorstores/test_elasticsearch.py` file has been updated to use the `@pytest.mark.vcr` decorator for recording and replaying HTTP interactions. Finally, this commit removes the `documents` fixture from the `test_elasticsearch.py` file and replaces it with a new fixture defined in `tests/integration_tests/vectorstores/conftest.py` that yields a list of documents to use in any other tests. This also includes my second attempt to fix issue : https://github.com/hwchase17/langchain/issues/2386 Maybe related https://github.com/hwchase17/langchain/issues/2484	2023-04-07 07:28:57 -07:00
Harrison Chase	26314d7004	Harrison/openapi parser (#2461 ) Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>	2023-04-05 22:19:09 -07:00
sergerdn	b410dc76aa	fix: elasticsearch (#2402 ) - Create a new docker-compose file to start an Elasticsearch instance for integration tests. - Add new tests to `test_elasticsearch.py` to verify Elasticsearch functionality. - Include an optional group `test_integration` in the `pyproject.toml` file. This group should contain dependencies for integration tests and can be installed using the command `poetry install --with test_integration`. Any new dependencies should be added by running `poetry add some_new_deps --group "test_integration" ` Note: New tests running in live mode, which involve end-to-end testing of the OpenAI API. In the future, adding `pytest-vcr` to record and replay all API requests would be a nice feature for testing process.More info: https://pytest-vcr.readthedocs.io/en/latest/ Fixes https://github.com/hwchase17/langchain/issues/2386	2023-04-05 06:51:32 -07:00

1 2

97 Commits