langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
Daniel Chalef	0551bc90a5	Zep Hybrid Search (#5742 ) Zep now supports persisting custom metadata with messages and hybrid search across both message embeddings and structured metadata. This PR implements custom metadata and enhancements to the `ZepChatMessageHistory` and `ZepRetriever` classes to implement this support. Tag maintainers/contributors who might be interested: VectorStores / Retrievers / Memory - @dev2049 --------- Co-authored-by: Daniel Chalef <daniel.chalef@private.org>	2023-06-05 12:59:28 -07:00
Natalie	199cc700a3	Ability to specify credentials when using Google BigQuery as a data loader (#5466 ) # Adds ability to specify credentials when using Google BigQuery as a data loader Fixes #5465 . Adds ability to set credentials which must be of the `google.auth.credentials.Credentials` type. This argument is optional and will default to `None`. Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-30 16:25:22 -07:00
Ayan Bandyopadhyay	8181f9e362	Update psychicapi version (#5471 ) Update [psychicapi](https://pypi.org/project/psychicapi/) python package dependency to the latest version 0.5. The newest python package version addresses breaking changes in the Psychic http api.	2023-05-30 15:55:22 -07:00
Paul-Emile Brotons	a61b7f7e7c	adding MongoDBAtlasVectorSearch (#5338 ) # Add MongoDBAtlasVectorSearch for the python library Fixes #5337 --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-30 07:59:01 -07:00
Harrison Chase	760632b292	Harrison/spark reader (#5405 ) Co-authored-by: Rithwik Ediga Lakhamsani <rithwik.ediga@databricks.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-29 20:23:17 -07:00
German Martin	0b3e0dd1d2	New Trello document loader (#4767 ) # Added New Trello loader class and documentation Simple Loader on top of py-trello wrapper. With a board name you can pull cards and tweak some field parameters on the load operation. I included documentation and examples. Included unit test cases using patch and a fixture for py-trello client class. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-29 19:47:56 -07:00
Michael Landis	7047a2c1af	feat: add Momento as a standard cache and chat message history provider (#5221 ) # Add Momento as a standard cache and chat message history provider This PR adds Momento as a standard caching provider. Implements the interface, adds integration tests, and documentation. We also add Momento as a chat history message provider along with integration tests, and documentation. [Momento](https://www.gomomento.com/) is a fully serverless cache. Similar to S3 or DynamoDB, it requires zero configuration, infrastructure management, and is instantly available. Users sign up for free and get 50GB of data in/out for free every month. ## Before submitting ✅ We have added documentation, notebooks, and integration tests demonstrating usage. Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-25 19:13:21 -07:00
Davis Chase	ca88b25da6	Zep sdk version (#5267 ) zep-python's sync methods no longer need an asyncio wrapper. This was causing issues with FastAPI deployment. Zep also now supports putting and getting of arbitrary message metadata. Bump zep-python version to v0.30 Remove nest-asyncio from Zep example notebooks. Modify tests to include metadata. --------- Co-authored-by: Daniel Chalef <daniel.chalef@private.org> Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>	2023-05-25 13:42:10 -07:00
Eugene Yurtsev	5cfa72a130	Bibtex integration for document loader and retriever (#5137 ) # Bibtex integration Wrap bibtexparser to retrieve a list of docs from a bibtex file. * Get the metadata from the bibtex entries * `page_content` is taken from the local pdf referenced in the `file` field of the bibtex entry using `pymupdf` * If no valid pdf file, `page_content` is set to the `abstract` field of the bibtex entry * Support Zotero flavour using regex to get the file path * Added usage example in `docs/modules/indexes/document_loaders/examples/bibtex.ipynb` --------- Co-authored-by: Sébastien M. Popoff <sebastien.popoff@espci.fr> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-25 00:21:31 -07:00
Davis Chase	2b2176a3c1	tfidf retriever (#5114 ) Co-authored-by: vempaliakhil96 <vempaliakhil96@gmail.com>	2023-05-24 10:02:09 -07:00
Tian Wei	d7f807b71f	Add AzureCognitiveServicesToolkit to call Azure Cognitive Services API (#5012 ) # Add AzureCognitiveServicesToolkit to call Azure Cognitive Services API: achieve some multimodal capabilities This PR adds a toolkit named AzureCognitiveServicesToolkit which bundles the following tools: - AzureCogsImageAnalysisTool: calls Azure Cognitive Services image analysis API to extract caption, objects, tags, and text from images. - AzureCogsFormRecognizerTool: calls Azure Cognitive Services form recognizer API to extract text, tables, and key-value pairs from documents. - AzureCogsSpeech2TextTool: calls Azure Cognitive Services speech to text API to transcribe speech to text. - AzureCogsText2SpeechTool: calls Azure Cognitive Services text to speech API to synthesize text to speech. This toolkit can be used to process image, document, and audio inputs. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-23 06:45:48 -07:00
Jamie Broomall	d4fd589638	WhyLabs callback (#4906 ) # Add a WhyLabs callback handler * Adds a simple WhyLabsCallbackHandler * Add required dependencies as optional * protect against missing modules with imports * Add docs/ecosystem basic example based on initial prototype from @andrewelizondo > this integration gathers privacy preserving telemetry on text with whylogs and sends statistical profiles to WhyLabs platform to monitor these metrics over time. For more information on what WhyLabs is see: https://whylabs.ai After you run the notebook (if you have env variables set for the API Keys, org_id and dataset_id) you get something like this in WhyLabs: ![Screenshot (443)](https://github.com/hwchase17/langchain/assets/88007022/6bdb3e1c-4243-4ae8-b974-23a8bb12edac) Co-authored-by: Andre Elizondo <andre@whylabs.ai> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 20:29:47 -07:00
Matt Rickard	de6a401a22	Add OpenLM LLM multi-provider (#4993 ) OpenLM is a zero-dependency OpenAI-compatible LLM provider that can call different inference endpoints directly via HTTP. It implements the OpenAI Completion class so that it can be used as a drop-in replacement for the OpenAI API. This changeset utilizes BaseOpenAI for minimal added code. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 18:09:53 -07:00
Gergely Imreh	69de33e024	Add Mastodon toots loader (#5036 ) # Add Mastodon toots loader. Loader works either with public toots, or Mastodon app credentials. Toot text and user info is loaded. I've also added integration test for this new loader as it works with public data, and a notebook with example output run now. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 16:43:07 -07:00
Michael Landis	6eacd88ae7	fix: revert docarray explicit transitive dependencies and use extras instead (#5015 ) tldr: The docarray [integration PR](https://github.com/hwchase17/langchain/pull/4483) introduced a pinned dependency to protobuf. This is a docarray dependency, not a langchain dependency. Since this is handled by the docarray dependencies, it is unnecessary here. Further, as a pinned dependency, this quickly leads to incompatibilities with application code that consumes the library. Much less with a heavily used library like protobuf. Detail: as we see in the [docarray integration](https://github.com/hwchase17/langchain/pull/4483/files#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R81-R83), the transitive dependencies of docarray were also listed as langchain dependencies. This is unnecessary as the docarray project has an appropriate [extras](`a01a05542d/pyproject.toml (L70)`). The docarray project also does not require this _pinned_ version of protobuf, rather [a minimum version](`a01a05542d/pyproject.toml (L41)`). So this pinned version was likely in error. To fix this, this PR reverts the explicit hnswlib and protobuf dependencies and adds the hnswlib extras install for docarray (which installs hnswlib and protobuf, as originally intended). Because version `0.32.0` of the docarray hnswlib extras added protobuf, we bump the docarray dependency from `^0.31.0` to `^0.32.0`. # revert docarray explicit transitive dependencies and use extras instead ## Who can review? @dev2049 -- reviewed the original PR @eyurtsev -- bumped the pinned protobuf dependency a few days ago --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 12:48:09 -04:00
Harrison Chase	10ba201d05	Harrison/neo4j (#5078 ) Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-22 07:31:48 -07:00
Harrison Chase	b0431c672b	Harrison/psychic (#5063 ) Co-authored-by: Ayan Bandyopadhyay <ayanb9440@gmail.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-21 09:13:20 -07:00
Davis Chase	080eb1b3fc	Fix graphql tool (#4984 ) Fix construction and add unit test.	2023-05-19 15:27:50 -07:00
Mike McGarry	ddd595fe81	feature/4493 Improve Evernote Document Loader (#4577 ) # Improve Evernote Document Loader When exporting from Evernote you may export more than one note. Currently the Evernote loader concatenates the content of all notes in the export into a single document and only attaches the name of the export file as metadata on the document. This change ensures that each note is loaded as an independent document and all available metadata on the note e.g. author, title, created, updated are added as metadata on each document. It also uses an existing optional dependency of `html2text` instead of `pypandoc` to remove the need to download the pandoc application via `download_pandoc()` to be able to use the `pypandoc` python bindings. Fixes #4493 Co-authored-by: Mike McGarry <mike.mcgarry@finbourne.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-19 14:28:17 -07:00
Eugene Yurtsev	e46202829f	feat #4479 : TextLoader auto detect encoding and improved exceptions (#4927 ) # TextLoader auto detect encoding and enhanced exception handling - Add an option to enable encoding detection on `TextLoader`. - The detection is done using `chardet` - The loading is done by trying all detected encodings in order of confidence, raising an exception if none succeeds. ### New Dependencies: - `chardet` Fixes #4479 ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: - @eyurtsev --------- Co-authored-by: blob42 <spike@w530>	2023-05-18 09:55:14 -04:00
Davis Chase	8966f61ca5	Zep memory (#4898 ) Co-authored-by: Daniel Chalef <daniel.chalef@private.org> Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>	2023-05-17 20:01:01 -07:00
Eugene Yurtsev	c5ab9782c6	Add beautiful soup 4 to extended testing extra (#4869 ) # Add bs4 to extended testing extra Updating extended testing extra in preparation for more refactors.	2023-05-17 14:11:26 -04:00
Adam Quigley	e78c9be312	Add Confluence Loader unit tests (#3333 ) Adds some basic unit tests for the ConfluenceLoader that can be extended later. Ports this [PR from llama-hub](https://github.com/emptycrown/llama-hub/pull/208) and adapts it to `langchain`. @Jflick58 and @zywilliamli adding you here as potential reviewers --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-16 15:17:07 -07:00
Raduan Al-Shedivat	00c6ec8a2d	fix(document_loaders/telegram): fix pandas calls + add tests (#4806 ) # Fix Telegram API loader + add tests. I was testing this integration and it was broken with the following error: ```python message_threads = loader._get_message_threads(df) KeyError: False ``` Also, this particular loader didn't have any tests / related group in poetry, so I added those as well. @hwchase17 / @eyurtsev please take a look at this fix PR. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-16 14:35:25 -07:00
Eugene Yurtsev	c3b6129beb	Block sockets for unit-tests (#4803 ) # Block usage of sockets during unit tests Catch any tests that attempt to use the network.	2023-05-16 14:41:24 -04:00
Eugene Yurtsev	d403f659ea	Update google protobuf dep (#4798 ) # Update google protobuf dep Resolve: https://github.com/hwchase17/langchain/security/dependabot/11	2023-05-16 12:25:07 -04:00
Eugene Yurtsev	3ecd7c9641	Add check to verify poetry.toml (#4794 ) # Add poetry check to github action Check poetry toml file during tests for errors	2023-05-16 11:53:06 -04:00
Eugene Yurtsev	14bedf1cc5	Github Action: Fix poetry lock file checking (#4789 ) Fix how poetry lock file is checked to avoid skipping caches silently.	2023-05-16 11:40:28 -04:00
Roma	cb802edf75	[Feature] Add GraphQL Query Tool (#4409 ) # Add GraphQL Query Support This PR introduces a GraphQL API Wrapper tool that allows LLM agents to query GraphQL databases. The tool utilizes the httpx and gql Python packages to interact with GraphQL APIs and provides a simple interface for running queries with LLM agents. @vowelparrot --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-15 14:06:12 -07:00
Eugene Yurtsev	09587a3201	Clean up tests for pdf parsers (#4595 ) # Organize tests for pdf parsers Clean up tests for pdf parsers, remove duplicate tests, convert to unit tests.	2023-05-15 14:21:05 -04:00
Eugene Yurtsev	3c490b5ba3	Docugami DataLoader (#4727 ) ### Adds a document loader for Docugami Specifically: 1. Adds a data loader that talks to the [Docugami](http://docugami.com) API to download processed documents as semantic XML 2. Parses the semantic XML into chunks, with additional metadata capturing chunk semantics 3. Adds a detailed notebook showing how you can use additional metadata returned by Docugami for techniques like the [self-querying retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html) 4. Adds an integration test, and related documentation Here is an example of a result that is not possible without the capabilities added by Docugami (from the notebook): <img width="1585" alt="image" src="https://github.com/hwchase17/langchain/assets/749277/bb6c1ce3-13dc-4349-a53b-de16681fdd5b"> --------- Co-authored-by: Taqi Jaffri <tjaffri@docugami.com> Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>	2023-05-15 10:53:00 -04:00
Harrison Chase	cdc20d1203	Harrison/json loader fix (#4686 ) Co-authored-by: Triet Le <112841660+triet-lq-holistics@users.noreply.github.com>	2023-05-14 18:25:59 -07:00
Eugene Yurtsev	08ed927c32	Turn on extended tests (#4588 ) # Turn on strict extended tests This PR turns on strict testing for extended tests.	2023-05-12 14:50:08 -04:00
Zander Chase	d96f6a106b	Add Steamship Image Generation Tool (#4580 ) Co-authored-by: Enias Cailliau <enias@steamship.com>	2023-05-12 10:35:01 -07:00
Davis Chase	46b100ea63	Add DocArray vector stores (#4483 ) Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made a few small changes to get it across the finish line --------- Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> Signed-off-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai> Co-authored-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>	2023-05-10 15:22:16 -07:00
Eugene Yurtsev	80558b5b27	Add workflow for testing with all deps (#4410 ) # Add action to test with all dependencies installed PR adds a custom action for setting up poetry that allows specifying a cache key: https://github.com/actions/setup-python/issues/505#issuecomment-1273013236 This makes it possible to run 2 types of unit tests: (1) unit tests with only core dependencies (2) unit tests with extended dependencies (e.g., those that rely on an optional pdf parsing library) As part of this PR, we're moving some pdf parsing tests into the unit-tests section and making sure that these unit tests get executed when running with extended dependencies.	2023-05-10 09:35:07 -04:00
Aivin V. Solatorio	6567b73e1a	JSON loader (#4067 ) This implements a loader of text passages in JSON format. The `jq` syntax is used to define a schema for accessing the relevant contents from the JSON file. This requires dependency on the `jq` package: https://pypi.org/project/jq/. --------- Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>	2023-05-05 14:48:13 -07:00
Harrison Chase	fba6921b50	Harrison/one drive loader (#4081 ) Co-authored-by: José Ferraz Neto <netoferraz@gmail.com>	2023-05-03 22:55:34 -07:00
Harrison Chase	bd7e0a534c	Harrison/csv loader (#3771 ) Co-authored-by: mrT23 <tal.r@codium.ai>	2023-04-28 21:54:24 -07:00
Harrison Chase	c55ba43093	Harrison/vespa (#3761 ) Co-authored-by: Lester Solbakken <lesters@users.noreply.github.com>	2023-04-28 19:48:43 -07:00
Davis Chase	b807a114e4	Add query parsing unit tests (#3672 )	2023-04-27 13:42:12 -07:00
Davis Chase	3b609642ae	Self-query with generic query constructor (#3607 ) Alternate implementation of #3452 that relies on a generic query constructor chain and language and then has vector store-specific translation layer. Still refactoring and updating examples but general structure is there and seems to work as well as #3452 on examples --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-27 08:36:00 -07:00
Harrison Chase	a35bbbfa9e	Harrison/lancedb (#3634 ) Co-authored-by: Minh Le <minhle@canva.com>	2023-04-27 08:14:36 -07:00
Eduard van Valkenburg	a3e3f26090	Some more PowerBI pydantic and import fixes (#3461 )	2023-04-26 22:09:12 -07:00
Eduard van Valkenburg	ba7a5ac9d7	Azure CosmosDB memory (#3434 ) Still needs docs, otherwise works.	2023-04-24 22:15:12 -07:00
Davit Buniatyan	2c0023393b	Deep Lake mini upgrades (#3375 ) Improvements * set default num_workers for ingestion to 0 * upgraded notebooks for avoiding dataset creation ambiguity * added `force_delete_dataset_by_path` * bumped deeplake to 3.3.0 * creds arg passing to deeplake object that would allow custom S3 Notes * please double check if poetry is not messed up (thanks!) Asks * Would be great to create a shared slack channel for quick questions --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-23 21:23:54 -07:00
Harrison Chase	a6664be79c	Harrison/myscale (#3352 ) Co-authored-by: Fangrui Liu <fangruil@moqi.ai> Co-authored-by: 刘方瑞 <fangrui.liu@outlook.com> Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>	2023-04-22 09:17:38 -07:00
Harrison Chase	cc6fe18152	Harrison/power bi (#3205 ) Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>	2023-04-22 08:24:48 -07:00
Harrison Chase	d2520a5f1e	Harrison/ddg (#3206 ) Co-authored-by: itai <itai.marks@gmail.com> Co-authored-by: Itai Marks <itaim@users.noreply.github.com> Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com> Co-authored-by: Adilzhan Ismailov <13088690+aismlv@users.noreply.github.com> Co-authored-by: Justin Flick <Justinjayflick@gmail.com> Co-authored-by: Justin Flick <jflick@homesite.com>	2023-04-19 21:32:26 -07:00
Harrison Chase	f19b3890c9	Harrison/site map tqdm (#3184 ) Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com>	2023-04-19 20:48:47 -07:00


111 Commits
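
Commit e46202829f describes `TextLoader` trying every detected encoding in order of confidence and raising an exception if none of them works. The sketch below illustrates only that fallback loop using the standard library; it is not the code from the commit. The function name `decode_with_fallback` and the `(encoding, confidence)` candidate list are hypothetical, and the candidates are passed in by hand here rather than produced by `chardet`.

```python
from typing import Iterable, Tuple


def decode_with_fallback(
    data: bytes, candidates: Iterable[Tuple[str, float]]
) -> Tuple[str, str]:
    """Decode `data` by trying candidate encodings, highest confidence first.

    `candidates` is a hypothetical list of (encoding_name, confidence) pairs,
    standing in for whatever an encoding detector would report.
    Returns (decoded_text, encoding_used).
    """
    # Sort so the most confident guess is attempted first.
    ordered = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    for encoding, _confidence in ordered:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: move on to the next candidate.
            continue
    raise RuntimeError("could not decode data with any candidate encoding")


if __name__ == "__main__":
    raw = "café".encode("latin-1")
    text, used = decode_with_fallback(raw, [("utf-8", 0.9), ("latin-1", 0.5)])
    print(text, used)
```

Sorting inside the function means callers do not have to pre-order the guesses, and catching `LookupError` keeps an unrecognised codec name from aborting the whole attempt.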