langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
Zander Chase	0c6ed657ef	Convert Chain to a Chain Factory (#4605 ) ## Change Chain argument in client to accept a chain factory The `run_over_dataset` functionality seeks to treat each iteration of an example as an independent trial. Chains have memory, so it's easier to permit this type of behavior if we accept a factory method rather than the chain object directly. There are still corner cases / UX pains people will likely run into, like: - Caching may cause issues - If memory is persisted to a shared object (e.g., the same Redis queue), this could impact what is retrieved - If we're running the async methods concurrently using local models and someone naively instantiates the chain and loads the model each time, it could lead to tons of disk I/O or OOM errors	2023-05-13 02:13:21 +00:00
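The factory idea above can be sketched in plain Python. This is a minimal illustration, not the LangChain client API: `CountingChain` and `run_over_dataset` here are hypothetical stand-ins, and the chain's "memory" is just a list of the inputs it has seen. Passing a factory gives each example a fresh instance, while passing a lambda that returns one shared instance reproduces the cross-example leakage the commit is guarding against.

```python
from typing import Callable, List


class CountingChain:
    """Hypothetical chain with memory: it remembers every input it has processed."""

    def __init__(self) -> None:
        self.memory: List[str] = []

    def run(self, text: str) -> str:
        self.memory.append(text)
        return f"{text} (seen {len(self.memory)} input(s))"


def run_over_dataset(
    examples: List[str], chain_factory: Callable[[], CountingChain]
) -> List[str]:
    """Hypothetical runner: builds a new chain per example so each run is an independent trial."""
    outputs = []
    for example in examples:
        chain = chain_factory()  # fresh instance, so memory starts empty
        outputs.append(chain.run(example))
    return outputs


if __name__ == "__main__":
    # Factory: every example sees an empty memory.
    print(run_over_dataset(["a", "b"], CountingChain))
    # Shared object: memory leaks from the first example into the second.
    shared = CountingChain()
    print(run_over_dataset(["a", "b"], lambda: shared))
```

The cost of this design is the one the commit flags: if constructing the chain is expensive (for example, loading a local model), the factory runs that work once per example.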
Davis Chase	36f9e9a0ba	Skip flaky unit test (#4591 )	2023-05-12 11:54:40 -07:00
Eugene Yurtsev	08ed927c32	Turn on extended tests (#4588 ) # Turn on strict extended tests This PR turns on strict testing for extended tests.	2023-05-12 14:50:08 -04:00
Zander Chase	d96f6a106b	Add Steamship Image Generation Tool (#4580 ) Co-authored-by: Enias Cailliau <enias@steamship.com>	2023-05-12 10:35:01 -07:00
Eugene Yurtsev	a5371a0fa2	Add pytest --only-extended and --only-core options (#4494 ) # Adds testing options to pytest This PR adds the following options: * `--only-core` will skip all extended tests, running all core tests. * `--only-extended` will skip all core tests, forcing all extended tests to run. Running `py.test` without specifying either option is unaffected: it runs all tests that can be run within the unit_tests directory, and extended tests will run if the required packages are installed.	2023-05-12 11:35:22 -04:00
Leonid Ganeline	e17d0319d5	Add `arxiv` retriever (#4538 )	2023-05-11 22:48:38 -07:00
SimFG	7bcf238a1a	Optimize the initialization method of GPTCache (#4522 ) Optimize the initialization method of GPTCache, so that users can use GPTCache more quickly.	2023-05-11 16:15:23 -07:00
Zander Chase	f4d3cf2dfb	Add Invocation Params (#4509 ) ### Add Invocation Params to Logged Run Adds an llm type to each chat model as well as an override of the dict() method to log the invocation parameters for each call --------- Co-authored-by: Ankush Gola <ankush.gola@gmail.com>	2023-05-11 15:34:06 -07:00
Zander Chase	4ee47926ca	Add on_chat_message_start (#4499 ) ### Add on_chat_message_start to callback manager and base tracer Goal: trace messages directly to permit reloading as chat messages (store in an integration-agnostic way) Add an `on_chat_message_start` method. Fall back to `on_llm_start()` for handlers that don't have it implemented. Does so without breaking backwards compatibility (for now)	2023-05-11 11:06:39 -07:00
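The fallback described above can be sketched as follows. This is an illustration under stated assumptions, not LangChain's callback manager: the handler classes, the `dispatch_chat_start` helper, and the `(role, content)` tuple format for messages are all hypothetical, and the real code may detect unimplemented handlers differently (for example by catching `NotImplementedError`).

```python
from typing import List, Tuple

Message = Tuple[str, str]  # hypothetical (role, content) pair


class LegacyHandler:
    """Hypothetical handler that only knows about plain-text LLM prompts."""

    def __init__(self) -> None:
        self.events: List[tuple] = []

    def on_llm_start(self, prompts: List[str]) -> None:
        self.events.append(("llm_start", prompts))


class ChatHandler(LegacyHandler):
    """Hypothetical handler that also accepts structured chat messages."""

    def on_chat_message_start(self, messages: List[Message]) -> None:
        self.events.append(("chat_start", messages))


def dispatch_chat_start(handlers: list, messages: List[Message]) -> None:
    for handler in handlers:
        chat_hook = getattr(handler, "on_chat_message_start", None)
        if chat_hook is not None:
            # Preferred path: the handler receives the messages unchanged.
            chat_hook(messages)
        else:
            # Fallback: flatten the conversation into a single prompt string.
            prompt = "\n".join(f"{role}: {content}" for role, content in messages)
            handler.on_llm_start([prompt])
```

Keeping the old hook as the fallback is what lets existing handlers keep working without any changes on their side.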
Sunish Sheth	812e5f43f5	Add _type for all parsers (#4189 ) Used for serialization. Also add test that recurses through our subclasses to check they have them implemented Would fix https://github.com/hwchase17/langchain/issues/3217 Blocking: https://github.com/mlflow/mlflow/pull/8297 --------- Signed-off-by: Sunish Sheth <sunishsheth2009@gmail.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-11 01:27:58 -07:00
kYLe	0d51a1f12b	Add LLMs support for Anyscale Service (#4350 ) Add Anyscale service integration under LLM Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-11 00:39:59 -07:00
Evan Jones	f668251948	parameterized distance metrics; lint; format; tests (#4375 ) # Parameterize Redis vectorstore index Redis vectorstore allows for three different distance metrics: `L2` (flat L2), `COSINE`, and `IP` (inner product). Currently, the `Redis._create_index` method hard codes the distance metric to COSINE. I've parameterized this as an argument in the `Redis.from_texts` method -- pretty simple. Fixes #4368 ## Before submitting I've added an integration test showing indexes can be instantiated with all three values in the `REDIS_DISTANCE_METRICS` literal. An example notebook seemed overkill here. Normal API documentation would be more appropriate, but no standards are in place for that yet. ## Who can review? Not sure who's responsible for the vectorstore module... Maybe @eyurtsev / @hwchase17 / @agola11 ?	2023-05-11 00:20:01 -07:00
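For reference, the three metrics named above can be computed directly. This sketch only shows what each metric measures; the function names are hypothetical, `DistanceMetric` merely mirrors the literal described in the commit, and Redis's own scoring conventions (for example, whether it reports a distance or a similarity, or a squared L2 value) may differ from these textbook definitions.

```python
import math
from typing import Callable, Dict, List, Literal

# Hypothetical mirror of the three options described in the commit message.
DistanceMetric = Literal["L2", "COSINE", "IP"]
Vector = List[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance: 0 for identical vectors, grows with separation."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Vector, b: Vector) -> float:
    """Dot product: a similarity (larger means more aligned), sensitive to magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: ignores magnitude, 0 for vectors pointing the same way."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norm


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "L2": l2_distance,
    "COSINE": cosine_distance,
    "IP": inner_product,
}


def score(a: Vector, b: Vector, metric: DistanceMetric = "COSINE") -> float:
    return METRICS[metric](a, b)
```

For unit-length embeddings all three produce the same ranking of neighbors; they diverge only when vector magnitudes vary.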
Zander Chase	d969f43ed8	Load HuggingFace Tool (#4475 ) # Add option to `load_huggingface_tool` Expose a method to load a huggingface Tool from the HF hub --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-11 00:07:36 -07:00
Davis Chase	9ec60ad832	Add azure cognitive search retriever (#4467 ) All credit to @UmerHA, made a couple small changes --------- Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>	2023-05-10 15:27:27 -07:00
Davis Chase	46b100ea63	Add DocArray vector stores (#4483 ) Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made few small changes to get it across the finish line --------- Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> Signed-off-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai> Co-authored-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>	2023-05-10 15:22:16 -07:00
Harrison Chase	b2f920e891	add tracing v2 env var (#4465 ) Co-authored-by: Ankush Gola <ankush.gola@gmail.com>	2023-05-10 11:08:29 -07:00
Davis Chase	04475bea7d	Mv plan and execute to experimental (#4459 )	2023-05-10 08:31:53 -07:00
Eugene Yurtsev	80558b5b27	Add workflow for testing with all deps (#4410 ) # Add action to test with all dependencies installed PR adds a custom action for setting up poetry that allows specifying a cache key: https://github.com/actions/setup-python/issues/505#issuecomment-1273013236 This makes it possible to run 2 types of unit tests: (1) unit tests with only core dependencies (2) unit tests with extended dependencies (e.g., those that rely on an optional pdf parsing library) As part of this PR, we're moving some pdf parsing tests into the unit-tests section and making sure that these unit tests get executed when running with extended dependencies.	2023-05-10 09:35:07 -04:00
Matt Robinson	3637d6da6e	feat: add loader for open office odt files (#4405 ) # ODF File Loader Adds a data loader for handling Open Office ODT files. Requires `unstructured>=0.6.3`. ### Testing The following should work using the `fake.odt` example doc from the [`unstructured` repo](https://github.com/Unstructured-IO/unstructured). ```python from langchain.document_loaders import UnstructuredODTLoader loader = UnstructuredODTLoader(file_path="fake.odt", mode="elements") loader.load() loader = UnstructuredODTLoader(file_path="fake.odt", mode="single") loader.load() ```	2023-05-10 01:37:17 -07:00
Harrison Chase	6b8d144ccc	Harrison/plan and solve (#4422 )	2023-05-09 21:07:56 -07:00
Rukmani	2b14036126	Update WhatsAppChatLoader to include the character ~ in the sender name (#4420 ) Fixes #4153 If the sender of a message in a group chat isn't in your contact list, they will appear with a ~ prefix in the exported chat. This PR adds support for parsing such lines.	2023-05-09 15:00:04 -07:00
Zander Chase	f2150285a4	Fix nested runs example ID (#4413 ) #### Only reference example ID on the parent run Previously, I was assigning the example ID to every child run. Adds a test.	2023-05-09 12:21:53 -07:00
Aivin V. Solatorio	6335cb5b3a	Add support for Qdrant nested filter (#4354 ) # Add support for Qdrant nested filter This extends the filter functionality for the Qdrant vectorstore. The current filter implementation is limited to a single-level metadata structure; however, Qdrant supports nested metadata filtering. This extends the functionality for users to maximize the filter functionality when using Qdrant as the vectorstore. Reference: https://qdrant.tech/documentation/filtering/#nested-key --------- Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>	2023-05-09 10:34:11 -07:00
Martin Holzhauer	872605a5c5	Add an option to extract more metadata from crawled websites (#4347 ) This PR makes it possible to extract more metadata from websites for later use. My use case: parsing ld+json or microdata from sites and storing it as structured data in the metadata field	2023-05-09 10:18:33 -07:00
Leonid Ganeline	ce15ffae6a	added `Wikipedia` retriever (#4302 ) - added `Wikipedia` retriever. It is effectively a wrapper for `WikipediaAPIWrapper`. It wraps load() into get_relevant_documents() - sorted `__all__` in the `retrievers/__init__` - added integration tests for the WikipediaRetriever - added an example (as Jupyter notebook) for the WikipediaRetriever	2023-05-09 10:08:39 -07:00
Eugene Yurtsev	2ceb807da2	Add PDF parser implementations (#4356 ) # Add PDF parser implementations This PR separates the data loading from the parsing for a number of existing PDF loaders. Parser tests have been designed to help encourage developers to create a consistent interface for parsing PDFs. This interface can be made more consistent in the future by adding information into the initializer on desired behavior with respect to splitting by page etc. This code is expected to be backwards compatible -- with the exception of a bug fix with pymupdf parser which was returning `bytes` in the page content rather than strings. Also changing the lazy parser method of document loader to return an Iterator rather than Iterable over documents.	2023-05-09 10:24:17 -04:00
Eugene Yurtsev	ae0c3382dd	Add MimeType based parser (#4376 ) # Add MimeType Based Parser This PR adds a MimeType Based Parser. The parser inspects the mime-type of the blob it is parsing and based on the mime-type can delegate to the sub parser. ## Before submitting Waiting on adding notebooks until more implementations are landed. ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @hwchase17 @vowelparrot	2023-05-09 10:22:56 -04:00
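The delegation idea can be sketched without any LangChain imports. The `Blob` dataclass, the `MimeTypeDispatcher` class and the two toy parsers below are hypothetical and simplified (parsers yield strings rather than `Document` objects); the parser actually added in this PR may have a different constructor and fallback behavior.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, Optional


@dataclass
class Blob:
    """Hypothetical minimal blob: raw bytes plus an optional mime type."""

    data: bytes
    mimetype: Optional[str] = None


Parser = Callable[[Blob], Iterator[str]]


def parse_plain_text(blob: Blob) -> Iterator[str]:
    yield blob.data.decode("utf-8")


def parse_csv_rows(blob: Blob) -> Iterator[str]:
    # Toy CSV handling: one output item per line, no quoting rules.
    yield from blob.data.decode("utf-8").splitlines()


class MimeTypeDispatcher:
    """Routes each blob to a sub-parser chosen by its mime type."""

    def __init__(self, handlers: Dict[str, Parser], fallback: Optional[Parser] = None) -> None:
        self.handlers = handlers
        self.fallback = fallback

    def lazy_parse(self, blob: Blob) -> Iterator[str]:
        if blob.mimetype is None:
            raise ValueError("Blob has no mime type; cannot choose a parser.")
        handler = self.handlers.get(blob.mimetype, self.fallback)
        if handler is None:
            raise ValueError(f"Unsupported mime type: {blob.mimetype}")
        # Returning the sub-parser's iterator keeps parsing lazy while
        # still raising the errors above eagerly.
        return handler(blob)
```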
Davis Chase	ba0057c077	Check OpenAI model kwargs (#4366 ) Handle duplicate and incorrectly specified OpenAI params Thanks @PawelFaron for the fix! Made small update Closes #4331 --------- Co-authored-by: PawelFaron <42373772+PawelFaron@users.noreply.github.com> Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>	2023-05-08 16:37:34 -07:00
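One way to implement this kind of validation is sketched below. It is a hypothetical, framework-free helper (`build_model_kwargs` and the `KNOWN_FIELDS` set are illustrative assumptions, not the `OpenAI` class's actual validator): an explicitly declared parameter placed inside `model_kwargs` is rejected, an unknown parameter supplied both directly and inside `model_kwargs` is rejected as a duplicate, and any other unknown parameter is moved into `model_kwargs` with a warning.

```python
import warnings
from typing import Any, Dict

# Hypothetical subset of explicitly declared model parameters.
KNOWN_FIELDS = {"model_name", "temperature", "max_tokens"}


def build_model_kwargs(values: Dict[str, Any]) -> Dict[str, Any]:
    extra = dict(values.get("model_kwargs", {}))

    # Explicit parameters must not be smuggled in through model_kwargs.
    misplaced = sorted(KNOWN_FIELDS & set(extra))
    if misplaced:
        raise ValueError(f"Pass {misplaced} explicitly, not via model_kwargs.")

    cleaned: Dict[str, Any] = {}
    for name, value in values.items():
        if name == "model_kwargs":
            continue
        if name in KNOWN_FIELDS:
            cleaned[name] = value
            continue
        # Unknown parameter: reject duplicates, otherwise move it into model_kwargs.
        if name in extra:
            raise ValueError(f"Found {name} supplied twice.")
        warnings.warn(f"{name} is not a known parameter; moving it to model_kwargs.")
        extra[name] = value

    cleaned["model_kwargs"] = extra
    return cleaned
```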
Davis Chase	02ebb15c4a	Fix TextSplitter.from_tiktoken (#4361 ) Thanks to @danb27 for the fix! Minor update Fixes https://github.com/hwchase17/langchain/issues/4357 --------- Co-authored-by: Dan Bianchini <42096328+danb27@users.noreply.github.com>	2023-05-08 16:36:38 -07:00
Naveen Tatikonda	782df1db10	OpenSearch: Add Similarity Search with Score (#4089 ) ### Description Add `similarity_search_with_score` method for OpenSearch to return scores along with documents in the search results Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-05-08 16:35:21 -07:00
Eugene Yurtsev	aa11f7c89b	Add progress bar to filesystemblob loader, update pytest config for unit tests (#4212 ) This PR adds: * Option to show a tqdm progress bar when using the file system blob loader * Update pytest run configuration to be stricter * Adding a new marker that checks that required pkgs exist	2023-05-08 16:15:09 -04:00
Zander Chase	8b284f9ad0	Pass parsed inputs through to tool _run (#4309 )	2023-05-08 09:13:05 -07:00
Zander Chase	35c9e6ab40	Pass Callbacks through load_tools (#4298 ) - Update the load_tools method to properly accept `callbacks` arguments. - Add a deprecation warning when `callback_manager` is passed - Add two unit tests to check the deprecation warning is raised and to confirm the callback is passed through. Closes issue #4096	2023-05-08 08:44:26 -07:00
Jinto Jose	8a338412fa	mongodb support for chat history (#4266 )	2023-05-08 08:34:05 -07:00
Harrison Chase	c8b0b6e6c1	add youtube tools (#4320 )	2023-05-08 08:29:30 -07:00
Leonid Ganeline	9544b30821	added `Wikipedia` document loader (#4141 ) - Added the `Wikipedia` document loader. It is based on the existing `utilities/WikipediaAPIWrapper` - Added respective unit tests and an example notebook - Sorted list of classes in __init__	2023-05-06 09:32:45 -07:00
Eugene Yurtsev	423f497168	Add BlobParser abstraction (#3979 ) This PR adds the BlobParser abstraction. It follows the proposal described here: https://github.com/hwchase17/langchain/pull/2833#issuecomment-1509097756	2023-05-05 21:43:38 -04:00
Davis Chase	5ca13cc1f0	Dev2049/pypdfium2 (#4209 ) thanks @jerrytigerxu for the addition! --------- Co-authored-by: Jere Xu <jtxu2008@gmail.com> Co-authored-by: jerrytigerxu <jere.tiger.xu@gmailc.om>	2023-05-05 17:55:31 -07:00
George	2324f19c85	Update qdrant interface (#3971 ) 1) Passing `embedding_function` as a callable seems to be outdated; the common interface is to pass an `Embeddings` instance. 2) At the moment `Qdrant.add_texts` is designed to be used with `embeddings.embed_query`, which is (a) slow and (b) ambiguous because of point 1. It should be used with `embeddings.embed_documents`. This PR solves both problems and also provides some new tests	2023-05-05 16:46:40 -07:00
Zander Chase	1017e5cee2	Add LCP Client (#4198 ) Adding a client to fetch datasets, examples, and runs from a LCP instance and run objects over them.	2023-05-05 16:28:56 -07:00
Zander Chase	a30f42da4e	Update V2 Tracer (#4193 ) - Update the RunCreate object to work with recent changes - Add optional Example ID to the tracer - Adjust default persist_session behavior to attempt to load the session if it exists - Raise more useful HTTP errors for logging - Add unit testing - Fix the default ID to be a UUID for v2 tracer sessions Broken out from the big draft here: https://github.com/hwchase17/langchain/pull/4061	2023-05-05 14:55:01 -07:00
Mike Wang	c3044b1bf0	[test] Add integration_test for PandasAgent (#4056 ) - confirm creation - confirm functionality with a simple dimension check. The test now is calling OpenAI API directly, but learning from @vowelparrot that we’re caching the requests, so that it’s not that expensive. I also found we’re calling OpenAI api in other integration tests. Please lmk if there is any concern of real external API calls. I can alternatively make a fake LLM for this test. Thanks	2023-05-05 14:49:02 -07:00
Aivin V. Solatorio	6567b73e1a	JSON loader (#4067 ) This implements a loader of text passages in JSON format. The `jq` syntax is used to define a schema for accessing the relevant contents from the JSON file. This requires dependency on the `jq` package: https://pypi.org/project/jq/. --------- Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>	2023-05-05 14:48:13 -07:00
hp0404	2a3c5f8353	Update WhatsAppChatLoader regex to handle multiple date-time formats (#4186 ) This PR updates the `message_line_regex` used by `WhatsAppChatLoader` to support different date-time formats used in WhatsApp chat exports; resolves #4153. The new regex handles the following input formats: ```terminal [05.05.23, 15:48:11] James: Hi here [11/8/21, 9:41:32 AM] User name: Message 123 1/23/23, 3:19 AM - User 2: Bye! 1/23/23, 3:22_AM - User 1: And let me know if anything changes ``` Tests have been added to verify that the loader works correctly with all formats.	2023-05-05 13:13:05 -07:00
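The line formats listed above can be matched with a single verbose regular expression. The pattern below is an illustrative sketch, not the `message_line_regex` shipped in `WhatsAppChatLoader`. It assumes the gap before AM/PM may be a space, an underscore, or a narrow no-break space (U+202F); that last option is an assumption about how some exports render the gap.

```python
import re

# Sketch only: one pattern covering bracketed/unbracketed dates, "." or "/" separators,
# optional seconds, and an optional AM/PM suffix.
MESSAGE_LINE = re.compile(
    r"""
    ^\[?                                  # optional opening bracket
    (?P<timestamp>
        \d{1,2}[/.]\d{1,2}[/.]\d{2,4}     # date, e.g. 05.05.23 or 1/23/23
        ,\s
        \d{1,2}:\d{2}(?::\d{2})?          # time, with optional seconds
        (?:[\s_\u202f]?[AP]M)?            # optional AM/PM
    )
    \]?                                   # optional closing bracket
    (?:\s-\s|\s)                          # " - " separator or a single space
    (?P<sender>[^:]+?)                    # sender name, up to the first colon
    :\s
    (?P<text>.+)$                         # message body
    """,
    re.VERBOSE,
)

SAMPLE_LINES = [
    "[05.05.23, 15:48:11] James: Hi here",
    "[11/8/21, 9:41:32 AM] User name: Message 123",
    "1/23/23, 3:19 AM - User 2: Bye!",
    "1/23/23, 3:22_AM - User 1: And let me know if anything changes",
]

for line in SAMPLE_LINES:
    match = MESSAGE_LINE.match(line)
    if match:
        print(match.group("timestamp"), "|", match.group("sender"), "|", match.group("text"))
```

Named groups keep the downstream code readable: the loader-side logic can read `timestamp`, `sender` and `text` without depending on group positions.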
Zander Chase	84cfa76e00	Update Cohere Reranker (#4180 ) The forward ref annotations don't get updated if we only import with type checking --------- Co-authored-by: Abhinav Verma <abhinav_win12@yahoo.co.in>	2023-05-05 09:11:37 -07:00
Zander Chase	6032a051e9	Add Tenant ID to V2 Tracer (#4135 ) Update the V2 tracer to - use UUIDs instead of ints - load a tenant ID and use that when saving sessions	2023-05-04 21:35:20 -07:00
Zander Chase	2f087d63af	Fix Python RePL Tool (#4137 ) Filter out kwargs from inferred schema when determining if a tool is single input. Add a couple unit tests. Move tool unit tests to the tools dir	2023-05-04 20:31:16 -07:00
Harrison Chase	d4cf1eb60a	Add firestore memory (#3792 ) (#3941 ) If you have any other suggestions or feedback, please let me know. --------- Co-authored-by: yakigac <10434946+yakigac@users.noreply.github.com>	2023-05-03 22:55:47 -07:00
Mike Wang	67db495fcf	[agent] Add Spark Agent (#4020 ) - added support for spark through pyspark library. - added jupyter notebook as example.	2023-05-03 22:45:23 -07:00
rogerserper	b1446bea5f	google-serper: async + full json results + support for Google Images, Places and News (#4078 ) * implemented arun, results, and aresults. Reuses aiosession if available. * helper tools GoogleSerperRun and GoogleSerperResults * support for Google Images, Places and News (examples given) and filtering based on time (e.g. past hour) * updated docs	2023-05-03 22:35:48 -07:00

412 Commits