langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Bagatur	8cb2594562	Bagatur/dingo (#9079 ) Co-authored-by: gary <1625721671@qq.com>	2023-08-11 10:54:45 -07:00
Jacques Arnoux	926c64da60	Fix web research retriever for unknown links in results (#9115 ) Fixes an issue with web research retriever for unknown links in results. This is currently making the retrieve crash sometimes. @rlancemartin	2023-08-11 10:50:37 -07:00
Alvaro Bartolome	f7ae183f40	`ArgillaCallbackHandler` to properly use default values for `api_url` and `api_key` (#9113 ) As of the recent PR at #9043, after some testing we've realised that the default values were not being used for `api_key` and `api_url`. Besides that, the default for `api_key` was set to `argilla.apikey`, but since the default values are intended for people using the Argilla Quickstart (easy to run and setup), the defaults should be instead `owner.apikey` if using Argilla 1.11.0 or higher, or `admin.apikey` if using a lower version of Argilla. Additionally, we've removed the f-string replacements from the docstrings. --------- Co-authored-by: Gabriel Martin <gabriel@argilla.io>	2023-08-11 09:37:06 -07:00
Bagatur	01ef786e7e	bump 262 (#9108 )	2023-08-11 01:29:07 -07:00
Bagatur	3b754b5461	Bagatur/filter metadata (#9015 ) Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>	2023-08-11 01:10:00 -07:00
Kim Minjong	7f0e847c13	Update pydantic format instruction prompt (#9095 ) - remove unopened bracket	2023-08-11 00:22:13 -07:00
Ashutosh Sanzgiri	991b448dfc	minor edits (#9093 ) Description: Minor edit to PR#845 Thanks!	2023-08-10 23:40:36 -07:00
Bagatur	3ab4e21579	fix json tool (#9096 )	2023-08-10 23:39:25 -07:00
Sam Groenjes	2184e3a400	Fix IndexError when input_list is Empty in prep_prompts (#5769 ) This MR corrects the IndexError arising in prep_prompts method when no documents are returned from a similarity search. Fixes #1733 Co-authored-by: Sam Groenjes <sam.groenjes@darkwolfsolutions.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-10 22:50:39 -07:00
Chenyu Zhao	c0acbdca1b	Update Fireworks model names (#9085 )	2023-08-10 19:23:42 -07:00
Bagatur	b80e3825a6	Bagatur/pinecone by vector (#9087 ) Co-authored-by: joseph <joe@outverse.com>	2023-08-10 18:28:55 -07:00
Nikhil Kumar	6abb2c2c08	Buffer method of ConversationTokenBufferMemory should be able to return messages as string (#7057 ) ### Description: `ConversationTokenBufferMemory` should have a simple way of returning the conversation messages as a string. Previously, to do this, you could only return memory as an array through the buffer method and call `get_buffer_string` by importing it from `langchain.schema`, or use the `load_memory_variables` method and key into `self.memory_key`. ### Maintainer @hwchase17 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-10 18:17:22 -07:00
William FH	57dd4daa9a	Add string example mapper (#9086 ) Now that we accept any runnable or arbitrary function to evaluate, we don't always look up the input keys. If an evaluator requires references, we should try to infer if there's one key present. We only have delayed validation here but it's better than nothing	2023-08-10 17:07:02 -07:00
Bidhan Roy	02430e25b6	BagelDB (bageldb.ai), VectorStore integration. (#8971 ) - Description: [BagelDB](bageldb.ai) a collaborative vector database. Integrated the bageldb PyPi package with langchain with related tests and code. - Issue: Not applicable. - Dependencies: `betabageldb` PyPi package. - Tag maintainer: @rlancemartin, @eyurtsev, @baskaryan - Twitter handle: bageldb_ai (https://twitter.com/BagelDB_ai) We ran `make format`, `make lint` and `make test` locally. Followed the contribution guideline thoroughly https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --------- Co-authored-by: Towhid1 <nurulaktertowhid@gmail.com>	2023-08-10 16:48:36 -07:00
DJ Atha	ee52482db8	Fix issue 7445 (#7635 ) Description: updated BabyAGI examples and experimental to append the iteration to the result id to fix error storing data to vectorstore. Issue: 7445 Dependencies: no Tag maintainer: @eyurtsev This fix worked for me locally. Happy to take some feedback and iterate on a better solution. I was considering appending a uuid instead but didn't want to over complicate the example. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-10 16:29:31 -07:00
Harrison Chase	bb6fbf4c71	openai adapters (#8988 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-08-10 16:08:50 -07:00
Harrison Chase	45f0f9460a	add async for python repl (#9080 )	2023-08-10 16:07:06 -07:00
Neil Murphy	105c787e5a	Add convenience methods to ConversationBufferMemory and ConversationB… (#8981 ) Add convenience methods to `ConversationBufferMemory` and `ConversationBufferWindowMemory` to get buffer either as messages or as string. Helps when `return_messages` is set to `True` but you want access to the messages as a string, and vice versa. @hwchase17 One use case: Using a `MultiPromptRouter` where `default_chain` is `ConversationChain`, but destination chains are `LLMChains`. Injecting chat memory into prompts for destination chains prints a stringified `List[Messages]` in the prompt, which creates a lot of noise. These convenience methods allow caller to choose either as needed. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-10 15:45:30 -07:00
Zend	6221eb5974	Recursive url loader w/ test (#8813 ) Description: Due to some issue on the test, this is a separate PR with the test for #8502 Tag maintainer: @rlancemartin --------- Co-authored-by: Lance Martin <lance@langchain.dev> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-10 14:50:31 -07:00
Junlin Zhou	cb5fb751e9	Enhance regex of structured_chat agents' output parser (#8965 ) Current regex only extracts agent's action between '` ``` ``` `', this commit will extract action between both '` ```json ``` `' and '` ``` ``` `' This is very similar to #7511 Co-authored-by: zjl <junlinzhou@yzbigdata.com>	2023-08-10 14:26:07 -07:00
Bagatur	16bd328aab	Use Embeddings in pinecone (#8982 ) cc @eyurtsev @olivier-lacroix @jamescalam redo of #2741	2023-08-10 14:22:41 -07:00
Piyush Jain	8eea46ed0e	Bedrock embeddings async methods (#9024 ) ## Description This PR adds the `aembed_query` and `aembed_documents` async methods for improving the embeddings generation for large documents. The implementation uses asyncio tasks and gather to achieve concurrency as there is no bedrock async API in boto3. ### Maintainers @agola11 @aarora79 ### Open questions To avoid throttling from the Bedrock API, should there be an option to limit the concurrency of the calls?	2023-08-10 14:21:03 -07:00
Eugene Yurtsev	67ca187560	Fix incorrect code blocks in documentation (#9060 ) Fixes incorrect code block syntax in doc strings.	2023-08-10 14:13:42 -07:00
Eugene Yurtsev	46f3428cb3	Fix more incorrect code blocks in doc strings (#9073 ) Fix 2 more incorrect code blocks in strings	2023-08-10 13:49:15 -07:00
Eugene Yurtsev	a5a4c53280	RedisStore: Update init and Documentation updates (#9044 ) * Update Redis Store to support init from parameters * Update notebook to show how to use redis store, and some fixes in documentation	2023-08-10 15:30:29 -04:00
Leonid Ganeline	fcbbddedae	ArxivLoader fix for issue 9046 (#9061 ) Fixed #9046 Added ut-s for this fix. @eyurtsev	2023-08-10 14:59:39 -04:00
Mike Lambert	e94a5d753f	Move from test to supported claude-instant-1 model (#9066 ) Moves from "test" model to "claude-instant-1" model which is supported and has actual capacity	2023-08-10 11:57:28 -07:00
Eugene Yurtsev	b7bc8ec87f	Add excludes to FileSystemBlobLoader (#9064 ) Add option to specify exclude patterns. https://github.com/langchain-ai/langchain/discussions/9059	2023-08-10 14:56:58 -04:00
Eugene Yurtsev	6c70f491ba	ChatPromptTemplate pending deprecation proposal (#9004 ) Pending deprecations for ChatPromptTemplate proposals	2023-08-10 14:40:55 -04:00
TRY-ER	2431eca700	Agent vector store tool doc (#9029 ) I was initially confused about whether to use create_vectorstore_agent or create_vectorstore_router_agent due to a lack of documentation, so I wrote simple documentation for each function describing their different use cases. - Description: Added docstrings to create_vectorstore_agent and create_vectorstore_router_agent to point out the difference in their use cases - Tag maintainer: @rlancemartin, @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-10 11:13:12 -07:00
Alvaro Bartolome	08a0741d82	Update `ArgillaCallbackHandler` as of latest `argilla` release (#9043 ) Hi @agola11, or whoever is reviewing this PR 😄 ## What's in this PR? As of the latest Argilla release, we'll change and refactor some things to make some workflows easier, one of those is how everything's pushed to Argilla, so that now there's no need to call `push_to_argilla` over a `FeedbackDataset` when either `push_to_argilla` is called for the first time, or `from_argilla` is called; among others. We also add some class variables to make sure those are easy to update in case we update those internally in the future, also to make the `warnings.warn` message lighter from the code view. P.S. Regarding the Twitter/X mention feel free to do so at either https://twitter.com/argilla_io or https://twitter.com/alvarobartt, or both if applicable, otherwise, just the first Twitter/X handle.	2023-08-10 10:59:46 -07:00
Blake (Yung Cher Ho)	8d351bfc20	Takeoff integration (#9045 ) ## Description: This PR adds the Titan Takeoff Server to the available LLMs in LangChain. Titan Takeoff is an inference server created by [TitanML](https://www.titanml.co/) that allows you to deploy large language models locally on your hardware in a single command. Most generative model architectures are included, such as Falcon, Llama 2, GPT2, T5 and many more. Read more about Titan Takeoff here: - [Blog](https://medium.com/@TitanML/introducing-titan-takeoff-6c30e55a8e1e) - [Docs](https://docs.titanml.co/docs/titan-takeoff/getting-started) #### Testing As Titan Takeoff runs locally on port 8000 by default, no network access is needed. Responses are mocked for testing. - [x] Make Lint - [x] Make Format - [x] Make Test #### Dependencies No new dependencies are introduced. However, users will need to install the titan-iris package in their local environment and start the Titan Takeoff inferencing server in order to use the Titan Takeoff integration. Thanks for your help and please let me know if you have any questions. cc: @hwchase17 @baskaryan	2023-08-10 10:56:06 -07:00
Nuno Campos	3bdc273ab3	Implement .transform() in RunnablePassthrough() (#9032 ) - This ensures passthrough doesn't break streaming --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-10 10:41:19 -07:00
Bagatur	206f809366	fix sched ci (more) (#9056 )	2023-08-10 10:39:29 -07:00
Ismail Pelaseyed	abb1264edf	Fix issue with Metaphor Search Tool throwing error on missing keys in API response (#9051 ) - Description: Fixes an issue with Metaphor Search Tool throwing when missing keys in API response. - Issue: #9048 - Tag maintainer: @hinthornw @hwchase17 - Twitter handle: @pelaseyed	2023-08-10 09:07:00 -07:00
Eugene Yurtsev	5e05ba2140	Add embeddings cache (#8976 ) This PR adds the ability to temporarily cache or persistently store embeddings. A notebook has been included showing how to set up the cache and how to use it with a vectorstore.	2023-08-10 11:15:30 -04:00
Bagatur	6e14f9548b	bump 261 (#9041 )	2023-08-10 07:59:27 -07:00
Eugene Yurtsev	d21333d710	Add redis storage (#8980 ) Add a redis implementation of a BaseStore	2023-08-10 10:48:35 -04:00
Bagatur	434a96415b	make runnable dir (#9016 ) Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-08-10 08:56:37 +01:00
Nuno Campos	c7a489ae0d	Small improvements for tracer and debug output of runnables (#8683 )	2023-08-10 07:24:12 +01:00
EricFan	618cf5241e	Open file in UTF-8 encoding (#6919 ) (#8943 ) FileCallbackHandler cannot handle some languages, for example Chinese. Opening the file with UTF-8 encoding fixes it. @agola11 Issue: #6919 Dependencies: none --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-09 17:54:21 -07:00
colegottdank	f4a47ec717	Add optional model kwargs to ChatAnthropic to allow overrides (#9013 ) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-09 17:34:00 -07:00
Kaizen	bbbd2b076f	DirectoryLoader slicing (#8994 ) DirectoryLoader can now return a random sample of files in a directory. Parameters added are: sample_size randomize_sample sample_seed @rlancemartin, @eyurtsev --------- Co-authored-by: Andrew Oseen <amovfx@protonmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-09 16:05:16 -07:00
IanRogers-101Ways	d248481f13	skip over empty google spreadsheets (#8974 ) - Description: Allow GoogleDriveLoader to handle empty spreadsheets - Issue: Currently GoogleDriveLoader will crash if it tries to load a spreadsheet with an empty sheet - Dependencies: n/a - Tag maintainer: @rlancemartin, @eyurtsev	2023-08-09 16:05:02 -07:00
Eugene Yurtsev	efa02ed768	Suppress divide by zero warnings for cosine similarity (#9006 ) Suppress runtime warnings for divide by zero, as the downstream code handles the scenario (handling inf and nan)	2023-08-09 15:56:51 -07:00
Leonid Ganeline	5454591b0a	docstrings cleanup (#8993 ) Added/Updated docstrings @baskaryan	2023-08-09 15:49:06 -07:00
Massimiliano Pronesti	c72da53c10	Add logprobs to SamplingParameters in vllm (#9010 ) This PR aims at amending #8806 , that I opened a few days ago, adding the extra `logprobs` parameter that I accidentally forgot	2023-08-09 15:48:29 -07:00
Bagatur	8dd071ad08	import airbyte loaders (#9009 )	2023-08-09 14:51:15 -07:00
Bagatur	96d064e305	bump 260 (#9002 )	2023-08-09 13:40:49 -07:00
Nuno Campos	808248049d	Implement a router for openai functions (#8589 )	2023-08-09 21:17:04 +01:00
Eugene Yurtsev	a6e6e9bb86	Fix airbyte loader (#8998 ) Fix airbyte loader https://github.com/langchain-ai/langchain/issues/8996	2023-08-09 16:13:06 -04:00
William FH	90579021f8	Update Key Check (#8948 ) In eval loop. It needn't be done unless you are creating the corresponding evaluators	2023-08-09 12:33:00 -07:00
Jerzy Czopek	539672a7fd	Feature/fix azureopenai model mappings (#8621 ) This pull request aims to ensure that the `OpenAICallbackHandler` can properly calculate the total cost for Azure OpenAI chat models. The following changes have resolved this issue: - The `model_name` has been added to the ChatResult llm_output. Without this, the default values of `gpt-35-turbo` were applied. This was causing the total cost for Azure OpenAI's GPT-4 to be significantly inaccurate. - A new parameter `model_version` has been added to `AzureChatOpenAI`. Azure does not include the model version in the response. With the addition of `model_name`, this is not a significant issue for GPT-4 models, but it's an issue for GPT-3.5-Turbo. Version 0301 (default) of GPT-3.5-Turbo on Azure has a flat rate of 0.002 per 1k tokens for both prompt and completion. However, version 0613 introduced a split in pricing for prompt and completion tokens. - The `OpenAICallbackHandler` implementation has been updated with the proper model names, versions, and cost per 1k tokens. Unit tests have been added to ensure the functionality works as expected; the Azure ChatOpenAI notebook has been updated with examples. Maintainers: @hwchase17, @baskaryan Twitter handle: @jjczopek --------- Co-authored-by: Jerzy Czopek <jerzy.czopek@avanade.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-09 10:56:15 -07:00
shibuiwilliam	3adb1e12ca	make trajectory eval chain stricter and add unit tests (#8909 ) - update trajectory eval logic to be stricter - add tests to trajectory eval chain	2023-08-09 10:57:18 -04:00
Nuno Campos	b8df15cd64	Adds transform support for runnables (#8762 ) --------- Co-authored-by: jacoblee93 <jacoblee93@gmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-09 12:34:23 +01:00
Harrison Chase	4d72288487	async output parser (#8894 ) Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-08-09 08:25:38 +01:00
Bagatur	3c6eccd701	bump 259 (#8951 )	2023-08-09 00:07:47 -07:00
Harrison Chase	7de6a1b78e	parent document retriever (#8941 )	2023-08-08 22:39:08 -07:00
Aarav Borthakur	3f64b8a761	Integrate Rockset as a chat history store (#8940 ) Description: Adds Rockset as a chat history store Dependencies: no changes Tag maintainer: @hwchase17 This PR passes linting and testing. I added a test for the integration and an example notebook showing its use.	2023-08-08 18:54:07 -07:00
William FH	e3056340da	Add id in error in tracer (#8944 )	2023-08-08 18:25:27 -07:00
Bagatur	95cf7de112	scheduled tests GHA (#8879 ) Adding scheduled daily GHA that runs marked integration tests. To start just marking some tests in test_openai	2023-08-08 14:55:25 -07:00
Joe Reuter	8f0cd91d57	Airbyte based loaders (#8586 ) This PR adds 8 new loaders: * `AirbyteCDKLoader` This reader can wrap and run all python-based Airbyte source connectors. * Separate loaders for the most commonly used APIs: * `AirbyteGongLoader` * `AirbyteHubspotLoader` * `AirbyteSalesforceLoader` * `AirbyteShopifyLoader` * `AirbyteStripeLoader` * `AirbyteTypeformLoader` * `AirbyteZendeskSupportLoader` ## Documentation and getting started I added the basic shape of the config to the notebooks. This increases the maintenance effort a bit, but I think it's worth it to make sure people can get started quickly with these important connectors. This is also why I linked the spec and the documentation page in the readme as these two contain all the information to configure a source correctly (e.g. it won't suggest using oauth if that's avoidable even if the connector supports it). ## Document generation The "documents" produced by these loaders won't have a text part (instead, all the record fields are put into the metadata). If a text is required by the use case, the caller needs to do custom transformation suitable for their use case. ## Incremental sync All loaders support incremental syncs if the underlying streams support it. By storing the `last_state` from the reader instance away and passing it in when loading, it will only load updated records. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-08 14:49:25 -07:00
Eugene Yurtsev	15f650ae8c	Add base storage interface, 2 implementations and utility encoder (#8895 ) This PR defines an abstract interface for key value stores. It provides 2 implementations: 1. Local File System 2. In memory -- used to facilitate testing It also provides an encoder utility to help take care of serialization from arbitrary data to data that can be stored by the given store	2023-08-08 17:29:06 -04:00
Harrison Chase	7543a3d70e	Harrison/image (#845 ) Co-authored-by: Ashutosh Sanzgiri <sanzgiri@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-08 13:58:27 -07:00
Bagatur	ab193338aa	bump 258 (#8932 )	2023-08-08 12:54:51 -07:00
Eugene Yurtsev	bb12184551	Internal code deprecation API (#8763 ) Proposal for an internal API to deprecate LangChain code. This PR is heavily based on: https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/_api/deprecation.py This PR only includes deprecation functionality (no renaming etc.). Additional functionality can be added on a need basis (e.g., renaming parameters), but best to roll out as an MVP to test this out. DeprecationWarnings are ignored by default. We can change the policy for the deprecation warnings, but we'll need to make sure we're not creating noise for users due to internal code invoking deprecated functionality.	2023-08-08 15:42:22 -04:00
Leonid Ganeline	33a2f58fbf	`tensorflow_datasets` document loader (#8721 ) This PR adds a `tensorflow_datasets` document loader	2023-08-08 15:19:28 -04:00
Holt Skinner	fad26e79a3	fix: Resolve `AttributeError` in Google Cloud Enterprise Search retriever (#8872 ) - Reverting some of the changes made in https://github.com/langchain-ai/langchain/pull/8369	2023-08-08 12:11:12 -07:00
William FH	b2eb4ff0fc	Relax Validation in Eval (#8902 ) Just check for missing keys	2023-08-08 11:59:30 -07:00
Leonid Ganeline	2d078c7767	`PubMed` document loader (#8893 ) - added `PubMed Document Loader` artifacts; ut-s; examples - fixed `PubMed utility`; ut-s @hwchase17	2023-08-08 14:26:03 -04:00
Ofer Mendelevitch	a7824f16f2	Added consistent timeout for Vectara calls (#8892 ) - Description: consistent timeout at 60s for all calls to Vectara API - Tag maintainer: @rlancemartin, @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-08 11:10:32 -07:00
Bagatur	642b57c7ff	nit (#8927 )	2023-08-08 10:54:25 -07:00
manmax31	4a07fba9f0	Improve query prompt of BGE embeddings (#8908 ) - Description: Improved the query prompt of BGE embeddings after talking with the devs of BGE embeddings - Tag maintainer: @hwchase17 - Twitter handle: @ManabChetia3 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2023-08-08 10:20:37 -07:00
Chris Pappalardo	beab637f04	added filter kwarg to VectorStoreIndexWrapper query and query_with_so… (#8844 ) - Description: added filter to query methods in VectorStoreIndexWrapper for filtering by metadata (i.e. search_kwargs) - Tag maintainer: @rlancemartin, @eyurtsev Updated the doc snippet on this topic as well. It took me a long while to figure out how to filter the vectorstore by filename, so this might help someone else out. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-08 10:10:45 -07:00
David vonThenen	bf4a112aa6	Fixes to the Nebula LLM Integration (#8918 ) This addresses some issues with introducing the Nebula LLM to LangChain in this PR: https://github.com/langchain-ai/langchain/pull/8876 This fixes the following: - Removes `SYMBLAI` from variable names - Fixes bug with `Bearer` for the API KEY Thanks again in advance for your help! cc: @hwchase17, @baskaryan --------- Co-authored-by: dvonthenen <david.vonthenen@gmail.com>	2023-08-08 10:04:43 -07:00
Marie-Philippe Gill	6b9f266837	Add user_context to AmazonKendraRetriever (#8869 ) ### Description Now, we can pass information like a JWT token using user_context: ```python self.retriever = AmazonKendraRetriever(index_id=kendraIndexId, user_context={"Token": jwt_token}) ``` - [x] `make lint` - [x] `make format` - [x] `make test` Also tested by pip installing in my own project, and it allows access through the token. ### Maintainers @rlancemartin, @eyurtsev ### My twitter handle [girlknowstech](https://twitter.com/girlknowstech)	2023-08-08 08:37:03 -07:00
GitHub-L	67718c1d6b	Update OpenAPI code to use the requestBody - Description: The API doc passed to the LLM only included the content of responses, not the content of requestBody, so the agent could not construct the correct request parameters from the requestBody information. Adding two lines of code fixed the bug. - Tag maintainer: @hinthornw	2023-08-08 10:33:21 -04:00
Leonid Kuligin	52d6b91c18	Fixed a source for documents uploaded from GCS (#8912 ) Sets source for documents uploaded from GCS to source on gcs #8911 Co-authored-by: Leonid Kuligin <kuligin@google.com>	2023-08-08 09:34:43 -04:00
Bagatur	022ef170f8	bump 257 (#8903 )	2023-08-08 01:16:33 -07:00
Jacob Lee	fa30a57034	Adds Ollama as an LLM (#8829 ) Adds Ollama as an LLM. Ollama can run various open source models locally e.g. Llama 2 and Vicuna, automatically configuring and GPU-optimizing them. @rlancemartin @hwchase17 --------- Co-authored-by: Lance Martin <lance@langchain.dev>	2023-08-07 21:19:22 -07:00
Ash Vardanian	1f9124ceaa	Add: USearch Vector Store (#8835 ) ## Description I am excited to propose an integration with USearch, a lightweight vector-search engine available for both Python and JavaScript, among other languages. ## Dependencies It introduces a new PyPi dependency - `usearch`. I am unsure if it must be added to the Poetry file, as this would make the PR too clunky. Please let me know. ## Profiles - Maintainers: @ashvardanian @davvard - Twitter handles: @ashvardanian @unum_cloud --------- Co-authored-by: Davit Vardanyan <78792753+davvard@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-07 20:41:00 -07:00
Leonid Kuligin	b52a3785c9	Allow to specify a custom loader for GcsFileLoader (#8868 ) Co-authored-by: Leonid Kuligin <kuligin@google.com>	2023-08-07 22:57:31 -04:00
Bruno Bornsztein	d56eff042a	Make json output parser handle newlines inside markdown code blocks (#8682 ) Update to #8528 Newlines and other special characters within markdown code blocks returned as `action_input` should be handled correctly (in particular, unescaped `"` => `\"` and `\n` => `\\n`) so they don't break JSON parsing. @baskaryan	2023-08-07 15:49:54 -07:00
Oege Dijk	cff52638b2	when encountering error during fetch return "" in web_base.py (#8753 ) when e.g. downloading a sitemap with a malformed url (e.g. "ttp://example.com/index.html" with the h omitted at the beginning of the url), this will ensure that the sitemap download does not crash, but just emits a warning. (maybe should be optional with e.g. a `skip_faulty_urls:bool=True` parameter, but this was the most straightforward fix) @rlancemartin, @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-07 15:35:41 -07:00
Bennji94	33cdb06b5c	Async RetryOutputParser, RetryWithErrorOutputParser and OutputFixingParser (#8776 ) Added async parsing functions for RetryOutputParser, RetryWithErrorOutputParser and OutputFixingParser. The async parse functions call the arun methods of the used LLMChains. Fix for #7989 --------- Co-authored-by: Benjamin May <benjamin.may94@gmail.com>	2023-08-07 14:42:48 -07:00
Joshua Sundance Bailey	7fc07ba5df	Create ChatAnyscale (#8770 ) - Description: Adds the ChatAnyscale class with llama-2 7b, llama-2 13b, and llama-2 70b on [Anyscale Endpoints](https://app.endpoints.anyscale.com/) - It inherits from ChatOpenAI and requires openai (probably unnecessary but it made for a quick and easy implementation) - Inspired by https://github.com/langchain-ai/langchain/pull/8434 (@kylehh and @baskaryan )	2023-08-07 13:21:05 -07:00
idcore	fe78aff1f2	Add new parameter forced_decoder_ids to OpenAIWhisperParserLocal + small bug fix (#8793 ) - Description: new parameter forced_decoder_ids for OpenAIWhisperParserLocal to force input language, and enable optional translate mode. Usage example: processor = WhisperProcessor.from_pretrained("openai/whisper-medium") forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="transcribe") #forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="translate") loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParserLocal(lang_model="openai/whisper-medium",forced_decoder_ids=forced_decoder_ids)) - Issue #8792 - Tag maintainer: @rlancemartin, @eyurtsev --------- Co-authored-by: idcore <eugene.novozhilov@gmail.com>	2023-08-07 13:17:58 -07:00
David vonThenen	40079d4936	Introduce Nebula LLM to LangChain (#8876 ) ## Description This PR adds Nebula to the available LLMs in LangChain. Nebula is an LLM focused on conversation understanding and enables users to extract conversation insights from video, audio, text, and chat-based conversations. These conversations can occur between any mix of human or AI participants. Examples of some questions you could ask Nebula from a given conversation are: - What could be the customer’s pain points based on the conversation? - What sales opportunities can be identified from this conversation? - What best practices can be derived from this conversation for future customer interactions? You can read more about Nebula here: https://symbl.ai/blog/extract-insights-symbl-ai-generative-ai-recall-ai-meetings/ #### Integration Test An integration test is added, but it requires network access. Since Nebula is fully managed like OpenAI, network access is required to exercise the integration test. #### Linting - [x] make lint - [x] make test (TODO: there seems to be a failure in another non-related test??? Need to check on this.) - [x] make format ### Dependencies No new dependencies were introduced. ### Twitter handle [@symbldotai](https://twitter.com/symbldotai) [@dvonthenen](https://twitter.com/dvonthenen) If you have any questions, please let me know. cc: @hwchase17, @baskaryan --------- Co-authored-by: dvonthenen <david.vonthenen@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-07 13:15:26 -07:00
Eugene Yurtsev	f616aee35a	JsonOutputFunctionParser: Fix mutation in place bug (#8758 ) Fixes mutation in place in the JsonOutputFunctionParser. This causes issues when trying to re-use the original AI message.	2023-08-07 14:32:46 -04:00
shibuiwilliam	ab47557db3	fix evaluation parse test (#8859 ) # What - fix evaluation parse test - Issue: None - Dependencies: None - Tag maintainer: @baskaryan - Twitter handle: @MLOpsJ	2023-08-07 11:15:41 -07:00
manmax31	40096c73cd	Add BGE embeddings support (#8848 ) - Description: [BGE-large](https://huggingface.co/BAAI/bge-large-en) embeddings from BAAI are at the top of [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard). Hence adding support for it. - Tag maintainer: @baskaryan - Twitter handle: @ManabChetia3 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-07 11:15:30 -07:00
shibuiwilliam	fbc83dfdbb	Fix/abstract add message (#8856 ) - Description: Fix/abstract add message - Issue: None - Dependencies: None - Twitter handle: @MLOpsJ	2023-08-07 11:02:19 -07:00
William FH	91be7eee66	Add concurrency support for run_on_dataset (#8841 ) Long-term, would be better to use the lower-level batch() method(s) but it may take me a bit longer to clean up. This unblocks in the meantime, though it may fail when the evaluated chain raises a `NotImplementedError` for a corresponding async method	2023-08-07 09:24:48 -07:00
Bagatur	fc2f450f2d	bump 256 (#8870 )	2023-08-07 08:29:02 -07:00
Tudor Golubenco	aeaef8f3a3	Add support for Xata as a vector store (#8822 ) This adds support for [Xata](https://xata.io) (data platform based on Postgres) as a vector store. We have recently added [Xata to Langchain.js](https://github.com/hwchase17/langchainjs/pull/2125) and would love to have the equivalent in the Python project as well. The PR includes integration tests and a Jupyter notebook as docs. Please let me know if anything else would be needed or helpful. I have added the xata python SDK as an optional dependency. ## To run the integration tests You will need to create a DB in xata (see the docs), then run something like: ``` OPENAI_API_KEY=sk-... XATA_API_KEY=xau_... XATA_DB_URL='https://....xata.sh/db/langchain' poetry run pytest tests/integration_tests/vectorstores/test_xata.py ``` --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Philip Krauss <35487337+philkra@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-07 08:14:52 -07:00
Leonid Kuligin	6e3fa59073	Added chat history to codey models (#8831 ) #7469 since 1.29.0, Vertex SDK supports a chat history provided to a codey chat model. Co-authored-by: Leonid Kuligin <kuligin@google.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-07 07:34:35 -07:00
Massimiliano Pronesti	a616e19975	feat(llms): add support for vLLM (#8806 ) Hello langchain maintainers, this PR aims at integrating [vllm](https://vllm.readthedocs.io/en/latest/#) into langchain. This PR closes #8729. This feature clearly depends on `vllm`, but I've seen other models supported here depend on packages that are not included in the pyproject.toml (e.g. `gpt4all`, `text-generation`) so I thought it was the case for this as well. @hwchase17, @baskaryan --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-07 07:32:02 -07:00
Bagatur	100d9ce4c7	bump 255 (#8865 )	2023-08-07 07:25:23 -07:00
Vic Cao	c9da300e4d	fix: overwrite stream for ChatOpenAI in runtime (#8288 ) @hwchase17, @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-08-07 10:18:30 +01:00
Karthik Raja A	5a9765b1b5	MultiOn client toolkit update 2.0 (#8750 ) - Updated to use newer better function interaction - Previous version had only one callback - @hinthornw @hwchase17 Can you look into this - Shout out to @MultiON_AI @DivGarg9 on twitter --------- Co-authored-by: Naman Garg <ngarg3@binghamton.edu> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-06 22:24:10 -07:00

396 Commits