langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
Bagatur	e7a0def1bc	QoL improvements to query constructor (#11504 ) updating query constructor and self query retriever to - make it easier to pass in examples - validate attributes used in query - remove invalid parts of query - make it easier to get + edit prompt - make query constructor a runnable - make self query retriever use as runnable	2023-10-09 08:10:52 -07:00
Taikono-Himazin	eec53fa294	Added autodetect_encoding option to csvLoader (#11327 )	2023-10-09 08:06:43 -07:00
Holt Skinner	09c66fe04f	feat: Update Google Document AI Parser (#11413 ) - Description: Code Refactoring, Documentation Improvements for Google Document AI PDF Parser - Adds Online (synchronous) processing option. - Adds default field mask to limit payload size. - Skips Human review by default. - Issue: Fixes #10589 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2023-10-09 08:04:25 -07:00
Nuno Campos	628cc4cce8	Rename RunnableMap to RunnableParallel (#11487 ) - keep alias for RunnableMap - update docs to use RunnableParallel and RunnablePassthrough.assign	2023-10-09 11:22:03 +01:00
Eugene Yurtsev	6a10e8ef31	Add documentation to Runnable (#11516 )	2023-10-08 08:09:04 +01:00
William FH	eb572f41a6	Add LangSmith Run Chat Loader (#11458 )	2023-10-06 17:02:18 -07:00
David Duong	484947c492	Fetch up-to-date attributes for env-pulled kwargs during serialisation of OpenAI classes (#11499 )	2023-10-06 22:43:29 +01:00
Bagatur	5470e730d2	raise openapi import error (#11495 )	2023-10-06 12:57:24 -07:00
Erick Friis	29f5f70415	Rename some last hwchase17/langchain links (#11494 )	2023-10-06 12:34:30 -07:00
Fabrice Pont	872836c541	feat: add markdown list parser (#11411 ) Description: add `MarkdownListOutputParser` as a new `ListOutputParser` Issue: #11410	2023-10-06 12:25:45 -07:00
Erick Friis	8f50b616c5	Remove optional from vectara source (#11493 ) fyi @ofermend --------- Co-authored-by: Ofer Mendelevitch <ofer@vectara.com> Co-authored-by: Ofer Mendelevitch <ofermend@gmail.com>	2023-10-06 12:12:44 -07:00
Bagatur	53887242a1	bump 310 (#11486 )	2023-10-06 09:49:10 -07:00
Jesús Vélez Santiago	a1c7532298	Add async sql record manager and async indexing API (#10726 ) - Description: Add support for a SQLRecordManager in async environments. It includes the creation of `RecorManagerAsync` abstract class. - Issue: None - Dependencies: Optional `aiosqlite`. - Tag maintainer: @nfcampos - Twitter handle: @jvelezmagic --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-10-06 09:38:44 -04:00
Qihui Xie	57ade13b2b	fix llm_inputs duplication problem in intermediate_steps in SQLDatabaseChain (#10279 ) Use `.copy()` to fix the bug that the first `llm_inputs` element is overwritten by the second `llm_inputs` element in `intermediate_steps`. *Problem description:* In [line 127]( `c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L127C17-L127C17)`), the `llm_inputs` of the sql generation step is appended as the first element of `intermediate_steps`: ``` intermediate_steps.append(llm_inputs) # input: sql generation ``` However, `llm_inputs` is a mutable dict, it is updated in [line 179](https://github.com/langchain-ai/langchain/blob/master/libs/experimental/langchain_experimental/sql/base.py#L179) for the final answer step: ``` llm_inputs["input"] = input_text ``` Then, the updated `llm_inputs` is appended as another element of `intermediate_steps` in [line 180](`c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L180)`): ``` intermediate_steps.append(llm_inputs) # input: final answer ``` As a result, the final `intermediate_steps` returned in [line 189](`c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L189C43-L189C43)`) actually contains two same `llm_inputs` elements, i.e., the `llm_inputs` for the sql generation step overwritten by the one for final answer step by mistake. Users are not able to get the actual `llm_inputs` for the sql generation step from `intermediate_steps` Simply calling `.copy()` when appending `llm_inputs` to `intermediate_steps` can solve this problem.	2023-10-05 21:32:08 -07:00
Florian	d78f418c0d	Extract abstracts from Pubmed articles, even if they have no extra label (#10245 ) ### Description This pull request involves modifications to the extraction method for abstracts/summaries within the PubMed utility. A condition has been added to verify the presence of unlabeled abstracts. Now an abstract will be extracted even if it does not have a subtitle. In addition, the extraction of the abstract was extended to books. ### Issue The PubMed utility occasionally returns an empty result when extracting abstracts from articles, despite the presence of an abstract for the paper on PubMed. This issue arises due to the varying structure of articles; some articles follow a "subtitle/label: text" format, while others do not include subtitles in their abstracts. An example of the latter case can be found at: https://pubmed.ncbi.nlm.nih.gov/37666905/ --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 18:56:46 -07:00
Viktor Zhemchuzhnikov	fd9da60aea	Add async support to SelfQueryRetriever (#10175 ) ### Description SelfQueryRetriever is missing async support, so I am adding it. I also removed deprecated predict_and_parse method usage here, and added some tests. ### Issue N/A ### Tag maintainer Not yet ### Twitter handle N/A	2023-10-05 18:54:21 -07:00
Theron Tau	35297ca0d3	Add feature for extracting images from pdf and recognizing text from images. (#10653 ) Description: This addresses #10423, which asks for the ability to extract images from a PDF and recognize the text on them. I have implemented it with `PyPDFLoader`, `PyPDFium2Loader`, `PyPDFDirectoryLoader`, `PyMuPDFLoader`, `PDFMinerLoader`, and `PDFPlumberLoader`. [RapidOCR](https://github.com/RapidAI/RapidOCR.git) is used to recognize text on extracted images. OCR is time-consuming, so a boolean parameter `extract_images` controls whether to extract and recognize. I have tested the time usage for each parser on my own laptop (ThinkBook 14+ with AMD R7-6800H) by unit test, and the result is: \| extract_images \| PyPDFParser \| PDFMinerParser \| PyMuPDFParser \| PyPDFium2Parser \| PDFPlumberParser \| \| ------------- \| ------------- \| ------------- \| ------------- \| ------------- \| ------------- \| \| False \| 0.27s \| 0.39s \| 0.06s \| 0.08s \| 1.01s \| \| True \| 17.01s \| 20.67s \| 20.32s \| 19.75s \| 20.55s \| Issue #10423 Dependencies rapidocr_onnxruntime in [RapidOCR](https://github.com/RapidAI/RapidOCR/tree/main) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 18:51:59 -07:00
Bagatur	8e3fbc97ca	Add vowpal_wabbit RL chain (#11462 )	2023-10-05 18:39:45 -07:00
Haris Wang	f1269830a0	Fix bug in MarkdownHeaderTextSplitter for codeblock (#10262 ) - Description: The previous version of the MarkdownHeaderTextSplitter did not take into account the possibility of '#' appearing within code blocks, which caused segmentation anomalies in these situations. This PR has fixed this issue. - Issue: - Dependencies: No - Tag maintainer: - Twitter handle: cc @baskaryan @eyurtsev @rlancemartin --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 18:34:42 -07:00
Eddie Cohen	656d2303f7	add in, nin for pinecone (#10303 ) Description: Adds the in and nin comparators for pinecone seen [here](https://docs.pinecone.io/docs/metadata-filtering#metadata-query-language) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 18:31:09 -07:00
Bagatur	a3a2ce623e	Revise vowpal_wabbit notebook	2023-10-05 18:18:19 -07:00
Bagatur	8fafa1af91	merge	2023-10-05 18:09:35 -07:00
olgavrou	3b07c0cf3d	RL Chain with VowpalWabbit (#10242 ) - Description: This PR adds a new chain `rl_chain.PickBest` for learned prompt variable injection, detailed description and usage can be found in the example notebook added. It essentially adds a [VowpalWabbit](https://github.com/VowpalWabbit/vowpal_wabbit) layer before the llm call in order to learn or personalize prompt variable selections. Most of the code is to make the API simple and provide lots of defaults and data wrangling that is needed to use Vowpal Wabbit, so that the user of the chain doesn't have to worry about it. - Dependencies: [vowpal-wabbit-next](https://pypi.org/project/vowpal-wabbit-next/), - sentence-transformers (already a dep) - numpy (already a dep) - tagging @ataymano who contributed to this chain - Tag maintainer: @baskaryan - Twitter handle: @olgavrou Added example notebook and unit tests	2023-10-05 18:07:22 -07:00
Manikanta5112	56048b909f	added ContentFormatter escape special characters for message content (#10319 ) --------- Co-authored-by: Manikanta5112 <42089393+mani5112@users.noreply.github.com>	2023-10-05 18:02:29 -07:00
Leonid Ganeline	d17416ec79	docstrings `callbacks` (#11456 ) Added missed docstrings to the `callbacks/` --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2023-10-05 17:13:14 -07:00
Ofer Mendelevitch	3c7653bf0f	"source" argument in constructor of Vectara (#11454 ) - Description: minor update to constructor to allow for specification of "source" - Tag maintainer: @baskaryan - Twitter handle: @ofermend	2023-10-05 17:04:14 -07:00
Eugene Yurtsev	d9018ae5f1	Improve CLI ux (#11452 ) Improve UX for cli	2023-10-05 19:40:00 -04:00
Jaikanth J	9f85f7c543	fix(cache): use dumps for RedisCache (#10408 ) # Description Attempts to fix RedisCache for ChatGenerations using `loads` and `dumps` used in SQLAlchemy cache by @hwchase17 . this is better than pickle dump, because this won't execute any arbitrary code during de-serialisation. # Issues #7722 & #8666 # Dependencies None, but removes the warning introduced in #8041 by @baskaryan Handle: @jaikanthjay46	2023-10-05 16:34:07 -07:00
rodrigo-clickup	5944c1851b	Add ClickUp Toolkit (#10662 ) - Description: Adds a toolkit to interact with the [ClickUp](https://clickup.com/) [Public API](https://clickup.com/api/) - Dependencies: None - Tag maintainer: @rodrigo-georgian, @rodrigo-clickup, @aiswaryasankarwork - Twitter handle: - Aiswarya (https://twitter.com/Aiswarya_Sankar, https://www.linkedin.com/in/sankaraiswarya/) - Rodrigo (https://www.linkedin.com/in/rodrigo-ceballos-lentini/) --------- Co-authored-by: Aiswarya Sankar <aiswaryasankar@Aiswaryas-MacBook-Pro.local> Co-authored-by: aiswaryasankarwork <143119412+aiswaryasankarwork@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 16:33:05 -07:00
John Reynolds	68901e1e40	Update output_parser.py (#10430 ) - Description: Updated output parser for mrkl to remove any hallucination actions after the final answer; this was encountered when using Anthropic claude v2 for planning; reopening PR with updated unit tests - Issue: #10278 - Dependencies: N/A - Twitter handle: @johnreynolds	2023-10-05 15:47:24 -07:00
Joshua Sundance Bailey	790010703b	ArcGISLoader: Limit number of results in query (#10615 ) Description: this PR changes the `ArcGISLoader` to set `return_all_records` to `False` when `result_record_count` is provided as a keyword argument. Previously, `return_all_records` was `True` by default and this made the API ignore `result_record_count`. Issue: `ArcGISLoader` would ignore `result_record_count` unless user also passed `return_all_records=False`.	2023-10-05 15:46:02 -07:00
mrbean	9903a70379	Add youdotcom retriever (#11304 ) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 13:48:11 -07:00
ashish-dahal	1655ff2ded	Fix PyMuPDFLoader kwargs (#11434 ) - Description: Fix the `PyMuPDFLoader` to accept `loader_kwargs` from the document loader's `loader_kwargs` option. This provides more flexibility in formatting the output from documents. - Issue: The `loader_kwargs` is not passed into the `load` method from the document loader, which limits configuration options. - Dependencies: None --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 13:25:19 -07:00
Leonid Kuligin	e4a46747dc	integration test for DocAI parser (#11424 ) - Description: added an integration test - Issue: #11407 @baskaryan	2023-10-05 12:38:29 -07:00
Aashish Saini	2abbdc6ecb	Update bageldb.py (#11421 ) I have restructured the code to ensure uniform handling of ImportError. In place of previously used ValueError, I've adopted the standard practice of raising ImportError with explanatory messages. This modification enhances code readability and clarifies that any problems stem from module importation.	2023-10-05 12:37:56 -07:00
maks-operlejn-ds	2aae1102b0	Instance anonymization (#10501 ) ### Description Add instance anonymization - if `John Doe` will appear twice in the text, it will be treated as the same entity. The difference between `PresidioAnonymizer` and `PresidioReversibleAnonymizer` is that only the second one has a built-in memory, so it will remember anonymization mapping for multiple texts: ``` >>> anonymizer = PresidioAnonymizer() >>> anonymizer.anonymize("My name is John Doe. Hi John Doe!") 'My name is Noah Rhodes. Hi Noah Rhodes!' >>> anonymizer.anonymize("My name is John Doe. Hi John Doe!") 'My name is Brett Russell. Hi Brett Russell!' ``` ``` >>> anonymizer = PresidioReversibleAnonymizer() >>> anonymizer.anonymize("My name is John Doe. Hi John Doe!") 'My name is Noah Rhodes. Hi Noah Rhodes!' >>> anonymizer.anonymize("My name is John Doe. Hi John Doe!") 'My name is Noah Rhodes. Hi Noah Rhodes!' ``` ### Twitter handle @deepsense_ai / @MaksOpp ### Tag maintainer @baskaryan @hwchase17 @hinthornw --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 11:23:02 -07:00
Kyle Pancamo	203258b4d6	Update pdf.py comment for PyPDFLoader (#10495 ) PyPDF does not chunk at the character level to my understanding. Description: PyPDF does not chunk at the character level, but instead breaks up content by page. Fixup comment --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 11:22:40 -07:00
Juan Daza	4236ae3851	Added Streaming Capability to SageMaker LLMs (#10535 ) This PR adds the ability to declare a Streaming response in the SageMaker LLM by leveraging the `invoke_endpoint_with_response_stream` capability in `boto3`. It is heavily based on the AWS Blog Post announcement linked [here](https://aws.amazon.com/blogs/machine-learning/elevating-the-generative-ai-experience-introducing-streaming-support-in-amazon-sagemaker-hosting/). It does not add any additional dependencies since it uses the existing `boto3` version. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 11:08:43 -07:00
Laurentiu Piciu	d9670a5945	openai_functions_multi_agent: solved the case when the "arguments" is valid JSON but it does not contain `actions` key (#10543 ) Description: There are cases when the output from the LLM comes fine (i.e. function_call["arguments"] is a valid JSON object), but it does not contain the key "actions". So I split the validation in 2 steps: loading arguments as JSON and then checking for "actions" in it. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 11:08:09 -07:00
Eugene Yurtsev	fcccde406d	Add SymbolicMathChain to experiment in preparation for deprecation (#11129 ) Move symbolic math chain to experimental	2023-10-05 13:54:43 -04:00
Holt Skinner	9f73fec057	fix: Update Google Cloud Enterprise Search to Vertex AI Search (#10513 ) - Description: Google Cloud Enterprise Search was renamed to Vertex AI Search - https://cloud.google.com/blog/products/ai-machine-learning/vertex-ai-search-and-conversation-is-now-generally-available - This PR updates the documentation and Retriever class to use the new terminology. - Changed retriever class from `GoogleCloudEnterpriseSearchRetriever` to `GoogleVertexAISearchRetriever` - Updated documentation to specify that `extractive_segments` requires the new [Enterprise edition](https://cloud.google.com/generative-ai-app-builder/docs/about-advanced-features#enterprise-features) to be enabled. - Fixed spelling errors in documentation. - Change parameter for Retriever from `search_engine_id` to `data_store_id` - When this retriever was originally implemented, there was no distinction between a data store and search engine, but now these have been split. - Fixed an issue blocking some users where the api_endpoint can't be set	2023-10-05 10:47:47 -07:00
Patrick Randell	1d678f805f	Additional Weaviate Filter Comparators (#10522 ) ### Description When using Weaviate Self-Retrievers, certain common filter comparators generated by user queries were unimplemented, resulting in errors. This PR implements some of them. All linting and format commands have been run and tests passed. ### Issue #10474 ### Dependencies timestamp module --------- Co-authored-by: Patrick Randell <prandell@deloitte.com.au>	2023-10-05 10:40:04 -07:00
Nuno Campos	79011f835f	Remove str() from RunnableConfigurableAlternatives (#11446 )	2023-10-05 18:40:00 +01:00
Harrison Chase	31d5bd84d7	make vectorstores optional (#11393 )	2023-10-05 10:14:05 -07:00
Eugene Yurtsev	8aa545901a	Update agent type docs (#11137 ) In code docs for agent types	2023-10-05 12:51:14 -04:00
Eugene Yurtsev	3e31d6e35f	Start deprecation of LLMBashChain (#11300 ) In preparation for migrating LLMBashChain and related tools, add a deprecation warning to the code.	2023-10-05 12:48:22 -04:00
Bagatur	8b6b8bf68c	bump 309 (#11443 )	2023-10-05 09:29:14 -07:00
billytrend-cohere	2ff91a46c0	Add cohere /chat integration (#11389 ) Add cohere /chat integration and an iPython notebook to demonstrate the addition. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-05 09:20:47 -07:00
adrienohana	ca346011b7	added interactive login for azure cognitive search vector store (#11360 ) Description: Previously, if access to Azure Cognitive Search was not done via an API key, the default credential was used, which does not allow an interactive login. I simply added the option to use "INTERACTIVE" as a key name, and this will launch a login window upon initialization of the AzureSearch object.	2023-10-05 09:20:18 -07:00
Eugene Yurtsev	5a1f614175	Add docker compose to CLI (#11406 ) Add docker compose to cli	2023-10-05 15:58:56 +01:00
Predrag Gruevski	e2d6c41177	Upgrade langchain dependencies. (#11420 ) I was hoping this would pick up numpy 1.26, which is required to support the new Python 3.12 release, but it didn't. It seems that some transitive dependency requirement on numpy is preventing that, and the highest we can currently go is 1.24.x. But to find this out required a 15min `poetry lock`, so I figured we might as well upgrade the dependencies we can and hopefully make the next dependency upgrade a bit smaller.	2023-10-05 15:57:20 +01:00
Jacob Lee	71fd6428c5	Remove overridden async not implemented method on embeddings filters and add default async implementation for document compressors (#11415 ) @nfcampos @eyurtsev @baskaryan --------- Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-10-05 15:56:03 +01:00
Nuno Campos	2f490be09b	Fix .dict() for agent/chain (#11436 )	2023-10-05 15:51:21 +01:00
Nuno Campos	1e59c44d36	Nc/5oct/runnable release (#11428 )	2023-10-05 14:27:50 +01:00
Bagatur	58b7a3ba16	Rm bedrock anthropic error (#11403 )	2023-10-04 23:31:51 -04:00
Predrag Gruevski	c9986bc3a9	Tweak type hints to match dependency's behavior. (#11355 ) Needs #11353 to merge first, and a new `langchain` to be published with those changes.	2023-10-04 22:36:58 -04:00
William FH	940b9ae30a	Normalize Option in Scoring Chain (#11412 )	2023-10-04 15:59:28 -07:00
Eugene Yurtsev	70be04a816	CLI: Readme update (#11404 ) Consolidating to a single README for now, which will be easier to maintain; we can differentiate between poetry and pip later. Does not seem critical. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2023-10-04 16:25:37 -04:00
Nuno Campos	fde19c8667	Add CLI command to create a new project (#7837 ) First version of CLI command to create a new langchain project template Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-10-04 15:43:41 -04:00
mhwang-stripe	9cea796671	Make langchain compatible with SQLAlchemy<1.4.0 (#11390 ) ## Description Currently SQLAlchemy >=1.4.0 is a hard requirement. We are unable to run `from langchain.vectorstores import FAISS` with SQLAlchemy <1.4.0 due to top-level imports, even if we aren't even using parts of the library that use SQLAlchemy. See Testing section for repro. Let's make it so that langchain is still compatible with SQLAlchemy <1.4.0, especially if we aren't using parts of langchain that require it. The main conflict is that SQLAlchemy removed `declarative_base` from `sqlalchemy.ext.declarative` in 1.4.0 and moved it to `sqlalchemy.orm`. We can fix this by try-catching the import. This is the same fix as applied in https://github.com/langchain-ai/langchain/pull/883. (I see that there seems to be some refactoring going on about isolating dependencies, e.g. `c87e9fb2ce`, so if this issue will be eventually fixed by isolating imports in langchain.vectorstores that also works). ## Issue I can't find a matching issue. ## Dependencies No additional dependencies ## Maintainer @hwchase17 since you reviewed https://github.com/langchain-ai/langchain/pull/883 ## Testing I didn't add a test, but I manually tested this. 1. Current failure: ``` langchain==0.0.305 sqlalchemy==1.3.24 ``` ``` python python -i >>> from langchain.vectorstores import FAISS Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/pay/src/zoolander/vendor3/lib/python3.8/site-packages/langchain/vectorstores/__init__.py", line 58, in <module> from langchain.vectorstores.pgembedding import PGEmbedding File "/pay/src/zoolander/vendor3/lib/python3.8/site-packages/langchain/vectorstores/pgembedding.py", line 10, in <module> from sqlalchemy.orm import Session, declarative_base, relationship ImportError: cannot import name 'declarative_base' from 'sqlalchemy.orm' (/pay/src/zoolander/vendor3/lib/python3.8/site-packages/sqlalchemy/orm/__init__.py) ``` 2. This fix: ``` langchain==<this PR> sqlalchemy==1.3.24 ``` ``` python python -i >>> from langchain.vectorstores import FAISS <succeeds> ```	2023-10-04 15:41:20 -04:00
Nuno Campos	4d66756d93	Improve output of Runnable.astream_log() (#11391 ) - Make logs a dictionary keyed by run name (and counter for repeats) - Ensure no output shows up in lc_serializable format - Fix up repr for RunLog and RunLogPatch	2023-10-04 20:16:37 +01:00
Lester Solbakken	a30f98f534	Add Vespa vector store (#11329 ) Addition of Vespa vector store integration including notebook showing its use. Maintainer: @lesters Twitter handle: LesterSolbakken	2023-10-04 14:59:11 -04:00
Nuno Campos	58a88f3911	Add optional input_types to prompt template (#11385 ) - default MessagesPlaceholder one to list of messages	2023-10-04 18:54:53 +01:00
Tomaz Bratanic	71290315cf	Add optional Cypher validation tool (#11078 ) LLMs have trouble consistently getting the relationship direction right. That's why I organized a competition on the best and simplest way to fix it, based on the existing schema, as a post-processing step. https://github.com/tomasonjo/cypher-direction-competition I am adding the winner's code in this PR: https://github.com/sakusaku-rich/cypher-direction-competition	2023-10-04 12:54:37 -04:00
Bagatur	dd514c2781	bump 308 (#11383 )	2023-10-04 12:10:09 -04:00
Leonid Kuligin	4f4e0f38fc	a better error description when GCP project is not set (#11377 ) - Description: a little bit better error description - Issue: #10879	2023-10-04 11:57:47 -04:00
Nuno Campos	0d80226c64	Add _type to json functions output parser (#11381 )	2023-10-04 16:56:45 +01:00
Bagatur	106608bc89	add default async (#11141 )	2023-10-04 11:40:35 -04:00
Nuno Campos	b0893c7c6a	Use an enum for configurable_alternatives to make the generated json schema nicer (#11350 )	2023-10-04 11:32:41 -04:00
Bagatur	b499de2926	Anthropic system message fix (#11301 ) Removes human prompt prefix before system message for anthropic models Bedrock anthropic api enforces that Human and Assistant messages must be interleaved (cannot have same type twice in a row). We currently treat System Messages as human messages when converting messages -> string prompt. Our validation when using Bedrock/BedrockChat raises an error when this happens. For ChatAnthropic we don't validate this so no error is raised, but perhaps the behavior is still suboptimal	2023-10-04 11:32:24 -04:00
Massimiliano Angelino	2f83350eac	Feat bedrock cohere support (#11230 ) Description: Added support for Cohere command model via Bedrock. With this change it is now possible to use the `cohere.command-text-v14` model via Bedrock API. About Streaming: Cohere model outputs 2 additional chunks at the end of the text being generated via streaming: a chunk containing the text `<EOS_TOKEN>`, and a chunk indicating the end of the stream. In this implementation I chose to ignore both chunks. An alternative solution could be to replace `<EOS_TOKEN>` with `\n`. Tests: manually tested that the new model works with both `llm.generate()` and `llm.stream()`. Tested with `temperature`, `p` and `stop` parameters. Issue: #11181 Dependencies: No new dependencies Tag maintainer: @baskaryan Twitter handle: mangelino --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-10-04 11:12:19 -04:00
Daniel Butler	939bceccb0	GitHubIssuesLoader Custom API URL Support (#11378 ) - Description: Adds support for custom API URL in the GitHubIssuesLoader. This allows it to be used with Github enterprise instances.	2023-10-04 10:17:46 -04:00
Bagatur	16a80779b9	bump 307 (#11380 )	2023-10-04 10:03:17 -04:00
mziru	9e3c1d4463	add HTMLHeaderTextSplitter (#11039 ) Description: Similar in concept to the `MarkdownHeaderTextSplitter`, the `HTMLHeaderTextSplitter` is a "structure-aware" chunker that splits text at the element level and adds metadata for each header "relevant" to any given chunk. It can return chunks element by element or combine elements with the same metadata, with the objectives of (a) keeping related text grouped (more or less) semantically and (b) preserving context-rich information encoded in document structures. It can be used with other text splitters as part of a chunking pipeline. Dependency: lxml python package Maintainer: @hwchase17 Twitter handle: @MartinZirulnik --------- Co-authored-by: PresidioVantage <github@presidiovantage.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-10-04 09:24:25 -04:00
Predrag Gruevski	289de601c8	Use parameterized queries to select SQL schemas. (#11356 )	2023-10-04 05:43:30 +01:00
Nuno Campos	b0097f8908	In ProgressBarCallback update the progress counter also when runs fin… (#11332 )	2023-10-04 05:04:59 +01:00
William FH	06f39be1c2	Wfh/eval max concurrency (#11368 )	2023-10-03 20:18:14 -07:00
Aashish Saini	4adb2b399d	Fixed exception type in py files (#11322 ) I've refactored the code to ensure that ImportError is consistently handled. Instead of using ValueError as before, I've now followed the standard practice of raising ImportError along with clear and informative error messages. This change enhances the code's clarity and explicitly signifies that any problems are associated with module imports.	2023-10-03 21:46:26 -04:00
니콜라스	c6d7124675	Add 'device' to GPT4All (#11216 ) Add device to GPT4All - Description: GPT4All now supports GPU. This commit adds the option to enable it. - Issue: It closes https://github.com/langchain-ai/langchain/issues/10486 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-10-03 17:37:30 -07:00
Harrison Chase	6e848b879a	add default for async (#11367 )	2023-10-03 17:28:14 -07:00
Fynn Flügge	0a4baca291	chore: add kotlin code splitter (#11364 ) - Description: Adds Kotlin language to `TextSplitter` --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-10-03 18:35:36 -04:00
Ofer Mendelevitch	b93a08079e	Updates to Vectara Implementation (#11366 ) - Description: updates to documentation and API headers - Tag maintainer: @baskarya - Twitter handle: @ofermend	2023-10-03 18:34:39 -04:00
Erick Friis	745e3e29da	add getattr case for llms.type_to_cls_dict (#11362 ) For external libraries that depend on `type_to_cls_dict`, adds a workaround to continue using the old format. Recommend people use `get_type_to_cls_dict()` instead and only resolve the imports when they're used.	2023-10-03 14:34:30 -07:00
Vicente Reyes	f3e13e7e5a	Use term keyword according to the official python doc glossary (#11338 ) - Description: use term keyword according to the official python doc glossary, see https://docs.python.org/3/glossary.html - Issue: not applicable - Dependencies: not applicable - Tag maintainer: @hwchase17 - Twitter handle: vreyespue	2023-10-03 12:56:08 -07:00
Predrag Gruevski	5d6b83d9cf	Make a copy of external data instead of mutating another object's attributes. (#11349 ) Fix for a bug surfaced as part of #11339. `mypy` caught this since the types didn't match up.	2023-10-03 15:27:51 -04:00
Predrag Gruevski	42d979efdd	Improve type hints and interface for SQL execution functionality. (#11353 ) The previous API of the `_execute()` function had a few rough edges that this PR addresses: - The `fetch` argument was type-hinted as being able to take any string, but any string other than `"all"` or `"one"` would `raise ValueError`. The new type hints explicitly declare that only those values are supported. - The return type was type-hinted as `Sequence` but using `fetch = "one"` would actually return a single result item. This was incorrectly suppressed using `# type: ignore`. We now always return a list. - Using `fetch = "one"` would return a single item if data was found, or an empty list if no data was found. This was confusing, and we now always return a list to simplify. - The return type was `Sequence[Any]` which was a bit difficult to use since it wasn't clear what one could do with the returned rows. I'm making the new type `Dict[str, Any]` that corresponds to the column names and their values in the query. I've updated the use of this method elsewhere in the file to match the new behavior.	2023-10-03 15:19:08 -04:00
Mohammad Mohtashim	3bddd708f7	Add memory to sql chain (#8597 ) continuation of PR #8550 @hwchase17 please see and merge. And also close the PR #8550. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2023-10-03 12:04:39 -07:00
Harrison Chase	feabf2e0d5	make llm imports optional (#11237 )	2023-10-03 09:14:15 -07:00
Harrison Chase	88bad37ec2	fix get_tool_return (#11346 )	2023-10-03 09:01:05 -07:00
Harrison Chase	bdf865d8e8	better error message on parsing errors (#11342 )	2023-10-03 09:00:17 -07:00
Eugene Yurtsev	2343302fc6	Remove langserve from langchain repo (#11288 ) LangServe has been moved to a separate repo	2023-10-03 10:48:35 -04:00
William FH	6950b44bfc	Consolidate run collector. Add link helper (#11269 ) Instead of: ``` client = Client() with collect_runs() as cb: chain.invoke() run = cb.traced_runs[0] client.get_run_url(run) ``` it's ``` with tracing_v2_enabled() as cb: chain.invoke() cb.get_run_url() ```	2023-10-03 06:20:58 -07:00
Nuno Campos	0aedbcf7b2	Pass kwargs in runnable retry (#11324 )	2023-10-03 09:55:02 +01:00
Jacob Lee	933655b4ac	Adds Tavily Search API retriever (#11314 ) @baskaryan @efriis	2023-10-02 17:12:17 -07:00
David Duong	3ec970cc11	Mark Vertex AI classes as serialisable (#10484 ) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2023-10-02 16:48:21 -07:00
David Duong	db36a0ee99	Make Google PaLM classes serialisable (#11121 ) Similarly to Vertex classes, PaLM classes weren't marked as serialisable. Should be working fine with LangSmith. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2023-10-02 15:46:48 -07:00
CG80499	943e4f30d8	Add scoring chain (#11123 )	2023-10-02 15:15:31 -07:00
Predrag Gruevski	cd2479dfae	Upgrade `langchain` dependency versions to resolve dependabot alerts. (#11307 )	2023-10-02 18:06:41 -04:00
Nuno Campos	4df3191092	Add .configurable_fields() and .configurable_alternatives() to expose fields of a Runnable to be configured at runtime (#11282 )	2023-10-02 21:18:36 +01:00
Eugene Yurtsev	5e2d5047af	add LLMBashChain to experimental (#11305 ) Add LLMBashChain to experimental	2023-10-02 16:00:14 -04:00
Bagatur	38d5b63a10	Bedrock scheduled tests (#11194 )	2023-10-02 15:21:54 -04:00
Eugene Yurtsev	f9b565fa8c	Bump min version of numexpr (#11302 ) Bump min version	2023-10-02 15:06:32 -04:00
William FH	64febf7751	Make numexpr optional (#11049 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-10-02 14:42:51 -04:00
Eugene Yurtsev	20b7bd497c	Add pending deprecation warning (#11133 ) This PR uses 2 dedicated LangChain warning types for deprecations (mirroring Python's built-in deprecation and pending deprecation warnings). These deprecation types are unsilenced during initialization in langchain, achieving the same default behavior that we have with our current warnings approach. However, because these warnings have a dedicated type, users will be able to silence them selectively (I think this is strictly better than our current handling of warnings). The PR adds a deprecation warning to llm symbolic math. --------- Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>	2023-10-02 13:55:16 -04:00
Nuno Campos	0638f7b83a	Create new RunnableSerializable base class in preparation for configurable runnables (#11279 ) - Also move RunnableBranch to its own file	2023-10-02 17:41:23 +01:00
Bagatur	8eec43ed91	bump 306 (#11289 )	2023-10-02 10:25:08 -04:00
Nuno Campos	c6a720f256	Lint	2023-10-02 10:34:13 +01:00
Nuno Campos	1d46ddd16d	Lint	2023-10-02 10:29:20 +01:00
Nuno Campos	17708fc156	Lint	2023-10-02 10:28:58 +01:00
Nuno Campos	a3b82d1831	Move RunnableWithFallbacks to its own file	2023-10-02 10:26:10 +01:00
Nuno Campos	01dbfc2bc7	Lint	2023-10-02 10:21:40 +01:00
Nuno Campos	a6afd45c63	Lint	2023-10-02 10:14:56 +01:00
Nuno Campos	f7dd10b820	Lint	2023-10-02 10:13:09 +01:00
Nuno Campos	040bb2983d	Lint	2023-10-02 10:11:26 +01:00
Nuno Campos	52e5a8b43e	Create new RunnableSerializable class in preparation for configurable runnables - Also move RunnableBranch to its own file	2023-10-02 10:07:30 +01:00
Yeonji-Lim	61ab1b1266	Fix typo in docstring (#11256 ) Description : Remove meaningless 's' in docstring	2023-10-01 15:55:11 -04:00
Kazuki Maeda	a363ab5292	rename repo namespace to langchain-ai (#11259 ) ### Description renamed several repository links from `hwchase17` to `langchain-ai`. ### Why I discovered that the README file in the devcontainer contains an old repository name, so I took the opportunity to rename the old repository name in all files within the repository, excluding those that do not require changes. ### Dependencies none ### Tag maintainer @baskaryan ### Twitter handle [kzk_maeda](https://twitter.com/kzk_maeda)	2023-10-01 15:30:58 -04:00
Dayuan Jiang	17cdeb72ef	minor fix: remove redundant code from OpenAIFunctionsAgent (#11245 )	2023-10-01 13:22:15 -04:00
Michael Goin	33eb5f8300	Update DeepSparse LLM (#11236 ) Description: Adds streaming and many more sampling parameters to the DeepSparse interface --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-29 13:55:19 -07:00
Eugene Yurtsev	f91ce4eddf	Bump deps in langserve (#11234 ) Bump deps in langserve lockfile	2023-09-29 16:19:37 -04:00
Haozhe	4c97a10bd0	fix code injection vuln (#11233 ) - Description: Fix a code injection vuln by adding one more keyword into the filtering list - Issue: N/A - Dependencies: N/A - Tag maintainer: - Twitter handle: Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-29 16:16:00 -04:00
Eugene Yurtsev	aebdb1ad01	Ignore aadd (#11235 )	2023-09-29 21:10:53 +01:00
Eugene Yurtsev	8b4cb4eb60	Add type to message chunks (#11232 )	2023-09-29 20:14:52 +01:00
Nuno Campos	fb66b392c6	Implement RunnablePassthrough.assign(...) (#11222 ) Passes through dict input and assigns additional keys	2023-09-29 20:12:48 +01:00
Nuno Campos	1ddf9f74b2	Add a streaming json parser (#11193 )	2023-09-29 20:09:52 +01:00
Nuno Campos	ee56c616ff	Remove flawed test - It is not possible to access properties on classes, only on instances, therefore this test is not something we can implement	2023-09-29 20:05:33 +01:00
Nuno Campos	f3f3f71811	Lint	2023-09-29 19:57:40 +01:00
Nuno Campos	f6b0b065d3	Update json.py Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-29 19:34:35 +01:00
Nuno Campos	cbe18057b0	Update json.py Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-29 19:34:27 +01:00
Nuno Campos	aa8b4120a8	Keep exceptions when not in streaming mode	2023-09-29 19:21:27 +01:00
Nuno Campos	1f30e25681	Lint	2023-09-29 18:03:41 +01:00
Nuno Campos	c9d0f2b984	Combine with existing json output parsers	2023-09-29 17:55:30 +01:00
Eugene Yurtsev	b4354b7694	Make tests stricter, remove old code, fix up pydantic import when using v2 (#11231 )	2023-09-29 12:47:02 -04:00
Eugene Yurtsev	572968fee3	Using langchain input types (#11204 ) Using langchain input type	2023-09-29 12:37:09 -04:00
Bagatur	77c7c9ab97	bump 305 (#11224 )	2023-09-29 08:55:00 -07:00
Nuno Campos	4b8442896b	Make test deterministic	2023-09-29 16:50:00 +01:00
Attila Tőkés	ba9371854f	OpenAI gpt-3.5-turbo-instruct cost information (#11218 ) Added pricing info for `gpt-3.5-turbo-instruct` for OpenAI and Azure OpenAI. Co-authored-by: Attila Tőkés <atokes@rws.com>	2023-09-29 08:44:55 -07:00
Eugene Yurtsev	de69ea26e8	Suppress warnings in interactive env that stem from tab completion (#11190 ) Suppress warnings in interactive environments that can arise from users relying on tab completion (without even using deprecated modules). jupyter seems to filter warnings by default (at least for me), but ipython surfaces them all	2023-09-29 11:44:30 -04:00
Jon Saginaw	715ffda28b	mongodb doc loader init (#10645 ) - Description: A Document Loader for MongoDB - Issue: n/a - Dependencies: Motor, the async driver for MongoDB - Tag maintainer: n/a - Twitter handle: pigpenblue Note that an initial mongodb document loader was created 4 months ago, but the [PR](https://github.com/langchain-ai/langchain/pull/4285) was never pulled in. @leo-gan had commented on that PR, but given it is extremely far behind the master branch and a ton has changed in Langchain since then (including repo name and structure), I rewrote the branch and issued a new PR with the expectation that the old one can be closed. Please reference that old PR for comments/context, but it can be closed in favor of this one. Thanks! --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-29 11:44:07 -04:00
Nuno Campos	3d8aa88e26	Add async tests and comments	2023-09-29 15:28:46 +01:00
Nuno Campos	4ad0f3de2b	Add RunnableGenerator (#11214 )	2023-09-29 15:21:37 +01:00
Guy Korland	748a757306	Clean warnings: replace type with isinstance and fix syntax (#11219 ) Clean warnings: replace type with `isinstance` and fix notebook syntax	2023-09-29 10:06:33 -04:00
Nuno Campos	091d8845d5	Backwards compat	2023-09-29 14:18:38 +01:00
Nuno Campos	4e28a7a513	Implement diff	2023-09-29 14:12:48 +01:00
Nuno Campos	5cbe2b7b6a	Implement diff	2023-09-29 14:12:18 +01:00
Nuno Campos	6c0a6b70e0	WIP Add tests	2023-09-29 14:11:34 +01:00
Nuno Campos	63f2ef8d1c	Implement str one	2023-09-29 14:11:34 +01:00
Nuno Campos	f672b39cc9	Add a streaming json parser	2023-09-29 14:11:34 +01:00
Nuno Campos	2387647d30	Lint	2023-09-29 14:11:03 +01:00
Nuno Campos	0318cdd33c	Add tests	2023-09-29 12:25:19 +01:00
Nuno Campos	b67db8deaa	Add RunnableGenerator	2023-09-29 12:04:32 +01:00
Nuno Campos	e35ea565d1	Lint	2023-09-29 12:00:56 +01:00
Nuno Campos	7f589ebbc2	Lint	2023-09-29 11:57:01 +01:00
Nuno Campos	8be598f504	Fix invocation	2023-09-29 11:57:01 +01:00
Nuno Campos	6eb6c45c98	Enable creating Tools from any Runnable	2023-09-29 11:57:01 +01:00
Nuno Campos	61b5942adf	Implement better reprs for Runnables (#11175 ) ``` ChatPromptTemplate(messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are a nice assistant.')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='{question}'))]) \| RunnableLambda(lambda x: x) \| { chat: FakeListChatModel(responses=["i'm a chatbot"]), llm: FakeListLLM(responses=["i'm a textbot"]) } ```	2023-09-29 11:56:28 +01:00
Nuno Campos	e8e2b812c9	Even more	2023-09-29 11:54:22 +01:00
Nuno Campos	fc072100fa	skip more	2023-09-29 11:51:48 +01:00
Nuno Campos	7bfee012d5	Skip in py3.8	2023-09-29 11:49:12 +01:00
Nuno Campos	b8e3e1118d	Skip for py3.8	2023-09-29 11:45:20 +01:00
William FH	db05ea2b78	Add from_embeddings for opensearch (#10957 )	2023-09-29 00:00:58 -07:00
William FH	73693c18fc	Add support for project metadata in run_on_dataset (#11200 )	2023-09-28 21:26:37 -07:00
James Braza	b11f21c25f	Updated `LocalAIEmbeddings` docstring to better explain why `openai` (#10946 ) Fixes my misgivings in https://github.com/langchain-ai/langchain/issues/10912	2023-09-28 19:56:42 -07:00
Eugene Yurtsev	2c114fcb5e	Fix web-base loader (#11135 ) Fix initialization https://github.com/langchain-ai/langchain/issues/11095	2023-09-28 19:36:46 -07:00
jreinjr	3bc44b01c0	Typo fix to MathpixPDFLoader - changed processed_file_format default … (#10960 ) …from mmd to md. https://github.com/langchain-ai/langchain/issues/7282 <!-- - Description: minor fix to a breaking typo - MathPixPDFLoader processed_file_format is "mmd" by default, doesn't work, changing to "md" fixes the issue, - Issue: 7282 (https://github.com/langchain-ai/langchain/issues/7282), - Dependencies: none, - Tag maintainer: @hwchase17, - Twitter handle: none --> Co-authored-by: jare0530 <7915+jare0530@users.noreply.ghe.oculus-rep.com>	2023-09-28 19:03:30 -07:00
Dr. Fabien Tarrade	66415eed6e	Support new versions of tiktoken that work with langchain (tag "^0.3.2" => ">=0.3.2,<0.6.0" and python "^3.9" => ">=3.9") (#11006 ) - Description: allow langchain to be used with tiktoken versions other than 0.3.3, e.g. 0.5.1 - Issue: the conda-forge version could not be installed since it applies all optional dependencies: https://github.com/conda-forge/langchain-feedstock/pull/85 Replaces "^0.3.2" with ">=0.3.2,<0.6.0" and "^3.9" with python=">=3.9". Tested with python 3.10, langchain=0.0.288 and tiktoken==0.5.0 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 18:53:24 -07:00
Clément Sicard	1b48d6cb8c	`LlamaCppEmbeddings`: adds `verbose` parameter, similar to `llms.LlamaCpp` class (#11038 ) ## Description As of now, when instantiating and during inference, `LlamaCppEmbeddings` prints a lot of verbose output when driven through the Langchain binding, which is a bit annoying when computing the embeddings of long documents, for instance. This PR adds a `verbose` parameter to `LlamaCppEmbeddings` so that the model's verbose output to `stderr` can be turned off. It is natively supported by `llama-cpp-python` and directly passed to the library – the PR is hence very small. The value of `verbose` is `True` by default, following the way it is defined in [`LlamaCpp` (`llamacpp.py` #L136-L137)](`c87e9fb2ce/libs/langchain/langchain/llms/llamacpp.py (L136-L137)`) ## Issue _No issue linked_ ## Dependencies _No additional dependency needed_ ## To see it in action ```python from langchain.embeddings import LlamaCppEmbeddings MODEL_PATH = "<path_to_gguf_file>" if __name__ == "__main__": llm_embeddings = LlamaCppEmbeddings( model_path=MODEL_PATH, n_gpu_layers=1, n_batch=512, n_ctx=2048, f16_kv=True, verbose=False, ) ``` Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 18:37:51 -07:00
Noah Czelusta	a00a73ef18	Add last_edited_time and created_time props to NotionDBLoader (#11020 ) # Description Adds logic for NotionDBLoader to correctly populate `last_edited_time` and `created_time` fields from [page properties](https://developers.notion.com/reference/page#property-value-object). There are no relevant tests for this code to be updated. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 18:37:34 -07:00
Eugene Yurtsev	e06e84b293	LangServe: Relax requirements (#11198 ) Relax requirements	2023-09-28 21:27:19 -04:00
PaperMoose	5d7c6d1bca	Synthetic Data generation (#9472 ) --------- Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 18:16:05 -07:00
Donatas Remeika	a4e0cf6300	SearchApi integration (#11023 ) Based on the customers' requests for native langchain integration, SearchApi is ready to invest in AI and LLM space, especially in open-source development. - This is our initial PR and later we want to improve it based on customers' and langchain users' feedback. Most likely changes will affect how the final results string is being built. - We are creating similar native integration in Python and JavaScript. - The next plan is to integrate into Java, Ruby, Go, and others. - Feel free to assign @SebastjanPrachovskij as a main reviewer for any SearchApi-related searches. We will be glad to help and support langchain development.	2023-09-28 18:08:37 -07:00
Bagatur	8cd18a48e4	fix trubrics lint issue (#11202 )	2023-09-28 18:07:50 -07:00
Fynn Flügge	b738ccd91e	chore: add support for TypeScript code splitting (#11160 ) - Description: Adds typescript language to `TextSplitter` --------- Co-authored-by: Jacob Lee <jacoblee93@gmail.com>	2023-09-28 16:41:51 -07:00
Kenneth Choe	17fcbed92c	Support add_embeddings for opensearch (#11050 ) - Description: - Make running integration test for opensearch easy - Provide a way to use different text for embedding: refer to #11002 for more of the use case and design decision. - Issue: N/A - Dependencies: None other than the existing ones.	2023-09-28 16:41:11 -07:00
Jeff Kayne	c586f6dc1b	Callback integration for Trubrics (#11059 ) After contributing to some examples in the [langsmith-cookbook](https://github.com/langchain-ai/langsmith-cookbook) with @hinthornw, here is a PR that adds a callback handler to use LangChain with [Trubrics](https://github.com/trubrics/trubrics-sdk).	2023-09-28 16:20:19 -07:00
Michael Landis	a8db594012	fix: short-circuit black and mypy calls when no changes made (#11051 ) Both black and mypy expect a list of files or directories as input. As-is the Makefile computes a list of files changed relative to the last commit; these are passed to black and mypy in the `format_diff` and `lint_diff` targets. This is done by way of the Makefile variable `PYTHON_FILES`. This is to save time by skipping running mypy and black over the whole source tree. When no changes have been made, this variable is empty, so the call to black (and mypy) lacks input files. The call exits with error causing the Makefile target to error out with: ```bash $ make format_diff poetry run black Usage: black [OPTIONS] SRC ... One of 'SRC' or 'code' is required. make: *** [format_diff] Error 1 ``` This is unexpected and undesirable, as the naive caller (that's me! 😄 ) will think something else is wrong. This commit smooths over this by short circuiting when `PYTHON_FILES` is empty.	2023-09-28 16:13:07 -07:00
Michael Kim	fbcd8e02f2	Change type annotations from LLMChain to Chain in MultiPromptChain (#11082 ) - Description: The types of 'destination_chains' and 'default_chain' in 'MultiPromptChain' were changed from 'LLMChain' to 'Chain', and variables that duplicated declarations in the parent class were removed - Issue: When a class that inherits only from Chain and not LLMChain, such as 'SequentialChain' or 'RetrievalQA', is passed as 'destination_chains' or 'default_chain', a pydantic validation error is raised. - Code: ``` retrieval_chain = ConversationalRetrievalChain( retriever=doc_retriever, combine_docs_chain=combine_docs_chain, question_generator=question_gen_chain, ) destination_chains = { 'retrieval': retrieval_chain, } main_chain = MultiPromptChain( router_chain=router_chain, destination_chains=destination_chains, default_chain=default_chain, verbose=True, ) ``` ✅ `make format`, `make lint` and `make test`	2023-09-28 15:59:25 -07:00
Piyush Jain	32d09bcd1e	Expanded version range for networkx, fixed sample notebook (#11094 ) ## Description Expanded the upper bound for `networkx` dependency to allow installation of latest stable version. Tested the included sample notebook with version 3.1, and all steps ran successfully. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 15:33:30 -07:00
Piotr Mardziel	b40ecee4b9	Fix eval prompt (#11087 ) Description: fixes a common typo in some of the eval criteria.	2023-09-28 15:21:15 -07:00
Guy Korland	5564833bd2	Add `add_graph_documents` support for FalkorDBGraph (#11122 ) Adding `add_graph_documents` support for FalkorDBGraph and extending the `Neo4JGraph` api so it can support `cypher.py`	2023-09-28 15:03:54 -07:00
Tomaz Bratanic	7d25a65b10	add from_existing_graph to neo4j vector (#11124 ) This PR adds the option to create a Neo4jvector instance from existing graph, which embeds existing text in the database and creates relevant indices.	2023-09-28 15:02:26 -07:00
Noah Stapp	2c952de21a	Add support for MongoDB Atlas $vectorSearch vector search (#11139 ) Adds support for the `$vectorSearch` operator for MongoDBAtlasVectorSearch, which was announced at .Local London (September 26th, 2023). This change breaks compatibility with the existing `$search` operator used by the original integration (https://github.com/langchain-ai/langchain/pull/5338) due to incompatibilities in the Atlas search implementations. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 15:01:03 -07:00
Hugues	b599f91e33	LLMonitor Callback handler: fix bug (#11128 ) Here is a small bug fix for the LLMonitor callback handler. I've also added user identification capabilities.	2023-09-28 15:00:38 -07:00
William FH	e9b51513e9	Shared Executor (#11028 )	2023-09-28 13:30:58 -07:00
Justin Plock	926e4b6bad	[Feat] Add optional client-side encryption to DynamoDB chat history memory (#11115 ) Description: Added optional client-side encryption to the Amazon DynamoDB chat history memory with an AWS KMS Key ID using the [AWS Database Encryption SDK for Python](https://docs.aws.amazon.com/database-encryption-sdk/latest/devguide/python.html) Issue: #7886 Dependencies: [dynamodb-encryption-sdk](https://pypi.org/project/dynamodb-encryption-sdk/) Tag maintainer: @hwchase17 Twitter handle: [@jplock](https://twitter.com/jplock/) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 13:29:46 -07:00
Eugene Yurtsev	4947ac2965	Add langserve version (#11195 ) Add langserve version	2023-09-28 16:24:00 -04:00
Joseph McElroy	822fc590d9	[ElasticsearchStore] Improve migration text to ElasticsearchStore (#11158 ) As we move developers to the new `ElasticsearchStore` implementation, we want to keep the ElasticVectorSearch class available while developers transition slowly to the new store. To speed up this process, I updated the blurb to give them a better recommendation of why they should use ElasticsearchStore.	2023-09-28 12:40:18 -07:00
Naveen Tatikonda	9b0029b9c2	[OpenSearch] Add Self Query Retriever Support to OpenSearch (#11184 ) ### Description Add Self Query Retriever Support to OpenSearch ### Maintainers @rlancemartin, @eyurtsev, @navneet1v ### Twitter Handle @OpenSearchProj Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-09-28 12:36:52 -07:00
Arthur Telders	0da484be2c	Add source metadata to OutlookMessageLoader (#11183 ) Description: Add "source" metadata to OutlookMessageLoader This pull request adds the "source" metadata to the OutlookMessageLoader class in the load method. The "source" metadata is required when indexing with RecordManager in order to sync the index documents with a source. Issue: None Dependencies: None Twitter handle: @ATelders Co-authored-by: Arthur Telders <arthur.telders@roquette.com>	2023-09-28 14:58:12 -04:00
Bagatur	3508e582f1	add anthropic scheduled tests and unit tests (#11188 )	2023-09-28 11:47:29 -07:00
Eugene Yurtsev	fd96878c4b	Fix anthropic secret key when passed in via init (#11185 ) Fixes anthropic secret key when passed via init https://github.com/langchain-ai/langchain/issues/11182	2023-09-28 14:21:41 -04:00
Bagatur	f201d80d40	temporarily skip embedding empty string test (#11187 )	2023-09-28 11:20:00 -07:00
Eugene Yurtsev	b3cf9c8759	LangServe: Update langchain requirement for publishing (#11186 ) Update langchain requirement for publishing	2023-09-28 14:11:58 -04:00
mani2348	89ddc7cbb6	Update Bedrock service name to "bedrock-runtime" and model identifiers (#11161 ) - Description: Bedrock updated boto service name to "bedrock-runtime" for the InvokeModel and InvokeModelWithResponseStream APIs. This update also includes new model identifiers for Titan text, embedding and Anthropic. Co-authored-by: Mani Kumar Adari <maniadar@amazon.com>	2023-09-28 09:42:56 -07:00
Eugene Yurtsev	de3e25683e	Expose lc_id as a classmethod (#11176 ) * Expose LC id as a class method * User should not need to know that the last part of the id is the class name	2023-09-28 17:25:27 +01:00
Nuno Campos	5ca461160b	Lint	2023-09-28 17:12:07 +01:00
Nuno Campos	151f27d502	Lint	2023-09-28 16:42:58 +01:00
Eugene Yurtsev	4ba9c16f74	mypy	2023-09-28 11:27:20 -04:00
Eugene Yurtsev	44489e7029	LangServe: Clean up init files (#11174 ) Clean up init files	2023-09-28 11:10:42 -04:00
Akio Nishimura	785b9d47b7	Fix stop key of TextGen. (#11109 ) The key of stopping strings used in text-generation-webui api is [`stopping_strings`](https://github.com/oobabooga/text-generation-webui/blob/main/api-examples/api-example.py#L51), not `stop`.	2023-09-28 11:05:24 -04:00
Eugene Yurtsev	d1d7d0cb27	x	2023-09-28 10:56:50 -04:00
Eugene Yurtsev	c86b2b5e42	x	2023-09-28 10:53:30 -04:00
Eugene Yurtsev	fe4f3b8fdf	x	2023-09-28 10:51:28 -04:00
Eugene Yurtsev	a5b15e9d0f	x	2023-09-28 10:51:17 -04:00
Nuno Campos	5c1f462bb9	Implement better reprs for Runnables	2023-09-28 15:24:51 +01:00
Nan LI	53a9d6115e	Xata chat memory FIX (#11145 ) - Description: Changed data type from `text` to `json` in xata for improved performance. Also corrected the `additionalKwargs` key in the `messages()` function to `additional_kwargs` to adhere to `BaseMessage` requirements. - Issue: The chat history `messages()` call returns `{}` for `additional_kwargs`, because the key was wrongly named `additionalKwargs`. - Dependencies: N/A - Tag maintainer: N/A - Twitter handle: N/A My PR is passing linting and testing before submitting.	2023-09-28 09:52:15 -04:00
William FH	8ae9b71e41	Async support for OpenAIFunctionsAgentOutputParser (#11140 )	2023-09-28 09:42:59 -04:00
Bagatur	ce08f436db	Expose loads and dumps in load namespace	2023-09-28 09:34:48 -04:00
Nuno Campos	cfa2203c62	Add input/output schemas to runnables (#11063 ) This adds `input_schema` and `output_schema` properties to all runnables, which are Pydantic models for the input and output types respectively. These are inferred from the structure of the Runnable as much as possible, the only manual typing needed is - optionally add type hints to lambdas (which get translated to input/output schemas) - optionally add type hint to RunnablePassthrough These schemas can then be used to create JSON Schema descriptions of input and output types, see the tests - [x] Ensure no InputType and OutputType in our classes use abstract base classes (replace with union of subclasses) - [x] Implement in BaseChain and LLMChain - [x] Implement in RunnableBranch - [x] Implement in RunnableBinding, RunnableMap, RunnablePassthrough, RunnableEach, RunnableRouter - [x] Implement in LLM, Prompt, Chat Model, Output Parser, Retriever - [x] Implement in RunnableLambda from function signature - [x] Implement in Tool	2023-09-28 11:05:15 +01:00
Eugene Yurtsev	b05bb9e136	LangServe (#11046 ) Adds LangServe package * Integrate Runnables with Fast API creating Server and a RemoteRunnable client * Support multiple runnables for a given server * Support sync/async/batch/abatch/stream/astream/astream_log on the client side (using async implementations on server) * Adds validation using annotations (relying on pydantic under the hood) -- this still has some rough edges -- e.g., open api docs do NOT generate correctly at the moment * Uses pydantic v1 namespace Known issues: type translation code doesn't handle a lot of types (e.g., TypedDicts) --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2023-09-28 10:52:44 +01:00
Nuno Campos	77ce9ed6f1	Support using async callback handlers with sync callback manager (#10945 ) The current behaviour just calls the handler without awaiting the coroutine, which results in exceptions/warnings, and obviously doesn't actually execute whatever the callback handler does	2023-09-28 10:39:01 +01:00
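The failure mode described in #10945 can be shown in a few lines: calling an `async def` handler from synchronous code only creates a coroutine object, and nothing runs until that coroutine is awaited. The sketch below is not LangChain's implementation; `on_event` and `dispatch_sync` are hypothetical names, and it assumes no event loop is already running in the calling thread.

```python
import asyncio
import inspect

calls = []


async def on_event(payload):
    # Hypothetical async callback handler; records what it was called with.
    calls.append(payload)


def dispatch_sync(handler, payload):
    """Call a handler from sync code, running it to completion if it is async."""
    result = handler(payload)
    if inspect.iscoroutine(result):
        # Without this step the coroutine object is simply discarded and the
        # handler body never executes (Python emits a "never awaited" warning).
        asyncio.run(result)


dispatch_sync(on_event, "chain_start")
print(calls)
```

A plain synchronous handler passes through `dispatch_sync` unchanged, since its return value is not a coroutine.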
Bagatur	48a04aed75	bump 304 (#11147 )	2023-09-27 19:24:09 -07:00
Jonathan Evans	23065f54c0	Added prompt wrapping for Claude with Bedrock (#11090 ) - Description: Prompt wrapping requirements have been implemented on the service side of AWS Bedrock for the Anthropic Claude models to provide parity between Anthropic's offering and Bedrock's offering. This overnight change broke most existing implementations of Claude, Bedrock and Langchain. This PR just steals the Anthropic LLM implementation to enforce alias/role wrapping and implements it in the existing mechanism for building the request body. This has also been tested to fix the chat_model implementation as well. Happy to answer any further questions or make changes where necessary to get things patched and up to PyPi ASAP, TY. - Issue: No issue opened at the moment, though will update when these roll in. - Dependencies: None --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-27 19:20:07 -07:00
xiaoyu	b87cc8b31e	add 3 property types in metadata for notiondb loader (#8509 ) ### Description: NotionDB supports a number of common property types. I have found three common types that are not included in notiondb loader. When programs load them with notiondb, some metadata information is not passed to langchain. Therefore, I added three common types: - date - created_time - last_edit_time. ### Issue: no ### Dependencies: No dependencies added :) ### Tag maintainer: @rlancemartin, @eyurtsev ### Twitter handle: @BJTUTC	2023-09-27 17:38:05 -07:00
Harrison Chase	258d67b0ac	Revert "improve the performance of base.py" (#11143 ) Reverts langchain-ai/langchain#8610 this is actually an oversight - this merges all dfs into one df. we DO NOT want to do this - the idea is we work and manipulate multiple dfs	2023-09-27 17:37:29 -07:00
Mohamad Zamini	9306394078	improve the performance of base.py (#8610 ) This removes the use of the intermediate df list and directly concatenates the dataframes if path is a list of strings. The pd.concat function combines the dataframes efficiently, making it faster and more memory-efficient compared to appending dataframes to a list. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-27 17:36:03 -07:00
Mincoolee	05b75f3f13	feat: add support for arxiv identifier in ArxivAPIWrapper() (#9318 ) - Description: this PR adds the support for arxiv identifier of the ArxivAPIWrapper. I modified the `run()` and `load()` functions in `arxiv.py`, using regex to recognize if the query is in the form of arxiv identifier (see [https://info.arxiv.org/help/find/index.html](https://info.arxiv.org/help/find/index.html)). If so, it will directly search the paper corresponding to the arxiv identifier. I also modified and added tests in `test_arxiv.py`. - Issue: #9047 - Dependencies: N/A - Tag maintainer: N/A --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-27 17:35:16 -07:00
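As a rough illustration of the identifier check described in #9318, the sketch below tests whether a query looks like an arXiv identifier. The patterns are assumptions based on arXiv's published identifier schemes (new style such as `1605.08386v1`, old style such as `hep-th/9901001`); the regex actually used in `arxiv.py` may differ, and `looks_like_arxiv_id` is a hypothetical helper name.

```python
import re

# Assumed patterns; not taken from the PR itself.
# New style: YYMM.NNNN or YYMM.NNNNN with an optional version suffix.
NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Old style: archive[.SC]/YYMMNNN with an optional version suffix.
OLD_STYLE = re.compile(r"^[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")


def looks_like_arxiv_id(query: str) -> bool:
    """Return True if the query has the shape of an arXiv identifier."""
    query = query.strip()
    return bool(NEW_STYLE.match(query) or OLD_STYLE.match(query))


for q in ["1605.08386v1", "hep-th/9901001", "quantum computing"]:
    print(q, looks_like_arxiv_id(q))
```

A wrapper could use such a check to route identifier-shaped queries to a direct lookup and everything else to a free-text search.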
William FH	d3c2ca5656	Enhanced pairwise error (#11131 )	2023-09-27 16:04:43 -07:00
Taqi Jaffri	b7e9db5e73	Stop sequences in fireworks, plus notebook updates (#11136 ) The new Fireworks and FireworksChat implementations are awesome! Added in this PR https://github.com/langchain-ai/langchain/pull/11117 thank you @ZixinYang However, I think stop words were not plumbed correctly. I've made some simple changes to do that, and also updated the notebook to be a bit clearer with what's needed to use both new models. --------- Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>	2023-09-27 16:01:05 -07:00
William FH	33da8bd711	Add Exact match and Regex Match Evaluators (#11132 )	2023-09-27 14:18:07 -07:00
Harrison Chase	e355606b11	add more import checks (#11033 )	2023-09-27 11:17:12 -07:00
Dan Bolser	efb7c459a2	Update base.py (#10843 ) Fixing a typo in the example code in the docstring... You have to start somewhere though right? Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-27 11:15:58 -07:00
tanujtiwari-at	a79f595543	Support extra tools argument for pandas agent toolkit (#11040 ) Description We support adding new tools in some toolkits already like the [SQLAgent toolkit](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/agent_toolkits/sql/base.py#L27). Related [SO](https://stackoverflow.com/questions/76583163/are-langchain-toolkits-able-to-be-modified-can-we-add-tools-to-a-pandas-datafra) thread This replicates the same functionality here, so users can add custom bespoke tools.	2023-09-27 10:57:04 -07:00
Bagatur	410ac8129d	bump 303 (#11120 )	2023-09-27 08:30:33 -07:00
Bagatur	8e4dbae428	Add fireworks chat model (#11117 )	2023-09-27 08:22:12 -07:00
Bagatur	657581dbdf	Fix ChatFireworks typing	2023-09-27 08:15:40 -07:00
Bagatur	12aad659dd	add ChatFireworks to chat_models	2023-09-27 08:11:26 -07:00
Bagatur	872ebdaf90	remove FireworksChat from llms	2023-09-27 08:10:41 -07:00
Bagatur	9451240941	Fix fireworks chat linting issues	2023-09-27 08:09:33 -07:00
Tomáš Dvořák	865a21938c	speed up enforce_stop_tokens helper function (#10984 ) Description: As long as `enforce_stop_tokens` returns a first occurrence, we can speed up the execution by setting the optional `maxsplit` parameter to 1. Tag maintainer: @agola11 @hwchase17 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-27 05:29:29 -07:00
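The idea behind #10984 can be sketched as follows: when only the text before the first stop sequence is needed, `re.split` with `maxsplit=1` stops scanning after the first match instead of splitting the whole string. This is a minimal sketch, not the library's code; `cut_at_first_stop` is a hypothetical name, and escaping the stop strings with `re.escape` is an assumption made here so they are matched literally.

```python
import re
from typing import List


def cut_at_first_stop(text: str, stop: List[str]) -> str:
    """Return the part of text that precedes the first stop sequence."""
    if not stop:
        # No stop sequences: nothing to cut.
        return text
    pattern = "|".join(re.escape(s) for s in stop)
    # maxsplit=1 yields at most two pieces, so the regex engine stops
    # after the first match rather than scanning the rest of the text.
    return re.split(pattern, text, maxsplit=1)[0]


print(cut_at_first_stop("foo\nObservation: bar\nObservation: baz", ["\nObservation:"]))
```

The result is the same as with an unbounded split followed by `[0]`; only the amount of work done on long outputs changes.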
Austin Walker	bb41252dab	fix: bump min_unstructured_version for UnstructuredAPIFileLoader (#11025 ) Description: New metadata fields were added to `unstructured==0.10.15`, and our hosted api has been updated to reflect this. When users call `partition_via_api` with an older version of the library, they'll hit a parsing error related to the new fields.	2023-09-27 05:28:06 -07:00
William FH	75b3893daf	Fix runnable branch callbacks (#11091 ) We aren't calling on_chain_end here unless we use the default option	2023-09-27 11:38:56 +01:00
Bagatur	6c5251feb0	poetry	2023-09-26 20:12:49 -07:00
Bagatur	5310184f96	poetry	2023-09-26 20:12:29 -07:00
Cynthia Yang	6dd44ff1c0	Refactor Fireworks and add ChatFireworks (#3 ) (#10597 ) Description * Refactor Fireworks within Langchain LLMs. * Remove FireworksChat within Langchain LLMs. * Add ChatFireworks (which uses chat completion api) to Langchain chat models. * Users have to install `fireworks-ai` and register an api key to use the api. Issue - Not applicable Dependencies - None Tag maintainer - @rlancemartin @baskaryan	2023-09-26 20:11:55 -07:00
Bagatur	5514ebe859	Don't type chains in output_parsers (#11092 ) Can't use TYPE_CHECKING style imports for pydantic params because it will try to instantiate the typed object by default.	2023-09-26 17:49:35 -07:00
CG80499	64385c4eae	Make pairwise comparison chain more like LLM as a judge (#11013 ) - Description: Adds LLM as a judge as an eval chain - Tag maintainer: @hwchase17 --------- Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>	2023-09-26 13:19:04 -07:00
Joseph McElroy	175ef0a55d	[ElasticsearchStore] Enable custom Bulk Args (#11065 ) This enables bulk args like `chunk_size` to be passed down from the ingest methods (from_text, from_documents) to the bulk API. This helps alleviate issues where bulk importing a large number of documents into Elasticsearch was resulting in a timeout. Contribution Shoutout - @elastic - [x] Updated Integration tests --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-26 12:53:50 -07:00
Eugene Yurtsev	d19fd0cfae	LogEntry/LogStream use str instead of uuid for id (#11080 ) Cast the UUID to a string	2023-09-26 20:38:51 +01:00
Bagatur	d85339b9f2	extract sublinks exclude by abs path (#11079 )	2023-09-26 12:26:27 -07:00
Bagatur	7ee8b2d1bf	exclude dirs in async recursive loading (#11077 )	2023-09-26 09:59:04 -07:00
Bagatur	12fb393a43	bump 302 (#11070 )	2023-09-26 08:13:01 -07:00
Bagatur	097ecef06b	refactor web base loader (#11057 )	2023-09-26 08:11:31 -07:00
Bagatur	487611521d	fix root import (#11072 )	2023-09-26 08:11:16 -07:00
Bagatur	a2f7246f0e	skip excluded sublinks before recursion (#11036 )	2023-09-26 02:24:54 -07:00
William FH	4aec587979	Update LangSmith Walkthrough (#11043 )	2023-09-25 22:32:56 -07:00
Harrison Chase	bea78b3271	make warnings more modular (#11047 )	2023-09-25 20:46:43 -07:00
Harrison Chase	c87e9fb2ce	conditional imports (#11017 )	2023-09-25 15:46:32 -07:00
Tomaz Bratanic	0625ab7a9e	Filtering graph schema for Cypher generation (#10577 ) Sometimes you don't want the LLM to be aware of the whole graph schema, and want it to ignore parts of the graph when it is constructing Cypher statements.	2023-09-25 14:14:15 -07:00
Palau	89ef440c14	Kay retriever (#10657 ) - Description: Adding retrievers for [kay.ai](https://kay.ai) and SEC filings powered by Kay and Cybersyn. Kay provides context as a service: it's an API built for RAG. - Issue: N/A - Dependencies: Just added a dep to the [kay](https://pypi.org/project/kay/) package - Tag maintainer: @baskaryan @hwchase17 Discussed in slack - Twitter handle: [@vishalrohra_](https://twitter.com/vishalrohra_) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-25 13:10:13 -07:00
Harrison Chase	5f13668fa0	Harrison/move vectorstore base (#11030 )	2023-09-25 12:44:23 -07:00
Eugene Yurtsev	af5390d416	Add a batch size for cleanup (#10948 ) Add pagination to indexing cleanup to deal with large numbers of documents that need to be deleted.	2023-09-25 14:52:32 -04:00
Eugene Yurtsev	09486ed188	Update Serializable to use classmethods (#10956 )	2023-09-25 18:39:30 +01:00
Taqi Jaffri	b7290f01d8	Batching for hf_pipeline (#10795 ) The huggingface pipeline in langchain (used for locally hosted models) does not support batching. If you send in a batch of prompts, it just processes them serially using the base implementation of _generate: https://github.com/docugami/langchain/blob/master/libs/langchain/langchain/llms/base.py#L1004C2-L1004C29 This PR adds support for batching in this pipeline, so that GPUs can be fully saturated. I updated the accompanying notebook to show GPU batch inference. --------- Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>	2023-09-25 18:23:11 +01:00
Bagatur	aa6e6db8c7	bump 301 (#11018 )	2023-09-25 08:50:47 -07:00
Nuno Campos	956ee981c0	Fix issue where requests wrapper passes auth kwarg twice (#11010 ) Closes #8842	2023-09-25 15:45:04 +01:00
Scotty	88a02076af	fix ChatMessageChunk concat error (#10174 ) - Description: fix `ChatMessageChunk` concat error - Issue: #10173 - Dependencies: None - Tag maintainer: @baskaryan, @eyurtsev, @rlancemartin - Twitter handle: None --------- Co-authored-by: wangshuai.scotty <wangshuai.scotty@bytedance.com> Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-09-25 11:17:11 +01:00
Naveen Tatikonda	b0f21e2b50	[OpenSearch] Pass ids using from_texts and indexname in add_texts and search (#10969 ) ### Description This PR makes the following changes to OpenSearch: 1. Pass optional ids with `from_texts` 2. Pass an optional index name with `add_texts` and `search` instead of using the same index name that was used during `from_texts` ### Issue https://github.com/langchain-ai/langchain/issues/10967 ### Maintainers @rlancemartin, @eyurtsev, @navneet1v Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-09-23 16:12:51 -07:00
deanchanter	f945426874	Resolve GHI 10674 (#10977 )	2023-09-23 16:11:52 -07:00
Anar	ff732e10f8	LLMRails Embedding (#10959 ) LLMRails Embedding Integration This PR provides integration with LLMRails. Implemented here are: langchain/embeddings/llm_rails.py docs/extras/integrations/text_embedding/llm_rails.ipynb Hi @hwchase17 after adding our vectorstore integration to langchain with confirmation of you and @baskaryan, now we want to add our embedding integration --------- Co-authored-by: Anar Aliyev <aaliyev@mgmt.cloudnet.services> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-23 16:11:02 -07:00
Michael Feil	94e31647bd	Support for Gradient.ai embedding (#10968 ) Adds support for gradient.ai's embedding model. This will remain a Draft, as the code will likely be refactored with the `pip install gradientai` python sdk.	2023-09-23 16:10:23 -07:00
C.J. Jameson	05d5fcfdf8	fix make-coverage local invocation #10941 (#10974 ) Fix the invocation of `make coverage` in `libs/langchain` Fixes #10941	2023-09-23 16:03:53 -07:00
Bagatur	040d436b3f	Add vertex scheduled test (#10958 )	2023-09-23 15:51:59 -07:00
Piyush Jain	8602a32b7e	Fixes error with providers that don't have model_id (#10966 ) ## Description Fixes error with using the chain for providers that don't have `model_id` field. ![image](https://github.com/langchain-ai/langchain/assets/289369/a86074cf-6c99-4390-a135-b3af7a4f0827)	2023-09-23 15:34:28 -07:00
Nuno Campos	7b13292e35	Remove python eval from vector sql db chain (#10937 )	2023-09-23 08:51:03 -07:00
Richard Wang	b809c243af	Fix bug in `index` api (#10614 ) - Description: a fix for `index`. - Issue: Not applicable. - Dependencies: None - Tag maintainer: - Twitter handle: richarddwang # Problem Replication code ```python from pprint import pprint from langchain.embeddings import OpenAIEmbeddings from langchain.indexes import SQLRecordManager, index from langchain.schema import Document from langchain.vectorstores import Qdrant from langchain_setup.qdrant import pprint_qdrant_documents, create_inmemory_empty_qdrant # Documents metadata1 = {"source": "fullhell.alchemist"} doc1_1 = Document(page_content="1-1 I have a dog~", metadata=metadata1) doc1_2 = Document(page_content="1-2 I have a daugter~", metadata=metadata1) doc1_3 = Document(page_content="1-3 Ahh! 
O..Oniichan", metadata=metadata1) doc2 = Document(page_content="2 Lancer died again.", metadata={"source": "fate.docx"}) # Create empty vectorstore collection_name = "secret_of_D_disk" vectorstore: Qdrant = create_inmemory_empty_qdrant() # Create record Manager import tempfile from pathlib import Path record_manager = SQLRecordManager( namespace=f"qdrant/{collection_name}", db_url=f"sqlite:///{Path(tempfile.gettempdir())/collection_name}.sql", ) record_manager.create_schema() # required sync_result = index( [doc1_1, doc1_2, doc1_2, doc2], record_manager, vectorstore, cleanup="full", source_id_key="source", ) print(sync_result, end="\n\n") pprint_qdrant_documents(vectorstore) ``` <details> <summary>Code of helper functions `pprint_qdrant_documents` and `create_inmemory_empty_qdrant`</summary> ```python def create_inmemory_empty_qdrant(**from_texts_kwargs): # Qdrant requires the vector size, which can only be known after applying the embedder vectorstore = Qdrant.from_texts(["dummy"], location=":memory:", embedding=OpenAIEmbeddings(), **from_texts_kwargs) dummy_document_id = vectorstore.client.scroll(vectorstore.collection_name)[0][0].id vectorstore.delete([dummy_document_id]) return vectorstore def pprint_qdrant_documents(vectorstore, limit: int = 100, **scroll_kwargs): document_ids, documents = [], [] for record in vectorstore.client.scroll( vectorstore.collection_name, limit=limit, **scroll_kwargs )[0]: document_ids.append(record.id) documents.append( Document( page_content=record.payload["page_content"], metadata=record.payload["metadata"] or {}, ) ) pprint_documents(documents, document_ids=document_ids) def pprint_document(document: Document = None, document_id=None, return_string=False): displayed_text = "" if document_id: displayed_text += f"Document {document_id}:\n\n" displayed_text += f"{document.page_content}\n\n" metadata_text = pformat(document.metadata, indent=1) if "\n" in metadata_text: displayed_text += f"Metadata:\n{metadata_text}" else: displayed_text += 
f"Metadata:{metadata_text}" if return_string: return displayed_text else: print(displayed_text) def pprint_documents(documents, document_ids=None): if not document_ids: document_ids = [i + 1 for i in range(len(documents))] displayed_texts = [] for document_id, document in zip(document_ids, documents): displayed_text = pprint_document( document_id=document_id, document=document, return_string=True ) displayed_texts.append(displayed_text) print(f"\n{'-' * 100}\n".join(displayed_texts)) ``` </details> You will get ``` {'num_added': 3, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0} Document 1b19816e-b802-53c0-ad60-5ff9d9b9b911: 1-2 I have a daugter~ Metadata:{'source': 'fullhell.alchemist'} ---------------------------------------------------------------------------------------------------- Document 3362f9bc-991a-5dd5-b465-c564786ce19c: 1-1 I have a dog~ Metadata:{'source': 'fullhell.alchemist'} ---------------------------------------------------------------------------------------------------- Document a4d50169-2fda-5339-a196-249b5f54a0de: 1-2 I have a daugter~ Metadata:{'source': 'fullhell.alchemist'} ``` This is not correct. We should be able to expect that the vectorstore now includes doc1_1, doc1_2, and doc2, not doc1_1, doc1_2, and doc1_2. # Reason In `index`, the original code is ```python uids = [] docs_to_index = [] for doc, hashed_doc, doc_exists in zip(doc_batch, hashed_docs, exists_batch): if doc_exists: # Must be updated to refresh timestamp. record_manager.update([hashed_doc.uid], time_at_least=index_start_dt) num_skipped += 1 continue uids.append(hashed_doc.uid) docs_to_index.append(doc) ``` In the aforementioned example, `len(doc_batch) == 4`, but `len(hashed_docs) == len(exists_batch) == 3`. This is because deduplicating the input documents [doc1_1, doc1_2, doc1_2, doc2] yields [doc1_1, doc1_2, doc2]. So `index` inserts doc1_1, doc1_2, doc1_2 with the uids of doc1_1, doc1_2, doc2. 
--------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-22 22:41:07 -04:00
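The misalignment described in #10614 can be reproduced without any vector store. The following is a minimal sketch, not the actual LangChain fix: `uid` and `dedupe` are hypothetical stand-ins for the hashing and deduplication steps, and plain strings stand in for documents. It shows how zipping the raw batch against the deduplicated uid list pairs the wrong items, and how pairing from the deduplicated list keeps them aligned.

```python
import hashlib


def uid(text: str) -> str:
    """Hypothetical stand-in for the content hash used as a record uid."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]


def dedupe(texts):
    """Drop repeated texts while keeping first-seen order."""
    return list(dict.fromkeys(texts))


doc_batch = ["doc1_1", "doc1_2", "doc1_2", "doc2"]
hashed_docs = [uid(text) for text in dedupe(doc_batch)]

# Misaligned: the 4-item batch is zipped with the 3-item uid list,
# so the second "doc1_2" is paired with the uid of "doc2".
buggy = list(zip(doc_batch, hashed_docs))

# Aligned: deduplicate first, then hash and pair from the same list.
unique_batch = dedupe(doc_batch)
fixed = [(text, uid(text)) for text in unique_batch]

print([text for text, _ in buggy])
print([text for text, _ in fixed])
```

Because `zip` stops at the shortest input, the length mismatch raises no error, which is why the bug is easy to miss.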
Joshua Sundance Bailey	d67b120a41	Make anthropic_api_key a secret str (#10724 ) This PR makes `ChatAnthropic.anthropic_api_key` a `pydantic.SecretStr` to avoid inadvertently exposing API keys when the `ChatAnthropic` object is represented as a str.	2023-09-22 22:06:20 -04:00
Bagatur	1b65779905	fix integration tests (#10952 )	2023-09-22 12:04:38 -07:00
Harrison Chase	9062e36722	Harrison/agents structured (#10911 )	2023-09-22 10:21:23 -07:00
C.J. Jameson	b4d2663beb	CONTRIBUTING.md Quick Start: focus on langchain core; clarify docs and experimental are separate (#10906 ) follow up to https://github.com/langchain-ai/langchain/pull/7959 , explaining better to focus just on langchain core no dependencies twitter @cjcjameson	2023-09-22 10:17:08 -07:00
Michael Landis	f30b4697d4	fix: broken link in libs/langchain README (#10920 ) Description Fixes broken link to `CONTRIBUTING.md` in `libs/langchain/README.md`. Because `libs/langchain/README.md` was copied from the top level README, and because the README contains a link to `.github/CONTRIBUTING.md`, the relative path of the copied README's link must be updated. This commit fixes that link.	2023-09-22 10:14:19 -07:00
Bagatur	3cb460d5d8	bump 300 (#10940 )	2023-09-22 09:44:47 -07:00
Nuno Campos	3d5e92e3ef	Accept run name arg for non-chain runs (#10935 )	2023-09-22 08:41:25 -07:00
Nuno Campos	aac2d4dcef	In MergerRetriever async call all retrievers in parallel (#10938 )	2023-09-22 08:40:16 -07:00
German Martin	66d5a7e7cf	Add async support to multi-query retriever. (#10873 ) Added async support to the MultiQueryRetriever class. --------- Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-09-22 08:33:20 -07:00
Leonid Kuligin	9d4b710a48	small fixes to Vertex (#10934 ) Fixed tests, updated the required version of the SDK and a few minor changes after the recent improvement (https://github.com/langchain-ai/langchain/pull/10910)	2023-09-22 08:18:09 -07:00
wo0d	4e58b78102	Fix chat_history message order (#10869 ) Not all databases use id as the default order, so add it explicitly. SQLite uses rowid as the default order in SELECT statements: [https://www.sqlite.org/lang_createtable.html#rowid](https://www.sqlite.org/lang_createtable.html#rowid), but some other databases, such as PostgreSQL, do not behave this way. Since this class supports multiple database engines, the order should be stated explicitly.	2023-09-22 11:15:59 -04:00
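The point of #10869 is that a SELECT without ORDER BY has no guaranteed row order across engines. A minimal sketch using the standard-library `sqlite3` module follows; the table and column names (`message_store`, `session_id`, `message`) are assumptions for illustration, not necessarily the schema the class uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message_store ("
    "id INTEGER PRIMARY KEY, session_id TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO message_store (session_id, message) VALUES (?, ?)",
    [("s1", "hello"), ("s1", "hi there"), ("s1", "how are you?")],
)

# Explicit ORDER BY: the result no longer depends on the engine's
# default row order, which SQL does not guarantee.
rows = conn.execute(
    "SELECT message FROM message_store WHERE session_id = ? ORDER BY id ASC",
    ("s1",),
).fetchall()
messages = [row[0] for row in rows]
print(messages)
conn.close()
```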
Roman Shaptala	3d40de75c5	Fix default refine prompt template bug (#10928 ) Description: Default refine template does not actually use the refine template defined above, it uses a string with the variable name. @baskaryan, @eyurtsev, @hwchase17	2023-09-22 11:04:28 -04:00
Bagatur	cab55e9bc1	add vertex prod features (#10910 ) - chat vertex async - vertex stream - vertex full generation info - vertex use server-side stopping - model garden async - update docs for all the above in follow up will add [] chat vertex full generation info [] chat vertex retries [] scheduled tests	2023-09-22 01:44:09 -07:00
Bagatur	dccc20b402	add model feat table (#10921 )	2023-09-22 01:10:27 -07:00
William FH	ee8653f62c	Wfh/allow nonparallel (#10914 )	2023-09-21 20:21:01 -07:00
Leonid Kuligin	95e1d1fae6	fix in the docstring (#10902 ) Description: A fix in the documentation on how to use `GoogleSearchAPIWrapper`.	2023-09-21 14:30:32 -07:00
Bagatur	af41bc84e6	bump 299 (#10904 )	2023-09-21 12:56:52 -07:00
Bagatur	9a858a9107	Bagatur/arxiv kwargs (#10903 ) support all arXiv api wrapper kwargs in loader	2023-09-21 12:49:56 -07:00
niklas	e5f420d2bc	Fix typo in URL document loader example (#10585 ) - Description: Fix typo in URL document loader example - Issue: N/A - Dependencies: N/A - Tag maintainer: not urgent	2023-09-21 11:35:27 -07:00
Nuno Campos	ea26c12b23	Fix Runnable.transform() for false-y inputs (#10893 ) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-21 11:27:09 -07:00
Nuno Campos	fcb5aba9f0	Add `Runnable.astream_log()` (#10374 ) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-21 10:19:55 -07:00
Harrison Chase	a1ade48e8f	update agent docs (#10894 )	2023-09-21 09:09:33 -07:00
Bagatur	d37ce48e60	sep base url and loaded url in sub link extraction (#10895 )	2023-09-21 08:47:41 -07:00
Bagatur	24cb5cd379	bump 298 (#10892 )	2023-09-21 08:26:11 -07:00
Bagatur	c1f9cc0bc5	recursive loader add status check (#10891 )	2023-09-21 08:25:43 -07:00
Matvey Arye	6e02c45ca4	Add integration for Timescale Vector(Postgres) (#10650 ) Description: This commit adds a vector store for the Postgres-based vector database (`TimescaleVector`). Timescale Vector (https://www.timescale.com/ai) is PostgreSQL++ for AI applications. It enables you to efficiently store and query billions of vector embeddings in `PostgreSQL`: - Enhances `pgvector` with faster and more accurate similarity search on 1B+ vectors via a DiskANN-inspired indexing algorithm. - Enables fast time-based vector search via automatic time-based partitioning and indexing. - Provides a familiar SQL interface for querying vector embeddings and relational data. Timescale Vector scales with you from POC to production: - Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database. - Benefits from a rock-solid PostgreSQL foundation with enterprise-grade features like streaming backups and replication, high availability, and row-level security. - Enables a worry-free experience with enterprise-grade security and compliance. Timescale Vector is available on Timescale, the cloud PostgreSQL platform. (There is no self-hosted version at this time.) LangChain users get a 90-day free trial for Timescale Vector. --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Avthar Sewrathan <avthar@timescale.com>	2023-09-21 07:33:37 -07:00
Michael Feil	55570e54e1	gradient.ai LLM intregration (#10800 ) - Description: This PR implements a new LLM API to https://gradient.ai - Issue: Feature request for LLM #10745 - Dependencies: No additional dependencies are introduced. - Tag maintainer: I am opening this PR for visibility, once ready for review I'll tag. - ```make format && make lint && make test``` is running. - added a `integration` and `mock unit` test. Co-authored-by: michaelfeil <me@michaelfeil.eu> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-21 07:29:16 -07:00
Bagatur	5097007407	cleanup recursive url session (#10863 )	2023-09-21 07:22:13 -07:00
Harrison Chase	777b33b873	fix experimental imports (#10875 )	2023-09-20 23:44:17 -07:00
Harrison Chase	808caca607	beef up agent docs (#10866 )	2023-09-20 23:09:58 -07:00
Sharath Rajasekar	96023f94d9	Add Javelin integration (#10275 ) We are introducing the py integration to Javelin AI Gateway www.getjavelin.io. Javelin is an enterprise-scale fast llm router & gateway. Could you please review and let us know if there is anything missing. Javelin AI Gateway wraps Embedding, Chat and Completion LLMs. Uses javelin_sdk under the covers (pip install javelin_sdk). Author: Sharath Rajasekar, Twitter: @sharathr, @javelinai Thanks!!	2023-09-20 16:36:39 -07:00
Bagatur	957956ba6d	bump 297 (#10861 )	2023-09-20 14:45:49 -07:00
Harrison Chase	1bc3244db9	fix loading of sql chain (#10860 ) Closing #6889	2023-09-20 14:37:49 -07:00
Bagatur	b05a74b106	fix recursive loader (#10856 )	2023-09-20 13:55:47 -07:00
Bagatur	de0a02f507	fix extract sublink bug (#10855 )	2023-09-20 13:30:42 -07:00
Harrison Chase	7dec2d399b	format intermediate steps (#10794 ) Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2023-09-20 13:02:55 -07:00
Harrison Chase	386ef1e654	add agent output parsers (#10790 )	2023-09-20 12:10:09 -07:00
Mukit Momin	67c5950df3	Amazon Bedrock Support Streaming (#10393 ) ### Description - Add support for streaming with `Bedrock` LLM and `BedrockChat` Chat Model. - Bedrock as of now supports streaming for the `anthropic.claude-` and `amazon.titan-` models only, hence support for those has been built. - Also increased the default `max_token_to_sample` for Bedrock `anthropic` model provider to `256` from `50` to keep in line with the `Anthropic` defaults. - Added examples for streaming responses to the bedrock example notebooks. _NOTE:_ This PR fixes the issues mentioned in #9897 and makes that PR redundant.	2023-09-20 11:55:38 -07:00
Bagatur	0749a642f5	Stream refac and vertex streaming (#10470 ) --------- Co-authored-by: Terry Cruz Melo <tcruz@vozy.co> Co-authored-by: Terry Cruz Melo <33166112+TerryCM@users.noreply.github.com>	2023-09-20 11:49:16 -07:00
William FH	f421af8b80	Criteria Parser Improvements (#10824 )	2023-09-20 11:18:33 -07:00
Bagatur	46aa90062b	bump exp 19 (#10851 )	2023-09-20 10:17:52 -07:00
Bagatur	775f3edffd	bump 296 (#10842 )	2023-09-20 08:31:14 -07:00
Bagatur	96a9c27116	fix recursive loader (#10752 ) maintain same base url throughout recursion, yield initial page, fixing recursion depth tracking	2023-09-20 08:16:54 -07:00
Nuno Campos	276125a33b	Use shallow copy on runnable locals (#10825 ) - deep copy prevents storing complex objects in locals	2023-09-20 08:13:06 -07:00
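The note in #10825 that a deep copy prevents storing complex objects can be illustrated with the standard library alone. This is a minimal sketch: `run_locals` is a hypothetical dictionary, and a `threading.Lock` stands in for any object that cannot be deep-copied.

```python
import copy
import threading

# Hypothetical per-run locals holding an object that cannot be deep-copied.
run_locals = {"lock": threading.Lock(), "items": [1, 2]}

try:
    copy.deepcopy(run_locals)
    deep_copy_ok = True
except TypeError:
    # Lock objects cannot be pickled, so deepcopy raises TypeError.
    deep_copy_ok = False

# A shallow copy creates a new dict but shares the stored objects.
shallow = copy.copy(run_locals)

print(deep_copy_ok)
print(shallow["lock"] is run_locals["lock"])
```

The trade-off is that mutable values are shared between the original and the shallow copy, so mutating `shallow["items"]` also changes `run_locals["items"]`.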
DanielZzz	ebe08412ad	fix: chat_models Qianfan not compatiable with SystemMessage (#10642 ) - Description: Fixes a QianfanEndpoint bug with SystemMessages: when a `SystemMessage` is passed in the messages to `chat_models.QianfanEndpoint`, a `TypeError` is raised. - Issue: #10643 - Dependencies: - Tag maintainer: @baskaryan - Twitter handle: no	2023-09-19 22:35:51 -07:00
Massimiliano Pronesti	f0198354d9	fix(embeddings): number of texts in Azure OpenAIEmbeddings batch (#10707 ) This PR addresses the limitation of Azure OpenAI embeddings, which can handle at most 16 texts in a batch. This can be solved by setting `chunk_size=16`. However, I'd love to have this automated, so that the user is not forced to figure out where the issue comes from and how to solve it. Closes #4575. @baskaryan --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-19 21:50:39 -07:00
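The 16-texts-per-batch limit mentioned in #10707 amounts to splitting the input into chunks before each request. A minimal sketch of that chunking follows; `batched` is a hypothetical helper, not the function LangChain uses, and no embedding API is called.

```python
from typing import Iterator, List


def batched(texts: List[str], chunk_size: int = 16) -> Iterator[List[str]]:
    """Yield consecutive slices of at most chunk_size texts."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(texts), chunk_size):
        yield texts[start:start + chunk_size]


texts = [f"text {i}" for i in range(40)]
batch_sizes = [len(batch) for batch in batched(texts)]
print(batch_sizes)
```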
zhanghexian	0abe996409	add clustered vearch in langchain (#10771 ) --------- Co-authored-by: zhanghexian1 <zhanghexian1@jd.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-19 21:22:23 -07:00
HeTaoPKU	f505320a73	Add Minimax chat model (#10776 ) resolve the merging issues for https://github.com/langchain-ai/langchain/pull/6757 --------- Co-authored-by: 何涛 <taohe@bytedance.com>	2023-09-19 20:43:49 -07:00
Anar	c656a6b966	LLMRails (#10796 ) ### LLMRails Integration This PR provides integration with LLMRails. Implemented here are: langchain/vectorstore/llm_rails.py tests/integration_tests/vectorstores/test_llm_rails.py docs/extras/integrations/vectorstores/llm-rails.ipynb --------- Co-authored-by: Anar Aliyev <aaliyev@mgmt.cloudnet.services> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-19 20:33:33 -07:00
mateai	900dbd1cbe	Substring support for similarity_search_with_score (#10746 ) Description: Possible to filter with substrings in similarity_search_with_score, for example: filter={'user_id': {'substring': 'user'}} --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-19 20:32:44 -07:00
Ansil M B	740eafe41d	Updated return parameter of YouTubeSearchTool (#10743 ) Description: changed return parameter of YouTubeSearchTool 1. changed the returning links of youtube videos by adding prefix "https://www.youtube.com", now this will return the exact links to the videos 2. updated the returning type from 'string' to 'list', which will be more suited for further processings Issue: Fixes #10742 Dependencies: None --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-19 17:04:06 -07:00
Harrison Chase	1dae3c383e	Harrison/add submodule to docs (#10803 )	2023-09-19 17:03:32 -07:00
Henry (Hezheng) Yin	c15bbaac31	misc: add gpt-3.5-turbo-instruct to model_token_mapping (#10808 ) A one-line fix to get `max_tokens=-1` working in the `OpenAI` class for the `gpt-3.5-turbo-instruct` model. Closes https://github.com/langchain-ai/langchain/issues/10806	2023-09-19 17:03:16 -07:00
Harrison Chase	d2bee34d4c	Harrison/add vald (#10807 ) Co-authored-by: datelier <57349093+datelier@users.noreply.github.com>	2023-09-19 16:42:52 -07:00
Jacob Lee	bbc3fe259b	Start RunnableBranch callback tags with 1 instead of 0 (#10755 ) Changes to match `RunnableSequences` @eyurtsev	2023-09-19 16:38:08 -07:00
Ziyang Liu	931b292126	Add support for HTTP PUT in the open api agent prompt (#10763 ) Description: This PR adds HTTP PUT support for the langchain openapi agent toolkit by leveraging existing structure and HTTP put request wrapper. The PUT method is almost identical to HTTP POST but should be idempotent and therefore tighter than POST which is not idempotent. Some APIs may consider to use PUT instead of POST which is unfortunately not supported with the current toolkit yet.	2023-09-19 16:37:20 -07:00
Mateusz Wosinski	a29cd89923	Synthetic data generation (#9759 ) ### Description Implements synthetic data generation with the fields and preferences given by the user. Adds showcase notebook. Corresponding prompt was proposed for langchain-hub. ### Example ``` output = chain({"fields": {"colors": ["blue", "yellow"]}, "preferences": {"style": "Make it in a style of a weather forecast."}}) print(output) # {'fields': {'colors': ['blue', 'yellow']}, 'preferences': {'style': 'Make it in a style of a weather forecast.'}, 'text': "Good morning! Today's weather forecast brings a beautiful combination of colors to the sky, with hues of blue and yellow gently blending together like a mesmerizing painting."} ``` ### Twitter handle @deepsense_ai @matt_wosinski --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-19 16:29:50 -07:00
Bagatur	c4a6de3fc9	Revert "Add ChatGLM for llm and chat_model by using ChatGLM API (#9797 )" (#10805 ) @etveritas reverting for now until this is resolved https://github.com/langchain-ai/langchain/pull/9797/files#r1330795585, apologies for merging too eagerly!	2023-09-19 16:23:42 -07:00
Mickaël	c86a1a6710	chore: allow using dataclasses_json dependency v0.6.0 (#10775 ) Description: upgrade the `dataclasses_json` dependency to its latest version ([no real breaking change](https://github.com/lidatong/dataclasses-json/releases/tag/v0.6.0) if used correctly), while allowing previous version to not break other users' setup Issue: I need to use the latest version of that dependency in my project, but `langchain` prevents it. Note: it looks like running `poetry lock --no-update` did some changes to the lockfiles as it was the first time it was with the `macosx_11_0_arm64` architecture 🤷 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-19 16:22:35 -07:00
Bagatur	76dd7480e6	Add batch_size param to Weaviate vector store (#9890 ) cc @mcantillon21 @hsm207 @cs0lar	2023-09-19 16:20:23 -07:00
Mateusz Wosinski	720f6dbaac	Add XMLOutputParser (#10051 ) Description Adds new output parser, this time enabling the output of LLM to be of an XML format. Seems to be particularly useful together with Claude model. Addresses [issue 9820](https://github.com/langchain-ai/langchain/issues/9820). Twitter handle @deepsense_ai @matt_wosinski	2023-09-19 16:17:33 -07:00
etVERITAS	d6df288380	Add ChatGLM for llm and chat_model by using ChatGLM API (#9797 ) Usage sample: ``` endpoint_url = API URL ChatGLM_llm = ChatGLM( endpoint_url=endpoint_url, api_key=Your API Key by ChatGLM ) print(ChatGLM_llm("hello")) ``` ``` model = ChatChatGLM( chatglm_api_key="api_key", chatglm_api_base="api_base_url", model_name="model_name" ) chain = LLMChain(llm=model) ``` Description: The call of ChatGLM has been adapted. Issue: The call of ChatGLM has been adapted. Dependencies: Requires the Python packages `zhipuai` and `aiostream` Tag maintainer: @baskaryan Twitter handle: None I removed the compatibility test for pydantic version 2 because pydantic v2 cannot pickle a classmethod, and the `@root_validator` used with BaseModel is a classmethod decorator. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-19 16:17:07 -07:00
Harrison Chase	d60145229b	make agent action serializable (#10797 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-19 16:16:14 -07:00
Maxime Bourliatoux	21b236e5e4	Fixing _InactiveRpcError in MatchingEngine vectorstore (#10056 ) - Description: There was an issue with the MatchingEngine VectorStore that prevented using it with a public endpoint. In the Google Cloud library there are two similar methods for private or public endpoints: `match()` and `find_neighbors()`. - Issue: Fixes #8378 - This uses the `google.cloud.aiplatform` library: https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/matching_engine/matching_engine_index_endpoint.py	2023-09-19 16:16:04 -07:00
Sam Chou	4f19ba3065	Azure Search: Remove select field restrictions and expand metadata to other fields, also expose kwargs to searches (#9894 ) Description: If the metadata field is returned in results, the previous behavior is unchanged. If the metadata field does not exist in results, expand metadata to any fields returned outside of the content field. There's precedent for this as well, see the retriever: https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/retrievers/azure_cognitive_search.py#L96C46-L96C46 Issue: #9765 - Ameliorates hard-coding in case you already indexed to cognitive search without a metadata field but rather placed metadata in separate fields. @hwchase17	2023-09-19 16:10:29 -07:00
Piyush Jain	94cf71ecfa	Updated Neptune graph to use boto (#10121 ) ## Description This PR updates the `NeptuneGraph` class to start using the boto API for connecting to the Neptune service. With boto integration, the graph class now supports authenticating requests using Sigv4; this is encapsulated with the boto API, and users only have to ensure they have the correct AWS credentials set up in their workspace to work with the graph class. This PR also introduces a conditional prompt that uses a simpler prompt when using the `Anthropic` model provider. A simpler prompt has seemed to work better for generating cypher queries in our testing. Note: This version will require boto3 version 1.28.38 or greater to work.	2023-09-19 16:03:08 -07:00
Douglas Monsky	d5f1969d55	Introducing Enhanced Functionality to WeaviateHybridSearchRetriever: Accepting Additional Keyword Arguments (#10802 ) Description: This commit enriches the `WeaviateHybridSearchRetriever` class by introducing a new parameter, `hybrid_search_kwargs`, within the `_get_relevant_documents` method. This parameter accommodates arbitrary keyword arguments (`kwargs`) which can be channeled to the inherited public method, `get_relevant_documents`, originating from the `BaseRetriever` class. This modification facilitates more intricate querying capabilities, allowing users to convey supplementary arguments to the `.with_hybrid()` method. This expansion not only makes it possible to perform a more nuanced search targeting specific properties but also grants the ability to boost the weight of searched properties, to carry out a search with a custom vector, and to apply the Fusion ranking method. The documentation has been updated accordingly to delineate these new possibilities in detail. In light of the layered approach in which this search operates, initiating with `query.get()` and then transitioning to `.with_hybrid()`, several advantageous opportunities are unlocked for the hybrid component that were previously unattainable. Here’s a representative example showcasing a query structure that was formerly unfeasible: [Specific Properties Only](https://weaviate.io/developers/weaviate/search/hybrid#selected-properties-only) "The example below illustrates a BM25 search targeting the keyword 'food' exclusively within the 'question' property, integrated with vector search results corresponding to 'food'." 
```python response = ( client.query .get("JeopardyQuestion", ["question", "answer"]) .with_hybrid( query="food", properties=["question"], # Will now be possible moving forward alpha=0.25 ) .with_limit(3) .do() ) ``` This functionality is now accessible through my alterations, by conveying `hybrid_search_kwargs={"properties": ["question", "answer"]}` as an argument to `WeaviateHybridSearchRetriever.get_relevant_documents()`. For example: ```python import os from weaviate import Client from langchain.retrievers import WeaviateHybridSearchRetriever client = Client( url=os.getenv("WEAVIATE_CLIENT_URL"), additional_headers={ "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY"), "Authorization": f"Bearer {os.getenv('WEAVIATE_API_KEY')}", }, ) index_name = "Document" text_key = "content" attributes = ["title", "summary", "header", "url"] retriever = WeaviateHybridSearchRetriever( client=client, index_name=index_name, text_key=text_key, attributes=attributes, ) # Warning: to utilize properties in this way, each property used must also be in the list `attributes + [text_key]`. hybrid_search_kwargs = {"properties": ["summary^2", "content"]} query_text = "Some Query Text" relevant_docs = retriever.get_relevant_documents( query=query_text, hybrid_search_kwargs=hybrid_search_kwargs ) ``` In my experience working with the `weaviate-client` library, I have found that these supplementary options stand as vital tools for refining/finetuning searches, notably within multifaceted datasets. As a final note, this implementation supports both backwards and forward (within reason) compatibility. It accommodates any future additional parameters Weaviate may add to `.with_hybrid()`, without necessitating further alterations. 
Additional Documentation: For a more comprehensive understanding and to explore a myriad of useful options that are now accessible, please refer to the Weaviate documentation: - [Fusion Ranking Method](https://weaviate.io/developers/weaviate/search/hybrid#fusion-ranking-method) - [Selected Properties Only](https://weaviate.io/developers/weaviate/search/hybrid#selected-properties-only) - [Weight Boost Searched Properties](https://weaviate.io/developers/weaviate/search/hybrid#weight-boost-searched-properties) - [With a Custom Vector](https://weaviate.io/developers/weaviate/search/hybrid#with-a-custom-vector) Tag Maintainer: @hwchase17 - I have tagged you based on your frequent contributions to the pertinent file, `/retrievers/weaviate_hybrid_search.py`. My apologies if this was not the appropriate choice. Thank you for considering my contribution, I look forward to your feedback, and to future collaboration.	2023-09-19 15:56:22 -07:00
Jacob Lee	61cecf8b1b	Fix for versioned OpenAI instruct models (#10788 ) Versioned OpenAI instruct models may end with numbers, e.g. `gpt-3.5-turbo-instruct-0914`. Fixes https://github.com/langchain-ai/langchainjs/issues/2669 in Python	2023-09-19 15:50:06 -07:00
Cory Zue	62603f2664	make auto-setting the encodings optional, alow explicitly setting it (#10774 ) I was trying to use web loaders on some Spanish documentation (e.g. [this site](https://www.fromdoppler.com/es/mailing-tendencias/)), but the auto-encoding introduced in https://github.com/langchain-ai/langchain/pull/3602 detected the encoding as "MacRoman" instead of the (correct) "UTF-8". To address this, I've added the ability to disable the auto-encoding, as well as the ability to explicitly tell the loader what encoding to use. - Description: Makes auto-setting the encoding optional in `WebBaseLoader`, and introduces an `encoding` option to explicitly set it. - Dependencies: N/A - Tag maintainer: @hwchase17 - Twitter handle: @czue	2023-09-19 12:59:52 -07:00
Harrison Chase	c68be4eb2b	tool rendering (#10786 )	2023-09-19 12:05:39 -07:00
Aashish Saini	1b050b98f5	Corrected some spelling mistakes and grammatical errors (#10791 ) Corrected some spelling mistakes and grammatical errors CC: @baskaryan, @eyurtsev, @hwchase17. --------- Co-authored-by: Ishita Chauhan <136303787+IshitaChauhanShortHillsAI@users.noreply.github.com> Co-authored-by: Aashish Saini <141953346+AashishSainiShorthillsAI@users.noreply.github.com> Co-authored-by: ManpreetShorthillsAI <142380984+ManpreetShorthillsAI@users.noreply.github.com> Co-authored-by: AryamanJaiswalShorthillsAI <142397527+AryamanJaiswalShorthillsAI@users.noreply.github.com> Co-authored-by: Adarsh Shrivastav <142413097+AdarshKumarShorthillsAI@users.noreply.github.com> Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com> Co-authored-by: ChetnaGuptaShorthillsAI <142381084+ChetnaGuptaShorthillsAI@users.noreply.github.com> Co-authored-by: PankajKumarShorthillsAI <142473460+PankajKumarShorthillsAI@users.noreply.github.com> Co-authored-by: AbhishekYadavShorthillsAI <142393903+AbhishekYadavShorthillsAI@users.noreply.github.com> Co-authored-by: AmitSinghShorthillsAI <142410046+AmitSinghShorthillsAI@users.noreply.github.com> Co-authored-by: Md Nazish Arman <142379599+MdNazishArmanShorthillsAI@users.noreply.github.com> Co-authored-by: KamalSharmaShorthillsAI <142474019+KamalSharmaShorthillsAI@users.noreply.github.com> Co-authored-by: Lakshya <lakshyagupta87@yahoo.com> Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com> Co-authored-by: AnujMauryaShorthillsAI <142393269+AnujMauryaShorthillsAI@users.noreply.github.com> Co-authored-by: ishita <chauhanishita5356@gmail.com>	2023-09-19 10:08:59 -07:00
Ahmad Bunni	5272e42b0d	Add namespace to pinecone hybrid search (#10677 ) Description: Pinecone hybrid search is now limited to default namespace. There is no option for the user to provide a namespace to partition an index, which is one of the most important features of pinecone. Resource: https://docs.pinecone.io/docs/namespaces --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-19 08:39:10 -07:00
Bagatur	0d1550da91	Bagatur/bump 295 (#10785 )	2023-09-19 08:22:42 -07:00
Vikram Shitole	a4e858b111	Sagemaker endpoint capability to inject boto3 client for cross account scenarios (#10728 ) - Description: Allow to inject boto3 client for Cross account access type of scenarios in using Sagemaker Endpoint - Issue:#10634 #10184 - Dependencies: None - Tag maintainer: - Twitter handle:lethargicoder Co-authored-by: Vikram(VS) <vssht@amazon.com>	2023-09-19 08:06:12 -07:00
William FH	c8f386db97	Merge metadata + tags in config (#10762 ) Think these should be a merge/update rather than overwrite	2023-09-19 08:00:30 -07:00
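The difference between merging and overwriting in #10762 can be shown with plain dictionaries. This is a sketch under assumptions: `merge_configs` is a hypothetical helper, and the exact semantics here (sorted, de-duplicated tags; later metadata keys winning) are illustrative choices, not necessarily what the library does.

```python
from typing import Any, Dict


def merge_configs(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Combine two configs, merging tags and metadata instead of replacing them."""
    merged = {**base, **override}
    # Tags: union of both lists (sorted only to make the result deterministic).
    merged["tags"] = sorted(set(base.get("tags", [])) | set(override.get("tags", [])))
    # Metadata: keys from both dicts; the override wins on conflicts.
    merged["metadata"] = {**base.get("metadata", {}), **override.get("metadata", {})}
    return merged


base = {"tags": ["prod"], "metadata": {"team": "search", "version": 1}}
override = {"tags": ["experiment"], "metadata": {"version": 2}}

merged = merge_configs(base, override)
overwritten = {**base, **override}  # plain overwrite loses "prod" and "team"
print(merged)
print(overwritten)
```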
BarberAlec	c898a4d7ba	Update ContextCallbackHandler Docstring & metadata key (#10732 ) - Description: Updating URL in Context Callback Docstrings and update metadata key Context CallbackHandler uses to send model names. - Issue: The URL in ContextCallbackHandler is out of date. Model data being sent to Context should be under the "model" key and not "llm_model". This allows Context to do more sophisticated analysis. - Dependencies: None Tagging @agamble.	2023-09-18 22:04:13 -07:00
Harrison Chase	8b68d1a03b	keep reference to old embeddings base (#10759 )	2023-09-18 20:09:44 -07:00
Jacob Lee	babf46692d	Allow extra variables when invoking prompt templates (#10765 ) Makes chaining easier as many maps have extra properties. @baskaryan @hwchase17	2023-09-18 20:08:54 -07:00
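The behavior described in #10765, accepting input maps that carry more keys than the template needs, can be sketched with the standard library. `format_prompt` is a hypothetical helper built on `string.Formatter`, not the prompt template class itself; it still fails when a required variable is missing.

```python
from string import Formatter
from typing import Any, Dict


def format_prompt(template: str, values: Dict[str, Any]) -> str:
    """Fill a template from values, ignoring keys the template does not use."""
    needed = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = needed - set(values)
    if missing:
        raise KeyError(f"missing prompt variables: {sorted(missing)}")
    return template.format(**{name: values[name] for name in needed})


# The extra "language" key is ignored rather than rejected.
prompt = format_prompt(
    "Tell me a joke about {topic}",
    {"topic": "bears", "language": "en"},
)
print(prompt)
```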
Bagatur	8515e27d82	bump 294 (#10751 )	2023-09-18 16:04:02 -07:00
Jacob Lee	579d14fbc1	Allow 3.5-turbo instruct models in the OpenAI LLM class (#10750 ) @baskaryan @hwchase17	2023-09-18 15:55:13 -07:00
Harrison Chase	e404fd39dd	add anthropic page (#10666 )	2023-09-18 11:10:44 -07:00
Bagatur	5072138893	bump 293 (#10740 )	2023-09-18 08:41:38 -07:00
Harrison Chase	12ff780089	move embeddings to schema (#10696 )	2023-09-18 08:37:14 -07:00
Jiayi Ni	ce61840e3b	ENH: Add `llm_kwargs` for Xinference LLMs (#10354 ) - This pr adds `llm_kwargs` to the initialization of Xinference LLMs (integrated in #8171 ). - With this enhancement, users can not only provide `generate_configs` when calling the llms for generation but also during the initialization process. This allows users to include custom configurations when utilizing LangChain features like LLMChain. - It also fixes some format issues for the docstrings.	2023-09-18 11:36:29 -04:00

1603 Commits