langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
Dayuan Jiang	17cdeb72ef	minor fix: remove redundant code from OpenAIFunctionsAgent (#11245 )	2023-10-01 13:22:15 -04:00
Michael Goin	33eb5f8300	Update DeepSparse LLM (#11236 ) Description: Adds streaming and many more sampling parameters to the DeepSparse interface --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-29 13:55:19 -07:00
Eugene Yurtsev	f91ce4eddf	Bump deps in langserve (#11234 ) Bump deps in langserve lockfile	2023-09-29 16:19:37 -04:00
Haozhe	4c97a10bd0	fix code injection vuln (#11233 ) - Description: Fix a code injection vuln by adding one more keyword into the filtering list - Issue: N/A - Dependencies: N/A Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-29 16:16:00 -04:00
Eugene Yurtsev	aebdb1ad01	Ignore aadd (#11235 )	2023-09-29 21:10:53 +01:00
Eugene Yurtsev	8b4cb4eb60	Add type to message chunks (#11232 )	2023-09-29 20:14:52 +01:00
Nuno Campos	fb66b392c6	Implement RunnablePassthrough.assign(...) (#11222 ) Passes through dict input and assigns additional keys	2023-09-29 20:12:48 +01:00
Nuno Campos	1ddf9f74b2	Add a streaming json parser (#11193 )	2023-09-29 20:09:52 +01:00
Nuno Campos	ee56c616ff	Remove flawed test - It is not possible to access properties on classes, only on instances, therefore this test is not something we can implement	2023-09-29 20:05:33 +01:00
Nuno Campos	f3f3f71811	Lint	2023-09-29 19:57:40 +01:00
Nuno Campos	f6b0b065d3	Update json.py Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-29 19:34:35 +01:00
Nuno Campos	cbe18057b0	Update json.py Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-29 19:34:27 +01:00
Nuno Campos	aa8b4120a8	Keep exceptions when not in streaming mode	2023-09-29 19:21:27 +01:00
Nuno Campos	1f30e25681	Lint	2023-09-29 18:03:41 +01:00
Nuno Campos	c9d0f2b984	Combine with existing json output parsers	2023-09-29 17:55:30 +01:00
Eugene Yurtsev	b4354b7694	Make tests stricter, remove old code, fix up pydantic import when using v2 (#11231 )	2023-09-29 12:47:02 -04:00
Eugene Yurtsev	572968fee3	Using langchain input types (#11204 ) Using langchain input type	2023-09-29 12:37:09 -04:00
Bagatur	77c7c9ab97	bump 305 (#11224 )	2023-09-29 08:55:00 -07:00
Nuno Campos	4b8442896b	Make test deterministic	2023-09-29 16:50:00 +01:00
Attila Tőkés	ba9371854f	OpenAI gpt-3.5-turbo-instruct cost information (#11218 ) Added pricing info for `gpt-3.5-turbo-instruct` for OpenAI and Azure OpenAI. Co-authored-by: Attila Tőkés <atokes@rws.com>	2023-09-29 08:44:55 -07:00
Eugene Yurtsev	de69ea26e8	Suppress warnings in interactive env that stem from tab completion (#11190 ) Suppress warnings in interactive environments that can arise from users relying on tab completion (without even using deprecated modules). jupyter seems to filter warnings by default (at least for me), but ipython surfaces them all	2023-09-29 11:44:30 -04:00
Jon Saginaw	715ffda28b	mongodb doc loader init (#10645 ) - Description: A Document Loader for MongoDB - Issue: n/a - Dependencies: Motor, the async driver for MongoDB - Tag maintainer: n/a - Twitter handle: pigpenblue Note that an initial mongodb document loader was created 4 months ago, but the [PR](https://github.com/langchain-ai/langchain/pull/4285) was never pulled in. @leo-gan had commented on that PR, but given it is extremely far behind the master branch and a ton has changed in LangChain since then (including repo name and structure), I rewrote the branch and issued a new PR with the expectation that the old one can be closed. Please reference that old PR for comments/context, but it can be closed in favor of this one. Thanks! --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-29 11:44:07 -04:00
Nuno Campos	3d8aa88e26	Add async tests and comments	2023-09-29 15:28:46 +01:00
Nuno Campos	4ad0f3de2b	Add RunnableGenerator (#11214 )	2023-09-29 15:21:37 +01:00
Guy Korland	748a757306	Clean warnings: replace type with isinstance and fix syntax (#11219 ) Clean warnings: replace type with `isinstance` and fix notebook syntax	2023-09-29 10:06:33 -04:00
Nuno Campos	091d8845d5	Backwards compat	2023-09-29 14:18:38 +01:00
Nuno Campos	4e28a7a513	Implement diff	2023-09-29 14:12:48 +01:00
Nuno Campos	5cbe2b7b6a	Implement diff	2023-09-29 14:12:18 +01:00
Nuno Campos	6c0a6b70e0	WIP Add tests	2023-09-29 14:11:34 +01:00
Nuno Campos	63f2ef8d1c	Implement str one	2023-09-29 14:11:34 +01:00
Nuno Campos	f672b39cc9	Add a streaming json parser	2023-09-29 14:11:34 +01:00
Nuno Campos	2387647d30	Lint	2023-09-29 14:11:03 +01:00
Nuno Campos	0318cdd33c	Add tests	2023-09-29 12:25:19 +01:00
Nuno Campos	b67db8deaa	Add RunnableGenerator	2023-09-29 12:04:32 +01:00
Nuno Campos	e35ea565d1	Lint	2023-09-29 12:00:56 +01:00
Nuno Campos	7f589ebbc2	Lint	2023-09-29 11:57:01 +01:00
Nuno Campos	8be598f504	Fix invocation	2023-09-29 11:57:01 +01:00
Nuno Campos	6eb6c45c98	Enable creating Tools from any Runnable	2023-09-29 11:57:01 +01:00
Nuno Campos	61b5942adf	Implement better reprs for Runnables (#11175 ) ``` ChatPromptTemplate(messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are a nice assistant.')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='{question}'))]) | RunnableLambda(lambda x: x) | { chat: FakeListChatModel(responses=["i'm a chatbot"]), llm: FakeListLLM(responses=["i'm a textbot"]) } ```	2023-09-29 11:56:28 +01:00
Nuno Campos	e8e2b812c9	Even more	2023-09-29 11:54:22 +01:00
Nuno Campos	fc072100fa	skip more	2023-09-29 11:51:48 +01:00
Nuno Campos	7bfee012d5	Skip in py3.8	2023-09-29 11:49:12 +01:00
Nuno Campos	b8e3e1118d	Skip for py3.8	2023-09-29 11:45:20 +01:00
William FH	db05ea2b78	Add from_embeddings for opensearch (#10957 )	2023-09-29 00:00:58 -07:00
William FH	73693c18fc	Add support for project metadata in run_on_dataset (#11200 )	2023-09-28 21:26:37 -07:00
James Braza	b11f21c25f	Updated `LocalAIEmbeddings` docstring to better explain why `openai` (#10946 ) Fixes my misgivings in https://github.com/langchain-ai/langchain/issues/10912	2023-09-28 19:56:42 -07:00
Eugene Yurtsev	2c114fcb5e	Fix web-base loader (#11135 ) Fix initialization https://github.com/langchain-ai/langchain/issues/11095	2023-09-28 19:36:46 -07:00
jreinjr	3bc44b01c0	Typo fix to MathpixPDFLoader - changed processed_file_format default … (#10960 ) …from mmd to md. https://github.com/langchain-ai/langchain/issues/7282 - Description: minor fix to a breaking typo: the MathpixPDFLoader processed_file_format is "mmd" by default, which doesn't work; changing it to "md" fixes the issue - Issue: 7282 (https://github.com/langchain-ai/langchain/issues/7282) - Dependencies: none - Tag maintainer: @hwchase17 - Twitter handle: none Co-authored-by: jare0530 <7915+jare0530@users.noreply.ghe.oculus-rep.com>	2023-09-28 19:03:30 -07:00
Dr. Fabien Tarrade	66415eed6e	Support new versions of tiktoken that work with langchain (tag "^0.3.2" => ">=0.3.2,<0.6.0" and python "^3.9" => ">=3.9") (#11006 ) - Description: allow langchain to be used with tiktoken versions other than 0.3.3, e.g. 0.5.1 - Issue: cannot install the conda-forge version since it applies all optional dependencies: https://github.com/conda-forge/langchain-feedstock/pull/85 Replace "^0.3.2" with ">=0.3.2,<0.6.0" and python "^3.9" with ">=3.9". Tested with python 3.10, langchain=0.0.288 and tiktoken==0.5.0 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 18:53:24 -07:00
Clément Sicard	1b48d6cb8c	`LlamaCppEmbeddings`: adds `verbose` parameter, similar to `llms.LlamaCpp` class (#11038 ) ## Description Currently, both at instantiation and during inference, `LlamaCppEmbeddings` prints a lot of verbose output when driven through the LangChain binding, which is a bit annoying when computing the embeddings of long documents, for instance. This PR adds a `verbose` parameter to `LlamaCppEmbeddings` so that the model's verbose output to `stderr` can be suppressed. It is natively supported by `llama-cpp-python` and passed directly to the library, so the PR is very small. The value of `verbose` is `True` by default, following the way it is defined in [`LlamaCpp` (`llamacpp.py` #L136-L137)](`c87e9fb2ce/libs/langchain/langchain/llms/llamacpp.py (L136-L137)`) ## Issue _No issue linked_ ## Dependencies _No additional dependency needed_ ## To see it in action ```python from langchain.embeddings import LlamaCppEmbeddings MODEL_PATH = "<path_to_gguf_file>" if __name__ == "__main__": llm_embeddings = LlamaCppEmbeddings( model_path=MODEL_PATH, n_gpu_layers=1, n_batch=512, n_ctx=2048, f16_kv=True, verbose=False, ) ``` Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 18:37:51 -07:00
Noah Czelusta	a00a73ef18	Add last_edited_time and created_time props to NotionDBLoader (#11020 ) # Description Adds logic for NotionDBLoader to correctly populate `last_edited_time` and `created_time` fields from [page properties](https://developers.notion.com/reference/page#property-value-object). There are no relevant tests for this code to be updated. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 18:37:34 -07:00
Eugene Yurtsev	e06e84b293	LangServe: Relax requirements (#11198 ) Relax requirements	2023-09-28 21:27:19 -04:00
PaperMoose	5d7c6d1bca	Synthetic Data generation (#9472 ) --------- Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 18:16:05 -07:00
Donatas Remeika	a4e0cf6300	SearchApi integration (#11023 ) Based on the customers' requests for native langchain integration, SearchApi is ready to invest in AI and LLM space, especially in open-source development. - This is our initial PR and later we want to improve it based on customers' and langchain users' feedback. Most likely changes will affect how the final results string is being built. - We are creating similar native integration in Python and JavaScript. - The next plan is to integrate into Java, Ruby, Go, and others. - Feel free to assign @SebastjanPrachovskij as a main reviewer for any SearchApi-related searches. We will be glad to help and support langchain development.	2023-09-28 18:08:37 -07:00
Bagatur	8cd18a48e4	fix trubrics lint issue (#11202 )	2023-09-28 18:07:50 -07:00
Fynn Flügge	b738ccd91e	chore: add support for TypeScript code splitting (#11160 ) - Description: Adds typescript language to `TextSplitter` --------- Co-authored-by: Jacob Lee <jacoblee93@gmail.com>	2023-09-28 16:41:51 -07:00
Kenneth Choe	17fcbed92c	Support add_embeddings for opensearch (#11050 ) - Description: - Make running integration test for opensearch easy - Provide a way to use different text for embedding: refer to #11002 for more of the use case and design decision. - Issue: N/A - Dependencies: None other than the existing ones.	2023-09-28 16:41:11 -07:00
Jeff Kayne	c586f6dc1b	Callback integration for Trubrics (#11059 ) After contributing to some examples in the [langsmith-cookbook](https://github.com/langchain-ai/langsmith-cookbook) with @hinthornw, here is a PR that adds a callback handler to use LangChain with [Trubrics](https://github.com/trubrics/trubrics-sdk).	2023-09-28 16:20:19 -07:00
Michael Landis	a8db594012	fix: short-circuit black and mypy calls when no changes made (#11051 ) Both black and mypy expect a list of files or directories as input. As-is the Makefile computes a list of files changed relative to the last commit; these are passed to black and mypy in the `format_diff` and `lint_diff` targets. This is done by way of the Makefile variable `PYTHON_FILES`. This is to save time by skipping running mypy and black over the whole source tree. When no changes have been made, this variable is empty, so the call to black (and mypy) lacks input files. The call exits with an error, causing the Makefile target to error out with: ```bash $ make format_diff poetry run black Usage: black [OPTIONS] SRC ... One of 'SRC' or 'code' is required. make: *** [format_diff] Error 1 ``` This is unexpected and undesirable, as the naive caller (that's me! 😄 ) will think something else is wrong. This commit smooths over this by short circuiting when `PYTHON_FILES` is empty.	2023-09-28 16:13:07 -07:00
Michael Kim	fbcd8e02f2	Change type annotations from LLMChain to Chain in MultiPromptChain (#11082 ) - Description: The types of 'destination_chains' and 'default_chain' in 'MultiPromptChain' were changed from 'LLMChain' to 'Chain', and variables that duplicated declarations in the parent class were removed. - Issue: When a class that inherits only from Chain and not LLMChain, such as 'SequentialChain' or 'RetrievalQA', is passed as 'destination_chains' or 'default_chain', a pydantic validation error is raised. - Code: ``` retrieval_chain = ConversationalRetrievalChain( retriever=doc_retriever, combine_docs_chain=combine_docs_chain, question_generator=question_gen_chain, ) destination_chains = { 'retrieval': retrieval_chain, } main_chain = MultiPromptChain( router_chain=router_chain, destination_chains=destination_chains, default_chain=default_chain, verbose=True, ) ``` ✅ `make format`, `make lint` and `make test`	2023-09-28 15:59:25 -07:00
Piyush Jain	32d09bcd1e	Expanded version range for networkx, fixed sample notebook (#11094 ) ## Description Expanded the upper bound for `networkx` dependency to allow installation of latest stable version. Tested the included sample notebook with version 3.1, and all steps ran successfully. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 15:33:30 -07:00
Piotr Mardziel	b40ecee4b9	Fix eval prompt (#11087 ) Description: fixes a common typo in some of the eval criteria.	2023-09-28 15:21:15 -07:00
Guy Korland	5564833bd2	Add `add_graph_documents` support for FalkorDBGraph (#11122 ) Adding `add_graph_documents` support for FalkorDBGraph and extending the `Neo4JGraph` api so it can support `cypher.py`	2023-09-28 15:03:54 -07:00
Tomaz Bratanic	7d25a65b10	add from_existing_graph to neo4j vector (#11124 ) This PR adds the option to create a Neo4jvector instance from existing graph, which embeds existing text in the database and creates relevant indices.	2023-09-28 15:02:26 -07:00
Noah Stapp	2c952de21a	Add support for MongoDB Atlas $vectorSearch vector search (#11139 ) Adds support for the `$vectorSearch` operator for MongoDBAtlasVectorSearch, which was announced at .Local London (September 26th, 2023). This change breaks compatibility with the existing `$search` operator used by the original integration (https://github.com/langchain-ai/langchain/pull/5338) due to incompatibilities in the Atlas search implementations. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 15:01:03 -07:00
Hugues	b599f91e33	LLMonitor Callback handler: fix bug (#11128 ) Here is a small bug fix for the LLMonitor callback handler. I've also added user identification capabilities.	2023-09-28 15:00:38 -07:00
William FH	e9b51513e9	Shared Executor (#11028 )	2023-09-28 13:30:58 -07:00
Justin Plock	926e4b6bad	[Feat] Add optional client-side encryption to DynamoDB chat history memory (#11115 ) Description: Added optional client-side encryption to the Amazon DynamoDB chat history memory with an AWS KMS Key ID using the [AWS Database Encryption SDK for Python](https://docs.aws.amazon.com/database-encryption-sdk/latest/devguide/python.html) Issue: #7886 Dependencies: [dynamodb-encryption-sdk](https://pypi.org/project/dynamodb-encryption-sdk/) Tag maintainer: @hwchase17 Twitter handle: [@jplock](https://twitter.com/jplock/) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-28 13:29:46 -07:00
Eugene Yurtsev	4947ac2965	Add langserve version (#11195 )	2023-09-28 16:24:00 -04:00
Joseph McElroy	822fc590d9	[ElasticsearchStore] Improve migration text to ElasticsearchStore (#11158 ) We noticed that as we have been moving developers to the new `ElasticsearchStore` implementation, we want to keep the ElasticVectorSearch class still available as developers transition slowly to the new store. To speed up this process, I updated the blurb giving them a better recommendation of why they should use ElasticsearchStore.	2023-09-28 12:40:18 -07:00
Naveen Tatikonda	9b0029b9c2	[OpenSearch] Add Self Query Retriever Support to OpenSearch (#11184 ) ### Description Add Self Query Retriever Support to OpenSearch ### Maintainers @rlancemartin, @eyurtsev, @navneet1v ### Twitter Handle @OpenSearchProj Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-09-28 12:36:52 -07:00
Arthur Telders	0da484be2c	Add source metadata to OutlookMessageLoader (#11183 ) Description: Add "source" metadata to OutlookMessageLoader This pull request adds the "source" metadata to the OutlookMessageLoader class in the load method. The "source" metadata is required when indexing with RecordManager in order to sync the index documents with a source. Issue: None Dependencies: None Twitter handle: @ATelders Co-authored-by: Arthur Telders <arthur.telders@roquette.com>	2023-09-28 14:58:12 -04:00
Bagatur	3508e582f1	add anthropic scheduled tests and unit tests (#11188 )	2023-09-28 11:47:29 -07:00
Eugene Yurtsev	fd96878c4b	Fix anthropic secret key when passed in via init (#11185 ) Fixes anthropic secret key when passed via init https://github.com/langchain-ai/langchain/issues/11182	2023-09-28 14:21:41 -04:00
Bagatur	f201d80d40	temporarily skip embedding empty string test (#11187 )	2023-09-28 11:20:00 -07:00
Eugene Yurtsev	b3cf9c8759	LangServe: Update langchain requirement for publishing (#11186 ) Update langchain requirement for publishing	2023-09-28 14:11:58 -04:00
mani2348	89ddc7cbb6	Update Bedrock service name to "bedrock-runtime" and model identifiers (#11161 ) - Description: Bedrock updated boto service name to "bedrock-runtime" for the InvokeModel and InvokeModelWithResponseStream APIs. This update also includes new model identifiers for Titan text, embedding and Anthropic. Co-authored-by: Mani Kumar Adari <maniadar@amazon.com>	2023-09-28 09:42:56 -07:00
Eugene Yurtsev	de3e25683e	Expose lc_id as a classmethod (#11176 ) * Expose LC id as a class method * User should not need to know that the last part of the id is the class name	2023-09-28 17:25:27 +01:00
Nuno Campos	5ca461160b	Lint	2023-09-28 17:12:07 +01:00
Nuno Campos	151f27d502	Lint	2023-09-28 16:42:58 +01:00
Eugene Yurtsev	4ba9c16f74	mypy	2023-09-28 11:27:20 -04:00
Eugene Yurtsev	44489e7029	LangServe: Clean up init files (#11174 ) Clean up init files	2023-09-28 11:10:42 -04:00
Akio Nishimura	785b9d47b7	Fix stop key of TextGen. (#11109 ) The key of stopping strings used in text-generation-webui api is [`stopping_strings`](https://github.com/oobabooga/text-generation-webui/blob/main/api-examples/api-example.py#L51), not `stop`.	2023-09-28 11:05:24 -04:00
Eugene Yurtsev	d1d7d0cb27	x	2023-09-28 10:56:50 -04:00
Eugene Yurtsev	c86b2b5e42	x	2023-09-28 10:53:30 -04:00
Eugene Yurtsev	fe4f3b8fdf	x	2023-09-28 10:51:28 -04:00
Eugene Yurtsev	a5b15e9d0f	x	2023-09-28 10:51:17 -04:00
Nuno Campos	5c1f462bb9	Implement better reprs for Runnables	2023-09-28 15:24:51 +01:00
Nan LI	53a9d6115e	Xata chat memory FIX (#11145 ) - Description: Changed data type from `text` to `json` in xata for improved performance. Also corrected the `additionalKwargs` key in the `messages()` function to `additional_kwargs` to adhere to `BaseMessage` requirements. - Issue: The chat history `messages()` function returned `{}` for `additional_kwargs` because the key was misnamed `additionalKwargs`. - Dependencies: N/A - Tag maintainer: N/A - Twitter handle: N/A My PR is passing linting and testing before submitting.	2023-09-28 09:52:15 -04:00
William FH	8ae9b71e41	Async support for OpenAIFunctionsAgentOutputParser (#11140 )	2023-09-28 09:42:59 -04:00
Bagatur	ce08f436db	Expose loads and dumps in load namespace	2023-09-28 09:34:48 -04:00
Nuno Campos	cfa2203c62	Add input/output schemas to runnables (#11063 ) This adds `input_schema` and `output_schema` properties to all runnables, which are Pydantic models for the input and output types respectively. These are inferred from the structure of the Runnable as much as possible, the only manual typing needed is - optionally add type hints to lambdas (which get translated to input/output schemas) - optionally add type hint to RunnablePassthrough These schemas can then be used to create JSON Schema descriptions of input and output types, see the tests - [x] Ensure no InputType and OutputType in our classes use abstract base classes (replace with union of subclasses) - [x] Implement in BaseChain and LLMChain - [x] Implement in RunnableBranch - [x] Implement in RunnableBinding, RunnableMap, RunnablePassthrough, RunnableEach, RunnableRouter - [x] Implement in LLM, Prompt, Chat Model, Output Parser, Retriever - [x] Implement in RunnableLambda from function signature - [x] Implement in Tool	2023-09-28 11:05:15 +01:00
Eugene Yurtsev	b05bb9e136	LangServe (#11046 ) Adds LangServe package * Integrate Runnables with FastAPI, creating a server and a RemoteRunnable client * Support multiple runnables for a given server * Support sync/async/batch/abatch/stream/astream/astream_log on the client side (using async implementations on server) * Adds validation using annotations (relying on pydantic under the hood) -- this still has some rough edges -- e.g., OpenAPI docs do NOT generate correctly at the moment * Uses pydantic v1 namespace Known issues: type translation code doesn't handle a lot of types (e.g., TypedDicts) --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2023-09-28 10:52:44 +01:00
Nuno Campos	77ce9ed6f1	Support using async callback handlers with sync callback manager (#10945 ) The current behaviour just calls the handler without awaiting the coroutine, which results in exceptions/warnings, and obviously doesn't actually execute whatever the callback handler does	2023-09-28 10:39:01 +01:00
Bagatur	48a04aed75	bump 304 (#11147 )	2023-09-27 19:24:09 -07:00
Jonathan Evans	23065f54c0	Added prompt wrapping for Claude with Bedrock (#11090 ) - Description: Prompt wrapping requirements have been implemented on the service side of AWS Bedrock for the Anthropic Claude models to provide parity between Anthropic's offering and Bedrock's offering. This overnight change broke most existing implementations of Claude, Bedrock and Langchain. This PR just steals the Anthropic LLM implementation to enforce alias/role wrapping and implements it in the existing mechanism for building the request body. This has also been tested to fix the chat_model implementation as well. Happy to answer any further questions or make changes where necessary to get things patched and up to PyPI ASAP, TY. - Issue: No issue opened at the moment, though will update when these roll in. - Dependencies: None --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-27 19:20:07 -07:00
xiaoyu	b87cc8b31e	add 3 property types in metadata for notiondb loader (#8509 ) ### Description: NotionDB supports a number of common property types. I have found three common types that are not included in notiondb loader. When programs load them with notiondb, some metadata information is not passed to langchain. Therefore, I added three common types: - date - created_time - last_edit_time. ### Issue: no ### Dependencies: No dependencies added :) ### Tag maintainer: @rlancemartin, @eyurtsev ### Twitter handle: @BJTUTC	2023-09-27 17:38:05 -07:00
Harrison Chase	258d67b0ac	Revert "improve the performance of base.py" (#11143 ) Reverts langchain-ai/langchain#8610 this is actually an oversight - this merges all dfs into one df. we DO NOT want to do this - the idea is we work and manipulate multiple dfs	2023-09-27 17:37:29 -07:00
Mohamad Zamini	9306394078	improve the performance of base.py (#8610 ) This removes the use of the intermediate df list and directly concatenates the dataframes if path is a list of strings. The pd.concat function combines the dataframes efficiently, making it faster and more memory-efficient compared to appending dataframes to a list. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-27 17:36:03 -07:00
Mincoolee	05b75f3f13	feat: add support for arxiv identifier in ArxivAPIWrapper() (#9318 ) - Description: this PR adds the support for arxiv identifier of the ArxivAPIWrapper. I modified the `run()` and `load()` functions in `arxiv.py`, using regex to recognize if the query is in the form of arxiv identifier (see [https://info.arxiv.org/help/find/index.html](https://info.arxiv.org/help/find/index.html)). If so, it will directly search the paper corresponding to the arxiv identifier. I also modified and added tests in `test_arxiv.py`. - Issue: #9047 - Dependencies: N/A - Tag maintainer: N/A --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-27 17:35:16 -07:00
William FH	d3c2ca5656	Enhanced pairwise error (#11131 )	2023-09-27 16:04:43 -07:00
Taqi Jaffri	b7e9db5e73	Stop sequences in fireworks, plus notebook updates (#11136 ) The new Fireworks and FireworksChat implementations are awesome! Added in this PR https://github.com/langchain-ai/langchain/pull/11117 thank you @ZixinYang However, I think stop words were not plumbed correctly. I've made some simple changes to do that, and also updated the notebook to be a bit clearer with what's needed to use both new models. --------- Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>	2023-09-27 16:01:05 -07:00
William FH	33da8bd711	Add Exact match and Regex Match Evaluators (#11132 )	2023-09-27 14:18:07 -07:00
Harrison Chase	e355606b11	add more import checks (#11033 )	2023-09-27 11:17:12 -07:00
Dan Bolser	efb7c459a2	Update base.py (#10843 ) Fixing a typo in the example code in the docstring... You have to start somewhere though right? Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-09-27 11:15:58 -07:00
tanujtiwari-at	a79f595543	Support extra tools argument for pandas agent toolkit (#11040 ) Description We support adding new tools in some toolkits already like the [SQLAgent toolkit](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/agent_toolkits/sql/base.py#L27). Related [SO](https://stackoverflow.com/questions/76583163/are-langchain-toolkits-able-to-be-modified-can-we-add-tools-to-a-pandas-datafra) thread This replicates the same functionality here, so users can add custom bespoke tools.	2023-09-27 10:57:04 -07:00
Bagatur	410ac8129d	bump 303 (#11120 )	2023-09-27 08:30:33 -07:00
Bagatur	8e4dbae428	Add fireworks chat model (#11117 )	2023-09-27 08:22:12 -07:00
Bagatur	657581dbdf	Fix ChatFireworks typing	2023-09-27 08:15:40 -07:00
Bagatur	12aad659dd	add ChatFireworks to chat_models	2023-09-27 08:11:26 -07:00
Bagatur	872ebdaf90	remove FireworksChat from llms	2023-09-27 08:10:41 -07:00
Bagatur	9451240941	Fix fireworks chat linting issues	2023-09-27 08:09:33 -07:00
Tomáš Dvořák	865a21938c	speed up enforce_stop_tokens helper function (#10984 ) Description: As long as `enforce_stop_tokens` returns a first occurrence, we can speed up the execution by setting the optional `maxsplit` parameter to 1. Tag maintainer: @agola11 @hwchase17 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-27 05:29:29 -07:00
Austin Walker	bb41252dab	fix: bump min_unstructured_version for UnstructuredAPIFileLoader (#11025 ) Description: New metadata fields were added to `unstructured==0.10.15`, and our hosted api has been updated to reflect this. When users call `partition_via_api` with an older version of the library, they'll hit a parsing error related to the new fields.	2023-09-27 05:28:06 -07:00
William FH	75b3893daf	Fix runnable branch callbacks (#11091 ) We aren't calling on_chain_end here unless we use the default option	2023-09-27 11:38:56 +01:00
Bagatur	6c5251feb0	poetry	2023-09-26 20:12:49 -07:00
Bagatur	5310184f96	poetry	2023-09-26 20:12:29 -07:00
Cynthia Yang	6dd44ff1c0	Refactor Fireworks and add ChatFireworks (#3 ) (#10597 ) Description * Refactor Fireworks within Langchain LLMs. * Remove FireworksChat within Langchain LLMs. * Add ChatFireworks (which uses chat completion api) to Langchain chat models. * Users have to install `fireworks-ai` and register an api key to use the api. Issue - Not applicable Dependencies - None Tag maintainer - @rlancemartin @baskaryan	2023-09-26 20:11:55 -07:00
Bagatur	5514ebe859	Don't type chains in output_parsers (#11092 ) Can't use TYPE_CHECKING style imports for pydantic params because it will try to instantiate the typed object by default.	2023-09-26 17:49:35 -07:00
CG80499	64385c4eae	Make pairwise comparison chain more like LLM as a judge (#11013 ) - Description: Adds LLM as a judge as an eval chain - Tag maintainer: @hwchase17 --------- Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>	2023-09-26 13:19:04 -07:00
Joseph McElroy	175ef0a55d	[ElasticsearchStore] Enable custom Bulk Args (#11065 ) This enables bulk args like `chunk_size` to be passed down from the ingest methods (from_text, from_documents) to be passed down to the bulk API. This helps alleviate issues where bulk importing a large amount of documents into Elasticsearch was resulting in a timeout. Contribution Shoutout - @elastic - [x] Updated Integration tests --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-26 12:53:50 -07:00
Eugene Yurtsev	d19fd0cfae	LogEntry/LogStream use str instead of uuid for id (#11080 ) Cast the UUID to a string	2023-09-26 20:38:51 +01:00
Bagatur	d85339b9f2	extract sublinks exclude by abs path (#11079 )	2023-09-26 12:26:27 -07:00
Bagatur	7ee8b2d1bf	exclude dirs in async recursive loading (#11077 )	2023-09-26 09:59:04 -07:00
Bagatur	12fb393a43	bump 302 (#11070 )	2023-09-26 08:13:01 -07:00
Bagatur	097ecef06b	refactor web base loader (#11057 )	2023-09-26 08:11:31 -07:00
Bagatur	487611521d	fix root import (#11072 )	2023-09-26 08:11:16 -07:00
Bagatur	a2f7246f0e	skip excluded sublinks before recursion (#11036 )	2023-09-26 02:24:54 -07:00
William FH	4aec587979	Update LangSmith Walkthrough (#11043 )	2023-09-25 22:32:56 -07:00
Harrison Chase	bea78b3271	make warnings more modular (#11047 )	2023-09-25 20:46:43 -07:00
Harrison Chase	c87e9fb2ce	conditional imports (#11017 )	2023-09-25 15:46:32 -07:00
Tomaz Bratanic	0625ab7a9e	Filtering graph schema for Cypher generation (#10577 ) Sometimes you don't want the LLM to be aware of the whole graph schema, and want it to ignore parts of the graph when it is constructing Cypher statements.	2023-09-25 14:14:15 -07:00
Palau	89ef440c14	Kay retriever (#10657 ) - Description: Adding retrievers for [kay.ai](https://kay.ai) and SEC filings powered by Kay and Cybersyn. Kay provides context as a service: it's an API built for RAG. - Issue: N/A - Dependencies: Just added a dep to the [kay](https://pypi.org/project/kay/) package - Tag maintainer: @baskaryan @hwchase17 Discussed in slack - Twitter handle: [@vishalrohra_](https://twitter.com/vishalrohra_) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-25 13:10:13 -07:00
Harrison Chase	5f13668fa0	Harrison/move vectorstore base (#11030 )	2023-09-25 12:44:23 -07:00
Eugene Yurtsev	af5390d416	Add a batch size for cleanup (#10948 ) Add pagination to indexing cleanup to deal with large numbers of documents that need to be deleted.	2023-09-25 14:52:32 -04:00
Eugene Yurtsev	09486ed188	Update Serializable to use classmethods (#10956 )	2023-09-25 18:39:30 +01:00
Taqi Jaffri	b7290f01d8	Batching for hf_pipeline (#10795 ) The huggingface pipeline in langchain (used for locally hosted models) does not support batching. If you send in a batch of prompts, it just processes them serially using the base implementation of _generate: https://github.com/docugami/langchain/blob/master/libs/langchain/langchain/llms/base.py#L1004C2-L1004C29 This PR adds support for batching in this pipeline, so that GPUs can be fully saturated. I updated the accompanying notebook to show GPU batch inference. --------- Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>	2023-09-25 18:23:11 +01:00
Bagatur	aa6e6db8c7	bump 301 (#11018 )	2023-09-25 08:50:47 -07:00
Nuno Campos	956ee981c0	Fix issue where requests wrapper passes auth kwarg twice (#11010 ) Closes #8842	2023-09-25 15:45:04 +01:00
Scotty	88a02076af	fix ChatMessageChunk concat error (#10174 ) - Description: fix `ChatMessageChunk` concat error - Issue: #10173 - Dependencies: None - Tag maintainer: @baskaryan, @eyurtsev, @rlancemartin - Twitter handle: None --------- Co-authored-by: wangshuai.scotty <wangshuai.scotty@bytedance.com> Co-authored-by: Nuno Campos <nuno@boringbits.io>	2023-09-25 11:17:11 +01:00
Naveen Tatikonda	b0f21e2b50	[OpenSearch] Pass ids using from_texts and indexname in add_texts and search (#10969 ) ### Description This PR makes the following changes to OpenSearch: 1. Pass optional ids with `from_texts` 2. Pass an optional index name with `add_texts` and `search` instead of using the same index name that was used during `from_texts` ### Issue https://github.com/langchain-ai/langchain/issues/10967 ### Maintainers @rlancemartin, @eyurtsev, @navneet1v Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-09-23 16:12:51 -07:00
deanchanter	f945426874	Resolve GHI 10674 (#10977 )	2023-09-23 16:11:52 -07:00
Anar	ff732e10f8	LLMRails Embedding (#10959 ) LLMRails Embedding Integration This PR provides integration with LLMRails. Implemented here are: langchain/embeddings/llm_rails.py docs/extras/integrations/text_embedding/llm_rails.ipynb Hi @hwchase17 after adding our vectorstore integration to langchain with confirmation of you and @baskaryan, now we want to add our embedding integration --------- Co-authored-by: Anar Aliyev <aaliyev@mgmt.cloudnet.services> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-09-23 16:11:02 -07:00
Michael Feil	94e31647bd	Support for Gradient.ai embedding (#10968 ) Adds support for gradient.ai's embedding model. This will remain a Draft, as the code will likely be refactored with the `pip install gradientai` python sdk.	2023-09-23 16:10:23 -07:00
C.J. Jameson	05d5fcfdf8	fix make-coverage local invocation #10941 (#10974 ) Fix the invocation of `make coverage` in `libs/langchain` Fixes #10941	2023-09-23 16:03:53 -07:00
Bagatur	040d436b3f	Add vertex scheduled test (#10958 )	2023-09-23 15:51:59 -07:00
Piyush Jain	8602a32b7e	Fixes error with providers that don't have model_id (#10966 ) ## Description Fixes error with using the chain for providers that don't have `model_id` field. ![image](https://github.com/langchain-ai/langchain/assets/289369/a86074cf-6c99-4390-a135-b3af7a4f0827)	2023-09-23 15:34:28 -07:00
Nuno Campos	7b13292e35	Remove python eval from vector sql db chain (#10937 )	2023-09-23 08:51:03 -07:00
Richard Wang	b809c243af	Fix bug in `index` api (#10614 ) - Description: a fix for `index`. - Issue: Not applicable. - Dependencies: None - Tag maintainer: - Twitter handle: richarddwang # Problem Replication code ```python from pprint import pprint from langchain.embeddings import OpenAIEmbeddings from langchain.indexes import SQLRecordManager, index from langchain.schema import Document from langchain.vectorstores import Qdrant from langchain_setup.qdrant import pprint_qdrant_documents, create_inmemory_empty_qdrant # Documents metadata1 = {"source": "fullhell.alchemist"} doc1_1 = Document(page_content="1-1 I have a dog~", metadata=metadata1) doc1_2 = Document(page_content="1-2 I have a daugter~", metadata=metadata1) doc1_3 = Document(page_content="1-3 Ahh! 
O..Oniichan", metadata=metadata1) doc2 = Document(page_content="2 Lancer died again.", metadata={"source": "fate.docx"}) # Create empty vectorstore collection_name = "secret_of_D_disk" vectorstore: Qdrant = create_inmemory_empty_qdrant() # Create record Manager import tempfile from pathlib import Path record_manager = SQLRecordManager( namespace="qdrant/{collection_name}", db_url=f"sqlite:///{Path(tempfile.gettempdir())/collection_name}.sql", ) record_manager.create_schema() # required sync_result = index( [doc1_1, doc1_2, doc1_2, doc2], record_manager, vectorstore, cleanup="full", source_id_key="source", ) print(sync_result, end="\n\n") pprint_qdrant_documents(vectorstore) ``` <details> <summary>Code of helper functions `pprint_qdrant_documents` and `create_inmemory_empty_qdrant`</summary> ```python def create_inmemory_empty_qdrant(**from_texts_kwargs): # Qdrant requires vector size, which can only be known after applying embedder vectorstore = Qdrant.from_texts(["dummy"], location=":memory:", embedding=OpenAIEmbeddings(), **from_texts_kwargs) dummy_document_id = vectorstore.client.scroll(vectorstore.collection_name)[0][0].id vectorstore.delete([dummy_document_id]) return vectorstore def pprint_qdrant_documents(vectorstore, limit: int = 100, **scroll_kwargs): document_ids, documents = [], [] for record in vectorstore.client.scroll( vectorstore.collection_name, limit=100, **scroll_kwargs )[0]: document_ids.append(record.id) documents.append( Document( page_content=record.payload["page_content"], metadata=record.payload["metadata"] or {}, ) ) pprint_documents(documents, document_ids=document_ids) def pprint_document(document: Document = None, document_id=None, return_string=False): displayed_text = "" if document_id: displayed_text += f"Document {document_id}:\n\n" displayed_text += f"{document.page_content}\n\n" metadata_text = pformat(document.metadata, indent=1) if "\n" in metadata_text: displayed_text += f"Metadata:\n{metadata_text}" else: displayed_text += 
f"Metadata:{metadata_text}" if return_string: return displayed_text else: print(displayed_text) def pprint_documents(documents, document_ids=None): if not document_ids: document_ids = [i + 1 for i in range(len(documents))] displayed_texts = [] for document_id, document in zip(document_ids, documents): displayed_text = pprint_document( document_id=document_id, document=document, return_string=True ) displayed_texts.append(displayed_text) print(f"\n{'-' * 100}\n".join(displayed_texts)) ``` </details> You will get ``` {'num_added': 3, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0} Document 1b19816e-b802-53c0-ad60-5ff9d9b9b911: 1-2 I have a daugter~ Metadata:{'source': 'fullhell.alchemist'} ---------------------------------------------------------------------------------------------------- Document 3362f9bc-991a-5dd5-b465-c564786ce19c: 1-1 I have a dog~ Metadata:{'source': 'fullhell.alchemist'} ---------------------------------------------------------------------------------------------------- Document a4d50169-2fda-5339-a196-249b5f54a0de: 1-2 I have a daugter~ Metadata:{'source': 'fullhell.alchemist'} ``` This is not correct. The vectorstore should now contain doc1_1, doc1_2, and doc2, not doc1_1, doc1_2, and doc1_2. # Reason In `index`, the original code is ```python uids = [] docs_to_index = [] for doc, hashed_doc, doc_exists in zip(doc_batch, hashed_docs, exists_batch): if doc_exists: # Must be updated to refresh timestamp. record_manager.update([hashed_doc.uid], time_at_least=index_start_dt) num_skipped += 1 continue uids.append(hashed_doc.uid) docs_to_index.append(doc) ``` In the aforementioned example, `len(doc_batch) == 4`, but `len(hashed_docs) == len(exists_batch) == 3`. This is because deduplicating the input documents [doc1_1, doc1_2, doc1_2, doc2] yields [doc1_1, doc1_2, doc2]. So `index` inserts doc1_1, doc1_2, doc1_2 with the uids of doc1_1, doc1_2, doc2. 
--------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-09-22 22:41:07 -04:00
Joshua Sundance Bailey	d67b120a41	Make anthropic_api_key a secret str (#10724 ) This PR makes `ChatAnthropic.anthropic_api_key` a `pydantic.SecretStr` to avoid inadvertently exposing API keys when the `ChatAnthropic` object is represented as a str.	2023-09-22 22:06:20 -04:00
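
Note on commit b809c243af (Fix bug in `index` api): the length mismatch described in that message can be reproduced without LangChain. The following is a minimal sketch, assuming plain strings stand in for documents; `make_uid` and `dedupe` are hypothetical helpers written for this illustration and are not part of the LangChain API. It shows how `zip` silently truncates to the shortest input and pairs a duplicate document with another document's uid, and how iterating over the deduplicated sequence alone keeps documents and uids aligned. Whether this matches the exact change made in the commit is not confirmed by the log.

```python
# Illustrative stand-ins only: strings play the role of documents,
# and make_uid / dedupe are hypothetical helpers, not LangChain functions.
def make_uid(doc: str) -> str:
    """Derive a deterministic id from the document content."""
    return f"uid::{doc}"


def dedupe(docs):
    """Drop repeated documents while keeping first-seen order."""
    seen, unique = set(), []
    for doc in docs:
        if doc not in seen:
            seen.add(doc)
            unique.append(doc)
    return unique


doc_batch = ["doc1_1", "doc1_2", "doc1_2", "doc2"]  # 4 inputs, one duplicate
hashed_docs = dedupe(doc_batch)                      # 3 unique documents
uids = [make_uid(doc) for doc in hashed_docs]

# Mismatched lengths: zip stops at the shorter sequence, so the duplicate
# "doc1_2" is paired with the uid of "doc2", and "doc2" is never stored.
buggy = list(zip(doc_batch, uids))

# Aligned lengths: pair each unique document with its own uid.
aligned = list(zip(hashed_docs, uids))

print(buggy)
print(aligned)
```

Pairing only sequences derived from the same deduplicated list is one way to rule out this class of error; on Python 3.10+, `zip(..., strict=True)` would instead raise on the length mismatch.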


1286 Commits