langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-10 01:10:59 +00:00

Author	SHA1	Message	Date
Hrvoje Milković	b7344e3347	community[minor]: Infobip tool integration (#16805 ) Description: Adding Tool that wraps Infobip API for sending sms or emails and email validation. Dependencies: None, Twitter handle: @hmilkovic Implementation: ``` libs/community/langchain_community/utilities/infobip.py ``` Integration tests: ``` libs/community/tests/integration_tests/utilities/test_infobip.py ``` Example notebook: ``` docs/docs/integrations/tools/infobip.ipynb ``` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 19:01:27 +00:00
Luka Krapic	727a2ea9f1	community[patch]: history size support for DynamoDBChatMessageHistory (#16794 ) Description: PR adds support for limiting number of messages preserved in a session history for DynamoDBChatMessageHistory --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:56:21 +00:00
Dt22	6dbf1a2de0	community[patch]: fix redis input type for index_schema field (#16874 ) ### Subject: Fix Type Misdeclaration for index_schema in redis/base.py I noticed a type misdeclaration for the index_schema column in the redis/base.py file. When following the instructions outlined in [Redis Custom Metadata Indexing](https://python.langchain.com/docs/integrations/vectorstores/redis) to create our own index_schema, it leads to a Pylance type error. <br/> The error message indicates that Dict[str, list[Dict[str, str]]] is incompatible with the type Optional[Union[Dict[str, str], str, os.PathLike]]. ``` index_schema = { "tag": [{"name": "credit_score"}], "text": [{"name": "user"}, {"name": "job"}], "numeric": [{"name": "age"}], } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users_modified", index_schema=index_schema, ) ``` Therefore, I have created this pull request to rectify the type declaration problem. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:55:54 +00:00
morgana	074ad5095f	community[patch]: mmr search for Rockset vectorstore integration (#16908 ) - Description: Adding support for mmr search in the Rockset vectorstore integration. - Issue: N/A - Dependencies: N/A - Twitter handle: `@_morgan_adams_` --------- Co-authored-by: Rockset API Bot <admin@rockset.io> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 18:45:22 +00:00
shahrin014	f51e6a35ba	community[patch]: OllamaEmbeddings - Pass headers to post request (#16880 ) ## Feature - Set additional headers in constructor - Headers will be sent in post request This feature is useful if deploying Ollama on a cloud service such as hugging face, which requires authentication tokens to be passed in the request header. ## Tests - Test if header is passed - Test if header is not passed Similar to https://github.com/langchain-ai/langchain/pull/15881 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:44:52 +00:00
Jan Chorowski	b8b42ccbc5	community[minor]: Pathway vectorstore(#14859 ) - Description: Integration with pathway.com data processing pipeline acting as an always updated vectorstore - Issue: not applicable - Dependencies: optional dependency on [`pathway`](https://pypi.org/project/pathway/) - Twitter handle: pathway_com The PR provides and integration with `pathway` to provide an easy to use always updated vector store: ```python import pathway as pw from langchain.embeddings.openai import OpenAIEmbeddings from langchain.text_splitter import CharacterTextSplitter from langchain.vectorstores import PathwayVectorClient, PathwayVectorServer data_sources = [] data_sources.append( pw.io.gdrive.read(object_id="17H4YpBOAKQzEJ93xmC2z170l0bP2npMy", service_user_credentials_file="credentials.json", with_metadata=True)) text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) embeddings_model = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"]) vector_server = PathwayVectorServer( *data_sources, embedder=embeddings_model, splitter=text_splitter, ) vector_server.run_server(host="127.0.0.1", port="8765", threaded=True, with_cache=False) client = PathwayVectorClient( host="127.0.0.1", port="8765", ) query = "What is Pathway?" docs = client.similarity_search(query) ``` The `PathwayVectorServer` builds a data processing pipeline which continusly scans documents in a given source connector (google drive, s3, ...) and builds a vector store. The `PathwayVectorClient` implements LangChain's `VectorStore` interface and connects to the server to retrieve documents. --------- Co-authored-by: Mateusz Lewandowski <lewymati@users.noreply.github.com> Co-authored-by: mlewandowski <mlewandowski@MacBook-Pro-mlewandowski.local> Co-authored-by: Berke <berkecanrizai1@gmail.com> Co-authored-by: Adrian Kosowski <adrian@pathway.com> Co-authored-by: mlewandowski <mlewandowski@macbook-pro-mlewandowski.home> Co-authored-by: berkecanrizai <63911408+berkecanrizai@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: mlewandowski <mlewandowski@MBPmlewandowski.ht.home> Co-authored-by: Szymon Dudycz <szymond@pathway.com> Co-authored-by: Szymon Dudycz <szymon.dudycz@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 10:50:39 -07:00
Arturs Konfino	2319212d54	community[patch]: avoid executing `toolkit.get_context()` when not necessary (#19762 ) If `prompt` is passed into `create_sql_agent()`, then `toolkit.get_context()` shouldn't be executed against the database unless relevant prompt variables (`table_info` or `table_names`) are present .	2024-03-29 16:42:21 +00:00
高璟琦	ec7a59c96c	community[minor]: Add solar embedding (#19761 ) Solar is a large language model developed by [Upstage](https://upstage.ai/). It's a powerful and purpose-trained LLM. You can visit the embedding service provided by Solar within this pr. You may get SOLAR_API_KEY from https://console.upstage.ai/services/embedding You can refer to more details about accepted llm integration at https://python.langchain.com/docs/integrations/llms/solar.	2024-03-29 09:36:05 -07:00
Tomaz Bratanic	dec00d3050	community[patch]: Add the ability to pass maps to neo4j retrieval query (#19758 ) Makes it easier to flatten complex values to text, so you don't have to use a lot of Cypher to do it.	2024-03-29 08:33:48 -07:00
Robby	f7e8a382cc	community[minor]: add hugging face text-to-speech inference API (#18880 ) Description: I implemented a tool to use Hugging Face text-to-speech inference API. Issue: n/a Dependencies: n/a Twitter handle: No Twitter, but do have [LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol. --------- Co-authored-by: Robby <h0rv@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-29 15:02:29 +00:00
DasDingoCodes	73eb3f8fd9	community[minor]: Implement DirectoryLoader lazy_load function (#19537 ) Thank you for contributing to LangChain! - [x] PR title: "community: Implement DirectoryLoader lazy_load function" - [x] Description: The `lazy_load` function of the `DirectoryLoader` yields each document separately. If the given `loader_cls` of the `DirectoryLoader` also implemented `lazy_load`, it will be used to yield subdocuments of the file. - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access: `libs/community/tests/unit_tests/document_loaders/test_directory_loader.py` 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory: `docs/docs/integrations/document_loaders/directory.ipynb` - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-29 14:46:52 +00:00
Jialei	f7c903e24a	community[minor]: add support for Moonshot llm and chat model (#17100 )	2024-03-29 08:54:23 +00:00
Ethan Yang	7164015135	community[minor]: Add Openvino embedding support (#19632 ) This PR is used to support both HF and BGE embeddings with openvino --------- Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>	2024-03-29 01:34:51 -07:00
T Cramer	540ebf35a9	community[patch]: Add explicit error message to Bedrock error output. (#17328 ) - Description: Propagate Bedrock errors into Langchain explicitly. Use-case: unset region error is hidden behind 'Could not load credentials...' message - Issue: [17654](https://github.com/langchain-ai/langchain/issues/17654) - Dependencies: None --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 03:07:33 +00:00
Marcus Virginia	69bb96c80f	community[patch]: surrealdb handle for empty metadata and allow collection names with complex characters (#17374 ) - Description: Handle for empty metadata and allow collection names with complex characters - Issue: #17057 - Dependencies: `surrealdb` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 01:04:27 +00:00
kYLe	124ab79c23	community[minor]: Add Anyscale embedding support (#17605 ) Description: Add embedding model support for Anyscale Endpoint Dependencies: openai --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:53:53 +00:00
Lance Martin	12843f292f	community[patch]: llama cpp embeddings reset default n_batch (#17594 ) When testing Nomic embeddings -- ``` from langchain_community.embeddings import LlamaCppEmbeddings embd_model_path = "/Users/rlm/Desktop/Code/llama.cpp/models/nomic-embd/nomic-embed-text-v1.Q4_K_S.gguf" embd_lc = LlamaCppEmbeddings(model_path=embd_model_path) embedding_lc = embd_lc.embed_query(query) ``` We were seeing this error for strings > a certain size -- ``` File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/llama.py:827, in Llama.embed(self, input, normalize, truncate, return_count) 824 s_sizes = [] 826 # add to batch --> 827 self._batch.add_sequence(tokens, len(s_sizes), False) 828 t_batch += n_tokens 829 s_sizes.append(n_tokens) File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/_internals.py:542, in _LlamaBatch.add_sequence(self, batch, seq_id, logits_all) 540 self.batch.token[j] = batch[i] 541 self.batch.pos[j] = i --> 542 self.batch.seq_id[j][0] = seq_id 543 self.batch.n_seq_id[j] = 1 544 self.batch.logits[j] = logits_all ValueError: NULL pointer access ``` The default `n_batch` of llama-cpp-python's Llama is `512` but we were explicitly setting it to `8`. These need to be set to equal for embedding models. * The embedding.cpp example has an assertion to make sure these are always equal. * Apparently this is not being done properly in llama-cpp-python. With `n_batch` set to 8, if more than 8 tokens are passed the batch runs out of space and it crashes. This also explains why the CPU compute buffer size was small: raw client with default `n_batch=512` ``` llama_new_context_with_model: CPU input buffer size = 3.51 MiB llama_new_context_with_model: CPU compute buffer size = 21.00 MiB ``` langchain with `n_batch=8` ``` llama_new_context_with_model: CPU input buffer size = 0.04 MiB llama_new_context_with_model: CPU compute buffer size = 0.33 MiB ``` We can work around this by passing `n_batch=512`, but this will not be obvious to some users: ``` embedding = LlamaCppEmbeddings(model_path=embd_model_path, n_batch=512) ``` From discussion w/ @cebtenzzre. Related: https://github.com/abetlen/llama-cpp-python/issues/1189 Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:47:22 +00:00
Zijian Han	8e976545f3	community[patch]: support OpenAI whisper base url (#17695 ) Description: The base URL for OpenAI is retrieved from the environment variable "OPENAI_BASE_URL", whereas for langchain it is obtained from "OPENAI_API_BASE". By adding `base_url = os.environ.get("OPENAI_API_BASE")`, the OpenAI proxy can execute correctly. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:35:27 +00:00
Paulo Nascimento	44a3484503	community[patch]: add NotebookLoader unit test (#17721 ) Thank you for contributing to LangChain! - Description: added unit tests for NotebookLoader. Linked PR: https://github.com/langchain-ai/langchain/pull/17614 - Issue: [#17614](https://github.com/langchain-ai/langchain/pull/17614) - Twitter handle: @paulodoestech - [x] Pass lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified to check that you're passing lint and testing. See contribution guidelines for more information on how to write/run tests, lint, etc: https://python.langchain.com/docs/contributing/ - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: lachiewalker <lachiewalker1@hotmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:27:46 +00:00
Paulo Nascimento	4c3a67122f	community[patch]: add Integration for OpenAI image gen with v1 sdk (#17771 ) Description: Created a Langchain Tool for OpenAI DALLE Image Generation. Issue: [#15901](https://github.com/langchain-ai/langchain/issues/15901) Dependencies: n/a Twitter handle: @paulodoestech - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:23:14 +00:00
Jiaming	3d3cc71287	community[patch]: fix bugs for bilibili Loader (#18036 ) - Description: 1. Fix the BiliBiliLoader that can receive cookie parameters, it requires 3 other parameters to run. The change is backward compatible. 2. Add test; 3. Add example in docs - Issue: [#14213] Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-28 16:39:38 -07:00
Sachin Paryani	25c9f3d1d1	community[patch]: Support Streaming in Azure Machine Learning (#18246 ) - [x] PR title: "community: Support streaming in Azure ML and few naming changes" - [x] PR message: - Description: Added support for streaming for azureml_endpoint. Also, renamed and AzureMLEndpointApiType.realtime to AzureMLEndpointApiType.dedicated. Also, added new classes CustomOpenAIChatContentFormatter and CustomOpenAIContentFormatter and updated the classes LlamaChatContentFormatter and LlamaContentFormatter to now show a deprecated warning message when instantiated. --------- Co-authored-by: Sachin Paryani <saparan@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 23:38:20 +00:00
Victor Adan	afa2d85405	community[patch]: Added missing from_documents method to KNNRetriever. (#18411 ) - Description: Added missing `from_documents` method to `KNNRetriever`, providing the ability to supply metadata to LangChain `Document`s, and to give it parity to the other retrievers, which do have `from_documents`. - Issue: None - Dependencies: None - Twitter handle: None Co-authored-by: Victor Adan <vadan@netroadshow.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-28 23:18:50 +00:00
Smit Parmar	dfc4177b50	community[patch]: mypy ignore fix (#18483 ) Relates to #17048 Description : Applied fix to dynamodb and elasticsearch file. Error was : `Cannot override writeable attribute with read-only property` Suggestion: instead of adding ``` @messages.setter def messages(self, messages: List[BaseMessage]) -> None: raise NotImplementedError("Use add_messages instead") ``` we can change base class property `messages: List[BaseMessage]` to ``` @property def messages(self) -> List[BaseMessage]:... ``` then we don't need to add `@messages.setter` in all child classes.	2024-03-28 15:36:53 -07:00
Luca Dorigo	f19229c564	core[patch]: fix beta, deprecated typing (#18877 ) Description: While not technically incorrect, the TypeVar used for the `@beta` decorator prevented pyright (and thus most vscode users) from correctly seeing the types of functions/classes decorated with `@beta`. This is in part due to a small bug in pyright (https://github.com/microsoft/pyright/issues/7448 ) - however, the `Type` bound in the typevar `C = TypeVar("C", Type, Callable)` is not doing anything - classes are `Callables` by default, so by my understanding binding to `Type` does not actually provide any more safety - the modified annotation still works correctly for both functions, properties, and classes. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 22:33:43 +00:00
wulixuan	b7c8bc8268	community[patch]: fix yuan2 errors in LLMs (#19004 ) 1. fix yuan2 errors while invoke Yuan2. 2. update tests.	2024-03-28 14:37:44 -07:00
高远	688ca48019	community[patch]: Adding validation when vector does not exist (#19698 ) Adding validation when vector does not exist Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>	2024-03-28 13:58:23 -07:00
Chaunte W. Lacewell	4a49fc5a95	community[patch]: Fix bug in vdms (#19728 ) Description: Fix embedding check in vdms Contribution maintainer: [@cwlacewe](https://github.com/cwlacewe)	2024-03-28 12:54:24 -07:00
高璟琦	75173d31db	community[minor]: Add solar model chat model (#18556 ) Add our solar chat models, available model choices: * solar-1-mini-chat * solar-1-mini-translate-enko * solar-1-mini-translate-koen More documents and pricing can be found at https://console.upstage.ai/services/solar. The references to our solar model can be found at * https://arxiv.org/abs/2402.17032 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 12:31:11 -07:00
Davide Menini	f7042321f1	community[patch]: gather token usage info in BedrockChat during generation (#19127 ) This PR allows to calculate token usage for prompts and completion directly in the generation method of BedrockChat. The token usage details are then returned together with the generations, so that other downstream tasks can access them easily. This allows to define a callback for tokens tracking and cost calculation, similarly to what happens with OpenAI (see [OpenAICallbackHandler](https://api.python.langchain.com/en/latest/_modules/langchain_community/callbacks/openai_info.html#OpenAICallbackHandler). I plan on adding a BedrockCallbackHandler later. Right now keeping track of tokens in the callback is already possible, but it requires passing the llm, as done here: https://how.wtf/how-to-count-amazon-bedrock-anthropic-tokens-with-langchain.html. However, I find the approach of this PR cleaner. Thanks for your reviews. FYI @baskaryan, @hwchase17 --------- Co-authored-by: taamedag <Davide.Menini@swisscom.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 18:58:46 +00:00
ligang-super	a662468dde	community[patch]: Fix the error of Baidu Qianfan not passing the stop parameter (#18666 ) - [x] PR title: "community: fix baidu qianfan missing stop parameter" - [x] PR message: - **Description: Baidu Qianfan lost the stop parameter when requesting service due to extracting it from kwargs. This bug can cause the agent to receive incorrect results --------- Co-authored-by: ligang33 <ligang33@baidu.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 18:21:49 +00:00
kaijietti	9c4b6dc979	community[patch]: fix bug in cohere that `async for` a coroutine in ChatCohere (#19381 ) Without `await`, the `stream` returned from the `async_client` is actually a coroutine, which could not be used in `async for`.	2024-03-27 21:34:46 -07:00
Christian Galo	1adaa3c662	community[minor]: Update Azure Cognitive Services to Azure AI Services (#19488 ) This is a follow up to #18371. These are the changes: - New Azure AI Services toolkit and tools to replace those of Azure Cognitive Services. - Updated documentation for Microsoft platform. - The image analysis tool has been rewritten to use the new package `azure-ai-vision-imageanalysis`, doing a proper replacement of `azure-ai-vision`. These changes: - Update outdated naming from "Azure Cognitive Services" to "Azure AI Services". - Update documentation to use non-deprecated methods to create and use agents. - Removes need to depend on yanked python package (`azure-ai-vision`) There is one new dependency that is needed as a replacement to `azure-ai-vision`: - `azure-ai-vision-imageanalysis`. This is optional and declared within a function. There is a new `azure_ai_services.ipynb` notebook showing usage; Changes have been linted and formatted. I am leaving the actions of adding deprecation notices and future removal of Azure Cognitive Services up to the LangChain team, as I am not sure what the current practice around this is. --- If this PR makes it, my handle is @galo@mastodon.social --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-03-28 03:19:02 +00:00
Shengsheng Huang	ac1dd8ad94	community[minor]: migrate `bigdl-llm` to `ipex-llm` (#19518 ) - Description: `bigdl-llm` library has been renamed to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm). This PR migrates the `bigdl-llm` integration to `ipex-llm` . - Issue: N/A. The original PR of `bigdl-llm` is https://github.com/langchain-ai/langchain/pull/17953 - Dependencies: `ipex-llm` library - Contribution maintainer: @shane-huang Updated doc: docs/docs/integrations/llms/ipex_llm.ipynb Updated test: libs/community/tests/integration_tests/llms/test_ipex_llm.py	2024-03-27 20:12:59 -07:00
Chaunte W. Lacewell	a31f692f4e	community[minor]: Add VDMS vectorstore (#19551 ) - Description: Add support for Intel Lab's [Visual Data Management System (VDMS)](https://github.com/IntelLabs/vdms) as a vector store - Dependencies: `vdms` library which requires protobuf = "4.24.2". There is a conflict with dashvector in `langchain` package but conflict is resolved in `community`. - Contribution maintainer: [@cwlacewe](https://github.com/cwlacewe) - Added tests: libs/community/tests/integration_tests/vectorstores/test_vdms.py - Added docs: docs/docs/integrations/vectorstores/vdms.ipynb - Added cookbook: cookbook/multi_modal_RAG_vdms.ipynb --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 03:12:11 +00:00
William FH	b7b62e29fb	community[patch], mongodb[patch]: Stop spamming SIMD import warnings (#19531 ) If you use an embedding dist function in an eval loop, you get warned every time. Would prefer to just check once and forget about it. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 03:11:02 +00:00
yongheng.liu	7e29b6061f	community[minor]: integrate China Mobile Ecloud vector search (#15298 ) - Description: integrate China Mobile Ecloud vector search, - Dependencies: elasticsearch==7.10.1 Co-authored-by: liuyongheng <liuyongheng@cmss.chinamobile.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 23:02:40 +00:00
Hyeongchan Kim	9b70131aed	community[patch]: refactor the type hint of `file_path` in `UnstructuredAPIFileLoader` class (#18839 ) * Description: add `None` type for `file_path` along with `str` and `List[str]` types. * `file_path`/`filename` arguments in `get_elements_from_api()` and `partition()` can be `None`, however, there's no `None` type hint for `file_path` in `UnstructuredAPIFileLoader` and `UnstructuredFileLoader` currently. * calling the function with `file_path=None` is no problem, but my IDE annoys me lol. * Issue: N/A * Dependencies: N/A Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-27 22:31:54 +00:00
CaroFG	cf96060ab7	community[patch]: update for compatibility with latest Meilisearch version (#18970 ) - Description: Updates Meilisearch vectorstore for compatibility with v1.6 and above. Adds embedders settings and embedder_name which are now required. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 22:08:27 +00:00
chyroc	be2adb1083	community[patch]: support unstructured_kwargs for s3 loader (#15473 ) fix https://github.com/langchain-ai/langchain/issues/15472 Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 22:03:48 +00:00
Tomaz Bratanic	87d2a6b777	community[minor]: Add the option to omit schema refresh in Neo4jGraph (#19654 )	2024-03-27 14:20:12 -04:00
Rajendra Kadam	0019d8a948	community[minor]: Add support for non-file-based Document Loaders in PebbloSafeLoader (#19574 ) Description: PebbloSafeLoader: Add support for non-file-based Document Loaders This pull request enhances PebbloSafeLoader by introducing support for several non-file-based Document Loaders. With this update, PebbloSafeLoader now seamlessly integrates with the following loaders: - GoogleDriveLoader - SlackDirectoryLoader - Unstructured EmailLoader Issue: NA Dependencies: - None Twitter handle: @Raj__725 --------- Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2024-03-27 17:39:52 +00:00
hulitaitai	dc2c9dd4d7	Update text2vec.py (#19657 ) Add that URL of the embedding tool "text2vec". Fix minor mistakes in the doc-string.	2024-03-27 13:13:30 -04:00
Guangdong Liu	7042934b5f	community[patch]: Fix the bug that Chroma does not specify `embedding_function` (#19277 ) - Issue: close #18291 - @baskaryan, @eyurtsev PTAL	2024-03-27 11:43:38 -04:00
yuwenzho	3a7d2cf443	community[minor]: Add ITREX optimized Embeddings (#18474 ) Introduction [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms Description adding ITREX runtime embeddings using intel-extension-for-transformers. added mdx documentation and example notebooks added embedding import testing. --------- Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 07:22:06 +00:00
Fabrizio Ruocco	f12cb0bea4	community[patch]: Microsoft Azure Document Intelligence updates (#16932 ) - Description: Update Azure Document Intelligence implementation by Microsoft team and RAG cookbook with Azure AI Search --------- Co-authored-by: Lu Zhang (AI) <luzhan@microsoft.com> Co-authored-by: Yateng Hong <yatengh@microsoft.com> Co-authored-by: teethache <hongyateng2006@126.com> Co-authored-by: Lu Zhang <44625949+luzhang06@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 23:36:59 -07:00
Timothy	ad77fa15ee	community[patch]: Adding try-except block for GCSDirectoryLoader (#19591 ) - Description: Implemented try-except block for `GCSDirectoryLoader`. Reason: Users processing large number of unstructured files in a folder may experience many different errors. A try-exception block is added to capture these errors. A new argument `use_try_except=True` is added to enable silent failure so that error caused by processing one file does not break the whole function. - Issue: N/A - Dependencies: no new dependencies - Twitter handle: timothywong731 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 00:12:24 +00:00
xsai9101	160a8eb178	community[minor]: add oracle autonomous database doc loader integration (#19536 ) Thank you for contributing to LangChain! - [ ] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: Adding oracle autonomous database document loader integration. This will allow users to connect to oracle autonomous database through connection string or TNS configuration. https://www.oracle.com/autonomous-database/ - Issue: None - Dependencies: oracledb python package https://pypi.org/project/oracledb/ - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. Unit test and doc are added. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-26 17:02:18 -07:00
Adam Law	aeb7b6b11d	community[patch]: use semantic_configurations in AzureSearch (#19347 ) - Description: Currently the semantic_configurations are not used when creating an AzureSearch instance, instead creating a new one with default values. This PR changes the behavior to use the passed semantic_configurations if it is present, and the existing default configuration if not. --------- Co-authored-by: Adam Law <adamlaw@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-26 13:57:39 -07:00
Adrian Valente	2763d8cbe5	community: add len() implementation to Chroma (#19419 ) Thank you for contributing to LangChain! - [x] Add len() implementation to Chroma: "package: community" - [x] PR message: - Description: add an implementation of the __len__() method for the Chroma vectostore, for convenience. - Issue: no exposed method to know the size of a Chroma vectorstore - Dependencies: None - Twitter handle: lowrank_adrian - [x] Add tests and docs - [x] Lint and test --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 12:53:10 -04:00
Tom Aarsen	e0a1278d2b	docs: HFEmbeddings: Add more information to model_kwargs/encode_kwargs (#19594 ) - Description: Be more explicit with the `model_kwargs` and `encode_kwargs` for `HuggingFaceEmbeddings`. - Issue: - - Dependencies: - I received some reports by my users that they didn't realise that you could change the default `batch_size` with `HuggingFaceEmbeddings`, which may be attributed to how the `model_kwargs` and `encode_kwargs` don't give much information about what you can specify. I've added some parameter names & links to the Sentence Transformers documentation to help clear it up. Let me know if you'd rather have Markdown/Sphinx-style hyperlinks rather than a "bare URL". - Tom Aarsen	2024-03-26 12:46:04 -04:00
Dobiichi-Origami	18e6f9376d	community[Qianfan]: add function_call in additional_kwargs (#19550 ) - Description: add lacked `function_call` field in `additional_kwargs` in previous version - Dependencies: None of new dependency	2024-03-26 12:20:19 -04:00
mwmajewsk	f7a1fd91b8	community: better support of pathlib paths in document loaders (#18396 ) So this arose from the https://github.com/langchain-ai/langchain/pull/18397 problem of document loaders not supporting `pathlib.Path`. This pull request provides more uniform support for Path as an argument. The core ideas for this upgrade: - if there is a local file path used as an argument, it should be supported as `pathlib.Path` - if there are some external calls that may or may not support Pathlib, the argument is immidiately converted to `str` - if there `self.file_path` is used in a way that it allows for it to stay pathlib without conversion, is is only converted for the metadata. Twitter handle: https://twitter.com/mwmajewsk	2024-03-26 11:51:52 -04:00
Yuki Watanabe	cfecbda48b	community[minor]: Allow passing `allow_dangerous_deserialization` when loading LLM chain (#18894 ) ### Issue Recently, the new `allow_dangerous_deserialization` flag was introduced for preventing unsafe model deserialization that relies on pickle without user's notice (#18696). Since then some LLMs like Databricks requires passing in this flag with true to instantiate the model. However, this breaks existing functionality to loading such LLMs within a chain using `load_chain` method, because the underlying loader function [load_llm_from_config](`f96dd57501/libs/langchain/langchain/chains/loading.py (L40)`) (and load_llm) ignores keyword arguments passed in. ### Solution This PR fixes this issue by propagating the `allow_dangerous_deserialization` argument to the class loader iff the LLM class has that field. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 11:07:55 -04:00
hulitaitai	d7c14cb6f9	community[minor]: Add embeddings integration for text2vec (#19267 ) Create a Class which allows to use the "text2vec" open source embedding model. It should install the model by running 'pip install -U text2vec'. Example to call the model through LangChain: from langchain_community.embeddings.text2vec import Text2vecEmbeddings embedding = Text2vecEmbeddings() bookend.embed_documents([ "This is a CoSENT(Cosine Sentence) model.", "It maps sentences to a 768 dimensional dense vector space.", ]) bookend.embed_query( "It can be used for text matching or semantic search." ) --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-26 11:06:58 -04:00
Kalyan Mudumby	d27600c6f7	community[patch]: GPTCache pydantic validation error on lookup (#19427 ) Description: this change fixes the pydantic validation error when looking up from GPTCache, the `ChatOpenAI` class returns `ChatGeneration` as response which is not handled. use the existing `_loads_generations` and `_dumps_generations` functions to handle it Trace ``` File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/development/scripts/chatbot-postgres-test.py", line 90, in <module> print(llm.invoke("tell me a joke")) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 166, in invoke self.generate_prompt( File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 544, in generate_prompt return self.generate(prompt_messages, stop=stop, callbacks=callbacks, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 408, in generate raise e File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 398, in generate self._generate_with_cache( File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 585, in _generate_with_cache cache_val = llm_cache.lookup(prompt, llm_string) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 807, in lookup return [ ^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 808, in <listcomp> Generation(generation_dict) for generation_dict in json.loads(res) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/load/serializable.py", line 120, in __init__ super().__init__(**kwargs) File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__ raise validation_error pydantic.v1.error_wrappers.ValidationError: 1 validation error for Generation type unexpected value; permitted: 'Generation' (type=value_error.const; given=ChatGeneration; permitted=('Generation',)) ``` Although I don't seem to find any issues here, here's an [issue](https://github.com/zilliztech/GPTCache/issues/585) raised in GPTCache. Please let me know if I need to do anything else Thank you --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 10:52:30 -04:00
Piyush Jain	72ba738bf5	community[minor]: Improvements for NeptuneRdfGraph, Improve discovery of graph schema using database statistics (#19546 ) Fixes linting for PR [19244](https://github.com/langchain-ai/langchain/pull/19244) --------- Co-authored-by: mhavey <mchavey@gmail.com>	2024-03-26 10:36:51 -04:00
Christophe Bornet	8595c3ab59	community[minor]: Add InMemoryVectorStore to module level imports (#19576 )	2024-03-26 14:07:44 +00:00
Aayush Kataria	03c38005cb	community[patch]: Fixing some caching issues for AzureCosmosDBSemanticCache (#18884 ) Fixing some issues for AzureCosmosDBSemanticCache - Added the entry for "AzureCosmosDBSemanticCache" which was missing in langchain/cache.py - Added application name when creating the MongoClient for the AzureCosmosDBVectorSearch, for tracking purposes. @baskaryan, can you please review this PR, we need this to go in asap. These are just small fixes which we found today in our testing.	2024-03-25 19:06:17 -07:00
Clément Tamines	a6cbb755a7	community[patch]: fix semantic answer bug in AzureSearch vector store (#18938 ) - Description: The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - Issue: https://github.com/langchain-ai/langchain/issues/18731 - Prior fix to this bug: This bug was fixed in this PR https://github.com/langchain-ai/langchain/pull/15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR https://github.com/langchain-ai/langchain/pull/15659 (see `ed1ffca911`#). @lz-chen: could you confirm this wasn't intentional? - New fix to this bug: I believe there was an oversight in the logic of the fix from [#1564](https://github.com/langchain-ai/langchain/pull/15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. `5c2f7e6b2b/libs/community/langchain_community/vectorstores/azuresearch.py (L574-L581)` The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!	2024-03-25 18:51:54 -07:00
Anindyadeep	b2a11ce686	community[minor]: Prem AI langchain integration (#19113 ) ### Prem SDK integration in LangChain This PR adds the integration with [PremAI's](https://www.premai.io/) prem-sdk with langchain. User can now access to deployed models (llms/embeddings) and use it with langchain's ecosystem. This PR adds the following: ### This PR adds the following: - [x] Add chat support - [X] Adding embedding support - [X] writing integration tests - [X] writing tests for chat - [X] writing tests for embedding - [X] writing unit tests - [X] writing tests for chat - [X] writing tests for embedding - [X] Adding documentation - [X] writing documentation for chat - [X] writing documentation for embedding - [X] run `make test` - [X] run `make lint`, `make lint_diff` - [X] Final checks (spell check, lint, format and overall testing) --------- Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 01:37:19 +00:00
Souhail Hanfi	cbec43afa9	community[patch]: avoid creating extension PGvector while using readOnly Databases (#19268 ) - Description: PgVector class always runs "create extension" on init and this statement crashes on ReadOnly databases (read only replicas). but wierdly the next create collection etc work even in readOnly databases - Dependencies: no new dependencies - Twitter handle: @VenOmaX666 Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 01:25:01 +00:00
Barun Amalkumar Halder	9246ec6b36	community[patch] : [Fiddler] ensure dataset is not added if model is present (#19293 ) Description: - minor PR to speed up onboarding by not trying to add a dataset, if a model is already present. - replace batch publish API with streaming when single events are published. Dependencies: any dependencies required for this change Twitter handle: behalder Co-authored-by: Barun Halder <barun@fiddler.ai>	2024-03-25 17:28:05 -07:00
JSDu	6e090280fd	community[patch]: milvus will autoflush, manual flush is slowly (#19300 ) reference: https://milvus.io/docs/configure_quota_limits.md#quotaAndLimitsflushRateenabled https://github.com/milvus-io/milvus/issues/31407 Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 00:26:58 +00:00
mackong	e65dc4b95b	community[patch]: clean warning when delete by ids (#19301 ) * Description: rearrange to avoid variable overwrite, which cause warning always. * Issue: N/A * Dependencies: N/A	2024-03-25 17:23:22 -07:00
Stefano Mosconi	01fc69c191	community[patch]: expanding version in confluence loader (#19324 ) Description: Expanding version in all the Confluence API calls so to get when the page was last modified/created in all cases. Issue: #12812 Twitter handle: zzste	2024-03-25 17:08:01 -07:00
Dmitry Tyumentsev	08b769d539	community[patch]: YandexGPT Use recent yandexcloud sdk version (#19341 ) Fixed inability to work with [yandexcloud SDK](https://pypi.org/project/yandexcloud/) version higher 0.265.0	2024-03-25 17:05:57 -07:00
Marlene	f1313339ac	community[patch]: Fixing incorrect base URLs for Azure Cognitive Search Retriever (#19352 ) This PR adds code to make sure that the correct base URL is being created for the Azure Cognitive Search retriever. At the moment an incorrect base URL is being generated. I think this is happening because the original code was based on a depreciated API version. No dependencies need to be added. I've also added more context to the test doc strings. I should also note that ACS is now Azure AI Search. I will open a separate PR to make these changes as that would be a breaking change and should potentially be discussed. Twitter: @marlene_zw - No new tests added, however the current ACS retriever tests are now passing when I run them. - Code was linted. Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 00:04:59 +00:00
FinTech秋田	03ba1d4731	community[patch]: Add Support for GPU Index Types in Milvus 2.4 (#19468 ) - Description: This commit introduces support for the newly available GPU index types introduced in Milvus 2.4 within the LangChain project's `milvus.py`. With the release of Milvus 2.4, a range of GPU-accelerated index types have been added, offering enhanced search capabilities and performance optimizations for vector search operations. This update ensures LangChain users can fully utilize the new performance benefits for vector search operations. - Reference: https://milvus.io/docs/gpu_index.md Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-25 23:39:54 +00:00
Ash Vardanian	d01bad5169	core[patch]: Convert SimSIMD back to NumPy (#19473 ) This patch fixes the #18022 issue, converting the SimSIMD internal zero-copy outputs to NumPy. I've also noticed, that oftentimes `dtype=np.float32` conversion is used before passing to SimSIMD. Which numeric types do LangChain users generally care about? We support `float64`, `float32`, `float16`, and `int8` for cosine distances and `float16` seems reasonable for practically any kind of embeddings and any modern piece of hardware, so we can change that part as well 🤗	2024-03-25 16:36:26 -07:00
Mikelarg	dac2e0165a	community[minor]: Added GigaChat Embeddings support + updated previous GigaChat integration (#19516 ) - Description: Added integration with [GigaChat](https://developers.sber.ru/portal/products/gigachat) embeddings. Also added support for extra fields in GigaChat LLM and fixed docs.	2024-03-25 16:08:37 -07:00
Martin Kolb	e5bdb26f76	community[patch]: More flexible handling for entity names in vector store "HANA Cloud" (#19523 ) - Description: Added support for lower-case and mixed-case names The names for tables and columns previouly had to be UPPER_CASE. With this enhancement, also lower_case and MixedCase are supported, - Issue: N/A - Dependencies: no new dependecies added - Twitter handle: @sapopensource	2024-03-25 15:52:45 -07:00
billytrend-cohere	63343b4987	cohere[patch]: add cohere as a partner package (#19049 ) Description: adds support for langchain_cohere --------- Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-03-25 20:23:47 +00:00
ccurme	82de8fd6c9	add kwargs (#19519 ) `HanaDB.add_texts` is missing **kwargs.	2024-03-25 11:56:01 -04:00
Nikhil Kumar	3d3b46a782	docs: Update docs for `HuggingFacePipeline` (#19306 ) Updated `HuggingFacePipeline` docs to be in sync with list of supported tasks, including translation. - [x] PR title: "community: Update docs for `HuggingFacePipeline`" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [x] PR message: - Description: Update docs for `HuggingFacePipeline`, was earlier missing `translation` as a valid task - Issue: N/A - Dependencies: N/A - Twitter handle: None - [x] Add tests and docs: - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/	2024-03-25 00:29:21 -07:00
Igor Muniz Soares	743f888580	community[minor]: Dappier chat model integration (#19370 ) Description: This PR adds [Dappier](https://dappier.com/) for the chat model. It supports generate, async generate, and batch functionalities. We added unit and integration tests as well as a notebook with more details about our chat model. Dependencies: No extra dependencies are needed.	2024-03-25 07:29:05 +00:00
Hugoberry	96dc180883	community[minor]: Add `DuckDB` as a vectorstore (#18916 ) DuckDB has a cosine similarity function along list and array data types, which can be used as a vector store. - Description: The latest version of DuckDB features a cosine similarity function, which can be used with its support for list or array column types. This PR surfaces this functionality to langchain. - Dependencies: duckdb 0.10.0 - Twitter handle: @igocrite --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-25 07:02:35 +00:00
preak95	6ea3e57a63	community[minor]: S3FileLoader to use expose mode and post_processors arguments of unstructured loader (#19270 ) Description: Update s3_file.py to use arguments mode and post_processors from the base class UnstructuredBaseLoader to include more metadata about the files from the S3 bucket such as 'page_number', 'languages' etc. Issue: NA Dependencies: None Twitter handle: preak95 --------- Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-25 06:56:55 +00:00
fengjial	3b52ee05d1	community[patch]: fix bugs in baiduvectordb as vectorstore (#19380 ) fix small bugs in vectorstore/baiduvectordb	2024-03-22 17:03:59 -07:00
aditya thomas	515aab3312	community[patch]: invoke callback prior to yielding token (openai) (#19389 ) Description: Invoke callback prior to yielding token for BaseOpenAI & OpenAIChat Issue: [Callback for on_llm_new_token should be invoked before the token is yielded by the model #16913](https://github.com/langchain-ai/langchain/issues/16913) Dependencies: None	2024-03-22 16:45:55 -07:00
aditya thomas	49e932cd24	community[patch]: invoke callback prior to yielding token (fireworks) (#19388 ) Description: Invoke callback prior to yielding token for Fireworks Issue: [Callback for on_llm_new_token should be invoked before the token is yielded by the model #16913](https://github.com/langchain-ai/langchain/issues/16913) Dependencies: None	2024-03-22 16:44:06 -07:00
Tarun Jain	ef6d3d66d6	community[patch]: docarray requires hnsw installation (#19416 ) I have a small dataset, and I tried to use docarray: ``DocArrayHnswSearch ``. But when I execute, it returns: ```bash raise ImportError( ImportError: Could not import docarray python package. Please install it with `pip install "langchain[docarray]"`. ``` Instead of docarray it needs to be ```bash docarray[hnswlib] ``` Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-22 22:39:07 +00:00
German Swan	d4dc98a9f9	community[patch]: RecursiveUrlLoader: add base_url option (#19421 ) RecursiveUrlLoader does not currently provide an option to set `base_url` other than the `url`, though it uses a function with such an option. For example, this causes it unable to parse the `https://python.langchain.com/docs`, as it returns the 404 page, and `https://python.langchain.com/docs/get_started/introduction` has no child routes to parse. `base_url` allows setting the `https://python.langchain.com/docs` to filter by, while the starting URL is anything inside, that contains relevant links to continue crawling. I understand that for this case, the docusaurus loader could be used, but it's a common issue with many websites. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-22 15:34:31 -07:00
aditya thomas	4856a87261	community[patch]: invoke callback prior to yielding token (llama.cpp) (#19392 ) Description: Invoke callback prior to yielding token for llama.cpp Issue: [Callback for on_llm_new_token should be invoked before the token is yielded by the model #16913](https://github.com/langchain-ai/langchain/issues/16913) Dependencies: None	2024-03-22 16:17:56 -04:00
billytrend-cohere	f6bcd42421	community[patch]: Replace positional argument with text=text for cohere>=5 compatibility (#19407 ) - Description: Replace positional argument with text=text for cohere>=5 compatibility	2024-03-21 10:42:51 -07:00
Yudhajit Sinha	7d216ad1e1	community[patch]: Invoke callback prior to yielding token (titan_takeoff_pro) (#18624 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/titan_takeoff_pro. - Issue: #16913 - Dependencies: None	2024-03-20 07:58:18 -07:00
Yudhajit Sinha	455a74486b	community[patch]: Invoke callback prior to yielding token (sparkllm) (#18625 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/sparkllm. - Issue: #16913 - Dependencies: None	2024-03-20 07:57:53 -07:00
Yudhajit Sinha	5ac1860484	community[patch]: Invoke callback prior to yielding token (replicate) (#18626 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/replicate. - Issue: #16913 - Dependencies: None	2024-03-20 07:57:27 -07:00
Yudhajit Sinha	9525e392de	community[patch]: Invoke callback prior to yielding token (pai_eas_endpoint) (#18627 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/pai_eas_endpoint. - Issue: #16913 - Dependencies: None	2024-03-20 07:56:58 -07:00
Yudhajit Sinha	140f06e59a	community[patch]: Invoke callback prior to yielding token (openai) (#18628 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/openai. - Issue: #16913 - Dependencies: None	2024-03-20 07:56:30 -07:00
Yudhajit Sinha	280a914920	community[patch]: Invoke callback prior to yielding token (ollama) (#18629 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ & _astream_ methods in llms/ollama. - Issue: #16913 - Dependencies: None	2024-03-20 07:56:09 -07:00
Christophe Bornet	00614f332a	community[minor]: Add InMemoryVectorStore (#19326 ) This is a basic VectorStore implementation using an in-memory dict to store the documents. It doesn't need any extra/optional dependency as it uses numpy which is already a dependency of langchain. This is useful for quick testing, demos, examples. Also it allows to write vendor-neutral tutorials, guides, etc...	2024-03-20 10:21:07 -04:00
Nithish Raghunandanan	7ad0a3f2a7	community: add Couchbase Vector Store (#18994 ) - Description: Added support for Couchbase Vector Search to LangChain. - Dependencies: couchbase>=4.1.12 - Twitter handle: @nithishr --------- Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>	2024-03-19 12:39:51 -07:00
Christophe Bornet	30e4a35d7a	community: Use langchain-astradb for AstraDB caches (#18419 ) - [x] Needs https://github.com/langchain-ai/langchain-datastax/pull/4 - [x] Needs a new release of langchain-astradb	2024-03-19 14:04:36 -04:00
Vittorio Rigamonti	9b2f9ee952	community: VectorStore Infinispan, adding autoconfiguration (#18967 ) Description: this PR enable VectorStore autoconfiguration for Infinispan: if metadatas are only of basic types, protobuf config will be automatically generated for the user.	2024-03-18 21:33:45 -07:00
gonvee	b82644078e	community: Add `keep_alive` parameter to control how long the model w… (#19005 ) Add `keep_alive` parameter to control how long the model will stay loaded into memory with Ollama。 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-19 04:29:01 +00:00
Harrison Chase	efcdf54edd	Josha91 fix docstring (#19249 ) Co-authored-by: Josha van Houdt <josha.van.houdt@sap.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-18 21:19:56 -07:00
Taqi Jaffri	044bc22acc	Community: Add mistral oss model support to azureml endpoints, plus configurable timeout (#19123 ) - Description: There was no formatter for mistral models for Azure ML endpoints. Adding that, plus a configurable timeout (it was hard coded before) - Dependencies: none - Twitter handle: @tjaffri @docugami	2024-03-18 21:10:42 -07:00
Hamza Muhammad Farooqi	24a0a4472a	Add docstrings for Clickhouse class methods (#19195 ) Thank you for contributing to LangChain! - [ ] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.	2024-03-19 04:03:12 +00:00
Rohit Gupta	785f8ab174	[langchain_community] milvus vectorstores upsert: add kwargs to make it use for other argument also (#19193 ) add kwargs in add_documents for upsert, to make it use for other argument also. Lets use this, it was unused as of now. - [ ] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. Co-authored-by: Rohit Gupta <rohit.gupta2@walmart.com>	2024-03-18 21:01:12 -07:00
Guangdong Liu	c3310c5e7f	community: Fix Milvus got multiple values for keyword argument 'timeout' (#19232 ) - Description: Fix Milvus got multiple values for keyword argument 'timeout' - Issue: fix #18580 - @baskaryan @eyurtsev PTAL	2024-03-18 20:44:25 -07:00
Leonid Ganeline	7de1d9acfd	community: `llms` imports fixes (#18943 ) Classes are missed in __all__ and in different places of __init__.py - BaichuanLLM - ChatDatabricks - ChatMlflow - Llamafile - Mlflow - Together Added classes to __all__. I also sorted __all__ list. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-03-18 20:24:40 +00:00
Kenzie Mihardja	21f75991d4	deprecate community docugami loader (#19230 ) Thank you for contributing to LangChain! - [x] PR title: "community: deprecate DocugamiLoader" - [x] PR message: Deprecate the langchain_community and use the docugami_langchain DocugamiLoader --------- Co-authored-by: Kenzie Mihardja <kenzie28@cs.washington.edu>	2024-03-18 12:56:47 -07:00
Pengfei Jiang	514fe80778	community[patch]: add stop parameter support to volcengine maas (#19052 ) - Description: add stop parameter to volcengine maas model - Dependencies: no --------- Co-authored-by: 江鹏飞 <jiangpengfei.jiangpf@bytedance.com>	2024-03-17 01:58:50 +00:00
htaoruan	bcc771e37c	docs: ChatTongyi example error (#19013 )	2024-03-17 01:55:56 +00:00
primate88	5aa68936e0	community: Fix import path for StreamingStdOutCallbackHandler example (#19170 ) - Description: - Updated the import path for `StreamingStdOutCallbackHandler` in the streaming response example within `huggingface_endpoint.py`. This change corrects the import statement to reflect the actual location of `StreamingStdOutCallbackHandler` in `langchain_core.callbacks.streaming_stdout`. - Issue: - None - Dependencies: - No additional dependencies are required for this change. - Twitter handle: - None ## Note: I have tested this change locally and confirmed that the `StreamingStdOutCallbackHandler` works as expected with the updated import path. This PR does not require the addition of new tests since it is a correction to documentation/examples rather than functional code.	2024-03-17 00:50:37 +00:00
Nikhil Kumar	635b3372bd	community[minor]: Add support for translation in HuggingFacePipeline (#19190 ) - [x] Support for translation: "community: Add support for translation in `HuggingFacePipeline`" - [x] Add support for translation in `HuggingFacePipeline`: - Description: Add support for translation in `HuggingFacePipeline`, which earlier used to support only text summarization and generation. - Issue: N/A - Dependencies: N/A - Twitter handle: None	2024-03-17 00:48:13 +00:00
k.muto	8d2c34e655	community: Fix all page numbers were the same for _BaseGoogleVertexAISearchRetriever (#19175 ) - Description: - This pull request is to fix a bug where page numbers were not set correctly. In the current code, all chunks share the same metadata object doc_metadata, so the page number is set with the same value for all documents. To fix this, I changed to using separate metadata objects for each chunk. - Issue: - None - Dependencies: - No additional dependencies are required for this change. - Twitter handle: - @eycjur - Test - Even if it's not a bug, there are cases where everything ends up with the same number of pages, so it's very difficult for me to write integration tests.	2024-03-16 22:28:56 +00:00
Cailin Wang	7cd87d2f6a	community: Add `partition` parameter to DashVector (#19023 ) Description: DashVector Add partition parameter Twitter handle: @CailinWang_ --------- Co-authored-by: root <root@Bluedot-AI>	2024-03-16 15:20:30 -07:00
Rodrigo Nogueira	e64cf1aba4	community: Add model argument for maritalk models and better error handling (#19187 )	2024-03-16 15:18:56 -07:00
Sergey Kozlov	1a55e950aa	community[patch]: support fastembed v1 and v2 (#19125 ) Description: #18040 forces `fastembed>2.0`, and this causes dependency conflicts with the new `unstructured` package (different `onnxruntime`). There may be other dependency conflicts.. The only way to use `langchain-community>=0.0.28` is rollback to `unstructured 0.10.X`. But new `unstructured` contains many fixes. This PR allows to use both `fastembed` `v1` and `v2`. How to reproduce: `pyproject.toml`: ```toml [tool.poetry] name = "depstest" version = "0.0.0" description = "test" authors = ["<dev@example.org>"] [tool.poetry.dependencies] python = ">=3.10,<3.12" langchain-community = "^0.0.28" fastembed = "^0.2.0" unstructured = {extras = ["pdf"], version = "^0.12"} ``` ```bash $ poetry lock ``` Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>	2024-03-15 18:33:51 -07:00
高远	ef9813dae6	docs: add vikingdb docstrings(#19016 ) Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>	2024-03-15 16:29:29 -07:00
wulixuan	0e0030f494	community[patch]: fix yuan2 chat model errors while invoke. (#19015 ) 1. fix yuan2 chat model errors while invoke. 2. update related tests. 3. fix some deprecationWarning.	2024-03-15 16:28:36 -07:00
Shuai Liu	c244e1a50b	community[patch]: Fixed bug in merging `generation_info` during chunk concatenation in Tongyi and ChatTongyi (#19014 ) - Description: In #16218 , during the `GenerationChunk` and `ChatGenerationChunk` concatenation, the `generation_info` merging changed from simple keys & values replacement to using the util method [`merge_dicts`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/utils/_merge.py): ![image](https://github.com/langchain-ai/langchain/assets/2098020/10f315bf-7fe0-43a7-a0ce-6a3834b99a15) The `merge_dicts` method could not handle merging values of `int` or some other types, and would raise a [`TypeError`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/utils/_merge.py#L55). This PR fixes this issue in the Tongyi and ChatTongyi Model by adopting the `generation_info` of the last chunk and discarding the `generation_info` of the intermediate chunks, ensuring that `stream` and `astream` function correctly. - Issue: - Related issues or PRs about Tongyi & ChatTongyi: #16605, #17105 - Other models or cases: #18441, #17376 - Dependencies: No new dependencies	2024-03-15 16:27:53 -07:00
Christophe Bornet	f2a7dda4bd	community[patch]: Use langchain-astradb for AstraDB doc loader (#19071 ) Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-15 22:57:25 +00:00
Holt Skinner	cee03630d9	community[patch]: Add Blended Search Support to `GoogleVertexAISearchRetriever` (#19082 ) https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#multi-data-stores --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-15 22:39:31 +00:00
case-k	ebc4a64f9e	docs: fix databricks document url (#19096 ) Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-15 22:25:11 +00:00
Guangdong Liu	cced3eb9bc	community[patch]: Fix sparkllm embeddings api bug. (#19122 ) - Description: Fix sparkllm embeddings api bug. @baskaryan PTAL	2024-03-15 15:08:49 -07:00
kaijietti	c20aeef79a	community[patch]: implement qdrant _aembed_query and use it in other async funcs (#19155 ) `amax_marginal_relevance_search ` and `asimilarity_search_with_score ` should use an async version of `_embed_query `.	2024-03-15 21:20:12 +00:00
Barun Amalkumar Halder	34d6f0557d	community[patch] : publishes duration as milliseconds to Fiddler (#19166 ) Description: Many LLM steps complete in sub-second duration, which can lead to non-collection of duration field for Fiddler. This PR updates duration from seconds to milliseconds. Issue: [INTERNAL] FDL-17568 Dependencies: NA Twitter handle: behalder Co-authored-by: Barun Halder <barun@fiddler.ai>	2024-03-15 14:04:56 -07:00
Barun Amalkumar Halder	b551d49cf5	community[patch] : adds feedback and status for Fiddler callback handler events (#19157 ) Description: This PR adds updates the fiddler events schema to also pass user feedback, and llm status to fiddler Tickets: [INTERNAL] FDL-17559 Dependencies: NA Twitter handle: behalder Co-authored-by: Barun Halder <barun@fiddler.ai>	2024-03-15 12:03:49 -07:00
Juan Felipe Arias	f5b9aedc48	community[patch]: add args_schema to sql_database tools for langGraph integration (#18595 ) - Description: This modification adds pydantic input definition for sql_database tools. This helps for function calling capability in LangGraph. Since actions nodes will usually check for the args_schema attribute on tools, This update should make these tools compatible with it (only implemented on the InfoSQLDatabaseTool) - Issue: N/A - Dependencies: N/A - Twitter handle: juanfe8881	2024-03-15 19:03:36 +00:00
fengjial	c922ea36cb	community[minor]: Add Baidu VectorDB as vector store (#17997 ) Co-authored-by: fengjialin <fengjialin@MacBook-Pro.local>	2024-03-15 19:01:58 +00:00
Erick Friis	781aee0068	community, langchain, infra: revert store extended test deps outside of poetry (#19153 ) Reverts langchain-ai/langchain#18995 Because it makes installing dependencies in python 3.11 extended testing take 80 minutes	2024-03-15 17:10:47 +00:00
Erick Friis	9e569d85a4	community, langchain, infra: store extended test deps outside of poetry (#18995 ) poetry can't reliably handle resolving the number of optional "extended test" dependencies we have. If we instead just rely on pip to install extended test deps in CI, this isn't an issue.	2024-03-15 05:55:30 +00:00
Erick Friis	7ce81eb6f4	voyageai[patch]: init package (#19098 ) Co-authored-by: fodizoltan <zoltan@conway.expert> Co-authored-by: Yujie Qian <thomasq0809@gmail.com> Co-authored-by: fzowl <160063452+fzowl@users.noreply.github.com>	2024-03-15 00:56:10 +00:00
billytrend-cohere	7253b816cc	community: Add support for cohere SDK v5 (keeps v4 backwards compatibility) (#19084 ) - Description: Add support for cohere SDK v5 (keeps v4 backwards compatibility) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-03-14 15:53:24 -07:00
Eugene Yurtsev	6cdca4355d	community[minor]: Revamp PGVector Filtering (#18992 ) This PR makes the following updates in the pgvector database: 1. Use JSONB field for metadata instead of JSON 2. Update operator syntax to include required `$` prefix before the operators (otherwise there will be name collisions with fields) 3. The change is non-breaking, old functionality is still the default, but it will emit a deprecation warning 4. Previous functionality has bugs associated with comparisons due to casting to text (so lexical ordering is used incorrectly for numeric fields) 5. Adds an a GIN index on the JSONB field for more efficient querying	2024-03-14 16:56:00 -04:00
Anton Parkhomenko	ae73b9d839	community[patch]: Fix NotionDBLoader 400 Error by conditionally adding filter parameter (#19075 ) - Description: This change fixes a bug where attempts to load data from Notion using the NotionDBLoader resulted in a 400 Bad Request error. The issue was traced to the unconditional addition of an empty 'filter' object in the request payload, which Notion's API does not accept. The modification ensures that the 'filter' object is only included in the payload when it is explicitly provided and not empty, thus preventing the 400 error from occurring. - Issue: Fixes [#18009](https://github.com/langchain-ai/langchain/issues/18009) - Dependencies: None - Twitter handle: @gunnzolder Co-authored-by: Anton Parkhomenko <anton@merge.rocks>	2024-03-14 13:56:57 +00:00
Leonid Ganeline	9c8523b529	community[patch]: flattening imports 3 (#18939 ) @eyurtsev	2024-03-12 15:18:54 -07:00
Dobiichi-Origami	471f2ed40a	community[patch]: re-arrange the addtional_kwargs of returned qianfan structure to avoid _merge_dict issue (#18889 ) fix issue: https://github.com/langchain-ai/langchain/issues/18441 PTAL, thanks @baskaryan, @efriis, @eyurtsev, @hwchase17. --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-12 05:43:56 +00:00
Tymofii	0bec1f6877	commnity[patch]: refactor code for faiss vectorstore, update faiss vectorstore documentation (#18092 ) Description: Refactor code of FAISS vectorcstore and update the related documentation. Details: - replace `.format()` with f-strings for strings formatting; - refactor definition of a filtering function to make code more readable and more flexible; - slightly improve efficiency of `max_marginal_relevance_search_with_score_by_vector` method by removing unnecessary looping over the same elements; - slightly improve efficiency of `delete` method by using set data structure for checking if the element was already deleted; Issue: fix small inconsistency in the documentation (the old example was incorrect and unappliable to faiss vectorstore) Dependencies: basic langchain-community dependencies and `faiss` (for CPU or for GPU) Twitter handle: antonenkodev	2024-03-11 22:33:03 -07:00
Leonid Ganeline	11195cfa42	community[patch]: speed up import times in the community package (#18928 ) This PR speeds up import times in the community package	2024-03-11 16:37:36 -04:00
Virat Singh	cafffe8a21	community: Add PolygonAggregates tool (#18882 ) Description: In this PR, I am adding a `PolygonAggregates` tool, which can be used to get historical stock price data (called aggregates by Polygon) for a given ticker. Polygon [docs](https://polygon.io/docs/stocks/get_v2_aggs_ticker__stocksticker__range__multiplier___timespan___from___to) for this endpoint. Twitter: [@virattt](https://twitter.com/virattt)	2024-03-11 11:58:10 -07:00
Mohammad Mohtashim	43db4cd20e	core[major]: On Tool End Observation Casting Fix (#18798 ) This PR updates the on_tool_end handlers to return the raw output from the tool instead of casting it to a string. This is technically a breaking change, though it's impact is expected to be somewhat minimal. It will fix behavior in `astream_events` as well. Fixes the following issue #18760 raised by @eyurtsev --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-11 10:59:04 -04:00
Massimiliano Pronesti	8113d612bb	community[patch]: support modin document loader (#18866 ) Langchain community document loaders support `pyspark`, `polars`, and `pandas` dataframes but not `modin`'s. This PR addresses this point.	2024-03-10 18:40:04 -07:00
Pol Ruiz Farre	a7f63d8cb4	community[patch]: Fix BasePDFLoader suffix for s3 presigned urls (#18844 ) BasePDFLoader doesn't parse the suffix of the file correctly when parsing S3 presigned urls. This fix enables the proper detection and parsing of S3 presigned URLs to prevent errors such as `OSError: [Errno 36] File name too long`. No additional dependencies required.	2024-03-11 00:58:51 +00:00
Joshua Carroll	ddaf9de169	community: Fix bug with StreamlitChatMessageHistory (#18834 ) - Description: Fix Streamlit bug which was introduced by https://github.com/langchain-ai/langchain/pull/18250, update integration test - Issue: https://github.com/langchain-ai/langchain/issues/18684 - Dependencies: None	2024-03-09 13:42:22 -08:00
Tomaz Bratanic	a28be31a96	Switch to md5 for deduplication in neo4j integrations (#18846 ) Deduplicate documents using MD5 of the page_content. Also allows for custom deduplication with graph ingestion method by providing metadata id attribute --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-03-09 13:28:55 -08:00
Leonid Ganeline	476d6dc596	community[patch]: Use getattr for `toolkits` imports (#18825 ) This will preserve the namespace, without actually loading the underlying packages on init.	2024-03-08 20:54:28 -05:00
Luis Antonio Vieira Junior	67c880af74	community[patch]: adding linearization config to AmazonTextractPDFLoader (#17489 ) - Description: Adding an optional parameter `linearization_config` to the `AmazonTextractPDFLoader` so the caller can define how the output will be linearized, instead of forcing a predefined set of linearization configs. It will still have a default configuration as this will be an optional parameter. - Issue: #17457 - Dependencies: The same ones that already exist for `AmazonTextractPDFLoader` - Twitter handle: [@lvieirajr19](https://twitter.com/lvieirajr19) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-08 17:25:22 -08:00
Anis ZAKARI	37e89ba5b1	community[patch]: Bedrock add support for mistral models (#18756 ) Description*: My previous [PR](https://github.com/langchain-ai/langchain/pull/18521) was mistakenly closed, so I am reopening this one. Context: AWS released two Mistral models on Bedrock last Friday (March 1, 2024). This PR includes some code adjustments to ensure their compatibility with the Bedrock class. --------- Co-authored-by: Anis ZAKARI <anis.zakari@hymaia.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-03-09 01:20:38 +00:00
Keith Chan	914af69b44	community[patch]: Update azuresearch vectorstore from_texts() method to include fields argument (#17661 ) - Description: Update azuresearch vectorstore from_texts() method to include fields argument, necessary for creating an Azure AI Search index with custom fields. - Issue: Currently index fields are fixed to default fields if Azure Search index is created using from_texts() method - Dependencies: None - Twitter handle: None --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-08 17:05:35 -08:00
al1p	46f0cea2b9	community[patch][: improved the suffix prompt to avoid loop (#17791 ) Small improvement to the openapi prompt. The agent was not finding the server base URL (looping through all nodes). This small change narrows the search and enables finding the url faster. No dependency Twitter : @al1pra	2024-03-08 16:53:09 -08:00
Théo LEBRUN	cf94091cd0	community[patch]: Skip nested directories when using S3DirectoryLoader (#17829 ) - Description: `S3DirectoryLoader` is failing if prefix is a folder (ex: `my_folder/`) because `S3FileLoader` will try to load that folder and will fail. This PR skip nested directories so prefix can be set to folder instead of `my_folder/files_prefix`. - Issue: - #11917 - #6535 - #4326 - Dependencies: none - Twitter handle: @Falydoor - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/	2024-03-08 16:50:58 -08:00
Venkatesan	7a18b63dbf	community[patch]: Mongo index creation (#17748 ) - [ ] Title: Mongodb: MongoDB connection performance improvement. - [ ] Message: - Description: I made collection index_creation as optional. Index Creation is one time process. - Issue: MongoDBChatMessageHistory class object is attempting to create an index during connection, causing each request to take longer than usual. This should be optional with a parameter. - Dependencies: N/A - Branch to be checked: origin/mongo_index_creation --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-08 16:43:17 -08:00
wt3639	5b5b37a999	community[patch]: Add embedding instruction to HuggingFaceBgeEmbeddings (#18017 ) - Description: Add embedding instruction to HuggingFaceBgeEmbeddings, so that it can be compatible with nomic and other models that need embedding instruction. --------- Co-authored-by: Tao Wu <tao.wu@rwth-aachen.de> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-08 16:39:29 -08:00
Ishani Vyas	2b0cbd65ba	community[patch]: Add Passio Nutrition AI Food Search Tool to Community Package (#18278 ) ## Add Passio Nutrition AI Food Search Tool to Community Package ### Description We propose adding a new tool to the `community` package, enabling integration with Passio Nutrition AI for food search functionality. This tool will provide a simple interface for retrieving nutrition facts through the Passio Nutrition AI API, simplifying user access to nutrition data based on food search queries. ### Implementation Details - Class Structure: Implement `NutritionAI`, extending `BaseTool`. It includes an `_run` method that accepts a query string and, optionally, a `CallbackManagerForToolRun`. - API Integration: Use `NutritionAIAPI` for the API wrapper, encapsulating all interactions with the Passio Nutrition AI and providing a clean API interface. - Error Handling: Implement comprehensive error handling for API request failures. ### Expected Outcome - User Benefits: Enable easy querying of nutrition facts from Passio Nutrition AI, enhancing the utility of the `langchain_community` package for nutrition-related projects. - Functionality: Provide a straightforward method for integrating nutrition information retrieval into users' applications. ### Dependencies - `langchain_core` for base tooling support - `pydantic` for data validation and settings management - Consider `requests` or another HTTP client library if not covered by `NutritionAIAPI`. ### Tests and Documentation - Unit Tests: Include tests that mock network interactions to ensure tool reliability without external API dependency. - Documentation: Create an example notebook in `docs/docs/integrations/tools/passio_nutrition_ai.ipynb` showing usage, setup, and example queries. ### Contribution Guidelines Compliance - Adhere to the project's linting and formatting standards (`make format`, `make lint`, `make test`). - Ensure compliance with LangChain's contribution guidelines, particularly around dependency management and package modifications. ### Additional Notes - Aim for the tool to be a lightweight, focused addition, not introducing significant new dependencies or complexity. - Potential future enhancements could include caching for common queries to improve performance. ### Twitter Handle - Here is our Passio AI [twitter handle](https://twitter.com/@passio_ai) where we announce our products. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.	2024-03-08 20:33:22 +00:00
Kushagra	b1f22bf76c	community[minor]: added a feature to filter documents in Mongoloader (#18253 ) "community: added a feature to filter documents in Mongoloader" - Description: added a feature to filter documents in Mongoloader - Feature: the feature #18251 - Dependencies: No - Twitter handle: https://twitter.com/im_Kushagra	2024-03-08 12:06:35 -08:00
Christophe Bornet	e54a49b697	community[minor]: Add lazy_table_reflection param to SqlDatabase (#18742 ) For some DBs with lots of tables, reflection of all the tables can take very long. So this change will make the tables be reflected lazily when get_table_info() is called and `lazy_table_reflection` is True.	2024-03-08 14:10:23 -05:00
Christophe Bornet	ead2a74806	community: Implement lazy_load() for JSONLoader (#18643 ) Covered by `tests/unit_tests/document_loaders/test_json_loader.py`	2024-03-08 13:58:17 -05:00
Phat Vo	3ecb903d49	community[patch] : Tidy up and update Clarifai SDK functions (#18314 ) Description : * Tidy up, add missing docstring and fix unused params * Enable using session token	2024-03-07 19:47:44 -08:00
Tomaz Bratanic	4bfe888717	comunity[patch]: Fix neo4j sanitizing values (#18750 ) Fixing sanitization for when deeply nested lists appear	2024-03-07 19:21:52 -08:00
Yunmo Koo	fee6f983ef	community[minor]: Integration for `Friendli` LLM and `ChatFriendli` ChatModel. (#17913 ) ## Description - Add [Friendli](https://friendli.ai/) integration for `Friendli` LLM and `ChatFriendli` chat model. - Unit tests and integration tests corresponding to this change are added. - Documentations corresponding to this change are added. ## Dependencies - Optional dependency [`friendli-client`](https://pypi.org/project/friendli-client/) package is added only for those who use `Frienldi` or `ChatFriendli` model. ## Twitter handle - https://twitter.com/friendliai	2024-03-08 02:20:47 +00:00
Smit Parmar	aed46cd6f2	community[patch]: Added support for filter out AWS Kendra search by score confidence (#12920 ) Description: It will add support for filter out kendra search by score confidence which will make result more accurate. For example ``` retriever = AmazonKendraRetriever( index_id=kendra_index_id, top_k=5, region_name=region, score_confidence="HIGH" ) ``` Result will not include the records which has score confidence "LOW" or "MEDIUM". Relevant docs https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kendra/client/query.html https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kendra/client/retrieve.html Issue: the issue # it resolve #11801 twitter: [@SmitCode](https://twitter.com/SmitCode)	2024-03-07 17:28:09 -08:00
Ian	390ef6abe3	community[minor]: Add Initial Support for TiDB Vector Store (#15796 ) This pull request introduces initial support for the TiDB vector store. The current version is basic, laying the foundation for the vector store integration. While this implementation provides the essential features, we plan to expand and improve the TiDB vector store support with additional enhancements in future updates. Upcoming Enhancements: * Support for Vector Index Creation: To enhance the efficiency and performance of the vector store. * Support for max marginal relevance search. * Customized Table Structure Support: Recognizing the need for flexibility, we plan for more tailored and efficient data store solutions. Simple use case exmaple ```python from typing import List, Tuple from langchain.docstore.document import Document from langchain_community.vectorstores import TiDBVectorStore from langchain_openai import OpenAIEmbeddings db = TiDBVectorStore.from_texts( embedding=embeddings, texts=['Andrew like eating oranges', 'Alexandra is from England', 'Ketanji Brown Jackson is a judge'], table_name="tidb_vector_langchain", connection_string=tidb_connection_url, distance_strategy="cosine", ) query = "Can you tell me about Alexandra?" docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query) for doc, score in docs_with_score: print("-" * 80) print("Score: ", score) print(doc.page_content) print("-" * 80) ```	2024-03-07 17:18:20 -08:00
Bagatur	3b1eb1f828	community[patch]: chat hf typing fix (#18693 )	2024-03-07 17:06:38 -08:00
Eugene Yurtsev	e188d4ecb0	Add dangerous parameter to requests tool (#18697 ) The tools are already documented as dangerous. Not clear whether adding an opt-in parameter is necessary or not	2024-03-07 15:10:56 -05:00
Erick Friis	1beb84b061	community[patch]: move pdf text tests to integration (#18746 )	2024-03-07 10:34:22 -08:00
Christophe Bornet	6cd7607816	community[patch]: Implement lazy_load() for MHTMLLoader (#18648 ) Covered by `tests/unit_tests/document_loaders/test_mhtml.py`	2024-03-07 11:50:18 -05:00
axiangcoding	9745b5894d	community[patch]: Chroma use uuid4 instead of uuid1 to generate random ids (#18723 ) - Description: Chroma use uuid4 instead of uuid1 as random ids. Use uuid1 may leak mac address, changing to uuid4 will not cause other effects. - Issue: None - Dependencies: None - Twitter handle: None	2024-03-07 11:48:25 -05:00
Guangdong Liu	ced5e7bae7	community[patch]: Fix sparkllm authentication problem. (#18651 ) - Description: fix sparkllm authentication problem.The current timestamp is in RFC1123 format. The time deviation must be controlled within 300s. I changed to re-obtain the url every time I ask a question. https://www.xfyun.cn/doc/spark/general_url_authentication.html#_1-2-%E9%89%B4%E6%9D%83%E5%8F%82%E6%95%B0	2024-03-06 18:43:16 -08:00
Piyush Jain	2b234a4d96	Support for claude v3 models. (#18630 ) Fixes #18513. ## Description This PR attempts to fix the support for Anthropic Claude v3 models in BedrockChat LLM. The changes here has updated the payload to use the `messages` format instead of the formatted text prompt for all models; `messages` API is backwards compatible with all models in Anthropic, so this should not break the experience for any models. ## Notes The PR in the current form does not support the v3 models for the non-chat Bedrock LLM. This means, that with these changes, users won't be able to able to use the v3 models with the Bedrock LLM. I can open a separate PR to tackle this use-case, the intent here was to get this out quickly, so users can start using and test the chat LLM. The Bedrock LLM classes have also grown complex with a lot of conditions to support various providers and models, and is ripe for a refactor to make future changes more palatable. This refactor is likely to take longer, and requires more thorough testing from the community. Credit to PRs [18579](https://github.com/langchain-ai/langchain/pull/18579) and [18548](https://github.com/langchain-ai/langchain/pull/18548) for some of the code here. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-03-06 15:46:18 -08:00
Sam Khano	1b4dcf22f3	community[minor]: Add DocumentDBVectorSearch VectorStore (#17757 ) Description: - Added Amazon DocumentDB Vector Search integration (HNSW index) - Added integration tests - Updated AWS documentation with DocumentDB Vector Search instructions - Added notebook for DocumentDB integration with example usage --------- Co-authored-by: EC2 Default User <ec2-user@ip-172-31-95-226.ec2.internal>	2024-03-06 15:11:34 -08:00
Vittorio Rigamonti	51f3902bc4	community[minor]: Adding support for Infinispan as VectorStore (#17861 ) Description: This integrates Infinispan as a vectorstore. Infinispan is an open-source key-value data grid, it can work as single node as well as distributed. Vector search is supported since release 15.x For more: [Infinispan Home](https://infinispan.org) Integration tests are provided as well as a demo notebook	2024-03-06 15:11:02 -08:00
Max Jakob	cca0167917	elasticsearch[patch], community[patch]: update references, deprecate community classes (#18506 ) Follow up on https://github.com/langchain-ai/langchain/pull/17467. - Update all references to the Elasticsearch classes to use the partners package. - Deprecate community classes. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-06 15:09:12 -08:00
Djordje	12b4a4d860	community[patch]: Opensearch delete method added - indexing supported (#18522 ) - Description: Added delete method for OpenSearchVectorSearch, therefore indexing supported - Issue: No - Dependencies: No - Twitter handle: stkbmf	2024-03-06 15:08:47 -08:00
Christophe Bornet	db8db6faae	community: Implement lazy_load() for PlaywrightURLLoader (#18676 ) Integration tests: `tests/integration_tests/document_loaders/test_url_playwright.py`	2024-03-06 16:52:13 -05:00
Aaron Yi	c092db862e	community[patch]: make metadata and text optional as expected in DocArray (#18678 ) ValidationError: 2 validation errors for DocArrayDoc text Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict] For further information visit https://errors.pydantic.dev/2.5/v/missing metadata Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict] For further information visit https://errors.pydantic.dev/2.5/v/missing ``` In the `_get_doc_cls` method, the `DocArrayDoc` class is defined as follows: ```python class DocArrayDoc(BaseDoc): text: Optional[str] embedding: Optional[NdArray] = Field(**embeddings_params) metadata: Optional[dict] ```	2024-03-06 16:51:41 -05:00
Eugene Yurtsev	4c25b49229	community[major]: breaking change in some APIs to force users to opt-in for pickling (#18696 ) This is a PR that adds a dangerous load parameter to force users to opt in to use pickle. This is a PR that's meant to raise user awareness that the pickling module is involved.	2024-03-06 16:43:01 -05:00
Eugene Yurtsev	0e52961562	community[patch]: Patch tdidf retriever (CVE-2024-2057) (#18695 ) This is a patch for `CVE-2024-2057`: https://www.cve.org/CVERecord?id=CVE-2024-2057 This affects users that: * Use the `TFIDFRetriever` * Attempt to de-serialize it from an untrusted source that contains a malicious payload	2024-03-06 15:49:04 -05:00
Christophe Bornet	ea141511d8	core: Move document loader interfaces to core (#17723 ) This is needed to be able to move document loaders to partner packages. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-06 13:59:00 -05:00
Christophe Bornet	5985454269	Merge pull request #18539 * Implement lazy_load() for GitLoader	2024-03-06 13:25:14 -05:00
Christophe Bornet	9a6f7e213b	Merge pull request #18423 * Implement lazy_load() for BSHTMLLoader	2024-03-06 13:25:01 -05:00
Christophe Bornet	b3a0c44838	Merge pull request #18673 * Implement lazy_load() for PDFMinerPDFasHTMLLoader and PyMuPDFLoader	2024-03-06 13:24:36 -05:00
Christophe Bornet	68fc0cf909	Merge pull request #18674 * Implement lazy_load() for TextLoader	2024-03-06 13:23:42 -05:00
Christophe Bornet	5b92f962f1	Merge pull request #18671 * Implement lazy_load() for MastodonTootsLoader	2024-03-06 13:23:14 -05:00
Christophe Bornet	15b1770326	Merge pull request #18421 * Implement lazy_load() for AssemblyAIAudioTranscriptLoader	2024-03-06 13:16:05 -05:00
Christophe Bornet	bb284eebe4	Merge pull request #18436 * Implement lazy_load() for ConfluenceLoader	2024-03-06 13:15:24 -05:00
Christophe Bornet	691480f491	Merge pull request #18647 * Implement lazy_load() for UnstructuredBaseLoader	2024-03-06 13:13:10 -05:00
Christophe Bornet	52ac67c5d8	Merge pull request #18654 * Implement lazy_load() for ObsidianLoader	2024-03-06 13:06:55 -05:00
Christophe Bornet	b9c0cf9025	Merge pull request #18656 * Implement lazy_load() for PsychicLoader	2024-03-06 13:05:04 -05:00
Christophe Bornet	aa7ac57b67	community: Implement lazy_load() for TrelloLoader (#18658 ) Covered by `tests/unit_tests/document_loaders/test_trello.py`	2024-03-06 13:04:36 -05:00
Christophe Bornet	302985fea1	community: Implement lazy_load() for SlackDirectoryLoader (#18675 ) Integration tests: `tests/integration_tests/document_loaders/test_slack.py`	2024-03-06 13:04:13 -05:00
Christophe Bornet	ed36f9f604	community: Implement lazy_load() for WhatsAppChatLoader (#18677 ) Integration test: `tests/integration_tests/document_loaders/test_whatsapp_chat.py`	2024-03-06 13:03:46 -05:00
Christophe Bornet	f414f5cdb9	community[minor]: Implement lazy_load() for WikipediaLoader (#18680 ) Integration test: `tests/integration_tests/document_loaders/test_wikipedia.py`	2024-03-06 13:03:21 -05:00
Christophe Bornet	1100f8de7a	community[minor]: Implement lazy_load() for ArxivLoader (#18664 ) Integration tests: `tests/integration_tests/utilities/test_arxiv.py` and `tests/integration_tests/document_loaders/test_arxiv.py`	2024-03-06 09:16:49 -05:00
Christophe Bornet	2d96803ddd	community[minor]: Implement lazy_load() for OutlookMessageLoader (#18668 ) Integration test: `tests/integration_tests/document_loaders/test_email.py`	2024-03-06 09:15:57 -05:00
Christophe Bornet	ae167fb5b2	community[minor]: Implement lazy_load() for SitemapLoader (#18667 ) Integration tests: `test_sitemap.py` and `test_docusaurus.py`	2024-03-06 09:15:35 -05:00
Christophe Bornet	623dfcc55c	community[minor]: Implement lazy_load() for FacebookChatLoader (#18669 ) Integration test: `tests/integration_tests/document_loaders/test_facebook_chat.py`	2024-03-06 09:15:00 -05:00
Christophe Bornet	20794bb889	community[minor]: Implement lazy_load() for GitbookLoader (#18670 ) Integration test: `tests/integration_tests/document_loaders/test_gitbook.py`	2024-03-06 09:14:36 -05:00
Liang Zhang	81985b31e6	community[patch]: Databricks SerDe uses cloudpickle instead of pickle (#18607 ) - Description: Databricks SerDe uses cloudpickle instead of pickle when serializing a user-defined function transform_input_fn since pickle does not support functions defined in `__main__`, and cloudpickle supports this. - Dependencies: cloudpickle>=2.0.0 Added a unit test.	2024-03-05 18:04:45 -08:00
Christophe Bornet	7d6de96186	community[patch]: Implement lazy_load() for CubeSemanticLoader (#18535 ) Covered by `test_cube_semantic.py`	2024-03-05 17:32:31 -08:00
Christophe Bornet	a6b5d45e31	community[patch]: Implement lazy_load() for EverNoteLoader (#18538 ) Covered by `test_evernote_loader.py`	2024-03-05 17:29:52 -08:00
Sunchao Wang	dc81dba6cf	community[patch]: Improve amadeus tool and doc (#18509 ) Description: This pull request addresses two key improvements to the langchain repository: Fix for Crash in Flight Search Interface: Previously, the code would crash when encountering a failure scenario in the flight ticket search interface. This PR resolves this issue by implementing a fix to handle such scenarios gracefully. Now, the code handles failures in the flight search interface without crashing, ensuring smoother operation. Documentation Update for Amadeus Toolkit: Prior to this update, examples provided in the documentation for the Amadeus Toolkit were unable to run correctly due to outdated information. This PR includes an update to the documentation, ensuring that all examples can now be executed successfully. With this update, users can effectively utilize the Amadeus Toolkit with accurate and functioning examples. These changes aim to enhance the reliability and usability of the langchain repository by addressing issues related to error handling and ensuring that documentation remains up-to-date and actionable. Issue: https://github.com/langchain-ai/langchain/issues/17375 Twitter Handle: SingletonYxx	2024-03-05 16:17:22 -08:00
Christophe Bornet	f77f7dc3ec	community[patch]: Fix VectorStoreQATool (#18529 ) Fix #18460	2024-03-05 15:56:58 -08:00
Dounx	ad48f55357	community[minor]: add Yuque document loader (#17924 ) This pull request support loading documents from Yuque with Langchain. Yuque is a professional cloud-based knowledge base for team collaboration in documentation. Website: https://www.yuque.com OpenAPI: https://www.yuque.com/yuque/developer/openapi	2024-03-05 15:54:07 -08:00
Kazuki Maeda	60c5d964a8	community[minor]: use jq schema for content_key in json_loader (#18003 ) ### Description Changed the value specified for `content_key` in JSONLoader from a single key to a value based on jq schema. I created [similar PR](https://github.com/langchain-ai/langchain/pull/11255) before, but it has several conflicts because of the architectural change associated stable version release, so I re-create this PR to fit new architecture. ### Why For json data like the following, specify `.data[].attributes.message` for page_content and `.data[].attributes.id` or `.data[].attributes.attributes. tags`, etc., the `content_key` must also parse the json structure. <details> <summary>sample json data</summary> ```json { "data": [ { "attributes": { "message": "message1", "tags": [ "tag1" ] }, "id": "1" }, { "attributes": { "message": "message2", "tags": [ "tag2" ] }, "id": "2" } ] } ``` </details> <details> <summary>sample code</summary> ```python def metadata_func(record: dict, metadata: dict) -> dict: metadata["source"] = None metadata["id"] = record.get("id") metadata["tags"] = record["attributes"].get("tags") return metadata sample_file = "sample1.json" loader = JSONLoader( file_path=sample_file, jq_schema=".data[]", content_key=".attributes.message", ## content_key is parsable into jq schema is_content_key_jq_parsable=True, ## this is added parameter metadata_func=metadata_func ) data = loader.load() data ``` </details> ### Dependencies none ### Twitter handle [kzk_maeda](https://twitter.com/kzk_maeda)	2024-03-05 15:51:24 -08:00
Hech	6a08134661	community[patch], langchain[minor]: Add retriever self_query and score_threshold in DingoDB (#18106 )	2024-03-05 15:47:29 -08:00
Yudhajit Sinha	4570b477b9	community[patch]: Invoke callback prior to yielding token (titan_takeoff) (#18560 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/titan_takeoff. - Issue: #16913 - Dependencies: None	2024-03-05 12:54:26 -08:00

... 2 3 4 5 6 ...

838 Commits