langchain

Commit Graph

Author	SHA1	Message	Date
Pavlo Paliychuk	342df7cf83	community[minor]: Add Zep Cloud components + docs + examples (#21671) - [x] PR title: community: Add Zep Cloud components + docs + examples - [x] PR message: We have recently released our new zep-cloud SDKs, which are compatible with Zep Cloud (not Zep Open Source). We have also maintained Cloud versions of LangChain components (ChatMessageHistory, VectorStore) as part of our SDKs. This PR's goal is to port these components to the LangChain community repo and close the gap with the existing Zep Open Source components already present there (added ZepCloudMemory, ZepCloudVectorStore, ZepCloudRetriever). Also added a ZepCloudChatMessageHistory component, together with an expression language example ported from our repo. We have deliberately left the original open-source components intact so as not to introduce any breaking changes. - Issue: N/A - Dependencies: added our new cloud SDK, `zep-cloud`, as an optional dependency - Twitter handle: @paulpaliychuk51 - [x] Add tests and docs - [x] Lint and test: ran `make format`, `make lint` and `make test` from the root of the modified package(s).	4 months ago
Jan Soubusta	cccc8fbe2f	community[patch]: DuckDB VS - expose similarity, improve performance of from_texts (#20971) Three fixes to the DuckDB vector store: - unify defaults in the constructor and from_texts (users no longer have to specify `vector_key`) - include search similarity in the output metadata (fixes #20969) - significantly improve the performance of `from_documents` Dependencies: added Pandas to speed up `from_documents`. I considered CSV and JSON options, but I expect trouble loading JSON values this way, and both options require storing data to disk. In any case, the poetry file for langchain-community already contains a dependency on Pandas. --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com>	4 months ago
Ameya Shenoy	8ba492ed6a	community[minor]: clickhouse -- ability to use secure connection (#22108) - Description: this PR gives the ClickHouse client the ability to use a secure connection to the ClickHouse server - Issue: fixes #22082 - Dependencies: none - Twitter handle: `_codingcoffee_` Signed-off-by: Ameya Shenoy <shenoy.ameya@gmail.com> Co-authored-by: Shresth Rana <shresth@grapevine.in>	4 months ago
Rahul Triptahi	1a485f59b9	community[patch]: Put authorized identities behind a feature flag in SharepointLoader (#22125) Description: Put authorised identities behind a feature flag, `load_auth`. Documentation: N/A Unit tests: N/A --------- Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	4 months ago
sasha	1c9ceff503	community: add metadata to chain logging; (#22122) Hey, I'm Sasha, the SDK engineer from [Comet](https://comet.com). This PR updates the CometTracer class to add metadata. From now on, both chains and spans will send it.	4 months ago
Jirka Lhotka	7c0459faf2	community: Update costs of openai finetuned models (#22124) - Description: Update the costs of fine-tuned models and add gpt-3.5-turbo-0125. Source: https://openai.com/api/pricing/ - Issue: N/A - Dependencies: None	4 months ago
Eugene Yurtsev	d3db83abe3	community[major]: lint for usage of xml library (#22132) * Lint for usage of the standard xml library * Add a forced opt-in for the Quip client * The actual security issue lies in the underlying QuipClient, not the LangChain integration (since the client does the parsing), but we are adding enforcement at the LangChain level.	4 months ago
Christophe Bornet	c838de5027	doc: Add doc for CassandraByteStore (#22126) Preview: https://langchain-git-fork-cbornet-doc-cassandrabytestore-langchain.vercel.app/v0.2/docs/integrations/stores/cassandra/	4 months ago
Eugene Yurtsev	2d693c484e	docs: fix some spelling mistakes caught by newest version of code spell (#22090) Going to merge this even though it doesn't pass all tests, and open a separate PR for the remaining spelling mistakes.	4 months ago
Pavel Zloi	fe26f937e4	community[minor]: ManticoreSearch engine added to vectorstore (#19117) Description: ManticoreSearch engine added to vectorstores. Issue: none, just a new feature. Dependencies: https://pypi.org/project/manticoresearch-dev/ Twitter handle: @EvilFreelancer - Example notebook with test integration: https://github.com/EvilFreelancer/langchain/blob/manticore-search-vectorstore/docs/docs/integrations/vectorstores/manticore_search.ipynb --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
maang-h	9aba9e3e33	community[patch]: Update the default “API URL” and “MODEL” of sparkllm (#22070) - Description: When running sparkllm, I found that the current default parameters no longer work. - original parameters & values: - spark_api_url: "wss://spark-api.xf-yun.com/v3.1/chat" - spark_llm_domain: "generalv3" ```python # example from langchain_community.chat_models import ChatSparkLLM spark = ChatSparkLLM(spark_app_id="my_app_id", spark_api_key="my_api_key", spark_api_secret="my_api_secret") spark.invoke("hello") ``` ![sparkllm](https://github.com/langchain-ai/langchain/assets/55082429/5369bfdf-4305-496a-bcf5-2d3f59d39414) So I updated them to v3.5 (matching the official SparkLLM website). After the update, they work normally. - new parameters & values: - spark_api_url: "wss://spark-api.xf-yun.com/v3.5/chat" - spark_llm_domain: "generalv3.5"	4 months ago
Martin Triska	2df8ac402a	community[minor]: Added propagation of document metadata from O365BaseLoader (#20663) Description: - Added propagation of document metadata from O365BaseLoader to FileSystemBlobLoader (O365BaseLoader uses FileSystemBlobLoader under the hood). - This is done by passing a dictionary, `metadata_dict`, where key = filename and value = a dictionary containing the document's metadata. - Modified `FileSystemBlobLoader` to accept `metadata_dict`, use `mimetype` from it (if available), and pass the metadata on to the blob loader. Issue: - Under the hood, `O365BaseLoader` downloads documents to a temp folder and then runs `FileSystemBlobLoader` on it. - However, metadata about the document is lost in this process. In particular: - `mime_type`: `FileSystemBlobLoader` guesses `mime_type` from the file extension, but that does not work 100% of the time. - `web_url`: this is useful to keep, since in RAG we might want to provide a link to the source document. To work well with document parsers, we pass `web_url` as `source` (parsers ignore `web_url` but preserve `source`). Dependencies: None Twitter handle: @martintriska1 Please review @baskaryan --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	4 months ago
Eugene Yurtsev	e5541d1da7	community[patch]: Update doc-string in CloudBlobLoader (#22069) Update doc-string	4 months ago
Philippe PRADOS	6dd621d636	community[minor]: Add CloudBlobLoader that supports loading data from cloud buckets (#21957) - Description: LangChain provides several approaches to reading different file formats: specific loaders (`CSVLoader`) or blob-compatible loaders (`FileSystemBlobLoader`). The only existing implementation of BlobLoader is `FileSystemBlobLoader`. Many projects retrieve files from cloud storage, so we propose a new implementation of `BlobLoader` that reads files from the three cloud storage systems. The interface is strictly identical to `FileSystemBlobLoader`; the only difference is the constructor, which takes a cloud "url" object such as `s3://my-bucket`, `az://my-bucket`, or `gs://my-bucket`. This implementation eliminates the need to pre-download files from cloud storage to local temporary files (which are seldom removed). The code relies on the [CloudPathLib](https://cloudpathlib.drivendata.org/stable/) library to interpret cloud URLs; it has been added as an optional dependency. ```Python loader = CloudBlobLoader("s3://mybucket/id") for blob in loader.yield_blobs(): print(blob) ``` - [X] Dependencies: CloudPathLib - [X] Twitter handle: pprados - [X] Add tests and docs: added a unit test, which is easy to convert into an integration test using some files in cloud storage (see `test_cloud_blob_loader.py`) - [X] Lint and test: ran `make format`, `make lint` and `make test` from the root of the modified package(s). Hello from Paris @hwchase17. Can you review this PR? --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>	4 months ago
Christophe Bornet	74947ec894	community[minor]: Add Cassandra ByteStore (#22064)	4 months ago
Christophe Bornet	fea6b99b16	community[minor]: Add async methods to CassandraChatMessageHistory (#21975)	4 months ago
Sky	12d65f17ff	community[patch]: surrealdb provide functions for MMR (Maximal Marginal Relevance) (#21185) This PR adds 4 functions: - max_marginal_relevance_search_by_vector - amax_marginal_relevance_search_by_vector - max_marginal_relevance_search - amax_marginal_relevance_search I'm no LangChain expert, but I inspected other vectorstore sources, such as Chroma, to build these functions for SurrealDB. If anyone has changes to suggest, please let me know. Otherwise, I would be happy to see these changes added to the repository, so that I can use the original repo instead of my locally monkey-patched version. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Bruno Alvisio	5eabe90494	community[patch]: Adding HEADER to the list of supported locations (#21946) Description: adds headers to the list of supported locations when generating the OpenAI function schema	4 months ago
Bagatur	50186da0a1	infra: rm unused # noqa violations (#22049) Updating #21137	4 months ago
acho98	45ed5f3f51	community[minor]: Add Clova Embeddings for LangChain Community (#21890) - Adds Naver ClovaX embeddings to the LangChain community package. - HyperClovaX is a large language model developed by [Naver](https://clova-x.naver.com/welcome); it is a powerful, purpose-trained LLM. - The embedding service is provided by [ClovaX](https://www.ncloud.com/product/aiService/clovaStudio). - You can obtain CLOVA_EMB_API_KEY, CLOVA_EMB_APIGW_API_KEY and CLOVA_EMB_APP_ID from https://www.ncloud.com/product/aiService/clovaStudio --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
arpitkumar980	444c2a3d9f	community[patch]: sharepoint loader identity enabled (#21176) --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	4 months ago
HuiyuanYan	bf3aefce93	community[patch]: Update tongyi.py to support MultimodalConversation in dashscope. (#21249) Adds support for multimodal conversation in DashScope: the multimodal models "qwen-vl-v1", "qwen-vl-chat-v1" and "qwen-audio-turbo" can now be used to process pictures and audio. - Description: add multimodal conversation support in DashScope - Issue: N/A - Dependencies: dashscope≥1.18.0 - Twitter handle: none - How to use it: ```python Tongyi_chat = ChatTongyi( top_p=0.5, dashscope_api_key=api_key, model="qwen-vl-v1" ) response = Tongyi_chat.invoke( input = [ { "role": "user", "content": [ {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}, {"text": "What is this?"} ] } ] ) ``` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
MSubik	d948783a4c	community[patch]: standardize init args, update for javelin sdk release. (#21980) Related to [20085](https://github.com/langchain-ai/langchain/issues/20085). Updated the Javelin chat model to standardize its initialization arguments. Also fixed an existing bug where the code made an incorrect call to the JavelinClient defined in javelin_sdk, resulting in an initialization error. See the related [Javelin Documentation](https://docs.getjavelin.io/docs/javelin-python/quickstart).	4 months ago
Mohammad Mohtashim	16617dd239	community[patch]: AzureSearchVectorStoreRetriever Fixed to account for search_kwargs (#21572) - Description: Fixed `AzureSearchVectorStoreRetriever` to take `search_kwargs` into account. See the linked issue for more details. - Issue: #21492 --------- Co-authored-by: MAC <mac@MACs-MacBook-Pro.local> Co-authored-by: Massimiliano Pronesti <massimiliano.pronesti@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Jerron Lim	28456c2c33	community[patch]: add args_schema to WikipediaQueryRun (#22019) Description: This change adds args_schema (a pydantic BaseModel) to WikipediaQueryRun so that the schema is formatted correctly in LLM function calls. Issue: currently, using WikipediaQueryRun with OpenAI function calling returns the following error: "TypeError: WikipediaQueryRun._run() got an unexpected keyword argument '__arg1' ". This happens because the schema sent to the LLM is "input: '{"__arg1":"Hunter x Hunter"}'", while the method should be called with the "query" parameter. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Mazen Ramadan	3c1d77dd64	community[minor]: Add Scrapfly Loader community integration (#22036) Added [Scrapfly](https://scrapfly.io/) Web Loader integration. Scrapfly is a web scraping API that allows extracting web page data into accessible markdown or text datasets. - __Description__: Added Scrapfly web loader for retrieving web page data as markdown or text. - Dependencies: scrapfly-sdk - Twitter: @thealchemi1st --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
ccurme	b51a1eba4d	langchain, community: move OpenAIAssistantV2Runnable to community (#22044)	4 months ago
CaroFG	6b98140b38	community[patch]: update for compatibility with Meilisearch v1.8 (#21979) - Description: Updates the Meilisearch vectorstore for compatibility with v1.8. Adds [`"showRankingScore": true`](https://www.meilisearch.com/docs/reference/api/search#ranking-score) to the search parameters and replaces the `_semanticScore` field with `_rankingScore`. - [x] Lint and test: ran `make format`, `make lint` and `make test` from the root of the modified package(s).	4 months ago
Oleksii Pokotylo	98c0b093bb	community[patch]: Extend AzureSearch with `maximal_marginal_relevance`, `from_embeddings` (#21065) Description: - Extend AzureSearch with `maximal_marginal_relevance` (for vector and hybrid search) - Add a `from_embeddings` constructor, for when the user has already embedded the texts - Add `add_embeddings` - Refactor common parts (`_simple_search`, `_results_to_documents`, `_reorder_results_with_maximal_marginal_relevance`) - Add `vector_search_dimensions` as a constructor parameter to avoid extra calls to `embed_query` (most of the time the user applies the same model and knows the dimension) Issue: none Dependencies: none - [x] Add tests and docs: Docstrings have been added to the new functions and unified for the existing ones. The example notebook illustrates the main usage of AzureSearch well; adding the new methods would only dilute the main content. - [x] Lint and test --------- Co-authored-by: Oleksii Pokotylo <oleksii.pokotylo@pwc.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
SaschaStoll	709664a079	community[patch]: Performant filter columns option for Hanavector (#21971) Description: A backwards-compatible extension of the HanaDB initialization interface that lets the user specify `specific_metadata_columns`, which store the metadata of selected keys in dedicated columns and thereby improve filter performance. Any metadata not listed remains in the general metadata column as part of a JSON string. Also switched to `executemany` for batch inserts into HanaDB. Issue: N/A Dependencies: no new dependencies added Twitter handle: @sapopensource --------- Co-authored-by: Martin Kolb <martin.kolb@sap.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Eric Zhang	e7e41eaabe	langchain: add RankLLM Reranker (#21171) Integrate RankLLM reranker (https://github.com/castorini/rank_llm) into LangChain. An example notebook is given in `docs/docs/integrations/retrievers/rankllm-reranker.ipynb` --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	4 months ago
maang-h	fc93bed8c4	community: Fix CSVLoader columns is None (#20701) - Buggy code: langchain_community/document_loaders/csv_loader.py:100 - Description: currently, when `CSVLoader` reads a column name that is None from a CSV file, it raises an error, because `CSVLoader` does not check whether the column name is a str and does not handle the corresponding row data when the column name is None. This PR provides a fix. - Issue: fixes #20699 - Reasoning: 1. Take the handling already used at langchain_community/document_loaders/csv_loader.py:100 when `v` is None and apply the same handling to `k`. (With `csv.DictReader`, `k` is None only when a row has more fields than there are columns.) 2. `k` is None only for the last column, and its corresponding `v` is a list. Therefore, following the data format in `Document`, I joined the list elements with ','. (I'm not sure whether this format is acceptable; if you have other ideas, let me know.) --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	4 months ago
Sihan Chen	1f81277b9b	community[minor]: allow enabling proxy in aiohttp session in AsyncHTML (#19499) Allow enabling a proxy in the aiohttp session of the async HTML loader.	4 months ago
Eugene Yurtsev	36813d2f00	community[patch]: Fix remaining __inits__ in community (#22037) Fixes the `__init__` files in community to use a statically defined `__all__`.	4 months ago
Eugene Yurtsev	58360a1e53	community[patch]: Add unit test to verify that init is correctly defined (#22030) Fix some __init__ files and add a unit test	4 months ago
Matthew Hoffman	4f2e3bd7fd	community[patch]: fix public interface for embeddings module (#21650) ## Description The existing public interface for `langchain_community.embeddings` is broken. In this file, `__all__` is statically defined but is subsequently overwritten with a dynamic expression, which type checkers like pyright do not support. pyright gives the following diagnostic on the line I am requesting we remove: [reportUnsupportedDunderAll](https://github.com/microsoft/pyright/blob/main/docs/configuration.md#reportUnsupportedDunderAll): ``` Operation on "__all__" is not supported, so exported symbol list may be incorrect ``` Currently, I get the following errors when attempting to use publicly exported classes in `langchain_community.embeddings`: ```python import langchain_community.embeddings langchain_community.embeddings.HuggingFaceEmbeddings(...) # error: "HuggingFaceEmbeddings" is not exported from module "langchain_community.embeddings" (reportPrivateImportUsage) ``` This is easily solved by removing the dynamic expression.	4 months ago
Tomaz Bratanic	d8a1f1114d	community[patch]: Handle exceptions where node props aren't consistent in neo4j schema (#22027)	4 months ago
WeichenXu	b0ef5e778a	community[patch]: Fix ChatDatabricks in case that streaming response doesn't have role field in delta chunk (#21897) Description: Fix ChatDatabricks for the case where the streaming response doesn't have a role field in the delta chunk - [X] Lint and test: ran `make format`, `make lint` and `make test` from the root of the modified package(s). --------- Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	4 months ago
Kefan You	24b5c27bb1	community[patch]: raise_for_status logic missing in async _fetch of WebBaseLoader (#21948) ## The 'raise_for_status' parameter of WebBaseLoader works in sync load but not in async load. In WebBaseLoader, sync load calls `_scrape`, which handles `raise_for_status` properly: ``` def _scrape( self, url: str, parser: Union[str, None] = None, bs_kwargs: Optional[dict] = None, ) -> Any: from bs4 import BeautifulSoup if parser is None: if url.endswith(".xml"): parser = "xml" else: parser = self.default_parser self._check_parser(parser) html_doc = self.session.get(url, **self.requests_kwargs) if self.raise_for_status: html_doc.raise_for_status() if self.encoding is not None: html_doc.encoding = self.encoding elif self.autoset_encoding: html_doc.encoding = html_doc.apparent_encoding return BeautifulSoup(html_doc.text, parser, **(bs_kwargs or {})) ``` Async load calls `_fetch`, which is missing the `raise_for_status` logic: ``` async def _fetch( self, url: str, retries: int = 3, cooldown: int = 2, backoff: float = 1.5 ) -> str: async with aiohttp.ClientSession() as session: for i in range(retries): try: async with session.get( url, headers=self.session.headers, ssl=None if self.session.verify else False, cookies=self.session.cookies.get_dict(), ) as response: return await response.text() ``` Co-authored-by: kefan.you <darkfss@sina.com>	4 months ago
Pengcheng Liu	4cf523949a	community[patch]: Update model client to support vision model in Tong… (#21474) - Description: Tongyi uses different clients for chat models and vision models. This PR chooses the proper client based on the model name, so that both chat and vision models are supported. See the [Tongyi documentation](https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-qianwen-vl-plus-api?spm=a2c4g.11186623.0.0.27404c9a7upm11) for details. ``` from langchain_core.messages import HumanMessage from langchain_community.chat_models import ChatTongyi llm = ChatTongyi(model_name='qwen-vl-max') image_message = { "image": "https://lilianweng.github.io/posts/2023-06-23-agent/agent-overview.png" } text_message = { "text": "summarize this picture", } message = HumanMessage(content=[text_message, image_message]) llm.invoke([message]) ``` - Issue: None - Dependencies: None - Twitter handle: None	4 months ago
Sevin F. Varoglu	1bc0ea5496	community[patch]: update OctoAIEmbeddings to subclass OpenAIEmbeddings (#21805)	4 months ago
Bagatur	72d4a8eeed	community[patch]: AzureSearch dont overwrite default async (#21989)	4 months ago
Yulong Wang	8e1aeb8ad5	community[patch]: Fix typo in arxiv tool's doc (#21970) Fix typo in arxiv tool's doc	4 months ago
Robert Caulk	54adcd9e82	community[minor]: add AskNews retriever and AskNews tool (#21581) We add a tool and a retriever for the [AskNews](https://asknews.app) platform, with example notebooks. The retriever can be invoked with: ```py from langchain_community.retrievers import AskNewsRetriever retriever = AskNewsRetriever(k=3) retriever.invoke("impact of fed policy on the tech sector") ``` to retrieve 3 news documents related to the impact of Fed policy on the tech sector. The included notebook also covers controlling filters such as category and time, and using the retriever in a chain. The tool lets the agent decide how to obtain the news, by forming a query and deciding how far back in time to look: ```py from langchain_community.tools.asknews import AskNewsSearch from langchain import hub from langchain.agents import AgentExecutor, create_openai_functions_agent from langchain_openai import ChatOpenAI instructions = """You are an assistant.""" base_prompt = hub.pull("langchain-ai/openai-functions-template") prompt = base_prompt.partial(instructions=instructions) llm = ChatOpenAI(temperature=0) asknews_tool = AskNewsSearch() tools = [asknews_tool] agent = create_openai_functions_agent(llm, tools, prompt) agent_executor = AgentExecutor( agent=agent, tools=tools, verbose=True, ) agent_executor.invoke({"input": "How is the tech sector being affected by fed policy?"}) ``` --------- Co-authored-by: Emre <e@emre.pm>	4 months ago
Jesse S	fc79b372cb	community[minor]: add aerospike vectorstore integration (#21735) Please let me know if you see any possible areas of improvement. I would very much appreciate your constructive criticism if time allows. Description: - Added an Aerospike vector store integration that uses the [Aerospike-Vector-Search](https://aerospike.com/products/vector-database-search-llm/) add-on. - Added both unit tests and integration tests - Added a docker compose file for spinning up a test environment - Added a notebook Dependencies: - aerospike-vector-search Twitter handle: - none; you can use my GitHub handle or LinkedIn if you'd like Thanks! --------- Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Prince Canuma	3587c60396	community[patch]: Fix MLX LLM Stream (#20575) Closes #20561 This PR fixes the MLX LLM stream `AttributeError`. `mlx-lm` recently changed its token decoding logic, which broke the LC+MLX integration. I also made minor fixes, such as repairing a broken link in a docs example and enforcing pipeline arguments (max_tokens, temp, etc.) for invoke. - Issue: #20561 - Twitter handle: @Prince_Canuma	4 months ago
Rahul Triptahi	96bd0b0844	community[patch]: Remove redundant pebblo cloud api call (#21589) Description: removed a redundant Pebblo cloud API call and changed the classified `doc` key to `ai_apps_data`. Documentation: N/A Unit tests: N/A	4 months ago
Param Singh	d07885f8b7	community[patch]: standardized sparkllm init args (#21633) Related to #20085 @baskaryan community:sparkllm[patch]: standardized init args. Updated `spark_api_key` so that it is aliased to `api_key`, and added an integration test for `sparkllm` to check that it continues to set the same underlying attribute. Updated `temperature` to use a Pydantic Field and added it to the integration test. Ran `make format`, `make test`, `make lint`, `make spell_check`.	4 months ago
Dhruv Chawla	d4359d3de6	community[patch]: Update UpTrain Callback Handler to support the new UpTrain evaluation schema (#21656) UpTrain now has a new dashboard that makes it easier to view projects and evaluations. Using it requires specifying both project_name and evaluation_name when performing evaluations; I have updated the code to support this.	4 months ago
Alex Riina	c0e3c3a350	openai[patch], community[patch]: add pricing and max context window for GPT-4o (#21673) # Add pricing and max context window for GPT-4o - community: add cost per 1k tokens and max context window - partners: add max context window Description: adds static information about GPT-4o based on https://openai.com/api/pricing/ and https://platform.openai.com/docs/models/gpt-4o so that GPT-4o reporting is accurate. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
缨缨	bd39b2ccdf	community: enable SupabaseVectorStore to support extended table fields (#21762 ) Thank you for contributing to LangChain! - [x] PR title: "community: enable SupabaseVectorStore to support extended table fields" - [x] PR message: - Added extension fields to the function _add_vectors so that users can add other custom fields when inserting a record into the database, e.g.: ![image](https://github.com/langchain-ai/langchain/assets/10885578/e1d5ca20-936e-4cab-ba69-8fdd23b8ce8f) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Jens	b0b302ec6b	community[patch]: fixed aleph alpha default embedding request (#21826 ) - Description: In the aleph alpha client, the parameter `normalize` is not optional. Setting it to `None` gives an error. - Dependencies: None Co-authored-by: Jens Lücke <jens.luecke@tngtech.com> Co-authored-by: Jens <jens.luecke@hu-berlin.de> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	4 months ago
Jorge Piedrahita Ortiz	e6207ad4f3	community[patch]: Sambanova integration api update (#21848 ) - Description: Added SambaStudio generic endpoint compatibility, improved error descriptions and handling, and added streaming examples.	4 months ago
Liuww	332ffed393	community[patch]: Adopting the lighter-weight xinference_client (#21900 ) While integrating the xinference_embedding, we observed that the downloaded dependency package is quite substantial in size. With a focus on resource optimization and efficiency, if the project requirements are limited to its vector processing capabilities, we recommend migrating to the xinference_client package. This package is more streamlined, significantly reducing the storage space requirements of the project and maintaining a feature focus, making it particularly suitable for scenarios that demand lightweight integration. Such an approach not only boosts deployment efficiency but also enhances the application's maintainability, rendering it an optimal choice for our current context. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Jiří Spilka	6499897c87	community[patch]: update apify integration to attribute API activity to langchain (#21909 ) Description: Add `Origin/langchain` to Apify's client's user-agent to attribute API activity to LangChain (at Apify, we aim to monitor our integrations to evaluate whether we should invest more in the LangChain integration regarding functionality and content) Issue: None Dependencies: None Twitter handle: None	4 months ago
Tomaz Bratanic	d85e46321a	community[patch]: Better error message for neo4j vector when text is null (#21861 )	4 months ago
WilliamEspegren	30bca57aae	doc list not empty (#21208 ) Make sure the doc list is not empty, and set Metadata: true in param, to enable the user to disable metadata for slightly faster crawls.	4 months ago
TJ	8cd6ed3e1e	community[patch]: Update documentation string in databricks chat model (#21915 ) Fix typos in the documentation string of the databricks chat model	4 months ago
Eugene Yurtsev	e3f30b4cde	docs: clean up link to bing search (#21825 ) Documentation should be inlined, not linking to medium article.	4 months ago
Sen Lin	eb7f07ae36	community[patch]: fix typo in ValueError message in load_local function (#21818 ) Description: Corrected an error in the `allow_dangerous_deserialization` message within the `load_local` functions	4 months ago
Jorge Piedrahita Ortiz	700b1c7212	community: sambaverse api update (#21816 ) - Description: fix sambaverse integration to make it compatible with sambaverse API update / minor changes in docs	4 months ago
maang-h	9f8d18c028	community[patch]: Fix unintended newline in print statement in exception for BaichuanTextEmbeddings (#21820 ) - Code: langchain_community/embeddings/baichuan.py:82 - Description: When an error occurs while using `BaichuanTextEmbeddings` (for example, with an invalid API key), the printed error message contains an unnecessary line break. ```python # example from langchain_community.embeddings import BaichuanTextEmbeddings # error key BAICHUAN_API_KEY = "sk-xxxxxxxxxxxxx" embeddings = BaichuanTextEmbeddings(baichuan_api_key=BAICHUAN_API_KEY) text_1 = "今天天气不错" query_result = embeddings.embed_query(text_1) ``` ![unintended newline](https://github.com/langchain-ai/langchain/assets/55082429/e1178ce8-62bb-405d-a4af-e3b28eabc158)	4 months ago
Eugene Yurtsev	8607735b80	langchain[patch],community[patch]: Move unit tests that depend on community to community (#21685 )	4 months ago
Marco Lamina	d0fae6cd54	community: Add token cost for GPT-4o model (#21771 ) Adding [token cost for the new GPT-4o model](https://openai.com/api/pricing/): * Input cost US$5.00 / 1M tokens * Output cost US$15.00 / 1M tokens	4 months ago
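As a worked check of the GPT-4o figures above: US$5.00 per 1M input tokens is US$0.005 per 1K tokens, and US$15.00 per 1M output tokens is US$0.015 per 1K tokens. The sketch below computes a per-call cost from those per-1K rates; the constant and function names are hypothetical and this is not the `langchain_community` OpenAI callback code.

```python
# Hypothetical per-1K-token rates derived from the GPT-4o prices quoted above.
GPT4O_INPUT_PER_1K = 5.00 / 1000    # US$5.00 per 1M input tokens
GPT4O_OUTPUT_PER_1K = 15.00 / 1000  # US$15.00 per 1M output tokens


def gpt4o_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated cost in US dollars for a single GPT-4o call."""
    return (
        prompt_tokens / 1000 * GPT4O_INPUT_PER_1K
        + completion_tokens / 1000 * GPT4O_OUTPUT_PER_1K
    )


if __name__ == "__main__":
    # 1,000 prompt tokens + 1,000 completion tokens: 0.005 + 0.015 = 0.02 USD
    print(round(gpt4o_cost_usd(1000, 1000), 6))
```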
Massimiliano Pronesti	0c0db7c5db	feat(community): support semantic hybrid score threshold in Azure AI Search (#21527 ) Support semantic hybrid search with a score threshold -- similar to what we do for similarity search and for hybrid search (#20907).	4 months ago
Stefano Lottini	040597e832	community: init signature revision for Cassandra LLM cache classes + small maintenance (#17765 ) This PR improves on the `CassandraCache` and `CassandraSemanticCache` classes, mainly in the constructor signature, and also introduces several minor improvements around these classes. ### Init signature A (sigh) breaking change is tentatively introduced to the constructor. To me, the advantages outweigh the possible discomfort: the new syntax places the DB-connection objects `session` and `keyspace` later in the param list, so that they can be given a default value. This is what enables the pattern of _not_ specifying them, provided one has previously initialized the Cassandra connection through the versatile utility method `cassio.init(...)`. In this way, a much less unwieldy instantiation can be done, such as `CassandraCache()` and `CassandraSemanticCache(embedding=xyz)`, everything else falling back to defaults. A downside is that, compared to the earlier signature, this might turn out to be breaking for those doing positional instantiation. As a way to mitigate this problem, this PR typechecks its first argument trying to detect the legacy usage. (And to make this point less tricky in the future, most arguments are left to be keyword-only). If this is considered too harsh, I'd like guidance on how to further smoothen this transition. Our plan is to make the pattern of optional session/keyspace a standard across all Cassandra classes, so that a repeatable strategy would be ideal. A possibility would be to keep positional arguments for legacy reasons but issue a deprecation warning if any of them is actually used, to later remove them with 0.2 - please advise on this point. ### Other changes - class docstrings: enriched, completely moved to class level, added note on `cassio.init(...)` pattern, added tiny sample usage code. - semantic cache: revised terminology to never mention "distance" (it is in fact a similarity!). 
Kept the legacy constructor param with a deprecation warning if used. - `llm_caching` notebook: uniform flow with the Cassandra and Astra DB separate cases; better and Cassandra-first description; all imports made explicit and from community where appropriate. - cache integration tests moved to community (incl. the imported tools), env var bugfix for `CASSANDRA_CONTACT_POINTS`. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	4 months ago
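The constructor pattern described in the Cassandra cache PR (optional, keyword-only connection arguments that fall back to a globally initialized connection, plus a type check on the first argument to catch legacy positional calls) can be sketched in plain Python as below. Everything here is hypothetical: `FakeSession`, `init_defaults` (standing in for `cassio.init(...)`), the class name, and the default values are illustrative, and this sketch raises on legacy usage whereas the actual PR may handle it differently.

```python
from typing import Any, Optional


class FakeSession:
    """Stand-in for a Cassandra driver Session (hypothetical, for illustration)."""


# Hypothetical module-level defaults, playing the role of what cassio.init(...) sets up.
_DEFAULT_SESSION: Optional[FakeSession] = None
_DEFAULT_KEYSPACE: Optional[str] = None


def init_defaults(session: FakeSession, keyspace: str) -> None:
    """Register a global session/keyspace, similar in spirit to cassio.init(...)."""
    global _DEFAULT_SESSION, _DEFAULT_KEYSPACE
    _DEFAULT_SESSION, _DEFAULT_KEYSPACE = session, keyspace


class SemanticCacheSketch:
    """Hypothetical cache whose DB-connection arguments are optional and keyword-only."""

    def __init__(
        self,
        embedding: Any = None,
        *,
        session: Optional[FakeSession] = None,
        keyspace: Optional[str] = None,
        table_name: str = "semantic_cache_sketch",
        similarity_threshold: float = 0.85,
    ) -> None:
        # Legacy call sites passed the session positionally as the first argument;
        # detect that by type and fail loudly instead of misinterpreting it.
        if isinstance(embedding, FakeSession):
            raise ValueError(
                "Legacy positional usage detected: pass session=... and keyspace=... as keywords."
            )
        self.embedding = embedding
        # Fall back to the globally registered connection when none is given.
        self.session = session if session is not None else _DEFAULT_SESSION
        self.keyspace = keyspace if keyspace is not None else _DEFAULT_KEYSPACE
        if self.session is None or self.keyspace is None:
            raise ValueError("No session/keyspace given and no defaults registered.")
        self.table_name = table_name
        self.similarity_threshold = similarity_threshold
```

With defaults registered once, instantiation shrinks to something like `SemanticCacheSketch(embedding=my_embedding)`, which mirrors the `CassandraSemanticCache(embedding=xyz)` usage the PR describes.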
Kyle Cassidy	eca8c4bcc6	Standardized openai init params (#21739 ) ## Patch Summary community:openai[patch]: standardize init args ## Details I made changes to the OpenAI Chat API wrapper test in the Langchain open-source repository - File: `libs/community/tests/unit_tests/chat_models/test_openai.py` - Changes: - Updated `max_retries` with Pydantic Field - Updated the corresponding unit test - Related Issues: #20085 - Updated max_retries with Pydantic Field, updated the unit test. --------- Co-authored-by: JuHyung Son <sonju0427@gmail.com>	4 months ago
Ethan Yang	e44b448ec3	community: update openvino doc with streaming support (#21519 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	4 months ago
ccurme	19e6bf814b	community: fix CI (#21766 )	4 months ago
Mish Ushakov	d77e60a7f4	community: updated Browserbase loader (#21757 ) Thank you for contributing to LangChain! - [x] PR title: "community: updated Browserbase loader" - [x] PR message: Updates the Browserbase loader with more options and improved docs. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/	4 months ago
Cheese	0ead09f84d	community: Implement `bind_tools` for ChatTongyi (#20725 ) ## Description Implement `bind_tools` in ChatTongyi. Usage example: ```py from langchain_core.tools import tool from langchain_community.chat_models.tongyi import ChatTongyi @tool def multiply(first_int: int, second_int: int) -> int: """Multiply two integers together.""" return first_int * second_int llm = ChatTongyi(model="qwen-turbo") llm_with_tools = llm.bind_tools([multiply]) msg = llm_with_tools.invoke("What's 5 times forty two") print(msg) ``` Streaming is also supported. ## Dependencies No Dependency is required for this change. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	4 months ago
Harrison Chase	15be439719	Harrison/move flashrank rerank (#21448 ) third party integration, should be in community	4 months ago
Rajendra Kadam	54e003268e	langchain[minor]: Add PebbloRetrievalQA chain with Identity & Semantic Enforcement support (#20641 ) - Description: PebbloRetrievalQA chain introduces identity enforcement using vector-db metadata filtering - Dependencies: None - Issue: None - Documentation: Adding documentation for PebbloRetrievalQA chain in a separate PR(https://github.com/langchain-ai/langchain/pull/20746) - Unit tests: New unit-tests added --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>	4 months ago
Anush	edd68e4ad4	qdrant: init package (#21146 ) ## Description This PR introduces the new `langchain-qdrant` partner package, intending to deprecate the community package. ## Changes - Moved the Qdrant vector store implementation to `/libs/partners/qdrant`, along with integration tests. - The conditional imports of the client library are now regular imports, with minor implementation improvements. - Added a deprecation warning to `langchain_community.vectorstores.qdrant.Qdrant`. - Replaced references/imports from `langchain_community` with either `langchain_core` or by moving the definitions to the `langchain_qdrant` package itself. - Updated the Qdrant vector store documentation to reflect the changes. ## Testing - `QDRANT_URL` and [`QDRANT_API_KEY`](`583e36bf6b`) env values need to be set to [run integration tests](`d608c93d1f`) in the [cloud](https://cloud.qdrant.tech). - If a Qdrant instance is running at `http://localhost:6333`, the integration tests will use it too. - By default, tests use an [`in-memory`](https://github.com/qdrant/qdrant-client?tab=readme-ov-file#local-mode) instance (not comprehensive). --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Erick Friis <erickfriis@gmail.com>	4 months ago
Prashanth Rao	63c3a0e56c	[community][graph]: Update KuzuQAChain and docs (#21218 ) This PR makes some small updates to `KuzuQAChain` for graph QA. - Updated the Cypher generation prompt (we now support `WHERE EXISTS`) and generalized it further - Support different LLMs for Cypher generation and QA - Update docs and examples	4 months ago
Jofthomas	afd85b60fc	huggingface: init package (#21097 ) First PR for the langchain_huggingface partner package - Moved some of the Hugging Face-related classes from `community` to the new `partner package` Still needed: - Documentation - Tests - Support for the new apply_chat_template in `ChatHuggingFace` - Confirm the choice of classes to support for embeddings with the sentence-transformer team. cc: @efriis --------- Co-authored-by: Cyril Kondratenko <kkn1993@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	4 months ago
Tomaz Bratanic	9fce03e7db	community[patch]: Fix neo4j enhanced schema (#21582 )	4 months ago
Christophe Bornet	66a4da8ad0	community[patch]: Improve Cassandra VectorStore docstrings (#21620 )	4 months ago
Eugene Yurtsev	25fbe356b4	community[patch]: upgrade to recent version of mypy (#21616 ) This PR upgrades community to a recent version of mypy. It inserts type: ignore on all existing failures.	4 months ago
Jorge Piedrahita Ortiz	4378fbbef0	community[patch]: Fix typos in Sambanova integration doc-strings (#21617 ) - Description: Updated the Sambanova integration docstrings, which were badly formatted --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>	4 months ago
Christophe Bornet	bcf53f93e1	[community]: Add missing docstring param to CassandraLoader (#21611 )	4 months ago
Christophe Bornet	e6fa4547b1	community[minor]: Add alazy_load to AsyncHtmlLoader (#21536 ) Also fixes a bug that `_scrape` was called and was doing a second HTTP request synchronously. Twitter handle: cbornet_	4 months ago
Leonid Ganeline	500569da48	community[patch]: `vectorstores` import update (#21169 ) Issue: we have several helper functions to import third-party libraries like lancedb.import_lancedb in [community.vectorstores](https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.lancedb.import_lancedb.html#langchain_community.vectorstores.lancedb.import_lancedb). And we have core.utils.utils.guard_import that works exactly for this purpose. The import_<package> functions work inconsistently and should rather be private functions. Change: replaced these functions with the guard_import function. Related to #21133	4 months ago
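The idea behind `guard_import` (import an optional dependency lazily and fail with an install hint) can be sketched with the standard library alone. This is a simplified, hypothetical stand-in named `guard_import_sketch`, not `langchain_core.utils.utils.guard_import` itself; the default pip-name rule and the error wording are assumptions.

```python
import importlib
from types import ModuleType
from typing import Optional


def guard_import_sketch(module_name: str, *, pip_name: Optional[str] = None) -> ModuleType:
    """Import an optional dependency, or raise an ImportError with an install hint.

    Simplified stand-in for langchain_core.utils.utils.guard_import;
    the real helper's signature and message may differ.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Assumption: derive the pip name from the top-level package name.
        package = pip_name or module_name.split(".")[0].replace("_", "-")
        raise ImportError(
            f"Could not import {module_name} python package. "
            f"Please install it with `pip install {package}`."
        ) from e


def _import_lancedb() -> ModuleType:
    # Hypothetical private helper: the optional dependency is only imported
    # when this function is called, not when the module is loaded.
    return guard_import_sketch("lancedb")
```

Keeping the call inside a small private function matches the contribution guideline that optional dependencies are imported within a function.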
Renu Rozera	4035a1d234	Add source metadata to bedrock retriever response (#21349 ) Thank you for contributing to LangChain! - [X] PR title: "community: Add source metadata to bedrock retriever response" - [X] PR message: - Description: Bedrock retrieve API returns extra metadata in the response which is currently not returned in the retriever response - Issue: The change adds the metadata from bedrock retrieve API response to the bedrock retriever in a backward compatible way. Renamed metadata to sourceMetadata as metadata term is being used in the Document already. This is in sync with what we are doing in llama-index as well. - Dependencies: No - [X] Add tests and docs: 1. Added unit tests 2. Notebook already exists and does not need any change 3. Response from end to end testing, just to ensure backward compatibility: `[Document(page_content='Exoplanets.', metadata={'location': {'s3Location': {'uri': 's3://bucket/file_name.txt'}, 'type': 'S3'}, 'score': 0.46886647, 'source_metadata': {'x-amz-bedrock-kb-source-uri': 's3://bucket/file_name.txt', 'tag': 'space', 'team': 'Nasa', 'year': 1946.0}})]` - [X] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Piyush Jain <piyushjain@duck.com>	5 months ago
roiperlman	9992beaff9	community: Add arguments to whisper parser (#20378 ) Description: Added a few additional arguments to the whisper parser, which can be consumed by the underlying API. The prompt is especially important to fine-tune transcriptions. --------- Co-authored-by: Roi Perlman <roi@fivesigmalabs.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	5 months ago
Yash	cb31c3611f	Ndb enterprise (#21233 ) Description: Adds NeuralDBClientVectorStore, which is our enterprise client, to LangChain. --------- Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com> Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>	5 months ago
Oguz Vuruskaner	5b35f077f9	[community][fix](DeepInfraEmbeddings): Implement chunking for large batches (#21189 ) Description: This PR introduces chunking logic to the `DeepInfraEmbeddings` class to handle large batch sizes without exceeding maximum batch size of the backend. This enhancement ensures that embedding generation processes large batches by breaking them down into smaller, manageable chunks, each conforming to the maximum batch size limit. Issue: Fixes #21189 Dependencies: No new dependencies introduced.	5 months ago
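The chunking approach described for `DeepInfraEmbeddings` amounts to splitting the input into consecutive slices no larger than the backend's limit, embedding each slice, and concatenating results in order. The sketch below is a generic illustration: `MAX_BATCH_SIZE`'s value, `chunked`, `embed_in_chunks`, and the `embed_batch` callback are hypothetical, not the class's actual internals or DeepInfra's documented limit.

```python
from typing import Callable, Iterator, List, Sequence

MAX_BATCH_SIZE = 1024  # placeholder limit for illustration, not DeepInfra's documented value


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start : start + size]


def embed_in_chunks(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = MAX_BATCH_SIZE,
) -> List[List[float]]:
    """Embed `texts` with one `embed_batch` call per chunk, preserving input order."""
    embeddings: List[List[float]] = []
    for chunk in chunked(texts, batch_size):
        embeddings.extend(embed_batch(chunk))
    return embeddings
```

Because each chunk's results are appended in order, the i-th output vector still corresponds to the i-th input text, which is what callers of `embed_documents` rely on.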
Sokolov Fedor	f4ddf64faa	community: Add MarkdownifyTransformer to langchain_community.document_transformers (#21247 ) - Added a new document_transformer, MarkdownifyTransformer, which uses the `markdownify` package with customizable options to convert HTML to Markdown. It's similar to Html2TextTransformer but has more flexible options, and I've also noticed that MarkdownifyTransformer sometimes performs better than the html2text one, which is why I use markdownify in my project. - Added docs and tests - Usage: ```python from langchain_community.document_transformers import MarkdownifyTransformer markdownify = MarkdownifyTransformer() docs_transform = markdownify.transform_documents(docs) ``` - Example of better performance on a simple task that I've noticed: ``` <html> <head><title>Reports on product movement</title></head> <body> <p data-block-key="2wst7">The reports on product movement will be useful for forming supplier orders and controlling outcomes.</p> </body> ``` Html2TextTransformer: ```python [Document(page_content='The reports on product movement will be useful for forming supplier orders and\ncontrolling outcomes.\n\n')] # Here we can see 'and\ncontrolling', which has extra '\n' in it ``` MarkdownifyTransformer: ```python [Document(page_content='Reports on product movement\n\nThe reports on product movement will be useful for forming supplier orders and controlling outcomes.')] ``` --------- Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.bbrouter> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.local> Co-authored-by: Sokolov Fedor <f.sokolov@192.168.1.6>	5 months ago
Alex JW	d3ce6aad2e	community: Instantiate GPT4AllEmbeddings with parameters (#21238 ) ### GPT4AllEmbeddings parameters --- Description: Currently, the Embed4All class inside _GPT4AllEmbeddings_ is instantiated with its defaults, which leaves no room to customize the chosen model and its behavior. Thus: - GPT4AllEmbeddings can now be instantiated with custom parameters, such as a different model to use. --------- Co-authored-by: AlexJauchWalser <alexander.jauch-walser@knime.com>	5 months ago
Philippe PRADOS	7be68228da	community[patch]: Make sql record manager fully compatible with async (#20735 ) The `_amake_session()` method does not allow modifying the `self.session_factory` with anything other than `async_sessionmaker`. This prohibits advanced uses of `index()`. In a RAG architecture, it is necessary to import document chunks. To keep track of the links between chunks and documents, we can use the `index()` API. This API proposes to use an SQL-type record manager. In a classic use case, using `SQLRecordManager` and a vector database, it is impossible to guarantee the consistency of the import. Indeed, if a crash occurs during the import (problem with the network, ...) there is an inconsistency between the SQL database and the vector database. With the [PR](https://github.com/langchain-ai/langchain-postgres/pull/32) we are proposing for `langchain-postgres`, it is now possible to guarantee the consistency of the import of chunks into a vector database. It's possible only if the outer session is built with the connection. 
```python def main(): db_url = "postgresql+psycopg://postgres:password_postgres@localhost:5432/" engine = create_engine(db_url, echo=True) embeddings = FakeEmbeddings() pgvector:VectorStore = PGVector( embeddings=embeddings, connection=engine, ) record_manager = SQLRecordManager( namespace="namespace", engine=engine, ) record_manager.create_schema() with engine.connect() as connection: session_maker = scoped_session(sessionmaker(bind=connection)) # NOTE: Update session_factories record_manager.session_factory = session_maker pgvector.session_maker = session_maker with connection.begin(): loader = CSVLoader( "data/faq/faq.csv", source_column="source", autodetect_encoding=True, ) result = index( source_id_key="source", docs_source=loader.load()[:1], cleanup="incremental", vector_store=pgvector, record_manager=record_manager, ) print(result) ``` The same thing is possible asynchronously, but a bug in `sql_record_manager.py` in `_amake_session()` must first be fixed. ```python async def _amake_session(self) -> AsyncGenerator[AsyncSession, None]: """Create a session and close it after use.""" # FIXME: REMOVE if not isinstance(self.session_factory, async_sessionmaker):~~ if not isinstance(self.engine, AsyncEngine): raise AssertionError("This method is not supported for sync engines.") async with self.session_factory() as session: yield session ``` Then, it is possible to do the same thing asynchronously: ```python async def main(): db_url = "postgresql+psycopg://postgres:password_postgres@localhost:5432/" engine = create_async_engine(db_url, echo=True) embeddings = FakeEmbeddings() pgvector:VectorStore = PGVector( embeddings=embeddings, connection=engine, ) record_manager = SQLRecordManager( namespace="namespace", engine=engine, async_mode=True, ) await record_manager.acreate_schema() async with engine.connect() as connection: session_maker = async_scoped_session( async_sessionmaker(bind=connection), scopefunc=current_task) record_manager.session_factory = session_maker 
pgvector.session_maker = session_maker async with connection.begin(): loader = CSVLoader( "data/faq/faq.csv", source_column="source", autodetect_encoding=True, ) result = await aindex( source_id_key="source", docs_source=loader.load()[:1], cleanup="incremental", vector_store=pgvector, record_manager=record_manager, ) print(result) asyncio.run(main()) ``` --------- Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Sean <sean@upstage.ai> Co-authored-by: JuHyung-Son <sonju0427@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: YISH <mokeyish@hotmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Jason_Chen <820542443@qq.com> Co-authored-by: Joan Fontanals <joan.fontanals.martinez@jina.ai> Co-authored-by: Pavlo Paliychuk <pavlo.paliychuk.ca@gmail.com> Co-authored-by: fzowl <160063452+fzowl@users.noreply.github.com> Co-authored-by: samanhappy <samanhappy@gmail.com> Co-authored-by: Lei Zhang <zhanglei@apache.org> Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: merdan <48309329+merdan-9@users.noreply.github.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Andres Algaba <andresalgaba@gmail.com> Co-authored-by: davidefantiniIntel <115252273+davidefantiniIntel@users.noreply.github.com> Co-authored-by: Jingpan Xiong <71321890+klaus-xiong@users.noreply.github.com> Co-authored-by: kaka <kaka@zbyte-inc.cloud> Co-authored-by: jingsi <jingsi@leadincloud.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Rahul Triptahi <rahul.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Shengsheng Huang <shannie.huang@gmail.com> Co-authored-by: Michael Schock <mjschock@users.noreply.github.com> Co-authored-by: Anish Chakraborty <anish749@users.noreply.github.com> Co-authored-by: am-kinetica <85610855+am-kinetica@users.noreply.github.com> 
Co-authored-by: Dristy Srivastava <58721149+dristysrivastava@users.noreply.github.com> Co-authored-by: Matt <matthew.gotteiner@microsoft.com> Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>	5 months ago
Andreas Motl	17e42bbd18	community[patch]: pgvector: Slight refactoring to make code a bit more reusable (#16243 ) - Description: Improve [pgvector vector store adapter](https://github.com/langchain-ai/langchain/blob/v0.1.1/libs/community/langchain_community/vectorstores/pgvector.py) to make it reusable by adapters deriving from that. - Issue: NA - Dependencies: NA - References: https://github.com/crate-workbench/langchain/pull/1 - Addressed to: @eyurtsev, @cbornet Hi from the CrateDB team, first of all, thanks a stack for conceiving and maintaining LangChain. We are currently [preparing a patch](https://github.com/crate-workbench/langchain/pull/1) for adding [CrateDB](https://github.com/crate/crate) to the list of community adapters. Because CrateDB aims to be compatible with PostgreSQL to some degree, the vector store subsystem in LangChain derives functionality from the corresponding implementation for pgvector. Therefore, to make the implementation more reusable, we needed to rename the private methods `__from` and `__query_collection` to their less private counterparts `_from` and `_query_collection`, so that they can be overridden, unlocking other adapters deriving from [pgvector](https://github.com/langchain-ai/langchain/blob/v0.1.1/libs/community/langchain_community/vectorstores/pgvector.py). With kind regards, Andreas.	5 months ago
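Why the rename from `__from`/`__query_collection` to `_from`/`_query_collection` matters comes down to Python's name mangling: inside a class body, an attribute name with two leading underscores (and no trailing double underscore) is rewritten to `_ClassName__name`, so a subclass that defines a method with the same name does not override it. The toy classes below are hypothetical and are not the pgvector code; they only demonstrate the language rule.

```python
class BaseStore:
    def search(self) -> str:
        # Inside this class body, self.__query is rewritten to self._BaseStore__query.
        return self.__query()

    def __query(self) -> str:
        return "base query"


class DerivedStore(BaseStore):
    def __query(self) -> str:  # becomes _DerivedStore__query, so it is NOT an override
        return "derived query"


class OpenBaseStore:
    def search(self) -> str:
        return self._query()

    def _query(self) -> str:
        return "base query"


class OpenDerivedStore(OpenBaseStore):
    def _query(self) -> str:  # an ordinary override, picked up by search()
        return "derived query"


if __name__ == "__main__":
    print(DerivedStore().search())      # base query
    print(OpenDerivedStore().search())  # derived query
```

A single leading underscore still signals "internal" by convention while leaving the method overridable, which is what a deriving adapter such as CrateDB's needs.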
Mehrdad Shokri	f103927b88	bugfix(community): fix Playwright import paths. (#21395 ) - Description: Fix the class names imported from 'playwright.async_api' and 'playwright.sync_api' to match the correct names in the playwright tool. In the gmail tool, replace the inline guard_import with a helper function that calls guard_import, to make the code more readable. Upgrade playwright version to 1.43.0 - Issue: #21354 - Dependencies: upgrade playwright version (this is not required for the bugfix itself, just trying to keep dependencies fresh. I can remove the playwright version upgrade if you want.)	5 months ago
Shailendra Mishra	aa966b6161	Replaced bind variable in SQL with formatted string for compatibility with sql syntax. (#21439 ) Thank you for contributing to LangChain! - [ ] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.	5 months ago
Eugene Yurtsev	f92006de3c	multiple: langchain 0.2 in master (#21191 ) 0.2rc migrations - [x] Move memory - [x] Move remaining retrievers - [x] graph_qa chains - [x] some dependency from evaluation code potentially on math utils - [x] Move openapi chain from `langchain.chains.api.openapi` to `langchain_community.chains.openapi` - [x] Migrate `langchain.chains.ernie_functions` to `langchain_community.chains.ernie_functions` - [x] migrate `langchain/chains/llm_requests.py` to `langchain_community.chains.llm_requests` - [x] Moving `langchain_community.cross_encoders.base:BaseCrossEncoder` -> `langchain_community.retrievers.document_compressors.cross_encoder:BaseCrossEncoder` (namespace not ideal, but it needs to be moved to `langchain` to avoid circular deps) - [x] unit tests langchain -- add pytest.mark.community to some unit tests that will stay in langchain - [x] unit tests community -- move unit tests that depend on community to community - [x] mv integration tests that depend on community to community - [x] mypy checks Other todo - [x] Make deprecation warnings not noisy (need to use warn deprecated and check that things are implemented properly) - [x] Update deprecation messages with timeline for code removal (likely we actually won't be removing things until 0.4 release) -- will give people more time to transition their code. - [ ] Add information to deprecation warning to show users how to migrate their code base using langchain-cli - [ ] Remove any unnecessary requirements in langchain (e.g., is SQLAlchemy required?) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	5 months ago
Dobiichi-Origami	5b00885b49	community: add `bind_tools` and `with_structured_output` support to `QianfanChatEndpoint` (#21412 ) …Endpoint` Thank you for contributing to LangChain! - [x] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [x] PR message: *Delete this entire checklist* and replace with - Description: add `bind_tools` and `with_structured_output` support to `QianfanChatEndpoint` - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/	5 months ago
Leonid Ganeline	791d59a2c8	community: `callbacks` guard_imports (#21173 ) Issue: we have several helper functions to import third-party libraries like import_uptrain in [community.callbacks](https://api.python.langchain.com/en/latest/callbacks/langchain_community.callbacks.uptrain_callback.import_uptrain.html#langchain_community.callbacks.uptrain_callback.import_uptrain). And we have core.utils.utils.guard_import that works exactly for this purpose. The import_<package> functions work inconsistently and should rather be private functions. Change: replaced these functions with the guard_import function. Related to #21133	5 months ago
Rahul Triptahi	7994cba18d	[Community][Minor]: Fetch loader_source of GoogleDriveLoader in PebbloSafeLoader. (#21314 ) Description: This PR includes fix for loader_source to be fetched from metadata in case of GdriveLoaders. Documentation: NA Unit Test: NA Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	5 months ago
Eugene Yurtsev	6a1d61dbf1	community[patch]: Fix in memory vectorstore to take into account ids when adding docs (#21384 ) Should respect `ids` if passed	5 months ago
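The fix "should respect `ids` if passed" can be illustrated with a toy dict-backed store: caller-supplied ids are used as keys, and ids are generated only when none are given. `InMemoryStoreSketch` is hypothetical and omits embeddings entirely; it is not the `langchain_community` in-memory vector store, and its overwrite-on-duplicate behavior is an assumption of this sketch.

```python
import uuid
from typing import Dict, Iterable, List, Optional


class InMemoryStoreSketch:
    """Toy dict-backed store illustrating id handling (not the LangChain class)."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def add_texts(self, texts: Iterable[str], ids: Optional[List[str]] = None) -> List[str]:
        texts = list(texts)
        if ids is None:
            # Only generate ids when the caller did not supply any.
            ids = [str(uuid.uuid4()) for _ in texts]
        elif len(ids) != len(texts):
            raise ValueError("ids and texts must have the same length")
        for doc_id, text in zip(ids, texts):
            self._docs[doc_id] = text  # in this sketch, re-adding an existing id overwrites it
        return list(ids)

    def get(self, doc_id: str) -> Optional[str]:
        return self._docs.get(doc_id)
```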
Miroslav	04e2611fea	Added additional headers for HuggingFaceInferenceAPIEmbeddings endpoint. (#21282 ) Thank you for contributing to LangChain! - [ ] HuggingFaceInferenceAPIEmbeddings: "Additional Headers" - Where: langchain, community, embeddings. huggingface.py. - Community: add additional headers when needed by custom HuggingFace TEI embedding endpoints. HuggingFaceInferenceAPIEmbeddings" - [ ] PR message: *Delete this entire checklist* and replace with - Description: Adding the `additional_headers` to be passed to requests library if needed - Dependencies: none - [ ] Add tests and docs: If you're adding a new integration, please include 1. Tested with locally available TEI endpoints with and without `additional_headers` 2. Example Usage ```python embeddings=HuggingFaceInferenceAPIEmbeddings( api_key=MY_CUSTOM_API_KEY, api_url=MY_CUSTOM_TEI_URL, additional_headers={ "Content-Type": "application/json" } ) ``` Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Massimiliano Pronesti <massimiliano.pronesti@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	5 months ago
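A minimal sketch of how an `additional_headers` mapping might be combined with the authorization header before the request is sent. The helper name `build_request_headers`, the `Bearer` scheme, and the merge order (extras applied last, so they can override defaults) are assumptions for illustration, not the actual code in `huggingface.py`.

```python
from typing import Dict, Optional


def build_request_headers(
    api_key: str, additional_headers: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Merge a bearer-token header with optional caller-supplied extras.

    Hypothetical helper: extra headers are applied last, so they can
    override defaults; the real implementation's ordering may differ.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    headers.update(additional_headers or {})
    return headers
```

The resulting dict would then be passed as the `headers=` argument of a `requests.post(...)` call to the TEI endpoint.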
Guangdong Liu	1fe66f5d39	community(patch) fix MoonshotChat moonshot_api_key is invalid for api key (#21361 ) Description: close https://github.com/langchain-ai/langchain/issues/21237 @baskaryan, @eyurtsev	5 months ago


1030 Commits (34edfe4a16ea2419d3fcb3185e29d4834505a9ab)