langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-10 01:10:59 +00:00

Author	SHA1	Message	Date
Rahul Triptahi	c172611647	community[patch]: Add classifier_url argument in PebbloSafeLoader and documentation update. (#21030 ) Description: Add classifier_url argument in PebbloSafeLoader. Documentation: Updated PebbloSafeLoader documentation with above change and new links for pebblo github pages. --------- Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2024-04-29 17:41:09 -04:00
Leonid Ganeline	85094cbb3a	docs: community docstring updates (#21040 ) Added missing docstrings. Updated docstrings to a consistent format.	2024-04-29 17:40:23 -04:00
Rodrigo Nogueira	90f19028e5	community[patch]: Add maritalk streaming (sync and async) (#19203 ) Co-authored-by: RosevalJr <rdmalajr@gmail.com> Co-authored-by: Roseval Donisete Malaquias Junior <roseval@maritaca.ai>	2024-04-29 21:31:14 +00:00
Cahid Arda Öz	cc6191cb90	community[minor]: Add support for Upstash Vector (#20824 ) ## Description Adding `UpstashVectorStore` to utilize [Upstash Vector](https://upstash.com/docs/vector/overall/getstarted)! #17012 was opened to add Upstash Vector to langchain but was closed to wait for filtering. Now that filtering has been added to Upstash Vector, we are opening a new PR. Additionally, the [embedding feature](https://upstash.com/docs/vector/features/embeddingmodels) was added, and we support it in our vectorstore as well. ## Dependencies [upstash-vector](https://pypi.org/project/upstash-vector/) should be installed to use `UpstashVectorStore`. Didn't update dependencies because of [this comment in the previous PR](https://github.com/langchain-ai/langchain/pull/17012#pullrequestreview-1876522450). ## Tests Tests are added and they pass. Tests are naturally network bound since Upstash Vector is offered through an API. There was [a discussion in the previous PR about mocking the unittests](https://github.com/langchain-ai/langchain/pull/17012#pullrequestreview-1891820567). We didn't make changes to this end yet. We can update the tests if you can explain how the tests should be mocked. --------- Co-authored-by: ytkimirti <yusuftaha9@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-29 17:25:01 -04:00
chyroc	3e241956d3	community[minor]: add coze chat model (#20770 ) add coze chat model, to call coze.com apis	2024-04-29 12:26:16 -04:00
Massimiliano Pronesti	ce89b34fc0	community[patch]: support hybrid search with threshold in Azure AI Search Retriever (#20907 ) Support hybrid search with a score threshold -- similar to what we do for similarity search.	2024-04-29 12:11:44 -04:00
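A minimal sketch of the score-threshold idea, assuming results arrive as `(document, score)` pairs where a higher score means a better match. `filter_by_threshold` is a hypothetical helper for illustration, not part of the Azure AI Search retriever API.

```python
from typing import List, Tuple


def filter_by_threshold(
    results: List[Tuple[str, float]], score_threshold: float
) -> List[Tuple[str, float]]:
    """Keep only (document, score) pairs whose score meets the threshold.

    Hypothetical helper: assumes higher scores mean better matches and keeps
    the input order, which is presumed to be best-first already.
    """
    return [(doc, score) for doc, score in results if score >= score_threshold]


hits = [("doc-a", 0.91), ("doc-c", 0.77), ("doc-b", 0.42)]
print(filter_by_threshold(hits, 0.75))  # [('doc-a', 0.91), ('doc-c', 0.77)]
```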
Andrei Panferov	b3efa38cc0	community[patch]: GigaChat model selection fix (#20988 ) Fixed a bug where the model name was never put into the GigaChat request payload, so requests always defaulted to `GigaChat-Lite`. With this fix, model selection through ```python import os from langchain.chat_models.gigachat import GigaChat chat = GigaChat( name="GigaChat-Pro", # <- HERE!!!!! ... ) ``` should actually work, as intended [here](`804390ba4b/libs/community/langchain_community/llms/gigachat.py (L36)`). --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-29 16:08:26 +00:00
Patrick McFadin	3331865f6b	community[minor]: add Cassandra Database Toolkit (#20246 ) Description: ToolKit and Tools for accessing data in a Cassandra Database primarily for Agent integration. Initially, this includes the following tools: - `cassandra_db_schema` Gathers all schema information for the connected database or a specific schema. Critical for the agent when determining actions. - `cassandra_db_select_table_data` Selects data from a specific keyspace and table. The agent can pass parameters for a predicate and limits on the number of returned records. - `cassandra_db_query` Experimental alternative to `cassandra_db_select_table_data` which takes a query string completely formed by the agent instead of parameters. May be removed in future versions. Includes unit test and two notebooks to demonstrate usage. Dependencies: cassio Twitter handle: @PatrickMcFadin --------- Co-authored-by: Phil Miesle <phil.miesle@datastax.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-29 15:51:43 +00:00
Igor Brai	b3e74f2b98	community[minor]: add mojeek search util (#20922 ) Description: This pull request introduces a new feature to community tools, enhancing its search capabilities by integrating the Mojeek search engine. Dependencies: None --------- Co-authored-by: Igor Brai <igor@mojeek.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-04-29 15:49:53 +00:00
Tomaz Bratanic	67428c4052	community[patch]: Neo4j enhanced schema (#20983 ) Scan the database for example values and provide them to an LLM for better inference of Text2cypher	2024-04-29 10:45:55 -04:00
Pengcheng Liu	1fad39be1c	community[minor]: Add LarkSuite wiki document loader. (#21016 ) Description: Add LarkSuite wiki document loader. Refer to the [LarkSuite API document](https://open.feishu.cn/document/server-docs/docs/wiki-v2/space-node/list) for details. Issue: None Dependencies: None Twitter handle: None	2024-04-29 10:37:50 -04:00
Leonid Ganeline	dc7c06bc07	community[minor]: import fix (#20995 ) Issue: When a third-party package is not installed and the user needs to `pip install <package>`, we normally raise `ImportError`. But sometimes a `ValueError` or `ModuleNotFoundError` is raised instead, which is inconsistent. Change: replaced the `ValueError` or `ModuleNotFoundError` with `ImportError` wherever we raise an error with the `pip install <package>` message. Note: Ideally, we would replace all `try: import... except... raise ...` blocks with helper functions like `import_aim`, or just use the existing [langchain_core.utils.utils.guard_import](https://api.python.langchain.com/en/latest/utils/langchain_core.utils.utils.guard_import.html#langchain_core.utils.utils.guard_import). But that would be a much bigger refactoring. @baskaryan Please advise on this.	2024-04-29 10:32:50 -04:00
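A sketch of the consistent-error pattern described in this commit, assuming a single helper wraps every optional import. `import_or_raise` is a hypothetical name; `guard_import` in langchain_core serves a similar purpose, but this is not its signature or implementation.

```python
import importlib
from types import ModuleType
from typing import Optional


def import_or_raise(module_name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Import ``module_name`` or raise a uniform ImportError with install advice.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError (a subclass)
        package = pip_name or module_name.split(".")[0]
        raise ImportError(
            f"Could not import {module_name}. "
            f"Please install it with `pip install {package}`."
        ) from exc


json_module = import_or_raise("json")  # a stdlib module, always importable
```

Because `ModuleNotFoundError` subclasses `ImportError`, catching `ImportError` alone covers both cases, and callers only ever need to handle one exception type.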
WilliamEspegren	804390ba4b	community: Spider integration (#20937 ) Added the [Spider.cloud](https://spider.cloud) document loader. [Spider](https://github.com/spider-rs/spider) is the [fastest](https://github.com/spider-rs/spider/blob/main/benches/BENCHMARKS.md) and cheapest crawler that returns LLM-ready data. ``` - Description: Adds Spider data loader - Dependencies: spider-client - Twitter handle: @WilliamEspegren ``` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: = <=> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-04-27 21:45:03 +00:00
Guilherme Zanotelli	f931a9ce60	community[patch]: Pass kwargs to SPARQLStore from RdfGraph (#20385 ) This introduces `store_kwargs` which behaves similarly to `graph_kwargs` on the `RdfGraph` object, which will enable users to pass `headers` and other arguments to the underlying `SPARQLStore` object. I have also made a [PR in `rdflib` to support passing `default_graph`](https://github.com/RDFLib/rdflib/pull/2761). Example usage: ```python from langchain_community.graphs import RdfGraph graph = RdfGraph( query_endpoint="http://localhost/sparql", standard="rdf", store_kwargs=dict( default_graph="http://example.com/mygraph" ) ) ``` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-27 01:38:29 +00:00
Jorge Piedrahita Ortiz	40b2e2916b	community[minor]: Sambanova llm integration (#20955 ) - Description: Added [Sambanova systems](https://sambanova.ai/) integration, including sambaverse and sambastudio LLMs - Dependencies: sseclient-py (optional) --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-27 01:05:13 +00:00
Rahul Triptahi	955cf186d2	community[patch]: Ingest source, owner and full_path if present in Document's metadata. (#20949 ) Description: The PebbloSafeLoader should first check for owner, full_path and size in metadata before implementing its own logic. Dependencies: None Documentation: NA. Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2024-04-26 17:50:57 -07:00
Amine Djeghri	790ea75cf7	community[minor]: add exllamav2 library for GPTQ & EXL2 models (#17817 ) Added 3 files: - Library: ExLlamaV2 - Test integration - Notebook --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-27 00:44:43 +00:00
Naveen Tatikonda	8bbdb4f6a0	community[patch]: Add OpenSearch as semantic cache (#20254 ) ### Description Use OpenSearch vector store as Semantic Cache. ### Twitter Handle @OpenSearchProj --------- Signed-off-by: Naveen Tatikonda <navtat@amazon.com> Co-authored-by: Harish Tatikonda <harishtatikonda@Harishs-MacBook-Air.local> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-31-155.ec2.internal> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-27 00:20:24 +00:00
Mayank Solanki	8c085fc697	community[patch]: Added a function `from_existing_collection` in `Qdrant` vector database. (#20779 ) Issue: #20514 The current implementation of `construct_instance` expects a `texts: List[str]` argument, which is passed to the embedding function. This may be unnecessary when we already have a client with an existing collection and path and don't want to add any text. This PR adds a class method that returns a Qdrant instance with an existing client. Currently, every time `construct_instance` is called, this line sends some text for embedding generation: `cb6e5e56c2/libs/community/langchain_community/vectorstores/qdrant.py (L1592)` --------- Co-authored-by: Anush <anushshetty90@gmail.com>	2024-04-26 15:34:09 -07:00
Leonid Kuligin	893a924b90	core[minor], community[patch], langchain[patch]: move BaseChatLoader to core (#19607 ) - PR title: "core: move BaseChatLoader and BaseToolkit from community" - PR message: move BaseChatLoader and BaseToolkit --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-26 21:45:51 +00:00
Dristy Srivastava	5f1d1666e3	community[patch]: Add support for pebblo server and client version (#20269 ) Description: _PebbloSafeLoader_: Add support for pebblo server and client version Documentation: NA Unit test: NA Issue: NA Dependencies: None --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-25 20:39:17 +00:00
am-kinetica	b54b19ba1c	community[minor]: Implemented Kinetica Document Loader and added notebooks (#20002 ) - [ ] Kinetica Document Loader: "community: a class to load Documents from Kinetica" - [ ] Kinetica Document Loader: - Description: implemented KineticaLoader in `kinetica_loader.py` - Dependencies: install the Kinetica API using `pip install gpudb==7.2.0.1`	2024-04-25 13:39:00 -07:00
Shengsheng Huang	fd1061e7bf	community[patch]: add more data types support to ipex-llm llm integration (#20833 ) - Description: - add support for more data types: by default `IpexLLM` will load the model in int4 format. This PR adds support for more data types, such as `sym_int5`, `sym_int8`, etc. Data formats like NF3, NF4, FP4 and FP8 are only supported on GPU and will be added in a future PR. - Fix a small issue in saving/loading, update api docs - Dependencies: `ipex-llm` library - Document: In `docs/docs/integrations/llms/ipex_llm.ipynb`, added instructions for saving/loading low-bit model. - Tests: added new test cases to `libs/community/tests/integration_tests/llms/test_ipex_llm.py`, added config params. - Contribution maintainer: @shane-huang	2024-04-25 12:58:18 -07:00
Rahul Triptahi	dc921f0823	community[patch]: Add semantic info to metadata, classified by pebblo-server. (#20468 ) Description: Add support for Semantic topics and entities. Classification done by pebblo-server is now used to enhance metadata of Documents loaded by document loaders. Dependencies: None Documentation: Updated. Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2024-04-25 12:55:33 -07:00
Jingpan Xiong	1202017c56	community[minor]: Add relyt vector database (#20316 ) Co-authored-by: kaka <kaka@zbyte-inc.cloud> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: jingsi <jingsi@leadincloud.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-25 19:49:29 +00:00
davidefantiniIntel	f386f71bb3	community: fix tqdm import (#20263 ) Description: Fix tqdm import in QuantizedBiEncoderEmbeddings	2024-04-25 19:44:53 +00:00
Andres Algaba	05ae8ca7d4	community[patch]: deprecate persist method in Chroma (#20855 ) - Description: Deprecate the persist method in Chroma, since it no longer exists in Chroma 0.4.x - Issue: #20851 - Dependencies: None - Twitter handle: AndresAlgaba1 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-25 19:42:03 +00:00
Tomaz Bratanic	520972fd0f	community[patch]: Support passing graph object to Neo4j integrations (#20876 ) To allow reuse of the driver connection, we now support passing the graph object to Neo4j integrations.	2024-04-25 11:30:22 -07:00
Lei Zhang	748a6ae609	community[patch]: add HTTP response headers Content-Type to metadata of RecursiveUrlLoader document (#20875 ) Description: The RecursiveUrlLoader loader offers a link_regex parameter that can filter out URLs. However, this filtering capability is limited, and if the internal links of the website change, unexpected resources may be loaded. These resources, such as font files, can cause problems in subsequent embedding processing. > https://blog.langchain.dev/assets/fonts/source-sans-pro-v21-latin-ext_latin-regular.woff2?v=0312715cbf We can add the Content-Type from the HTTP response headers to the document metadata so developers can choose which resources to use. For example, the following may be a good choice for text knowledge: - text/plain - simple text file - text/html - HTML web page - text/xml - XML format file - text/json - JSON format data - application/pdf - PDF file - application/msword - Word document; and ignore the following: - text/css - CSS stylesheet - text/javascript - JavaScript script - application/octet-stream - binary data - image/jpeg - JPEG image - image/png - PNG image - image/gif - GIF image - image/svg+xml - SVG image - audio/mpeg - MPEG audio files - video/mp4 - MP4 video file - application/font-woff - WOFF font file - application/font-ttf - TTF font file - application/zip - ZIP compressed file Twitter handle: @coolbeevip --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-25 11:29:41 -07:00
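A sketch of how a developer might use this metadata to keep only text-like resources. Documents are modelled as plain dicts, and the `"content_type"` metadata key name is an assumption for this example; check the loader's actual metadata before relying on it. `keep_text_documents` is a hypothetical helper.

```python
from typing import Dict, List

# MIME types suggested above as good choices for text knowledge.
TEXT_TYPES = {
    "text/plain", "text/html", "text/xml", "text/json",
    "application/pdf", "application/msword",
}


def keep_text_documents(docs: List[Dict]) -> List[Dict]:
    """Drop documents whose Content-Type is not in TEXT_TYPES.

    Each doc is a plain dict with a "metadata" dict; the "content_type"
    key is an assumed name for this sketch.
    """
    kept = []
    for doc in docs:
        raw = doc.get("metadata", {}).get("content_type") or ""
        # Strip parameters such as "; charset=utf-8" before comparing.
        mime = raw.split(";")[0].strip().lower()
        if mime in TEXT_TYPES:
            kept.append(doc)
    return kept


docs = [
    {"page_content": "<html>...</html>",
     "metadata": {"content_type": "text/html; charset=utf-8"}},
    {"page_content": "...", "metadata": {"content_type": "application/font-woff"}},
]
print(len(keep_text_documents(docs)))  # 1
```

An allow-list is used rather than a deny-list so that unexpected types (including a missing header) are dropped by default.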
Joan Fontanals	baefbfb14e	community[minor]: add Jina Reranker in retrievers module (#19406 ) - Description: Adapt JinaEmbeddings to run with the new Jina AI Rerank API - Twitter handle: https://twitter.com/JinaAI_ --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-25 10:27:10 -07:00
Jason_Chen	53bb7dbd29	community[patch]: add BeautifulSoupTransformer remove_unwanted_classnames method (#20467 ) Add the remove_unwanted_classnames method to the BeautifulSoupTransformer class, which can filter more effectively. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-25 17:04:04 +00:00
Bagatur	5b83130855	core[minor], langchain[patch], community[patch]: mv StructuredQuery (#20849 ) mv StructuredQuery to core	2024-04-25 09:40:26 -07:00
Mish Ushakov	6ccecf2363	community[minor]: added Browserbase loader (#20478 )	2024-04-25 01:11:03 +00:00
ccurme	481d3855dc	patch: remove usage of llm, chat model __call__ (#20788 ) - `llm(prompt)` -> `llm.invoke(prompt)` - `llm(prompt=prompt)` -> `llm.invoke(prompt)` (same with `messages=`) - `llm(prompt, callbacks=callbacks)` -> `llm.invoke(prompt, config={"callbacks": callbacks})` - `llm(prompt, kwargs)` -> `llm.invoke(prompt, kwargs)`	2024-04-24 19:39:23 -04:00
Raghav Dixit	9b7fb381a4	community[patch]: LanceDB integration patch update (#20686 ) Description: - added functionalities: delete, index creation, using an existing connection object, etc. - updated usage - added LanceDB cloud OSS support. `make lint_diff` and `make test` checks done	2024-04-24 16:27:43 -07:00
volodymyr-memsql	493afe4d8d	community[patch]: add hybrid search to singlestoredb vectorstore (#20793 ) Implemented the ability to enable full-text search within the SingleStore vector store, offering users a versatile range of search strategies. This enhancement allows users to seamlessly combine full-text search with vector search, enabling the following search strategies: * Search solely by vector similarity. * Conduct searches exclusively based on text similarity, utilizing Lucene internally. * Filter search results by text similarity score, with the option to specify a threshold, followed by a search based on vector similarity. * Filter results by vector similarity score before conducting a search based on text similarity. * Perform searches using a weighted sum of vector and text similarity scores. Additionally, integration tests have been added to comprehensively cover all scenarios. Updated notebook with examples. CC: @baskaryan, @hwchase17 --------- Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-24 21:34:50 +00:00
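The "weighted sum of vector and text similarity scores" strategy can be sketched as follows, assuming both score sets are already on comparable scales. `weighted_hybrid_rank` and its default weights are hypothetical; this is not the SingleStoreDB vector store's implementation, which computes scores in the database.

```python
from typing import Dict, List, Tuple


def weighted_hybrid_rank(
    vector_scores: Dict[str, float],
    text_scores: Dict[str, float],
    vector_weight: float = 0.5,
    text_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank document ids by a weighted sum of vector and text similarity.

    A document missing from one score map contributes 0 for that component.
    """
    doc_ids = set(vector_scores) | set(text_scores)
    combined = {
        doc_id: vector_weight * vector_scores.get(doc_id, 0.0)
        + text_weight * text_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


ranking = weighted_hybrid_rank({"a": 0.9, "b": 0.5}, {"b": 1.0, "c": 0.4})
print([doc_id for doc_id, _ in ranking])  # ['b', 'a', 'c']
```

Here `b` wins (0.75) despite a weaker vector score, because it also matches strongly on text; setting `text_weight=0` reduces this to pure vector ranking.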
Tomaz Bratanic	9efab3ed66	community[patch]: Add driver config param for neo4j graph (#20772 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-24 21:14:41 +00:00
Leonid Ganeline	13751c3297	community: `tigergraph` fixes (#20034 ) - added guard on the `pyTigerGraph` import - added a missing example page in the `docs/integrations/graphs/` - formatted the `docs/integrations/providers/` page to a consistent format. Added links.	2024-04-24 16:49:21 -04:00
Martin Kolb	0186e4e633	community[patch]: Advanced filtering for HANA Cloud Vector Engine (#20821 ) - Description: This PR adds support for advanced filtering to the integration of HANA Vector Engine. The newly supported filtering operators are: $eq, $ne, $gt, $gte, $lt, $lte, $between, $in, $nin, $like, $and, $or - Issue: N/A - Dependencies: no new dependencies added Added integration tests to: `libs/community/tests/integration_tests/vectorstores/test_hanavector.py` Description of the new capabilities in notebook: `docs/docs/integrations/vectorstores/hanavector.ipynb`	2024-04-24 13:47:27 -07:00
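To make the semantics of these filter operators concrete, here is a pure-Python evaluator over a metadata dict. It is a hypothetical sketch, not the HANA integration's code, and it assumes an inclusive `$between`, SQL-style `%`/`_` wildcards for `$like`, and that a missing field fails every comparison (like SQL NULL).

```python
import re
from typing import Any, Dict, List


def _like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern: % = any run of characters, _ = one character."""
    out: List[str] = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def _check(value: Any, op: str, operand: Any) -> bool:
    if op == "$eq":
        return value == operand
    if op == "$ne":
        return value != operand
    if op == "$gt":
        return value > operand
    if op == "$gte":
        return value >= operand
    if op == "$lt":
        return value < operand
    if op == "$lte":
        return value <= operand
    if op == "$between":
        low, high = operand  # inclusive on both ends, like SQL BETWEEN
        return low <= value <= high
    if op == "$in":
        return value in operand
    if op == "$nin":
        return value not in operand
    if op == "$like":
        return re.fullmatch(_like_to_regex(operand), str(value), re.DOTALL) is not None
    raise ValueError(f"Unsupported operator: {op}")


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if ``metadata`` satisfies every clause in ``flt``."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        elif key not in metadata:
            # Treat a missing field like SQL NULL: no comparison succeeds.
            return False
        elif isinstance(cond, dict):
            if not all(_check(metadata[key], op, arg) for op, arg in cond.items()):
                return False
        elif metadata[key] != cond:  # a bare value is shorthand for $eq
            return False
    return True


doc = {"year": 2021, "author": "Alice Smith"}
print(matches(doc, {"$and": [{"year": {"$between": [2020, 2022]}},
                             {"author": {"$like": "Alice%"}}]}))  # True
```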
Alex Sherstinsky	12e5ec6de3	community: Support both Predibase SDK-v1 and SDK-v2 in Predibase-LangChain integration (#20859 )	2024-04-24 13:31:01 -07:00
JeffKatzy	5ab3f9a995	community[patch]: standardize chat init args (#20844 ) community:perplexity[patch]: standardize init args. Updated pplx_api_key and request_timeout so that they are aliased to api_key and timeout, respectively. Added a test that both names continue to set the same underlying attributes. Related to [20085](https://github.com/langchain-ai/langchain/issues/20085) --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-24 12:26:05 -07:00
Massimiliano Pronesti	8d1167b32f	community[patch]: add support for similarity_score_threshold search in… (#20852 ) See https://github.com/langchain-ai/langchain/issues/20600#issuecomment-2075569338 for details. @chrislrobert	2024-04-24 19:14:33 +00:00
Eugene Yurtsev	30e48c9878	core[patch],community[patch]: Move file chat history back to community (#20834 ) Marking as patch since we haven't had releases in between. This just reverts part of a PR from yesterday.	2024-04-24 12:47:25 -04:00
Nestor Qin	9111d3a636	community[patch]: Fix message formatting for Anthropic models on Amazon Bedrock (#20801 ) Description: This PR fixes an issue in the message formatting function for Anthropic models on Amazon Bedrock. Currently, the LangChain BedrockChat model will crash if it uses Anthropic models and the model returns a message of the following type: - `AIMessageChunk` Moreover, when BedrockChat is used to build an agent, the following message types trigger the same issue: - `HumanMessageChunk` - `FunctionMessage` Issue: https://github.com/langchain-ai/langchain/issues/18831 Dependencies: No. Testing: Manually tested. The following code was failing before the patch and works after. ``` @tool def square_root(x: str): "Useful when you need to calculate the square root of a number" return math.sqrt(int(x)) llm = ChatBedrock( model_id="anthropic.claude-3-sonnet-20240229-v1:0", model_kwargs={ "temperature": 0.0 }, ) prompt = ChatPromptTemplate.from_messages( [ ("system", FUNCTION_CALL_PROMPT), ("human", "Question: {user_input}"), MessagesPlaceholder(variable_name="agent_scratchpad"), ] ) tools = [square_root] tools_string = format_tool_to_anthropic_function(square_root) agent = ( RunnablePassthrough.assign( user_input=lambda x: x['user_input'], agent_scratchpad=lambda x: format_to_openai_function_messages( x["intermediate_steps"] ) ) | prompt | llm | AnthropicFunctionsAgentOutputParser() ) agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True) output = agent_executor.invoke({ "user_input": "What is the square root of 2?", "tools_string": tools_string, }) ``` List of messages returned from Bedrock: ``` <SystemMessage> content='You are a helpful assistant.' <HumanMessage> content='Question: What is the square root of 2?'
<AIMessageChunk> content="Okay, let's calculate the square root of 2.<scratchpad>\nTo calculate the square root of a number, I can use the square_root tool:\n\n<function_calls>\n <invoke>\n <tool_name>square_root</tool_name>\n <parameters>\n <__arg1>2</__arg1>\n </parameters>\n </invoke>\n</function_calls>\n</scratchpad>\n\n<function_results>\n<search_result>\nThe square root of 2 is approximately 1.414213562373095\n</search_result>\n</function_results>\n\n<answer>\nThe square root of 2 is approximately 1.414213562373095\n</answer>" id='run-92363df7-eff6-4849-bbba-fa16a1b2988c'" <FunctionMessage> content='1.4142135623730951' name='square_root' ```	2024-04-23 22:40:39 +00:00
Aliaksandr Kuzmik	5560cc448c	community[patch]: fix CometTracer bug (#20796 ) Hi! My name is Alex, I'm an SDK engineer from [Comet](https://www.comet.com/site/) This PR updates the `CometTracer` class. Fixed an issue when `CometTracer` failed while logging the data to Comet because this data is not JSON-encodable. The problem was in some of the `Run` attributes that could contain non-default types inside, now these attributes are taken not from the run instance, but from the `run.dict()` return value.	2024-04-23 13:24:41 -04:00
Eugene Yurtsev	645b1e142e	core[minor],langchain[patch],community[patch]: Move InMemory and File implementations of Chat History to core (#20752 ) This PR moves the implementations for chat history to core, making it easier to determine which dependencies need to be broken and where to add deprecation warnings.	2024-04-23 10:22:11 -04:00
ccurme	7a922f3e48	core, openai: support custom token encoders (#20762 )	2024-04-23 13:57:05 +00:00
Christophe Bornet	0ae5027d98	community[patch]: Remove usage of deprecated StoredBlobHistory in CassandraChatMessageHistory (#20666 )	2024-04-22 17:11:05 -04:00
Eugene Yurtsev	38adbfdf34	community[patch],core[minor]: Move BaseToolKit to core.tools (#20669 )	2024-04-22 14:04:30 -04:00
Mark Needham	ce23f8293a	Community patch clickhouse make it possible to not specify index (#20460 ) Vector indexes in ClickHouse are experimental at the moment and can sometimes break/change behaviour. So this PR makes it possible to say that you don't want to specify an index type. Any queries against the embedding column will be brute force/linear scan, but that gives reasonable performance for small-medium dataset sizes. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-22 10:46:37 -07:00
ccurme	c010ec8b71	patch: deprecate (a)get_relevant_documents (#20477 ) - `.get_relevant_documents(query)` -> `.invoke(query)` - `.get_relevant_documents(query=query)` -> `.invoke(query)` - `.get_relevant_documents(query, callbacks=callbacks)` -> `.invoke(query, config={"callbacks": callbacks})` - `.get_relevant_documents(query, kwargs)` -> `.invoke(query, kwargs)` --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-04-22 11:14:53 -04:00
Matheus Henrique Raymundo	bb69819267	community: Fix the stop sequence key name for Mistral in Bedrock (#20709 ) Fixing the wrong stop sequence key name that causes an error on AWS Bedrock. You can check the MistralAI bedrock parameters [here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-mistral.html) This change fixes this [issue](https://github.com/langchain-ai/langchain/issues/20095)	2024-04-21 20:06:06 -04:00
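A sketch of why the key name matters: each Bedrock model family expects stop sequences under its own request-body key. The mapping below is an assumption based on AWS's model-parameter pages (Claude's text-completions format uses `stop_sequences`, Mistral uses `stop`); `build_request_body` is a hypothetical helper, not the LangChain Bedrock code.

```python
from typing import Any, Dict, List, Optional

# Assumed request-body key for stop sequences per Bedrock model provider;
# verify against the current AWS model-parameter documentation.
STOP_KEY_BY_PROVIDER: Dict[str, str] = {
    "anthropic": "stop_sequences",  # Claude text-completions format
    "mistral": "stop",
}


def build_request_body(
    provider: str, prompt: str, stop: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Build a minimal request body, putting stop sequences under the right key."""
    body: Dict[str, Any] = {"prompt": prompt}
    if stop:
        key = STOP_KEY_BY_PROVIDER.get(provider)
        if key is None:
            raise ValueError(f"No known stop-sequence key for provider {provider!r}")
        body[key] = stop
    return body


print(build_request_body("mistral", "Hello", stop=["\n\n"]))
```

Failing loudly for an unknown provider avoids silently sending a key the model rejects, which is the class of error this commit fixes.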
Bagatur	1c7b3c75a7	community[patch], experimental[patch]: support tool-calling sql and p… (#20639 ) d agents	2024-04-21 15:43:09 -07:00
shumway743	cb6e5e56c2	community[minor]: add graph store implementation for apache age (#20582 ) Description: implemented GraphStore class for Apache Age graph db Dependencies: depends on psycopg2 Unit and integration tests included. Formatting and linting have been run. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-20 14:31:04 -07:00
Christophe Bornet	c909ae0152	community[minor]: Add async methods to CassandraVectorStore (#20602 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-04-20 02:09:58 +00:00
Dmitry Tyumentsev	f111efeb6e	community[patch]: YandexGPT API add ability to disable request logging (#20670 ) Closes (#20622) Added the ability to [disable logging of requests to YandexGPT](https://yandex.cloud/en/docs/foundation-models/operations/yandexgpt/disable-logging).	2024-04-19 21:40:37 -04:00
Tomaz Bratanic	8c08cf4619	community: Add support for relationship indexes in neo4j vector (#20657 ) Neo4j has added relationship vector indexes. We can't populate them, but we can use existing indexes for retrieval	2024-04-19 11:22:42 -07:00
Charlie Holtz	1cbab0ebda	community: update Replicate to work with official models (#20633 ) Description: you don't need to pass a version for Replicate official models. That was broken on LangChain until now! You can now run: ``` llm = Replicate( model="meta/meta-llama-3-8b-instruct", model_kwargs={"temperature": 0.75, "max_length": 500, "top_p": 1}, ) prompt = """ User: Answer the following yes/no question by reasoning step by step. Can a dog drive a car? Assistant: """ llm(prompt) ``` I've updated the replicate.ipynb to reflect that. twitter: @charliebholtz --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-04-19 01:43:40 +00:00
Congyu	dd5139e304	community[patch]: truncate zhipuai `temperature` and `top_p` parameters to [0.01, 0.99] (#20261 ) The ZhipuAI API only accepts a `temperature` parameter in the open interval `(0, 1)`; if `0` is passed, it responds with status code `400`. However, 0 and 1 are often accepted by other APIs; for example, OpenAI allows temperature in the closed range `[0, 2]`. This PR truncates the temperature parameter to `[0.01, 0.99]` to improve compatibility between the LangChain ecosystem and ZhipuAI (e.g., ragas `evaluate` often generates temperature 0, which results in many 400 invalid responses). The PR also truncates the `top_p` parameter, since it has the same restriction. Reference: [glm-4 doc](https://open.bigmodel.cn/dev/api#glm-4) (which unfortunately is only in Chinese). --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-19 01:31:30 +00:00
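The truncation rule amounts to a clamp. A minimal sketch, assuming parameters travel in a plain dict; `clamp_sampling_params` is a hypothetical name, not the ZhipuAI integration's actual code.

```python
from typing import Any, Dict


def clamp_sampling_params(
    params: Dict[str, Any], low: float = 0.01, high: float = 0.99
) -> Dict[str, Any]:
    """Return a copy of ``params`` with temperature/top_p clamped to [low, high].

    Values that are unset (missing or None) are left untouched.
    """
    clamped = dict(params)
    for key in ("temperature", "top_p"):
        value = clamped.get(key)
        if value is not None:
            clamped[key] = max(low, min(high, value))
    return clamped


print(clamp_sampling_params({"temperature": 0, "top_p": 1.0, "max_tokens": 256}))
# {'temperature': 0.01, 'top_p': 0.99, 'max_tokens': 256}
```

Returning a copy keeps the caller's settings unchanged, so the same dict can still be sent to APIs that accept 0 and 1.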
Lance Martin	d5c22b80a5	community[patch]: Fix Ollama for LLaMA3 (#20624 ) We see verbose generations w/ LLaMA3 and Ollama - https://smith.langchain.com/public/88c4cd21-3d57-4229-96fe-53443398ca99/r --- Fix here implies that when stop was being set to an empty list, the stream had no conditions under which to stop, which could lead to excessive or unintended output. Test LLaMA2 - https://smith.langchain.com/public/57dfc64a-591b-46fa-a1cd-8783acaefea2/r Test LLaMA3 - https://smith.langchain.com/public/76ff5f47-ac89-4772-a7d2-5caa907d3fd6/r https://smith.langchain.com/public/a31d2fad-9094-4c93-949a-964b27630ccb/r Test Mistral - https://smith.langchain.com/public/a4fe7114-c308-4317-b9fd-6c86d31f1c5b/r --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-04-19 00:20:32 +00:00
hulitaitai	7d0a008744	community[minor]: Add audio-parser "faster-whisper" in audio.py (#20012 ) faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU. It can automatically detect the following 14 languages and transcribe the text into their respective languages: en, zh, fr, de, ja, ko, ru, es, th, it, pt, vi, ar, tr. The GitHub repository for faster-whisper is: https://github.com/SYSTRAN/faster-whisper --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-04-18 20:50:59 +00:00
Guangdong Liu	e3c2431c5b	community[patch]: Fix error in apache doris insert (#19989 ) - Issue: #19886	2024-04-18 16:34:32 -04:00
Tomaz Bratanic	27370b679e	community[patch]: Ignore null and invalid embedding values for neo4j metadata filtering (#20558 )	2024-04-18 16:15:45 -04:00
Massimiliano Pronesti	2542a09abc	community[patch]: AzureSearch incorrectly converted to retriever (#20601 ) Closes #20600. Please see the issue for more details.	2024-04-18 16:06:47 -04:00
Christophe Bornet	8f0b5687a3	community[minor]: Add hybrid search to Cassandra VectorStore (#20286 ) Only supported by Astra DB at the moment. Twitter handle: cbornet_	2024-04-18 15:58:43 -04:00
Christophe Bornet	d2d01370bc	community[minor]: Add async methods to CassandraLoader (#20609 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-04-18 19:45:20 +00:00
balloonio	e786da7774	community[patch]: Invoke callback prior to yielding token fix [HuggingFaceTextGenInference] (#20426 ) …gFaceTextGenInference) - [x] PR title: community[patch]: Invoke callback prior to yielding token fix for [HuggingFaceTextGenInference] - [x] PR message: - Description: Invoke callback prior to yielding token in stream method in [HuggingFaceTextGenInference] - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None - Twitter handle: @bolun_zhang --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-04-18 14:25:20 +00:00
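A minimal sketch of the ordering this fix enforces, using a plain generator. `stream_with_callback` and `on_new_token` are hypothetical names, not the HuggingFaceTextGenInference code. If the callback ran after the `yield`, a consumer that stops early would never trigger the callback for the last token it received, because the generator stays suspended at that `yield`.

```python
from typing import Callable, Iterable, Iterator


def stream_with_callback(
    tokens: Iterable[str], on_new_token: Callable[[str], None]
) -> Iterator[str]:
    """Yield tokens, firing the callback *before* each yield."""
    for token in tokens:
        on_new_token(token)  # callback sees the token first...
        yield token          # ...then the consumer receives it


events = []
gen = stream_with_callback(["a", "b"], lambda t: events.append("cb:" + t))
first = next(gen)
events.append("got:" + first)
print(events)  # ['cb:a', 'got:a']
```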
Ethan Yang	2d6d796040	community: Add save_model function for openvino reranker and embedding (#19896 )	2024-04-18 10:20:33 -04:00
zR	9c1d7f2405	update zhipuai notebook (#20595 ) Fix timeout issue; fix the zhipuai use-case notebook.	2024-04-18 10:12:12 -04:00
ccurme	c897264b9b	community: (milvus) check for num_shards (#20603 ) @rgupta2508 I believe this change is necessary following https://github.com/langchain-ai/langchain/pull/20318 because of how Milvus handles defaults: `59bf5e811a/pymilvus/client/prepare.py (L82-L85)` ```python num_shards = kwargs[next(iter(same_key))] if not isinstance(num_shards, int): msg = f"invalid num_shards type, got {type(num_shards)}, expected int" raise ParamError(message=msg) req.shards_num = num_shards ``` this way lets Milvus control the default value (instead of maintaining a separate default in Langchain). Let me know if I've got this wrong or you feel it's unnecessary. Thanks.	2024-04-18 09:44:56 -04:00
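A minimal sketch of "only forward `num_shards` when the caller set it", assuming a plain kwargs dict is what gets passed on to collection creation; `build_create_collection_kwargs` is a hypothetical helper, and `TypeError` stands in for pymilvus's `ParamError`:

```python
from typing import Any, Dict, Optional


def build_create_collection_kwargs(num_shards: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical helper: include num_shards only when explicitly given."""
    kwargs: Dict[str, Any] = {}
    if num_shards is not None:
        # Mirror the server-side check quoted in the commit: shards must be an int.
        if not isinstance(num_shards, int):
            raise TypeError(
                f"invalid num_shards type, got {type(num_shards)}, expected int"
            )
        kwargs["num_shards"] = num_shards
    # When the key is absent, Milvus applies its own default instead of a
    # separately maintained LangChain default.
    return kwargs
```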
Rohit Gupta	25c4c24e89	Support to create shards_num in milvus vectorstores (#20318 ) Adds support for setting the number of shards for the collection to create in Milvus vectorstores.	2024-04-18 08:58:00 -04:00
Erick Friis	e395115807	docs: aws docs updates (#20571 )	2024-04-17 23:32:00 +00:00
Erick Friis	f09bd0b75b	upstage: init package (#20574 ) Co-authored-by: Sean Cho <sean@upstage.ai> Co-authored-by: JuHyung-Son <sonju0427@gmail.com>	2024-04-17 23:25:36 +00:00
Marco Perini	11c9ed3362	community[patch]: exposing headless flag parameter to AsyncChromiumLoader class (#20424 ) - Description: added the headless parameter as optional argument to the langchain_community.document_loaders AsyncChromiumLoader class - Dependencies: None - Twitter handle: @perinim_98 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-17 16:00:28 -07:00
Christophe Bornet	a22da4315b	community[patch]: Replace function in CassandraVectorStore with simpler lambda (#20323 )	2024-04-17 17:13:13 -04:00
Christophe Bornet	75733c5cc1	community[minor]: Improve CassandraVectorStore from_texts (#20284 )	2024-04-17 17:12:28 -04:00
Tomer Cagan	463160c3f6	community: fix `DirectoryLoader` progress bar (#19821 ) Description: currently, the `DirectoryLoader` progress-bar maximum value is based on an incorrect number of files to process. In langchain_community/document_loaders/directory.py:127: ```python paths = p.rglob(self.glob) if self.recursive else p.glob(self.glob) items = [ path for path in paths if not (self.exclude and any(path.match(glob) for glob in self.exclude)) ] ``` `paths` returns both files and directories, and `items` is later used to determine the maximum value of the progress bar, which gives an incorrect progress indication.	2024-04-17 21:12:16 +00:00
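A sketch of filtering the globbed paths down to files before computing the progress-bar total; `files_to_load` and `demo` are hypothetical helpers, and using `Path.is_file()` is an assumption about how the fix was done:

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Tuple


def files_to_load(
    root: Path, glob: str, recursive: bool, exclude: Sequence[str] = ()
) -> List[Path]:
    paths = root.rglob(glob) if recursive else root.glob(glob)
    return [
        path
        for path in paths
        # Skip directories so they do not inflate the progress-bar total.
        if path.is_file()
        and not (exclude and any(path.match(pattern) for pattern in exclude))
    ]


def demo() -> Tuple[int, int]:
    """Return (all globbed paths, files only) for a tiny temp tree."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub").mkdir()
        (root / "sub" / "x.txt").write_text("x")
        (root / "y.txt").write_text("y")
        all_paths = list(root.rglob("*"))
        files = files_to_load(root, "*", recursive=True)
        return len(all_paths), len(files)


print(demo())  # (3, 2)
```

The unfiltered glob counts the `sub` directory as an item, so a progress bar sized from it would never reach 100%.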
Pengcheng Liu	ecd19a9e58	community[patch]: Add function call support in Tongyi chat model. (#20119 ) - Description: This PR adds function calling support in Tongyi chat model. - Issue: None - Dependencies: None - Twitter handle: None Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-17 20:42:23 +00:00
kaijietti	80679ab906	zep[patch]: implement add_messages and aadd_messages (#20099 ) This PR implements `add_messages` and `aadd_messages` to avoid unnecessary round-trips.	2024-04-17 13:40:24 -07:00
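A toy model of why a bulk `add_messages` saves round-trips; `CountingBackend` and `ChatHistory` are hypothetical stand-ins, not the Zep client or LangChain's chat-history base class:

```python
from typing import List, Sequence


class CountingBackend:
    """Hypothetical stand-in for a remote memory API that counts requests."""

    def __init__(self) -> None:
        self.requests = 0
        self.stored: List[str] = []

    def post(self, messages: Sequence[str]) -> None:
        self.requests += 1  # one network round-trip per call
        self.stored.extend(messages)


class ChatHistory:
    def __init__(self, backend: CountingBackend) -> None:
        self.backend = backend

    def add_message(self, message: str) -> None:
        self.backend.post([message])

    def add_messages(self, messages: Sequence[str]) -> None:
        # Send the whole batch in one request instead of looping add_message.
        self.backend.post(list(messages))
```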
Sevin F. Varoglu	3f156e0ece	community[minor]: add ChatOctoAI (#20059 ) This PR adds ChatOctoAI, a chat model integration for OctoAI.	2024-04-17 03:20:56 -07:00
Eun Hye Kim	b34f1086fe	community[patch]: Add streaming logic in ChatHuggingFace (#18784 ) - Add functions (_stream, _astream) - Connect to _generate and _agenerate - Description: Added functions (_stream, _astream) and connected them to _generate and _agenerate - Issue: #18782 - Dependencies: none - Twitter handle: @lunara_x	2024-04-16 19:17:03 -07:00
pjb157	479be3cc91	community[minor]: Unify Titan Takeoff Integrations and Adding Embedding Support (#18775 ) Community: Unify Titan Takeoff Integrations and Adding Embedding Support Description: Neither of the integrations in the community folder reflects the current Titan Takeoff. The two integrations (TitanTakeoffPro and TitanTakeoff) were causing confusion with clients, so the code has been moved into one place and an alias created for backwards compatibility. Added the Takeoff Client Python package to do the bulk of the work with the requests, because this package is actively updated with new versions of Takeoff, so this integration will be far more robust and will not degrade as badly over time. Issue: Fixes bugs in the old Titan integrations and unifies the code, with added unit test coverage to avoid future problems. Dependencies: Added optional dependency takeoff-client; all imports, including the Titan Takeoff classes, still work without the dependency, but initialisation will fail if takeoff-client is not pip installed. Twitter @MeryemArik9 Thanks all :) --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-17 01:43:35 +00:00
Rahul Triptahi	2cbfc94bcb	community[patch]: Add support for authorized identities in PebbloSafeLoader. (#20055 ) Description: Add support for authorized identities in PebbloSafeLoader. With this change, PebbloSafeLoader extracts authorized_identities from metadata and sends them to the Pebblo server. Dependencies: None Documentation: None Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2024-04-16 18:34:06 -07:00
Guangdong Liu	b78ede2f96	community[patch]: standardize init args (#20166 ) Related to https://github.com/langchain-ai/langchain/issues/20085 @baskaryan	2024-04-16 18:30:26 -07:00
Guangdong Liu	3729bec1a2	community[patch]: standardize init args (#20210 ) Related to https://github.com/langchain-ai/langchain/issues/20085 @baskaryan	2024-04-16 18:29:57 -07:00
sdan	a7c5e41443	community[minor]: Added VLite as VectorStore (#20245 ) Support [VLite](https://github.com/sdan/vlite) as a new VectorStore type. Description: vlite is a simple and blazing-fast vector database (vdb) made with numpy. It abstracts a lot of the functionality around using a vdb in the retrieval augmented generation (RAG) pipeline, such as embeddings generation, chunking, and file processing, while still giving developers the functionality to change how they're made/stored. Before submitting: Added tests [here](`c09c2ebd5c/libs/community/tests/integration_tests/vectorstores/test_vlite.py`) Added ipython notebook [here](`c09c2ebd5c/docs/docs/integrations/vectorstores/vlite.ipynb`) Added simple docs on how to use [here](`c09c2ebd5c/docs/docs/integrations/providers/vlite.mdx`) Profiles Maintainers: @sdan Twitter handles: [@sdand](https://x.com/sdand) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-17 01:24:38 +00:00
Hyeongchan Kim	7824291252	community[patch]: Fix not to cast to str type when `file_path` is None (#20057 ) Since `langchain_community 0.0.30`, there has been a bug that prevents sending a file-like object via the `file` parameter instead of a `file path`, because `file_path` is cast to str even when it is None. When calling `partition_via_api()`, exactly one of `filename` and `file` must be specified. However, since `langchain_community 0.0.30`, `file_path` is cast to `str` in `get_elements_from_api()` even when it is None, so the call fails at `exactly_one(filename=filename, file=file)`. Here is the error message: ``` ---> 51 exactly_one(filename=filename, file=file) 53 if metadata_filename and file_filename: 54 raise ValueError( 55 "Only one of metadata_filename and file_filename is specified. " 56 "metadata_filename is preferred. file_filename is marked for deprecation.", 57 ) File /opt/homebrew/lib/python3.11/site-packages/unstructured/partition/common.py:441, in exactly_one(**kwargs) 439 else: 440 message = f"{names[0]} must be specified." --> 441 raise ValueError(message) ValueError: Exactly one of filename and file must be specified. ``` So this change casts to str only when `file_path` is not None. I use `UnstructuredAPIFileLoader` as below. ``` from langchain_community.document_loaders.unstructured import UnstructuredAPIFileLoader documents: list = UnstructuredAPIFileLoader( file_path=None, file=file, # file-like object, io.BytesIO type mode='elements', url='http://127.0.0.1:8000/general/v0/general', content_type='application/pdf', metadata_filename='asdf.pdf', ).load_and_split() ```	2024-04-16 18:06:21 -07:00
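The bug reduces to an unconditional `str()` cast turning `None` into the string `"None"`. A hedged sketch of the guarded cast, plus a simplified, hypothetical re-creation of the `exactly_one` check from the traceback (not unstructured's actual implementation):

```python
from pathlib import Path
from typing import Optional, Union


def normalize_file_path(file_path: Optional[Union[str, Path]]) -> Optional[str]:
    # Casting unconditionally would turn None into "None", which then
    # looks like a real filename downstream.
    return str(file_path) if file_path is not None else None


def exactly_one(**kwargs: object) -> None:
    """Simplified stand-in: exactly one keyword may be non-None."""
    provided = [name for name, value in kwargs.items() if value is not None]
    if len(provided) != 1:
        names = " and ".join(kwargs)
        raise ValueError(f"Exactly one of {names} must be specified.")
```

With the guard, `exactly_one(filename=normalize_file_path(None), file=f)` passes; with `str(None)` both arguments look set and the check raises.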
MacanPN	bce69ae43d	community[patch]: Changes to base_o365 and sharepoint document loaders (#20373 ) ## Description: The PR introduces 3 changes: 1. added `recursive` property to `O365BaseLoader`. (To keep the behavior unchanged, it is set to `False` by default.) When `recursive=True`, `_load_from_folder()` also recursively loads all nested folders. 2. added `folder_id` to SharePointLoader (similar to [this PR](https://github.com/langchain-ai/langchain/pull/10780)). This provides an alternative to `folder_path`, which doesn't seem to work reliably. 3. when none of `document_ids`, `folder_id`, `folder_path` is provided, the loader fetches documents from the root folder. Combined with `recursive=True`, this provides an easy way of loading all compatible documents from SharePoint. The PR contains the same logic as [this stale PR](https://github.com/langchain-ai/langchain/pull/10780) by @WaleedAlfaris. I'd like to ask his blessing for moving forward with this one. ## Issue: - As described in https://github.com/langchain-ai/langchain/issues/19938 and https://github.com/langchain-ai/langchain/pull/10780 the sharepoint loader often does not seem to work with folder_path. - Recursive loading of subfolders is a missing functionality ## Dependencies: None Twitter handle: @martintriska1 @WRhetoric This is my first PR here, please be gentle :-) Please review @baskaryan --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-17 00:36:15 +00:00
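A rough sketch of the opt-in recursion described above, assuming an in-memory dict stands in for SharePoint folders; `iter_documents` and the `Folder` layout are hypothetical, not the `O365BaseLoader` code:

```python
from typing import Dict, Iterator, Union

# Hypothetical folder model: names map to file contents (str) or subfolders (dict).
Folder = Dict[str, Union[str, "Folder"]]


def iter_documents(folder: Folder, recursive: bool = False) -> Iterator[str]:
    for name, entry in folder.items():
        if isinstance(entry, dict):
            if recursive:
                # Descend into nested folders only when recursive=True,
                # so the default behaviour stays unchanged.
                yield from iter_documents(entry, recursive=True)
        else:
            yield entry
```

Starting from the root folder with `recursive=True` corresponds to the "load everything" combination the commit describes.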
Sevin F. Varoglu	54d388d898	community[patch]: update OctoAI endpoint to subclass BaseOpenAI (#19757 ) This PR updates OctoAIEndpoint LLM to subclass BaseOpenAI as OctoAI is an OpenAI-compatible service. The documentation and tests have also been updated.	2024-04-16 17:32:20 -07:00
Benito Geordie	57b226532d	community[minor]: Added integrations for ThirdAI's NeuralDB as a Retriever (#17334 ) Description: Adds ThirdAI NeuralDB retriever integration. NeuralDB is a CPU-friendly and fine-tunable text retrieval engine. We previously added a vector store integration, but we think it will be easier for our customers if they can also find us under langchain-community/retrievers. --------- Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com> Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>	2024-04-16 16:36:55 -07:00
WeichenXu	e9fc87aab1	community[patch]: Make ChatDatabricks model supports streaming response (#19912 ) Description: Make the ChatDatabricks model support streaming. Issue: N/A Dependencies: MLflow nightly build version (we will release the next MLflow version soon) Twitter handle: N/A Manual test: (Before testing, please install `pip install git+https://github.com/mlflow/mlflow.git`) ```python # Test Databricks Foundation LLM model from langchain.chat_models import ChatDatabricks chat_model = ChatDatabricks( endpoint="databricks-llama-2-70b-chat", max_tokens=500 ) from langchain_core.messages import AIMessageChunk for chunk in chat_model.stream("What is mlflow?"): print(chunk.content, end="|") ``` --------- Signed-off-by: Weichen Xu <weichen.xu@databricks.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-16 23:34:49 +00:00
Dhruv Chawla	d6d559d50d	community[minor]: add UpTrainCallbackHandler (#19956 ) - Description: This PR adds a callback handler for UpTrain. It performs evaluations in the RAG pipeline to check the quality of retrieved documents, generated queries and responses. - Dependencies: - The UpTrainCallbackHandler requires the uptrain package --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>	2024-04-16 19:32:03 +00:00
Ravindu Somawansa	5acc7ba622	community[minor]: Add glue catalog loader (#20220 ) Add Glue Catalog loader	2024-04-16 11:39:23 -04:00
Martín Gotelli Ferenaz	b48add4353	community[patch]: Fix pgvector deprecated filter clause usage with OR and AND conditions (#20446 ) Description: Support filter by OR and AND for deprecated PGVector version Issue: #20445 Dependencies: N/A Twitter handle: @martinferenaz	2024-04-16 14:08:07 +00:00
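As an illustration only of how nested OR/AND conditions combine, here is an in-memory evaluator assuming Mongo-style `$and`/`$or` keys (an assumption; the deprecated PGVector filter format may spell them differently, and the real code translates filters to SQL rather than evaluating them in Python). `matches` is a hypothetical name:

```python
from typing import Any, Dict


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Evaluate a filter with $and/$or against one metadata dict (sketch)."""
    for key, value in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in value):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in value):
                return False
        elif metadata.get(key) != value:
            # Plain keys are treated as equality checks in this sketch.
            return False
    return True
```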
Eugene Yurtsev	c50099161b	community[patch]: Use uuid4 not uuid1 (#20487 ) Using UUID1 is incorrect since it's time dependent, which makes it easy to generate the exact same uuid	2024-04-16 09:40:44 -04:00
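Python's standard `uuid` module shows the difference the commit relies on: `uuid1()` embeds the host's node ID and a timestamp, while `uuid4()` is built from random bits. A quick check:

```python
import uuid

# uuid1: node (host ID) + timestamp, so consecutive values share their node field.
a, b = uuid.uuid1(), uuid.uuid1()
print(a.version, a.node == b.node)  # 1 True

# uuid4: random bits, no host or time information.
ids = {uuid.uuid4() for _ in range(1000)}
print(len(ids), {u.version for u in ids})
```

Because `uuid1` values are predictable from time and host, two processes on one machine generating ids at the same moment are more exposed to collisions than with `uuid4`.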
Leonid Kuligin	676c68d318	community[patch]: deprecating remaining google_community integrations (#20471 ) Deprecating remaining google community integrations	2024-04-15 09:57:12 -04:00
balloonio	b66a4f48fa	community[patch]: Invoke callback prior to yielding token fix [DeepInfra] (#20427 ) - Description: Invoke callback prior to yielding token in stream method in [DeepInfra] - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None - Twitter handle: @bolun_zhang	2024-04-14 14:32:52 -04:00
Juan Carlos José Camacho	450c458f8f	community[minor]: Add Dataherald tool (#19680 ) Description: Integrate the [dataherald](https://www.dataherald.com) tool, a natural language-to-SQL tool. Dependencies: Install the dataherald SDK to use it: ``` pip install dataherald ``` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Christophe Bornet <cbornet@hotmail.com>	2024-04-13 23:27:16 +00:00
Egor Krasheninnikov	c8391d4ff1	community[patch]: Fix YandexGPT embeddings (#19720 ) Fix of YandexGPT embeddings. The current version uses a single `model_name` for queries and documents, essentially making the `embed_documents` and `embed_query` methods the same. Yandex has a different endpoint (`model_uri`) for encoding documents, see [this](https://yandex.cloud/en/docs/yandexgpt/concepts/embeddings). The bug may impact retrievers built with `YandexGPTEmbeddings` (for instance FAISS database as retriever) since they use both `embed_documents` and `embed_query`. A simple snippet to test the behaviour: ```python from langchain_community.embeddings.yandex import YandexGPTEmbeddings embeddings = YandexGPTEmbeddings() q_emb = embeddings.embed_query('hello world') doc_emb = embeddings.embed_documents(['hello world', 'hello world']) q_emb == doc_emb[0] ``` The response is `True` with the current version and `False` with the changes I made. Twitter: @egor_krash --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-13 16:23:01 -07:00
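The fix routes queries and documents to different model URIs. A sketch of that routing with a fake embedding function; `TwoTowerEmbeddings`, the `emb://` URIs and `fake_embed` are placeholders, not Yandex endpoints or the `YandexGPTEmbeddings` fields:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TwoTowerEmbeddings:
    """Sketch: route queries and documents to different model URIs."""

    query_model_uri: str
    doc_model_uri: str
    embed_fn: Callable[[str, str], List[float]]  # (model_uri, text) -> vector

    def embed_query(self, text: str) -> List[float]:
        return self.embed_fn(self.query_model_uri, text)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_fn(self.doc_model_uri, t) for t in texts]


def fake_embed(model_uri: str, text: str) -> List[float]:
    # Deterministic stand-in: the vector depends on both the URI and the text.
    return [float(len(model_uri)), float(len(text))]


emb = TwoTowerEmbeddings("emb://query", "emb://doc", fake_embed)
q_emb = emb.embed_query("hello world")
doc_emb = emb.embed_documents(["hello world", "hello world"])
print(q_emb == doc_emb[0])  # False
```

This mirrors the commit's own check: with a single shared model the comparison would be `True`.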
Guangdong Liu	4be7ca7b4c	community[patch]:sparkllm standardize init args (#20194 ) Related to https://github.com/langchain-ai/langchain/issues/20085 @baskaryan	2024-04-13 16:03:19 -07:00
Yuki Oshima	0758da8940	community[patch]: Set default value for _ListSQLDatabaseToolInput tool_input (#20409 ) Description: `_ListSQLDatabaseToolInput` raises an error if the model returns `{}`. For example, gpt-4-turbo returns `{}` with a SQL Agent initialized by `create_sql_agent`. So I set a default value `""` for the `_ListSQLDatabaseToolInput` tool_input. This is actually a gpt-4-turbo issue, not a LangChain issue, but I thought it would be helpful to set a default value `""`. This problem is discussed in detail in the following issue. Issue: https://github.com/langchain-ai/langchain/issues/20405 Dependencies: none Sorry, I did not add or change the test code, as tests for this component did not exist. However, I have tested the following code based on the [SQL Agent Document](https://python.langchain.com/docs/use_cases/sql/agents/) to make sure it works. ``` from langchain_community.agent_toolkits.sql.base import create_sql_agent from langchain_community.utilities.sql_database import SQLDatabase from langchain_openai import ChatOpenAI db = SQLDatabase.from_uri("sqlite:///Chinook.db") llm = ChatOpenAI(model="gpt-4-turbo", temperature=0) agent_executor = create_sql_agent(llm, db=db, agent_type="openai-tools", verbose=True) result = agent_executor.invoke("List the total sales per country. Which country's customers spent the most?") print(result["output"]) ```	2024-04-13 15:58:47 -07:00
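The real `_ListSQLDatabaseToolInput` is a pydantic model; this stdlib `dataclass` stand-in (hypothetical name `ListTablesInput`) only shows why a default lets an empty argument dict validate:

```python
from dataclasses import dataclass


@dataclass
class ListTablesInput:
    """Hypothetical stand-in for _ListSQLDatabaseToolInput."""

    # With a default, an empty argument dict such as {} still constructs.
    tool_input: str = ""


args = ListTablesInput(**{})  # what a model returning {} would produce
print(repr(args.tool_input))  # ''
```

Without the default, `ListTablesInput(**{})` would raise a `TypeError` for the missing argument, which is the dataclass analogue of the validation error reported in the issue.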
ccurme	38faa74c23	community[patch]: update use of deprecated llm methods (#20393 ) .predict and .predict_messages for BaseLanguageModel and BaseChatModel	2024-04-12 17:28:23 -04:00
Corey Zumar	3a068b26f3	community[patch]: Databricks - fix scope of dangerous deserialization error in Databricks LLM connector (#20368 ) fix scope of dangerous deserialization error in Databricks LLM connector --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>	2024-04-12 17:27:26 -04:00
balloonio	e7b1a44c5b	community[patch]: Invoke callback prior to yielding token fix for Llamafile (#20365 ) - Description: Invoke callback prior to yielding token in stream method in community llamafile.py - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None - Twitter handle: @bolun_zhang	2024-04-12 19:26:12 +00:00
balloonio	93caa568f9	community[patch]: Invoke callback prior to yielding token fix for HuggingFaceEndpoint (#20366 ) - Description: Invoke callback prior to yielding token in stream method in community HuggingFaceEndpoint - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None - Twitter handle: @bolun_zhang --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-04-12 19:16:34 +00:00
Nicolas	ad04585e30	community[minor]: Firecrawl.dev integration (#20364 ) Added the [FireCrawl](https://firecrawl.dev) document loader. Firecrawl crawls and converts any website into LLM-ready data. It crawls all accessible subpages and gives you clean markdown for each. - Description: Adds FireCrawl data loader - Dependencies: firecrawl-py - Twitter handle: @mendableai ccing contributors: (@ericciarla @nickscamara) --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-12 19:13:48 +00:00
P. Taylor Goetz	9317df7f16	community[patch]: Add "model" attribute to the payload sent to Ollama in `ChatOllama` (#20354 ) Example Ollama API calls: Request without "model": ``` curl --location 'http://localhost:11434/api/chat' \ --header 'Content-Type: application/json' \ --data '{ "messages": [ { "role": "user", "content": "What is the capitol of PA?" } ], "stream": false }' ``` Response: ``` {"error":"model is required"} ``` Request with "model": ``` curl --location 'http://localhost:11434/api/chat' \ --header 'Content-Type: application/json' \ --data '{ "model": "openchat", "messages": [ { "role": "user", "content": "What is the capitol of PA?" } ], "stream": false }' ``` Response: ``` { "eval_duration" : 733248000, "created_at" : "2024-04-11T23:04:08.735766843Z", "model" : "openchat", "message" : { "content" : " The capital city of Pennsylvania is Harrisburg.", "role" : "assistant" }, "total_duration" : 3138731168, "prompt_eval_count" : 25, "load_duration" : 466562959, "done" : true, "prompt_eval_duration" : 1938495000, "eval_count" : 10 } ```	2024-04-12 13:32:53 -04:00
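A stdlib sketch of building the corrected request body with the `model` key included; `build_chat_request` is a hypothetical helper (not `ChatOllama` internals), and the request is only constructed, not sent:

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    model: str,
    messages: List[Dict[str, str]],
    base_url: str = "http://localhost:11434",
) -> urllib.request.Request:
    # Per the example above, /api/chat answers {"error":"model is required"}
    # when this key is missing.
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "openchat", [{"role": "user", "content": "What is the capital of PA?"}]
)
# urllib.request.urlopen(req) would send it to a running Ollama server.
```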
Alex Sherstinsky	fad0962643	community: for Predibase -- enable both Predibase-hosted and HuggingFace-hosted fine-tuned adapter repositories (#20370 )	2024-04-12 08:32:00 -07:00
Isak Nyberg	bac9fb9a7c	community: add gpt-4 pricing in callback (#20292 ) Added the pricing for `gpt-4-turbo` and `gpt-4-turbo-2024-04-09` in the callback method. related to issue #17173 https://openai.com/pricing#language-models	2024-04-11 18:02:39 -04:00
Leonid Ganeline	7cf2d2759d	community[patch]: docstrings update (#20301 ) Added missing docstrings. Formatted docstrings to a consistent form.	2024-04-11 16:23:27 -04:00
Eugene Yurtsev	22fd844e8a	community[patch]: Add deprecation warnings to postgres implementation (#20222 ) Add deprecation warnings to postgres implementation that are in langchain-postgres.	2024-04-11 10:33:22 -04:00
Leonid Ganeline	4cb5f4c353	community[patch]: import flattening fix (#20110 ) This PR should make it easier for linters to do type checking and for IDEs to jump to the definition of code. See #20050 as a template for this PR. - As a byproduct: Added 3 missing `test_imports`. - Added the missing `SolarChat` to __init__.py and to the test_import unit test. - Added `# type: ignore` to fix linting. It is not clear why linting errors appear after the changes above. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-04-10 13:01:19 -04:00
Chip Davis	806d4ae48f	community[patch]: fixed multithreading returning List[List[Documents]] instead of List[Documents] (#20230 ) Description: When multithreading is set to True and using the DirectoryLoader, there was a bug that caused the return type to be a double nested list. This resulted in other places upstream not being able to utilize the from_documents method as it was no longer a `List[Documents]` it was a `List[List[Documents]]`. The change made was to just loop through the `future.result()` and yield every item. Issue: #20093 Dependencies: N/A Twitter handle: N/A	2024-04-09 17:06:37 -04:00
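A sketch of the flattening fix with `concurrent.futures`; `load_file` is a hypothetical per-file loader returning a list of document strings rather than LangChain `Document` objects:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterator, List


def load_file(path: str) -> List[str]:
    """Hypothetical loader: each file yields a list of 'documents'."""
    return [f"{path}#0", f"{path}#1"]


def lazy_load_all(paths: List[str], max_workers: int = 4) -> Iterator[str]:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(load_file, p) for p in paths]
        for future in as_completed(futures):
            # Yield each document rather than the whole list, so callers get
            # a flat sequence instead of a list of lists.
            yield from future.result()
```

`yield from future.result()` is equivalent to looping over the result and yielding every item, as the commit describes. Completion order is not deterministic, so callers should not rely on file order.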
seray	add31f46d0	community[patch]: OpenLLM Async Client Fixes and Timeout Parameter (#20007 ) Same changes as this merged [PR](https://github.com/langchain-ai/langchain/pull/17478) (https://github.com/langchain-ai/langchain/pull/17478), but for the async client, as the same issues persist. - Replaced 'responses' attribute of OpenLLM's GenerationOutput schema to 'outputs'. reference: `66de54eae7/openllm-core/src/openllm_core/_schemas.py (L135)` - Added timeout parameter for the async client. --------- Co-authored-by: Seray Arslan <seray.arslan@knime.com>	2024-04-09 16:34:56 -04:00
Erick Friis	37a9e23c05	community: switch to falkordb python client (#20229 )	2024-04-09 20:19:44 +00:00
David Lee	0394c6e126	community[minor]: add allow_dangerous_requests for OpenAPI toolkits (#19493 ) Description: Due to BaseRequestsTool changes, we need to pass allow_dangerous_requests manually. `b617085af0/libs/community/langchain_community/tools/requests/tool.py (L26-L46)` However, the OpenAPI toolkits did not pass it in the arguments. `b617085af0/libs/community/langchain_community/agent_toolkits/openapi/planner.py (L262-L269)` Issue: https://github.com/langchain-ai/langchain/issues/19440 Without allow_dangerous_requests, the toolkits cannot make requests. Dependencies: Not much --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-04-09 17:14:02 +00:00
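A minimal sketch of why the toolkit has to forward the opt-in flag; `RequestsTool` and `build_toolkit` are hypothetical stand-ins for `BaseRequestsTool` and the OpenAPI planner, and the error text is illustrative:

```python
from typing import List


class RequestsTool:
    """Hypothetical tool that refuses to run unless explicitly opted in."""

    def __init__(self, method: str, allow_dangerous_requests: bool = False) -> None:
        if not allow_dangerous_requests:
            raise ValueError("Set allow_dangerous_requests=True to enable requests.")
        self.method = method


def build_toolkit(allow_dangerous_requests: bool = False) -> List[RequestsTool]:
    # The toolkit must forward the flag; otherwise every tool is built with
    # the default (False) and the caller's opt-in is lost.
    return [
        RequestsTool(m, allow_dangerous_requests=allow_dangerous_requests)
        for m in ("GET", "POST")
    ]
```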
Timothy	0c848a25ad	community[patch]: GCSDirectoryLoader bugfix (#20005 ) - Description: Bug fix. Removed extra line in `GCSDirectoryLoader` to allow catching Exceptions. Now also logs the file path if Exception is raised for easier debugging. - Issue: #20198 Bug since langchain-community==0.0.31 - Dependencies: No change - Twitter handle: timothywong731 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-09 16:57:00 +00:00
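A hedged sketch of the "catch, log the file path, keep going" behaviour; `load_all`, `continue_on_failure` and `fake_load` are illustrative assumptions, not the `GCSDirectoryLoader` signature:

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def load_all(
    paths: Iterable[str],
    load_one: Callable[[str], List[str]],
    continue_on_failure: bool = True,
) -> List[str]:
    docs: List[str] = []
    for path in paths:
        try:
            docs.extend(load_one(path))
        except Exception as exc:
            if not continue_on_failure:
                raise
            # Log which file failed so the problem is easy to track down.
            logger.warning("Problem processing %s: %s", path, exc)
    return docs


def fake_load(path: str) -> List[str]:
    """Hypothetical per-file loader that fails on one input."""
    if path.startswith("bad"):
        raise RuntimeError("cannot parse")
    return [path]
```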
jeff kit	ac42e96e4c	community[patch], langchain[minor]: Enhance Tencent Cloud VectorDB, langchain: make Tencent Cloud VectorDB self query retrieve compatible (#19651 ) - Make Tencent Cloud VectorDB support metadata filtering. - Implement the delete function for Tencent Cloud VectorDB. - Support both the LangChain Embedding model and the Tencent Cloud VDB embedding model. - Tencent Cloud VectorDB supports filter search keywords, compatible with LangChain filtering syntax. - Add a Tencent Cloud VectorDB TranslationVisitor; it now works with the self-query retriever. - More documentation. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-09 16:50:48 +00:00
Bagatur	1a34c65e01	community[patch]: pass through sql agent kwargs (#19962 ) Fix #19961	2024-04-09 16:47:32 +00:00
Guangdong Liu	97d91ec17c	community[patch]: standardize baichuan init args (#20209 ) Related to https://github.com/langchain-ai/langchain/issues/20085 @baskaryan	2024-04-09 11:00:40 -05:00
Piyush Jain	cd7abc495a	community[minor]: add neptune analytics graph (#20047 ) Replacement for PR [#19772](https://github.com/langchain-ai/langchain/pull/19772). --------- Co-authored-by: Dave Bechberger <dbechbe@amazon.com> Co-authored-by: bechbd <bechbd@users.noreply.github.com>	2024-04-09 09:20:59 -05:00
Shuqian	ad9750403b	community[minor]: add bedrock anthropic callback for token usage counting (#19864 ) Description: add bedrock anthropic callback for token usage counting, consulted openai callback. --------- Co-authored-by: Massimiliano Pronesti <massimiliano.pronesti@gmail.com>	2024-04-09 09:18:48 -05:00
Prince Canuma	1f9f4d8742	community[minor]: Add support for MLX models (chat & llm) (#18152 ) Description: This PR adds support for MLX models of both chat (i.e., instruct) and llm (i.e., pretrained) types. Dependencies: mlx, mlx_lm, transformers Twitter handle: @Prince_Canuma --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-09 14:17:07 +00:00
Leonid Ganeline	2f8dd1a161	community[patch]: `cross_encoders` flatten namespaces (#20183 ) Issue `langchain_community.cross_encoders` didn't have flattening namespace code in the __init__.py file. Changes: - added code to flattening namespaces (used #20050 as a template) - added ut for a change - added missed `test_imports` for `chat_loaders` and `chat_message_histories` modules	2024-04-08 20:50:23 -04:00
Alex Sherstinsky	5f563e040a	community: extend Predibase integration to support fine-tuned LLM adapters (#19979 ) - Description: Langchain-Predibase integration was failing because it was not current with the Predibase SDK; in addition, Predibase integration tests were instantiating the Langchain Community `Predibase` class with one required argument (`model`) missing. This change updates the Predibase SDK usage and fixes the integration tests. - Twitter handle: `@alexsherstinsky` --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-04-08 18:54:29 +00:00
Bagatur	5ae0e687b3	docs: use standard openai params (#20160 ) Part of #20085	2024-04-08 10:56:53 -05:00
david02871	e1a24d09c5	community: Add PHP language parser to document_loaders (#19850 ) Description: Added a PHP language parser to document_loaders Issue: N/A Dependencies: N/A Twitter handle: N/A --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-04-08 11:30:28 -04:00
Marlene	2f03bc397e	Community: Updating Azure Retriever and Docs to be Azure AI Search instead of Azure Cognitive Search (#19925 ) Last year Microsoft [changed the name](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) of Azure Cognitive Search to Azure AI Search. This PR updates the Langchain Azure Retriever API and its associated docs to reflect this change. It may be confusing for users to see the name Cognitive here and AI in the Microsoft documentation, which is why this is needed. I've also added a more detailed example to the Azure retriever doc page. There are more places that need a similar update but I'm breaking it up so the PRs are not too big 😄 Fixing my errors from the previous PR. Twitter: @marlene_zw Two new tests added to test backward compatibility in `libs/community/tests/integration_tests/retrievers/test_azure_cognitive_search.py` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-04-08 11:12:41 -04:00
Rahul Triptahi	820b713086	community[minor]: Add support for Pebblo cloud_api_key in PebbloSafeLoader (#19855 ) Description: _PebbloSafeLoader_: Add support for pebblo's cloud api-key in PebbloSafeLoader - This Pull request enables PebbloSafeLoader to accept pebblo's cloud api-key and send the semantic classification data to pebblo cloud. Documentation: Updated Unit test: Added Issue: NA Dependencies: - None Twitter handle: @rahul_tripathi2 Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2024-04-08 11:10:04 -04:00
Eugene Yurtsev	520ff50adc	community[patch]: Improve import callbacks to make it IDE friendly (#20050 ) * declares __all__ as a list of strings (instead of dynamically computing it) * import type definitions when TYPE_CHECKING is true	2024-04-05 15:17:51 -04:00
Leonid Ganeline	3aacd11846	community[minor]: added missed class to __all__ (#19888 ) Added the missing `UnstructuredCHMLoader` class to `__all__` in the document_loader `__init__.py`.	2024-04-04 16:16:51 -04:00
Tomaz Bratanic	df25829f33	community[minor]: Add metadata filtering support for neo4j vector (#20001 )	2024-04-04 11:37:06 -04:00
Ben Mitchell	b52b78478f	community[minor]: Implement Async OpenSearch `afrom_texts` & `afrom_embeddings` (#20009 ) - Description: Adds the async variants `afrom_texts` and `afrom_embeddings` to `OpenSearchVectorSearch`, which allows `afrom_documents` to be called. - Issue: I implemented this because my use case involves an async scraper generating documents as and when they're ready to be ingested by Embedding/OpenSearch - Dependencies: None that I'm aware of Co-authored-by: Ben Mitchell <b.mitchell@reply.com>	2024-04-04 15:36:14 +00:00
happy-go-lucky	c6432abdbe	community[patch]: Implement delete method and all async methods in opensearch_vector_search (#17321 ) - Description: In order to use index and aindex in libs/langchain/langchain/indexes/_api.py, I implemented delete method and all async methods in opensearch_vector_search - Dependencies: No changes	2024-04-03 09:40:49 -07:00
Cheng, Penghui	cc407e8a1b	community[minor]: weight only quantization with intel-extension-for-transformers. (#14504 ) Support weight only quantization with intel-extension-for-transformers. [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) is an innovative toolkit to accelerate Transformer-based models on Intel platforms, in particular effective on 4th Intel Xeon Scalable processor [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html) (codenamed Sapphire Rapids). The toolkit provides the below key features: * Seamless user experience of model compressions on Transformer-based models by extending [Hugging Face transformers](https://github.com/huggingface/transformers) APIs and leveraging [Intel® Neural Compressor](https://github.com/intel/neural-compressor) * Advanced software optimizations and unique compression-aware runtime. * Optimized Transformer-based model packages. * [NeuralChat](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat), a customizable chatbot framework to create your own chatbot within minutes by leveraging a rich set of plugins and SOTA optimizations. * [Inference](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/graph) of Large Language Model (LLM) in pure C/C++ with weight-only quantization kernels. This PR is an integration of weight only quantization feature with intel-extension-for-transformers. Unit test is in lib/langchain/tests/integration_tests/llm/test_weight_only_quantization.py The notebook is in docs/docs/integrations/llms/weight_only_quantization.ipynb. The document is in docs/docs/integrations/providers/weight_only_quantization.mdx. 
--------- Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-03 16:21:34 +00:00
Eugene Yurtsev	d293431e10	core[minor]: Add aload to document loader (#19936 ) Add aload to document loader	2024-04-03 10:46:47 -04:00
Leonid Kuligin	eb0521064e	deprecating integrations moved to langchain_google_community (#19841 ) Thank you for contributing to LangChain! - [ ] PR title: "community: deprecating integrations moved to langchain_google_community" - [ ] PR message: deprecating integrations moved to langchain_google_community --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-04-02 17:06:07 -04:00
Peter Vandenabeele	e830a4e731	community[patch]: Add remove_comments option (default True): do not extract html comments (#13259 ) - Description: add `remove_comments` option (default: True): do not extract html _comments_, - Issue: None, - Dependencies: None, - Tag maintainer: @nfcampos , - Twitter handle: peter_v I ran `make format`, `make lint` and `make test`. Discussion: In my use case, I prefer not to have the comments in the extracted text: * e.g. from a Google tag that is added in the html as a comment * e.g. content that the authors have temporarily hidden to make it non-visible to the regular reader Removing the comments makes the extracted text closer to the text the reader is intended to see. Choice to make: do we prefer the default for this `remove_comments` option to be True or False? I have changed it to True in a second commit, since that is how I would prefer to use it by default. This gives cleaned text (without technical Google tags etc.) that is also closer to the actually visible and intended content. I am not sure what is best aligned with the conventions of langchain in general ... INITIAL VERSION (new version above): ~Choice to make: do we prefer to make the default for this `ignore_comments` option to be True or False? I have set it to False now to be backwards compatible. On the other hand, I would use it mostly with True. I am not sure what is best aligned with the conventions of langchain in general ...~ --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-02 00:19:12 +00:00
Jamsheed Mistri	4f70bc119d	community[minor]: add Layerup Security integration (#19787 ) Description: adds integration with [Layerup Security](https://uselayerup.com). Docs can be found [here](https://docs.uselayerup.com). Integrates directly with our Python SDK. Dependencies: [LayerupSecurity](https://pypi.org/project/LayerupSecurity/) Note: all methods for our product require a paid API key, so I only included 1 test which checks for an invalid API key response. I have tested extensively locally. Twitter handle: [@layerup_](https://twitter.com/layerup_) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-01 23:49:00 +00:00
Anıl Berk Altuner	4384fa8e49	community[minor]: Add Dria retriever (#17098 ) [Dria](https://dria.co/) is a hub of public RAG models for developers to both contribute and utilize a shared embedding lake. This PR adds a retriever that can retrieve documents from Dria.	2024-04-01 12:04:19 -07:00
Ethan Yang	48f84e253e	community[minor]: Add OpenVINO rerank model support (#19791 ) @eaidova @AlexKoff88 Could you help to review, thanks --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-01 18:27:23 +00:00
Chenhui Zhang	a1f3e9f537	community[minor]: Update ChatZhipuAI to support GLM-4 model (#16695 ) Description: Update `ChatZhipuAI` to support the latest `glm-4` model. Issue: N/A Dependencies: httpx, httpx-sse, PyJWT The previous `ChatZhipuAI` implementation requires the `zhipuai` package, and cannot call the latest GLM model. This is because - The old version `zhipuai==1.` doesn't support the latest model. - `zhipuai==2.` requires `pydantic V2`, which is incompatible with 'langchain-community'. This re-implementation invokes the GLM model by sending HTTP requests to [open.bigmodel.cn](https://open.bigmodel.cn/dev/api) via the `httpx` package, and uses the `httpx-sse` package to handle stream events. --------- Co-authored-by: zR <2448370773@qq.com>	2024-04-01 18:11:21 +00:00
Bagatur	d62e84c4f5	community[patch]: Revert "Fix the bug that Chroma does not specify `embedding_function` (#19277)" (#19866 ) This reverts commit `7042934b5f`. Fixes #19848	2024-04-01 10:10:44 -07:00
hsuyuming	5ab6b39098	community[patch]: add attribution_token within GoogleVertexAISearchRetriever (#18520 ) - Description: Add attribution_token within GoogleVertexAISearchRetriever so user can provide this information to Google support team or product team during debug session. Reference: https://cloud.google.com/generative-ai-app-builder/docs/view-analytics#user-events Attribution tokens. Attribution tokens are unique IDs generated by Vertex AI Search and returned with each search request. Make sure to include that attribution token as UserEvent.attributionToken with any user events resulting from a search. This is needed to identify if a search is served by the API. Only user events with a Google-generated attribution token are used to compute metrics. - Issue: No - Dependencies: No - Twitter handle: abehsu1992626 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-31 13:54:56 -07:00
Kenneth Choe	f98d7f7494	langchain[minor], community[minor]: add CrossEncoderReranker with HuggingFaceCrossEncoder and SagemakerEndpointCrossEncoder (#13687 ) - Description: Support reranking based on cross encoder models available from HuggingFace. - Added `CrossEncoder` schema - Implemented `HuggingFaceCrossEncoder` and `SagemakerEndpointCrossEncoder` - Implemented `CrossEncoderReranker` that performs similar functionality to `CohereRerank` - Added `cross-encoder-reranker.ipynb` to demonstrate how to use it. Please let me know if anything else needs to be done to make it visible on the table-of-contents navigation bar on the left, or on the card list on [retrievers documentation page](https://python.langchain.com/docs/integrations/retrievers). - Issue: N/A - Dependencies: None other than the existing ones. --------- Co-authored-by: Kenny Choe <kchoe@amazon.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-31 20:51:31 +00:00
Kamal Zhang	368e35c3b1	community[patch]: introduce convert_to_secret() to bananadev llm (#14283 ) - Description: Per #12165, this PR adds the function convert_to_secret_str() to BananaLLM during environment variable validation. - Issue: #12165 - Tag maintainer: @eyurtsev - Twitter handle: @treewatcha75751 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-30 00:52:25 +00:00
anshaneel	0884e5de7f	community[minor]: Add Alpha Vantage API Tool (#14332 ) ### Description This implementation adds functionality from the AlphaVantage API, renowned for its comprehensive financial data. The class encapsulates various methods, each dedicated to fetching specific types of financial information from the API. ### Implemented Functions - `search_symbols`: - Searches the AlphaVantage API for financial symbols using the provided keywords. - `_get_market_news_sentiment`: - Retrieves market news sentiment for a specified stock symbol from the AlphaVantage API. - `_get_time_series_daily`: - Fetches daily time series data for a specific symbol from the AlphaVantage API. - `_get_quote_endpoint`: - Obtains the latest price and volume information for a given symbol from the AlphaVantage API. - `_get_time_series_weekly`: - Gathers weekly time series data for a particular symbol from the AlphaVantage API. - `_get_top_gainers_losers`: - Provides details on top gainers, losers, and most actively traded tickers in the US market from the AlphaVantage API. ### Issue: - #11994 ### Dependencies: - 'requests' library for HTTP requests. (import requests) - 'pytest' library for testing. (import pytest) --------- Co-authored-by: Adam Badar <94140103+adam-badar@users.noreply.github.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-30 00:44:01 +00:00
Alex Sherstinsky	a9bc212bf2	community[minor]: fix failing Predibase integration (#19776 ) - [x] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [x] PR message: *Delete this entire checklist* and replace with - Description: Langchain-Predibase integration was failing, because it was not current with the Predibase SDK; in addition, Predibase integration tests were instantiating the Langchain Community `Predibase` class with one required argument (`model`) missing. This change updates the Predibase SDK usage and fixes the integration tests. - Twitter handle: `@alexsherstinsky` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-30 00:38:13 +00:00
ethynic	e9caa22d47	community[patch]: Update minimax.py (#14384 ) The MiniMaxChat class's _generate method should return a ChatResult object, not a str. Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 23:57:06 +00:00
M.Abdulrahman Alnaseer	ba54f1577f	community[minor]: add support for llmsherpa (#19741 ) Thank you for contributing to LangChain! - [x] PR title: "community: added support for llmsherpa library" - [x] Add tests and docs: 1. Integration test: 'docs/docs/integrations/document_loaders/test_llmsherpa.py'. 2. an example notebook: `docs/docs/integrations/document_loaders/llmsherpa.ipynb`. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 16:04:57 -07:00
Hrvoje Milković	b7344e3347	community[minor]: Infobip tool integration (#16805 ) Description: Adding Tool that wraps Infobip API for sending sms or emails and email validation. Dependencies: None, Twitter handle: @hmilkovic Implementation: ``` libs/community/langchain_community/utilities/infobip.py ``` Integration tests: ``` libs/community/tests/integration_tests/utilities/test_infobip.py ``` Example notebook: ``` docs/docs/integrations/tools/infobip.ipynb ``` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 19:01:27 +00:00
Luka Krapic	727a2ea9f1	community[patch]: history size support for DynamoDBChatMessageHistory (#16794 ) Description: PR adds support for limiting number of messages preserved in a session history for DynamoDBChatMessageHistory --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:56:21 +00:00
Dt22	6dbf1a2de0	community[patch]: fix redis input type for index_schema field (#16874 ) ### Subject: Fix Type Misdeclaration for index_schema in redis/base.py I noticed a type misdeclaration for the index_schema field in the redis/base.py file. Following the instructions outlined in [Redis Custom Metadata Indexing](https://python.langchain.com/docs/integrations/vectorstores/redis) to create our own index_schema leads to a Pylance type error. The error message indicates that Dict[str, list[Dict[str, str]]] is incompatible with the type Optional[Union[Dict[str, str], str, os.PathLike]]. ``` index_schema = { "tag": [{"name": "credit_score"}], "text": [{"name": "user"}, {"name": "job"}], "numeric": [{"name": "age"}], } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users_modified", index_schema=index_schema, ) ``` Therefore, I have created this pull request to rectify the type declaration problem. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:55:54 +00:00
morgana	074ad5095f	community[patch]: mmr search for Rockset vectorstore integration (#16908 ) - Description: Adding support for mmr search in the Rockset vectorstore integration. - Issue: N/A - Dependencies: N/A - Twitter handle: `@_morgan_adams_` --------- Co-authored-by: Rockset API Bot <admin@rockset.io> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 18:45:22 +00:00
shahrin014	f51e6a35ba	community[patch]: OllamaEmbeddings - Pass headers to post request (#16880 ) ## Feature - Set additional headers in constructor - Headers will be sent in post request This feature is useful if deploying Ollama on a cloud service such as hugging face, which requires authentication tokens to be passed in the request header. ## Tests - Test if header is passed - Test if header is not passed Similar to https://github.com/langchain-ai/langchain/pull/15881 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:44:52 +00:00
Jan Chorowski	b8b42ccbc5	community[minor]: Pathway vectorstore(#14859 ) - Description: Integration with pathway.com data processing pipeline acting as an always updated vectorstore - Issue: not applicable - Dependencies: optional dependency on [`pathway`](https://pypi.org/project/pathway/) - Twitter handle: pathway_com The PR provides an integration with `pathway` that offers an easy-to-use, always up-to-date vector store: ```python import os import pathway as pw from langchain.embeddings.openai import OpenAIEmbeddings from langchain.text_splitter import CharacterTextSplitter from langchain.vectorstores import PathwayVectorClient, PathwayVectorServer data_sources = [] data_sources.append( pw.io.gdrive.read(object_id="17H4YpBOAKQzEJ93xmC2z170l0bP2npMy", service_user_credentials_file="credentials.json", with_metadata=True)) text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) embeddings_model = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"]) vector_server = PathwayVectorServer( *data_sources, embedder=embeddings_model, splitter=text_splitter, ) vector_server.run_server(host="127.0.0.1", port="8765", threaded=True, with_cache=False) client = PathwayVectorClient( host="127.0.0.1", port="8765", ) query = "What is Pathway?" docs = client.similarity_search(query) ``` The `PathwayVectorServer` builds a data processing pipeline which continuously scans documents in a given source connector (google drive, s3, ...) and builds a vector store. The `PathwayVectorClient` implements LangChain's `VectorStore` interface and connects to the server to retrieve documents. 
--------- Co-authored-by: Mateusz Lewandowski <lewymati@users.noreply.github.com> Co-authored-by: mlewandowski <mlewandowski@MacBook-Pro-mlewandowski.local> Co-authored-by: Berke <berkecanrizai1@gmail.com> Co-authored-by: Adrian Kosowski <adrian@pathway.com> Co-authored-by: mlewandowski <mlewandowski@macbook-pro-mlewandowski.home> Co-authored-by: berkecanrizai <63911408+berkecanrizai@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: mlewandowski <mlewandowski@MBPmlewandowski.ht.home> Co-authored-by: Szymon Dudycz <szymond@pathway.com> Co-authored-by: Szymon Dudycz <szymon.dudycz@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 10:50:39 -07:00
Arturs Konfino	2319212d54	community[patch]: avoid executing `toolkit.get_context()` when not necessary (#19762 ) If `prompt` is passed into `create_sql_agent()`, then `toolkit.get_context()` shouldn't be executed against the database unless relevant prompt variables (`table_info` or `table_names`) are present.	2024-03-29 16:42:21 +00:00
高璟琦	ec7a59c96c	community[minor]: Add solar embedding (#19761 ) Solar is a large language model developed by [Upstage](https://upstage.ai/). It's a powerful and purpose-trained LLM. This PR adds access to the embedding service provided by Solar. You can get a SOLAR_API_KEY from https://console.upstage.ai/services/embedding. More details on the already accepted LLM integration are at https://python.langchain.com/docs/integrations/llms/solar.	2024-03-29 09:36:05 -07:00
Tomaz Bratanic	dec00d3050	community[patch]: Add the ability to pass maps to neo4j retrieval query (#19758 ) Makes it easier to flatten complex values to text, so you don't have to use a lot of Cypher to do it.	2024-03-29 08:33:48 -07:00
Robby	f7e8a382cc	community[minor]: add hugging face text-to-speech inference API (#18880 ) Description: I implemented a tool to use Hugging Face text-to-speech inference API. Issue: n/a Dependencies: n/a Twitter handle: No Twitter, but do have [LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol. --------- Co-authored-by: Robby <h0rv@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-29 15:02:29 +00:00
DasDingoCodes	73eb3f8fd9	community[minor]: Implement DirectoryLoader lazy_load function (#19537 ) Thank you for contributing to LangChain! - [x] PR title: "community: Implement DirectoryLoader lazy_load function" - [x] Description: The `lazy_load` function of the `DirectoryLoader` yields each document separately. If the given `loader_cls` of the `DirectoryLoader` also implemented `lazy_load`, it will be used to yield subdocuments of the file. - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access: `libs/community/tests/unit_tests/document_loaders/test_directory_loader.py` 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory: `docs/docs/integrations/document_loaders/directory.ipynb` - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-29 14:46:52 +00:00
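The `DirectoryLoader.lazy_load` behavior described in the commit above can be sketched without LangChain as a plain generator. Everything here is hypothetical and not the merged implementation: `LineLoader`, `lazy_load_directory`, and documents represented as `(source, text)` tuples are illustrative stand-ins. The sketch only mirrors the described idea: walk the matching files and, if the per-file loader offers `lazy_load`, stream its sub-documents one at a time; otherwise fall back to its eager `load()`.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, Tuple

# Hypothetical stand-in for a Document: (source path, text).
Document = Tuple[str, str]


class LineLoader:
    """Hypothetical per-file loader that yields one document per non-empty line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def lazy_load(self) -> Iterator[Document]:
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield (self.path, line.strip())

    def load(self) -> List[Document]:
        return list(self.lazy_load())


def lazy_load_directory(directory: str, glob: str, loader_cls: type) -> Iterator[Document]:
    """Yield documents file by file instead of building one big list up front."""
    for path in sorted(Path(directory).glob(glob)):
        if not path.is_file():
            continue
        loader = loader_cls(str(path))
        if hasattr(loader, "lazy_load"):
            # Stream sub-documents as the per-file loader produces them.
            yield from loader.lazy_load()
        else:
            # Fall back to the eager API for loaders without lazy_load.
            yield from loader.load()


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("first\nsecond\n", encoding="utf-8")
    Path(tmp, "b.txt").write_text("third\n", encoding="utf-8")
    texts = [text for _, text in lazy_load_directory(tmp, "*.txt", LineLoader)]

print(texts)  # ['first', 'second', 'third']
```

Because the directory walk is a generator, a caller that stops early (for example, after the first few documents) never opens the remaining files.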
Jialei	f7c903e24a	community[minor]: add support for Moonshot llm and chat model (#17100 )	2024-03-29 08:54:23 +00:00
Ethan Yang	7164015135	community[minor]: Add Openvino embedding support (#19632 ) This PR is used to support both HF and BGE embeddings with openvino --------- Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>	2024-03-29 01:34:51 -07:00
T Cramer	540ebf35a9	community[patch]: Add explicit error message to Bedrock error output. (#17328 ) - Description: Propagate Bedrock errors into Langchain explicitly. Use-case: unset region error is hidden behind 'Could not load credentials...' message - Issue: [17654](https://github.com/langchain-ai/langchain/issues/17654) - Dependencies: None --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 03:07:33 +00:00
Marcus Virginia	69bb96c80f	community[patch]: surrealdb handle for empty metadata and allow collection names with complex characters (#17374 ) - Description: Handle for empty metadata and allow collection names with complex characters - Issue: #17057 - Dependencies: `surrealdb` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 01:04:27 +00:00
kYLe	124ab79c23	community[minor]: Add Anyscale embedding support (#17605 ) Description: Add embedding model support for Anyscale Endpoint Dependencies: openai --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:53:53 +00:00
Lance Martin	12843f292f	community[patch]: llama cpp embeddings reset default n_batch (#17594 ) When testing Nomic embeddings -- ``` from langchain_community.embeddings import LlamaCppEmbeddings embd_model_path = "/Users/rlm/Desktop/Code/llama.cpp/models/nomic-embd/nomic-embed-text-v1.Q4_K_S.gguf" embd_lc = LlamaCppEmbeddings(model_path=embd_model_path) embedding_lc = embd_lc.embed_query(query) ``` We were seeing this error for strings > a certain size -- ``` File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/llama.py:827, in Llama.embed(self, input, normalize, truncate, return_count) 824 s_sizes = [] 826 # add to batch --> 827 self._batch.add_sequence(tokens, len(s_sizes), False) 828 t_batch += n_tokens 829 s_sizes.append(n_tokens) File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/_internals.py:542, in _LlamaBatch.add_sequence(self, batch, seq_id, logits_all) 540 self.batch.token[j] = batch[i] 541 self.batch.pos[j] = i --> 542 self.batch.seq_id[j][0] = seq_id 543 self.batch.n_seq_id[j] = 1 544 self.batch.logits[j] = logits_all ValueError: NULL pointer access ``` The default `n_batch` of llama-cpp-python's Llama is `512` but we were explicitly setting it to `8`. These need to be set to equal for embedding models. * The embedding.cpp example has an assertion to make sure these are always equal. * Apparently this is not being done properly in llama-cpp-python. With `n_batch` set to 8, if more than 8 tokens are passed the batch runs out of space and it crashes. 
This also explains why the CPU compute buffer size was small: raw client with default `n_batch=512` ``` llama_new_context_with_model: CPU input buffer size = 3.51 MiB llama_new_context_with_model: CPU compute buffer size = 21.00 MiB ``` langchain with `n_batch=8` ``` llama_new_context_with_model: CPU input buffer size = 0.04 MiB llama_new_context_with_model: CPU compute buffer size = 0.33 MiB ``` We can work around this by passing `n_batch=512`, but this will not be obvious to some users: ``` embedding = LlamaCppEmbeddings(model_path=embd_model_path, n_batch=512) ``` From discussion w/ @cebtenzzre. Related: https://github.com/abetlen/llama-cpp-python/issues/1189 Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:47:22 +00:00
Zijian Han	8e976545f3	community[patch]: support OpenAI whisper base url (#17695 ) Description: The base URL for OpenAI is retrieved from the environment variable "OPENAI_BASE_URL", whereas for langchain it is obtained from "OPENAI_API_BASE". By adding `base_url = os.environ.get("OPENAI_API_BASE")`, the OpenAI proxy can execute correctly. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:35:27 +00:00
Paulo Nascimento	44a3484503	community[patch]: add NotebookLoader unit test (#17721 ) Thank you for contributing to LangChain! - Description: added unit tests for NotebookLoader. Linked PR: https://github.com/langchain-ai/langchain/pull/17614 - Issue: [#17614](https://github.com/langchain-ai/langchain/pull/17614) - Twitter handle: @paulodoestech - [x] Pass lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified to check that you're passing lint and testing. See contribution guidelines for more information on how to write/run tests, lint, etc: https://python.langchain.com/docs/contributing/ - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: lachiewalker <lachiewalker1@hotmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:27:46 +00:00
Paulo Nascimento	4c3a67122f	community[patch]: add Integration for OpenAI image gen with v1 sdk (#17771 ) Description: Created a Langchain Tool for OpenAI DALLE Image Generation. Issue: [#15901](https://github.com/langchain-ai/langchain/issues/15901) Dependencies: n/a Twitter handle: @paulodoestech - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:23:14 +00:00
Jiaming	3d3cc71287	community[patch]: fix bugs for bilibili Loader (#18036 ) - Description: 1. Fix the BiliBiliLoader so that it can receive cookie parameters (it requires 3 other parameters to run). The change is backward compatible. 2. Add test; 3. Add example in docs - Issue: [#14213] Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-28 16:39:38 -07:00
Sachin Paryani	25c9f3d1d1	community[patch]: Support Streaming in Azure Machine Learning (#18246 ) - [x] PR title: "community: Support streaming in Azure ML and few naming changes" - [x] PR message: - Description: Added streaming support for azureml_endpoint. Also renamed AzureMLEndpointApiType.realtime to AzureMLEndpointApiType.dedicated. Also added the new classes CustomOpenAIChatContentFormatter and CustomOpenAIContentFormatter, and updated the classes LlamaChatContentFormatter and LlamaContentFormatter to show a deprecation warning message when instantiated. --------- Co-authored-by: Sachin Paryani <saparan@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 23:38:20 +00:00
Victor Adan	afa2d85405	community[patch]: Added missing from_documents method to KNNRetriever. (#18411 ) - Description: Added missing `from_documents` method to `KNNRetriever`, providing the ability to supply metadata to LangChain `Document`s, and to give it parity to the other retrievers, which do have `from_documents`. - Issue: None - Dependencies: None - Twitter handle: None Co-authored-by: Victor Adan <vadan@netroadshow.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-28 23:18:50 +00:00
Smit Parmar	dfc4177b50	community[patch]: mypy ignore fix (#18483 ) Relates to #17048 Description : Applied fix to dynamodb and elasticsearch file. Error was : `Cannot override writeable attribute with read-only property` Suggestion: instead of adding ``` @messages.setter def messages(self, messages: List[BaseMessage]) -> None: raise NotImplementedError("Use add_messages instead") ``` we can change base class property `messages: List[BaseMessage]` to ``` @property def messages(self) -> List[BaseMessage]:... ``` then we don't need to add `@messages.setter` in all child classes.	2024-03-28 15:36:53 -07:00
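The base-class suggestion in the commit above can be illustrated with a minimal, self-contained sketch. It assumes plain `str` messages instead of `BaseMessage`, and `BaseChatHistory` / `InMemoryHistory` are hypothetical names, not LangChain classes. Because the base declares `messages` as a read-only property rather than a writable attribute, a subclass can override it with another read-only property without mypy's "Cannot override writeable attribute with read-only property" error and without a `@messages.setter`.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseChatHistory(ABC):
    """Hypothetical base: `messages` is a read-only abstract property."""

    @property
    @abstractmethod
    def messages(self) -> List[str]:
        ...

    @abstractmethod
    def add_messages(self, messages: List[str]) -> None:
        ...


class InMemoryHistory(BaseChatHistory):
    """Hypothetical subclass: overrides the property, no setter needed."""

    def __init__(self) -> None:
        self._store: List[str] = []

    @property
    def messages(self) -> List[str]:
        # Return a copy so callers cannot mutate internal state directly.
        return list(self._store)

    def add_messages(self, messages: List[str]) -> None:
        self._store.extend(messages)
```

Writes go through `add_messages`; assigning to `history.messages` raises `AttributeError` because the property has no setter.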
Luca Dorigo	f19229c564	core[patch]: fix beta, deprecated typing (#18877 ) Description: While not technically incorrect, the TypeVar used for the `@beta` decorator prevented pyright (and thus most vscode users) from correctly seeing the types of functions/classes decorated with `@beta`. This is in part due to a small bug in pyright (https://github.com/microsoft/pyright/issues/7448 ) - however, the `Type` bound in the typevar `C = TypeVar("C", Type, Callable)` is not doing anything - classes are `Callables` by default, so by my understanding binding to `Type` does not actually provide any more safety - the modified annotation still works correctly for both functions, properties, and classes. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 22:33:43 +00:00
wulixuan	b7c8bc8268	community[patch]: fix yuan2 errors in LLMs (#19004 ) 1. Fix errors when invoking Yuan2. 2. Update tests.	2024-03-28 14:37:44 -07:00
高远	688ca48019	community[patch]: Adding validation when vector does not exist (#19698 ) Adding validation when vector does not exist Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>	2024-03-28 13:58:23 -07:00
Chaunte W. Lacewell	4a49fc5a95	community[patch]: Fix bug in vdms (#19728 ) Description: Fix embedding check in vdms Contribution maintainer: [@cwlacewe](https://github.com/cwlacewe)	2024-03-28 12:54:24 -07:00
高璟琦	75173d31db	community[minor]: Add solar model chat model (#18556 ) Add our solar chat models, available model choices: * solar-1-mini-chat * solar-1-mini-translate-enko * solar-1-mini-translate-koen More documents and pricing can be found at https://console.upstage.ai/services/solar. The references to our solar model can be found at * https://arxiv.org/abs/2402.17032 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 12:31:11 -07:00
Davide Menini	f7042321f1	community[patch]: gather token usage info in BedrockChat during generation (#19127 ) This PR allows to calculate token usage for prompts and completion directly in the generation method of BedrockChat. The token usage details are then returned together with the generations, so that other downstream tasks can access them easily. This allows to define a callback for tokens tracking and cost calculation, similarly to what happens with OpenAI (see [OpenAICallbackHandler](https://api.python.langchain.com/en/latest/_modules/langchain_community/callbacks/openai_info.html#OpenAICallbackHandler). I plan on adding a BedrockCallbackHandler later. Right now keeping track of tokens in the callback is already possible, but it requires passing the llm, as done here: https://how.wtf/how-to-count-amazon-bedrock-anthropic-tokens-with-langchain.html. However, I find the approach of this PR cleaner. Thanks for your reviews. FYI @baskaryan, @hwchase17 --------- Co-authored-by: taamedag <Davide.Menini@swisscom.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 18:58:46 +00:00
ligang-super	a662468dde	community[patch]: Fix the error of Baidu Qianfan not passing the stop parameter (#18666 ) - [x] PR title: "community: fix baidu qianfan missing stop parameter" - [x] PR message: - **Description: Baidu Qianfan lost the stop parameter when requesting service due to extracting it from kwargs. This bug can cause the agent to receive incorrect results --------- Co-authored-by: ligang33 <ligang33@baidu.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 18:21:49 +00:00
kaijietti	9c4b6dc979	community[patch]: fix bug in cohere that `async for` a coroutine in ChatCohere (#19381 ) Without `await`, the `stream` returned from the `async_client` is actually a coroutine, which could not be used in `async for`.	2024-03-27 21:34:46 -07:00
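The failure mode fixed in the commit above can be reproduced in plain `asyncio`. In this minimal sketch, `chat_stream` is a hypothetical stand-in for an async client method that is a coroutine *returning* an async iterator (an assumption modeled on the commit description, not the Cohere SDK's actual code). Iterating it without `await` fails because a coroutine object has no `__aiter__`.

```python
import asyncio
from typing import AsyncIterator, List


async def _chunk_generator() -> AsyncIterator[str]:
    # An async generator: this is what `async for` can consume.
    for piece in ["Hel", "lo", "!"]:
        yield piece


async def chat_stream() -> AsyncIterator[str]:
    # Hypothetical client method: a coroutine that returns an async iterator.
    return _chunk_generator()


async def consume_correctly() -> List[str]:
    stream = await chat_stream()  # await first, then iterate
    return [chunk async for chunk in stream]


async def consume_without_await() -> bool:
    stream = chat_stream()  # bug: this is a coroutine object, not an iterator
    try:
        async for _ in stream:
            pass
    except TypeError:
        # Python rejects `async for` over an object without __aiter__.
        return True
    finally:
        stream.close()  # avoid the "coroutine was never awaited" warning
    return False


if __name__ == "__main__":
    print(asyncio.run(consume_correctly()))
    print(asyncio.run(consume_without_await()))
```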
Christian Galo	1adaa3c662	community[minor]: Update Azure Cognitive Services to Azure AI Services (#19488 ) This is a follow up to #18371. These are the changes: - New Azure AI Services toolkit and tools to replace those of Azure Cognitive Services. - Updated documentation for Microsoft platform. - The image analysis tool has been rewritten to use the new package `azure-ai-vision-imageanalysis`, doing a proper replacement of `azure-ai-vision`. These changes: - Update outdated naming from "Azure Cognitive Services" to "Azure AI Services". - Update documentation to use non-deprecated methods to create and use agents. - Removes need to depend on yanked python package (`azure-ai-vision`) There is one new dependency that is needed as a replacement to `azure-ai-vision`: - `azure-ai-vision-imageanalysis`. This is optional and declared within a function. There is a new `azure_ai_services.ipynb` notebook showing usage; Changes have been linted and formatted. I am leaving the actions of adding deprecation notices and future removal of Azure Cognitive Services up to the LangChain team, as I am not sure what the current practice around this is. --- If this PR makes it, my handle is @galo@mastodon.social --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-03-28 03:19:02 +00:00
Shengsheng Huang	ac1dd8ad94	community[minor]: migrate `bigdl-llm` to `ipex-llm` (#19518 ) - Description: `bigdl-llm` library has been renamed to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm). This PR migrates the `bigdl-llm` integration to `ipex-llm` . - Issue: N/A. The original PR of `bigdl-llm` is https://github.com/langchain-ai/langchain/pull/17953 - Dependencies: `ipex-llm` library - Contribution maintainer: @shane-huang Updated doc: docs/docs/integrations/llms/ipex_llm.ipynb Updated test: libs/community/tests/integration_tests/llms/test_ipex_llm.py	2024-03-27 20:12:59 -07:00
Chaunte W. Lacewell	a31f692f4e	community[minor]: Add VDMS vectorstore (#19551 ) - Description: Add support for Intel Lab's [Visual Data Management System (VDMS)](https://github.com/IntelLabs/vdms) as a vector store - Dependencies: `vdms` library which requires protobuf = "4.24.2". There is a conflict with dashvector in `langchain` package but conflict is resolved in `community`. - Contribution maintainer: [@cwlacewe](https://github.com/cwlacewe) - Added tests: libs/community/tests/integration_tests/vectorstores/test_vdms.py - Added docs: docs/docs/integrations/vectorstores/vdms.ipynb - Added cookbook: cookbook/multi_modal_RAG_vdms.ipynb --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 03:12:11 +00:00
William FH	b7b62e29fb	community[patch], mongodb[patch]: Stop spamming SIMD import warnings (#19531 ) If you use an embedding dist function in an eval loop, you get warned every time. Would prefer to just check once and forget about it. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 03:11:02 +00:00
yongheng.liu	7e29b6061f	community[minor]: integrate China Mobile Ecloud vector search (#15298 ) - Description: integrate China Mobile Ecloud vector search, - Dependencies: elasticsearch==7.10.1 Co-authored-by: liuyongheng <liuyongheng@cmss.chinamobile.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 23:02:40 +00:00
Hyeongchan Kim	9b70131aed	community[patch]: refactor the type hint of `file_path` in `UnstructuredAPIFileLoader` class (#18839 ) * Description: add `None` type for `file_path` along with `str` and `List[str]` types. * `file_path`/`filename` arguments in `get_elements_from_api()` and `partition()` can be `None`, however, there's no `None` type hint for `file_path` in `UnstructuredAPIFileLoader` and `UnstructuredFileLoader` currently. * calling the function with `file_path=None` is no problem, but my IDE annoys me lol. * Issue: N/A * Dependencies: N/A Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-27 22:31:54 +00:00
CaroFG	cf96060ab7	community[patch]: update for compatibility with latest Meilisearch version (#18970 ) - Description: Updates Meilisearch vectorstore for compatibility with v1.6 and above. Adds embedders settings and embedder_name which are now required. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 22:08:27 +00:00
chyroc	be2adb1083	community[patch]: support unstructured_kwargs for s3 loader (#15473 ) fix https://github.com/langchain-ai/langchain/issues/15472 Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 22:03:48 +00:00
Tomaz Bratanic	87d2a6b777	community[minor]: Add the option to omit schema refresh in Neo4jGraph (#19654 )	2024-03-27 14:20:12 -04:00
Rajendra Kadam	0019d8a948	community[minor]: Add support for non-file-based Document Loaders in PebbloSafeLoader (#19574 ) Description: PebbloSafeLoader: Add support for non-file-based Document Loaders This pull request enhances PebbloSafeLoader by introducing support for several non-file-based Document Loaders. With this update, PebbloSafeLoader now seamlessly integrates with the following loaders: - GoogleDriveLoader - SlackDirectoryLoader - Unstructured EmailLoader Issue: NA Dependencies: - None Twitter handle: @Raj__725 --------- Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2024-03-27 17:39:52 +00:00
hulitaitai	dc2c9dd4d7	Update text2vec.py (#19657 ) Add that URL of the embedding tool "text2vec". Fix minor mistakes in the doc-string.	2024-03-27 13:13:30 -04:00
Guangdong Liu	7042934b5f	community[patch]: Fix the bug that Chroma does not specify `embedding_function` (#19277 ) - Issue: close #18291 - @baskaryan, @eyurtsev PTAL	2024-03-27 11:43:38 -04:00
yuwenzho	3a7d2cf443	community[minor]: Add ITREX optimized Embeddings (#18474 ) Introduction [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms Description adding ITREX runtime embeddings using intel-extension-for-transformers. added mdx documentation and example notebooks added embedding import testing. --------- Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 07:22:06 +00:00
Fabrizio Ruocco	f12cb0bea4	community[patch]: Microsoft Azure Document Intelligence updates (#16932 ) - Description: Update Azure Document Intelligence implementation by Microsoft team and RAG cookbook with Azure AI Search --------- Co-authored-by: Lu Zhang (AI) <luzhan@microsoft.com> Co-authored-by: Yateng Hong <yatengh@microsoft.com> Co-authored-by: teethache <hongyateng2006@126.com> Co-authored-by: Lu Zhang <44625949+luzhang06@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 23:36:59 -07:00
Timothy	ad77fa15ee	community[patch]: Adding try-except block for GCSDirectoryLoader (#19591 ) - Description: Implemented try-except block for `GCSDirectoryLoader`. Reason: Users processing large number of unstructured files in a folder may experience many different errors. A try-exception block is added to capture these errors. A new argument `use_try_except=True` is added to enable silent failure so that error caused by processing one file does not break the whole function. - Issue: N/A - Dependencies: no new dependencies - Twitter handle: timothywong731 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 00:12:24 +00:00
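The opt-in error handling described in the commit above follows a common pattern. This is a minimal sketch under stated assumptions: files are identified by path strings, documents are plain strings, and `load_directory` / `load_file` are hypothetical names rather than the `GCSDirectoryLoader` API. With `use_try_except=True`, a failing file is logged and skipped; otherwise the first error propagates and aborts the whole load.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def load_directory(
    paths: List[str],
    load_file: Callable[[str], List[str]],
    use_try_except: bool,
) -> List[str]:
    """Load every file; optionally log and skip the ones that fail."""
    docs: List[str] = []
    for path in paths:
        try:
            docs.extend(load_file(path))
        except Exception as exc:
            if not use_try_except:
                raise  # one bad file aborts the whole load
            logger.warning("Skipping %s: %s", path, exc)
    return docs
```

Catching broad `Exception` here is deliberate: the commit notes that unstructured files can fail in many different ways, so the sketch does not try to enumerate them.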
xsai9101	160a8eb178	community[minor]: add oracle autonomous database doc loader integration (#19536 ) Thank you for contributing to LangChain! - [ ] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: Adding oracle autonomous database document loader integration. This will allow users to connect to oracle autonomous database through connection string or TNS configuration. https://www.oracle.com/autonomous-database/ - Issue: None - Dependencies: oracledb python package https://pypi.org/project/oracledb/ - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. Unit test and doc are added. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-26 17:02:18 -07:00
Adam Law	aeb7b6b11d	community[patch]: use semantic_configurations in AzureSearch (#19347 ) - Description: Currently the semantic_configurations are not used when creating an AzureSearch instance, instead creating a new one with default values. This PR changes the behavior to use the passed semantic_configurations if it is present, and the existing default configuration if not. --------- Co-authored-by: Adam Law <adamlaw@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-26 13:57:39 -07:00
Adrian Valente	2763d8cbe5	community: add len() implementation to Chroma (#19419 ) Thank you for contributing to LangChain! - [x] Add len() implementation to Chroma: "package: community" - [x] PR message: - Description: add an implementation of the __len__() method for the Chroma vectorstore, for convenience. - Issue: no exposed method to know the size of a Chroma vectorstore - Dependencies: None - Twitter handle: lowrank_adrian - [x] Add tests and docs - [x] Lint and test --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 12:53:10 -04:00
Tom Aarsen	e0a1278d2b	docs: HFEmbeddings: Add more information to model_kwargs/encode_kwargs (#19594 ) - Description: Be more explicit with the `model_kwargs` and `encode_kwargs` for `HuggingFaceEmbeddings`. - Issue: - - Dependencies: - I received some reports by my users that they didn't realise that you could change the default `batch_size` with `HuggingFaceEmbeddings`, which may be attributed to how the `model_kwargs` and `encode_kwargs` don't give much information about what you can specify. I've added some parameter names & links to the Sentence Transformers documentation to help clear it up. Let me know if you'd rather have Markdown/Sphinx-style hyperlinks rather than a "bare URL". - Tom Aarsen	2024-03-26 12:46:04 -04:00
Dobiichi-Origami	18e6f9376d	community[Qianfan]: add function_call in additional_kwargs (#19550 ) - Description: add the `function_call` field to `additional_kwargs`, which was missing in the previous version - Dependencies: None	2024-03-26 12:20:19 -04:00
mwmajewsk	f7a1fd91b8	community: better support of pathlib paths in document loaders (#18396 ) This arose from https://github.com/langchain-ai/langchain/pull/18397, where document loaders did not support `pathlib.Path`. This pull request provides more uniform support for `Path` as an argument. The core ideas for this upgrade: - if a local file path is used as an argument, it should be supported as `pathlib.Path` - if there are external calls that may or may not support pathlib, the argument is immediately converted to `str` - if `self.file_path` is used in a way that allows it to stay a `Path` without conversion, it is only converted for the metadata. Twitter handle: https://twitter.com/mwmajewsk	2024-03-26 11:51:52 -04:00
Yuki Watanabe	cfecbda48b	community[minor]: Allow passing `allow_dangerous_deserialization` when loading LLM chain (#18894 ) ### Issue Recently, the new `allow_dangerous_deserialization` flag was introduced to prevent unsafe, pickle-based model deserialization without the user's knowledge (#18696). Since then, some LLMs such as Databricks require this flag to be set to true to instantiate the model. However, this breaks loading such LLMs within a chain using the `load_chain` method, because the underlying loader function [load_llm_from_config](`f96dd57501/libs/langchain/langchain/chains/loading.py (L40)`) (and load_llm) ignores keyword arguments passed in. ### Solution This PR fixes this issue by propagating the `allow_dangerous_deserialization` argument to the class loader iff the LLM class has that field. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 11:07:55 -04:00
hulitaitai	d7c14cb6f9	community[minor]: Add embeddings integration for text2vec (#19267 ) Create a class that allows using the "text2vec" open-source embedding model. Install the package by running 'pip install -U text2vec'. Example of calling the model through LangChain: from langchain_community.embeddings.text2vec import Text2vecEmbeddings embedding = Text2vecEmbeddings() embedding.embed_documents([ "This is a CoSENT(Cosine Sentence) model.", "It maps sentences to a 768 dimensional dense vector space.", ]) embedding.embed_query( "It can be used for text matching or semantic search." ) --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-26 11:06:58 -04:00
Kalyan Mudumby	d27600c6f7	community[patch]: GPTCache pydantic validation error on lookup (#19427 ) Description: this change fixes the pydantic validation error when looking up from GPTCache: the `ChatOpenAI` class returns `ChatGeneration` as its response, which was not handled. Use the existing `_loads_generations` and `_dumps_generations` functions to handle it. Trace ``` File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/development/scripts/chatbot-postgres-test.py", line 90, in <module> print(llm.invoke("tell me a joke")) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 166, in invoke self.generate_prompt( File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 544, in generate_prompt return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 408, in generate raise e File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 398, in generate self._generate_with_cache( File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 585, in _generate_with_cache cache_val = llm_cache.lookup(prompt, llm_string) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 807, in lookup return [ ^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 808, in <listcomp> Generation(**generation_dict) for generation_dict in json.loads(res) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/load/serializable.py", line 120, in __init__ super().__init__(**kwargs) File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__ raise validation_error pydantic.v1.error_wrappers.ValidationError: 1 validation error for Generation type unexpected value; permitted: 'Generation' (type=value_error.const; given=ChatGeneration; permitted=('Generation',)) ``` Although I don't seem to find any issues here, here's an [issue](https://github.com/zilliztech/GPTCache/issues/585) raised in GPTCache. Please let me know if I need to do anything else Thank you --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 10:52:30 -04:00
Piyush Jain	72ba738bf5	community[minor]: Improvements for NeptuneRdfGraph, Improve discovery of graph schema using database statistics (#19546 ) Fixes linting for PR [19244](https://github.com/langchain-ai/langchain/pull/19244) --------- Co-authored-by: mhavey <mchavey@gmail.com>	2024-03-26 10:36:51 -04:00
Christophe Bornet	8595c3ab59	community[minor]: Add InMemoryVectorStore to module level imports (#19576 )	2024-03-26 14:07:44 +00:00
Aayush Kataria	03c38005cb	community[patch]: Fixing some caching issues for AzureCosmosDBSemanticCache (#18884 ) Fixing some issues for AzureCosmosDBSemanticCache - Added the entry for "AzureCosmosDBSemanticCache" which was missing in langchain/cache.py - Added application name when creating the MongoClient for the AzureCosmosDBVectorSearch, for tracking purposes. @baskaryan, can you please review this PR, we need this to go in asap. These are just small fixes which we found today in our testing.	2024-03-25 19:06:17 -07:00
Clément Tamines	a6cbb755a7	community[patch]: fix semantic answer bug in AzureSearch vector store (#18938 ) - Description: The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata"; it should use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - Issue: https://github.com/langchain-ai/langchain/issues/18731 - Prior fix to this bug: This bug was fixed in this PR https://github.com/langchain-ai/langchain/pull/15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR https://github.com/langchain-ai/langchain/pull/15659 (see `ed1ffca911`). @lz-chen: could you confirm this wasn't intentional? - New fix to this bug: I believe there was an oversight in the logic of the fix from [#15642](https://github.com/langchain-ai/langchain/pull/15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. `5c2f7e6b2b/libs/community/langchain_community/vectorstores/azuresearch.py (L574-L581)` The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!	2024-03-25 18:51:54 -07:00
Anindyadeep	b2a11ce686	community[minor]: Prem AI langchain integration (#19113 ) ### Prem SDK integration in LangChain This PR integrates [PremAI's](https://www.premai.io/) prem-sdk with LangChain. Users can now access deployed models (LLMs/embeddings) and use them within LangChain's ecosystem. ### This PR adds the following: - [x] Add chat support - [X] Adding embedding support - [X] writing integration tests - [X] writing tests for chat - [X] writing tests for embedding - [X] writing unit tests - [X] writing tests for chat - [X] writing tests for embedding - [X] Adding documentation - [X] writing documentation for chat - [X] writing documentation for embedding - [X] run `make test` - [X] run `make lint`, `make lint_diff` - [X] Final checks (spell check, lint, format and overall testing) --------- Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 01:37:19 +00:00
Souhail Hanfi	cbec43afa9	community[patch]: avoid creating extension PGvector while using readOnly Databases (#19268 ) - Description: The PgVector class always runs "create extension" on init, and this statement crashes on read-only databases (read-only replicas), but, weirdly, the subsequent create-collection and similar statements work even on read-only databases. - Dependencies: no new dependencies - Twitter handle: @VenOmaX666 Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 01:25:01 +00:00
Barun Amalkumar Halder	9246ec6b36	community[patch] : [Fiddler] ensure dataset is not added if model is present (#19293 ) Description: - minor PR to speed up onboarding by not trying to add a dataset, if a model is already present. - replace batch publish API with streaming when single events are published. Dependencies: any dependencies required for this change Twitter handle: behalder Co-authored-by: Barun Halder <barun@fiddler.ai>	2024-03-25 17:28:05 -07:00
JSDu	6e090280fd	community[patch]: milvus will autoflush, manual flush is slow (#19300 ) reference: https://milvus.io/docs/configure_quota_limits.md#quotaAndLimitsflushRateenabled https://github.com/milvus-io/milvus/issues/31407 Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 00:26:58 +00:00
mackong	e65dc4b95b	community[patch]: clean warning when delete by ids (#19301 ) * Description: rearrange code to avoid a variable overwrite that always caused a warning. * Issue: N/A * Dependencies: N/A	2024-03-25 17:23:22 -07:00
Stefano Mosconi	01fc69c191	community[patch]: expanding version in confluence loader (#19324 ) Description: Expanding version in all the Confluence API calls so as to get when the page was last modified/created in all cases. Issue: #12812 Twitter handle: zzste	2024-03-25 17:08:01 -07:00
Dmitry Tyumentsev	08b769d539	community[patch]: YandexGPT Use recent yandexcloud sdk version (#19341 ) Fixed inability to work with [yandexcloud SDK](https://pypi.org/project/yandexcloud/) versions higher than 0.265.0	2024-03-25 17:05:57 -07:00
Marlene	f1313339ac	community[patch]: Fixing incorrect base URLs for Azure Cognitive Search Retriever (#19352 ) This PR adds code to make sure that the correct base URL is being created for the Azure Cognitive Search retriever. At the moment an incorrect base URL is being generated. I think this is happening because the original code was based on a deprecated API version. No dependencies need to be added. I've also added more context to the test doc strings. I should also note that ACS is now Azure AI Search. I will open a separate PR to make these changes as that would be a breaking change and should potentially be discussed. Twitter: @marlene_zw - No new tests added, however the current ACS retriever tests are now passing when I run them. - Code was linted. Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 00:04:59 +00:00
FinTech秋田	03ba1d4731	community[patch]: Add Support for GPU Index Types in Milvus 2.4 (#19468 ) - Description: This commit introduces support for the newly available GPU index types introduced in Milvus 2.4 within the LangChain project's `milvus.py`. With the release of Milvus 2.4, a range of GPU-accelerated index types have been added, offering enhanced search capabilities and performance optimizations for vector search operations. This update ensures LangChain users can fully utilize the new performance benefits for vector search operations. - Reference: https://milvus.io/docs/gpu_index.md Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-25 23:39:54 +00:00
Ash Vardanian	d01bad5169	core[patch]: Convert SimSIMD back to NumPy (#19473 ) This patch fixes the #18022 issue, converting the SimSIMD internal zero-copy outputs to NumPy. I've also noticed, that oftentimes `dtype=np.float32` conversion is used before passing to SimSIMD. Which numeric types do LangChain users generally care about? We support `float64`, `float32`, `float16`, and `int8` for cosine distances and `float16` seems reasonable for practically any kind of embeddings and any modern piece of hardware, so we can change that part as well 🤗	2024-03-25 16:36:26 -07:00
Mikelarg	dac2e0165a	community[minor]: Added GigaChat Embeddings support + updated previous GigaChat integration (#19516 ) - Description: Added integration with [GigaChat](https://developers.sber.ru/portal/products/gigachat) embeddings. Also added support for extra fields in GigaChat LLM and fixed docs.	2024-03-25 16:08:37 -07:00
Martin Kolb	e5bdb26f76	community[patch]: More flexible handling for entity names in vector store "HANA Cloud" (#19523 ) - Description: Added support for lower-case and mixed-case names. The names for tables and columns previously had to be UPPER_CASE. With this enhancement, lower_case and MixedCase names are also supported. - Issue: N/A - Dependencies: no new dependencies added - Twitter handle: @sapopensource	2024-03-25 15:52:45 -07:00
billytrend-cohere	63343b4987	cohere[patch]: add cohere as a partner package (#19049 ) Description: adds support for langchain_cohere --------- Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-03-25 20:23:47 +00:00
ccurme	82de8fd6c9	add kwargs (#19519 ) `HanaDB.add_texts` is missing **kwargs.	2024-03-25 11:56:01 -04:00
Nikhil Kumar	3d3b46a782	docs: Update docs for `HuggingFacePipeline` (#19306 ) Updated `HuggingFacePipeline` docs to be in sync with the list of supported tasks, including translation. - Description: Update docs for `HuggingFacePipeline`, which was previously missing `translation` as a valid task - Issue: N/A - Dependencies: N/A - Twitter handle: None	2024-03-25 00:29:21 -07:00
Igor Muniz Soares	743f888580	community[minor]: Dappier chat model integration (#19370 ) Description: This PR adds [Dappier](https://dappier.com/) for the chat model. It supports generate, async generate, and batch functionalities. We added unit and integration tests as well as a notebook with more details about our chat model. Dependencies: No extra dependencies are needed.	2024-03-25 07:29:05 +00:00
Hugoberry	96dc180883	community[minor]: Add `DuckDB` as a vectorstore (#18916 ) DuckDB has a cosine similarity function for list and array data types, which allows it to be used as a vector store. - Description: The latest version of DuckDB features a cosine similarity function, which can be used with its support for list or array column types. This PR surfaces this functionality to langchain. - Dependencies: duckdb 0.10.0 - Twitter handle: @igocrite --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-25 07:02:35 +00:00
preak95	6ea3e57a63	community[minor]: S3FileLoader to use expose mode and post_processors arguments of unstructured loader (#19270 ) Description: Update s3_file.py to use arguments mode and post_processors from the base class UnstructuredBaseLoader to include more metadata about the files from the S3 bucket such as 'page_number', 'languages' etc. Issue: NA Dependencies: None Twitter handle: preak95 --------- Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-25 06:56:55 +00:00
fengjial	3b52ee05d1	community[patch]: fix bugs in baiduvectordb as vectorstore (#19380 ) fix small bugs in vectorstore/baiduvectordb	2024-03-22 17:03:59 -07:00
aditya thomas	515aab3312	community[patch]: invoke callback prior to yielding token (openai) (#19389 ) Description: Invoke callback prior to yielding token for BaseOpenAI & OpenAIChat Issue: [Callback for on_llm_new_token should be invoked before the token is yielded by the model #16913](https://github.com/langchain-ai/langchain/issues/16913) Dependencies: None	2024-03-22 16:45:55 -07:00
aditya thomas	49e932cd24	community[patch]: invoke callback prior to yielding token (fireworks) (#19388 ) Description: Invoke callback prior to yielding token for Fireworks Issue: [Callback for on_llm_new_token should be invoked before the token is yielded by the model #16913](https://github.com/langchain-ai/langchain/issues/16913) Dependencies: None	2024-03-22 16:44:06 -07:00
Tarun Jain	ef6d3d66d6	community[patch]: docarray requires hnsw installation (#19416 ) I have a small dataset, and I tried to use docarray: ``DocArrayHnswSearch ``. But when I execute, it returns: ```bash raise ImportError( ImportError: Could not import docarray python package. Please install it with `pip install "langchain[docarray]"`. ``` Instead of docarray it needs to be ```bash docarray[hnswlib] ``` Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-22 22:39:07 +00:00
German Swan	d4dc98a9f9	community[patch]: RecursiveUrlLoader: add base_url option (#19421 ) RecursiveUrlLoader does not currently provide an option to set a `base_url` other than the `url`, though it uses a function that has such an option. For example, this leaves it unable to parse `https://python.langchain.com/docs`, as that URL returns the 404 page, and `https://python.langchain.com/docs/get_started/introduction` has no child routes to parse. `base_url` allows setting `https://python.langchain.com/docs` as the prefix to filter by, while the starting URL can be any page inside it that contains relevant links to continue crawling. I understand that for this case the docusaurus loader could be used, but it's a common issue with many websites. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-22 15:34:31 -07:00
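A minimal sketch of the `base_url` filtering idea, assuming the links have already been extracted from the fetched page: each link is resolved against the page it came from and kept only if it falls under `base_url`. The helper `filter_child_links` is hypothetical and is not RecursiveUrlLoader's implementation.

```python
from typing import Iterable, List
from urllib.parse import urldefrag, urljoin


def filter_child_links(page_url: str, links: Iterable[str], base_url: str) -> List[str]:
    """Resolve links relative to page_url and keep those under base_url (hypothetical helper)."""
    prefix = base_url.rstrip("/")
    kept: List[str] = []
    for link in links:
        # Resolve relative hrefs and drop "#fragment" parts so duplicates collapse.
        absolute = urldefrag(urljoin(page_url, link)).url
        # Accept the base itself or anything below it; ".../docsfoo" is not under ".../docs".
        if (absolute == prefix or absolute.startswith(prefix + "/")) and absolute not in kept:
            kept.append(absolute)
    return kept


start = "https://python.langchain.com/docs/get_started/introduction"
links = ["/docs/modules/", "../installation#setup", "https://example.com/x", "/blog/post"]
print(filter_child_links(start, links, "https://python.langchain.com/docs"))
```

Separating the start URL from the filter prefix is what lets the crawl begin at a page that actually has links while still confining it to the `/docs` subtree.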
aditya thomas	4856a87261	community[patch]: invoke callback prior to yielding token (llama.cpp) (#19392 ) Description: Invoke callback prior to yielding token for llama.cpp Issue: [Callback for on_llm_new_token should be invoked before the token is yielded by the model #16913](https://github.com/langchain-ai/langchain/issues/16913) Dependencies: None	2024-03-22 16:17:56 -04:00
billytrend-cohere	f6bcd42421	community[patch]: Replace positional argument with text=text for cohere>=5 compatibility (#19407 ) - Description: Replace positional argument with text=text for cohere>=5 compatibility	2024-03-21 10:42:51 -07:00
Yudhajit Sinha	7d216ad1e1	community[patch]: Invoke callback prior to yielding token (titan_takeoff_pro) (#18624 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/titan_takeoff_pro. - Issue: #16913 - Dependencies: None	2024-03-20 07:58:18 -07:00
Yudhajit Sinha	455a74486b	community[patch]: Invoke callback prior to yielding token (sparkllm) (#18625 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/sparkllm. - Issue: #16913 - Dependencies: None	2024-03-20 07:57:53 -07:00
Yudhajit Sinha	5ac1860484	community[patch]: Invoke callback prior to yielding token (replicate) (#18626 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/replicate. - Issue: #16913 - Dependencies: None	2024-03-20 07:57:27 -07:00
Yudhajit Sinha	9525e392de	community[patch]: Invoke callback prior to yielding token (pai_eas_endpoint) (#18627 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/pai_eas_endpoint. - Issue: #16913 - Dependencies: None	2024-03-20 07:56:58 -07:00
Yudhajit Sinha	140f06e59a	community[patch]: Invoke callback prior to yielding token (openai) (#18628 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/openai. - Issue: #16913 - Dependencies: None	2024-03-20 07:56:30 -07:00
Yudhajit Sinha	280a914920	community[patch]: Invoke callback prior to yielding token (ollama) (#18629 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ & _astream_ methods in llms/ollama. - Issue: #16913 - Dependencies: None	2024-03-20 07:56:09 -07:00
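The ordering enforced by the "invoke callback prior to yielding token" commits above (issue #16913) can be illustrated with a plain generator. This is a sketch, not LangChain's `_stream` code: `stream_tokens` is a hypothetical name, and the callback argument stands in for a callback manager's `on_llm_new_token`.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def stream_tokens(
    tokens: Iterable[str],
    on_llm_new_token: Optional[Callable[[str], None]] = None,
) -> Iterator[str]:
    """Yield tokens, notifying the callback before each token reaches the caller."""
    for token in tokens:
        if on_llm_new_token is not None:
            on_llm_new_token(token)  # handlers observe the token first
        yield token  # only then does the consumer receive it


events: List[Tuple[str, str]] = []
for tok in stream_tokens(["Hel", "lo"], lambda t: events.append(("callback", t))):
    events.append(("consumer", tok))
print(events)
```

One practical reason for this order: if the consumer stops iterating right after receiving a token, a generator that yields first stays suspended at the `yield`, so the callback for that last token would never run.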
Christophe Bornet	00614f332a	community[minor]: Add InMemoryVectorStore (#19326 ) This is a basic VectorStore implementation using an in-memory dict to store the documents. It doesn't need any extra/optional dependency as it uses numpy which is already a dependency of langchain. This is useful for quick testing, demos, examples. Also it allows to write vendor-neutral tutorials, guides, etc...	2024-03-20 10:21:07 -04:00
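A dependency-free sketch of the idea behind a dict-backed vector store: texts are embedded once on insert, and a query is scored against every stored vector by cosine similarity. The commit says the real class uses numpy; this version uses `math` only, and `TinyInMemoryStore`, `add`, `search`, and the toy embedding are all hypothetical stand-ins rather than the `InMemoryVectorStore` API.

```python
import math
import uuid
from typing import Callable, Dict, List, Sequence, Tuple

Embedder = Callable[[str], List[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


class TinyInMemoryStore:
    """Hypothetical dict-backed vector store: id -> (text, embedding)."""

    def __init__(self, embed: Embedder) -> None:
        self._embed = embed
        self._docs: Dict[str, Tuple[str, List[float]]] = {}

    def add(self, texts: Sequence[str]) -> List[str]:
        ids = []
        for text in texts:
            doc_id = str(uuid.uuid4())
            self._docs[doc_id] = (text, self._embed(text))  # embed once, at insert time
            ids.append(doc_id)
        return ids

    def search(self, query: str, k: int = 4) -> List[Tuple[str, float]]:
        query_vec = self._embed(query)
        # Brute force: score every stored vector, highest similarity first.
        scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in self._docs.values()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def toy_embed(text: str) -> List[float]:
    # Stand-in embedding (counts of "a" and "b"); a real store would call an embedding model.
    return [float(text.count("a")), float(text.count("b"))]


store = TinyInMemoryStore(toy_embed)
store.add(["aa", "bb", "ab"])
results = store.search("a", k=2)
print(results)
```

Brute-force scoring is fine for the quick tests and demos the commit targets; it scales linearly with the number of stored documents.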
Nithish Raghunandanan	7ad0a3f2a7	community: add Couchbase Vector Store (#18994 ) - Description: Added support for Couchbase Vector Search to LangChain. - Dependencies: couchbase>=4.1.12 - Twitter handle: @nithishr --------- Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>	2024-03-19 12:39:51 -07:00
Christophe Bornet	30e4a35d7a	community: Use langchain-astradb for AstraDB caches (#18419 ) - [x] Needs https://github.com/langchain-ai/langchain-datastax/pull/4 - [x] Needs a new release of langchain-astradb	2024-03-19 14:04:36 -04:00
Vittorio Rigamonti	9b2f9ee952	community: VectorStore Infinispan, adding autoconfiguration (#18967 ) Description: this PR enables VectorStore autoconfiguration for Infinispan: if metadatas are only of basic types, the protobuf config will be generated automatically for the user.	2024-03-18 21:33:45 -07:00
gonvee	b82644078e	community: Add `keep_alive` parameter to control how long the model w… (#19005 ) Add `keep_alive` parameter to control how long the model will stay loaded into memory with Ollama. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-19 04:29:01 +00:00
Harrison Chase	efcdf54edd	Josha91 fix docstring (#19249 ) Co-authored-by: Josha van Houdt <josha.van.houdt@sap.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-18 21:19:56 -07:00
Taqi Jaffri	044bc22acc	Community: Add mistral oss model support to azureml endpoints, plus configurable timeout (#19123 ) - Description: There was no formatter for mistral models for Azure ML endpoints. Adding that, plus a configurable timeout (it was hard coded before) - Dependencies: none - Twitter handle: @tjaffri @docugami	2024-03-18 21:10:42 -07:00
Hamza Muhammad Farooqi	24a0a4472a	Add docstrings for Clickhouse class methods (#19195 )	2024-03-19 04:03:12 +00:00
Rohit Gupta	785f8ab174	[langchain_community] milvus vectorstores upsert: add kwargs to make it use for other argument also (#19193 ) In upsert, pass kwargs to add_documents so they can be used for other arguments as well; until now they were unused. Co-authored-by: Rohit Gupta <rohit.gupta2@walmart.com>	2024-03-18 21:01:12 -07:00
Guangdong Liu	c3310c5e7f	community: Fix Milvus got multiple values for keyword argument 'timeout' (#19232 ) - Description: Fix Milvus got multiple values for keyword argument 'timeout' - Issue: fix #18580 - @baskaryan @eyurtsev PTAL	2024-03-18 20:44:25 -07:00
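The "got multiple values for keyword argument 'timeout'" error is a general Python pitfall: a wrapper passes `timeout=` explicitly and also forwards `**kwargs` that already contain `timeout`. The functions below are hypothetical stand-ins, not the Milvus integration's code, and popping the key is one common fix; the actual change in #19232 may differ.

```python
from typing import Any, Dict, Optional

DEFAULT_TIMEOUT: Optional[float] = None  # hypothetical wrapper-level default


def client_search(query: str, timeout: Optional[float] = None, **params: Any) -> Dict[str, Any]:
    """Stand-in for a client call that accepts `timeout` as a keyword argument."""
    return {"query": query, "timeout": timeout, **params}


def buggy_search(query: str, **kwargs: Any) -> Dict[str, Any]:
    # If the caller also passes timeout=..., the keyword arrives twice and Python raises TypeError.
    return client_search(query, timeout=DEFAULT_TIMEOUT, **kwargs)


def fixed_search(query: str, **kwargs: Any) -> Dict[str, Any]:
    # Take timeout out of kwargs first, falling back to the default, so it is passed exactly once.
    timeout = kwargs.pop("timeout", DEFAULT_TIMEOUT)
    return client_search(query, timeout=timeout, **kwargs)


try:
    buggy_search("q", timeout=5.0)
except TypeError as err:
    print("buggy:", err)
print("fixed:", fixed_search("q", timeout=5.0, limit=3))
```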
Leonid Ganeline	7de1d9acfd	community: `llms` imports fixes (#18943 ) Classes were missing from __all__ and from various places in __init__.py: - BaichuanLLM - ChatDatabricks - ChatMlflow - Llamafile - Mlflow - Together Added the classes to __all__. I also sorted the __all__ list. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-03-18 20:24:40 +00:00
Kenzie Mihardja	21f75991d4	deprecate community docugami loader (#19230 ) Deprecate the langchain_community DocugamiLoader in favor of the DocugamiLoader from docugami_langchain. --------- Co-authored-by: Kenzie Mihardja <kenzie28@cs.washington.edu>	2024-03-18 12:56:47 -07:00
Pengfei Jiang	514fe80778	community[patch]: add stop parameter support to volcengine maas (#19052 ) - Description: add stop parameter to volcengine maas model - Dependencies: no --------- Co-authored-by: 江鹏飞 <jiangpengfei.jiangpf@bytedance.com>	2024-03-17 01:58:50 +00:00
htaoruan	bcc771e37c	docs: ChatTongyi example error (#19013 )	2024-03-17 01:55:56 +00:00
primate88	5aa68936e0	community: Fix import path for StreamingStdOutCallbackHandler example (#19170 ) - Description: - Updated the import path for `StreamingStdOutCallbackHandler` in the streaming response example within `huggingface_endpoint.py`. This change corrects the import statement to reflect the actual location of `StreamingStdOutCallbackHandler` in `langchain_core.callbacks.streaming_stdout`. - Issue: - None - Dependencies: - No additional dependencies are required for this change. - Twitter handle: - None ## Note: I have tested this change locally and confirmed that the `StreamingStdOutCallbackHandler` works as expected with the updated import path. This PR does not require the addition of new tests since it is a correction to documentation/examples rather than functional code.	2024-03-17 00:50:37 +00:00
Nikhil Kumar	635b3372bd	community[minor]: Add support for translation in HuggingFacePipeline (#19190 ) - [x] Support for translation: "community: Add support for translation in `HuggingFacePipeline`" - [x] Add support for translation in `HuggingFacePipeline`: - Description: Add support for translation in `HuggingFacePipeline`, which earlier used to support only text summarization and generation. - Issue: N/A - Dependencies: N/A - Twitter handle: None	2024-03-17 00:48:13 +00:00
k.muto	8d2c34e655	community: Fix all page numbers were the same for _BaseGoogleVertexAISearchRetriever (#19175 ) - Description: - This pull request fixes a bug where page numbers were not set correctly. In the current code, all chunks share the same metadata object doc_metadata, so the page number is set to the same value for all documents. To fix this, I changed the code to use a separate metadata object for each chunk. - Issue: - None - Dependencies: - No additional dependencies are required for this change. - Twitter handle: - @eycjur - Test - Even without the bug, there are cases where every chunk legitimately has the same page number, so it's very difficult for me to write integration tests.	2024-03-16 22:28:56 +00:00
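The shared-metadata bug described above is a general Python aliasing issue, sketched here with plain dicts. The function names and the `page`/`source` fields are hypothetical; this is not the retriever's code.

```python
from typing import Any, Dict, List


def chunks_to_docs_shared(chunks: List[Dict[str, Any]], doc_metadata: Dict[str, Any]) -> List[Dict[str, Any]]:
    docs = []
    for chunk in chunks:
        doc_metadata["page"] = chunk["page"]  # mutates the single dict that every doc points to
        docs.append({"page_content": chunk["text"], "metadata": doc_metadata})
    return docs


def chunks_to_docs_copied(chunks: List[Dict[str, Any]], doc_metadata: Dict[str, Any]) -> List[Dict[str, Any]]:
    docs = []
    for chunk in chunks:
        metadata = {**doc_metadata, "page": chunk["page"]}  # fresh (shallow) copy per chunk
        docs.append({"page_content": chunk["text"], "metadata": metadata})
    return docs


chunks = [{"text": "intro", "page": 1}, {"text": "results", "page": 7}]
print([d["metadata"]["page"] for d in chunks_to_docs_shared(chunks, {"source": "report.pdf"})])  # [7, 7]
print([d["metadata"]["page"] for d in chunks_to_docs_copied(chunks, {"source": "report.pdf"})])  # [1, 7]
```

The copy is shallow: nested dicts inside the metadata would still be shared, which is harmless here because only the top-level `page` key differs per chunk.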
Cailin Wang	7cd87d2f6a	community: Add `partition` parameter to DashVector (#19023 ) Description: DashVector Add partition parameter Twitter handle: @CailinWang_ --------- Co-authored-by: root <root@Bluedot-AI>	2024-03-16 15:20:30 -07:00
Rodrigo Nogueira	e64cf1aba4	community: Add model argument for maritalk models and better error handling (#19187 )	2024-03-16 15:18:56 -07:00
Sergey Kozlov	1a55e950aa	community[patch]: support fastembed v1 and v2 (#19125 ) Description: #18040 forces `fastembed>2.0`, and this causes dependency conflicts with the new `unstructured` package (different `onnxruntime`). There may be other dependency conflicts. The only way to use `langchain-community>=0.0.28` is to roll back to `unstructured 0.10.X`, but the new `unstructured` contains many fixes. This PR allows using both `fastembed` `v1` and `v2`. How to reproduce: `pyproject.toml`: ```toml [tool.poetry] name = "depstest" version = "0.0.0" description = "test" authors = ["<dev@example.org>"] [tool.poetry.dependencies] python = ">=3.10,<3.12" langchain-community = "^0.0.28" fastembed = "^0.2.0" unstructured = {extras = ["pdf"], version = "^0.12"} ``` ```bash $ poetry lock ``` Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>	2024-03-15 18:33:51 -07:00
高远	ef9813dae6	docs: add vikingdb docstrings(#19016 ) Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>	2024-03-15 16:29:29 -07:00
wulixuan	0e0030f494	community[patch]: fix yuan2 chat model errors while invoke. (#19015 ) 1. fix yuan2 chat model errors on invoke. 2. update related tests. 3. fix some DeprecationWarnings.	2024-03-15 16:28:36 -07:00
Shuai Liu	c244e1a50b	community[patch]: Fixed bug in merging `generation_info` during chunk concatenation in Tongyi and ChatTongyi (#19014 ) - Description: In #16218 , during the `GenerationChunk` and `ChatGenerationChunk` concatenation, the `generation_info` merging changed from simple keys & values replacement to using the util method [`merge_dicts`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/utils/_merge.py). The `merge_dicts` method could not handle merging values of `int` or some other types, and would raise a [`TypeError`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/utils/_merge.py#L55). This PR fixes this issue in the Tongyi and ChatTongyi Model by adopting the `generation_info` of the last chunk and discarding the `generation_info` of the intermediate chunks, ensuring that `stream` and `astream` function correctly. - Issue: - Related issues or PRs about Tongyi & ChatTongyi: #16605, #17105 - Other models or cases: #18441, #17376 - Dependencies: No new dependencies	2024-03-15 16:27:53 -07:00
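A simplified illustration of the failure mode and the fix described above. `strict_merge` is a hypothetical stand-in written in the spirit of the description (concatenate strings, recurse into dicts, refuse other differing values); it is not the actual `merge_dicts` source. The `usage`/`output_tokens` fields are illustrative only.

```python
from typing import Any, Dict, List, Optional

Info = Dict[str, Any]


def strict_merge(left: Info, right: Info) -> Info:
    """Hypothetical strict merge: concatenate strings, recurse into dicts, reject other conflicts."""
    merged = dict(left)
    for key, value in right.items():
        if merged.get(key) is None:  # missing or None on the left: take the right value
            merged[key] = value
        elif value is None or merged[key] == value:  # nothing new to merge
            continue
        elif isinstance(merged[key], str) and isinstance(value, str):
            merged[key] += value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = strict_merge(merged[key], value)
        else:  # e.g. two different ints, such as per-chunk token counts
            raise TypeError(f"cannot merge {key!r} of type {type(value).__name__}")
    return merged


def final_generation_info(chunk_infos: List[Optional[Info]]) -> Optional[Info]:
    """The approach taken in this fix: keep the last chunk's generation_info, drop the rest."""
    return chunk_infos[-1] if chunk_infos else None


infos = [{"usage": {"output_tokens": 3}}, {"usage": {"output_tokens": 8}, "finish_reason": "stop"}]
try:
    strict_merge(infos[0], infos[1])
except TypeError as err:
    print("merge failed:", err)
print(final_generation_info(infos))
```

Keeping only the last chunk's info works when, as is typical for cumulative streaming responses, the final chunk already carries the complete values.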
Christophe Bornet	f2a7dda4bd	community[patch]: Use langchain-astradb for AstraDB doc loader (#19071 ) Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-15 22:57:25 +00:00
Holt Skinner	cee03630d9	community[patch]: Add Blended Search Support to `GoogleVertexAISearchRetriever` (#19082 ) https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#multi-data-stores --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-15 22:39:31 +00:00
case-k	ebc4a64f9e	docs: fix databricks document url (#19096 ) Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-15 22:25:11 +00:00
Guangdong Liu	cced3eb9bc	community[patch]: Fix sparkllm embeddings api bug. (#19122 ) - Description: Fix sparkllm embeddings api bug. @baskaryan PTAL	2024-03-15 15:08:49 -07:00
kaijietti	c20aeef79a	community[patch]: implement qdrant _aembed_query and use it in other async funcs (#19155 ) `amax_marginal_relevance_search ` and `asimilarity_search_with_score ` should use an async version of `_embed_query `.	2024-03-15 21:20:12 +00:00
Barun Amalkumar Halder	34d6f0557d	community[patch] : publishes duration as milliseconds to Fiddler (#19166 ) Description: Many LLM steps complete in sub-second duration, which can lead to non-collection of duration field for Fiddler. This PR updates duration from seconds to milliseconds. Issue: [INTERNAL] FDL-17568 Dependencies: NA Twitter handle: behalder Co-authored-by: Barun Halder <barun@fiddler.ai>	2024-03-15 14:04:56 -07:00
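Why seconds lose information for fast steps, sketched under one assumption: that the duration was previously reported as whole seconds. The helper names are hypothetical and this is not the Fiddler callback handler's code.

```python
import time


def duration_s(start: float, end: float) -> int:
    """Whole-second duration (assumed old behavior): anything under one second becomes 0."""
    return int(end - start)


def duration_ms(start: float, end: float) -> int:
    """Elapsed time between two perf_counter() readings, rounded to whole milliseconds."""
    return round((end - start) * 1000)


t0 = time.perf_counter()
time.sleep(0.05)  # simulate a fast LLM step
t1 = time.perf_counter()
print(duration_s(t0, t1), duration_ms(t0, t1))
```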
Barun Amalkumar Halder	b551d49cf5	community[patch] : adds feedback and status for Fiddler callback handler events (#19157 ) Description: This PR updates the Fiddler events schema to also pass user feedback and LLM status to Fiddler. Tickets: [INTERNAL] FDL-17559 Dependencies: NA Twitter handle: behalder Co-authored-by: Barun Halder <barun@fiddler.ai>	2024-03-15 12:03:49 -07:00
Juan Felipe Arias	f5b9aedc48	community[patch]: add args_schema to sql_database tools for langGraph integration (#18595 ) - Description: This modification adds pydantic input definitions for the sql_database tools. This helps with function-calling capability in LangGraph. Since action nodes will usually check for the args_schema attribute on tools, this update should make these tools compatible with them (only implemented on the InfoSQLDatabaseTool). - Issue: N/A - Dependencies: N/A - Twitter handle: juanfe8881	2024-03-15 19:03:36 +00:00
fengjial	c922ea36cb	community[minor]: Add Baidu VectorDB as vector store (#17997 ) Co-authored-by: fengjialin <fengjialin@MacBook-Pro.local>	2024-03-15 19:01:58 +00:00
Erick Friis	781aee0068	community, langchain, infra: revert store extended test deps outside of poetry (#19153 ) Reverts langchain-ai/langchain#18995 Because it makes installing dependencies in python 3.11 extended testing take 80 minutes	2024-03-15 17:10:47 +00:00
Erick Friis	9e569d85a4	community, langchain, infra: store extended test deps outside of poetry (#18995 ) poetry can't reliably handle resolving the number of optional "extended test" dependencies we have. If we instead just rely on pip to install extended test deps in CI, this isn't an issue.	2024-03-15 05:55:30 +00:00
Erick Friis	7ce81eb6f4	voyageai[patch]: init package (#19098 ) Co-authored-by: fodizoltan <zoltan@conway.expert> Co-authored-by: Yujie Qian <thomasq0809@gmail.com> Co-authored-by: fzowl <160063452+fzowl@users.noreply.github.com>	2024-03-15 00:56:10 +00:00
billytrend-cohere	7253b816cc	community: Add support for cohere SDK v5 (keeps v4 backwards compatibility) (#19084 ) - Description: Add support for cohere SDK v5 (keeps v4 backwards compatibility) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-03-14 15:53:24 -07:00
Eugene Yurtsev	6cdca4355d	community[minor]: Revamp PGVector Filtering (#18992 ) This PR makes the following updates in the pgvector database: 1. Use JSONB field for metadata instead of JSON 2. Update operator syntax to include required `$` prefix before the operators (otherwise there will be name collisions with fields) 3. The change is non-breaking; old functionality is still the default, but it will emit a deprecation warning 4. Previous functionality has bugs associated with comparisons due to casting to text (so lexical ordering is used incorrectly for numeric fields) 5. Adds a GIN index on the JSONB field for more efficient querying	2024-03-14 16:56:00 -04:00
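To illustrate points 2 and 4 above, here is an in-memory evaluator for `$`-prefixed filters. PGVector itself translates filters into SQL over the JSONB column; this sketch only shows the filter semantics, the operator subset is an assumption, and `matches` is a hypothetical helper.

```python
import operator
from typing import Any, Callable, Dict

# `$`-prefixed operators cannot collide with metadata field names such as "gt" or "in".
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Evaluate a filter such as {"year": {"$gte": 2020}} against one metadata dict."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            if key not in metadata:
                return False
            for op, operand in condition.items():
                if op not in OPERATORS:
                    raise ValueError(f"unsupported operator {op!r}")
                # Compare typed values, not their text form.
                if not OPERATORS[op](metadata[key], operand):
                    return False
        elif metadata.get(key) != condition:  # a bare value means equality
            return False
    return True


print("10" > "9")  # False: comparing as text is lexical
print(matches({"pages": 10}, {"pages": {"$gt": 9}}))  # True: numeric comparison
print(matches({"genre": "sci-fi", "year": 2021},
              {"$and": [{"genre": {"$in": ["sci-fi", "fantasy"]}}, {"year": {"$gte": 2020}}]}))  # True
```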
Anton Parkhomenko	ae73b9d839	community[patch]: Fix NotionDBLoader 400 Error by conditionally adding filter parameter (#19075 ) - Description: This change fixes a bug where attempts to load data from Notion using the NotionDBLoader resulted in a 400 Bad Request error. The issue was traced to the unconditional addition of an empty 'filter' object in the request payload, which Notion's API does not accept. The modification ensures that the 'filter' object is only included in the payload when it is explicitly provided and not empty, thus preventing the 400 error from occurring. - Issue: Fixes [#18009](https://github.com/langchain-ai/langchain/issues/18009) - Dependencies: None - Twitter handle: @gunnzolder Co-authored-by: Anton Parkhomenko <anton@merge.rocks>	2024-03-14 13:56:57 +00:00
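The fix amounts to building the request body conditionally. A minimal sketch, assuming a body with Notion's `page_size` and `start_cursor` pagination fields; `build_query_body` is hypothetical and is not NotionDBLoader's code.

```python
from typing import Any, Dict, Optional


def build_query_body(
    filter_object: Optional[Dict[str, Any]] = None,
    start_cursor: Optional[str] = None,
    page_size: int = 100,
) -> Dict[str, Any]:
    """Build a database-query body that includes "filter" only when one is actually given."""
    body: Dict[str, Any] = {"page_size": page_size}
    if filter_object:  # both None and {} are omitted rather than sent as an empty filter
        body["filter"] = filter_object
    if start_cursor:
        body["start_cursor"] = start_cursor
    return body


print(build_query_body())
print(build_query_body({"property": "Done", "checkbox": {"equals": True}}))
```

Testing truthiness rather than `is not None` is deliberate here: an empty dict is exactly the value the API rejected.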
Leonid Ganeline	9c8523b529	community[patch]: flattening imports 3 (#18939 ) @eyurtsev	2024-03-12 15:18:54 -07:00
Dobiichi-Origami	471f2ed40a	community[patch]: re-arrange the addtional_kwargs of returned qianfan structure to avoid _merge_dict issue (#18889 ) fix issue: https://github.com/langchain-ai/langchain/issues/18441 PTAL, thanks @baskaryan, @efriis, @eyurtsev, @hwchase17. --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-12 05:43:56 +00:00
Tymofii	0bec1f6877	commnity[patch]: refactor code for faiss vectorstore, update faiss vectorstore documentation (#18092 ) Description: Refactor code of FAISS vectorstore and update the related documentation. Details: - replace `.format()` with f-strings for string formatting; - refactor definition of a filtering function to make code more readable and more flexible; - slightly improve efficiency of `max_marginal_relevance_search_with_score_by_vector` method by removing unnecessary looping over the same elements; - slightly improve efficiency of `delete` method by using a set data structure for checking if the element was already deleted; Issue: fix small inconsistency in the documentation (the old example was incorrect and inapplicable to faiss vectorstore) Dependencies: basic langchain-community dependencies and `faiss` (for CPU or for GPU) Twitter handle: antonenkodev	2024-03-11 22:33:03 -07:00
Leonid Ganeline	11195cfa42	community[patch]: speed up import times in the community package (#18928 ) This PR speeds up import times in the community package	2024-03-11 16:37:36 -04:00
Virat Singh	cafffe8a21	community: Add PolygonAggregates tool (#18882 ) Description: In this PR, I am adding a `PolygonAggregates` tool, which can be used to get historical stock price data (called aggregates by Polygon) for a given ticker. Polygon [docs](https://polygon.io/docs/stocks/get_v2_aggs_ticker__stocksticker__range__multiplier___timespan___from___to) for this endpoint. Twitter: [@virattt](https://twitter.com/virattt)	2024-03-11 11:58:10 -07:00
Mohammad Mohtashim	43db4cd20e	core[major]: On Tool End Observation Casting Fix (#18798 ) This PR updates the on_tool_end handlers to return the raw output from the tool instead of casting it to a string. This is technically a breaking change, though its impact is expected to be somewhat minimal. It will fix behavior in `astream_events` as well. Fixes the following issue #18760 raised by @eyurtsev --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-11 10:59:04 -04:00
Massimiliano Pronesti	8113d612bb	community[patch]: support modin document loader (#18866 ) Langchain community document loaders support `pyspark`, `polars`, and `pandas` dataframes but not `modin`'s. This PR addresses this point.	2024-03-10 18:40:04 -07:00
Pol Ruiz Farre	a7f63d8cb4	community[patch]: Fix BasePDFLoader suffix for s3 presigned urls (#18844 ) BasePDFLoader doesn't parse the suffix of the file correctly when parsing S3 presigned urls. This fix enables the proper detection and parsing of S3 presigned URLs to prevent errors such as `OSError: [Errno 36] File name too long`. No additional dependencies required.	2024-03-11 00:58:51 +00:00
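A sketch of taking a file suffix from a presigned URL safely: parse the URL and read the suffix from the path only, so the long signed query string never ends up in a file name. `url_suffix` and the bucket URL are hypothetical; the "naive" line shows what happens when the whole URL is treated as a path.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url_suffix(url: str) -> str:
    """File suffix taken from the URL path only, ignoring the (possibly very long) query string."""
    return PurePosixPath(urlparse(url).path).suffix


presigned = (
    "https://my-bucket.s3.amazonaws.com/reports/q1.pdf"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=3600&X-Amz-Signature=abc123"
)
print(PurePosixPath(presigned).suffix)  # naive: the whole query string rides along after ".pdf"
print(url_suffix(presigned))  # .pdf
```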
Joshua Carroll	ddaf9de169	community: Fix bug with StreamlitChatMessageHistory (#18834 ) - Description: Fix Streamlit bug which was introduced by https://github.com/langchain-ai/langchain/pull/18250, update integration test - Issue: https://github.com/langchain-ai/langchain/issues/18684 - Dependencies: None	2024-03-09 13:42:22 -08:00
Tomaz Bratanic	a28be31a96	Switch to md5 for deduplication in neo4j integrations (#18846 ) Deduplicate documents using MD5 of the page_content. Also allows for custom deduplication with graph ingestion method by providing metadata id attribute --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-03-09 13:28:55 -08:00
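A sketch of the deduplication scheme described above, using plain dicts in place of document objects: an explicit `metadata["id"]` wins, otherwise the MD5 of `page_content` is the key. The helpers are hypothetical, not the Neo4j integration's code. MD5 serves only as a content fingerprint here, not for security.

```python
import hashlib
from typing import Any, Dict, List


def document_id(doc: Dict[str, Any]) -> str:
    """Prefer an explicit metadata["id"]; otherwise hash page_content with MD5."""
    custom_id = doc.get("metadata", {}).get("id")
    if custom_id is not None:
        return str(custom_id)
    return hashlib.md5(doc["page_content"].encode("utf-8")).hexdigest()


def deduplicate(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    seen = set()
    unique = []
    for doc in docs:
        key = document_id(doc)
        if key not in seen:  # keep the first document seen for each id
            seen.add(key)
            unique.append(doc)
    return unique


docs = [
    {"page_content": "Neo4j is a graph database.", "metadata": {}},
    {"page_content": "Neo4j is a graph database.", "metadata": {"source": "b.txt"}},
    {"page_content": "Same text, custom id", "metadata": {"id": "doc-42"}},
]
print(len(deduplicate(docs)))  # 2
```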
Leonid Ganeline	476d6dc596	community[patch]: Use getattr for `toolkits` imports (#18825 ) This will preserve the namespace, without actually loading the underlying packages on init.	2024-03-08 20:54:28 -05:00
Luis Antonio Vieira Junior	67c880af74	community[patch]: adding linearization config to AmazonTextractPDFLoader (#17489 ) - Description: Adding an optional parameter `linearization_config` to the `AmazonTextractPDFLoader` so the caller can define how the output will be linearized, instead of forcing a predefined set of linearization configs. It will still have a default configuration as this will be an optional parameter. - Issue: #17457 - Dependencies: The same ones that already exist for `AmazonTextractPDFLoader` - Twitter handle: [@lvieirajr19](https://twitter.com/lvieirajr19) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-08 17:25:22 -08:00
Anis ZAKARI	37e89ba5b1	community[patch]: Bedrock add support for mistral models (#18756 ) Description: My previous [PR](https://github.com/langchain-ai/langchain/pull/18521) was mistakenly closed, so I am reopening this one. Context: AWS released two Mistral models on Bedrock last Friday (March 1, 2024). This PR includes some code adjustments to ensure their compatibility with the Bedrock class. --------- Co-authored-by: Anis ZAKARI <anis.zakari@hymaia.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-03-09 01:20:38 +00:00
Keith Chan	914af69b44	community[patch]: Update azuresearch vectorstore from_texts() method to include fields argument (#17661 ) - Description: Update azuresearch vectorstore from_texts() method to include fields argument, necessary for creating an Azure AI Search index with custom fields. - Issue: Currently index fields are fixed to default fields if Azure Search index is created using from_texts() method - Dependencies: None - Twitter handle: None --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-08 17:05:35 -08:00
al1p	46f0cea2b9	community[patch][: improved the suffix prompt to avoid loop (#17791 ) Small improvement to the openapi prompt. The agent was not finding the server base URL (looping through all nodes). This small change narrows the search and enables finding the url faster. No dependency Twitter : @al1pra	2024-03-08 16:53:09 -08:00
Théo LEBRUN	cf94091cd0	community[patch]: Skip nested directories when using S3DirectoryLoader (#17829 ) - Description: `S3DirectoryLoader` fails if the prefix is a folder (ex: `my_folder/`) because `S3FileLoader` will try to load that folder and fail. This PR skips nested directories so the prefix can be set to a folder instead of `my_folder/files_prefix`. - Issue: - #11917 - #6535 - #4326 - Dependencies: none - Twitter handle: @Falydoor	2024-03-08 16:50:58 -08:00
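S3 has a flat key namespace; "folders" created in the console show up as objects whose keys end in `/`. A minimal sketch of skipping them, assuming that is what "nested directories" refers to. The keys would normally come from a listing such as boto3's `bucket.objects.filter(Prefix=...)`; here they are plain strings, and `keys_to_load` is hypothetical rather than the loader's code.

```python
from typing import Iterable, List


def keys_to_load(object_keys: Iterable[str]) -> List[str]:
    """Drop "folder" placeholder keys (ending in "/") so only real objects get loaded."""
    return [key for key in object_keys if not key.endswith("/")]


listing = ["my_folder/", "my_folder/a.pdf", "my_folder/sub/", "my_folder/sub/b.txt"]
print(keys_to_load(listing))  # ['my_folder/a.pdf', 'my_folder/sub/b.txt']
```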
Venkatesan	7a18b63dbf	community[patch]: Mongo index creation (#17748 ) - [ ] Title: Mongodb: MongoDB connection performance improvement. - [ ] Message: - Description: I made collection index_creation as optional. Index Creation is one time process. - Issue: MongoDBChatMessageHistory class object is attempting to create an index during connection, causing each request to take longer than usual. This should be optional with a parameter. - Dependencies: N/A - Branch to be checked: origin/mongo_index_creation --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-08 16:43:17 -08:00
wt3639	5b5b37a999	community[patch]: Add embedding instruction to HuggingFaceBgeEmbeddings (#18017 ) - Description: Add embedding instruction to HuggingFaceBgeEmbeddings, so that it can be compatible with nomic and other models that need embedding instruction. --------- Co-authored-by: Tao Wu <tao.wu@rwth-aachen.de> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-08 16:39:29 -08:00
Ishani Vyas	2b0cbd65ba	community[patch]: Add Passio Nutrition AI Food Search Tool to Community Package (#18278 ) ## Add Passio Nutrition AI Food Search Tool to Community Package ### Description We propose adding a new tool to the `community` package, enabling integration with Passio Nutrition AI for food search functionality. This tool will provide a simple interface for retrieving nutrition facts through the Passio Nutrition AI API, simplifying user access to nutrition data based on food search queries. ### Implementation Details - Class Structure: Implement `NutritionAI`, extending `BaseTool`. It includes an `_run` method that accepts a query string and, optionally, a `CallbackManagerForToolRun`. - API Integration: Use `NutritionAIAPI` for the API wrapper, encapsulating all interactions with the Passio Nutrition AI and providing a clean API interface. - Error Handling: Implement comprehensive error handling for API request failures. ### Expected Outcome - User Benefits: Enable easy querying of nutrition facts from Passio Nutrition AI, enhancing the utility of the `langchain_community` package for nutrition-related projects. - Functionality: Provide a straightforward method for integrating nutrition information retrieval into users' applications. ### Dependencies - `langchain_core` for base tooling support - `pydantic` for data validation and settings management - Consider `requests` or another HTTP client library if not covered by `NutritionAIAPI`. ### Tests and Documentation - Unit Tests: Include tests that mock network interactions to ensure tool reliability without external API dependency. - Documentation: Create an example notebook in `docs/docs/integrations/tools/passio_nutrition_ai.ipynb` showing usage, setup, and example queries. ### Contribution Guidelines Compliance - Adhere to the project's linting and formatting standards (`make format`, `make lint`, `make test`). - Ensure compliance with LangChain's contribution guidelines, particularly around dependency management and package modifications. ### Additional Notes - Aim for the tool to be a lightweight, focused addition, not introducing significant new dependencies or complexity. - Potential future enhancements could include caching for common queries to improve performance. ### Twitter Handle - Here is our Passio AI [twitter handle](https://twitter.com/@passio_ai) where we announce our products. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.	2024-03-08 20:33:22 +00:00
Kushagra	b1f22bf76c	community[minor]: added a feature to filter documents in Mongoloader (#18253 ) "community: added a feature to filter documents in Mongoloader" - Description: added a feature to filter documents in Mongoloader - Feature: the feature #18251 - Dependencies: No - Twitter handle: https://twitter.com/im_Kushagra	2024-03-08 12:06:35 -08:00
Christophe Bornet	e54a49b697	community[minor]: Add lazy_table_reflection param to SqlDatabase (#18742 ) For some DBs with lots of tables, reflection of all the tables can take very long. So this change will make the tables be reflected lazily when get_table_info() is called and `lazy_table_reflection` is True.	2024-03-08 14:10:23 -05:00

1088 Commits