langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-02 09:40:22 +00:00

Author	SHA1	Message	Date
Andres Algaba	05ae8ca7d4	community[patch]: deprecate persist method in Chroma (#20855 ) Thank you for contributing to LangChain! - [x] PR title - [x] PR message: - Description: Deprecate persist method in Chroma no longer exists in Chroma 0.4.x - Issue: #20851 - Dependencies: None - Twitter handle: AndresAlgaba1 - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-25 19:42:03 +00:00
ccurme	fdabd3cdf5	mistral, openai: support custom tokenizers in chat models (#20901 )	2024-04-25 15:23:29 -04:00
ccurme	b8db73233c	core, community: deprecate tool.__call__ (#20900 ) Does not update docs.	2024-04-25 14:50:39 -04:00
Tomaz Bratanic	520972fd0f	community[patch]: Support passing graph object to Neo4j integrations (#20876 ) For driver connection reusage, we introduce passing the graph object to neo4j integrations	2024-04-25 11:30:22 -07:00
Lei Zhang	748a6ae609	community[patch]: add HTTP response headers Content-Type to metadata of RecursiveUrlLoader document (#20875 ) Description: The RecursiveUrlLoader loader offers a link_regex parameter that can filter out URLs. However, this filtering capability is limited, and if the internal links of the website change, unexpected resources may be loaded. These resources, such as font files, can cause problems in subsequent embedding processing. > https://blog.langchain.dev/assets/fonts/source-sans-pro-v21-latin-ext_latin-regular.woff2?v=0312715cbf We can add the Content-Type in the HTTP response headers to the document metadata so developers can choose which resources to use. This allows developers to make their own choices. For example, the following may be a good choice for text knowledge. - text/plain - simple text file - text/html - HTML web page - text/xml - XML format file - text/json - JSON format data - application/pdf - PDF file - application/msword - Word document and ignore the following - text/css - CSS stylesheet - text/javascript - JavaScript script - application/octet-stream - binary data - image/jpeg - JPEG image - image/png - PNG image - image/gif - GIF image - image/svg+xml - SVG image - audio/mpeg - MPEG audio files - video/mp4 - MP4 video file - application/font-woff - WOFF font file - application/font-ttf - TTF font file - application/zip - ZIP compressed file - application/octet-stream - binary data Twitter handle: @coolbeevip --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-25 11:29:41 -07:00
Erick Friis	eca3640af7	upstage: release 0.1.2 (#20898 )	2024-04-25 10:41:19 -07:00
Joan Fontanals	baefbfb14e	community[mionr]: add Jina Reranker in retrievers module (#19406 ) - Description: Adapt JinaEmbeddings to run with the new Jina AI Rerank API - Twitter handle: https://twitter.com/JinaAI_ - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-25 10:27:10 -07:00
Erick Friis	92969d49cb	multiple: remove external repo mds (#20896 ) api docs build doesn't tolerate them	2024-04-25 17:18:29 +00:00
Jason_Chen	53bb7dbd29	community[patch]: add BeautifulSoupTransformer remove_unwanted_classnames method (#20467 ) Add the remove_unwanted_classnames method to the BeautifulSoupTransformer class, which can filter more effectively. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-25 17:04:04 +00:00
YISH	ed26149a29	openai[patch]: Allow disablling safe_len_embeddings(OpenAIEmbeddings) (#19743 ) OpenAI API compatible server may not support `safe_len_embedding`， use `disable_safe_len_embeddings=True` to disable it. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-25 09:45:52 -07:00
Bagatur	5b83130855	core[minor], langchain[patch], community[patch]: mv StructuredQuery (#20849 ) mv StructuredQuery to core	2024-04-25 09:40:26 -07:00
Sean	540f384197	partner: Upstage quick documentation update (#20869 ) * Updating the provider docs page. The RAG example was meant to be moved to cookbook, but was merged by mistake. * Fix bug in Groundedness Check --------- Co-authored-by: JuHyung-Son <sonju0427@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-04-25 16:36:54 +00:00
Bagatur	ffad3985a1	core[patch]: Release 0.1.46 (#20891 )	2024-04-25 15:40:17 +00:00
Mish Ushakov	6ccecf2363	community[minor]: added Browserbase loader (#20478 )	2024-04-25 01:11:03 +00:00
Erick Friis	5da9dd1195	mistral: comment batching param (#20868 ) Addresses #20523	2024-04-25 00:38:21 +00:00
Ivaylo Bratoev	7c5063ef60	infra: fix how Poetry is installed in the dev container (#20521 ) Currently, when a new dev container is created, poetry does not work in it with the error "No module named 'rapidfuzz'". Install Poetry outside the project venv so that poetry and project dependencies do not get mixed. Use pipx to install poetry securely in its own isolated environment. Issue: #12237 Twitter handle: https://twitter.com/ibratoev Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-24 17:33:25 -07:00
GustavoSept	c2d09a5186	experimental[patch]: Makes regex customizable in text_splitter.py (SemanticChunker class) (#20485 ) - Description: Currently, the regex is static (`r"(?<=[.?!])\s+"`), which is only useful for certain use cases. The current change only moves this to be a parameter of split_text(). Which adds flexibility without making it more complex (as the default regex is still the same). - Issue: Not applicable (I searched, no one seems to have created this issue yet). - Dependencies: None. _If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17._ --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-25 00:32:40 +00:00
William FH	a936f696a6	[Core] Feat: update config CVar in tool.invoke (#20808 )	2024-04-24 17:17:21 -07:00
Lei Zhang	2cd907ad7e	text-splitters[patch]: fix MarkdownHeaderTextSplitter fails to parse headers with non-printable characters (#20645 ) Description: MarkdownHeaderTextSplitter Fails to Parse Headers with non-printable characters. more #20643 The following is the official test case. Just replacing `# Foo\n\n` with `\ufeff# Foo\n\n` will cause the test case to fail. chunk metadata is empty ```python def test_md_header_text_splitter_1() -> None: """Test markdown splitter by header: Case 1.""" markdown_document = ( "\ufeff# Foo\n\n" " ## Bar\n\n" "Hi this is Jim\n\n" "Hi this is Joe\n\n" " ## Baz\n\n" " Hi this is Molly" ) headers_to_split_on = [ ("#", "Header 1"), ("##", "Header 2"), ] markdown_splitter = MarkdownHeaderTextSplitter( headers_to_split_on=headers_to_split_on, ) output = markdown_splitter.split_text(markdown_document) expected_output = [ Document( page_content="Hi this is Jim \nHi this is Joe", metadata={"Header 1": "Foo", "Header 2": "Bar"}, ), Document( page_content="Hi this is Molly", metadata={"Header 1": "Foo", "Header 2": "Baz"}, ), ] assert output == expected_output ``` twitter: @coolbeevip Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-25 00:07:42 +00:00
ccurme	481d3855dc	patch: remove usage of llm, chat model __call__ (#20788 ) - `llm(prompt)` -> `llm.invoke(prompt)` - `llm(prompt=prompt` -> `llm.invoke(prompt)` (same with `messages=`) - `llm(prompt, callbacks=callbacks)` -> `llm.invoke(prompt, config={"callbacks": callbacks})` - `llm(prompt, kwargs)` -> `llm.invoke(prompt, kwargs)`	2024-04-24 19:39:23 -04:00
Raghav Dixit	9b7fb381a4	community[patch]: LanceDB integration patch update (#20686 ) Description : - added functionalities - delete, index creation, using existing connection object etc. - updated usage - Added LaceDB cloud OSS support make lint_diff , make test checks done	2024-04-24 16:27:43 -07:00
Nikita Pokidyshev	9e983c9500	langchain[patch]: fix agent_token_buffer_memory not working with openai tools (#20708 ) - Description: fix a bug in the agent_token_buffer_memory - Issue: agent_token_buffer_memory was not working with openai tools - Dependencies: None - Twitter handle: @pokidyshef	2024-04-24 15:51:58 -07:00
Erick Friis	1aef8116de	upstage: release 0.1.1 (#20864 )	2024-04-24 15:18:30 -07:00
junkeon	c8fd51e8c8	upstage: Add Upstage partner package LA and GC (#20651 ) --------- Co-authored-by: Sean <chosh0615@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Sean Cho <sean@upstage.ai>	2024-04-24 15:17:20 -07:00
Alex Lee	243ba71b28	langchain[patch]: add `aprep_output` method to `langchain/chains/base.py` (#20748 ) ## Description Add `aprep_output` method to `langchain/chains/base.py`. Some downstream `ChatMessageHistory` objects that use async connections require an async way to append to the context. It turned out that `ainvoke()` was calling `prep_output` which is synchronous. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-24 22:16:25 +00:00
Harrison Chase	43c041cda5	support messages in messages out (#20862 )	2024-04-24 14:58:58 -07:00
back2nix	a1614b88ac	groq[patch]: groq proxy support (#20758 ) # Proxy Fix for Groq Class 🐛 🚀 ## Description This PR fixes a bug related to proxy settings in the `Groq` class, allowing users to connect to LangChain services via a proxy. ## Changes Made - ✅ FIX support for specifying proxy settings in the `Groq` class. - ✅ Resolved the bug causing issues with proxy settings. - ❌ Did not include unit tests and documentation updates. - ❌ Did not run make format, make lint, and make test to ensure code quality and functionality because I couldn't get it to run, so I don't program in Python and couldn't run `ruff`. - ❔ Ensured that the changes are backwards compatible. - ✅ No additional dependencies were added to `pyproject.toml`. ### Error Before Fix ```python Traceback (most recent call last): File "/home/bg/Documents/code/github.com/back2nix/test/groq/main.py", line 9, in <module> chat = ChatGroq( ^^^^^^^^^ File "/home/bg/Documents/code/github.com/back2nix/test/groq/venv310/lib/python3.11/site-packages/langchain_core/load/serializable.py", line 120, in __init__ super().__init__(**kwargs) File "/home/bg/Documents/code/github.com/back2nix/test/groq/venv310/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__ raise validation_error pydantic.v1.error_wrappers.ValidationError: 1 validation error for ChatGroq __root__ Invalid `http_client` argument; Expected an instance of `httpx.AsyncClient` but got <class 'httpx.Client'> (type=type_error) ``` ### Example usage after fix ```python3 import os import httpx from langchain_core.prompts import ChatPromptTemplate from langchain_groq import ChatGroq chat = ChatGroq( temperature=0, groq_api_key=os.environ.get("GROQ_API_KEY"), model_name="mixtral-8x7b-32768", http_client=httpx.Client( proxies="socks5://127.0.0.1:1080", transport=httpx.HTTPTransport(local_address="0.0.0.0"), ), http_async_client=httpx.AsyncClient( proxies="socks5://127.0.0.1:1080", transport=httpx.HTTPTransport(local_address="0.0.0.0"), ), ) system = "You are a helpful assistant." human = "{text}" prompt = ChatPromptTemplate.from_messages([("system", system), ("human", human)]) chain = prompt \| chat out = chain.invoke({"text": "Explain the importance of low latency LLMs"}) print(out) ``` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-24 21:58:03 +00:00
volodymyr-memsql	493afe4d8d	community[patch]: add hybrid search to singlestoredb vectorstore (#20793 ) Implemented the ability to enable full-text search within the SingleStore vector store, offering users a versatile range of search strategies. This enhancement allows users to seamlessly combine full-text search with vector search, enabling the following search strategies: * Search solely by vector similarity. * Conduct searches exclusively based on text similarity, utilizing Lucene internally. * Filter search results by text similarity score, with the option to specify a threshold, followed by a search based on vector similarity. * Filter results by vector similarity score before conducting a search based on text similarity. * Perform searches using a weighted sum of vector and text similarity scores. Additionally, integration tests have been added to comprehensively cover all scenarios. Updated notebook with examples. CC: @baskaryan, @hwchase17 --------- Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-24 21:34:50 +00:00
Tomaz Bratanic	9efab3ed66	community[patch]: Add driver config param for neo4j graph (#20772 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-24 21:14:41 +00:00
Leonid Ganeline	13751c3297	community: `tigergraph` fixes (#20034 ) - added guard on the `pyTigerGraph` import - added a missed example page in the `docs/integrations/graphs/` - formatted the `docs/integrations/providers/` page to the consistent format. Added links.	2024-04-24 16:49:21 -04:00
Martin Kolb	0186e4e633	community[patch]: Advanced filtering for HANA Cloud Vector Engine (#20821 ) - Description: This PR adds support for advanced filtering to the integration of HANA Vector Engine. The newly supported filtering operators are: $eq, $ne, $gt, $gte, $lt, $lte, $between, $in, $nin, $like, $and, $or - Issue: N/A - Dependencies: no new dependencies added Added integration tests to: `libs/community/tests/integration_tests/vectorstores/test_hanavector.py` Description of the new capabilities in notebook: `docs/docs/integrations/vectorstores/hanavector.ipynb`	2024-04-24 13:47:27 -07:00
Alex Sherstinsky	12e5ec6de3	community: Support both Predibase SDK-v1 and SDK-v2 in Predibase-LangChain integration (#20859 )	2024-04-24 13:31:01 -07:00
Erick Friis	8c95ac3145	docs, multiple: de-beta with_structured_output (#20850 )	2024-04-24 19:34:57 +00:00
Nuno Campos	477eb1745c	Better support for subgraphs in graph viz (#20840 )	2024-04-24 12:32:52 -07:00
JeffKatzy	5ab3f9a995	community[patch]: standardize chat init args (#20844 ) Thank you for contributing to LangChain! community:perplexity[patch]: standardize init args updated pplx_api_key and request_timeout so that aliased to api_key, and timeout respectively. Added test that both continue to set the same underlying attributes. Related to [20085](https://github.com/langchain-ai/langchain/issues/20085) --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-24 12:26:05 -07:00
Massimiliano Pronesti	8d1167b32f	community[patch]: add support for similarity_score_threshold search in… (#20852 ) See https://github.com/langchain-ai/langchain/issues/20600#issuecomment-2075569338 for details. @chrislrobert	2024-04-24 19:14:33 +00:00
Eugene Yurtsev	d8aa72f51d	core[minor],langchain[patch]: Move base indexing interface and logic to core (#20667 ) This PR moves the interface and the logic to core. The following changes to namespaces: `indexes` -> `indexing` `indexes._api` -> `indexing.api` Testing code is intentionally duplicated for now since it's testing different implementations of the record manager (in-memory vs. SQL). Common logic will need to be pulled out into the test client. A follow up PR will move the SQL based implementation outside of LangChain.	2024-04-24 13:18:42 -04:00
ccurme	3bcfbcc871	groq: handle null queue_time (#20839 )	2024-04-24 09:50:09 -07:00
Eugene Yurtsev	30e48c9878	core[patch],community[patch]: Move file chat history back to community (#20834 ) Marking as patch since we haven't had releases in between. This just reverting part of a PR from yesterday.	2024-04-24 12:47:25 -04:00
ccurme	6debadaa70	groq: bump core (#20838 )	2024-04-24 11:51:46 -04:00
Erick Friis	7984206c95	groq: release 0.1.3 (#20836 ) Fixes #20811	2024-04-24 08:06:06 -07:00
Nestor Qin	9111d3a636	community[patch]: Fix message formatting for Anthropic models on Amazon Bedrock (#20801 ) Description: This PR fixes an issue in message formatting function for Anthropic models on Amazon Bedrock. Currently, LangChain BedrockChat model will crash if it uses Anthropic models and the model return a message in the following type: - `AIMessageChunk` Moreover, when use BedrockChat with for building Agent, the following message types will trigger the same issue too: - `HumanMessageChunk` - `FunctionMessage` Issue: https://github.com/langchain-ai/langchain/issues/18831 Dependencies: No. Testing: Manually tested. The following code was failing before the patch and works after. ``` @tool def square_root(x: str): "Useful when you need to calculate the square root of a number" return math.sqrt(int(x)) llm = ChatBedrock( model_id="anthropic.claude-3-sonnet-20240229-v1:0", model_kwargs={ "temperature": 0.0 }, ) prompt = ChatPromptTemplate.from_messages( [ ("system", FUNCTION_CALL_PROMPT), ("human", "Question: {user_input}"), MessagesPlaceholder(variable_name="agent_scratchpad"), ] ) tools = [square_root] tools_string = format_tool_to_anthropic_function(square_root) agent = ( RunnablePassthrough.assign( user_input=lambda x: x['user_input'], agent_scratchpad=lambda x: format_to_openai_function_messages( x["intermediate_steps"] ) ) \| prompt \| llm \| AnthropicFunctionsAgentOutputParser() ) agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True) output = agent_executor.invoke({ "user_input": "What is the square root of 2?", "tools_string": tools_string, }) ``` List of messages returned from Bedrock: ``` <SystemMessage> content='You are a helpful assistant.' <HumanMessage> content='Question: What is the square root of 2?' <AIMessageChunk> content="Okay, let's calculate the square root of 2.<scratchpad>\nTo calculate the square root of a number, I can use the square_root tool:\n\n<function_calls>\n <invoke>\n <tool_name>square_root</tool_name>\n <parameters>\n <__arg1>2</__arg1>\n </parameters>\n </invoke>\n</function_calls>\n</scratchpad>\n\n<function_results>\n<search_result>\nThe square root of 2 is approximately 1.414213562373095\n</search_result>\n</function_results>\n\n<answer>\nThe square root of 2 is approximately 1.414213562373095\n</answer>" id='run-92363df7-eff6-4849-bbba-fa16a1b2988c'" <FunctionMessage> content='1.4142135623730951' name='square_root' ```	2024-04-23 22:40:39 +00:00
ccurme	06b04b80b8	groq: fix warning filter for integration test (#20806 )	2024-04-23 18:11:41 -04:00
ccurme	5a3c65a756	standard tests: add xfails (#20659 )	2024-04-23 17:14:16 -04:00
Erick Friis	ddc2274aea	standard-tests: split tool calling test (#20803 ) just making it a bit easier to grok	2024-04-23 20:59:45 +00:00
ccurme	6622829c67	mistral: catch GatedRepoError, release 0.1.3 (#20802 ) https://github.com/langchain-ai/langchain/issues/20618 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-04-23 20:56:42 +00:00
Eugene Yurtsev	a7c347ab35	langchain[patch]: Update evaluation logic that instantiates a default LLM (#20760 ) Favor langchain_openai over langchain_community for evaluation logic. --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-04-23 16:09:32 -04:00
Eugene Yurtsev	72f720fa38	langchain[major]: Remove default instantations of LLMs from VectorstoreToolkit (#20794 ) Remove default instantiation from vectorstore toolkit.	2024-04-23 16:09:14 -04:00
ccurme	42de5168b1	langchain: deprecate LLMChain, RetrievalQA, and ConversationalRetrievalChain (#20751 )	2024-04-23 15:55:34 -04:00
Erick Friis	30c7951505	core: use qualname in beta message (#20361 )	2024-04-23 11:20:13 -07:00

1 2 3 4 5 ...

3921 Commits