langchain

Author	SHA1	Message	Date
Jan Philipp Harries	fc3c2c4406	Async Support for LLMChainExtractor (new) (#3780 ) @vowelparrot @hwchase17 Here is a new implementation of `acompress_documents` for `LLMChainExtractor` without changes to the sync version, as you suggested in #3587 / [Async Support for LLMChainExtractor](https://github.com/hwchase17/langchain/pull/3587). I created a new PR to avoid cluttering the history with reverted commits; I hope that is the right way. Happy for any improvements/suggestions. (PS: I also tried an alternative implementation with a nested helper function like ``` python async def acompress_documents_old( self, documents: Sequence[Document], query: str ) -> Sequence[Document]: """Compress page content of raw documents.""" async def _compress_concurrently(doc): _input = self.get_input(query, doc) output = await self.llm_chain.apredict_and_parse(**_input) return Document(page_content=output, metadata=doc.metadata) outputs=await asyncio.gather(*[_compress_concurrently(doc) for doc in documents]) compressed_docs=list(filter(lambda x: len(x.page_content)>0,outputs)) return compressed_docs ``` But in the end I found the committed version to be more readable and more "canonical"; I hope you agree.)	2023-05-01 21:23:13 -07:00
Harrison Chase	2cecc572f9	Harrison/chroma get (#3938 ) Co-authored-by: sdan <git@sdan.io>	2023-05-01 21:19:28 -07:00
liviuasnash1	6396a4ad8d	Fix documentation typos (#3870 ) Co-authored-by: Liviu Asnash <liviua@maximallearning.com>	2023-05-01 20:58:38 -07:00
Hristo Stoychev	109927cdb2	Make project compatible with SQLAlchemy 1.3.* (#3862 ) Related to [this issue.](https://github.com/hwchase17/langchain/issues/3655#issuecomment-1529415363) The `Mapped` SQLAlchemy class was introduced in SQLAlchemy 1.4, but the migration from 1.3 to 1.4 is quite challenging, so, IMO, it's better to keep backwards compatibility and not change the SQLAlchemy requirements just because of type annotations.	2023-05-01 20:58:22 -07:00
sqr	8bbdde8f9e	make ARG POETRY_HOME available in multistage (#3882 )	2023-05-01 20:57:41 -07:00
玄猫	188a7bd653	fix: pgvector hang risk if table does not exist #3883 (#3884 )	2023-05-01 20:57:31 -07:00
tomer555	9acf80fd69	fix: invalid escape sequence error in regex pattern (#3902 ) This PR fixes the "SyntaxError: invalid escape sequence" error in the pydantic.py file. The issue was caused by the backslashes in the regular expression pattern being treated as escape characters. By using a raw string literal for the regex pattern (e.g., r"\{.*\}"), this fix ensures that backslashes are treated as literal characters, thus preventing the error. Co-authored-by: Tomer Levy <tomer.levy@tipalti.com>	2023-05-01 20:57:19 -07:00
Samuel Dion-Girardeau	c5c33786a7	Fix bad spellings for 'convenience' (#3936 ) Found in the docs for chat prompt templates: https://python.langchain.com/en/latest/getting_started/getting_started.html#chat-prompt-templates and fixed similar issues in neighboring notebooks.	2023-05-01 20:57:06 -07:00
Harrison Chase	f04faf8496	Harrison/spreedly (#3937 ) Co-authored-by: Esmit Pérez <esmitperez@users.noreply.github.com>	2023-05-01 20:56:56 -07:00
Harrison Chase	cd3f8582cb	Harrison/combined memory (#3935 ) Co-authored-by: engkheng <60956360+outday29@users.noreply.github.com>	2023-05-01 20:55:56 -07:00
Zander Chase	c4cb55a0c5	[Breaking] Migrate GPT4All to use PyGPT4All (#3934 ) It seems the pyllamacpp package no longer provides the supported bindings for gpt4all. Tested that this works locally. Given that the older models weren't very performant, I think it's better to migrate now without trying to include a lot of try / except blocks --------- Co-authored-by: Nissan Pow <npow@users.noreply.github.com> Co-authored-by: Nissan Pow <pownissa@amazon.com>	2023-05-01 20:42:45 -07:00
leo-gan	f0a4bbb8e2	updated `YouTube` links (#3916 ) Added several links to fresh videos Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-05-01 20:39:59 -07:00
Mike Wang	68a18cc621	[simple] add ddg-search to __init__ for easier loading (#3933 ) the same as other tools	2023-05-01 20:39:17 -07:00
Matt Robinson	c51dec5101	feat: add Unstructured API loaders (#3906 ) ### Summary Adds the `UnstructuredAPIFileLoader` and `UnstructuredAPIFileIOLoader` loaders, which partition documents through the Unstructured API. Defaults to the URL for the hosted Unstructured API, but can switch to a self-hosted or locally running API using the `url` kwarg. Currently, the Unstructured API is open and does not require an API key, but it will soon. A note was added about that to the Unstructured ecosystem page. ### Testing ```python from langchain.document_loaders import UnstructuredAPIFileIOLoader filename = "fake-email.eml" with open(filename, "rb") as f: loader = UnstructuredAPIFileIOLoader(file=f, file_filename=filename) docs = loader.load() docs[0] ``` ```python from langchain.document_loaders import UnstructuredAPIFileLoader filename = "fake-email.eml" loader = UnstructuredAPIFileLoader(file_path=filename, mode="elements") docs = loader.load() docs[0] ```	2023-05-01 20:37:35 -07:00
Harrison Chase	13269fb583	Harrison/relevancy score (#3907 ) Co-authored-by: Ryan Grippeling <R.Grippeling@hotmail.com> Co-authored-by: Ryan <ryan@webgrip.nl> Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>	2023-05-01 20:37:24 -07:00
Zander Chase	c582f2e9e3	Add Structure Chat Agent (#3912 ) Create a new chat agent that is compatible with the Multi-input tools	2023-05-01 20:34:50 -07:00
Mike Wang	ec21b7126c	[agent][property type] Change allowed_tools to Set as Duplicate doesn’t make sense (#3840 ) - ActionAgent has a property called `allowed_tools`, which is declared as a `List`. It stores all provided tools that are available to use during agent actions. - This collection shouldn’t allow duplicates, so the original `List` datatype doesn’t make sense. Each tool should be unique. Even if there are variants (assuming in the future), they would be named differently in load_tools. Test: - confirmed the functionality in an example by initializing an agent with a list of 2 tools and checking that everything works. ```python3 def test_agent_chain_chat_bot(): from langchain.agents import load_tools from langchain.agents import initialize_agent from langchain.agents import AgentType from langchain.chat_models import ChatOpenAI from langchain.llms import OpenAI from langchain.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper chat = ChatOpenAI(temperature=0) llm = OpenAI(temperature=0) tools = load_tools(["ddg-search", "llm-math"], llm=llm) agent = initialize_agent(tools, chat, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True) agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?") test_agent_chain_chat_bot() ``` Result: [screenshot: https://user-images.githubusercontent.com/62768671/235572157-0937594c-ddfb-4760-acb2-aea4cacacd89.png]	2023-05-01 20:30:10 -07:00
Harrison Chase	c5cc09d4e3	Harrison/agent exec kwargs (#3917 ) Co-authored-by: Zach Schillaci <40636930+zachschillaci27@users.noreply.github.com>	2023-05-01 20:28:43 -07:00
Harrison Chase	05170b6764	Harrison/from documents (#3919 ) Co-authored-by: Gabriel Altay <gabriel.altay@gmail.com>	2023-05-01 20:28:14 -07:00
Davis Chase	e7e29f9937	Dev2049/add modern treasury (#3924 ) Modified Modern Treasury and Stripe slightly so credentials don't have to be passed in explicitly. Thanks @mattgmarcus for adding Modern Treasury! --------- Co-authored-by: Matt Marcus <matt.g.marcus@gmail.com>	2023-05-01 20:28:02 -07:00
Davis Chase	5db6b796cf	Dev2049/hf emb encode kwargs (#3925 ) Thanks @amogkam for the addition! Refactored slightly --------- Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2023-05-01 20:27:41 -07:00
mbchang	ffc87233a1	refactor GymnasiumAgent (#3927 ) refactor GymnasiumAgent (for single-agent environments) to be extensible to PettingZooAgent (multi-agent environments)	2023-05-01 20:25:03 -07:00
mbchang	81601d886c	new example: multi-agent simulations with environment (#3928 )	2023-05-01 20:24:15 -07:00
Harrison Chase	f7a828685d	Harrison/constitutional chain (#3931 ) Co-authored-by: Sam Ching <samuel@duolingo.com>	2023-05-01 20:23:16 -07:00
Eduard van Valkenburg	43a0cb4b92	small change to allow powerbi tools to all have single inputs (#3864 ) Small change in the tool input so that the single_input_tool function works against all powerbi tools	2023-05-01 20:22:16 -07:00
Eduard van Valkenburg	c38cafd6c2	Add connection string auth to cosmos (#3867 ) Adds a connection string option for the cosmos memory, in case AAD auth is not enabled on the cosmos instance.	2023-05-01 20:21:46 -07:00
Venelin Valkov	bc7e4d5cd4	Add links to YouTube videos by Venelin Valkov (#3820 ) Hi, I've added links to my YouTube videos on LangChain. Thank you for making/maintaining LangChain! Venelin	2023-05-01 20:20:30 -07:00
Rafal Wojdyla	a5a4999fb7	Newlines should be removed only for the 1st gen embedding models (#3853 ) Only 1st generation OpenAI embedding models are negatively impacted by newlines. Context: https://github.com/openai/openai-python/issues/418#issuecomment-1525939500	2023-05-01 20:09:20 -07:00
Johan Stenberg (MSFT)	6bd367916c	Update adding_memory_chain_multiple_inputs.ipynb (#3895 ) Fix misleading docs in memory chain example (used the term "outputs" instead of "inputs")	2023-05-01 19:57:27 -07:00
Zander Chase	9b9b231e10	Update some Tools Docs (#3913 ) Haven't gotten to all of them, but this: - Updates some of the tools notebooks to actually instantiate a tool (many just show a 'utility' rather than a tool. More changes to come in separate PR) - Move the `Tool` and decorator definitions to `langchain/tools/base.py` (but still export from `langchain.agents`) - Add scene explain to the load_tools() function - Add unit tests for public apis for the langchain.tools and langchain.agents modules	2023-05-01 19:07:26 -07:00
Zander Chase	84ea17b786	Move Tool Validation (#3923 ) Move tool validation to each implementation of the Agent. Another alternative would be to adjust the `_validate_tools()` signature to accept the output parser (and format instructions) and add logic there. Something like `parser.outputs_structured_actions(format_instructions)` But don't think that's needed right now.	2023-05-01 18:44:24 -07:00
Eugene Yurtsev	7cce68a051	Add minimal file system blob loader (#3669 ) This adds a minimal file system blob loader. If it looks good, this PR will be merged and a few additional enhancements will be made.	2023-05-01 21:37:26 -04:00
Bank Natchapol	487d4aeebd	Motorhead Memory messages come in reversed order. (#3835 ) History from Motorhead memory is returned in reversed order. It should be Human: 1, AI: ..., Human: 2, AI: ... ``` You are a chatbot having a conversation with a human. AI: I'm sorry, I'm still not sure what you're trying to communicate. Can you please provide more context or information? Human: 3 AI: I'm sorry, I'm not sure what you mean by "1" and "2". Could you please clarify your request or question? Human: 2 AI: Hello, how can I assist you today? Human: 1 Human: 4 AI: ``` So I `reversed` the messages before putting them in chat_memory.	2023-05-01 17:02:34 -07:00
Davis Chase	900ad106d3	Update google palm model signatures (#3920 ) Signatures out of date after callback refactors	2023-05-01 16:19:31 -07:00
sherylZhaoCode	145ff23fb1	correct the llm type of AzureOpenAI (#3721 ) The llm type of AzureOpenAI was previously set to the default, which is openai. But since AzureOpenAI has a different API from openai, this creates problems when saving and loading chains. This PR corrects the llm type of AzureOpenAI to "azure"	2023-05-01 15:51:34 -07:00
engkheng	21335d43b2	Minor `LLMChain` docs correction (#3791 ) `LLMChain` run method can take multiple input variables.	2023-05-01 15:50:57 -07:00
Rafal Wojdyla	039b672f46	Fixup OpenAI Embeddings - fix the weighted mean (#3778 ) Re: https://github.com/hwchase17/langchain/issues/3777 Copy pasting from the issue: While working on https://github.com/hwchase17/langchain/issues/3722 I have noticed that there might be a bug in the current implementation of the OpenAI length-safe embeddings in `_get_len_safe_embeddings`, which before https://github.com/hwchase17/langchain/issues/3722 was actually the default implementation regardless of the length of the context (via https://github.com/hwchase17/langchain/pull/2330). It appears the weights used are constant and equal to the length of the embedding vector (1536), NOT the number of tokens in the batch as in the reference implementation at https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb Here's some debug info: [screenshot: https://user-images.githubusercontent.com/1419010/235286595-a8b55298-7830-45df-b9f7-d2a2ad0356e0.png] We can also validate this against the reference implementation, copy pasted from https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb ```py import openai import tiktoken from itertools import islice import numpy as np from tenacity import retry, wait_random_exponential, stop_after_attempt, retry_if_not_exception_type EMBEDDING_MODEL = 'text-embedding-ada-002' EMBEDDING_CTX_LENGTH = 8191 EMBEDDING_ENCODING = 'cl100k_base' # let's make sure to not retry on an invalid request, because that is what we want to demonstrate @retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6), retry=retry_if_not_exception_type(openai.InvalidRequestError)) def get_embedding(text_or_tokens, model=EMBEDDING_MODEL): return openai.Embedding.create(input=text_or_tokens, model=model)["data"][0]["embedding"] def batched(iterable, n): """Batch data into tuples of length n. The last batch may be shorter.""" # batched('ABCDEFG', 3) --> ABC DEF G if n < 1: raise ValueError('n must be at least one') it = iter(iterable) while (batch := tuple(islice(it, n))): yield batch def chunked_tokens(text, encoding_name, chunk_length): encoding = tiktoken.get_encoding(encoding_name) tokens = encoding.encode(text) chunks_iterator = batched(tokens, chunk_length) yield from chunks_iterator def reference_safe_get_embedding(text, model=EMBEDDING_MODEL, max_tokens=EMBEDDING_CTX_LENGTH, encoding_name=EMBEDDING_ENCODING, average=True): chunk_embeddings = [] chunk_lens = [] for chunk in chunked_tokens(text, encoding_name=encoding_name, chunk_length=max_tokens): chunk_embeddings.append(get_embedding(chunk, model=model)) chunk_lens.append(len(chunk)) if average: chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens) chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings) # normalizes length to 1 chunk_embeddings = chunk_embeddings.tolist() return chunk_embeddings ``` ```py long_text = 'foo bar' * 5000 reference_safe_get_embedding(long_text, average=True)[:10] # Here's the first 10 floats from the reference embeddings: [0.004407593824276758, 0.0017611146161865465, -0.019824815970984996, -0.02177626039794025, -0.012060967454897886, 0.0017955296329155309, -0.015609168983609643, -0.012059823076681351, -0.016990468527792825, -0.004970484452089445] # and now langchain implementation from langchain.embeddings.openai import OpenAIEmbeddings OpenAIEmbeddings().embed_query(long_text)[:10] [0.003791506184693747, 0.0025310066579390025, -0.019282322699514628, -0.021492679249899803, -0.012598522213242891, 0.0022181168611315662, -0.015858940621301307, -0.011754004130791204, -0.016402944319627515, -0.004125287485127554] # clearly they are different ^ ```	2023-05-01 15:47:38 -07:00
Younis Shah	22a1896c30	[docs]: updates connecting_to_a_feature_store.ipynb (#3776 ) * fixes `FeastPromptTemplate.format` example to use `driver_id`	2023-05-01 15:45:59 -07:00
Harrison Chase	e28c6403aa	Harrison/cohere reranker (#3904 )	2023-05-01 15:40:16 -07:00
Zura Isakadze	647bbf61c1	Add SQLiteChatMessageHistory (#3534 ) It's based on the already existing `PostgresChatMessageHistory`. The use case sits somewhere in between multiple files and Postgres storage.	2023-05-01 15:40:00 -07:00
James Brotchie	921894960b	Add ChatModel, LLM, and Embeddings for Google's PaLM APIs (#3575 ) - Add langchain.llms.GooglePalm for text completion, - Add langchain.chat_models.ChatGooglePalm for chat completion, - Add langchain.embeddings.GooglePalmEmbeddings for sentence embeddings, - Add example field to HumanMessage and AIMessage so that users can feed in examples into the PaLM Chat API, - Add system and unit tests. Note async completion for the Text API is not yet supported and will be included in a future PR. Happy for feedback on any aspect of this PR, especially our choice of adding an example field to Human and AI Message objects to enable passing example messages to the API.	2023-05-01 15:23:16 -07:00
Roma	d15f481352	Add unit test to output parsers (#3911 ) This pull request adds unit tests for various output parsers (BooleanOutputParser, CommaSeparatedListOutputParser, and StructuredOutputParser) to ensure their correct functionality and to increase code reliability and maintainability. The tests cover both valid and invalid input cases. Changes: Added unit tests for BooleanOutputParser. Added unit tests for CommaSeparatedListOutputParser. Added unit tests for StructuredOutputParser. Testing: All new unit tests have been executed, and they pass successfully. The overall test suite has been run, and all tests pass. Notes: These tests cover both successful parsing scenarios and error handling for invalid inputs. If any new output parsers are added in the future, corresponding unit tests should also be created to maintain coverage.	2023-05-01 14:53:08 -07:00
Tim Asp	9c89ff8bd9	Increase `request_timeout` on ChatOpenAI (#3910 ) With longer context and completions, gpt-3.5-turbo and, especially, gpt-4 will more often than not take > 60 seconds to respond. Based on some other discussions, it seems like this is an increasingly common problem, especially with summarization tasks. - https://github.com/hwchase17/langchain/issues/3512 - https://github.com/hwchase17/langchain/issues/3005 OpenAI's max 600s timeout seems excessive, so I settled on 120, but I do run into generations that take > 240 seconds when using large prompts and completions with GPT-4, so maybe 240 would be a better compromise?	2023-05-01 14:51:05 -07:00
Davis Chase	2451310975	Chroma fix mmr (#3897 ) Fixes #3628, thanks @derekmoeller for the issue!	2023-05-01 10:47:15 -07:00
mbchang	3e1cb31f63	fix: add import for gymnasium (#3899 )	2023-05-01 10:37:25 -07:00
Zander Chase	484707ad29	Add incremental messages token count (#3890 )	2023-05-01 10:36:54 -07:00
Davis Chase	52e4fba897	Fix self query pinecone translation (#3892 ) Enum-to-string conversion is handled differently in Python 3.9 and 3.11, which currently breaks the translation on 3.11 (see #3788). Thanks @peter-brady for catching this!	2023-05-01 10:35:48 -07:00
Jef Packer	47a685adcf	count tokens instead of chars in autogpt prompt (#3841 ) This looks like a bug. Overall, by using `len` instead of `token_counter`, the prompt thinks it has less room in the context window than it actually does. Because of this, it adds fewer messages. The reduced previous-message context makes the agent repetitive when selecting tasks.	2023-05-01 09:21:42 -07:00
Nikolas Garske	c4d3d74148	Fix typos in arxiv.ipynb (#3887 ) Several minor typos in the doc for the arxiv document loaders were fixed.	2023-05-01 09:17:37 -07:00
Zander Chase	f7cb2af5f4	Export StructuredTool at `/tools` (#3858 )	2023-04-30 19:22:21 -07:00


1800 Commits
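
The `LLMChainExtractor` async commit (fc3c2c4406) weighs running per-document compression concurrently with `asyncio.gather`. The sketch below shows only that concurrency pattern in plain `asyncio`; `Doc` and `fake_compress` are hypothetical stand-ins for LangChain's `Document` and the async LLM call, not LangChain APIs. Note that `asyncio.gather` takes awaitables as separate positional arguments, so a collection of coroutines has to be unpacked with `*`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Doc:
    # Hypothetical minimal stand-in for a LangChain Document.
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


async def fake_compress(query: str, doc: Doc) -> str:
    # Hypothetical stand-in for an async LLM call: keep the text only if it mentions the query.
    await asyncio.sleep(0)
    return doc.page_content if query in doc.page_content else ""


async def acompress(docs: Sequence[Doc], query: str) -> List[Doc]:
    async def _one(doc: Doc) -> Doc:
        output = await fake_compress(query, doc)
        return Doc(page_content=output, metadata=doc.metadata)

    # Unpack with *: gather() expects awaitables, not a list. Results keep input order.
    outputs = await asyncio.gather(*(_one(d) for d in docs))
    # Drop documents whose compressed content came back empty.
    return [d for d in outputs if len(d.page_content) > 0]


docs = [Doc("cats purr", {"id": "1"}), Doc("dogs bark", {"id": "2"})]
result = asyncio.run(acompress(docs, "cats"))
print([d.metadata["id"] for d in result])  # ['1']
```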
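
The regex fix in 9acf80fd69 relies on raw string literals: in `r"\{.*\}"` the backslashes reach the `re` module unchanged, so `\{` and `\}` match literal braces. Below is a minimal sketch of pulling a JSON object out of model output with such a pattern; the sample text and the `re.DOTALL` flag are illustrative assumptions, not necessarily what `pydantic.py` does.

```python
import json
import re

# Raw string: Python leaves the backslashes alone, so re sees \{ and \} (literal braces).
# DOTALL lets .* span newlines inside a multi-line JSON object.
JSON_PATTERN = re.compile(r"\{.*\}", re.DOTALL)

text = 'Sure! Here is the result:\n{"name": "Ada", "age": 36}\nHope that helps.'
match = JSON_PATTERN.search(text)
data = json.loads(match.group()) if match else None
print(data)  # {'name': 'Ada', 'age': 36}
```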
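
Commit a5a4999fb7 limits newline stripping to first-generation OpenAI embedding models. A hedged sketch of that conditional preprocessing follows; the commit does not say how a first-generation model is detected, so the `-001` suffix check is an assumption for illustration (first-generation models such as `text-similarity-ada-001` carry that suffix, while `text-embedding-ada-002` does not).

```python
def preprocess(text: str, model: str) -> str:
    # Assumption: model names ending in "-001" mark first-generation embedding models.
    if model.endswith("-001"):
        # First-generation models were hurt by newlines, so flatten them to spaces.
        return text.replace("\n", " ")
    # Later models (e.g. text-embedding-ada-002) receive the text unchanged.
    return text


print(repr(preprocess("line one\nline two", "text-search-ada-doc-001")))  # 'line one line two'
print(repr(preprocess("line one\nline two", "text-embedding-ada-002")))  # 'line one\nline two'
```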
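
Commit 7cce68a051 adds a minimal file system blob loader. As a rough illustration of the idea only, the sketch below walks a directory with `pathlib` and yields lazily-read blobs; `SimpleBlob` and `yield_blobs` are hypothetical names, and LangChain's actual loader API may differ.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass(frozen=True)
class SimpleBlob:
    # Hypothetical blob type: holds a path and reads the bytes only when asked.
    path: Path

    def as_bytes(self) -> bytes:
        return self.path.read_bytes()


def yield_blobs(root: str, glob: str = "**/*") -> Iterator[SimpleBlob]:
    # Walk the tree in sorted order and skip anything that is not a regular file.
    for path in sorted(Path(root).glob(glob)):
        if path.is_file():
            yield SimpleBlob(path)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("hello")
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "b.txt").write_text("world")
    blobs = list(yield_blobs(tmp))
    contents = [b.as_bytes() for b in blobs]
    print([b.path.name for b in blobs])  # ['a.txt', 'b.txt']
```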
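
Per commit 487d4aeebd, Motorhead returns history newest-first, so it has to be reversed once before it is replayed into chat memory. A sketch with plain tuples; the `(role, text)` shape is an assumption, not Motorhead's actual response format.

```python
from typing import List, Tuple

# Assumed newest-first history, as the commit describes for Motorhead.
newest_first: List[Tuple[str, str]] = [
    ("Human", "2"),
    ("AI", "Hello, how can I assist you today?"),
    ("Human", "1"),
]

# Reverse once before loading into chat memory so the oldest turn comes first.
chronological = list(reversed(newest_first))
transcript = "\n".join(f"{role}: {text}" for role, text in chronological)
print(transcript)
# Human: 1
# AI: Hello, how can I assist you today?
# Human: 2
```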
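
The fix in 039b672f46 weights each chunk's embedding by its token count, as the OpenAI cookbook reference does with `np.average(..., weights=chunk_lens)`, and then re-normalizes to unit length. A dependency-free sketch of that arithmetic; the toy 2-D vectors are assumptions standing in for real 1536-dimensional embeddings. The second call shows why constant weights are wrong: every chunk counts equally regardless of its length.

```python
import math
from typing import List, Sequence


def weighted_mean_embedding(
    chunk_embeddings: Sequence[Sequence[float]], chunk_lens: Sequence[int]
) -> List[float]:
    # Weighted average over chunks, equivalent to np.average(..., axis=0, weights=chunk_lens).
    total = sum(chunk_lens)
    dim = len(chunk_embeddings[0])
    avg = [
        sum(emb[i] * n for emb, n in zip(chunk_embeddings, chunk_lens)) / total
        for i in range(dim)
    ]
    # Normalize to unit length, as the reference implementation does.
    norm = math.sqrt(sum(x * x for x in avg))
    return [x / norm for x in avg]


chunks = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-D "embeddings" for two chunks
print(weighted_mean_embedding(chunks, [3, 1]))  # token-count weights: about [0.949, 0.316]
print(weighted_mean_embedding(chunks, [1536, 1536]))  # constant weights: about [0.707, 0.707]
```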
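
Commit 647bbf61c1 adds a SQLite-backed chat message history modeled on `PostgresChatMessageHistory`. A minimal sketch of that storage idea with the standard `sqlite3` module; the `SQLiteHistory` class, the `message_store` table name and its schema are hypothetical and need not match the real `SQLiteChatMessageHistory`.

```python
import json
import sqlite3
from typing import Dict, List


class SQLiteHistory:
    # Hypothetical schema: one row per message, grouped by session_id, ordered by id.
    def __init__(self, session_id: str, path: str = ":memory:") -> None:
        self.session_id = session_id
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS message_store ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT, message TEXT)"
        )

    def add_message(self, role: str, content: str) -> None:
        # Store each message as a JSON string.
        self.conn.execute(
            "INSERT INTO message_store (session_id, message) VALUES (?, ?)",
            (self.session_id, json.dumps({"role": role, "content": content})),
        )
        self.conn.commit()

    @property
    def messages(self) -> List[Dict[str, str]]:
        rows = self.conn.execute(
            "SELECT message FROM message_store WHERE session_id = ? ORDER BY id",
            (self.session_id,),
        ).fetchall()
        return [json.loads(row[0]) for row in rows]

    def clear(self) -> None:
        self.conn.execute("DELETE FROM message_store WHERE session_id = ?", (self.session_id,))
        self.conn.commit()


history = SQLiteHistory(session_id="demo")
history.add_message("human", "hi")
history.add_message("ai", "hello")
print(history.messages)  # [{'role': 'human', 'content': 'hi'}, {'role': 'ai', 'content': 'hello'}]
```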
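
Commit 52e4fba897 notes that Enum-to-string conversion differs between Python 3.9 and 3.11. For a `str`-mixin enum, `format()` and f-strings return the member's value on 3.9 but, from 3.11, the same text as `str()` (for example `Operator.AND`). Reading `.value` explicitly behaves the same on both. `Operator` and `to_filter_key` are hypothetical names, not LangChain's self-query types.

```python
from enum import Enum


class Operator(str, Enum):
    # Hypothetical operator enum for a vector-store filter translator.
    AND = "and"
    OR = "or"


def to_filter_key(op: Operator) -> str:
    # Use .value explicitly; f"${op}" renders differently on Python 3.9 and 3.11+.
    return f"${op.value}"


print(to_filter_key(Operator.AND))  # $and
```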
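
Commit 47a685adcf switches the AutoGPT prompt from `len` (characters) to a token counter when deciding how much history fits. The sketch below shows why that matters: characters overstate usage, so fewer messages fit. The whitespace-split `count_tokens` is a hypothetical stand-in for a real tokenizer, and filling from the newest message backwards is an assumption.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer such as tiktoken.
    return len(text.split())


def fit_history(messages: List[str], budget: int, counter: Callable[[str], int]) -> List[str]:
    # Walk from newest to oldest, keeping messages while they fit in the budget.
    kept: List[str] = []
    used = 0
    for msg in reversed(messages):
        cost = counter(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    # Restore chronological order.
    return list(reversed(kept))


history = ["first task done", "second task done", "third task done"]
print(len(fit_history(history, budget=9, counter=count_tokens)))  # 3
print(len(fit_history(history, budget=9, counter=len)))  # 0
```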