This PR makes the `"\n\n"` string with which `StuffDocumentsChain` joins
formatted documents a property so it can be configured. The new
`document_separator` property defaults to `"\n\n"` so the change is
backwards compatible.
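A minimal sketch of configuring it (the prompt and `FakeListLLM` here
are placeholders for illustration):
```python
from langchain.chains import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.llms.fake import FakeListLLM
from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Summarize:\n{context}")
chain = StuffDocumentsChain(
    llm_chain=LLMChain(llm=FakeListLLM(responses=["ok"]), prompt=prompt),
    document_variable_name="context",
    document_separator="\n---\n",  # new; defaults to "\n\n"
)
```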
During the import of langchain, SQLAlchemy was throwing the error
`ImportError: cannot import name 'Mapped' from 'sqlalchemy.orm'`. This
is because the `Mapped` name was only introduced in SQLAlchemy v1.4.
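One possible shape of the compatibility guard (an illustrative sketch,
not necessarily the exact fix):
```python
try:
    from sqlalchemy.orm import Mapped  # introduced in SQLAlchemy 1.4
except ImportError:
    # Older SQLAlchemy: fall back so importing langchain doesn't fail
    Mapped = None
```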
This PR includes some minor alignment updates, including:
- metadata object extended to support contractAddress, blockchainType,
and tokenId
- notebook doc better aligned to standard langchain format
- startToken changed from int to str to support multiple hex value types
on the Alchemy API
The updated metadata will look like the following. It's possible for a
single contractAddress to exist across multiple blockchains (e.g.
Ethereum, Polygon, etc.) so it's important to include the
blockchainType.
```python
metadata = {
    "source": self.contract_address,
    "blockchain": self.blockchainType,
    "tokenId": tokenId,
}
```
At the moment, all content in Confluence is retrieved by default,
including archived content. Often this is undesirable, as archived
content is no longer relevant.
**Notes**
Fetching pages by label does not support excluding archived content.
This may lead to unexpected results.
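A minimal sketch of opting back in, assuming the new flag is named
`include_archived_content` (the URL and space key are placeholders):
```python
from langchain.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(url="https://example.atlassian.net/wiki")
# Archived content can now be excluded; pass True to keep the old behavior.
docs = loader.load(space_key="SPACE", include_archived_content=True)
```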
For many applications of LLM agents, the environment is real (internet,
database, REPL, etc). However, we can also define agents to interact in
simulated environments like text-based games. This is an example of how
to create a simple agent-environment interaction loop with
[Gymnasium](https://github.com/Farama-Foundation/Gymnasium) (formerly
[OpenAI Gym](https://github.com/openai/gym)).
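A minimal version of such a loop using the Gymnasium API (a random
policy stands in for the agent here):
```python
import gymnasium as gym

env = gym.make("Blackjack-v1")  # any environment with a textual observation works
observation, info = env.reset(seed=42)

terminated = truncated = False
while not (terminated or truncated):
    # An LLM agent would pick the action from the observation;
    # we sample randomly to keep the sketch self-contained.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
env.close()
```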
This **partially** addresses
https://github.com/hwchase17/langchain/issues/1524, but it's also useful
for some of our use cases.
This `DocstoreFn` allows looking up a document via a function that
accepts the `search` string, without the need to implement a custom
`Docstore`.
This could be useful when:
* you don't want to implement a `Docstore` just to provide a custom
`search`
* it's expensive to construct an `InMemoryDocstore`/dict
* you retrieve documents from remote sources
* you just want to reuse existing objects
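A minimal sketch (the import path and the `lookup` argument name are
assumptions):
```python
from langchain.docstore.arbitrary_fn import DocstoreFn
from langchain.docstore.document import Document

# Any callable taking the search string and returning a Document works,
# e.g. a lookup against a remote source or an existing dict.
docstore = DocstoreFn(lookup=lambda search: Document(page_content=f"found: {search}"))
doc = docstore.search("some-id")
```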
- Added links to the vectorstore providers
- Added installation code (it is not clear that users have to go to the
`LangChain Ecosystem` page to get installation instructions)
Adds other file utilities, including:
- List Directory
- Search for file
- Move
- Copy
- Remove file
Bundles them as a toolkit.
Adds a notebook that connects them to the Chat Agent, which has partial
support for multi-arg input tools.
Updates the original read/write file tools to return the original
directory paths and better handle unsupported file paths.
Adds unit tests.
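Rough usage sketch of the bundled toolkit (the `root_dir` path is a
placeholder):
```python
from langchain.agents.agent_toolkits import FileManagementToolkit

# Scope every file tool to a sandbox directory.
toolkit = FileManagementToolkit(root_dir="/tmp/sandbox")
tools = toolkit.get_tools()  # list/search/move/copy/remove plus read/write
```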
Adds a PlayWright web browser toolkit with the following tools:
- NavigateTool (navigate_browser) - navigate to a URL
- NavigateBackTool (previous_page) - navigate back to the previous page
- ClickTool (click_element) - click on an element (specified by
selector)
- ExtractTextTool (extract_text) - use beautiful soup to extract text
from the current web page
- ExtractHyperlinksTool (extract_hyperlinks) - use beautiful soup to
extract hyperlinks from the current web page
- GetElementsTool (get_elements) - select elements by CSS selector
- CurrentPageTool (current_page) - get the current page URL
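Rough usage sketch (helper names as written here may differ slightly
from the final module layout):
```python
from langchain.agents.agent_toolkits import PlayWrightBrowserToolkit
from langchain.tools.playwright.utils import create_sync_playwright_browser

browser = create_sync_playwright_browser()
toolkit = PlayWrightBrowserToolkit.from_browser(sync_browser=browser)
tools = toolkit.get_tools()  # the tools listed above
```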
I think the logic of
https://github.com/hwchase17/langchain/pull/3684#pullrequestreview-1405358565
is too confusing.
I prefer this alternative because:
- All `Tool()` implementations by default will be treated the same as
before. No breaking changes.
- Less reliance on pydantic magic
- The decorator (which is only typed as returning a callable) can infer
the schema and generate a structured tool (sketched below)
- Either way, the recommended way to create a custom tool is through
inheriting from the base tool
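A sketch of the decorator path (assuming the classic `tool` decorator
export from `langchain.agents`):
```python
from langchain.agents import tool

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

# The schema is inferred from the signature; with more than one
# argument the decorator produces a structured tool.
print(multiply.args)
```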
This notebook showcases how to implement a multi-agent simulation where
a privileged agent decides who speaks.
This is the polar opposite of the selection scheme in [multi-agent
decentralized speaker
selection](https://python.langchain.com/en/latest/use_cases/agent_simulations/multiagent_bidding.html).
We show an example of this approach in the context of a fictitious
simulation of a news network. This example will showcase how we can
implement agents that
- think before speaking
- terminate the conversation
Tradeoffs here:
- No lint-time checking for compatibility
- Differs from JS package
- The signature inference, etc. in the base tool isn't simple
- The `args_schema` is optional
Pros:
- Forwards compatibility retained
- Doesn't break backwards compatibility
- User doesn't have to think about which class to subclass (single base
tool or dynamic `Tool` interface regardless of input)
- No need to change the load_tools, etc. interfaces
Co-authored-by: Hasan Patel <mangafield@gmail.com>
Resolves #3664
Next PR will be to clean up CI to catch this earlier. Triaging this, it
looks like it wasn't caught because pexpect is a `poetry` dependency.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This catches the warning raised when using duckdb and asserts that it is as expected.
The goal is to resolve all existing warnings to make unit testing much stricter.
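The pattern is the standard `pytest.warns` context manager (a generic
sketch, not the actual duckdb test):
```python
import warnings

import pytest

def test_warning_is_expected() -> None:
    # Catch the warning and assert its message instead of letting it
    # leak into the test output.
    with pytest.warns(UserWarning, match="expected text"):
        warnings.warn("expected text", UserWarning)
```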
Adding lazy iteration for document loaders.
Following the plan here:
https://github.com/hwchase17/langchain/pull/2833
Keeping the `load` method as is for backwards compatibility: it returns
a materialized list of documents, and downstream users may rely on that
fact.
A new method that returns an iterable is introduced for handling lazy
loading.
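Sketch of the shape a loader takes under this plan (names assumed per
the proposal in #2833):
```python
from typing import Iterator, List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader

class MyLoader(BaseLoader):
    def lazy_load(self) -> Iterator[Document]:
        # Yield documents one at a time instead of materializing a list.
        for i in range(3):
            yield Document(page_content=f"doc {i}")

    def load(self) -> List[Document]:
        # Backwards compatible: still returns the full list.
        return list(self.lazy_load())
```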
---------
Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>
Alternate implementation of #3452 that relies on a generic query
constructor chain and query language, and then has a vector
store-specific translation layer. Still refactoring and updating
examples, but the general structure is there and seems to work as well
as #3452 on the examples.
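Rough shape of the resulting API (illustrative only; `llm` and
`vectorstore` are assumed to exist, and names may shift during the
refactor):
```python
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever

metadata_field_info = [
    AttributeInfo(name="year", description="Release year", type="integer"),
]
retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, "Brief summary of a movie", metadata_field_info
)
docs = retriever.get_relevant_documents("movies from 1994 about crime")
```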
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
This PR introduces a Blob data type and a Blob loader interface.
This is the first of a sequence of PRs that follows this proposal:
https://github.com/hwchase17/langchain/pull/2833
The primary goals of these abstractions are:
* Decouple content loading from content parsing code.
* Help deduplicate content loading code across document loaders.
* Make lazy loading the default for langchain.
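A minimal sketch of the Blob type (method names assumed from the
proposal):
```python
from langchain.document_loaders.blob_loaders import Blob

# A Blob carries raw bytes plus metadata; no parsing happens yet.
blob = Blob.from_path("example.pdf")
data = blob.as_bytes()
```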
### Summary
Updates the `UnstructuredURLLoader` to include an "elements" mode that
retains additional metadata from `unstructured`. This makes
`UnstructuredURLLoader` consistent with other unstructured loaders,
which also support "elements" mode. The mode was patched into the
existing `UnstructuredURLLoader` class instead of inheriting from
`UnstructuredBaseLoader` because doing so significantly simplified the
implementation.
### Testing
This should still work and show the URL in the `source` field of the metadata:
```python
from langchain.document_loaders import UnstructuredURLLoader
urls = ["https://www.understandingwar.org/sites/default/files/Russian%20Offensive%20Campaign%20Assessment%2C%20April%2011%2C%202023.pdf"]
loader = UnstructuredURLLoader(urls=urls, headers={"Accept": "application/json"}, strategy="fast")
docs = loader.load()
print(docs[0].page_content[:1000])
docs[0].metadata
```
This should now work and show additional metadata from `unstructured`,
while still including the URL in the `source` field of the metadata:
```python
from langchain.document_loaders import UnstructuredURLLoader
urls = ["https://www.understandingwar.org/sites/default/files/Russian%20Offensive%20Campaign%20Assessment%2C%20April%2011%2C%202023.pdf"]
loader = UnstructuredURLLoader(urls=urls, headers={"Accept": "application/json"}, strategy="fast", mode="elements")
docs = loader.load()
print(docs[0].page_content[:1000])
docs[0].metadata
```