langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Jon Saginaw	f8d69e4e52	Enhancement: Blockchain Document Loader with better Metadata support (#3710 ) This PR includes some minor alignment updates, including: - metadata object extended to support contractAddress, blockchainType, and tokenId - notebook doc better aligned to standard langchain format - startToken changed from int to str to support multiple hex value types on the Alchemy API The updated metadata will look like the below. It's possible for a single contractAddress to exist across multiple blockchains (e.g. Ethereum, Polygon, etc.) so it's important to include the blockchainType. ``` metadata = {"source": self.contract_address, "blockchain": self.blockchainType, "tokenId": tokenId} ```	2023-04-28 20:13:05 -07:00
Davis Chase	220a7076ac	Add Mathpix pdf loader (#3727 ) Inspo https://twitter.com/danielgross/status/1651695062307274754?s=46&t=1zHLap5WG4I_kQPPjfW9fA Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-28 20:11:22 -07:00
Rafal Wojdyla	37ed6f2177	Handle length safe embedding only if needed (#3723 ) Re: https://github.com/hwchase17/langchain/issues/3722 Copy pasting context from the issue: `1bf1c37c0c/langchain/embeddings/openai.py (L210-L211)` Means that the length safe embedding method is "always" used, initial implementation https://github.com/hwchase17/langchain/pull/991 has the `embedding_ctx_length` set to -1 (meaning you had to opt-in for the length safe method), https://github.com/hwchase17/langchain/pull/2330 changed that to max length of OpenAI embeddings v2, meaning the length safe method is used at all times. How about changing that if branch to use length safe method only when needed, meaning when the text is longer than the max context length?	2023-04-28 20:10:04 -07:00
Harrison Chase	40f6e60e68	Harrison/stripe (#3762 ) Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>	2023-04-28 20:03:21 -07:00
Jelmer Borst	8cf2ff0be0	Confluence: Add page status filter for spaces (#3732 ) At the moment all content in Confluence is retrieved by default, including archived content. Often, this is undesired as the content is not relevant anymore. Notes Fetching pages by label does not support excluding archived content. This may lead to unexpected results.	2023-04-28 19:56:53 -07:00
Harrison Chase	7a129ac043	Harrison/pypdf loader (#3764 ) Co-authored-by: Felipe Meres <felipe@felipemeres.com>	2023-04-28 19:56:21 -07:00
mbchang	4eefea0fe8	new example: single agent, simulated environment (openai gym) (#3758 ) For many applications of LLM agents, the environment is real (internet, database, REPL, etc). However, we can also define agents to interact in simulated environments like text-based games. This is an example of how to create a simple agent-environment interaction loop with [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) (formerly [OpenAI Gym](https://github.com/openai/gym)).	2023-04-28 19:52:05 -07:00
0xDTE	6ce34bb4fe	Fixing broken document links (#3756 ) simple document url fixes. nothing fancy.	2023-04-28 19:51:23 -07:00
Rafal Wojdyla	160bfae93f	Add `DocstoreFn` - lookup doc via arbitrary function (#3760 ) This partially addresses https://github.com/hwchase17/langchain/issues/1524, but it's also useful for some of our use cases. This `DocstoreFn` allows to lookup a document given a function that accepts the `search` string without the need to implement a custom `Docstore`. This could be useful when: * you don't want to implement a `Docstore` just to provide a custom `search` * it's expensive to construct an `InMemoryDocstore`/dict * you retrieve documents from remote sources * you just want to reuse existing objects	2023-04-28 19:50:32 -07:00
Harrison Chase	c55ba43093	Harrison/vespa (#3761 ) Co-authored-by: Lester Solbakken <lesters@users.noreply.github.com>	2023-04-28 19:48:43 -07:00
mbchang	ee20b3e0d0	bug fix: initialize the arxivAPIWrapper object (#3733 )	2023-04-28 19:35:01 -07:00
leo-gan	e510732ad2	docs: improved `vectorstore` notebooks (#3724 ) - Added links to the vectorstore providers - Added installation code (it is not clear that we have to go to the `LangChan Ecosystem` page to get installation instructions.)	2023-04-28 19:26:50 -07:00
BioErrorLog	ad4eae7ef0	Fix linting on the Quickstart Guide sample codes (#3701 ) When copying and pasting the sample code from the Quickstart Guide, lint errors ("missing whitespace around operator") occur."	2023-04-28 17:29:05 -07:00
Zander Chase	a46f1d830e	Synchronous Browser (#3745 ) Split out sync methods in playwright	2023-04-28 17:09:00 -07:00
Zander Chase	6c2b16e465	Add SceneXplain Tool (#3752 )	2023-04-28 17:01:54 -07:00
erwanlc	72c5c15f7f	Fix: Updated links for in depth explanation of chain types in the Question Answering notebooks (#3714 ) In the notebook question_answering.ipynb ([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/question_answering.ipynb)), and the notebook qa_with_sources.ipynb ([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/qa_with_sources.ipynb)), the first paragraph contains a dead link: > This notebook walks through how to use LangChain for question answering over a list of documents. It covers four different types of chains: stuff, map_reduce, refine, map_rerank. For a more in depth explanation of what these chain types are, see [here](`32793f94fd/docs/modules/chains/combine_docs.md`). The file combine_docs.md doesn't exist anymore and thus provide 404 - Page not found. I updated the links so it redirect to https://docs.langchain.com/docs/components/chains/index_related_chains as in the summarize notebook ([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/summarize.ipynb)) present in the same folder.	2023-04-28 15:06:46 -07:00
Alan Cha	e3b7a20454	Fix typo (#3728 )	2023-04-28 13:01:09 -07:00
Zander Chase	5042bd40d3	Add Shell Tool (#3335 ) Create an official bash shell tool to replace the dynamically generated one	2023-04-28 11:10:43 -07:00
Zander Chase	334c162f16	Add Other File Utilities (#3209 ) Add other File Utilities, include - List Directory - Search for file - Move - Copy - Remove file Bundle as toolkit Add a notebook that connects to the Chat Agent, which somewhat supports multi-arg input tools Update original read/write files to return the original dir paths and better handle unsupported file paths. Add unit tests	2023-04-28 10:53:37 -07:00
Zander Chase	491c27f861	PlayWright Web Browser Toolkit (#3262 ) Adds a PlayWright web browser toolkit with the following tools: - NavigateTool (navigate_browser) - navigate to a URL - NavigateBackTool (previous_page) - wait for an element to appear - ClickTool (click_element) - click on an element (specified by selector) - ExtractTextTool (extract_text) - use beautiful soup to extract text from the current web page - ExtractHyperlinksTool (extract_hyperlinks) - use beautiful soup to extract hyperlinks from the current web page - GetElementsTool (get_elements) - select elements by CSS selector - CurrentPageTool (current_page) - get the current page URL	2023-04-28 10:42:44 -07:00
Zander Chase	da7b51455c	Dynamic tool -> single purpose (#3697 ) I think the logic of https://github.com/hwchase17/langchain/pull/3684#pullrequestreview-1405358565 is too confusing. I prefer this alternative because: - All `Tool()` implementations by default will be treated the same as before. No breaking changes. - Less reliance on pydantic magic - The decorator (which only is typed as returning a callable) can infer schema and generate a structured tool - Either way, the recommended way to create a custom tool is through inheriting from the base tool	2023-04-28 09:38:41 -07:00
Zach Schillaci	1bf1c37c0c	Update VectorDBQA to RetrievalQA in tools (#3698 ) Because `VectorDBQA` and `VectorDBQAWithSourcesChain` are deprecated	2023-04-28 07:39:59 -07:00
Harrison Chase	32793f94fd	bump version to 152 (#3695 )	2023-04-28 00:21:53 -07:00
mbchang	1da3ee1386	Multiagent authoritarian (#3686 ) This notebook showcases how to implement a multi-agent simulation where a privileged agent decides who to speak. This follows the polar opposite selection scheme as [multi-agent decentralized speaker selection](https://python.langchain.com/en/latest/use_cases/agent_simulations/multiagent_bidding.html). We show an example of this approach in the context of a fictitious simulation of a news network. This example will showcase how we can implement agents that - think before speaking - terminate the conversation	2023-04-27 23:33:29 -07:00
Zander Chase	4654c58f72	Add validation on agent instantiation for multi-input tools (#3681 ) Tradeoffs here: - No lint-time checking for compatibility - Differs from JS package - The signature inference, etc. in the base tool isn't simple - The `args_schema` is optional Pros: - Forwards compatibility retained - Doesn't break backwards compatibility - User doesn't have to think about which class to subclass (single base tool or dynamic `Tool` interface regardless of input) - No need to change the load_tools, etc. interfaces Co-authored-by: Hasan Patel <mangafield@gmail.com>	2023-04-27 15:36:11 -07:00
Davis Chase	212aadd4af	Nit: list to sequence (#3678 )	2023-04-27 14:41:59 -07:00
Davis Chase	b807a114e4	Add query parsing unit tests (#3672 )	2023-04-27 13:42:12 -07:00
Hasan Patel	03c05b15f6	Fixed some typos on deployment.md (#3652 ) Fixed typos and added better formatting for easier readability	2023-04-27 13:01:24 -07:00
Zander Chase	1b5721c999	Remove Pexpect Dependency (#3667 ) Resolves #3664 Next PR will be to clean up CI to catch this earlier. Triaging this, it looks like it wasn't caught because pexpect is a `poetry` dependency. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-04-27 11:39:01 -07:00
Eugene Yurtsev	708787dddb	Blob: Add validator and use future annotations (#3650 ) Minor changes to the Blob schema. --------- Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>	2023-04-27 14:33:59 -04:00
Eugene Yurtsev	c5a4b4fea1	Suppress duckdb warning in unit tests explicitly (#3653 ) This catches the warning raised when using duckdb, asserts that it's as expected. The goal is to resolve all existing warnings to make unit-testing much stricter.	2023-04-27 14:29:41 -04:00
Eugene Yurtsev	2052e70664	Add lazy iteration interface to document loaders (#3659 ) Adding a lazy iteration for document loaders. Following the plan here: https://github.com/hwchase17/langchain/pull/2833 Keeping the `load` method as is for backwards compatibility. The `load` returns a materialized list of documents and downstream users may rely on that fact. A new method that returns an iterable is introduced for handling lazy loading. --------- Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>	2023-04-27 14:29:01 -04:00
Piotr Mardziel	8a54217e7b	update example of ConstitutionalChain.from_llm (#3630 ) Example code was missing an argument and import. Fixed.	2023-04-27 11:17:31 -07:00
Eugene Yurtsev	e6c8cce050	Add unit-test to catch changes to required deps (#3662 ) This adds a unit test that can catch changes to required dependencies	2023-04-27 13:04:17 -04:00
Eugene Yurtsev	055f58960a	Fix pytest collection warning (#3651 ) Fixes a pytest collection warning because the test class starts with the prefix "Test"	2023-04-27 09:51:43 -07:00
Harrison Chase	0cf890eed4	bump version to 151 (#3658 )	2023-04-27 09:02:39 -07:00
Davis Chase	3b609642ae	Self-query with generic query constructor (#3607 ) Alternate implementation of #3452 that relies on a generic query constructor chain and language and then has vector store-specific translation layer. Still refactoring and updating examples but general structure is there and seems to work s well as #3452 on exampels --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-27 08:36:00 -07:00
plutopulp	6d6fd1b9e1	Add PipelineAI LLM integration (#3644 ) Add PipelineAI LLM integration	2023-04-27 08:22:26 -07:00
Harrison Chase	a35bbbfa9e	Harrison/lancedb (#3634 ) Co-authored-by: Minh Le <minhle@canva.com>	2023-04-27 08:14:36 -07:00
Nuno Campos	52b5290810	Update README.md (#3643 )	2023-04-27 08:14:09 -07:00
Eugene Yurtsev	5d02010763	Introduce Blob and Blob Loader interface (#3603 ) This PR introduces a Blob data type and a Blob loader interface. This is the first of a sequence of PRs that follows this proposal: https://github.com/hwchase17/langchain/pull/2833 The primary goals of these abstraction are: * Decouple content loading from content parsing code. * Help duplicated content loading code from document loaders. * Make lazy loading a default for langchain.	2023-04-27 09:45:25 -04:00
Matt Robinson	8e10ac422e	enhancement: add elements mode to `UnstructuredURLLoader` (#3456 ) ### Summary Updates the `UnstructuredURLLoader` to include a "elements" mode that retains additional metadata from `unstructured`. This makes `UnstructuredURLLoader` consistent with other unstructured loaders, which also support "elements" mode. Patched mode into the existing `UnstructuredURLLoader` class instead of inheriting from `UnstructuredBaseLoader` because it significantly simplified the implementation. ### Testing This should still work and show the url in the source for the metadata ```python from langchain.document_loaders import UnstructuredURLLoader urls = ["https://www.understandingwar.org/sites/default/files/Russian%20Offensive%20Campaign%20Assessment%2C%20April%2011%2C%202023.pdf"] loader = UnstructuredURLLoader(urls=urls, headers={"Accept": "application/json"}, strategy="fast") docs = loader.load() print(docs[0].page_content[:1000]) docs[0].metadata ``` This should now work and show additional metadata from `unstructured`. This should still work and show the url in the source for the metadata ```python from langchain.document_loaders import UnstructuredURLLoader urls = ["https://www.understandingwar.org/sites/default/files/Russian%20Offensive%20Campaign%20Assessment%2C%20April%2011%2C%202023.pdf"] loader = UnstructuredURLLoader(urls=urls, headers={"Accept": "application/json"}, strategy="fast", mode="elements") docs = loader.load() print(docs[0].page_content[:1000]) docs[0].metadata ```	2023-04-26 22:09:45 -07:00
Eduard van Valkenburg	a3e3f26090	Some more PowerBI pydantic and import fixes (#3461 )	2023-04-26 22:09:12 -07:00
Harrison Chase	ab749fa1bb	Harrison/opensearch logic (#3631 ) Co-authored-by: engineer-matsuo <95115586+engineer-matsuo@users.noreply.github.com>	2023-04-26 22:08:03 -07:00
ccw630	cf384dcb7f	Supports async in SequentialChain/SimpleSequentialChain (#3503 )	2023-04-26 22:07:20 -07:00
Ehsan M. Kermani	4a246e2fd6	Allow clearing cache and fix gptcache (#3493 ) This PR * Adds `clear` method for `BaseCache` and implements it for various caches * Adds the default `init_func=None` and fixes gptcache integtest * Since right now integtest is not running in CI, I've verified the changes by running `docs/modules/models/llms/examples/llm_caching.ipynb` (until proper e2e integtest is done in CI)	2023-04-26 22:03:50 -07:00
Howard Su	83e871f1ff	Fix Invalid Request using AzureOpenAI (#3522 ) This fixes the error when calling AzureOpenAI of gpt-35-turbo model. The error is: InvalidRequestError: logprobs, best_of and echo parameters are not available on gpt-35-turbo model. Please remove the parameter and try again. For more details, see https://go.microsoft.com/fwlink/?linkid=2227346.	2023-04-26 22:00:09 -07:00
Luoyger	f5aa767ef1	add --no-sandbox for chrome in url_selenium (#3589 ) without --no-sandbox param, load documents from url by selenium in chrome occured error below: ```Traceback (most recent call last): File "/data//playgroud/try_langchain.py", line 343, in <module> langchain_doc_loader() File "/data//playgroud/try_langchain.py", line 67, in langchain_doc_loader documents = loader.load() File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/langchain/document_loaders/url_selenium.py", line 102, in load driver = self._get_driver() File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/langchain/document_loaders/url_selenium.py", line 76, in _get_driver return Chrome(options=chrome_options) File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/chrome/webdriver.py", line 80, in __init__ super().__init__( File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/chromium/webdriver.py", line 104, in __init__ super().__init__( File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 286, in __init__ self.start_session(capabilities, browser_profile) File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 378, in start_session response = self.execute(Command.NEW_SESSION, parameters) File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 440, in execute self.error_handler.check_response(response) File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response raise exception_class(message, screen, stacktrace) selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally. (unknown error: DevToolsActivePort file doesn't exist) (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.) Stacktrace: #0 0x55cf8da1bfe3 <unknown> #1 0x55cf8d75ad36 <unknown> #2 0x55cf8d783b20 <unknown> #3 0x55cf8d77fa9b <unknown> #4 0x55cf8d7c1af7 <unknown> #5 0x55cf8d7c111f <unknown> #6 0x55cf8d7b8693 <unknown> #7 0x55cf8d78b03a <unknown> #8 0x55cf8d78c17e <unknown> #9 0x55cf8d9dddbd <unknown> #10 0x55cf8d9e1c6c <unknown> #11 0x55cf8d9eb4b0 <unknown> #12 0x55cf8d9e2d63 <unknown> #13 0x55cf8d9b5c35 <unknown> #14 0x55cf8da06138 <unknown> #15 0x55cf8da062c7 <unknown> #16 0x55cf8da14093 <unknown> #17 0x7f3da31a72de start_thread ``` add option `chrome_options.add_argument("--no-sandbox")` for chrome.	2023-04-26 21:48:43 -07:00
Shukri	fac4f36a87	Update models used for embeddings in the weaviate example (#3594 ) Use text-embedding-ada-002 because it [outperforms all other models](https://openai.com/blog/new-and-improved-embedding-model).	2023-04-26 21:48:08 -07:00
cs0lar	440c98e24b	Fix/issue 2695 (#3608 ) ## Background fixes #2695 ## Changes The `add_text` method uses the internal embedding function if one was passes to the `Weaviate` constructor. NOTE: the latest merge on the `Weaviate` class made the specification of a `weaviate_api_key` mandatory which might not be desirable for all users and connection methods (for example weaviate also support Embedded Weaviate which I am happy to add support to here if people think it's desirable). I wrapped the fetching of the api key into a try catch in order to allow the `weaviate_api_key` to be unspecified. Do let me know if this is unsatisfactory. ## Test Plan added test for `add_texts` method.	2023-04-26 21:45:03 -07:00

1 2 3 4 5 ...

1720 Commits