langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
rogerserper	b1446bea5f	google-serper: async + full json results + support for Google Images, Places and News (#4078 ) * implemented arun, results, and aresults. Reuses aiosession if available. * helper tools GoogleSerperRun and GoogleSerperResults * support for Google Images, Places and News (examples given) and filtering based on time (e.g. past hour) * updated docs	2023-05-03 22:35:48 -07:00
Zander Chase	65c3b146c9	Accept str or list[str] for shell (#4060 ) Relax the requirements	2023-05-03 21:11:06 -07:00
hp0404	374725a715	Refactor TelegramChatLoader and FacebookChatLoader classes and add tests (#3863 ) This PR includes two main changes: - Refactor the `TelegramChatLoader` and `FacebookChatLoader` classes by removing the dependency on pandas and simplifying the message filtering process. - Add test cases for the `TelegramChatLoader` and `FacebookChatLoader` classes. These tests ensure that the classes correctly load and process the example chat data, providing better test coverage for this functionality.	2023-05-03 15:59:19 -07:00
Jon Saginaw	ea64b1716d	Enhancement: option to Get All Tokens with a single Blockchain Document Loader call (#3797 ) The Blockchain Document Loader's default behavior is to return 100 tokens at a time which is the Alchemy API limit. The Document Loader exposes a startToken that can be used for pagination against the API. This enhancement includes an optional get_all_tokens param (default: False) which will: - Iterate over the Alchemy API until it receives all the tokens, and return the tokens in a single call to the loader. - Manage all/most tokenId formats (this can be int, hex16 with zero or all the leading zeros). There aren't constraints as to how smart contracts can represent this value, but these three are most common. Note that a contract with 10,000 tokens will issue 100 calls to the Alchemy API, and could take about a minute, which is why this param will default to False. But I've been using the doc loader with these utilities on the side, so figured it might make sense to build them in for others to use.	2023-05-03 15:46:44 -07:00
Zander Chase	afa9d1292b	Re-Permit Partials in `Tool` (#4058 ) Resolved issue #4053 Now that StructuredTool is a separate class, this constraint is no longer needed. Added/updated a unit test	2023-05-03 13:16:41 -07:00
Harrison Chase	a5dd73c1a6	Revert "[agent][property type] Change allowed_tools to Set as Duplicate doesn’t make sense" (#4014 ) Reverts hwchase17/langchain#3840	2023-05-02 18:58:05 -07:00
Davis Chase	f08a76250f	Better custom model handling OpenAICallbackHandler (#4009 ) Thanks @maykcaldas for flagging! think this should resolve #3988. Let me know if you still see issues after next release.	2023-05-02 16:19:57 -07:00
Harrison Chase	48ea27ba60	Harrison/blockwise sitemap (#3940 ) Co-authored-by: Martin Holzhauer <martin@holzhauer.eu>	2023-05-01 21:34:07 -07:00
Harrison Chase	f04faf8496	Harrison/spreedly (#3937 ) Co-authored-by: Esmit Pérez <esmitperez@users.noreply.github.com>	2023-05-01 20:56:56 -07:00
Harrison Chase	cd3f8582cb	Harrison/combined memory (#3935 ) Co-authored-by: engkheng <60956360+outday29@users.noreply.github.com>	2023-05-01 20:55:56 -07:00
Zander Chase	c4cb55a0c5	[Breaking] Migrate GPT4All to use PyGPT4All (#3934 ) Seems the pyllamacpp package is no longer the supported bindings from gpt4all. Tested that this works locally. Given that the older models weren't very performant, I think it's better to migrate now without trying to include a lot of try / except blocks --------- Co-authored-by: Nissan Pow <npow@users.noreply.github.com> Co-authored-by: Nissan Pow <pownissa@amazon.com>	2023-05-01 20:42:45 -07:00
Zander Chase	c582f2e9e3	Add Structure Chat Agent (#3912 ) Create a new chat agent that is compatible with the Multi-input tools	2023-05-01 20:34:50 -07:00
Mike Wang	ec21b7126c	[agent][property type] Change allowed_tools to Set as Duplicate doesn’t make sense (#3840 ) - ActionAgent has a property, `allowed_tools`, which is declared as `List`. It stores all provided tools that are available to use during agent action. - This collection shouldn’t allow duplicates. The original datatype List doesn’t make sense. Each tool should be unique. Even when there are variants (assuming in the future), they would be named differently in load_tools. Test: - confirm the functionality in an example by initializing an agent with a list of 2 tools and confirm everything works. ```python3 def test_agent_chain_chat_bot(): from langchain.agents import load_tools from langchain.agents import initialize_agent from langchain.agents import AgentType from langchain.chat_models import ChatOpenAI from langchain.llms import OpenAI from langchain.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper chat = ChatOpenAI(temperature=0) llm = OpenAI(temperature=0) tools = load_tools(["ddg-search", "llm-math"], llm=llm) agent = initialize_agent(tools, chat, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True) agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?") test_agent_chain_chat_bot() ``` Result: <img width="863" alt="Screenshot 2023-05-01 at 7 58 11 PM" src="https://user-images.githubusercontent.com/62768671/235572157-0937594c-ddfb-4760-acb2-aea4cacacd89.png">	2023-05-01 20:30:10 -07:00
Davis Chase	e7e29f9937	Dev2049/add modern treasury (#3924 ) Modified Modern Treasury and Stripe slightly so credentials don't have to be passed in explicitly. Thanks @mattgmarcus for adding Modern Treasury! --------- Co-authored-by: Matt Marcus <matt.g.marcus@gmail.com>	2023-05-01 20:28:02 -07:00
Davis Chase	5db6b796cf	Dev2049/hf emb encode kwargs (#3925 ) Thanks @amogkam for the addition! Refactored slightly --------- Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2023-05-01 20:27:41 -07:00
Zander Chase	9b9b231e10	Update some Tools Docs (#3913 ) Haven't gotten to all of them, but this: - Updates some of the tools notebooks to actually instantiate a tool (many just show a 'utility' rather than a tool. More changes to come in separate PR) - Move the `Tool` and decorator definitions to `langchain/tools/base.py` (but still export from `langchain.agents`) - Add scene explain to the load_tools() function - Add unit tests for public apis for the langchain.tools and langchain.agents modules	2023-05-01 19:07:26 -07:00
Zander Chase	84ea17b786	Move Tool Validation (#3923 ) Move tool validation to each implementation of the Agent. Another alternative would be to adjust the `_validate_tools()` signature to accept the output parser (and format instructions) and add logic there. Something like `parser.outputs_structured_actions(format_instructions)` But don't think that's needed right now.	2023-05-01 18:44:24 -07:00
Eugene Yurtsev	7cce68a051	Add minimal file system blob loader (#3669 ) This adds a minimal file system blob loader. If it looks good, this PR will be merged and a few additional enhancements will be made.	2023-05-01 21:37:26 -04:00
Zura Isakadze	647bbf61c1	Add SQLiteChatMessageHistory (#3534 ) It's based on the already existing `PostgresChatMessageHistory`. Its use case sits somewhere between storing history in multiple files and storing it in Postgres.	2023-05-01 15:40:00 -07:00
James Brotchie	921894960b	Add ChatModel, LLM, and Embeddings for Google's PaLM APIs (#3575 ) - Add langchain.llms.GooglePalm for text completion, - Add langchain.chat_models.ChatGooglePalm for chat completion, - Add langchain.embeddings.GooglePalmEmbeddings for sentence embeddings, - Add example field to HumanMessage and AIMessage so that users can feed in examples into the PaLM Chat API, - Add system and unit tests. Note async completion for the Text API is not yet supported and will be included in a future PR. Happy for feedback on any aspect of this PR, especially our choice of adding an example field to Human and AI Message objects to enable passing example messages to the API.	2023-05-01 15:23:16 -07:00
Roma	d15f481352	Add unit test to output parsers (#3911 ) This pull request adds unit tests for various output parsers (BooleanOutputParser, CommaSeparatedListOutputParser, and StructuredOutputParser) to ensure their correct functionality and to increase code reliability and maintainability. The tests cover both valid and invalid input cases. Changes: Added unit tests for BooleanOutputParser. Added unit tests for CommaSeparatedListOutputParser. Added unit tests for StructuredOutputParser. Testing: All new unit tests have been executed, and they pass successfully. The overall test suite has been run, and all tests pass. Notes: These tests cover both successful parsing scenarios and error handling for invalid inputs. If any new output parsers are added in the future, corresponding unit tests should also be created to maintain coverage.	2023-05-01 14:53:08 -07:00
Davis Chase	2451310975	Chroma fix mmr (#3897 ) Fixes #3628, thanks @derekmoeller for the issue!	2023-05-01 10:47:15 -07:00
Davis Chase	52e4fba897	Fix self query pinecone translation (#3892 ) Enum to string conversion handled differently between python 3.9 and 3.11, currently breaking in 3.11 (see #3788). Thanks @peter-brady for catching this!	2023-05-01 10:35:48 -07:00
Zander Chase	19912d755e	Vwp/arxiv (#3855 ) Co-authored-by: Mike Wang <62768671+skcoirz@users.noreply.github.com>	2023-04-30 18:59:22 -07:00
Zander Chase	d7e17fc8fe	Deprecate StdInquireTool (#3850 ) - Deprecate StdInInquire tool (dup of HumanInputRun) - Expose missing tools from `langchain.tools`	2023-04-30 16:55:50 -07:00
Ankush Gola	d3ec00b566	Callbacks Refactor [base] (#3256 ) Co-authored-by: Nuno Campos <nuno@boringbits.io> Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com> Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-30 11:14:09 -07:00
Harrison Chase	bd7e0a534c	Harrison/csv loader (#3771 ) Co-authored-by: mrT23 <tal.r@codium.ai>	2023-04-28 21:54:24 -07:00
Mike Wang	ce4fea983b	[simple] added test case and improve self class return type annotation (#3773 ) a simple follow up of https://github.com/hwchase17/langchain/pull/3748 - added test case - improve annotation when function return type is class itself.	2023-04-28 21:54:07 -07:00
Harrison Chase	0c0f14407c	Harrison/tair (#3770 ) Co-authored-by: Seth Huang <848849+seth-hg@users.noreply.github.com>	2023-04-28 21:25:33 -07:00
Mike Wang	512c24fc9c	[annotation improvement] Make AgentType->Class Conversion More Scalable (#3749 ) In the current solution, AgentType and AGENT_TO_CLASS are placed in two separate files and both manually maintained. This might cause inconsistency when we update either of them. — latest — based on the discussion with hwchase17, we don’t know how to further use the newly introduced AgentTypeConfig type, so it doesn’t make sense yet to add it. Instead, it’s better to move the dictionary to another file to keep the loading.py file clear. The consistency is a good point. Instead of asserting the consistency during linting, we added a unittest for consistency check. I think it works as auto unittest is triggered every time with clear failure notice. (well, force push is possible, but we all know what we are doing, so let’s show trust. :>) ~~This PR includes~~ - ~~Introduced AgentTypeConfig as the source of truth of all AgentType related meta data.~~ - ~~Each AgentTypeConfig is an annotated class type which can be used for annotation in other places.~~ - ~~Each AgentTypeConfig can be easily extended when we have more meta data needs.~~ - ~~Strong assertion to ensure AgentType and AGENT_TO_CLASS are always consistent.~~ - ~~Made AGENT_TO_CLASS automatically generated.~~ ~~Test Plan:~~ - ~~since this change is focusing on annotation, lint is the major test focus.~~ - ~~lint, format and test passed on local.~~	2023-04-28 21:17:28 -07:00
Harrison Chase	be7a8e0824	Harrison/redis cache (#3766 ) Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>	2023-04-28 20:47:18 -07:00
Mike Wang	b588446bf9	[simple][test] Added test case for schema.py (#3692 ) - added unittest for schema.py covering utility functions and token counting. - fixed a nit. based on huggingface doc, the tokenizer model is gpt-2. [link](https://huggingface.co/transformers/v4.8.2/_modules/transformers/models/gpt2/tokenization_gpt2_fast.html) - make lint && make format, passed on local - screenshot of new test running result <img width="1283" alt="Screenshot 2023-04-27 at 9 51 55 PM" src="https://user-images.githubusercontent.com/62768671/235057441-c0ac3406-9541-453f-ba14-3ebb08656114.png">	2023-04-28 20:42:24 -07:00
Jon Saginaw	f8d69e4e52	Enhancement: Blockchain Document Loader with better Metadata support (#3710 ) This PR includes some minor alignment updates, including: - metadata object extended to support contractAddress, blockchainType, and tokenId - notebook doc better aligned to standard langchain format - startToken changed from int to str to support multiple hex value types on the Alchemy API The updated metadata will look like the below. It's possible for a single contractAddress to exist across multiple blockchains (e.g. Ethereum, Polygon, etc.) so it's important to include the blockchainType. ``` metadata = {"source": self.contract_address, "blockchain": self.blockchainType, "tokenId": tokenId} ```	2023-04-28 20:13:05 -07:00
Davis Chase	220a7076ac	Add Mathpix pdf loader (#3727 ) Inspo https://twitter.com/danielgross/status/1651695062307274754?s=46&t=1zHLap5WG4I_kQPPjfW9fA Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-28 20:11:22 -07:00
Harrison Chase	40f6e60e68	Harrison/stripe (#3762 ) Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>	2023-04-28 20:03:21 -07:00
Rafal Wojdyla	160bfae93f	Add `DocstoreFn` - lookup doc via arbitrary function (#3760 ) This partially addresses https://github.com/hwchase17/langchain/issues/1524, but it's also useful for some of our use cases. This `DocstoreFn` allows looking up a document given a function that accepts the `search` string, without the need to implement a custom `Docstore`. This could be useful when: * you don't want to implement a `Docstore` just to provide a custom `search` * it's expensive to construct an `InMemoryDocstore`/dict * you retrieve documents from remote sources * you just want to reuse existing objects	2023-04-28 19:50:32 -07:00
Zander Chase	5042bd40d3	Add Shell Tool (#3335 ) Create an official bash shell tool to replace the dynamically generated one	2023-04-28 11:10:43 -07:00
Zander Chase	334c162f16	Add Other File Utilities (#3209 ) Add other File Utilities, include - List Directory - Search for file - Move - Copy - Remove file Bundle as toolkit Add a notebook that connects to the Chat Agent, which somewhat supports multi-arg input tools Update original read/write files to return the original dir paths and better handle unsupported file paths. Add unit tests	2023-04-28 10:53:37 -07:00
Zander Chase	da7b51455c	Dynamic tool -> single purpose (#3697 ) I think the logic of https://github.com/hwchase17/langchain/pull/3684#pullrequestreview-1405358565 is too confusing. I prefer this alternative because: - All `Tool()` implementations by default will be treated the same as before. No breaking changes. - Less reliance on pydantic magic - The decorator (which is only typed as returning a callable) can infer schema and generate a structured tool - Either way, the recommended way to create a custom tool is through inheriting from the base tool	2023-04-28 09:38:41 -07:00
Zander Chase	4654c58f72	Add validation on agent instantiation for multi-input tools (#3681 ) Tradeoffs here: - No lint-time checking for compatibility - Differs from JS package - The signature inference, etc. in the base tool isn't simple - The `args_schema` is optional Pros: - Forwards compatibility retained - Doesn't break backwards compatibility - User doesn't have to think about which class to subclass (single base tool or dynamic `Tool` interface regardless of input) - No need to change the load_tools, etc. interfaces Co-authored-by: Hasan Patel <mangafield@gmail.com>	2023-04-27 15:36:11 -07:00
Davis Chase	b807a114e4	Add query parsing unit tests (#3672 )	2023-04-27 13:42:12 -07:00
Eugene Yurtsev	708787dddb	Blob: Add validator and use future annotations (#3650 ) Minor changes to the Blob schema. --------- Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>	2023-04-27 14:33:59 -04:00
Eugene Yurtsev	c5a4b4fea1	Suppress duckdb warning in unit tests explicitly (#3653 ) This catches the warning raised when using duckdb, asserts that it's as expected. The goal is to resolve all existing warnings to make unit-testing much stricter.	2023-04-27 14:29:41 -04:00
Eugene Yurtsev	e6c8cce050	Add unit-test to catch changes to required deps (#3662 ) This adds a unit test that can catch changes to required dependencies	2023-04-27 13:04:17 -04:00
Eugene Yurtsev	055f58960a	Fix pytest collection warning (#3651 ) Fixes a pytest collection warning because the test class starts with the prefix "Test"	2023-04-27 09:51:43 -07:00
plutopulp	6d6fd1b9e1	Add PipelineAI LLM integration (#3644 ) Add PipelineAI LLM integration	2023-04-27 08:22:26 -07:00
Harrison Chase	a35bbbfa9e	Harrison/lancedb (#3634 ) Co-authored-by: Minh Le <minhle@canva.com>	2023-04-27 08:14:36 -07:00
Eugene Yurtsev	5d02010763	Introduce Blob and Blob Loader interface (#3603 ) This PR introduces a Blob data type and a Blob loader interface. This is the first of a sequence of PRs that follows this proposal: https://github.com/hwchase17/langchain/pull/2833 The primary goals of these abstractions are: * Decouple content loading from content parsing code. * Help remove duplicated content loading code from document loaders. * Make lazy loading a default for langchain.	2023-04-27 09:45:25 -04:00
Harrison Chase	ab749fa1bb	Harrison/opensearch logic (#3631 ) Co-authored-by: engineer-matsuo <95115586+engineer-matsuo@users.noreply.github.com>	2023-04-26 22:08:03 -07:00
Ehsan M. Kermani	4a246e2fd6	Allow clearing cache and fix gptcache (#3493 ) This PR * Adds `clear` method for `BaseCache` and implements it for various caches * Adds the default `init_func=None` and fixes gptcache integtest * Since right now integtest is not running in CI, I've verified the changes by running `docs/modules/models/llms/examples/llm_caching.ipynb` (until proper e2e integtest is done in CI)	2023-04-26 22:03:50 -07:00
cs0lar	440c98e24b	Fix/issue 2695 (#3608 ) ## Background fixes #2695 ## Changes The `add_texts` method uses the internal embedding function if one was passed to the `Weaviate` constructor. NOTE: the latest merge on the `Weaviate` class made the specification of a `weaviate_api_key` mandatory, which might not be desirable for all users and connection methods (for example, Weaviate also supports Embedded Weaviate, which I am happy to add support for here if people think it's desirable). I wrapped the fetching of the API key in a try/catch so that `weaviate_api_key` can be left unspecified. Do let me know if this is unsatisfactory. ## Test Plan added test for `add_texts` method.	2023-04-26 21:45:03 -07:00
leo-gan	36c59e0c25	`Arxiv` document loader (#3627 ) It makes sense to use `arxiv` as another source of the documents for downloading. - Added the `arxiv` document_loader, based on the `utilities/arxiv.py:ArxivAPIWrapper` - added tests - added an example notebook - sorted `__all__` in `__init__.py` (otherwise it is hard to find a class in the very long list)	2023-04-26 21:04:56 -07:00
Zander Chase	443a893ffd	Align names of search tools (#3620 ) Tools for Bing, DDG and Google weren't consistent even though the underlying implementations were. All three services now have the same tools and implementations to easily switch and experiment when building chains.	2023-04-26 16:21:34 -07:00
Maciej Bryński	aa345a4bb7	Add get_text_separator parameter to BSHTMLLoader (#3551 ) By default, get_text doesn't separate the content of different HTML tags. Adding an option to specify a separator helps with document splitting.	2023-04-26 16:10:16 -07:00
Zander Chase	ee670c448e	Persistent Bash Shell (#3580 ) Clean up linting and make more idiomatic by using an output parser --------- Co-authored-by: FergusFettes <fergusfettes@gmail.com>	2023-04-26 15:20:28 -07:00
Davis Chase	d18b0caf0e	Add Anthropic default request timeout (#3540 ) thanks @hitflame! --------- Co-authored-by: Wenqiang Zhao <hitzhaowenqiang@sina.com> Co-authored-by: delta@com <delta@com>	2023-04-25 11:40:41 -07:00
Roma	2b4e9a3efa	Add unit test for _merge_splits function (#3513 ) This commit adds a new unit test for the _merge_splits function in the text splitter. The new test verifies that the function merges text into chunks of the correct size and overlap, using a specified separator. The test passes on the current implementation of the function.	2023-04-25 10:02:59 -07:00
yakigac	f338d6251c	Add a test for cosmos db memory (#3525 ) Test for #3434 @eavanvalkenburg Initially, I was unaware and had submitted a pull request #3450 for the same purpose, but I have now repurposed the one I used for that. And it worked.	2023-04-25 08:10:02 -07:00
Harrison Chase	0fc0aa62f2	Harrison/blockchain docloader (#3491 ) Co-authored-by: Jon Saginaw <saginawj@users.noreply.github.com>	2023-04-25 08:07:06 -07:00
Harrison Chase	707741de58	Harrison/prediction guard (#3490 ) Co-authored-by: Daniel Whitenack <whitenack.daniel@gmail.com>	2023-04-24 22:27:22 -07:00
Harrison Chase	7257f9e015	Harrison/tfidf parameters (#3481 ) Co-authored-by: pao <go5kuramubon@gmail.com> Co-authored-by: KyoHattori <kyo.hattori@abejainc.com>	2023-04-24 22:19:58 -07:00
Harrison Chase	eda69b13f3	openai embeddings (#3488 )	2023-04-24 22:19:47 -07:00
Harrison Chase	408a0183cd	Harrison/weaviate (#3494 ) Co-authored-by: Nick Rubell <nick@rubell.com>	2023-04-24 22:15:32 -07:00
Mindaugas Sharskus	a4d85f7fd5	[Fix #3365 ]: Changed regex to cover new line before action serious (#3367 ) Fix for: [Changed regex to cover new line before action serious.](https://github.com/hwchase17/langchain/issues/3365) --- This PR fixes the issue where `ValueError: Could not parse LLM output:` was thrown on seemingly valid input. Changed regex to cover new lines before action serious (after the keywords "Action:" and "Action Input:"). regex101: https://regex101.com/r/CXl1kB/1 --------- Co-authored-by: msarskus <msarskus@cisco.com>	2023-04-24 22:05:31 -07:00
Davis Chase	b2564a6391	fix #3884 (#3475 ) fixes mar bug #3384	2023-04-24 19:54:15 -07:00
Zander Chase	416f3bdf11	Vwp/alpaca streaming (#3468 ) Co-authored-by: Luke Stanley <306671+lukestanley@users.noreply.github.com>	2023-04-24 16:27:51 -07:00
cs0lar	3033c6b964	fixes #1214 (#3003 ) ### Background Continuing to implement all the interface methods defined by the `VectorStore` class. This PR pertains to implementation of the `max_marginal_relevance_search_by_vector` method. ### Changes - a `max_marginal_relevance_search_by_vector` method implementation has been added in `weaviate.py` - tests have been added for the new method - vcr cassettes have been added for the weaviate tests ### Test Plan Added tests for the `max_marginal_relevance_search_by_vector` implementation ### Change Safety - [x] I have added tests to cover my changes	2023-04-24 11:50:55 -07:00
Zander Chase	49122a96e7	Structured Tool Bugfixes (#3324 ) - Proactively raise error if a tool subclasses BaseTool, defines its own schema, but fails to add the type-hints - fix the auto-inferred schema of the decorator to strip the unneeded virtual kwargs from the schema dict Helps avoid silent instances of #3297	2023-04-24 09:58:29 -07:00
Davit Buniatyan	2c0023393b	Deep Lake mini upgrades (#3375 ) Improvements * set default num_workers for ingestion to 0 * upgraded notebooks for avoiding dataset creation ambiguity * added `force_delete_dataset_by_path` * bumped deeplake to 3.3.0 * creds arg passing to deeplake object that would allow custom S3 Notes * please double check if poetry is not messed up (thanks!) Asks * Would be great to create a shared slack channel for quick questions --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-23 21:23:54 -07:00
Zander Chase	20f530e9c5	Add Sentence Transformers Embeddings (#3409 ) Add embeddings based on the sentence transformers library. Add a notebook and integration tests. Co-authored-by: khimaros <me@khimaros.com>	2023-04-23 18:25:20 -07:00
Luke Harris	b4de839ed8	Several confluence loader improvements (#3300 ) This PR addresses several improvements: - Previously it was not possible to load spaces of more than 100 pages. The `limit` was being used both as an overall page limit and as a per request pagination limit. This, in combination with the fact that atlassian seem to use a server-side hard limit of 100 when page content is expanded, meant it wasn't possible to download >100 pages. Now `limit` is used only as a per-request pagination limit and `max_pages` is introduced as the way to limit the total number of pages returned by the paginator. - Document metadata now includes `source` (the source url), making it compatible with `RetrievalQAWithSourcesChain`. - It is now possible to include inline and footer comments. - It is now possible to pass `verify_ssl=False` and other parameters to the confluence object for use cases that require it.	2023-04-23 15:06:10 -07:00
Harrison Chase	a6664be79c	Harrison/myscale (#3352 ) Co-authored-by: Fangrui Liu <fangruil@moqi.ai> Co-authored-by: 刘方瑞 <fangrui.liu@outlook.com> Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>	2023-04-22 09:17:38 -07:00
Filip Haltmayer	215dcc2d26	Refactor Milvus/Zilliz (#3047 ) Refactoring milvus/zilliz to clean up and have a more consistent experience. Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>	2023-04-22 08:26:19 -07:00
Richy Wang	88a8f59aa7	Add a full PostgreSQL syntax database 'AnalyticDB' as vector store. (#3135 ) Hi there！ I'm excited to open this PR to add support for using a fully Postgres syntax compatible database 'AnalyticDB' as a vector store. As AnalyticDB has been proven to work with AutoGPT, ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is a good fit here as well. AnalyticDB is a distributed Alibaba Cloud-Native vector database. It works especially well at large data scales. The PR includes: - [x] A new memory: AnalyticDBVector - [x] A suite of integration tests that verifies the AnalyticDB integration I have read your [contributing guidelines](`72b7d76d79/.github/CONTRIBUTING.md`). And I have passed the tests below - [x] make format - [x] make lint - [x] make coverage - [x] make test	2023-04-22 08:25:41 -07:00
Zander Chase	05a8aa5447	Fix linting on master (#3327 )	2023-04-21 15:49:46 -07:00
Varun Srinivas	d2f922f525	Change in method name for creating an issue on JIRA (#3307 ) The awesome JIRA tool created by @zywilliamli calls the `create_issue()` method to create issues, however, the actual method is `issue_create()`. Details in the Documentation here: https://atlassian-python-api.readthedocs.io/jira.html#manage-issues	2023-04-21 13:01:33 -07:00
Paul Garner	aa9d5707e0	Add PythonLoader which auto-detects encoding of Python files (#3311 ) This PR contributes a `PythonLoader`, which inherits from `TextLoader` but detects and sets the encoding automatically.	2023-04-21 10:47:57 -07:00
Davis Chase	2fd24d31a4	Cleanup integration test dir (#3308 )	2023-04-21 09:44:09 -07:00
Naveen Tatikonda	bb6c459f7a	OpenSearch: Add Support for Lucene Filter (#3201 ) ### Description Add Support for Lucene Filter. When you specify a Lucene filter for a k-NN search, the Lucene algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering. This filter is supported only for approximate search with the indexes that are created using `lucene` engine. OpenSearch Documentation - https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#lucene-k-nn-filter-implementation Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-20 20:42:53 -07:00
Davis Chase	46542dc774	Contextual compression retriever (#2915 ) Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-20 17:01:14 -07:00
Gabriel Altay	34fb56b633	fix copy/pasta typos wikipedia->arxiv (#3222 ) just updates a few module level docstrings from Wikipedia -> Arxiv	2023-04-20 07:15:41 -07:00
Harrison Chase	d2520a5f1e	Harrison/ddg (#3206 ) Co-authored-by: itai <itai.marks@gmail.com> Co-authored-by: Itai Marks <itaim@users.noreply.github.com> Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com> Co-authored-by: Adilzhan Ismailov <13088690+aismlv@users.noreply.github.com> Co-authored-by: Justin Flick <Justinjayflick@gmail.com> Co-authored-by: Justin Flick <jflick@homesite.com>	2023-04-19 21:32:26 -07:00
Harrison Chase	9a0356d276	Harrison/file chat history (#3198 ) Co-authored-by: Young Lee <joybro201@gmail.com>	2023-04-19 21:05:20 -07:00
Harrison Chase	9181cd9b22	Harrison/playwright selector (#3185 ) Co-authored-by: zhyuri <4649294+zhyuri@users.noreply.github.com>	2023-04-19 16:54:15 -07:00
Harrison Chase	68cd37175e	Harrison/arxiv tool (#3186 ) Co-authored-by: leo-gan <leo.gan.57@gmail.com>	2023-04-19 16:53:34 -07:00
Zander Chase	4adfd790f0	Update File Management Tools to Include Root Directory (#3112 ) - Permit the specification of a `root_dir` to the read/write file tools to specify a working directory - Add validation for attempts to read/write outside the directory (e.g., through `../../` or symlinks or `/abs/path`'s that don't lie in the correct path) - Add some tests for all One question is whether we should make a default root directory for these? tradeoffs either way	2023-04-19 16:46:10 -07:00
engkheng	dbbc340f25	Validate `input_variables` when using `jinja2` templates (#3140 ) `langchain.prompts.PromptTemplate` and `langchain.prompts.FewShotPromptTemplate` do not validate `input_variables` when initialized as `jinja2` template. ```python # Using langchain v0.0.144 template = """\ Your variable: {{ foo }} {% if bar %} You just set bar boolean variable to true {% endif %} """ # Missing variable, should raise ValueError prompt_template = PromptTemplate(template=template, input_variables=["bar"], template_format="jinja2", validate_template=True) # Extra variable, should raise ValueError prompt_template = PromptTemplate(template=template, input_variables=["bar", "foo", "extra", "thing"], template_format="jinja2", validate_template=True) ```	2023-04-19 16:18:32 -07:00
Zander Chase	90ef705ced	Update Tool Input (#3103 ) - Remove dynamic model creation in the `args()` property. _Only infer for the decorator (and add an argument to NOT infer if someone wishes to only pass as a string)_ - Update the validation example to make it less likely to be misinterpreted as a "safe" way to run a repl There is one example of "Multi-argument tools" in the custom_tools.ipynb from yesterday, but we could add more. The output parsing for the base MRKL agent hasn't been adapted to handle structured args at this point in time --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-18 18:18:33 -07:00
Harrison Chase	aad0a498ac	Harrison/output error (#3094 ) Co-authored-by: yummydum <sumita@nowcast.co.jp>	2023-04-18 08:59:56 -07:00
Harrison Chase	db968284f8	tools refactor (#2961 ) Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-17 21:35:29 -07:00
engkheng	19febc77d6	Support inference of `input_variables` from `jinja2` template (#3013 ) `langchain.prompts.PromptTemplate` is unable to infer `input_variables` from jinja2 template. ```python # Using langchain v0.0.141 template_string = """\ Hello world Your variable: {{ var }} {# This will not get rendered #} {% if verbose %} Congrats! You just turned on verbose mode and got extra messages! {% endif %} """ template = PromptTemplate.from_template(template_string, template_format="jinja2") print(template.input_variables) # Output ['# This will not get rendered #', '% endif %', '% if verbose %'] ``` --------- Co-authored-by: engkheng <ongengkheng929@example.com>	2023-04-17 20:31:03 -07:00
Nuno Campos	dac32c59e5	Nc/combining output parser (#3014 ) Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-17 20:29:53 -07:00
Davis Chase	19c85aa990	Factor out doc formatting and add validation (#3026 ) @cnhhoang850 slightly more generic fix for #2944, works for whatever the expected metadata keys are not just `source`	2023-04-17 20:28:01 -07:00
Naveen Tatikonda	3453b7457c	OpenSearch: Add Support for Boolean Filter with ANN search (#3038 ) ### Description Add Support for Boolean Filter with ANN search Documentation - https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#boolean-filter-with-ann-search ### Issues Resolved https://github.com/hwchase17/langchain/issues/2924 Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-17 20:26:26 -07:00
Harrison Chase	afd3e70ae5	Harrison/confluent loader (#2994 ) Co-authored-by: Justin Flick <Justinjayflick@gmail.com>	2023-04-17 20:23:45 -07:00
vowelparrot	99c0382209	Generative Characters (#2859 ) Add a time-weighted memory retriever and a notebook that approximates a Generative Agent from https://arxiv.org/pdf/2304.03442.pdf The "daily plan" components are removed for now since they are less useful without a virtual world, but the memory is an interesting component to build off. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-16 21:41:00 -07:00
Jan Backes	a9310a3e8b	Add Annoy as VectorStore (#2939 ) Adds Annoy (https://github.com/spotify/annoy) as vector Store. RESOLVES hwchase17/langchain#2842 discord ref: https://discord.com/channels/1038097195422978059/1051632794427723827/1096089994168377354 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-16 13:44:04 -07:00
Harrison Chase	e12e00df12	use output parsers in agents (#2987 )	2023-04-16 13:15:21 -07:00
cs0lar	8b9e02da9d	Fix/issue 1213 (#2932 ) ### Background Continuing to implement all the interface methods defined by the `VectorStore` class. This PR pertains to implementation of the `max_marginal_relevance_search` method. ### Changes - a `max_marginal_relevance_search` method implementation has been added in `weaviate.py` - tests have been added for the new method - vcr cassettes have been added for the weaviate tests ### Test Plan Added tests for the `max_marginal_relevance_search` implementation ### Change Safety - [x] I have added tests to cover my changes	2023-04-16 13:11:30 -07:00
vowelparrot	5ca7ce77cd	Remove pythonrepl from LLM-MathChain (#2943 ) Use numexpr evaluate instead of the python REPL to avoid malicious code injection. Tested against the (limited) math dataset and got the same score as before. For more permissive tools (like the REPL tool itself), other approaches ought to be provided (some combination of Sanitizer + Restricted python + unprivileged-docker + ...), but for a calculator tool, only mathematical expressions should be permitted. See https://github.com/hwchase17/langchain/issues/814	2023-04-16 08:50:32 -07:00

413 Commits