langchain

Commit Graph

Author	SHA1	Message	Date
Harrison Chase	bd7e0a534c	Harrison/csv loader (#3771 ) Co-authored-by: mrT23 <tal.r@codium.ai>	1 year ago
Mike Wang	ce4fea983b	[simple] added test case and improve self class return type annotation (#3773 ) a simple follow up of https://github.com/hwchase17/langchain/pull/3748 - added test case - improve annotation when function return type is class itself.	1 year ago
Harrison Chase	0c0f14407c	Harrison/tair (#3770 ) Co-authored-by: Seth Huang <848849+seth-hg@users.noreply.github.com>	1 year ago
Mike Wang	512c24fc9c	[annotation improvement] Make AgentType->Class Conversion More Scalable (#3749 ) In the current solution, AgentType and AGENT_TO_CLASS are placed in two separate files and both manually maintained. This might cause inconsistency when we update either of them. — latest — based on the discussion with hwchase17, we don’t know how to further use the newly introduced AgentTypeConfig type, so it doesn’t make sense yet to add it. Instead, it’s better to move the dictionary to another file to keep the loading.py file clear. The consistency is a good point. Instead of asserting the consistency during linting, we added a unittest for consistency check. I think it works as auto unittest is triggered every time with clear failure notice. (well, force push is possible, but we all know what we are doing, so let’s show trust. :>) ~~This PR includes~~ - ~~Introduced AgentTypeConfig as the source of truth of all AgentType related meta data.~~ - ~~Each AgentTypeConfig is a annotated class type which can be used for annotation in other places.~~ - ~~Each AgentTypeConfig can be easily extended when we have more meta data needs.~~ - ~~Strong assertion to ensure AgentType and AGENT_TO_CLASS are always consistent.~~ - ~~Made AGENT_TO_CLASS automatically generated.~~ ~~Test Plan:~~ - ~~since this change is focusing on annotation, lint is the major test focus.~~ - ~~lint, format and test passed on local.~~	1 year ago
Harrison Chase	be7a8e0824	Harrison/redis cache (#3766 ) Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>	1 year ago
Mike Wang	b588446bf9	[simple][test] Added test case for schema.py (#3692 ) - added unittest for schema.py covering utility functions and token counting. - fixed a nit. based on huggingface doc, the tokenizer model is gpt-2. [link](https://huggingface.co/transformers/v4.8.2/_modules/transformers/models/gpt2/tokenization_gpt2_fast.html) - make lint && make format, passed on local - screenshot of new test running result <img width="1283" alt="Screenshot 2023-04-27 at 9 51 55 PM" src="https://user-images.githubusercontent.com/62768671/235057441-c0ac3406-9541-453f-ba14-3ebb08656114.png">	1 year ago
Jon Saginaw	f8d69e4e52	Enhancement: Blockchain Document Loader with better Metadata support (#3710 ) This PR includes some minor alignment updates, including: - metadata object extended to support contractAddress, blockchainType, and tokenId - notebook doc better aligned to standard langchain format - startToken changed from int to str to support multiple hex value types on the Alchemy API The updated metadata will look like the below. It's possible for a single contractAddress to exist across multiple blockchains (e.g. Ethereum, Polygon, etc.) so it's important to include the blockchainType. ``` metadata = {"source": self.contract_address, "blockchain": self.blockchainType, "tokenId": tokenId} ```	1 year ago
Davis Chase	220a7076ac	Add Mathpix pdf loader (#3727 ) Inspo https://twitter.com/danielgross/status/1651695062307274754?s=46&t=1zHLap5WG4I_kQPPjfW9fA Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	1 year ago
Harrison Chase	40f6e60e68	Harrison/stripe (#3762 ) Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>	1 year ago
Rafal Wojdyla	160bfae93f	Add `DocstoreFn` - lookup doc via arbitrary function (#3760 ) This partially addresses https://github.com/hwchase17/langchain/issues/1524, but it's also useful for some of our use cases. This `DocstoreFn` allows to lookup a document given a function that accepts the `search` string without the need to implement a custom `Docstore`. This could be useful when: * you don't want to implement a `Docstore` just to provide a custom `search` * it's expensive to construct an `InMemoryDocstore`/dict * you retrieve documents from remote sources * you just want to reuse existing objects	1 year ago
Zander Chase	5042bd40d3	Add Shell Tool (#3335 ) Create an official bash shell tool to replace the dynamically generated one	1 year ago
Zander Chase	334c162f16	Add Other File Utilities (#3209 ) Add other File Utilities, include - List Directory - Search for file - Move - Copy - Remove file Bundle as toolkit Add a notebook that connects to the Chat Agent, which somewhat supports multi-arg input tools Update original read/write files to return the original dir paths and better handle unsupported file paths. Add unit tests	1 year ago
Zander Chase	da7b51455c	Dynamic tool -> single purpose (#3697 ) I think the logic of https://github.com/hwchase17/langchain/pull/3684#pullrequestreview-1405358565 is too confusing. I prefer this alternative because: - All `Tool()` implementations by default will be treated the same as before. No breaking changes. - Less reliance on pydantic magic - The decorator (which only is typed as returning a callable) can infer schema and generate a structured tool - Either way, the recommended way to create a custom tool is through inheriting from the base tool	1 year ago
Zander Chase	4654c58f72	Add validation on agent instantiation for multi-input tools (#3681 ) Tradeoffs here: - No lint-time checking for compatibility - Differs from JS package - The signature inference, etc. in the base tool isn't simple - The `args_schema` is optional Pros: - Forwards compatibility retained - Doesn't break backwards compatibility - User doesn't have to think about which class to subclass (single base tool or dynamic `Tool` interface regardless of input) - No need to change the load_tools, etc. interfaces Co-authored-by: Hasan Patel <mangafield@gmail.com>	1 year ago
Davis Chase	b807a114e4	Add query parsing unit tests (#3672 )	1 year ago
Eugene Yurtsev	708787dddb	Blob: Add validator and use future annotations (#3650 ) Minor changes to the Blob schema. --------- Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>	1 year ago
Eugene Yurtsev	c5a4b4fea1	Suppress duckdb warning in unit tests explicitly (#3653 ) This catches the warning raised when using duckdb, asserts that it's as expected. The goal is to resolve all existing warnings to make unit-testing much stricter.	1 year ago
Eugene Yurtsev	e6c8cce050	Add unit-test to catch changes to required deps (#3662 ) This adds a unit test that can catch changes to required dependencies	1 year ago
Eugene Yurtsev	055f58960a	Fix pytest collection warning (#3651 ) Fixes a pytest collection warning because the test class starts with the prefix "Test"	1 year ago
plutopulp	6d6fd1b9e1	Add PipelineAI LLM integration (#3644 ) Add PipelineAI LLM integration	1 year ago
Harrison Chase	a35bbbfa9e	Harrison/lancedb (#3634 ) Co-authored-by: Minh Le <minhle@canva.com>	1 year ago
Eugene Yurtsev	5d02010763	Introduce Blob and Blob Loader interface (#3603 ) This PR introduces a Blob data type and a Blob loader interface. This is the first of a sequence of PRs that follows this proposal: https://github.com/hwchase17/langchain/pull/2833 The primary goals of these abstraction are: * Decouple content loading from content parsing code. * Help duplicated content loading code from document loaders. * Make lazy loading a default for langchain.	1 year ago
Harrison Chase	ab749fa1bb	Harrison/opensearch logic (#3631 ) Co-authored-by: engineer-matsuo <95115586+engineer-matsuo@users.noreply.github.com>	1 year ago
Ehsan M. Kermani	4a246e2fd6	Allow clearing cache and fix gptcache (#3493 ) This PR * Adds `clear` method for `BaseCache` and implements it for various caches * Adds the default `init_func=None` and fixes gptcache integtest * Since right now integtest is not running in CI, I've verified the changes by running `docs/modules/models/llms/examples/llm_caching.ipynb` (until proper e2e integtest is done in CI)	1 year ago
cs0lar	440c98e24b	Fix/issue 2695 (#3608 ) ## Background fixes #2695 ## Changes The `add_text` method uses the internal embedding function if one was passes to the `Weaviate` constructor. NOTE: the latest merge on the `Weaviate` class made the specification of a `weaviate_api_key` mandatory which might not be desirable for all users and connection methods (for example weaviate also support Embedded Weaviate which I am happy to add support to here if people think it's desirable). I wrapped the fetching of the api key into a try catch in order to allow the `weaviate_api_key` to be unspecified. Do let me know if this is unsatisfactory. ## Test Plan added test for `add_texts` method.	1 year ago
leo-gan	36c59e0c25	`Arxiv` document loader (#3627 ) It makes sense to use `arxiv` as another source of the documents for downloading. - Added the `arxiv` document_loader, based on the `utilities/arxiv.py:ArxivAPIWrapper` - added tests - added an example notebook - sorted `__all__` in `__init__.py` (otherwise it is hard to find a class in the very long list)	1 year ago
Zander Chase	443a893ffd	Align names of search tools (#3620 ) Tools for Bing, DDG and Google weren't consistent even though the underlying implementations were. All three services now have the same tools and implementations to easily switch and experiment when building chains.	1 year ago
Maciej Bryński	aa345a4bb7	Add get_text_separator parameter to BSHTMLLoader (#3551 ) By default get_text doesn't separate content of different HTML tag. Adding option for specifying separator helps with document splitting.	1 year ago
Zander Chase	ee670c448e	Persistent Bash Shell (#3580 ) Clean up linting and make more idiomatic by using an output parser --------- Co-authored-by: FergusFettes <fergusfettes@gmail.com>	1 year ago
Davis Chase	d18b0caf0e	Add Anthropic default request timeout (#3540 ) thanks @hitflame! --------- Co-authored-by: Wenqiang Zhao <hitzhaowenqiang@sina.com> Co-authored-by: delta@com <delta@com>	1 year ago
Roma	2b4e9a3efa	Add unit test for _merge_splits function (#3513 ) This commit adds a new unit test for the _merge_splits function in the text splitter. The new test verifies that the function merges text into chunks of the correct size and overlap, using a specified separator. The test passes on the current implementation of the function.	1 year ago
yakigac	f338d6251c	Add a test for cosmos db memory (#3525 ) Test for #3434 @eavanvalkenburg Initially, I was unaware and had submitted a pull request #3450 for the same purpose, but I have now repurposed the one I used for that. And it worked.	1 year ago
Harrison Chase	0fc0aa62f2	Harrison/blockchain docloader (#3491 ) Co-authored-by: Jon Saginaw <saginawj@users.noreply.github.com>	1 year ago
Harrison Chase	707741de58	Harrison/prediction guard (#3490 ) Co-authored-by: Daniel Whitenack <whitenack.daniel@gmail.com>	1 year ago
Harrison Chase	7257f9e015	Harrison/tfidf parameters (#3481 ) Co-authored-by: pao <go5kuramubon@gmail.com> Co-authored-by: KyoHattori <kyo.hattori@abejainc.com>	1 year ago
Harrison Chase	eda69b13f3	openai embeddings (#3488 )	1 year ago
Harrison Chase	408a0183cd	Harrison/weaviate (#3494 ) Co-authored-by: Nick Rubell <nick@rubell.com>	1 year ago
Mindaugas Sharskus	a4d85f7fd5	[Fix #3365 ]: Changed regex to cover new line before action serious (#3367 ) Fix for: [Changed regex to cover new line before action serious.](https://github.com/hwchase17/langchain/issues/3365) --- This PR fixes the issue where `ValueError: Could not parse LLM output:` was thrown on seems to be valid input. Changed regex to cover new lines before action serious (after the keywords "Action:" and "Action Input:"). regex101: https://regex101.com/r/CXl1kB/1 --------- Co-authored-by: msarskus <msarskus@cisco.com>	1 year ago
Davis Chase	b2564a6391	fix #3884 (#3475 ) fixes mar bug #3384	1 year ago
Zander Chase	416f3bdf11	Vwp/alpaca streaming (#3468 ) Co-authored-by: Luke Stanley <306671+lukestanley@users.noreply.github.com>	1 year ago
cs0lar	3033c6b964	fixes #1214 (#3003 ) ### Background Continuing to implement all the interface methods defined by the `VectorStore` class. This PR pertains to implementation of the `max_marginal_relevance_search_by_vector` method. ### Changes - a `max_marginal_relevance_search_by_vector` method implementation has been added in `weaviate.py` - tests have been added to the the new method - vcr cassettes have been added for the weaviate tests ### Test Plan Added tests for the `max_marginal_relevance_search_by_vector` implementation ### Change Safety - [x] I have added tests to cover my changes	1 year ago
Zander Chase	49122a96e7	Structured Tool Bugfixes (#3324 ) - Proactively raise error if a tool subclasses BaseTool, defines its own schema, but fails to add the type-hints - fix the auto-inferred schema of the decorator to strip the unneeded virtual kwargs from the schema dict Helps avoid silent instances of #3297	1 year ago
Davit Buniatyan	2c0023393b	Deep Lake mini upgrades (#3375 ) Improvements * set default num_workers for ingestion to 0 * upgraded notebooks for avoiding dataset creation ambiguity * added `force_delete_dataset_by_path` * bumped deeplake to 3.3.0 * creds arg passing to deeplake object that would allow custom S3 Notes * please double check if poetry is not messed up (thanks!) Asks * Would be great to create a shared slack channel for quick questions --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	1 year ago
Zander Chase	20f530e9c5	Add Sentence Transformers Embeddings (#3409 ) Add embeddings based on the sentence transformers library. Add a notebook and integration tests. Co-authored-by: khimaros <me@khimaros.com>	1 year ago
Luke Harris	b4de839ed8	Several confluence loader improvements (#3300 ) This PR addresses several improvements: - Previously it was not possible to load spaces of more than 100 pages. The `limit` was being used both as an overall page limit and as a per request pagination limit. This, in combination with the fact that atlassian seem to use a server-side hard limit of 100 when page content is expanded, meant it wasn't possible to download >100 pages. Now `limit` is used only as a per-request pagination limit and `max_pages` is introduced as the way to limit the total number of pages returned by the paginator. - Document metadata now includes `source` (the source url), making it compatible with `RetrievalQAWithSourcesChain`. - It is now possible to include inline and footer comments. - It is now possible to pass `verify_ssl=False` and other parameters to the confluence object for use cases that require it.	1 year ago
Harrison Chase	a6664be79c	Harrison/myscale (#3352 ) Co-authored-by: Fangrui Liu <fangruil@moqi.ai> Co-authored-by: 刘方瑞 <fangrui.liu@outlook.com> Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>	1 year ago
Filip Haltmayer	215dcc2d26	Refactor Milvus/Zilliz (#3047 ) Refactoring milvus/zilliz to clean up and have a more consistent experience. Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>	1 year ago
Richy Wang	88a8f59aa7	Add a full PostgresSQL syntax database 'AnalyticDB' as vector store. (#3135 ) Hi there！ I'm excited to open this PR to add support for using a fully Postgres syntax compatible database 'AnalyticDB' as a vector. As AnalyticDB has been proved can be used with AutoGPT, ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also good for you. AnalyticDB is a distributed Alibaba Cloud-Native vector database. It works better when data comes to large scale. The PR includes: - [x] A new memory: AnalyticDBVector - [x] A suite of integration tests verifies the AnalyticDB integration I have read your [contributing guidelines](`72b7d76d79/.github/CONTRIBUTING.md`). And I have passed the tests below - [x] make format - [x] make lint - [x] make coverage - [x] make test	1 year ago
Zander Chase	05a8aa5447	Fix linting on master (#3327 )	1 year ago
Varun Srinivas	d2f922f525	Change in method name for creating an issue on JIRA (#3307 ) The awesome JIRA tool created by @zywilliamli calls the `create_issue()` method to create issues, however, the actual method is `issue_create()`. Details in the Documentation here: https://atlassian-python-api.readthedocs.io/jira.html#manage-issues	1 year ago

1 2 3 4 5 ...

337 Commits (20aad0bed193f4bd0368442383d98f177d41f4b6)