langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
cs0lar	440c98e24b	Fix/issue 2695 (#3608 ) ## Background fixes #2695 ## Changes The `add_text` method uses the internal embedding function if one was passes to the `Weaviate` constructor. NOTE: the latest merge on the `Weaviate` class made the specification of a `weaviate_api_key` mandatory which might not be desirable for all users and connection methods (for example weaviate also support Embedded Weaviate which I am happy to add support to here if people think it's desirable). I wrapped the fetching of the api key into a try catch in order to allow the `weaviate_api_key` to be unspecified. Do let me know if this is unsatisfactory. ## Test Plan added test for `add_texts` method.	2023-04-26 21:45:03 -07:00
leo-gan	36c59e0c25	`Arxiv` document loader (#3627 ) It makes sense to use `arxiv` as another source of the documents for downloading. - Added the `arxiv` document_loader, based on the `utilities/arxiv.py:ArxivAPIWrapper` - added tests - added an example notebook - sorted `__all__` in `__init__.py` (otherwise it is hard to find a class in the very long list)	2023-04-26 21:04:56 -07:00
Zander Chase	443a893ffd	Align names of search tools (#3620 ) Tools for Bing, DDG and Google weren't consistent even though the underlying implementations were. All three services now have the same tools and implementations to easily switch and experiment when building chains.	2023-04-26 16:21:34 -07:00
Maciej Bryński	aa345a4bb7	Add get_text_separator parameter to BSHTMLLoader (#3551 ) By default get_text doesn't separate content of different HTML tag. Adding option for specifying separator helps with document splitting.	2023-04-26 16:10:16 -07:00
Zander Chase	ee670c448e	Persistent Bash Shell (#3580 ) Clean up linting and make more idiomatic by using an output parser --------- Co-authored-by: FergusFettes <fergusfettes@gmail.com>	2023-04-26 15:20:28 -07:00
Davis Chase	d18b0caf0e	Add Anthropic default request timeout (#3540 ) thanks @hitflame! --------- Co-authored-by: Wenqiang Zhao <hitzhaowenqiang@sina.com> Co-authored-by: delta@com <delta@com>	2023-04-25 11:40:41 -07:00
Roma	2b4e9a3efa	Add unit test for _merge_splits function (#3513 ) This commit adds a new unit test for the _merge_splits function in the text splitter. The new test verifies that the function merges text into chunks of the correct size and overlap, using a specified separator. The test passes on the current implementation of the function.	2023-04-25 10:02:59 -07:00
yakigac	f338d6251c	Add a test for cosmos db memory (#3525 ) Test for #3434 @eavanvalkenburg Initially, I was unaware and had submitted a pull request #3450 for the same purpose, but I have now repurposed the one I used for that. And it worked.	2023-04-25 08:10:02 -07:00
Harrison Chase	0fc0aa62f2	Harrison/blockchain docloader (#3491 ) Co-authored-by: Jon Saginaw <saginawj@users.noreply.github.com>	2023-04-25 08:07:06 -07:00
Harrison Chase	707741de58	Harrison/prediction guard (#3490 ) Co-authored-by: Daniel Whitenack <whitenack.daniel@gmail.com>	2023-04-24 22:27:22 -07:00
Harrison Chase	7257f9e015	Harrison/tfidf parameters (#3481 ) Co-authored-by: pao <go5kuramubon@gmail.com> Co-authored-by: KyoHattori <kyo.hattori@abejainc.com>	2023-04-24 22:19:58 -07:00
Harrison Chase	eda69b13f3	openai embeddings (#3488 )	2023-04-24 22:19:47 -07:00
Harrison Chase	408a0183cd	Harrison/weaviate (#3494 ) Co-authored-by: Nick Rubell <nick@rubell.com>	2023-04-24 22:15:32 -07:00
Mindaugas Sharskus	a4d85f7fd5	[Fix #3365 ]: Changed regex to cover new line before action serious (#3367 ) Fix for: [Changed regex to cover new line before action serious.](https://github.com/hwchase17/langchain/issues/3365) --- This PR fixes the issue where `ValueError: Could not parse LLM output:` was thrown on seems to be valid input. Changed regex to cover new lines before action serious (after the keywords "Action:" and "Action Input:"). regex101: https://regex101.com/r/CXl1kB/1 --------- Co-authored-by: msarskus <msarskus@cisco.com>	2023-04-24 22:05:31 -07:00
Davis Chase	b2564a6391	fix #3884 (#3475 ) fixes mar bug #3384	2023-04-24 19:54:15 -07:00
Zander Chase	416f3bdf11	Vwp/alpaca streaming (#3468 ) Co-authored-by: Luke Stanley <306671+lukestanley@users.noreply.github.com>	2023-04-24 16:27:51 -07:00
cs0lar	3033c6b964	fixes #1214 (#3003 ) ### Background Continuing to implement all the interface methods defined by the `VectorStore` class. This PR pertains to implementation of the `max_marginal_relevance_search_by_vector` method. ### Changes - a `max_marginal_relevance_search_by_vector` method implementation has been added in `weaviate.py` - tests have been added to the the new method - vcr cassettes have been added for the weaviate tests ### Test Plan Added tests for the `max_marginal_relevance_search_by_vector` implementation ### Change Safety - [x] I have added tests to cover my changes	2023-04-24 11:50:55 -07:00
Zander Chase	49122a96e7	Structured Tool Bugfixes (#3324 ) - Proactively raise error if a tool subclasses BaseTool, defines its own schema, but fails to add the type-hints - fix the auto-inferred schema of the decorator to strip the unneeded virtual kwargs from the schema dict Helps avoid silent instances of #3297	2023-04-24 09:58:29 -07:00
Davit Buniatyan	2c0023393b	Deep Lake mini upgrades (#3375 ) Improvements * set default num_workers for ingestion to 0 * upgraded notebooks for avoiding dataset creation ambiguity * added `force_delete_dataset_by_path` * bumped deeplake to 3.3.0 * creds arg passing to deeplake object that would allow custom S3 Notes * please double check if poetry is not messed up (thanks!) Asks * Would be great to create a shared slack channel for quick questions --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	2023-04-23 21:23:54 -07:00
Zander Chase	20f530e9c5	Add Sentence Transformers Embeddings (#3409 ) Add embeddings based on the sentence transformers library. Add a notebook and integration tests. Co-authored-by: khimaros <me@khimaros.com>	2023-04-23 18:25:20 -07:00
Luke Harris	b4de839ed8	Several confluence loader improvements (#3300 ) This PR addresses several improvements: - Previously it was not possible to load spaces of more than 100 pages. The `limit` was being used both as an overall page limit and as a per request pagination limit. This, in combination with the fact that atlassian seem to use a server-side hard limit of 100 when page content is expanded, meant it wasn't possible to download >100 pages. Now `limit` is used only as a per-request pagination limit and `max_pages` is introduced as the way to limit the total number of pages returned by the paginator. - Document metadata now includes `source` (the source url), making it compatible with `RetrievalQAWithSourcesChain`. - It is now possible to include inline and footer comments. - It is now possible to pass `verify_ssl=False` and other parameters to the confluence object for use cases that require it.	2023-04-23 15:06:10 -07:00
Harrison Chase	a6664be79c	Harrison/myscale (#3352 ) Co-authored-by: Fangrui Liu <fangruil@moqi.ai> Co-authored-by: 刘方瑞 <fangrui.liu@outlook.com> Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>	2023-04-22 09:17:38 -07:00
Filip Haltmayer	215dcc2d26	Refactor Milvus/Zilliz (#3047 ) Refactoring milvus/zilliz to clean up and have a more consistent experience. Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>	2023-04-22 08:26:19 -07:00
Richy Wang	88a8f59aa7	Add a full PostgresSQL syntax database 'AnalyticDB' as vector store. (#3135 ) Hi there！ I'm excited to open this PR to add support for using a fully Postgres syntax compatible database 'AnalyticDB' as a vector. As AnalyticDB has been proved can be used with AutoGPT, ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also good for you. AnalyticDB is a distributed Alibaba Cloud-Native vector database. It works better when data comes to large scale. The PR includes: - [x] A new memory: AnalyticDBVector - [x] A suite of integration tests verifies the AnalyticDB integration I have read your [contributing guidelines](`72b7d76d79/.github/CONTRIBUTING.md`). And I have passed the tests below - [x] make format - [x] make lint - [x] make coverage - [x] make test	2023-04-22 08:25:41 -07:00
Zander Chase	05a8aa5447	Fix linting on master (#3327 )	2023-04-21 15:49:46 -07:00
Varun Srinivas	d2f922f525	Change in method name for creating an issue on JIRA (#3307 ) The awesome JIRA tool created by @zywilliamli calls the `create_issue()` method to create issues, however, the actual method is `issue_create()`. Details in the Documentation here: https://atlassian-python-api.readthedocs.io/jira.html#manage-issues	2023-04-21 13:01:33 -07:00
Paul Garner	aa9d5707e0	Add PythonLoader which auto-detects encoding of Python files (#3311 ) This PR contributes a `PythonLoader`, which inherits from `TextLoader` but detects and sets the encoding automatically.	2023-04-21 10:47:57 -07:00
Davis Chase	2fd24d31a4	Cleanup integration test dir (#3308 )	2023-04-21 09:44:09 -07:00
Naveen Tatikonda	bb6c459f7a	OpenSearch: Add Support for Lucene Filter (#3201 ) ### Description Add Support for Lucene Filter. When you specify a Lucene filter for a k-NN search, the Lucene algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering. This filter is supported only for approximate search with the indexes that are created using `lucene` engine. OpenSearch Documentation - https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#lucene-k-nn-filter-implementation Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-20 20:42:53 -07:00
Davis Chase	46542dc774	Contextual compression retriever (#2915 ) Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-20 17:01:14 -07:00
Gabriel Altay	34fb56b633	fix copy/pasta typos wikipedia->arxiv (#3222 ) just updates a few module level docstrings from Wikipedia -> Arxiv	2023-04-20 07:15:41 -07:00
Harrison Chase	d2520a5f1e	Harrison/ddg (#3206 ) Co-authored-by: itai <itai.marks@gmail.com> Co-authored-by: Itai Marks <itaim@users.noreply.github.com> Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com> Co-authored-by: Adilzhan Ismailov <13088690+aismlv@users.noreply.github.com> Co-authored-by: Justin Flick <Justinjayflick@gmail.com> Co-authored-by: Justin Flick <jflick@homesite.com>	2023-04-19 21:32:26 -07:00
Harrison Chase	9a0356d276	Harrison/file chat history (#3198 ) Co-authored-by: Young Lee <joybro201@gmail.com>	2023-04-19 21:05:20 -07:00
Harrison Chase	9181cd9b22	Harrison/playwright selector (#3185 ) Co-authored-by: zhyuri <4649294+zhyuri@users.noreply.github.com>	2023-04-19 16:54:15 -07:00
Harrison Chase	68cd37175e	Harrison/arxiv tool (#3186 ) Co-authored-by: leo-gan <leo.gan.57@gmail.com>	2023-04-19 16:53:34 -07:00
Zander Chase	4adfd790f0	Update File Management Tools to Include Root Directory (#3112 ) - Permit the specification of a `root_dir` to the read/write file tools to specify a working directory - Add validation for attempts to read/write outside the directory (e.g., through `../../` or symlinks or `/abs/path`'s that don't lie in the correct path) - Add some tests for all One question is whether we should make a default root directory for these? tradeoffs either way	2023-04-19 16:46:10 -07:00
engkheng	dbbc340f25	Validate `input_variables` when using `jinja2` templates (#3140 ) `langchain.prompts.PromptTemplate` and `langchain.prompts.FewShotPromptTemplate` do not validate `input_variables` when initialized as `jinja2` template. ```python # Using langchain v0.0.144 template = """"\ Your variable: {{ foo }} {% if bar %} You just set bar boolean variable to true {% endif %} """ # Missing variable, should raise ValueError prompt_template = PromptTemplate(template=template, input_variables=["bar"], template_format="jinja2", validate_template=True) # Extra variable, should raise ValueError prompt_template = PromptTemplate(template=template, input_variables=["bar", "foo", "extra", "thing"], template_format="jinja2", validate_template=True) ```	2023-04-19 16:18:32 -07:00
Zander Chase	90ef705ced	Update Tool Input (#3103 ) - Remove dynamic model creation in the `args()` property. _Only infer for the decorator (and add an argument to NOT infer if someone wishes to only pass as a string)_ - Update the validation example to make it less likely to be misinterpreted as a "safe" way to run a repl There is one example of "Multi-argument tools" in the custom_tools.ipynb from yesterday, but we could add more. The output parsing for the base MRKL agent hasn't been adapted to handle structured args at this point in time --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-18 18:18:33 -07:00
Harrison Chase	aad0a498ac	Harrison/output error (#3094 ) Co-authored-by: yummydum <sumita@nowcast.co.jp>	2023-04-18 08:59:56 -07:00
Harrison Chase	db968284f8	tools refactor (#2961 ) Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-17 21:35:29 -07:00
engkheng	19febc77d6	Support inference of `input_variables` from `jinja2` template (#3013 ) `langchain.prompts.PromptTemplate` is unable to infer `input_variables` from jinja2 template. ```python # Using langchain v0.0.141 template_string = """\ Hello world Your variable: {{ var }} {# This will not get rendered #} {% if verbose %} Congrats! You just turned on verbose mode and got extra messages! {% endif %} """ template = PromptTemplate.from_template(template_string, template_format="jinja2") print(template.input_variables) # Output ['# This will not get rendered #', '% endif %', '% if verbose %'] ``` --------- Co-authored-by: engkheng <ongengkheng929@example.com>	2023-04-17 20:31:03 -07:00
Nuno Campos	dac32c59e5	Nc/combining output parser (#3014 ) Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-17 20:29:53 -07:00
Davis Chase	19c85aa990	Factor out doc formatting and add validation (#3026 ) @cnhhoang850 slightly more generic fix for #2944, works for whatever the expected metadata keys are not just `source`	2023-04-17 20:28:01 -07:00
Naveen Tatikonda	3453b7457c	OpenSearch: Add Support for Boolean Filter with ANN search (#3038 ) ### Description Add Support for Boolean Filter with ANN search Documentation - https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#boolean-filter-with-ann-search ### Issues Resolved https://github.com/hwchase17/langchain/issues/2924 Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-04-17 20:26:26 -07:00
Harrison Chase	afd3e70ae5	Harrison/confluent loader (#2994 ) Co-authored-by: Justin Flick <Justinjayflick@gmail.com>	2023-04-17 20:23:45 -07:00
vowelparrot	99c0382209	Generative Characters (#2859 ) Add a time-weighted memory retriever and a notebook that approximates a Generative Agent from https://arxiv.org/pdf/2304.03442.pdf The "daily plan" components are removed for now since they are less useful without a virtual world, but the memory is an interesting component to build off. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-04-16 21:41:00 -07:00
Jan Backes	a9310a3e8b	Add Annoy as VectorStore (#2939 ) Adds Annoy (https://github.com/spotify/annoy) as vector Store. RESOLVES hwchase17/langchain#2842 discord ref: https://discord.com/channels/1038097195422978059/1051632794427723827/1096089994168377354 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>	2023-04-16 13:44:04 -07:00
Harrison Chase	e12e00df12	use output parsers in agents (#2987 )	2023-04-16 13:15:21 -07:00
cs0lar	8b9e02da9d	Fix/issue 1213 (#2932 ) ### Background Continuing to implement all the interface methods defined by the `VectorStore` class. This PR pertains to implementation of the `max_marginal_relevance_search` method. ### Changes - a `max_marginal_relevance_search` method implementation has been added in `weaviate.py` - tests have been added to the the new method - vcr cassettes have been added for the weaviate tests ### Test Plan Added tests for the `max_marginal_relevance_search` implementation ### Change Safety - [x] I have added tests to cover my changes	2023-04-16 13:11:30 -07:00
vowelparrot	5ca7ce77cd	Remove pythonrepl from LLM-MathChain (#2943 ) Use numexpr evaluate instead of the python REPL to avoid malicious code injection. Tested against the (limited) math dataset and got the same score as before. For more permissive tools (like the REPL tool itself), other approaches ought to be provided (some combination of Sanitizer + Restricted python + unprivileged-docker + ...), but for a calculator tool, only mathematical expressions should be permitted. See https://github.com/hwchase17/langchain/issues/814	2023-04-16 08:50:32 -07:00

1 2 3 4 5 ...

313 Commits