langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

Author	SHA1	Message	Date
Bagatur	6b5a970949	refactor(document_loaders): abstract page evaluation logic in PlaywrightURLLoader (#9995 ) This PR brings structural updates to `PlaywrightURLLoader`, aiming to make the code more readable and extensible by abstracting the page evaluation logic. These changes also align this implementation with a similar structure used in LangChain.js. The key enhancements include: 1. Introduction of 'PlaywrightEvaluator', an abstract base class for all evaluators. 2. Creation of 'UnstructuredHtmlEvaluator', a concrete class implementing 'PlaywrightEvaluator', which uses the `unstructured` library to process the page's HTML content. 3. Extension of the 'PlaywrightURLLoader' constructor to optionally accept an evaluator of type 'PlaywrightEvaluator'. It defaults to 'UnstructuredHtmlEvaluator' if no evaluator is provided. 4. Refactoring of the 'load' and 'aload' methods to use the 'evaluate' and 'evaluate_async' methods of the provided 'PlaywrightEvaluator' for page content handling. This update brings flexibility to 'PlaywrightURLLoader', as it can now use different evaluators for page processing depending on the requirement. The abstraction also improves code maintainability and readability. Twitter: @ywkim	2023-08-31 00:45:33 -07:00
Hunsmore	13fef1e5d3	add bloomz_7b, llama-2-7b, llama-2-13b, llama-2-70b to ErnieBotChat (#10024 ) - Description: Add bloomz_7b, llama-2-7b, llama-2-13b, llama-2-70b to ErnieBotChat, which only supported ERNIE-Bot-turbo and ERNIE-Bot. - Issue: #10022, - Dependencies: no extra dependencies --------- Co-authored-by: hetianfeng <hetianfeng@meituan.com>	2023-08-31 00:38:55 -07:00
skspark	52a3e8a261	Add integration TCs on bing search (#8068 ) (#10021 ) ## Description Added integration TCs on bing search utility ## Issue #8068 ## Dependencies None	2023-08-31 00:34:06 -07:00
William FH	5341b04d68	Update error message (#9970 ) in evals	2023-08-30 17:42:55 -07:00
William FH	b82ad19ed2	Check memory address (#9971 ) Don't want to dup the collector but can have multiple	2023-08-30 15:30:22 -07:00
maks-operlejn-ds	a8f804a618	Add data anonymizer (#9863 ) ### Description The feature for anonymizing data has been implemented. In order to protect private data, such as when querying external APIs (OpenAI), it is worth pseudonymizing sensitive data to maintain full privacy. Anonymization consists of two steps: 1. Identification: Identify all data fields that contain personally identifiable information (PII). 2. Replacement: Replace all PII with pseudo values or codes that do not reveal any personal information about the individual but can be used for reference. We're not using regular encryption, because the language model won't be able to understand the meaning or context of the encrypted data. We use Microsoft Presidio together with the Faker framework for anonymization because of the wide range of functionality they provide. The full implementation is available in `PresidioAnonymizer`. ### Future works - deanonymization - add the ability to reverse anonymization. For example, the workflow could look like this: `anonymize -> LLMChain -> deanonymize`. By doing this, we will retain anonymity in requests to, for example, OpenAI, and then be able to restore the original data. - instance anonymization - at this point, each occurrence of PII is treated as a separate entity and separately anonymized. Therefore, two occurrences of the name John Doe in the text will be changed to two different names. It is therefore worth introducing support for full instance detection, so that repeated occurrences are treated as a single object. ### Twitter handle @deepsense_ai / @MaksOpp --------- Co-authored-by: MaksOpp <maks.operlejn@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-30 10:39:44 -07:00
Bagatur	b3e3a31240	bump 277 (#9997 )	2023-08-30 08:29:51 -07:00
Bagatur	9828701de1	mv base cache to schema (#9953 ) if you remove all other imports from langchain.init it exposes a circular dep	2023-08-30 08:10:51 -07:00
Christophe Bornet	9870bfb9cd	Add bucket and object key to metadata in S3 loader (#9317 ) - Description: this PR adds `s3_object_key` and `s3_bucket` to the doc metadata when loading an S3 file. This is particularly useful when using `S3DirectoryLoader` to remove the files from the dir once they have been processed (getting the object keys from the metadata `source` field seems brittle) - Dependencies: N/A - Tag maintainer: ? - Twitter handle: _cbornet --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-08-30 11:03:24 -04:00
Eugene Yurtsev	6da158388b	Merge branch 'master' into ywkim/master	2023-08-30 10:46:26 -04:00
Guy Korland	24c0b01c38	Extend the FalkorDB QA demo (#9992 ) - Description: Extend the FalkorDB QA demo - Tag maintainer: @baskaryan	2023-08-30 10:13:18 -04:00
Eugene Yurtsev	588237ef30	Make document serializable, create utility to create a docstore (#9674 ) This PR makes the following changes: 1. Documents become serializable using langchain serialization. 2. Make a utility to create a docstore kw store. Will help to address the issue here: https://github.com/langchain-ai/langchain/issues/9345	2023-08-30 09:45:04 -04:00
Buckler89	a28e888b36	fix call _get_keys for custom_evaluator (#9763 ) In the function _load_run_evaluators the function _get_keys was not called if only custom_evaluators parameter is used - Description: In the function _load_run_evaluators the function _get_keys was not called if only custom_evaluators parameter is used, - Issue: no issue created for this yet, - Dependencies: None, - Tag maintainer: @vowelparrot, - Twitter handle: Buckler89 --------- Co-authored-by: ddroghini <d.droghini@mflgroup.com>	2023-08-30 06:35:23 -07:00
Bagatur	2d2b097fab	mv chat history (#9725 )	2023-08-29 21:41:32 -07:00
Bagatur	d762a6b51f	rm mutable defaults (#9974 )	2023-08-29 20:36:27 -07:00
Arjun Aravindan	6a51672164	Update SeleniumURLLoader to use webdriver Service in favor of deprecated executable_path parameter (#9814 ) Description: This commit uses the new Service object in Selenium webdriver as executable_path has been [deprecated and removed in selenium version 4.11.2](`9f5801c82f`) Issue: https://github.com/langchain-ai/langchain/issues/9808 Tag Maintainer: @eyurtsev	2023-08-29 19:45:18 -07:00
William FH	c844aaa7a6	Weakref to tracer (#9954 ) Prevent memory/thread leakage	2023-08-29 19:27:22 -07:00
Jurik-001	a05fed9369	Fix add callbacks to spark_sql due to deprecation of callback_manager (#9831 ) Description: Because callback_manager is deprecated (see line 109 in [langchain/libs/langchain/langchain/chains/base.py](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/base.py)), I replaced several parts. Issue: None Dependencies: Maintainer: @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-29 19:23:44 -07:00
axiangcoding	ffa5625134	feat(llms): improve ERNIE-Bot chat model (#9833 ) - Description: improve ERNIE-Bot chat model, add request timeout and more testcases. - Issue: None - Dependencies: None - Tag maintainer: @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-29 18:20:06 -07:00
Bagatur	d966ba63e2	fixed GoogleCloudEnterpriseSearchRetriever returning an empty array (#9858 ) `GoogleCloudEnterpriseSearchRetriever` returned an empty array of documents earlier, fixed	2023-08-29 17:49:48 -07:00
Bagatur	ec362ecbe2	Fixed regex bug in RetrievalQAWithSources in previous update (#9898 ) - Description: In my previous PR, I had modified the code to catch all kinds of [SOURCES, sources, Source, Sources]. However, this change included checking for a colon or a white space which should actually have been only checking for a colon.	2023-08-29 17:32:24 -07:00
Nikhil Suresh	56a0165a4e	cleaned up unit test example	2023-08-29 23:37:54 +00:00
William FH	cedfad541d	don't emit none from eval config (#9963 )	2023-08-29 16:14:32 -07:00
Nikhil Suresh	b31475c622	minor updates to regex	2023-08-29 23:13:31 +00:00
Bagatur	8fb0a9594c	Add LLMonitor Callback Handler Integration - open-source observability & analytics (#9870 ) Adds support for [llmonitor](https://llmonitor.com) callbacks. It enables: - Requests tracking / logging / analytics - Error debugging - Cost analytics - User tracking Let me know if anything needs to be changed for merge. Thank you!	2023-08-29 15:49:01 -07:00
William FH	d799963870	Wfh/async tool (#9878 ) Co-authored-by: Daniel Brenot <dbrenot@pelmorex.com> Co-authored-by: Daniel <daniel.alexander.brenot@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-29 15:37:41 -07:00
Bagatur	16eb935469	Fix for similarity_search_with_score (#9903 ) - Description: the implementation for similarity_search_with_score did not actually include a score or logic to filter. Now fixed. - Tag maintainer: @rlancemartin - Twitter handle: @ofermend	2023-08-29 15:04:48 -07:00
Bagatur	c70bb0ec28	Activeloopai runtime arg (#9961 )	2023-08-29 15:01:46 -07:00
Bagatur	0f85671630	fmt	2023-08-29 14:55:25 -07:00
Bagatur	78c014399f	fmt	2023-08-29 14:53:15 -07:00
Eugene Yurtsev	5cce6529a4	Speed up openai tests (#9943 ) Saves ~8-10 seconds from total unit tests times --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-29 14:30:41 -07:00
Guy Korland	7cbe872af8	Add support for Falkordb (ex-RedisGraph) (#9821 ) - Description: Add support for Falkordb (ex-RedisGraph) - Tag maintainer: @hwchase17 - Twitter handle: @g_korland	2023-08-29 14:22:33 -07:00
William FH	fbd792ac7c	Fix import (#9945 )	2023-08-29 12:38:42 -07:00
Zizhong Zhang	8bd7a9d18e	feat: PromptGuard takes a list of str (#9948 ) Recently we made the decision that PromptGuard takes a list of strings instead of a string. @ggroode implemented the integration change. --------- Co-authored-by: ggroode <ggroode@berkeley.edu> Co-authored-by: ggroode <46691276+ggroode@users.noreply.github.com>	2023-08-29 12:22:30 -07:00
Predrag Gruevski	8dbf4cbe80	Add notice about security-sensitive experimental code to experimental README. (#9936 ) It renders like this: https://github.com/langchain-ai/langchain/tree/pg/experimental-readme/libs/experimental ![image](https://github.com/langchain-ai/langchain/assets/2348618/a5f9569d-96f6-44c6-8559-921adb3e337d)	2023-08-29 14:21:30 -04:00
Predrag Gruevski	b5cd1e0fed	Add security notices on PAL and CPAL experimental chains. (#9938 ) Clearly document that the PAL and CPAL techniques involve generating code, and that such code must be properly sandboxed and given appropriate narrowly-scoped credentials in order to ensure security. While our implementations include some mitigations, Python and SQL sandboxing is well-known to be a very hard problem and our mitigations are no replacement for proper sandboxing and permissions management. The implementation of such techniques must be performed outside the scope of the Python process where this package's code runs, so its correct setup and administration must therefore be the responsibility of the user of this code.	2023-08-29 13:51:56 -04:00
Jan-Luca Barthel	f5faac8859	addition of cosine distance function for faiss (#9939 ) - Description: added the _cosine_relevance_score_fn to _select_relevance_score_fn of faiss.py to enable the use of cosine distance for similarity with this vector store, and to match the error message, which implies that cosine is a valid distance strategy - Issue: no relevant issue found, but I needed this function myself and tested it in a private repo - Dependencies: none	2023-08-29 10:29:51 -07:00
Bagatur	d6957921f0	bump 276 (#9931 )	2023-08-29 08:00:38 -07:00
Tomaz Bratanic	db13fba7ea	Add neo4j vector support (#9770 ) Neo4j has added vector index integration just recently. To allow both ingestion and integrating it as vector RAG applications, I wrapped it as a vector store as the implementation is completely different from `GraphCypherQAChain`. Here, we are not generating any Cypher statements at query time, we are simply doing the vector similarity search using the new vector index as if we were dealing with a vector database. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-29 07:54:20 -07:00
Bagatur	49ebbe4bcd	fix pydantic import (#9930 )	2023-08-29 07:53:01 -07:00
Mike Nitsenko	c80e406e95	Cube semantic loader: allow cubes processing (#9927 ) We've started to receive feedback (after launch) that using only views is confusing. We're considering this as a good practice, as a view serves as a "facade" for your data - however, we decided to let users decide this on their own. Solves the questions from: - https://github.com/cube-js/cube/issues/7028 - https://github.com/langchain-ai/langchain/pull/9690	2023-08-29 07:21:01 -07:00
Nikhil Suresh	dd10cf945c	fixed minor linting issues	2023-08-29 14:15:59 +00:00
adilkhan	bbae8cb88f	Added runtime argument	2023-08-29 12:12:49 +06:00
Ofer Mendelevitch	4454204455	reformat black	2023-08-28 23:04:57 -07:00
Ofer Mendelevitch	318a21e267	fixed typo in spelling	2023-08-28 23:01:11 -07:00
hughcrt	e71f4760db	Change multiline comment width	2023-08-29 07:55:10 +02:00
Ofer Mendelevitch	a5450be32e	fixed lint	2023-08-28 22:31:39 -07:00
Ofer Mendelevitch	8b8d2a6535	fixed similarity_search_with_score to really use a score; updated unit test with a test for score threshold; updated demo notebook	2023-08-28 22:26:55 -07:00
hughcrt	7979cef06a	Replace `|` by `Union`	2023-08-29 06:22:50 +02:00
Nikhil Suresh	23ef836b48	matches colon and any number of white spaces after colon	2023-08-29 04:18:33 +00:00
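
The regex commits above (ec362ecbe2, b31475c622, 23ef836b48) describe matching a sources marker in any of the forms SOURCES, sources, Source, Sources, followed by a colon and any number of white spaces. The following is a minimal sketch of that idea, not the code from those commits: the pattern, the name `SOURCES_PATTERN`, and the helper `split_answer_and_sources` are assumptions made for illustration.

```python
import re
from typing import Tuple

# Hypothetical pattern (an assumption, not taken from the commits):
# "source" or "sources" in any letter case, then a required colon,
# then any amount of whitespace after the colon.
SOURCES_PATTERN = re.compile(r"sources?:\s*", re.IGNORECASE)


def split_answer_and_sources(text: str) -> Tuple[str, str]:
    """Split text into (answer, sources) at the first sources marker.

    If no marker followed by a colon is found, the whole text is
    returned as the answer and the sources string is empty.
    """
    match = SOURCES_PATTERN.search(text)
    if match is None:
        return text.strip(), ""
    answer = text[: match.start()].strip()
    sources = text[match.end():].strip()
    return answer, sources


if __name__ == "__main__":
    print(split_answer_and_sources("Paris is the capital.\nSOURCES:   doc1, doc2"))
```

Requiring the colon, rather than a colon or a white space, keeps a sentence such as "See sources below" from being treated as a marker, which is the behavior the ec362ecbe2 message says was intended.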

652 Commits