langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-31 15:20:26 +00:00

Author	SHA1	Message	Date
olgavrou	67dc1a9dd2	cleanup	2023-09-04 07:36:47 -04:00
olgavrou	ca163f0ee6	fixes and tests	2023-09-04 07:10:44 -04:00
olgavrou	b162f1c8e1	dot product of encodings as default auto_embed	2023-09-04 05:50:15 -04:00
olgavrou	a9ba6a8cd1	Merge pull request #9 from VowpalWabbit/fix_embedding_w_indexes proper embeddings and rolling window average	2023-09-01 10:07:53 -04:00
olgavrou	2b90a8afa2	Merge branch 'langchain-ai:master' into master	2023-09-01 04:10:49 -04:00
jmhayes3	324c86acd5	fix typo in web_research.py (#10076 ) fix spelling	2023-08-31 22:19:03 -07:00
olgavrou	2c877a4a34	proper embeddings and rolling window average	2023-08-31 20:14:41 -04:00
Davide Menini	3f8f3de28e	fix (parsers/json): do not escape double quotes if already escaped (#9916 ) This PR fixes an issue I found when upgrading to a more recent version of Langchain. I was using 0.0.142 before, and this issue popped up already when the `_custom_parser` was added to `output_parsers/json`. Anyway, the issue is that the parser tries to escape quotes when they are double-escaped (e.g. `\\"`), leading to OutputParserException. This is particularly undesired in my app, because I have an Agent that uses a single input Tool, which expects as input a JSON string with the structure: ```python { "foo": string, "bar": string } ``` The LLM (GPT3.5) response is (almost) always something like `"action_input": "{\\"foo\\": \\"bar\\", \\"bar\\": \\"foo\\"}"` and since the upgrade this is not correctly parsed. --------- Co-authored-by: taamedag <Davide.Menini@swisscom.com>	2023-08-31 17:11:52 -07:00
Harrison Chase	ad9e242a7a	add snippet for max concurrency (#9892 )	2023-08-31 16:52:28 -07:00
Harrison Chase	566ce06f4a	add async support for tools (#10058 )	2023-08-31 16:52:05 -07:00
Stefano Lottini	c710c7303f	fix wrong import line in cassandra doc page for vector store (#10041 ) This fixes the example import line in the general "cassandra" doc page mdx file. (it was erroneously a copy of the chat message history import statement found below).	2023-08-31 16:05:46 -07:00
Jon Bennion	cc6a20d3e6	updated prompt name in documentation for sequential chain (#10048 ) Description: updated the prompt name in a sequential chain example so that it is not overwritten by the same prompt name in the next chain (this is a sequential chain example) Issue: n/a Dependencies: none Tag maintainer: not known Twitter handle: not on twitter, feel free to use my git username for anything	2023-08-31 16:05:18 -07:00
Jiří Moravčík	86646ec555	feat: Add `ApifyWrapper` class (#10067 ) If you look at documentation https://python.langchain.com/docs/integrations/tools/apify (or the actual file https://github.com/langchain-ai/langchain/blob/master/docs/extras/integrations/tools/apify.ipynb ), there's a class `ApifyWrapper` mentioned. It seems it got lost in some refactoring, i.e. it does not exist in the codebase ATM. I just propose to add it back. It would fix issues e.g. https://github.com/langchain-ai/langchain/issues/8307 or https://github.com/langchain-ai/langchain/issues/8201 To add, Apify is a wanted integration, e.g. see https://twitter.com/hwchase17/status/1695490295914545626 or https://twitter.com/hwchase17/status/1695470765343461756 Lastly, I offer taking ownership of the Apify-related parts of the codebase, so you can tag me if anything is needed.	2023-08-31 15:47:44 -07:00
Robert Perrotta	02e51f4217	update_forward_refs for Run (#9969 ) Adds a call to Pydantic's `update_forward_refs` for the `Run` class (in addition to the `ChainRun` and `ToolRun` classes, for which that method is already called). Without it, the self-reference of child classes (type `List[Run]`) is problematic. For example: ```python from langchain.callbacks import StdOutCallbackHandler from langchain.chains import LLMChain from langchain.llms import OpenAI from langchain.prompts import PromptTemplate from wandb.integration.langchain import WandbTracer llm = OpenAI() prompt = PromptTemplate.from_template("1 + {number} = ") chain = LLMChain(llm=llm, prompt=prompt, callbacks=[StdOutCallbackHandler(), WandbTracer()]) print(chain.run(number=2)) ``` results in the following output before the change ``` WARNING:root:Error in on_chain_start callback: field "child_runs" not yet prepared so type is still a ForwardRef, you might need to call Run.update_forward_refs(). > Entering new LLMChain chain... Prompt after formatting: 1 + 2 = WARNING:root:Error in on_chain_end callback: No chain Run found to be traced > Finished chain. 3 ``` but afterwards the callback error messages are gone.	2023-08-31 15:25:59 -07:00
Eugene Yurtsev	74fcfed4e2	lint for pydantic imports (#9937 ) Catch pydantic imports	2023-08-31 15:55:29 -04:00
Zizhong Zhang	641b71e2cd	refactor: rename to OpaquePrompts (#10013 ) Renamed to OpaquePrompts cc @baskaryan Thanks in advance!	2023-08-31 12:21:24 -07:00
Bagatur	8d66b00c73	Data anonymizer notebook nit (#10062 )	2023-08-31 10:58:13 -07:00
Bagatur	19400ba253	bump 278 (#10052 )	2023-08-31 07:35:42 -07:00
Bagatur	29270e0378	fix #3117 (#9957 ) fix #3117	2023-08-31 07:29:49 -07:00
Bagatur	5b913003e0	bump	2023-08-31 07:27:56 -07:00
Bagatur	4b15328767	Add indexing support for postgresql (#9933 ) Add support to postgresql for the SQL Manager Record This code was tested locally. I'm looking at how to add testing with postgres in a separate PR.	2023-08-31 07:27:09 -07:00
olgavrou	b7d0e4835e	Merge branch 'langchain-ai:master' into master	2023-08-31 08:02:14 -04:00
Bagatur	e60e1cdf23	fixed openai_functions api_response format args err (#9968 ) root cause: args may not have a key (params) resulting in an error	2023-08-31 00:49:19 -07:00
Bagatur	3efab8d3df	implement vectorstores by tencent vectordb (#9989 ) Hi there！ I'm excited to open this PR to add support for using 'Tencent Cloud VectorDB' as a vector store. Tencent Cloud VectorDB is a fully-managed, self-developed, enterprise-level distributed database service designed for storing, retrieving, and analyzing multi-dimensional vector data. The database supports multiple index types and similarity calculation methods, with a single index supporting vector scales up to 1 billion and capable of handling millions of QPS with millisecond-level query latency. Tencent Cloud VectorDB not only provides external knowledge bases for large models to improve their accuracy, but also has wide applications in AI fields such as recommendation systems, NLP services, computer vision, and intelligent customer service. The PR includes: Implementation of Vectorstore. I have read your [contributing guidelines](`72b7d76d79/.github/CONTRIBUTING.md`). And I have passed the tests below make format make lint make coverage make test	2023-08-31 00:48:25 -07:00
Bagatur	d43a36c32a	Bagatur/dereference tool schema (#10007 ) fix for #9375	2023-08-31 00:48:12 -07:00
Bagatur	6b5a970949	refactor(document_loaders): abstract page evaluation logic in PlaywrightURLLoader (#9995 ) This PR brings structural updates to `PlaywrightURLLoader`, aiming at making the code more readable and extensible through the abstraction of page evaluation logic. These changes also align this implementation with a similar structure used in LangChain.js. The key enhancements include: 1. Introduction of 'PlaywrightEvaluator', an abstract base class for all evaluators. 2. Creation of 'UnstructuredHtmlEvaluator', a concrete class implementing 'PlaywrightEvaluator', which uses `unstructured` library for processing page's HTML content. 3. Extension of 'PlaywrightURLLoader' constructor to optionally accept an evaluator of the type 'PlaywrightEvaluator'. It defaults to 'UnstructuredHtmlEvaluator' if no evaluator is provided. 4. Refactoring of 'load' and 'aload' methods to use the 'evaluate' and 'evaluate_async' methods of the provided 'PageEvaluator' for page content handling. This update brings flexibility to 'PlaywrightURLLoader' as it can now utilize different evaluators for page processing depending on the requirement. The abstraction also improves code maintainability and readability. Twitter: @ywkim	2023-08-31 00:45:33 -07:00
Bagatur	b1644bc9ad	cr	2023-08-31 00:43:34 -07:00
Hunsmore	13fef1e5d3	add bloomz_7b, llama-2-7b, llama-2-13b, llama-2-70b to ErnieBotChat (#10024 ) - Description: Add bloomz_7b, llama-2-7b, llama-2-13b, llama-2-70b to ErnieBotChat, which only supported ERNIE-Bot-turbo and ERNIE-Bot. - Issue: #10022, - Dependencies: no extra dependencies --------- Co-authored-by: hetianfeng <hetianfeng@meituan.com>	2023-08-31 00:38:55 -07:00
Cameron Vetter	e37d51cab6	fix scoring profile example (#10016 ) - Description: A change in the documentation example for Azure Cognitive Vector Search with Scoring Profile so the example works as written - Issue: #10015 - Dependencies: None - Tag maintainer: @baskaryan @ruoccofabrizio - Twitter handle: @poshporcupine	2023-08-31 00:35:06 -07:00
skspark	52a3e8a261	Add integration TCs on bing search (#8068 ) (#10021 ) ## Description Added integration TCs on bing search utility ## Issue #8068 ## Dependencies None	2023-08-31 00:34:06 -07:00
Hyeokjun seo	e2e05ad89e	Fix Typo : `openai_api_key` -> `serpapi_api_key` (#10020 ) Fixed typo in the comments Notebook. (which says `openai_api_key` for SerpAPI)	2023-08-31 00:33:13 -07:00
Tomaz Bratanic	f2e8399cc8	Fix link in Neo4j provider page (#10023 )	2023-08-31 00:32:42 -07:00
William FH	5341b04d68	Update error message (#9970 ) in evals	2023-08-30 17:42:55 -07:00
William FH	b82ad19ed2	Check memory address (#9971 ) Don't want to dup the collector but can have multiple	2023-08-30 15:30:22 -07:00
Bagatur	e805f8e263	add tests	2023-08-30 15:23:02 -07:00
Bagatur	1f5c579ef4	add	2023-08-30 13:37:50 -07:00
Bagatur	240cc289e6	wip	2023-08-30 13:37:39 -07:00
Bagatur	7fa82900cb	guides docs nits (#10005 )	2023-08-30 11:07:42 -07:00
Bagatur	2f03e71e67	rename local llm guide (#10004 )	2023-08-30 10:52:46 -07:00
Bagatur	781f274d19	make privacy guide section (#10003 )	2023-08-30 10:49:20 -07:00
maks-operlejn-ds	a8f804a618	Add data anonymizer (#9863 ) ### Description The feature for anonymizing data has been implemented. In order to protect private data, such as when querying external APIs (OpenAI), it is worth pseudonymizing sensitive data to maintain full privacy. Anonymization consists of two steps: 1. Identification: Identify all data fields that contain personally identifiable information (PII). 2. Replacement: Replace all PIIs with pseudo values or codes that do not reveal any personal information about the individual but can be used for reference. We're not using regular encryption, because the language model won't be able to understand the meaning or context of the encrypted data. We use Microsoft Presidio together with Faker framework for anonymization purposes because of the wide range of functionalities they provide. The full implementation is available in `PresidioAnonymizer`. ### Future works - deanonymization - add the ability to reverse anonymization. For example, the workflow could look like this: `anonymize -> LLMChain -> deanonymize`. By doing this, we will retain anonymity in requests to, for example, OpenAI, and then be able to restore the original data. - instance anonymization - at this point, each occurrence of PII is treated as a separate entity and separately anonymized. Therefore, two occurrences of the name John Doe in the text will be changed to two different names. It is therefore worth introducing support for full instance detection, so that repeated occurrences are treated as a single object. ### Twitter handle @deepsense_ai / @MaksOpp --------- Co-authored-by: MaksOpp <maks.operlejn@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-30 10:39:44 -07:00
Bagatur	98cce7dcd3	update moderation docs (#10002 )	2023-08-30 10:34:25 -07:00
Bagatur	b3e3a31240	bump 277 (#9997 )	2023-08-30 08:29:51 -07:00
Bagatur	9828701de1	mv base cache to schema (#9953 ) if you remove all other imports from langchain.init it exposes a circular dep	2023-08-30 08:10:51 -07:00
Christophe Bornet	9870bfb9cd	Add bucket and object key to metadata in S3 loader (#9317 ) - Description: this PR adds `s3_object_key` and `s3_bucket` to the doc metadata when loading an S3 file. This is particularly useful when using `S3DirectoryLoader` to remove the files from the dir once they have been processed (getting the object keys from the metadata `source` field seems brittle) - Dependencies: N/A - Tag maintainer: ? - Twitter handle: _cbornet --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-08-30 11:03:24 -04:00
Eugene Yurtsev	6da158388b	Merge branch 'master' into ywkim/master	2023-08-30 10:46:26 -04:00
Guy Korland	24c0b01c38	Extend the FalkorDB QA demo (#9992 ) - Description: Extend the FalkorDB QA demo - Tag maintainer: @baskaryan	2023-08-30 10:13:18 -04:00
Eugene Yurtsev	588237ef30	Make document serializable, create utility to create a docstore (#9674 ) This PR makes the following changes: 1. Documents become serializable using langchain serialization 2. Make a utility to create a docstore kw store Will help to address issue here: https://github.com/langchain-ai/langchain/issues/9345	2023-08-30 09:45:04 -04:00
Eugene Yurtsev	e8f29be350	x	2023-08-30 09:36:27 -04:00
Buckler89	a28e888b36	fix call _get_keys for custom_evaluator (#9763 ) In the function _load_run_evaluators the function _get_keys was not called if only custom_evaluators parameter is used - Description: In the function _load_run_evaluators the function _get_keys was not called if only custom_evaluators parameter is used, - Issue: no issue created for this yet, - Dependencies: None, - Tag maintainer: @vowelparrot, - Twitter handle: Buckler89 --------- Co-authored-by: ddroghini <d.droghini@mflgroup.com>	2023-08-30 06:35:23 -07:00

4288 Commits