langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
shibuiwilliam	2759e2d857	add save and load tfidf vectorizer and docs for TFIDFRetriever (#8112 ) This is to add save_local and load_local to tfidf_vectorizer and docs in tfidf_retriever to make the vectorizer reusable. - Issue: None - Dependencies: None - Tag maintainer: @rlancemartin, @eyurtsev - Twitter handle: @MlopsJ --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-03 23:06:27 -07:00
aerickson-clt	0f68054401	Issue #8089 Improve painless script scoring with params.query_value. (#8086 ) This is a minor improvement that replaces the full query_vector with the reference string `params.query_value` used in the painless scripting docs. I have tested it manually and it works on an example. This makes the query about half the size and much easier to read. https://opensearch.org/docs/latest/search-plugins/knn/painless-functions/#get-started-with-k-nns-painless-scripting-functions @babbldev #8089 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-03 23:06:17 -07:00
linpan	0ead8ea708	typo: ignored to ignore (#8740 )	2023-08-03 23:05:59 -07:00
aerickson-clt	c7ea6e9ff8	Issue 8081 Fix query results size bug. Other bug: pass vector_field param. (#8085 ) @baskaryan #8081 Likely the reason why the issue occurred is that OpenSearch's default k is 10, so it needs to be specified. Here's a similar question about its cousin ElasticSearch https://discuss.elastic.co/t/elasticsearch-returns-only-10-records-but-the-hit-is-507/136605 I tested this manually and also fixed the same issue in `_default_painless_scripting_query`. In addition, `_default_painless_scripting_query` was not passing the `vector_field` name to a sub call, so I fixed that too. ![image](https://github.com/hwchase17/langchain/assets/32244272/cfb7aad1-f701-49d9-9beb-a723aa276817) I also tested this in the aws opensearch developer tools. ![image](https://github.com/hwchase17/langchain/assets/32244272/24544682-1578-4bbb-9eb5-980463c5b41b) --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-03 22:41:11 -07:00
Sidchat95	812419d946	Removing score threshold parameter of faiss _similarity_search_with_r… (#8093 ) Removing score threshold parameter of faiss _similarity_search_with_relevance_scores as the thresholding part is implemented in similarity_search_with_relevance_scores method which calls this method. As this method is supposed to be a private method of faiss.py this will never receive the score threshold parameter as it is popped in the super method similarity_search_with_relevance_scores. @baskaryan @hwchase17	2023-08-03 21:31:43 -07:00
Mathias Panzenböck	873a80e496	Reduce generation of temporary objects (#7950 ) Just a tiny change to use `list.append(...)` and `list.extend(...)` instead of `list += [...]` so that no unnecessary temporary lists are created. Since it's a tiny miscellaneous thing I guess @baskaryan is the maintainer to tag? --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-03 21:24:08 -07:00
Lance Martin	d1b95db874	Retriever that can re-phrase user inputs (#8026 ) Simple retriever that applies an LLM between the user input and the query passed to the retriever. It can be used to pre-process the user input in any way. The default prompt: ``` DEFAULT_QUERY_PROMPT = PromptTemplate( input_variables=["question"], template="""You are an assistant tasked with taking a natural languge query from a user and converting it into a query for a vectorstore. In this process, you strip out information that is not relevant for the retrieval task. Here is the user query: {question} """ ) ``` --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-03 21:23:59 -07:00
Harrison Chase	6c3573e7f6	Harrison/aleph alpha (#8735 ) Co-authored-by: PiotrMazurek <piotr.mazurek@aleph-alpha.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-03 21:21:15 -07:00
Wilson Leao Neto	179a39954d	Provides access to a Document page_content formatter in the AmazonKendraRetriever (#8034 ) - Description: - Provides a new attribute in the AmazonKendraRetriever which processes a ResultItem and returns a string that will be used as page_content; - The excerpt metadata should not be changed, it will be kept as was retrieved. But it is cleaned when composing the page_content; - Refactors the AmazonKendraRetriever to improve code reusability; - Issue: #7787 - Tag maintainer: @3coins @baskaryan - Twitter handle: wilsonleao Why? Some use cases need to adjust the page_content by dynamically combining the ResultItem attributes depending on the context of the item.	2023-08-03 20:54:49 -07:00
Ilya	6f0bccfeb5	Add regex control over separators in character text splitter (#7933 ) #7854 Added the ability to use the `separator` as a regex or a simple character. Fixed a bug where `start_index` was incorrectly counting from -1. Who can review? @eyurtsev @hwchase17 @mmz-001	2023-08-03 20:25:23 -07:00
Vasileios Mansolas	e68a1d73d0	Fix Issue #6650 : Enable Azure Active Directory token-based auth access for AzureChatOpenAI (#8622 ) When using AzureChatOpenAI the openai_api_type defaults to "azure". The utils' get_from_dict_or_env() function triggered by the root validator does not look for user provided values from environment variables OPENAI_API_TYPE, so other values like "azure_ad" are replaced with "azure". This does not allow the use of token-based auth. By removing the "default" value, this allows environment variables to be pulled at runtime for the openai_api_type and thus enables the other api_types which are expected to work. This fixes #6650 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-03 20:21:41 -07:00
Ofer Mendelevitch	29f51055e8	Updates to Vectara documentation (#8699 ) - Description: updates to Vectara documentation with more details on how to get started. - Issue: NA - Dependencies: NA - Tag maintainer: @rlancemartin, @eyurtsev - Twitter handle: @vectara, @ofermend --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-03 20:21:17 -07:00
Alec Flett	5d765408ce	propagate callbacks through load_summarize_chain (#7565 ) This lets you pass callbacks when you create the summarize chain: ``` summarize = load_summarize_chain(llm, chain_type="map_reduce", callbacks=[my_callbacks]) summary = summarize(documents) ``` See #5572 for a similar surgical fix. tagging @hwchase17 for callbacks work	2023-08-03 20:12:34 -07:00
Alec Flett	404d103c41	propagate RetrievalQA chain callbacks through its own LLMChain and StuffDocumentsChain (#7853 ) This is another case, similar to #5572 and #7565 where the callbacks are getting dropped during construction of the chains. tagging @hwchase17 and @agola11 for callbacks propagation	2023-08-03 20:11:58 -07:00
Bal Narendra Sapa	47eea32f6a	add serializer methods (#7914 ) Description: I have added two methods, a serializer and a deserializer. There was already a method called save local, but it saves to the local disk. I wanted the vectorstore in a format that I can push to a SQL database's blob field. I used this while I was working on something @rlancemartin, @eyurtsev --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-03 20:10:35 -07:00
Ryan Sloan	b786335dd1	fix RecursiveUrlLoader (#8582 ) Description: the recursive url loader does not fully crawl for all urls under base url Maintainer: @baskaryan	2023-08-03 16:51:57 -07:00
William FH	f81e613086	Fix Async Retry Event Handling (#8659 ) It fails currently because the event loop is already running. The `retry` decorator already infers an `AsyncRetrying` handler for coroutines (see [tenacity line](`aa6f8f0a24/tenacity/__init__.py (L535)`)) However before_sleep always gets called synchronously (see [tenacity line](`aa6f8f0a24/tenacity/__init__.py (L338)`)). Instead, check for a running loop and use that if it exists. Of course, it's running an async method synchronously which is not _nice_. Given how important LLMs are, it may make sense to have a task list or something but I'd want to chat with @nfcampos on where that would live. This PR also fixes the unit tests to check the handler is called and to make sure the async test is run (it looks like it had just been skipped). It would have failed prior to the proposed fixes but passes now.	2023-08-03 15:02:16 -07:00
ruze	8ef7e14a85	RSS Feed / OPML loader (#8694 ) - Description: added a document loader for a list of RSS feeds or OPML. It iterates through the list and uses NewsURLLoader to load each article. - Issue: N/A - Dependencies: feedparser, listparser - Tag maintainer: @rlancemartin, @eyurtsev - Twitter handle: @ruze --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-03 14:58:06 -07:00
sumandeng	53e4148a1b	add model_revision parameter to ModelScopeEmbeddings (#8669 ) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-03 14:17:48 -07:00
Yoshi	4e8f11b36a	Deterministic Fake Embedding Model (#8706 ) Solves #8644 This embedding model outputs identical random embedding vectors when the input texts are identical. Useful in unit tests. @baskaryan	2023-08-03 13:36:45 -07:00
Leonid Kuligin	2928a1a3c9	added minimum expected version of SDK to the error description (#8712 ) #7932 Co-authored-by: Leonid Kuligin <kuligin@google.com>	2023-08-03 13:28:42 -07:00
Harrison Chase	814faa9de5	relax deps for yaml (#8713 ) context: https://github.com/yaml/pyyaml/issues/724 I think this is fine? I don't think we use yaml too heavily	2023-08-03 13:22:17 -07:00
Holt Skinner	8a8917e0d9	feat: Add Spell Correction Spec to Google Cloud Enterprise Search connector (#8705 )	2023-08-03 13:38:45 -04:00
Bagatur	b2b71b0d35	Bagatur/eden llm (#8670 ) Co-authored-by: RedhaWassim <rwasssim@gmail.com> Co-authored-by: KyrianC <ckyrian@protonmail.com> Co-authored-by: sam <melaine.samy@gmail.com>	2023-08-03 10:24:51 -07:00
William FH	8022293124	lint (#8702 )	2023-08-03 09:33:28 -07:00
axa99	1f54ec899b	updated interface jupyter notebook explanations (#8689 ) Updated the documentation in the interface.ipynb to clearly show the _input_ and _output_ types for various components @baskaryan	2023-08-03 11:53:31 -04:00
William FH	a137492b53	Permit none key in chain mapper (#8696 )	2023-08-03 08:50:36 -07:00
Bagatur	e283dc8d50	bump 251 (#8690 )	2023-08-03 06:28:36 -07:00
Eugene Yurtsev	81e0cbf2d5	Minor typo fix (#8657 ) Fix typo in doc-string.	2023-08-02 23:20:25 -07:00
Lance Martin	37aade19da	Minor formatting and additional figure for summarization use case (#8663 )	2023-08-02 21:52:29 -07:00
Harrison Chase	43dffe39fb	Harrison/conversational retrieval agent (#8639 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-02 18:05:15 -07:00
ruze	71f98db2fe	Newspaper (#8647 ) - Description: Added newspaper3k based news article loader. Provide a list of urls. - Issue: N/A - Dependencies: newspaper3k, - Tag maintainer: @rlancemartin , @eyurtsev - Twitter handle: @ruze --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-02 17:56:08 -07:00
shibuiwilliam	f68f3b23d7	add missing RemoteLangChainRetriever _get_relevant_documents test (#8628 ) # What - Add missing RemoteLangChainRetriever _get_relevant_documents test --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-02 17:20:40 -07:00
William FH	206901fa01	Use salt instead of datetime (#8653 ) If you want to kick off two runs at the same time it'll cause errors. Use a uuid instead	2023-08-02 17:15:50 -07:00
William FH	7ea2b08d1f	Use call directly for chain (#8655 ) for run_on_dataset since the `run()` method requires a single output	2023-08-02 17:11:39 -07:00
William FH	368aa4ede7	fix enum error message (#8652 ) could be a string so don't directly call value	2023-08-02 17:11:27 -07:00
millerick	5018af8839	docs: fix some grammar (#8654 ) ### Description Fixes a grammar issue I noticed when reading through the documentation. ### Maintainers @baskaryan Co-authored-by: mmillerick <mmillerick@blend.com>	2023-08-02 16:48:01 -07:00
Erick Friis	96b0ff182e	Enterprise support form wording (#8641 )	2023-08-02 15:18:20 -07:00
Lance Martin	59194c2214	Add summarization use-case (#8376 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-02 14:25:11 -07:00
Will Thompson	ee1d13678e	🐛 Docs Fixes [2 one-liners, examples broken] (#8519 ) ## Description: 1)Map reduce example in docs is missing an important import statement. Figured other people would benefit from being able to copy 🍝 the code. 2)RefineDocumentsChain example also broken. ## Issue: None ## Dependencies: None. One liner. ## Tag maintainer: @baskaryan ## Twitter handle: I mean, it's a one line fix lol. But @will_thompson_k is my twitter handle.	2023-08-02 13:39:41 -07:00
Leonid Ganeline	1335f2b9f8	`MLflow` examples (#8642 ) Updated `MLflow` examples with links to the examples from MLflow @baskaryan	2023-08-02 13:30:28 -07:00
Kacper Łukawski	16551536e3	Refactor Qdrant integration (#8634 ) This small PR introduces new parameters into Qdrant (`on_disk`), fixes some tests and changes the error message to be more clear. Tagging: @baskaryan, @rlancemartin, @eyurtsev	2023-08-02 10:30:18 -07:00
Erick Friis	c5fb3b6069	Enterprise support form in airtable (#8607 )	2023-08-02 09:49:59 -07:00
Eugene Yurtsev	1ec0b18379	Re-add __add__ functionality for messages (revert #8245 ) (#8489 ) This PR reverts #8245, so `__add__` is defined on base messages. Resolves issue: https://github.com/langchain-ai/langchain/issues/8472	2023-08-02 10:51:44 -04:00
Bagatur	f31047a394	bump 250 (#8632 )	2023-08-02 07:47:36 -07:00
Comendeiro	5c516945d0	Add local support for audio models (PR #7329 ) (#7591 ) - Description: run the poetry dependencies - Issue: #7329 - Tag maintainer: @rlancemartin --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-02 01:24:53 -07:00
Naveen Tatikonda	d2adec3818	[Opensearch] : Fix the service validation in http_auth (#8609 ) ### Description OpenSearch supports validation using both Master Credentials (Username and password) and IAM. For Master Credentials users will not pass the argument `service` in `http_auth` and the existing code will break. To fix this, I have updated the condition to check if service attribute is present in http_auth before accessing it. ### Maintainers @baskaryan @navneet1v Signed-off-by: Naveen Tatikonda <navtat@amazon.com>	2023-08-02 01:16:38 -07:00
Harrison Chase	7c5c0557cb	cast to string when measuring token length (#8617 )	2023-08-02 00:12:59 -07:00
rjanardhan3	68113348cc	Fireworks integration (#8322 ) Description - Integrates Fireworks within Langchain LLMs to allow users to use Fireworks models with Langchain, mainly for summarization. Issue - Not applicable Dependencies - None Tag maintainer - @rlancemartin --------- Co-authored-by: Raj Janardhan <rajjanardhan@Rajs-Laptop.attlocal.net>	2023-08-01 21:17:26 -07:00
Bagatur	b574507c51	normalized openai embeddings embed_query (#8604 ) we weren't normalizing when embedding queries	2023-08-01 17:12:10 -07:00
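
The b574507c51 entry above says that query embeddings were not being normalized. The commit message does not say which norm is meant, so the following is only a minimal sketch assuming unit L2 length is the goal. `l2_normalize` is a hypothetical helper written for illustration, not LangChain's code.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit L2 length (hypothetical helper, not LangChain's code)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction; return a copy unchanged.
        return list(vector)
    return [x / norm for x in vector]


normalized = l2_normalize([3.0, 4.0])
length = math.sqrt(sum(x * x for x in normalized))
```

With unit-length vectors, a dot product between a query and a document embedding equals their cosine similarity, which is one reason queries and documents are usually normalized the same way.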

3547 Commits
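
The 4e8f11b36a entry describes an embedding model that returns identical random vectors for identical input texts. One way to get that behavior, sketched here under the assumption that the random generator is seeded from a hash of the text (the commit message does not describe the actual mechanism), is shown below. `fake_embedding` and its `size` parameter are hypothetical names.

```python
import hashlib
import random
from typing import List


def fake_embedding(text: str, size: int = 8) -> List[float]:
    """Return a pseudo-random vector that depends only on `text` (illustrative sketch)."""
    # Derive a stable integer seed from the text; the built-in hash() is
    # salted per process for strings, so a cryptographic digest is used instead.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = fake_embedding("hello")
same_b = fake_embedding("hello")
other = fake_embedding("goodbye")
```

Because the vector depends only on the text, a unit test can assert on retrieval results without calling a real embedding service.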