langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-08 07:10:35 +00:00

Author	SHA1	Message	Date
Harrison Chase	0adc282d70	Harrison/as retriever docstring (#8840) Co-authored-by: Bytestorm <31070777+Bytestorm5@users.noreply.github.com>	2023-08-06 17:00:57 -07:00
Zend	bd4865b6fe	Async Recursive URL loader (#8502) Description: This PR improves recursive_url_loader by adding a limit on access depth and customizable extractors (which turn the raw webpage into the text of the Document object), so that users can plug in other tools to extract the webpage. This PR also includes documentation and tests for the new loader. The old PR (#7756) was closed due to a project structure change. Because socket requests are not allowed, the old unit test was removed. Issue: N/A Dependencies: asyncio, aiohttp Tag maintainer: @rlancemartin Twitter handle: @Zend_Nihility --------- Co-authored-by: Lance Martin <lance@langchain.dev>	2023-08-06 16:22:31 -07:00
fqassemi	485d716c21	Feature faiss delete (#8135) - Description: docstore had two main methods, add and search; however, working with a docstore sometimes requires deleting an entry from it. So I have added a simple delete method that deletes items from the docstore. Additionally, I have added the delete method to the FAISS vectorstore for the same reason. - Issue: NA - Dependencies: NA - Tag maintainer: @rlancemartin, @eyurtsev --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-06 15:46:30 -07:00
Nicolas	b57fa1a39c	docs: Improvements on Mendable Search (#8808) - Balancing prioritization between keyword / AI search - Show snippets of highlighted keywords when searching - Improved keyword search - Fixed bugs and issues Shoutout to @calebpeffer for implementing and gathering feedback on it cc: @dev2049 @rlancemartin @hwchase17	2023-08-06 15:32:06 -07:00
Ikko Eltociear Ashimine	6b93670410	Fix typo in long_context_reorder.ipynb (#8811) begining -> beginning	2023-08-06 15:31:38 -07:00
Harrison Chase	2bb1d256f3	add example of memory and returning retrieved docs (#8830)	2023-08-06 15:25:12 -07:00
Kshitij Wadhwa	5f1aab5487	Fix docs for Rockset (#8807) * remove error output for notebook * add comment about vector length for ingest transformation * change OPENAI_KEY -> OPENAI_API_KEY cc @baskaryan	2023-08-06 15:04:01 -07:00
Bagatur	d7b613a293	Bagatur/revert revert nuclia (#8833)	2023-08-06 11:24:36 -07:00
Bagatur	2f309a4ce6	Revert "Bagatur/nuclia (#8404)" (#8832)	2023-08-06 11:14:01 -07:00
Snehil Kumar	1bd4890506	Update links on QA Use Case docs (#8784) - Description: Two links on the Question Answering Use Cases documentation page were not working, so they were changed to the nearest useful links. - Issue: NA - Dependencies: NA - Tag maintainer: @baskaryan - Twitter handle: NA	2023-08-05 17:30:56 -07:00
Bal Narendra Sapa	a22d502248	added the embeddings part (#8805) Description: Forgot to add the embeddings part to the documentation. Sorry 😅 @baskaryan	2023-08-05 17:16:33 -07:00
Bagatur	9fc9018951	Bagatur/nuclia (#8404) Co-authored-by: Eric BREHAULT <ebrehault@gmail.com>	2023-08-05 10:44:43 -07:00
Francisco Ingham	ef5bc1fef1	Refactor for extraction docs (#8465) Refactor for the extraction use case documentation --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Lance Martin <lance@langchain.dev>	2023-08-05 10:09:14 -07:00
Bagatur	21771a6f1c	rm sklearn links (#8773)	2023-08-04 14:28:00 -07:00
Joshua Carroll	e5fed7d535	Extend the StreamlitChatMessageHistory docs with a fuller example and… (#8774) Add more details to the [notebook for StreamlitChatMessageHistory](https://python.langchain.com/docs/integrations/memory/streamlit_chat_message_history), including a link to a [running example app](https://langchain-st-memory.streamlit.app/). Original PR: https://github.com/langchain-ai/langchain/pull/8497	2023-08-04 14:27:46 -07:00
Eugene Yurtsev	19dfe166c9	Update documentation for prompts (#8381) * Documentation to favor creation without declaring input_variables * Cut out obvious examples, but add more description in a few places --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2023-08-04 14:25:03 -07:00
Dayou Liu	91a0817e39	docs: llamacpp minor fixes (#8738) - Description: minor updates to the llama cpp doc	2023-08-04 14:19:43 -07:00
Eugene Yurtsev	003e1ca9a0	Update api references (#8646) Update API reference documentation. This PR will pick up a number of missing classes; it also applies selective formatting based on the class / object type.	2023-08-04 16:10:58 -04:00
Snehil Kumar	a6ee646ef3	Update get_started.mdx (#8744) - Description: Added a missing word and rearranged a sentence in the Self Query Retrievers documentation. - Issue: NA - Dependencies: NA - Tag maintainer: @baskaryan - Twitter handle: NA Thanks for your time.	2023-08-04 15:32:19 -04:00
Bal Narendra Sapa	bd61757423	add documentation for serializer function (#8769) Description: Added necessary documentation for serializer functions @baskaryan	2023-08-04 14:39:40 -04:00
rjanardhan3	affaaea87b	Updates fireworks (#8765) - Description: Updates to Fireworks documentation - Issue: N/A - Dependencies: N/A - Tag maintainer: @rlancemartin --------- Co-authored-by: Raj Janardhan <rajjanardhan@Rajs-Laptop.attlocal.net>	2023-08-04 10:32:22 -07:00
Bagatur	8c35fcb571	update rss doc (#8761)	2023-08-04 08:25:20 -07:00
Bagatur	0d5a90f30a	Revert "add filter to sklearn vector store functions (#8113)" (#8760)	2023-08-04 08:13:32 -07:00
Lance Martin	be638ad77d	Chatbots use case (#8554) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-04 07:02:14 -07:00
Ruiqi Guo	6aee589eec	Add ScaNN support in vectorstore. (#8251) Description: Add ScaNN vectorstore to langchain. ScaNN is an open-source, high-performance vector similarity library optimized for AVX2-enabled CPUs. https://github.com/google-research/google-research/tree/master/scann - Dependencies: scann - Python notebook illustrating usage: docs/extras/integrations/vectorstores/scann.ipynb - Integration test: libs/langchain/tests/integration_tests/vectorstores/test_scann.py @rlancemartin, @eyurtsev for review. Thanks!	2023-08-03 23:41:30 -07:00
shibuiwilliam	0f0ccfe7f6	add filter to sklearn vector store functions (#8113) # What - This adds a filter option to the sklearn vector store functions. - Issue: None - Dependencies: None - Tag maintainer: @rlancemartin, @eyurtsev - Twitter handle: @MlopsJ --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-03 23:06:41 -07:00
shibuiwilliam	2759e2d857	add save and load tfidf vectorizer and docs for TFIDFRetriever (#8112) This adds save_local and load_local to tfidf_vectorizer, plus docs in tfidf_retriever, so that the vectorizer can be reused. - Issue: None - Dependencies: None - Tag maintainer: @rlancemartin, @eyurtsev - Twitter handle: @MlopsJ --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-03 23:06:27 -07:00
Lance Martin	d1b95db874	Retriever that can re-phrase user inputs (#8026) Simple retriever that applies an LLM between the user input and the query passed to the retriever. It can be used to pre-process the user input in any way. The default prompt: ``` DEFAULT_QUERY_PROMPT = PromptTemplate( input_variables=["question"], template="""You are an assistant tasked with taking a natural languge query from a user and converting it into a query for a vectorstore. In this process, you strip out information that is not relevant for the retrieval task. Here is the user query: {question} """ ) ``` --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-03 21:23:59 -07:00
Harrison Chase	6c3573e7f6	Harrison/aleph alpha (#8735) Co-authored-by: PiotrMazurek <piotr.mazurek@aleph-alpha.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-03 21:21:15 -07:00
Ilya	6f0bccfeb5	Add regex control over separators in character text splitter (#7933) #7854 Added the ability to use the `separator` as a regex or a simple character. Fixed a bug where `start_index` was incorrectly counting from -1. Who can review? @eyurtsev @hwchase17 @mmz-001	2023-08-03 20:25:23 -07:00
Ofer Mendelevitch	29f51055e8	Updates to Vectara documentation (#8699) - Description: updates to Vectara documentation with more details on how to get started. - Issue: NA - Dependencies: NA - Tag maintainer: @rlancemartin, @eyurtsev - Twitter handle: @vectara, @ofermend --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-03 20:21:17 -07:00
ruze	8ef7e14a85	RSS Feed / OPML loader (#8694) - Description: added a document loader for a list of RSS feeds or OPML. It iterates through the list and uses NewsURLLoader to load each article. - Issue: N/A - Dependencies: feedparser, listparser - Tag maintainer: @rlancemartin, @eyurtsev - Twitter handle: @ruze --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-03 14:58:06 -07:00
Bagatur	b2b71b0d35	Bagatur/eden llm (#8670) Co-authored-by: RedhaWassim <rwasssim@gmail.com> Co-authored-by: KyrianC <ckyrian@protonmail.com> Co-authored-by: sam <melaine.samy@gmail.com>	2023-08-03 10:24:51 -07:00
axa99	1f54ec899b	updated interface jupyter notebook explanations (#8689) Updated the documentation in interface.ipynb to clearly show the _input_ and _output_ types for various components @baskaryan	2023-08-03 11:53:31 -04:00
Lance Martin	37aade19da	Minor formatting and additional figure for summarization use case (#8663)	2023-08-02 21:52:29 -07:00
Harrison Chase	43dffe39fb	Harrison/conversational retrieval agent (#8639) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-02 18:05:15 -07:00
ruze	71f98db2fe	Newspaper (#8647) - Description: Added a newspaper3k-based news article loader that takes a list of URLs. - Issue: N/A - Dependencies: newspaper3k - Tag maintainer: @rlancemartin, @eyurtsev - Twitter handle: @ruze --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-02 17:56:08 -07:00
millerick	5018af8839	docs: fix some grammar (#8654) ### Description Fixes a grammar issue I noticed when reading through the documentation. ### Maintainers @baskaryan Co-authored-by: mmillerick <mmillerick@blend.com>	2023-08-02 16:48:01 -07:00
Lance Martin	59194c2214	Add summarization use-case (#8376) Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-02 14:25:11 -07:00
Will Thompson	ee1d13678e	🐛 Docs Fixes [2 one-liners, examples broken] (#8519) ## Description: 1) The map-reduce example in the docs is missing an important import statement. Figured other people would benefit from being able to copy 🍝 the code. 2) The RefineDocumentsChain example is also broken. ## Issue: None ## Dependencies: None. One-liner. ## Tag maintainer: @baskaryan ## Twitter handle: I mean, it's a one-line fix lol. But @will_thompson_k is my twitter handle.	2023-08-02 13:39:41 -07:00
Leonid Ganeline	1335f2b9f8	`MLflow` examples (#8642) Updated `MLflow` examples with links to the examples from MLflow @baskaryan	2023-08-02 13:30:28 -07:00
Comendeiro	5c516945d0	Add local support for audio models (PR #7329) (#7591) - Description: run the poetry dependencies - Issue: #7329 - Tag maintainer: @rlancemartin --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-02 01:24:53 -07:00
rjanardhan3	68113348cc	Fireworks integration (#8322) Description - Integrates Fireworks into LangChain LLMs so that users can use Fireworks models with LangChain, mainly for summarization. Issue - Not applicable Dependencies - None Tag maintainer - @rlancemartin --------- Co-authored-by: Raj Janardhan <rajjanardhan@Rajs-Laptop.attlocal.net>	2023-08-01 21:17:26 -07:00
Joshua Carroll	6705928b9d	Add StreamlitChatMessageHistory (#8497) Add a StreamlitChatMessageHistory class that stores chat messages in [Streamlit's Session State](https://docs.streamlit.io/library/api-reference/session-state). Note: The integration test uses a currently experimental Streamlit testing framework to simulate the execution of a Streamlit app. Marking this PR as draft until I confirm with the Streamlit team that we're comfortable supporting it. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-01 14:28:15 -07:00
Matt Robinson	8961c720b8	docs: update `unstructured` install instructions (#8596) ### Summary Updates the `unstructured` install instructions. For `unstructured>=0.9.0`, dependencies are broken out by document type and the base `unstructured` package includes fewer dependencies. `pip install "unstructured[local-inference]"` has been replaced by `pip install "unstructured[all-docs]"`, though the `local-inference` extra is still supported for the time being. ### Reviewers - @rlancemartin - @eyurtsev - @hwchase17	2023-08-01 14:17:49 -07:00
Bagatur	73072d3db8	mv (#8595)	2023-08-01 14:17:04 -07:00
Tesfagabir Meharizghi	a7000ee89e	Callback handler for Amazon SageMaker Experiments (#8587) ## Description This PR implements a callback handler for SageMaker Experiments, similar to the one for MLflow. * The callback handler takes the experiment's run object as an argument when it is created. All the callback outputs are then logged to the run object. * The output of each callback action (e.g., `on_llm_start`) is saved to an S3 bucket as a JSON file. * Optionally, you can also log additional information such as the LLM hyper-parameters to the same run object. * Once the callback object is no longer needed, call the `flush_tracker()` method. This makes sure that any intermediate files are deleted. * A separate notebook example is provided to show how the callback is used. @3coins @agola11 --------- Co-authored-by: Tesfagabir Meharizghi <mehariz@amazon.com>	2023-08-01 13:47:08 -07:00
mpb159753	7df2dfc4c2	Add Support for Loading Documents from Huawei OBS (#8573) Description: This PR adds support for loading documents from Huawei OBS (Object Storage Service) in Langchain. OBS is a cloud-based object storage service provided by Huawei Cloud. With this enhancement, Langchain users can now easily access and load documents stored in Huawei OBS directly into the system. Key Changes: - Added a new document loader module specifically for Huawei OBS integration. - Implemented the necessary logic to authenticate and connect to Huawei OBS using access credentials. - Enabled the loading of individual documents from a specified bucket and object key in Huawei OBS. - Provided the option to specify custom authentication information or obtain security tokens from Huawei Cloud ECS for easy access. How to Test: 1. Ensure the required package "esdk-obs-python" is installed. 2. Configure the endpoint, access key, secret key, and bucket details for Huawei OBS in the Langchain settings. 3. Load documents from Huawei OBS using the updated document loader module. 4. Verify that documents are successfully retrieved and loaded into Langchain for further processing. Please review this PR and let us know if any further improvements are needed. Your feedback is highly appreciated! @rlancemartin, @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-01 09:30:30 -07:00
Harrison Chase	66226d1d4d	add example for memory (#8552)	2023-08-01 01:10:19 -07:00
Shantanu Nair	53f3793504	Fast load conversationsummarymemory from existing summary (#7533) - Description: Adds an optional buffer arg to the memory's from_messages() method. If provided, the existing summary is loaded instead of a new summary being regenerated from the loaded messages. Why? If we have past messages to load from, it is likely we also have an existing summary. This is particularly helpful in cases where the chat is ephemeral and/or backed by serverless, where the chat history is not stored but the updated chat history is passed back and forth between a backend and frontend. E.g.: Take a stateless QA backend implementation that loads messages on every request and generates a response. Without this addition, each time the messages are loaded via from_messages, the summaries are recomputed even though they may have just been computed during the previous response. With this, the previously computed summary can be passed in, avoiding: 1) spending extra $$$ on tokens, and 2) increased response time from regenerating a previously generated summary. Tag maintainer: @hwchase17 Twitter handle: https://twitter.com/ShantanuNair --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-07-31 18:14:11 -07:00

1624 Commits