Hello LangChain maintainers,
this PR aims to integrate
[vllm](https://vllm.readthedocs.io/en/latest/#) into LangChain. This PR
closes #8729.
This feature depends on `vllm`, but I've seen that other supported models
depend on packages that are not included in
pyproject.toml (e.g. `gpt4all`, `text-generation`), so I assumed the same
approach applies here.
@hwchase17, @baskaryan
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- Updated to use the newer, better function interaction
- The previous version had only one callback
- @hinthornw @hwchase17 Can you look into this?
- Shout out to @MultiON_AI @DivGarg9 on Twitter
---------
Co-authored-by: Naman Garg <ngarg3@binghamton.edu>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Description: This PR improves recursive_url_loader by adding a limit on
crawl depth and customizable extractors (functions that convert the raw
webpage into the text of the Document object), so that users can plug in
other tools to extract the webpage content. This PR also includes
documentation and tests for the new loader.
The old PR (#7756) was closed due to a project structure change.
Because socket requests are not allowed, the old unit test was removed.
Issue: N/A
Dependencies: asyncio, aiohttp
Tag maintainer: @rlancemartin
Twitter handle: @Zend_Nihility
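The depth limit and pluggable extractor can be illustrated with a small, self-contained sketch. It is not the loader's actual implementation: `crawl`, `strip_tags`, and the in-memory `PAGES` site are hypothetical, and a dictionary lookup stands in for HTTP requests so the sketch runs without sockets. The merged loader's own parameter names and depth semantics may differ.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Set, Tuple
from urllib.parse import urljoin


class _LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


class _TextParser(HTMLParser):
    """Collects the non-empty text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())


def strip_tags(html: str) -> str:
    """Default extractor: raw HTML in, plain text out."""
    parser = _TextParser()
    parser.feed(html)
    return " ".join(parser.chunks)


def crawl(
    start_url: str,
    fetch: Callable[[str], str],
    extractor: Callable[[str], str] = strip_tags,
    max_depth: int = 2,
) -> List[Tuple[str, str]]:
    """Breadth-first crawl below start_url; max_depth=1 visits only the start page."""
    visited: Set[str] = set()
    results: List[Tuple[str, str]] = []
    frontier = [start_url]
    for _ in range(max_depth):
        next_frontier: List[str] = []
        for url in frontier:
            if url in visited:
                continue
            visited.add(url)
            html = fetch(url)
            results.append((url, extractor(html)))  # pluggable text extraction
            links = _LinkParser()
            links.feed(html)
            for href in links.links:
                child = urljoin(url, href)
                # Stay under the start URL and skip pages already seen.
                if child.startswith(start_url) and child not in visited:
                    next_frontier.append(child)
        frontier = next_frontier
    return results


# Hypothetical in-memory site used instead of real HTTP requests.
PAGES: Dict[str, str] = {
    "https://example.com/docs/": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "https://example.com/docs/a.html": '<p>Page A</p><a href="c.html">C</a>',
    "https://example.com/docs/b.html": "<p>Page B</p>",
    "https://example.com/docs/c.html": "<p>Page C</p>",
}

shallow = crawl("https://example.com/docs/", PAGES.__getitem__, max_depth=2)
deep = crawl("https://example.com/docs/", PAGES.__getitem__, max_depth=3)
# A custom extractor replaces strip_tags entirely.
custom = crawl("https://example.com/docs/", PAGES.__getitem__, extractor=str.upper, max_depth=1)
```

With `max_depth=2` the sketch returns the start page and its direct children; raising it to 3 also reaches `c.html`.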
---------
Co-authored-by: Lance Martin <lance@langchain.dev>
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: docstore had two main methods, add and search; however,
working with a docstore sometimes requires deleting an entry. So I have
added a simple delete method that removes items from the docstore. I have
also added a delete method to the FAISS vectorstore for the same
reason.
- Issue: NA
- Dependencies: NA
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
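A minimal sketch of `add`/`search`/`delete` on a dictionary-backed docstore. `DictDocstore` is hypothetical, and raising `ValueError` for unknown IDs (before deleting anything) is a design choice of this sketch, not necessarily what the merged `delete` does. A FAISS `delete` additionally has to remove the vectors from the index and the ID mappings, which is not shown.

```python
from typing import Dict, List, Optional


class DictDocstore:
    """Hypothetical in-memory docstore mapping IDs to document texts."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def add(self, texts: Dict[str, str]) -> None:
        overlapping = set(texts) & set(self._docs)
        if overlapping:
            raise ValueError(f"Tried to add IDs that already exist: {sorted(overlapping)}")
        self._docs.update(texts)

    def search(self, search: str) -> Optional[str]:
        return self._docs.get(search)

    def delete(self, ids: List[str]) -> None:
        """Remove every ID in `ids`; nothing is deleted if any ID is missing."""
        missing = [doc_id for doc_id in ids if doc_id not in self._docs]
        if missing:
            raise ValueError(f"Tried to delete IDs that do not exist: {missing}")
        for doc_id in set(ids):  # set() tolerates repeated IDs in the request
            del self._docs[doc_id]


docstore = DictDocstore()
docstore.add({"a": "first doc", "b": "second doc"})
docstore.delete(["a"])

try:
    docstore.delete(["b", "missing"])
    missing_raised = False
except ValueError:
    missing_raised = True
```

Checking all IDs before deleting keeps the store unchanged when a request is partly invalid, so `"b"` survives the failed call above.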
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
begining -> beginning
- Description: two links on the Question Answering Use Cases
documentation page were broken, so I changed them to the nearest useful links.
- Issue: NA
- Dependencies: NA
- Tag maintainer: @baskaryan
- Twitter handle: NA
Refactor the extraction use case documentation
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Lance Martin <lance@langchain.dev>
Description: Add the ScaNN vectorstore to LangChain.
ScaNN is an open-source, high-performance vector similarity search library
optimized for AVX2-enabled CPUs.
https://github.com/google-research/google-research/tree/master/scann
- Dependencies: scann
Python notebook to illustrate the usage:
docs/extras/integrations/vectorstores/scann.ipynb
Integration test:
libs/langchain/tests/integration_tests/vectorstores/test_scann.py
@rlancemartin, @eyurtsev for review.
Thanks!
# What
- Add a filter option to the sklearn vector store functions
- Description: Add a filter to the sklearn vector store functions.
- Issue: None
- Dependencies: None
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: @MlopsJ
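The sketch below shows one way a metadata filter can combine with exact (brute-force) similarity search. It assumes the filter is a dict whose key/value pairs must all match a document's metadata, which is how `filter` behaves in some other LangChain vector stores such as FAISS. `similarity_search` here is a hypothetical stand-alone function in plain Python, not `SKLearnVectorStore`'s code, and the merged change may apply the filter differently (for example after fetching extra candidates).

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Tuple

# (text, metadata, embedding) triples; the embeddings are made-up 2-D vectors.
Doc = Tuple[str, Dict[str, Any], List[float]]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(
    query: Sequence[float],
    docs: Sequence[Doc],
    k: int = 4,
    filter: Optional[Dict[str, Any]] = None,  # named to mirror the vector store API
) -> List[str]:
    """Return the texts of the k most similar docs whose metadata matches `filter`."""

    def matches(metadata: Dict[str, Any]) -> bool:
        return filter is None or all(metadata.get(key) == value for key, value in filter.items())

    # Filter first, then rank, so all k results come from matching documents.
    scored = [(_cosine(query, vec), text) for text, meta, vec in docs if matches(meta)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


SAMPLE_DOCS: List[Doc] = [
    ("apple", {"source": "a"}, [1.0, 0.0]),
    ("banana", {"source": "b"}, [0.9, 0.1]),
    ("cherry", {"source": "a"}, [0.0, 1.0]),
]

unfiltered = similarity_search([1.0, 0.0], SAMPLE_DOCS, k=2)
only_a = similarity_search([1.0, 0.0], SAMPLE_DOCS, k=2, filter={"source": "a"})
only_b = similarity_search([1.0, 0.0], SAMPLE_DOCS, k=3, filter={"source": "b"})
```

Without a filter, "banana" ranks second; with `{"source": "a"}` it is excluded and the less similar "cherry" takes its place.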
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
This adds save_local and load_local to tfidf_vectorizer, plus docs in
tfidf_retriever, so that a fitted vectorizer can be reused.
- Description: add save_local and load_local to tfidf_vectorizer and
docs in tfidf_retriever
- Issue: None
- Dependencies: None
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: @MlopsJ
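Persisting a fitted vectorizer means saving its learned state (here, the IDF weights per term) so it can be reloaded without refitting. The sketch uses a hypothetical, dependency-free `TinyTfidf` class and a JSON file; the method names mirror `save_local`/`load_local` from this change, but their signatures here, the file layout, and the serialization format are assumptions, and the merged code may serialize the scikit-learn vectorizer differently (for example with pickle).

```python
import json
import math
import os
import tempfile
from collections import Counter
from typing import Dict, List, Optional


class TinyTfidf:
    """Hypothetical minimal TF-IDF model, used only to illustrate persistence."""

    def __init__(self, idf: Optional[Dict[str, float]] = None) -> None:
        self.idf: Dict[str, float] = dict(idf or {})

    def fit(self, corpus: List[str]) -> "TinyTfidf":
        n_docs = len(corpus)
        doc_freq = Counter(term for doc in corpus for term in set(doc.lower().split()))
        # Smoothed IDF: the formula scikit-learn uses with smooth_idf=True.
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
        return self

    def transform(self, text: str) -> Dict[str, float]:
        counts = Counter(text.lower().split())
        return {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}

    def save_local(self, folder_path: str, file_name: str = "tfidf") -> None:
        os.makedirs(folder_path, exist_ok=True)
        with open(os.path.join(folder_path, f"{file_name}.json"), "w", encoding="utf-8") as f:
            json.dump(self.idf, f)

    @classmethod
    def load_local(cls, folder_path: str, file_name: str = "tfidf") -> "TinyTfidf":
        with open(os.path.join(folder_path, f"{file_name}.json"), encoding="utf-8") as f:
            return cls(json.load(f))


fitted = TinyTfidf().fit(["the cat sat", "the dog sat", "a cat ran"])
with tempfile.TemporaryDirectory() as tmp:
    fitted.save_local(tmp)
    reloaded = TinyTfidf.load_local(tmp)
```

JSON keeps the saved state human-readable and avoids unpickling untrusted files; pickle would be needed for objects that JSON cannot represent.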
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
A simple retriever that applies an LLM between the user input and the
query passed to the retriever.
It can be used to pre-process the user input in any way.
The default prompt:
```python
from langchain.prompts import PromptTemplate

DEFAULT_QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an assistant tasked with taking a natural language query from a user
and converting it into a query for a vectorstore. In this process, you strip out
information that is not relevant for the retrieval task. Here is the user query: {question} """,
)
```
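The flow this retriever implements (format the prompt, call the LLM, pass the rewritten query to the underlying retriever) can be sketched without any LangChain objects. Here `llm` and `retriever` are plain callables, and `fake_llm`, `keyword_retriever`, and the filler-word list are hypothetical stand-ins so the example runs offline; a real setup would use an actual LLM and a vector store retriever.

```python
from typing import Callable, List

QUERY_TEMPLATE = (
    "You are an assistant tasked with taking a natural language query from a user "
    "and converting it into a query for a vectorstore. In this process, you strip out "
    "information that is not relevant for the retrieval task. Here is the user query: {question} "
)


def rewrite_then_retrieve(
    question: str,
    llm: Callable[[str], str],
    retriever: Callable[[str], List[str]],
    template: str = QUERY_TEMPLATE,
) -> List[str]:
    """Ask the LLM to rewrite the question, then retrieve with the rewritten query."""
    rewritten = llm(template.format(question=question)).strip()
    return retriever(rewritten)


# Hypothetical stand-ins so the sketch runs without a model or vector store.
FILLER = {"hi", "please", "can", "you", "tell", "me", "about", "thanks"}
seen_queries: List[str] = []


def fake_llm(prompt: str) -> str:
    question = prompt.split("Here is the user query:", 1)[1]
    words = question.lower().replace(",", " ").split()
    return " ".join(w for w in words if w not in FILLER)


def keyword_retriever(query: str) -> List[str]:
    seen_queries.append(query)  # record what the retriever actually received
    corpus = ["vllm serves models fast", "faiss supports delete", "scann is a vector library"]
    return [doc for doc in corpus if any(word in doc for word in query.split())]


results = rewrite_then_retrieve("Hi, can you tell me about FAISS delete please", fake_llm, keyword_retriever)
```

The retriever only ever sees the rewritten query ("faiss delete"), never the raw user input.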
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- Description: updates to Vectara documentation with more details on how
to get started.
- Issue: NA
- Dependencies: NA
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: @vectara, @ofermend
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: added a document loader for a list of RSS feeds or an OPML
file. It iterates through the list and uses NewsURLLoader to load each
article.
- Issue: N/A
- Dependencies: feedparser, listparser
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: @ruze
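The loader's control flow (read the feed URLs from an OPML file, read each feed, and collect every article URL for an article loader) can be sketched with the standard library's XML parser. This is an illustration only: the merged loader depends on `feedparser` and `listparser`, which are far more tolerant of real-world feeds (Atom, malformed XML); the helper names here are hypothetical, the inline XML stands in for downloaded feeds, and the final NewsURLLoader step is omitted.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def feed_urls_from_opml(opml_text: str) -> List[str]:
    """Return the xmlUrl of every <outline> entry in an OPML document."""
    root = ET.fromstring(opml_text)
    return [node.attrib["xmlUrl"] for node in root.iter("outline") if "xmlUrl" in node.attrib]


def article_links_from_rss(rss_text: str) -> List[str]:
    """Return the <link> of every <item> in an RSS 2.0 feed (the channel link is skipped)."""
    root = ET.fromstring(rss_text)
    links = (item.findtext("link") for item in root.iter("item"))
    return [link.strip() for link in links if link]


def collect_article_links(feed_urls: List[str], fetch: Callable[[str], str]) -> List[str]:
    """Iterate over the feeds; each returned URL would then go to an article loader."""
    links: List[str] = []
    for url in feed_urls:
        links.extend(article_links_from_rss(fetch(url)))
    return links


OPML = """<opml version="2.0"><body>
  <outline text="Tech" xmlUrl="https://example.com/tech.xml"/>
  <outline text="Science" xmlUrl="https://example.com/science.xml"/>
</body></opml>"""

# Hypothetical feeds keyed by URL, used instead of network requests.
FEEDS: Dict[str, str] = {
    "https://example.com/tech.xml": """<rss version="2.0"><channel>
      <title>Tech</title><link>https://example.com/tech</link>
      <item><title>One</title><link>https://example.com/tech/1</link></item>
      <item><title>Two</title><link>https://example.com/tech/2</link></item>
    </channel></rss>""",
    "https://example.com/science.xml": """<rss version="2.0"><channel>
      <item><title>Three</title><link>https://example.com/science/3</link></item>
    </channel></rss>""",
}

article_urls = collect_article_links(feed_urls_from_opml(OPML), FEEDS.__getitem__)
```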
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: run the poetry dependencies
- Issue: #7329
- Tag maintainer: @rlancemartin
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description - Integrates Fireworks into LangChain LLMs so that users can
use Fireworks models with LangChain, mainly for summarization.
Issue - Not applicable
Dependencies - None
Tag maintainer - @rlancemartin
---------
Co-authored-by: Raj Janardhan <rajjanardhan@Rajs-Laptop.attlocal.net>
Add a StreamlitChatMessageHistory class that stores chat messages in
[Streamlit's Session
State](https://docs.streamlit.io/library/api-reference/session-state).
Note: The integration test uses a currently experimental Streamlit
testing framework to simulate the execution of a Streamlit app. I'm marking
this PR as a draft until I confirm with the Streamlit team that we're
comfortable supporting it.
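Streamlit reruns the whole script on every interaction, so ordinary Python objects are recreated each time, while `st.session_state` persists for the user's session. The sketch below shows the storage idea with a plain dictionary standing in for `st.session_state`; `SessionStateChatHistory`, its methods, and the `"langchain_messages"` key are illustrative assumptions rather than the new class's actual interface.

```python
from typing import Any, Dict, List, MutableMapping


class SessionStateChatHistory:
    """Hypothetical chat history that keeps its messages in a session-state mapping."""

    def __init__(self, session_state: MutableMapping[str, Any], key: str = "langchain_messages") -> None:
        self._state = session_state
        self._key = key
        # Initialise only on the first run; later reruns reuse the stored list.
        self._state.setdefault(self._key, [])

    @property
    def messages(self) -> List[Dict[str, str]]:
        return self._state[self._key]

    def add_message(self, role: str, content: str) -> None:
        self._state[self._key].append({"role": role, "content": content})

    def clear(self) -> None:
        self._state[self._key] = []


session_state: Dict[str, Any] = {}  # stand-in for st.session_state

# First script run: a new history object records an exchange.
history = SessionStateChatHistory(session_state)
history.add_message("human", "Hi!")
history.add_message("ai", "Hello, how can I help?")

# Simulated rerun: a fresh object sees the same messages.
history_after_rerun = SessionStateChatHistory(session_state)
```

Because the messages live in the session mapping rather than on the object, recreating the history on each rerun loses nothing.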
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
### Summary
Updates the `unstructured` install instructions. For
`unstructured>=0.9.0`, dependencies are broken out by document type and
the base `unstructured` package includes fewer dependencies.
`pip install "unstructured[local-inference]"` has been replaced by
`pip install "unstructured[all-docs]"`, though the `local-inference` extra is
still supported for the time being.
### Reviewers
- @rlancemartin
- @eyurtsev
- @hwchase17
## Description
This PR implements a callback handler for SageMaker Experiments, similar
to the existing MLflow callback handler.
* When creating the callback handler, you pass the experiment's run
object as an argument. All callback outputs are then logged to that
run object.
* The output of each callback action (e.g., `on_llm_start`) is saved to
an S3 bucket as a JSON file.
* Optionally, you can also log additional information, such as the LLM
hyperparameters, to the same run object.
* Once the callback object is no longer needed, call its
`flush_tracker()` method. This ensures that any intermediate files
are deleted.
* A separate example notebook shows how the callback is
used.
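The handler's lifecycle (write each callback event to a JSON file, log that file to the run, delete the local files in `flush_tracker()`) can be sketched without AWS. Everything here is hypothetical: `FakeRun` stands in for the SageMaker Experiments run (which would upload logged files to S3), `JsonLoggingHandlerSketch` is not the merged class, and its `on_llm_end` takes a plain string instead of LangChain's result object.

```python
import json
import os
import shutil
import tempfile
from typing import Any, Dict, List


class FakeRun:
    """Hypothetical run object; records logged files instead of uploading them."""

    def __init__(self) -> None:
        self.logged: List[Dict[str, Any]] = []

    def log_file(self, path: str) -> None:
        with open(path, encoding="utf-8") as f:
            self.logged.append(json.load(f))


class JsonLoggingHandlerSketch:
    """Writes one JSON file per callback event and logs it to the run."""

    def __init__(self, run: FakeRun) -> None:
        self.run = run
        self.tmp_dir = tempfile.mkdtemp()  # intermediate files live here
        self.step = 0

    def _log(self, action: str, payload: Dict[str, Any]) -> None:
        self.step += 1
        path = os.path.join(self.tmp_dir, f"{self.step:04d}_{action}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"action": action, "step": self.step, **payload}, f)
        self.run.log_file(path)

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
        self._log("on_llm_start", {"prompts": prompts})

    def on_llm_end(self, text: str, **kwargs: Any) -> None:
        self._log("on_llm_end", {"text": text})

    def flush_tracker(self) -> None:
        """Delete the intermediate files once the handler is no longer needed."""
        shutil.rmtree(self.tmp_dir, ignore_errors=True)


experiment_run = FakeRun()
handler = JsonLoggingHandlerSketch(experiment_run)
handler.on_llm_start({}, ["What is LangChain?"])
handler.on_llm_end("A framework for LLM applications.")
handler.flush_tracker()
```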
@3coins @agola11
---------
Co-authored-by: Tesfagabir Meharizghi <mehariz@amazon.com>
Description:
This PR adds support for loading documents from Huawei OBS (Object
Storage Service) in Langchain. OBS is a cloud-based object storage
service provided by Huawei Cloud. With this enhancement, Langchain users
can now easily access and load documents stored in Huawei OBS directly
into the system.
Key Changes:
- Added a new document loader module specifically for Huawei OBS
integration.
- Implemented the necessary logic to authenticate and connect to Huawei
OBS using access credentials.
- Enabled the loading of individual documents from a specified bucket
and object key in Huawei OBS.
- Provided the option to specify custom authentication information or
obtain security tokens from Huawei Cloud ECS for easy access.
How to Test:
1. Ensure the required package "esdk-obs-python" is installed.
2. Configure the endpoint, access key, secret key, and bucket details
for Huawei OBS in the Langchain settings.
3. Load documents from Huawei OBS using the updated document loader
module.
4. Verify that documents are successfully retrieved and loaded into
Langchain for further processing.
Please review this PR and let us know if any further improvements are
needed. Your feedback is highly appreciated!
@rlancemartin, @eyurtsev
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: updated the BabyAGI examples to append the iteration number
to the result ID, fixing an error when storing data in the vectorstore.
- Issue: #7445
- Dependencies: no
- Tag maintainer: @eyurtsev
This fix worked for me locally. I'm happy to take feedback and iterate
on a better solution. I considered appending a UUID instead but
didn't want to overcomplicate the example.
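A minimal illustration of why the suffix helps, assuming the error came from storing two results under the same ID, which many vector stores reject or overwrite. `UniqueIdStore`, `result_id`, and the ID formats are hypothetical; the actual BabyAGI example code is not reproduced here. The UUID alternative mentioned above is shown for comparison.

```python
import uuid
from typing import Dict


class UniqueIdStore:
    """Hypothetical store that, like many vector stores, rejects duplicate IDs."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}

    def add(self, item_id: str, text: str) -> None:
        if item_id in self.items:
            raise ValueError(f"Duplicate ID: {item_id}")
        self.items[item_id] = text


def result_id(task_id: int, iteration: int) -> str:
    # Appending the iteration keeps IDs unique when a task ID repeats across iterations.
    return f"result_{task_id}_{iteration}"


# Old scheme in this made-up scenario: the same task ID occurs in two iterations.
old_scheme = UniqueIdStore()
old_scheme.add("result_1", "iteration 1")
try:
    old_scheme.add("result_1", "iteration 2")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# New scheme: the iteration suffix makes both IDs distinct.
babyagi_store = UniqueIdStore()
for iteration in range(1, 3):
    babyagi_store.add(result_id(task_id=1, iteration=iteration), f"iteration {iteration}")

# Alternative considered: a random UUID suffix (unique, but less readable).
uuid_id = f"result_1_{uuid.uuid4().hex}"
```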
Works just like GenericLoader, but loads concurrently for those who want
to optimize their workflow.
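A hedged sketch of the idea behind a concurrent loader: parse many files in parallel with a thread pool instead of one after another. `load_concurrently` and `read_text` are hypothetical helpers, not the new loader's API. Threads mainly help when loading is I/O-bound, and `ThreadPoolExecutor.map` returns results in input order even though the work runs concurrently.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List


def load_concurrently(
    paths: Iterable[Path], parse: Callable[[Path], str], max_workers: int = 4
) -> List[str]:
    """Apply `parse` to every path using a thread pool; output order matches input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse, paths))


def read_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for i in range(5):
        (folder / f"doc_{i}.txt").write_text(f"document {i}", encoding="utf-8")
    files = sorted(folder.glob("*.txt"))
    contents = load_concurrently(files, read_text)
```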
@rlancemartin @eyurtsev
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>