We may want to load all URLs under a root directory.
For example, let's look at the [LangChain JS
documentation](https://js.langchain.com/docs/).
This has many interesting child pages that we may want to read in bulk.
Of course, the `WebBaseLoader` can load a list of pages.
But, the challenge is traversing the tree of child pages and actually
assembling that list!
We do this using the `RecursiveUrlLoader`.
This also gives us the flexibility to exclude some children (e.g., the
`api` directory with > 800 child pages).
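The traversal described above can be sketched in plain Python. This is only an illustration of the idea, not the loader's actual implementation: the function name `collect_child_urls`, the `get_links` callback, and the `FAKE_SITE` link graph are all hypothetical stand-ins for real HTTP fetching and link extraction.

```python
from collections import deque
from typing import Callable, Iterable, List, Sequence


def collect_child_urls(
    root: str,
    get_links: Callable[[str], Iterable[str]],
    exclude_dirs: Sequence[str] = (),
) -> List[str]:
    """Breadth-first walk over pages under `root`, skipping excluded subtrees."""
    seen = {root}
    queue = deque([root])
    ordered = []
    while queue:
        url = queue.popleft()
        ordered.append(url)
        for link in get_links(url):
            if not link.startswith(root):
                continue  # link points outside the root directory
            if any(link.startswith(prefix) for prefix in exclude_dirs):
                continue  # excluded subtree, e.g. a large `api` directory
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return ordered


# Hypothetical link graph standing in for real page fetches.
FAKE_SITE = {
    "https://example.com/docs/": [
        "https://example.com/docs/guides/",
        "https://example.com/docs/api/",
        "https://example.com/blog/",
    ],
    "https://example.com/docs/guides/": ["https://example.com/docs/guides/intro"],
    "https://example.com/docs/api/": ["https://example.com/docs/api/classes"],
}

urls = collect_child_urls(
    "https://example.com/docs/",
    get_links=lambda url: FAKE_SITE.get(url, []),
    exclude_dirs=["https://example.com/docs/api/"],
)
print(urls)
```

The resulting list is the kind of input that could then be handed to a loader that accepts a list of pages.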
Many cities have open data portals for events like crime, traffic, etc.
Socrata provides an API for many, including SF (e.g., see
[here](https://dev.socrata.com/foundry/data.sfgov.org/tmnf-yvry)).
This is a new data loader for city data that uses the Socrata API.
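As a rough illustration of what such a loader has to request, the sketch below builds a Socrata (SODA) resource URL for the SF dataset linked above. The helper name `build_soda_url` is hypothetical, and the `/resource/<dataset_id>.json` path with a `$limit` query parameter is assumed from Socrata's public API conventions rather than taken from this PR.

```python
from urllib.parse import urlencode


def build_soda_url(domain: str, dataset_id: str, limit: int = 100) -> str:
    """Build a JSON resource URL for one dataset on a Socrata-backed portal."""
    base = f"https://{domain}/resource/{dataset_id}.json"
    # Keep "$" unescaped so the parameter reads as `$limit`.
    query = urlencode({"$limit": limit}, safe="$")
    return f"{base}?{query}"


url = build_soda_url("data.sfgov.org", "tmnf-yvry", limit=5)
print(url)
```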
# Changes
This PR adds [Clarifai](https://www.clarifai.com/) integration to
LangChain. Clarifai is an end-to-end AI platform that gives users access
to many types of LLMs (OpenAI, Cohere, and other open source models). A
Clarifai app can also be treated as a vector database to upload and
retrieve data. The integration includes:
- Clarifai LLM integration: Clarifai supports many types of language
models that users can use in their applications
- Clarifai VectorDB: A Clarifai application can hold data and
embeddings. You can run semantic search with the embeddings
#### Before submitting
- [x] Added integration test for LLM
- [x] Added integration test for VectorDB
- [x] Added notebook for LLM
- [x] Added notebook for VectorDB
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
### Description
We have added a new LLM integration `azureml_endpoint` that allows users
to leverage models from the AzureML platform. Microsoft recently
announced the release of [Azure Foundation
Models](https://learn.microsoft.com/en-us/azure/machine-learning/concept-foundation-models?view=azureml-api-2)
which users can find in the AzureML Model Catalog. The Model Catalog
contains a variety of open source and Hugging Face models that users can
deploy on AzureML. The `azureml_endpoint` allows LangChain users to use
the deployed Azure Foundation Models.
### Dependencies
No added dependencies were required for the change.
### Tests
Integration tests were added in
`tests/integration_tests/llms/test_azureml_endpoint.py`.
### Notebook
A Jupyter notebook demonstrating how to use `azureml_endpoint` was added
to `docs/modules/llms/integrations/azureml_endpoint_example.ipynb`.
### Twitters
[Prakhar Gupta](https://twitter.com/prakhar_in)
[Matthew DeGuzman](https://twitter.com/matthew_d13)
---------
Co-authored-by: Matthew DeGuzman <91019033+matthewdeguzman@users.noreply.github.com>
Co-authored-by: prakharg-msft <75808410+prakharg-msft@users.noreply.github.com>
Everything needed to support sending messages over WhatsApp Business
Platform (GA), Facebook Messenger (Public Beta) and Google Business
Messages (Private Beta) was present. Just added some details on
leveraging it.
Just some spelling fixes: I found "retriver" instead of "retriever" in
several places across the documentation and in code comments. I fixed
them.
Co-authored-by: andrey.vedishchev <andrey.vedishchev@rgigroup.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.
Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.
After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->
#### Before submitting
<!-- If you're adding a new integration, please include:
1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use
See contribution guidelines for more information on how to write tests,
lint
etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
Here is an example of using StarRocks as a vector store:
```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import StarRocks
from langchain.vectorstores.starrocks import StarRocksSettings

embeddings = OpenAIEmbeddings()

# configure StarRocks connection settings
settings = StarRocksSettings()
settings.port = 41003
settings.host = '127.0.0.1'
settings.username = 'root'
settings.password = ''
settings.database = 'zya'

# build a new store by embedding documents
# (split_docs is a list of already-split documents, defined elsewhere)
docsearch = StarRocks.from_documents(split_docs, embeddings, config=settings)

# or connect to embeddings that already exist in the database
docsearch = StarRocks(embeddings, settings)
```
#### Who can review?
Tag maintainers/contributors who might be interested:
@dev2049
<!-- For a quicker response, figure out the right person to tag with @
@hwchase17 - project lead
Tracing / Callbacks
- @agola11
Async
- @agola11
DataLoaders
- @eyurtsev
Models
- @hwchase17
- @agola11
Agents / Tools / Toolkits
- @hwchase17
VectorStores / Retrievers / Memory
- @dev2049
-->
---------
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
### Integration of Infino with LangChain for Enhanced Observability
This PR aims to integrate [Infino](https://github.com/infinohq/infino),
an open source observability platform written in Rust for storing
metrics and logs at scale, with LangChain, providing users with a
streamlined and efficient method of tracking and recording LangChain
experiments. By incorporating Infino into LangChain, users will be able
to gain valuable insights and easily analyze the behavior of their
language models.
#### Please refer to the following files related to integration:
- `InfinoCallbackHandler`: A [callback
handler](https://github.com/naman-modi/langchain/blob/feature/infino-integration/langchain/callbacks/infino_callback.py)
specifically designed for storing chain responses within Infino.
- Example `infino.ipynb` file: A comprehensive notebook named
[infino.ipynb](https://github.com/naman-modi/langchain/blob/feature/infino-integration/docs/extras/modules/callbacks/integrations/infino.ipynb)
has been included to guide users on effectively leveraging Infino for
tracking LangChain requests.
- [Integration
Doc](https://github.com/naman-modi/langchain/blob/feature/infino-integration/docs/extras/ecosystem/integrations/infino.mdx)
for Infino integration.
By integrating Infino, LangChain users will gain access to powerful
visualization and debugging capabilities. Infino enables easy tracking
of inputs, outputs, token usage, and execution time of LLMs. This
comprehensive observability ensures a deeper understanding of individual
executions and facilitates effective debugging.
Co-authors: @vinaykakade @savannahar68
---------
Co-authored-by: Vinay Kakade <vinaykakade@gmail.com>
This PR adds Rockset as a vectorstore for LangChain.
[Rockset](https://rockset.com/blog/introducing-vector-search-on-rockset/)
is a real-time OLAP database that provides fast and efficient vector
search. Further, since it is entirely schemaless, it can
store metadata in separate columns, thereby allowing fast metadata
filters during vector similarity search (as opposed to storing the
entire metadata in a single JSON column). It currently supports three
distance functions: `COSINE_SIMILARITY`, `EUCLIDEAN_DISTANCE`, and
`DOT_PRODUCT`.
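For reference, the three distance functions named above can be written out in plain Python. This is a minimal sketch of the standard mathematical definitions, not Rockset's implementation; the lowercase function names are chosen here for illustration.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors after normalizing their lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


query = [3.0, 4.0]
candidate = [4.0, 3.0]
print(dot_product(query, candidate))
print(euclidean_distance(query, candidate))
print(cosine_similarity(query, candidate))
```

Note that the ordering differs by function: results are ranked by descending score for cosine similarity and dot product, and by ascending distance for Euclidean distance.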
This PR adds `rockset` client as an optional dependency.
We would love a twitter shoutout, our handle is
https://twitter.com/RocksetCloud
---------
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
This pull request introduces a new feature to the LangChain QA Retrieval
Chains with Structures. The change involves adding a prompt template as
an optional parameter for the RetrievalQA chains that utilize the
recently implemented OpenAI Functions.
The main purpose of this enhancement is to provide users with the
ability to input a more customizable prompt to the chain. By introducing
a prompt template as an optional parameter, users can tailor the prompt
to their specific needs and context, thereby improving the flexibility
and effectiveness of the RetrievalQA chains.
## Changes Made
- Created a new optional parameter, "prompt", for the RetrievalQA with
structure chains.
- Added an example to the RetrievalQA with sources notebook.
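The pattern of an optional prompt with a fallback default can be sketched without LangChain at all. Everything below is hypothetical and uses the standard library's `string.Template` in place of a LangChain prompt template; `build_qa_prompt` and `DEFAULT_PROMPT` are illustrative names, not part of the chain's API.

```python
from string import Template
from typing import Optional

# Hypothetical default used when the caller supplies no prompt.
DEFAULT_PROMPT = Template(
    "Answer using only the context.\nContext: $context\nQuestion: $question"
)


def build_qa_prompt(
    question: str, context: str, prompt: Optional[Template] = None
) -> str:
    """Render the caller's prompt if given, otherwise the default one."""
    template = prompt if prompt is not None else DEFAULT_PROMPT
    return template.substitute(context=context, question=question)


default_text = build_qa_prompt("Who wrote it?", "The doc was written by Ada.")
custom_text = build_qa_prompt(
    "Who wrote it?",
    "The doc was written by Ada.",
    prompt=Template("Q: $question | C: $context"),
)
print(default_text)
print(custom_text)
```

Keeping the parameter optional means existing callers are unaffected, while new callers can tailor the wording.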
My twitter handle is @El_Rey_Zero
---------
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Added the functionality to leverage 3 new Codey models from Vertex AI:
- code-bison - Code generation using the existing LLM integration
- code-gecko - Code completion using the existing LLM integration
- codechat-bison - Code chat using the existing chat_model integration
---------
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
This PR adds `KuzuGraph` and `KuzuQAChain` for interacting with [Kùzu
database](https://github.com/kuzudb/kuzu). Kùzu is an in-process
property graph database management system (GDBMS) built for query speed
and scalability. The `KuzuGraph` and `KuzuQAChain` provide the same
functionality as the existing integrations with NebulaGraph and Neo4j and
enable query generation and question answering over a Kùzu database.
A notebook example and a simple test case have also been added.
---------
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
This addresses #6291, adding support for using Cassandra (and compatible
databases, such as DataStax Astra DB) as a [Vector
Store](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor(ANN)+Vector+Search+via+Storage-Attached+Indexes).
A new class `Cassandra` is introduced, which complies with the contract
and interface for a vector store, along with the corresponding
integration test, a sample notebook and modified dependency toml.
Dependencies: the implementation relies on the library `cassio`, which
simplifies interacting with Cassandra for ML- and LLM-oriented
workloads. CassIO, in turn, uses the low-level `cassandra-driver`
to communicate with the database. The former is added as an
optional dependency (+ in `extended_testing`), the latter was already in
the project.
Integration testing relies on a locally-running instance of Cassandra.
A detailed description of how to compile and run it can be found
[here](https://cassio.org/more_info/#use-a-local-vector-capable-cassandra)
(at the time of writing, the feature has not yet made it into a release).
During development of the integration tests, I added a new "fake
embedding" class for what I consider a more controlled way of testing
the MMR search method. Likewise, I had to amend what looked like a
glitch in the behaviour of `ConsistentFakeEmbeddings` whereby an
`embed_query` call would have bypassed storage of the requested text in
the class cache for use in later repeated invocations.
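To make the caching behaviour concrete, here is a minimal sketch of a "consistent" fake embedding in which both entry points share one cache. The class `CachedFakeEmbeddings` is hypothetical and written for illustration; it is not the code of `ConsistentFakeEmbeddings` or of the new fake embedding class added in this PR.

```python
from typing import List


class CachedFakeEmbeddings:
    """Fake embeddings that return the same vector for a repeated text."""

    def __init__(self, dimensionality: int = 4) -> None:
        self.known_texts: List[str] = []
        self.dimensionality = dimensionality

    def _vector_for(self, text: str) -> List[float]:
        # Store unseen texts so later calls map them to the same index.
        if text not in self.known_texts:
            self.known_texts.append(text)
        index = self.known_texts.index(text)
        return [1.0] * (self.dimensionality - 1) + [float(index)]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._vector_for(text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        # The query path goes through the same cache as the document path.
        return self._vector_for(text)


fake = CachedFakeEmbeddings()
query_vector = fake.embed_query("foo")
doc_vectors = fake.embed_documents(["bar", "foo"])
print(query_vector, doc_vectors)
```

Because `embed_query` records the text, a later `embed_documents` call that includes the same text yields an identical vector, which is what makes repeated invocations predictable in tests.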
@dev2049 might be the right person to tag here for a review. Thank you!
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
Hello Folks,
Thanks for creating and maintaining this great project. I'm excited to
submit this PR to add Alibaba Cloud OpenSearch as a new vector store.
OpenSearch is a one-stop platform to develop intelligent search
services. OpenSearch was built based on the large-scale distributed
search engine developed by Alibaba. OpenSearch serves more than 500
business cases in Alibaba Group and thousands of Alibaba Cloud
customers. OpenSearch helps develop search services in different search
scenarios, including e-commerce, O2O, multimedia, the content industry,
communities and forums, and big data query in enterprises.
OpenSearch provides the vector search feature. In specific scenarios,
especially test question search and image search scenarios, you can use
the vector search feature together with the multimodal search feature to
improve the accuracy of search results.
This PR includes:
- An `AlibabaCloudOpenSearch` class that can connect to an Alibaba Cloud
OpenSearch instance.
- Adding embeddings and metadata to an OpenSearch datasource.
- Querying by squared Euclidean distance and metadata.
- Integration tests.
- An IPython notebook and docs.
I have read your contributing guidelines, and the checks below have
passed:
- [x] make format
- [x] make lint
- [x] make coverage
- [x] make test
---------
Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>
1. Introduced support for new distance strategies: **DOT_PRODUCT** and
**EUCLIDEAN_DISTANCE** for enhanced flexibility.
2. Implemented a feature to filter results based on metadata fields.
3. Incorporated connection attributes specifying "langchain python sdk"
usage for enhanced traceability and debugging.
4. Expanded the suite of integration tests for improved code
reliability.
5. Updated the existing notebook with the usage example
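Metadata filtering combined with a distance strategy (items 1 and 2 above) can be pictured with a small in-memory sketch. This is an illustration only and makes no claim about how the vector store executes the query; `matches_filter`, `filtered_search`, and the sample documents are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def matches_filter(metadata: Dict[str, Any], wanted: Dict[str, Any]) -> bool:
    """True when every requested metadata field has the requested value."""
    return all(metadata.get(key) == value for key, value in wanted.items())


def filtered_search(
    query: Sequence[float],
    docs: List[Dict[str, Any]],
    wanted: Dict[str, Any],
    k: int = 2,
) -> List[str]:
    """Keep documents whose metadata matches, then rank by dot product."""
    candidates = [d for d in docs if matches_filter(d["metadata"], wanted)]
    candidates.sort(
        key=lambda d: sum(q * v for q, v in zip(query, d["vector"])),
        reverse=True,
    )
    return [d["text"] for d in candidates[:k]]


docs = [
    {"text": "A", "vector": [1.0, 0.0], "metadata": {"category": "rain"}},
    {"text": "B", "vector": [0.9, 0.1], "metadata": {"category": "snow"}},
    {"text": "C", "vector": [0.0, 1.0], "metadata": {"category": "rain"}},
]
hits = filtered_search([1.0, 0.0], docs, {"category": "rain"}, k=2)
print(hits)
```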
@dev2049
---------
Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Fixes a link typo from `/-/route` to `/-/routes`
and changes the endpoint format
from `f"{self.anyscale_service_url}/{self.anyscale_service_route}"` to
`f"{self.anyscale_service_url}{self.anyscale_service_route}"`.
Also adds documentation about the format of the endpoint.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Fixed several inconsistencies:
- file names and notebook titles should be similar; otherwise the ToC on the
[retrievers
page](https://python.langchain.com/en/latest/modules/indexes/retrievers.html)
and the left ToC tab differ. For example, `Self-querying
with Chroma` is currently not sorted alphabetically because its file is
named `chroma_self_query.ipynb`
- `Stringing compressors and document transformers...` demoted from `#`
to `##`. Otherwise, it appears in the ToC.
- several formatting problems
#### Who can review?
@hwchase17
@dev2049
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Fixes a minor newline character issue in the markdown.
Also, this option is not yet in the latest version of LangChain
(0.0.190) from Conda. Maybe in the next update.
@eyurtsev
@hwchase17
This PR adds an example of doing question answering over documents using
OpenAI Function Agents.
#### Who can review?
@hwchase17
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- return raw and full output (but keep run shortcut method functional)
- change output parser to take in generations (good for working with
messages)
- add output parser to base class, always run (default to same as
current)
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
#### Before submitting
Add memory support for `OpenAIFunctionsAgent` like
`StructuredChatAgent`.
#### Who can review?
@hwchase17
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
To bypass SSL verification errors during fetching, you can include the
`verify=False` parameter. This markdown proves useful, especially for
beginners in the field of web scraping.
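For readers who want to see what disabling verification amounts to, here is a standard-library sketch. It is an analogue of the `verify=False` idea, not the loader's code: the helper name `make_unverified_context` is hypothetical, and such a context should only be used against hosts you already trust.

```python
import ssl


def make_unverified_context() -> ssl.SSLContext:
    """Create a TLS context that skips certificate and hostname checks."""
    context = ssl.create_default_context()
    # Hostname checking must be turned off before relaxing verify_mode,
    # otherwise setting CERT_NONE raises ValueError.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


unverified = make_unverified_context()
print(unverified.verify_mode)
```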
Fixes #6079
#### Who can review?
Tag maintainers/contributors who might be interested:
@hwchase17
@eyurtsev
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
To bypass SSL verification errors during web scraping, you can include
the `ssl_verify=False` parameter along with the `headers` parameter. This
combination of arguments proves useful, especially for beginners in the
field of web scraping.
Fixes #1829
#### Who can review?
Tag maintainers/contributors who might be interested:
@hwchase17 @eyurtsev
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
## DocArray as a Retriever
[DocArray](https://github.com/docarray/docarray) is an open-source tool
for managing your multi-modal data. It offers flexibility to store and
search through your data using various document index backends. This PR
introduces `DocArrayRetriever` - which works with any available backend
and serves as a retriever for LangChain apps.
Also, I added 2 notebooks:
- DocArray Backends: an intro to all 5 currently supported backends and how to
initialize, index, and use them as a retriever
- DocArray Usage: showcases the additional search parameters you can
pass to create versatile retrievers
Example:
```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers import DocArrayRetriever


# define document schema
class MyDoc(BaseDoc):
    description: str
    description_embedding: NdArray[1536]


embeddings = OpenAIEmbeddings()

# create documents
descriptions = ["description 1", "description 2"]
desc_embeddings = embeddings.embed_documents(texts=descriptions)
docs = DocList[MyDoc](
    [
        MyDoc(description=desc, description_embedding=embedding)
        for desc, embedding in zip(descriptions, desc_embeddings)
    ]
)

# initialize document index with data
db = InMemoryExactNNIndex[MyDoc](docs)

# create a retriever
retriever = DocArrayRetriever(
    index=db,
    embeddings=embeddings,
    search_field="description_embedding",
    content_field="description",
)

# find the relevant document
doc = retriever.get_relevant_documents("action movies")
print(doc)
```
#### Who can review?
@dev2049
---------
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
1. Changed the implementation of the `add_texts` interface for the AwaDB
vector store to improve performance
2. Upgraded AwaDB from 0.3.2 to 0.3.3
---------
Co-authored-by: vincent <awadb.vincent@gmail.com>