Commit Graph

126 Commits (15f650ae8cfb7d45bd9f50c086d2b8cfb0182a40)

Author SHA1 Message Date
minhajul-clarifai 6e57306a13
Clarifai integration (#5954)
# Changes
This PR adds [Clarifai](https://www.clarifai.com/) integration to
LangChain. Clarifai is an end-to-end AI platform that gives users
access to many types of LLMs (OpenAI, Cohere, and other open
source models). A Clarifai app can also be treated as a vector
database to upload and retrieve data. The integration includes:
- Clarifai LLM integration: Clarifai supports many types of language
models that users can utilize in their applications
- Clarifai VectorDB: A Clarifai application can hold data and
embeddings. You can run semantic search with the embeddings

#### Before submitting
- [x] Added integration test for LLM 
- [x] Added integration test for VectorDB 
- [x] Added notebook for LLM 
- [x] Added notebook for VectorDB 

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Andrey E. Vedishchev a2a0715bd4
Minor Grammar Fixes in Docs and Comments (#6536)
Just some grammar fixes: I found "retriver" instead of "retriever" in
several places across the documentation and in code comments, and fixed
them.


Co-authored-by: andrey.vedishchev <andrey.vedishchev@rgigroup.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
dirtysalt 57cc3d1d3d
[Feature][VectorStore] Support StarRocks as vector db (#6119)
Here is an example of using StarRocks as a vector DB:

```
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import StarRocks
from langchain.vectorstores.starrocks import StarRocksSettings

embeddings = OpenAIEmbeddings()

# configure StarRocks connection settings
settings = StarRocksSettings()
settings.port = 41003
settings.host = '127.0.0.1'
settings.username = 'root'
settings.password = ''
settings.database = 'zya'

# embed new documents and store them
# (split_docs is assumed to be a list of already-split Document objects)
docsearch = StarRocks.from_documents(split_docs, embeddings, config=settings)

# or reuse embeddings that are already stored in the database
docsearch = StarRocks(embeddings, settings)
```

#### Who can review?

Tag maintainers/contributors who might be interested:

@dev2049 

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Anubhav Bindlish 94c7899257
Integrate Rockset as Vectorstore (#6216)
This PR adds Rockset as a vectorstore for LangChain.
[Rockset](https://rockset.com/blog/introducing-vector-search-on-rockset/)
is a real-time OLAP database that provides fast and efficient vector
search. Further, since it is entirely schemaless, it can
store metadata in separate columns, thereby allowing fast metadata
filters during vector similarity search (as opposed to storing the
entire metadata in a single JSON column). It currently supports three
distance functions: `COSINE_SIMILARITY`, `EUCLIDEAN_DISTANCE`, and
`DOT_PRODUCT`.
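
For reference, the three distance functions named above can be sketched in plain Python. This is only an illustration of the underlying math, not Rockset's implementation; the Python function names are hypothetical and chosen to mirror the names in the PR description.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger values mean more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller values mean more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors after normalizing their lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / (norm_a * norm_b)
```

Note that the ranking direction differs: results are ordered by descending score for cosine similarity and dot product, and by ascending distance for Euclidean distance.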

This PR adds `rockset` client as an optional dependency. 

We would love a Twitter shoutout; our handle is
https://twitter.com/RocksetCloud

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Davis Chase 3298bf4f00
docs/fix links (#6498) 1 year ago
Lance Martin ae6196507d
Update notebook for MD header splitter and create new cookbook (#6399)
Move MD header text splitter example to its own cookbook.
1 year ago
Stefano Lottini 22af93d851
Vector store support for Cassandra (#6426)
This addresses #6291 adding support for using Cassandra (and compatible
databases, such as DataStax Astra DB) as a [Vector
Store](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor(ANN)+Vector+Search+via+Storage-Attached+Indexes).

A new class `Cassandra` is introduced, which complies with the contract
and interface for a vector store, along with the corresponding
integration test, a sample notebook and modified dependency toml.

Dependencies: the implementation relies on the library `cassio`, which
simplifies interacting with Cassandra for ML- and LLM-oriented
workloads. CassIO, in turn, uses the low-level `cassandra-driver`
to communicate with the database. The former is added as an
optional dependency (and in `extended_testing`); the latter was already in
the project.

Integration testing relies on a locally-running instance of Cassandra.
A detailed description of how to compile and run it can be found
[here](https://cassio.org/more_info/#use-a-local-vector-capable-cassandra)
(at the time of writing, the feature has not yet made it into a release).

During development of the integration tests, I added a new "fake
embedding" class for what I consider a more controlled way of testing
the MMR search method. Likewise, I had to amend what looked like a
glitch in the behaviour of `ConsistentFakeEmbeddings` whereby an
`embed_query` call would have bypassed storage of the requested text in
the class cache for use in later repeated invocations.
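
The caching contract described above can be illustrated with a minimal sketch. The class below is hypothetical and is not LangChain's `ConsistentFakeEmbeddings`; it only shows the intended behaviour, namely that `embed_query` records unseen text so that later calls for the same text return the same vector.

```python
from typing import Dict, List


class CachedFakeEmbeddings:
    """Hypothetical fake embedder: each distinct text gets a stable vector."""

    def __init__(self, dimensionality: int = 4) -> None:
        self.dimensionality = dimensionality
        self._index_by_text: Dict[str, int] = {}

    def _vector_for(self, text: str) -> List[float]:
        # Register unseen texts so that repeated calls stay consistent.
        index = self._index_by_text.setdefault(text, len(self._index_by_text))
        return [1.0] * (self.dimensionality - 1) + [float(index)]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._vector_for(text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        # Goes through the same cache as embed_documents, with no bypass.
        return self._vector_for(text)
```

Because both methods share `_vector_for`, a text first seen through `embed_query` gets the same vector when it is later passed to `embed_documents`.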

@dev2049 might be the right person to tag here for a review. Thank you!

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago
zhaoshengbo ab44c24333
Add Alibaba Cloud OpenSearch as a new vector store (#6154)
Hello Folks,

Thanks for creating and maintaining this great project. I'm excited to
submit this PR to add Alibaba Cloud OpenSearch as a new vector store.

OpenSearch is a one-stop platform to develop intelligent search
services. OpenSearch was built based on the large-scale distributed
search engine developed by Alibaba. OpenSearch serves more than 500
business cases in Alibaba Group and thousands of Alibaba Cloud
customers. OpenSearch helps develop search services in different search
scenarios, including e-commerce, O2O, multimedia, the content industry,
communities and forums, and big data query in enterprises.

OpenSearch provides the vector search feature. In specific scenarios,
especially test question search and image search scenarios, you can use
the vector search feature together with the multimodal search feature to
improve the accuracy of search results.


This PR includes:

- An AlibabaCloudOpenSearch class that can connect to an Alibaba Cloud
OpenSearch instance.
- Adding embeddings and metadata to an OpenSearch data source.
- Querying by squared Euclidean distance and metadata.
- Integration tests.
- IPython notebook and docs.

I have read your contributing guidelines and have passed the checks
below:

- [x]  make format
- [x]  make lint
- [x]  make coverage
- [x]  make test

---------

Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>
1 year ago
Harrison Chase 9eec7c3206
Harrison/unstructured page number (#6464)
Co-authored-by: Reza Sanaie <reza@sanaie.ca>
1 year ago
volodymyr-memsql d2e9b621ab
Update SingleStoreDB vectorstore (#6423)
1. Added support for two new distance strategies, **DOT_PRODUCT** and
**EUCLIDEAN_DISTANCE**, for enhanced flexibility.
2. Implemented a feature to filter results based on metadata fields.
3. Incorporated connection attributes specifying "langchain python sdk"
usage for enhanced traceability and debugging.
4. Expanded the suite of integration tests for improved code
reliability.
5. Updated the existing notebook with the usage example

@dev2049

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Leonid Ganeline 03b16ed2b1
docs `retrievers` fixes (#6299)
Fixed several inconsistencies:
- File names and notebook titles should be similar; otherwise the ToC on the
[retrievers
page](https://python.langchain.com/en/latest/modules/indexes/retrievers.html)
and the left ToC tab differ. For example, `Self-querying
with Chroma` is currently not sorted alphabetically because its file
is named `chroma_self_query.ipynb`.
- `Stringing compressors and document transformers...` demoted from `#`
to `##`; otherwise, it appears in the ToC.
- Several formatting problems.

#### Who can review?

@hwchase17 
@dev2049

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Dhruvil Shah 9494623869
Update web_base.ipynb (#6430)
Minor new line character in the markdown.

Also, this option is not yet in the latest version of LangChain
(0.0.190) from Conda. Maybe in the next update.

@eyurtsev
@hwchase17
1 year ago
Harrison Chase 286452c7f0 remove mongo 1 year ago
Dhruvil Shah ba90e3c990
Update web_base.ipynb for guiding purposes (#6248)
To bypass SSL verification errors during fetching, you can include the
`verify=False` parameter. This markdown proves useful, especially for
beginners in the field of web scraping.
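
As a rough illustration of what disabling verification means, here is a standard-library sketch. It is not the loader's API: it assumes plain `ssl` and `urllib`, and the helper name `unverified_ssl_context` is hypothetical. It builds the counterpart of `verify=False` for `urllib.request.urlopen`.

```python
import ssl


def unverified_ssl_context() -> ssl.SSLContext:
    """Return an SSL context that skips certificate and hostname checks."""
    context = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context
```

The context can then be passed as `urllib.request.urlopen(url, context=unverified_ssl_context())`. Since this turns off certificate checking entirely, it suits debugging and trusted hosts rather than general use.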

Fixes #6079 

#### Who can review?

Tag maintainers/contributors who might be interested:
@hwchase17 
@eyurtsev

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Dhruvil Shah 92f05a67a4
Add markdown to specify important arguments (#6246)
To bypass SSL verification errors during web scraping, you can include
the ssl_verify=False parameter along with the headers parameter. This
combination of arguments proves useful, especially for beginners in the
field of web scraping.

Fixes #1829 

#### Who can review?

Tag maintainers/contributors who might be interested:
@hwchase17 @eyurtsev 

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Harrison Chase c0c2fd0782
Harrison/zep mem (#6388)
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
1 year ago
Harrison Chase 9bf5b0defa
Harrison/myscale self query (#6376)
Co-authored-by: Fangrui Liu <fangruil@moqi.ai>
Co-authored-by: 刘 方瑞 <fangrui.liu@outlook.com>
Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>
1 year ago
Harrison Chase a8cb9ee013
Harrison/gdrive enhancements (#6375)
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
1 year ago
Lance Martin 370becdfc2
Add self query retriever example with MD header splitting (#6359)
Flesh out the notebook example for `MarkdownHeaderTextSplitter`
1 year ago
Lance Martin 2c97fbabbd
Update MD header text splitter notebook (#6339)
Highlight use case for maintaining header groups when splitting.
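
The idea of keeping header groups while splitting can be sketched with the standard library alone. This is a simplified illustration, not the `MarkdownHeaderTextSplitter` implementation; the function name `split_by_headers` is hypothetical, and the sketch assumes ATX-style `#` headers and ignores fenced code blocks.

```python
import re
from typing import Dict, List, Tuple

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headers(markdown: str) -> List[Tuple[Dict[int, str], str]]:
    """Split markdown into (active headers by level, text) chunks."""
    chunks: List[Tuple[Dict[int, str], str]] = []
    headers: Dict[int, str] = {}
    buffer: List[str] = []

    def flush() -> None:
        # Emit the buffered text together with the headers it sits under.
        text = "\n".join(buffer).strip()
        if text:
            chunks.append((dict(headers), text))
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header replaces any header at the same or a deeper level.
            for stale in [k for k in headers if k >= level]:
                del headers[stale]
            headers[level] = match.group(2).strip()
        else:
            buffer.append(line)
    flush()
    return chunks
```

Each chunk carries the full header path it belongs to, so a downstream retriever can filter or group chunks by section.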
1 year ago
Harrison Chase a2bbe3dda4
Harrison/mmr support for opensearch (#6349)
Co-authored-by: Mehmet Öner Yalçın <oneryalcin@gmail.com>
1 year ago
Harrison Chase 680d6bbbf8 fix titles in documentation 1 year ago
Saba Sturua 427551eabf
DocArray as a Retriever (#6031)
## DocArray as a Retriever

[DocArray](https://github.com/docarray/docarray) is an open-source tool
for managing your multi-modal data. It offers flexibility to store and
search through your data using various document index backends. This PR
introduces `DocArrayRetriever` - which works with any available backend
and serves as a retriever for Langchain apps.

Also, I added 2 notebooks:
- DocArray Backends: an intro to all 5 currently supported backends and how to
initialize, index, and use them as a retriever
- DocArray Usage: showcases the additional search parameters you can
pass to create versatile retrievers

Example:
```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers import DocArrayRetriever


# define document schema
class MyDoc(BaseDoc):
    description: str
    description_embedding: NdArray[1536]


embeddings = OpenAIEmbeddings()
# create documents
descriptions = ["description 1", "description 2"]
desc_embeddings = embeddings.embed_documents(texts=descriptions)
docs = DocList[MyDoc](
    [
        MyDoc(description=desc, description_embedding=embedding)
        for desc, embedding in zip(descriptions, desc_embeddings)
    ]
)

# initialize document index with data
db = InMemoryExactNNIndex[MyDoc](docs)

# create a retriever
retriever = DocArrayRetriever(
    index=db,
    embeddings=embeddings,
    search_field="description_embedding",
    content_field="description",
)

# find the relevant document
doc = retriever.get_relevant_documents("action movies")
print(doc)
```

#### Who can review?

@dev2049

---------

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
1 year ago
Harrison Chase af18413d97
Harrison/deeplake new features (#6263)
Co-authored-by: adilkhan <adilkhan.sarsen@nu.edu.kz>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
ljeagle ad324a39ae
Improve the performance of add_texts interface and upgrade the AwaDB from 0.3.2 to 0.3.3 (#6316)
1. Changed the implementation of the add_texts interface for the AwaDB
vector store to improve performance
2. Upgraded AwaDB from 0.3.2 to 0.3.3

---------

Co-authored-by: vincent <awadb.vincent@gmail.com>
1 year ago
Davis Chase 87e502c6bc
Doc refactor (#6300)
Co-authored-by: jacoblee93 <jacoblee93@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago