Commit Graph

140 Commits (7d29bb2c0247a908cadb25bbccacdf277ae112fe)

Author SHA1 Message Date
Yifei Song 7d29bb2c02
Add Xorbits Dataframe as a Document Loader (#7319)
- [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source
computing framework that makes it easy to scale data science and machine
learning workloads in parallel. Xorbits can leverage multi cores or GPUs
to accelerate computation on a single machine, or scale out up to
thousands of machines to support processing terabytes of data.

- This PR added support for the Xorbits document loader, which allows
langchain to leverage Xorbits to parallelize and distribute the loading
of data.
- Dependencies: This change requires the Xorbits library to be installed
in order to be used.
`pip install xorbits`
- Request for review: @rlancemartin, @eyurtsev
- Twitter handle: https://twitter.com/Xorbitsio

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Paul-Emile Brotons d2cf0d16b3
adding max_marginal_relevance_search method to MongoDBAtlasVectorSearch (#7310)
Adding a maximal_marginal_relevance method to the
MongoDBAtlasVectorSearch vectorstore enhances the user experience by
providing more diverse search results

Issue: #7304
1 year ago
Matt Robinson bcab894f4e
feat: Add `UnstructuredTSVLoader` (#7367)
### Summary

Adds an `UnstructuredTSVLoader` for TSV files. Also updates the doc
strings for `UnstructuredCSV` and `UnstructuredExcel` loaders.

### Testing

```python
from langchain.document_loaders.tsv import UnstructuredTSVLoader

loader = UnstructuredTSVLoader(
    file_path="example_data/mlb_teams_2012.csv", mode="elements"
)
docs = loader.load()
```
1 year ago
nikkie dfc3f83b0f
docs(vectorstores/integrations/chroma): Fix loading and saving (#7437)
- Description: Fix loading and saving code about Chroma
- Issue: the issue #7436 
- Dependencies: -
- Twitter handle: https://twitter.com/ftnext
1 year ago
Daniel Chalef c7f7788d0b
Add ZepMemory; improve ZepChatMessageHistory handling of metadata; Fix bugs (#7444)
Hey @hwchase17 - 

This PR adds a `ZepMemory` class, improves handling of Zep's message
metadata, and makes it easier for folks building custom chains to
persist metadata alongside their chat history.

We've had plenty confused users unfamiliar with ChatMessageHistory
classes and how to wrap the `ZepChatMessageHistory` in a
`ConversationBufferMemory`. So we've created the `ZepMemory` class as a
light wrapper for `ZepChatMessageHistory`.

Details:
- add ZepMemory, modify notebook to demo use of ZepMemory
- Modify summary to be SystemMessage
- add metadata argument to add_message; add Zep metadata to
Message.additional_kwargs
- support passing in metadata
1 year ago
Nolan 5da9f9abcb
docs(agents/toolkits): Fix error in document_comparison_toolkit.ipynb (#7417)
Replace this comment with:
- Description: Removes unneeded output warning in documentation at
https://python.langchain.com/docs/modules/agents/toolkits/document_comparison_toolkit
  - Issue: -
  - Dependencies: -
  - Tag maintainer: @baskaryan
  - Twitter handle: @finnless
1 year ago
Delgermurun a1603fccfb
integrate JinaChat (#6927)
Integration with https://chat.jina.ai/api. It is OpenAI compatible API.

- Twitter handle:
[https://twitter.com/JinaAI_](https://twitter.com/JinaAI_)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Roger Yu 633b673b85
Update pinecone.ipynb (#7382)
Fix typo
1 year ago
Joshua Carroll 705d2f5b92
Update the API Reference link in Streamlit integration docs (#7377)
This page:


https://python.langchain.com/docs/modules/callbacks/integrations/streamlit

Has a bad API Reference link currently. This PR fixes it to the correct
link.

Also updates the embedded app link to
https://langchain-mrkl.streamlit.app/ (better name) which is hosted in
langchain-ai/streamlit-agent repo
1 year ago
Georges Petrov ec033ae277
Rename Databerry to Chaindesk (#7022)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Harrison Chase 7cdf97ba9b
Harrison/add to imports (#7370)
pgvector cleanup
1 year ago
Bagatur 4d427b2397
Base language model docstrings (#7104) 1 year ago
Alex Gamble df746ad821
Add a callback handler for Context (https://getcontext.ai) (#7151)
### Description

Adding a callback handler for Context. Context is a product analytics
platform for AI chat experiences to help you understand how users are
interacting with your product.

I've added the callback library + an example notebook showing its use.

### Dependencies

Requires the user to install the `context-python` library. The library
is lazily-loaded when the callback is instantiated.

### Announcing the feature

We spoke with Harrison a few weeks ago about also doing a blog post
announcing our integration, so will coordinate this with him. Our
Twitter handle for the company is @getcontextai, and the founders are
@_agamble and @HenrySG.

Thanks in advance!
1 year ago
German Martin 3ce4e46c8c
The Fellowship of the Vectors: New Embeddings Filter using clustering. (#7015)
Continuing with Tolkien inspired series of langchain tools. I bring to
you:
**The Fellowship of the Vectors**, AKA EmbeddingsClusteringFilter.
This document filter uses embeddings to group vectors together into
clusters, then allows you to pick an arbitrary number of documents
vector based on proximity to the cluster centers. That's a
representative sample of the cluster.

The original idea is from [Greg Kamradt](https://github.com/gkamradt)
from this video (Level4):
https://www.youtube.com/watch?v=qaPMdcCqtWk&t=365s

I added few tricks to make it a bit more versatile, so you can
parametrize what to do with duplicate documents in case of cluster
overlap: replace the duplicates with the next closest document or remove
it. This allow you to use it as an special kind of redundant filter too.
Additionally you can choose 2 diff orders: grouped by cluster or
respecting the original retriever scores.
In my use case I was using the docs grouped by cluster to run refine
chains per cluster to generate summarization over a large corpus of
documents.
Let me know if you want to change anything!

@rlancemartin, @eyurtsev, @hwchase17,

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago
Bagatur d1c7237034
openai fn update nb (#7352) 1 year ago
Bagatur 1c8cff32f1
Generic OpenAI fn chain (#7270)
Add loading functions for openai function chains and add docs page
1 year ago
OwenElliott 3074306ae1
Marqo Vector Store Examples & Type Hints (#7326)
This PR improves the example notebook for the Marqo vectorstore
implementation by adding a new RetrievalQAWithSourcesChain example. The
`embedding` parameter in `from_documents` has its type updated to
`Union[Embeddings, None]` and a default parameter of None because this
is ignored in Marqo.

This PR also upgrades the Marqo version to 0.11.0 to remove the device
parameter after a breaking change to the API.

Related to #7068 @tomhamer @hwchase17

---------

Co-authored-by: Tom Hamer <tom@marqo.ai>
1 year ago
Bagatur a9c5b4bcea
Bagatur/clarifai update (#7324)
This PR improves upon the Clarifai LangChain integration with improved docs, errors, args and the addition of embedding model support in LancChain for Clarifai's embedding models and an overview of the various ways you can integrate with Clarifai added to the docs.

---------

Co-authored-by: Matthew Zeiler <zeiler@clarifai.com>
1 year ago
John Landahl e047541b5f
Corrected a typo in elasticsearch.ipynb (#7318)
Simple typo fix
1 year ago
Leonid Ganeline 6ff9e9b34a
updated `huggingface_hub` examples (#7292)
Added examples for models:
- Google `Flan`
- TII `Falcon`
- Salesforce `XGen`
1 year ago
Dídac Sabatés e0cb3ea90c
Fix sql_database.ipynb link (#6525)
Looks like the
[SQLDatabaseChain](https://langchain.readthedocs.io/en/latest/modules/chains/examples/sqlite.html)
in the SQL Database Agent page was broken I've change it to the SQL
Chain page
1 year ago
hayao-k c23e16c459
docs: Fixed typos in Amazon Kendra Retriever documentation (#7261)
## Description
Fixed to the official service name Amazon Kendra.

## Tag maintainer
@baskaryan
1 year ago
zhaoshengbo e8f24164f0
Improve the alibaba cloud opensearch vector store documentation (#6964)
Based on user feedback, we have improved the Alibaba Cloud OpenSearch
vector store documentation.

Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>
1 year ago
Stefano Lottini e61cfb6e99
FLARE Example notebook: switch to named arg to pass pydantic validation (#7267)
Adding the name of the parameter to comply with latest requirements by
Pydantic usage for BaseModels.
1 year ago
os1ma b151d4257a
docs: Update documentation for Wikipedia tool to use WikipediaQueryRun (#7258)
**Description**
In the following page, "Wikipedia" tool is explained.

https://python.langchain.com/docs/modules/agents/tools/integrations/wikipedia

However, the WikipediaAPIWrapper being used is not a tool. This PR
updated the documentation to use a tool WikipediaQueryRun.

**Issue**
None

**Tag maintainer**
Agents / Tools / Toolkits: @hinthornw
1 year ago
Shantanu Nair f773c21723
Update supabase match_docs ddl and notebook to use expected id type (#7257)
- Description: Switch supabase match function DDL to use expected uuid
type instead of bigint
- Issue: https://github.com/hwchase17/langchain/issues/6743,
https://github.com/hwchase17/langchain/issues/7179
  - Tag maintainer:  @rlancemartin, @eyurtsev
  - Twitter handle: https://twitter.com/ShantanuNair
1 year ago
Myeongseop Kim 0e878ccc2d
Add HumanInputChatModel (#7256)
- Description: This is a chat model equivalent of HumanInputLLM. An
example notebook is also added.
  - Tag maintainer: @hwchase17, @baskaryan
  - Twitter handle: N/A
1 year ago
Harrison Chase 52b016920c
Harrison/update anthropic (#7237)
Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
1 year ago
Hashem Alsaket 6aa66fd2b0
Update Hugging Face Hub notebook (#7236)
Description: `flan-t5-xl` hangs, updated to `flan-t5-xxl`. Tested all
stabilityai LLMs- all hang so removed from tutorial. Temperature > 0 to
prevent unintended determinism.
Issue: #3275 
Tag maintainer: @baskaryan
1 year ago
Mike Nitsenko d669b9ece9
Document loader for Cube Semantic Layer (#6882)
### Description

This pull request introduces the "Cube Semantic Layer" document loader,
which demonstrates the retrieval of Cube's data model metadata in a
format suitable for passing to LLMs as embeddings. This enhancement aims
to provide contextual information and improve the understanding of data.

Twitter handle:
@the_cube_dev

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago
Tom e533da8bf2
Adding Marqo to vectorstore ecosystem (#7068)
This PR brings in a vectorstore interface for
[Marqo](https://www.marqo.ai/).

The Marqo vectorstore exposes some of Marqo's functionality in addition
the the VectorStore base class. The Marqo vectorstore also makes the
embedding parameter optional because inference for embeddings is an
inherent part of Marqo.

Docs, notebook examples and integration tests included.

Related PR:
https://github.com/hwchase17/langchain/pull/2807

---------

Co-authored-by: Tom Hamer <tom@marqo.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Harrison Chase 6711854e30
Harrison/dataforseo (#7214)
Co-authored-by: Alexander <sune357@gmail.com>
1 year ago
Conrad Fernandez 6eff0fa2ca
Added documentation for add_texts function for Pinecone integration (#7134)
- Description: added some documentation to the Pinecone vector store
docs page.
- Issue: #7126 
- Dependencies: None
- Tag maintainer: @baskaryan 

I can add more documentation on the Pinecone integration functions as I
am going to go in great depth into this area. Just wanted to check with
the maintainers is if this is all good.
1 year ago
felixocker db98c44f8f
Support for SPARQL (#7165)
# [SPARQL](https://www.w3.org/TR/rdf-sparql-query/) for
[LangChain](https://github.com/hwchase17/langchain)

## Description
LangChain support for knowledge graphs relying on W3C standards using
RDFlib: SPARQL/ RDF(S)/ OWL with special focus on RDF \
* Works with local files, files from the web, and SPARQL endpoints
* Supports both SELECT and UPDATE queries
* Includes both a Jupyter notebook with an example and integration tests

## Contribution compared to related PRs and discussions
* [Wikibase agent](https://github.com/hwchase17/langchain/pull/2690) -
uses SPARQL, but specifically for wikibase querying
* [Cypher qa](https://github.com/hwchase17/langchain/pull/5078) - graph
DB question answering for Neo4J via Cypher
* [PR 6050](https://github.com/hwchase17/langchain/pull/6050) - tries
something similar, but does not cover UPDATE queries and supports only
RDF
* Discussions on [w3c mailing list](mailto:semantic-web@w3.org) related
to the combination of LLMs (specifically ChatGPT) and knowledge graphs

## Dependencies
* [RDFlib](https://github.com/RDFLib/rdflib)

## Tag maintainer
Graph database related to memory -> @hwchase17
1 year ago
Prakul Agarwal 38f853dfa3
Fixed typos in MongoDB Atlas Vector Search documentation (#7174)
Fix for typos in MongoDB Atlas Vector Search documentation
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
1 year ago
Raouf Chebri 6fc24743b7
Add pg_hnsw vectorstore integration (#6893)
Hi @rlancemartin, @eyurtsev!

- Description: Adding HNSW extension support for Postgres. Similar to
pgvector vectorstore, with 3 differences
      1. it uses HNSW extension for exact and ANN searches, 
      2. Vectors are of type array of real
      3. Only supports L2
      
- Dependencies: [HNSW](https://github.com/knizhnik/hnsw) extension for
Postgres
  
  - Example:
  ```python
    db = HNSWVectoreStore.from_documents(
      embedding=embeddings,
      documents=docs,
      collection_name=collection_name,
      connection_string=connection_string
  )
  
  query = "What did the president say about Ketanji Brown Jackson"
docs_with_score: List[Tuple[Document, float]] =
db.similarity_search_with_score(query)
  ```

The example notebook is in the PR too.
1 year ago
Simon Cheung 81eebc4070
Add HugeGraphQAChain to support gremlin generating chain (#7132)
[Apache HugeGraph](https://github.com/apache/incubator-hugegraph) is a
convenient, efficient, and adaptable graph database, compatible with the
Apache TinkerPop3 framework and the Gremlin query language.

In this PR, the HugeGraph and HugeGraphQAChain provide the same
functionality as the existing integration with Neo4j and enables query
generation and question answering over HugeGraph database. The
difference is that the graph query language supported by HugeGraph is
not cypher but another very popular graph query language
[Gremlin](https://tinkerpop.apache.org/gremlin.html).

A notebook example and a simple test case have also been added.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Saverio Proto 5585607654
Improve Bing Search example (#7128)
# Description

Improve Bing Search example:
1 year ago
Lance Martin 265c285057
Fix GPT4All bug w/ "n_ctx" param (#7093)
Running `GPT4All` per the
[docs](https://python.langchain.com/docs/modules/model_io/models/llms/integrations/gpt4all),
I see:

```
$ from langchain.llms import GPT4All
$ model = GPT4All(model=local_path)
$ model("The capital of France is ", max_tokens=10)
TypeError: generate() got an unexpected keyword argument 'n_ctx'
```

It appears `n_ctx` is [no longer a supported
param](https://docs.gpt4all.io/gpt4all_python.html#gpt4all.gpt4all.GPT4All.generate)
in the GPT4All API from https://github.com/nomic-ai/gpt4all/pull/1090.

It now uses `max_tokens`, so I set this.

And I also set other defaults used in GPT4All client
[here](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-bindings/python/gpt4all/gpt4all.py).

Confirm it now works:
```
$ from langchain.llms import GPT4All
$ model = GPT4All(model=local_path)
$ model("The capital of France is ", max_tokens=10)
< Model logging > 
"....Paris."
```

---------

Co-authored-by: R. Lance Martin <rlm@Rs-MacBook-Pro.local>
1 year ago
Stefano Lottini 6631fd5168
Align cassio versions between examples for Cassandra integration (#7099)
Just reducing confusion by requiring cassio>=0.0.7 consistently across
examples.
1 year ago
Lance Martin 9ca4c54428
Minor updates to notebook for MultiQueryRetriever (#7102)
* Add an easier-to-run example.
* Add logging per https://github.com/hwchase17/langchain/pull/6891.
* Updated params per https://github.com/hwchase17/langchain/pull/5962.

---------

Co-authored-by: R. Lance Martin <rlm@Rs-MacBook-Pro.local>
Co-authored-by: Lance Martin <lance@langchain.dev>
1 year ago
genewoo e49abd1277
Add Metal support to llama.cpp doc (#7092)
- Description: Add Metal support to llama.cpp doc
  - Issue: #7091 
  - Dependencies: N/A
  - Twitter handle: gene_wu
1 year ago
rjarun8 e2d61ab85a
Add SpacyEmbeddings class (#6967)
- Description: Added a new SpacyEmbeddings class for generating
embeddings using the Spacy library.
- Issue: Sentencebert/Bert/Spacy/Doc2vec embedding support #6952
- Dependencies: This change requires the Spacy library and the
'en_core_web_sm' Spacy model.
- Tag maintainer: @dev2049
- Twitter handle: N/A

This change includes a new SpacyEmbeddings class, but does not include a
test or an example notebook.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
adam91holt 80e86b602e
Remove duplicate mongodb integration doc (#7006) 1 year ago
Johnny Lim a081e419a0
Fix sample in FAISS section (#7050)
This PR fixes a sample in the FAISS section in the reference docs.
1 year ago
Leonid Ganeline 200be43da6
added `Brave Search` document_loader (#6989)
- Added `Brave Search` document loader.
- Refactored BraveSearch wrapper
- Added a Jupyter Notebook example
- Added `Ecosystem/Integrations` BraveSearch page 

Please review:
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
1 year ago
Sergey Kozlov 6d15854cda
Add JSON Lines support to JSONLoader (#6913)
**Description**:

The JSON Lines format is used by some services such as OpenAI and
HuggingFace. It's also a convenient alternative to CSV.

This PR adds JSON Lines support to `JSONLoader` and also updates related
tests.

**Tag maintainer**: @rlancemartin, @eyurtsev.

PS I was not able to build docs locally so didn't update related
section.
1 year ago
Ofer Mendelevitch 153b56d19b
Vectara upd2 (#6506)
Update to Vectara integration 
- By user request added "add_files" to take advantage of Vectara
capabilities to process files on the backend, without the need for
separate loading of documents and chunking in the chain.
- Updated vectara.ipynb example notebook to be broader and added testing
of add_file()
 
  @hwchase17 - project lead

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago
Leonid Ganeline 77ae8084a0
docstrings `document_loaders` 1 (#6847)
- Updated docstrings in `document_loaders`
- several code fixes.
- added `docs/extras/ecosystem/integrations/airtable.md`

@rlancemartin, @eyurtsev
1 year ago
Stefano Lottini 8d2281a8ca
Second Attempt - Add concurrent insertion of vector rows in the Cassandra Vector Store (#7017)
Retrying with the same improvements as in #6772, this time trying not to
mess up with branches.

@rlancemartin doing a fresh new PR from a branch with a new name. This
should do. Thank you for your help!

---------

Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago