Commit Graph

1412 Commits (b8b8a138df0e70a2073682d9082f7cc6ae2acb41)

Author SHA1 Message Date
Yifei Song 7d29bb2c02
Add Xorbits Dataframe as a Document Loader (#7319)
- [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source
computing framework that makes it easy to scale data science and machine
learning workloads in parallel. Xorbits can leverage multi cores or GPUs
to accelerate computation on a single machine, or scale out up to
thousands of machines to support processing terabytes of data.

- This PR added support for the Xorbits document loader, which allows
langchain to leverage Xorbits to parallelize and distribute the loading
of data.
- Dependencies: This change requires the Xorbits library to be installed
in order to be used.
`pip install xorbits`
- Request for review: @rlancemartin, @eyurtsev
- Twitter handle: https://twitter.com/Xorbitsio

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Paul-Emile Brotons d2cf0d16b3
adding max_marginal_relevance_search method to MongoDBAtlasVectorSearch (#7310)
Adding a maximal_marginal_relevance method to the
MongoDBAtlasVectorSearch vectorstore enhances the user experience by
providing more diverse search results

Issue: #7304
1 year ago
Matt Robinson bcab894f4e
feat: Add `UnstructuredTSVLoader` (#7367)
### Summary

Adds an `UnstructuredTSVLoader` for TSV files. Also updates the doc
strings for `UnstructuredCSV` and `UnstructuredExcel` loaders.

### Testing

```python
from langchain.document_loaders.tsv import UnstructuredTSVLoader

loader = UnstructuredTSVLoader(
    file_path="example_data/mlb_teams_2012.csv", mode="elements"
)
docs = loader.load()
```
1 year ago
nikkie dfc3f83b0f
docs(vectorstores/integrations/chroma): Fix loading and saving (#7437)
- Description: Fix loading and saving code about Chroma
- Issue: the issue #7436 
- Dependencies: -
- Twitter handle: https://twitter.com/ftnext
1 year ago
Daniel Chalef c7f7788d0b
Add ZepMemory; improve ZepChatMessageHistory handling of metadata; Fix bugs (#7444)
Hey @hwchase17 - 

This PR adds a `ZepMemory` class, improves handling of Zep's message
metadata, and makes it easier for folks building custom chains to
persist metadata alongside their chat history.

We've had plenty confused users unfamiliar with ChatMessageHistory
classes and how to wrap the `ZepChatMessageHistory` in a
`ConversationBufferMemory`. So we've created the `ZepMemory` class as a
light wrapper for `ZepChatMessageHistory`.

Details:
- add ZepMemory, modify notebook to demo use of ZepMemory
- Modify summary to be SystemMessage
- add metadata argument to add_message; add Zep metadata to
Message.additional_kwargs
- support passing in metadata
1 year ago
Saurabh Chaturvedi 8f8e8d701e
Fix info about YouTube (#7447)
(Unintentionally mean 😅) nit: YouTube wasn't created by Google, this PR
fixes the mention in docs.
1 year ago
Jeroen Van Goey f5bd88757e
Fix typo (#7416)
`quesitons` -> `questions`.
1 year ago
Nolan 5da9f9abcb
docs(agents/toolkits): Fix error in document_comparison_toolkit.ipynb (#7417)
Replace this comment with:
- Description: Removes unneeded output warning in documentation at
https://python.langchain.com/docs/modules/agents/toolkits/document_comparison_toolkit
  - Issue: -
  - Dependencies: -
  - Tag maintainer: @baskaryan
  - Twitter handle: @finnless
1 year ago
nikkie 2eb4a2ceea
docs(retrievers/get-started): Fix broken state_of_the_union.txt link (#7399)
Thank you for this awesome library.

- Description: Fix broken link in documentation 
- Issue:
-
https://python.langchain.com/docs/modules/data_connection/retrievers/#get-started
- the URL:
https://github.com/hwchase17/langchain/blob/master/docs/modules/state_of_the_union.txt
- I think the right one is
https://github.com/hwchase17/langchain/blob/master/docs/extras/modules/state_of_the_union.txt
- Dependencies: -
- Tag maintainer: @baskaryan
- Twitter handle: -
1 year ago
Delgermurun a1603fccfb
integrate JinaChat (#6927)
Integration with https://chat.jina.ai/api. It is OpenAI compatible API.

- Twitter handle:
[https://twitter.com/JinaAI_](https://twitter.com/JinaAI_)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Roger Yu 633b673b85
Update pinecone.ipynb (#7382)
Fix typo
1 year ago
William FH 4789c99bc2
Add String Distance and Embedding Evaluators (#7123)
Add a string evaluator and pairwise string evaluator implementation for:
- Embedding distance
- String distance

Update docs
1 year ago
Bagatur 1ac347b4e3
update databerry-chaindesk redirect (#7378) 1 year ago
Joshua Carroll 705d2f5b92
Update the API Reference link in Streamlit integration docs (#7377)
This page:


https://python.langchain.com/docs/modules/callbacks/integrations/streamlit

Has a bad API Reference link currently. This PR fixes it to the correct
link.

Also updates the embedded app link to
https://langchain-mrkl.streamlit.app/ (better name) which is hosted in
langchain-ai/streamlit-agent repo
1 year ago
Georges Petrov ec033ae277
Rename Databerry to Chaindesk (#7022)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Harrison Chase 7cdf97ba9b
Harrison/add to imports (#7370)
pgvector cleanup
1 year ago
Bagatur 4d427b2397
Base language model docstrings (#7104) 1 year ago
Alex Gamble df746ad821
Add a callback handler for Context (https://getcontext.ai) (#7151)
### Description

Adding a callback handler for Context. Context is a product analytics
platform for AI chat experiences to help you understand how users are
interacting with your product.

I've added the callback library + an example notebook showing its use.

### Dependencies

Requires the user to install the `context-python` library. The library
is lazily-loaded when the callback is instantiated.

### Announcing the feature

We spoke with Harrison a few weeks ago about also doing a blog post
announcing our integration, so will coordinate this with him. Our
Twitter handle for the company is @getcontextai, and the founders are
@_agamble and @HenrySG.

Thanks in advance!
1 year ago
German Martin 3ce4e46c8c
The Fellowship of the Vectors: New Embeddings Filter using clustering. (#7015)
Continuing with Tolkien inspired series of langchain tools. I bring to
you:
**The Fellowship of the Vectors**, AKA EmbeddingsClusteringFilter.
This document filter uses embeddings to group vectors together into
clusters, then allows you to pick an arbitrary number of documents
vector based on proximity to the cluster centers. That's a
representative sample of the cluster.

The original idea is from [Greg Kamradt](https://github.com/gkamradt)
from this video (Level4):
https://www.youtube.com/watch?v=qaPMdcCqtWk&t=365s

I added few tricks to make it a bit more versatile, so you can
parametrize what to do with duplicate documents in case of cluster
overlap: replace the duplicates with the next closest document or remove
it. This allow you to use it as an special kind of redundant filter too.
Additionally you can choose 2 diff orders: grouped by cluster or
respecting the original retriever scores.
In my use case I was using the docs grouped by cluster to run refine
chains per cluster to generate summarization over a large corpus of
documents.
Let me know if you want to change anything!

@rlancemartin, @eyurtsev, @hwchase17,

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago
Leonid Ganeline b489466488
docs: `dependents` update 4 (#7360)
Updated links and counters of the `dependents` page.
1 year ago
Bagatur d1c7237034
openai fn update nb (#7352) 1 year ago
Bagatur 1c8cff32f1
Generic OpenAI fn chain (#7270)
Add loading functions for openai function chains and add docs page
1 year ago
Bagatur fd7145970f
Output parser redirect (#7330)
Related to ##7311
1 year ago
OwenElliott 3074306ae1
Marqo Vector Store Examples & Type Hints (#7326)
This PR improves the example notebook for the Marqo vectorstore
implementation by adding a new RetrievalQAWithSourcesChain example. The
`embedding` parameter in `from_documents` has its type updated to
`Union[Embeddings, None]` and a default parameter of None because this
is ignored in Marqo.

This PR also upgrades the Marqo version to 0.11.0 to remove the device
parameter after a breaking change to the API.

Related to #7068 @tomhamer @hwchase17

---------

Co-authored-by: Tom Hamer <tom@marqo.ai>
1 year ago
Bagatur a9c5b4bcea
Bagatur/clarifai update (#7324)
This PR improves upon the Clarifai LangChain integration with improved docs, errors, args and the addition of embedding model support in LancChain for Clarifai's embedding models and an overview of the various ways you can integrate with Clarifai added to the docs.

---------

Co-authored-by: Matthew Zeiler <zeiler@clarifai.com>
1 year ago
John Landahl e047541b5f
Corrected a typo in elasticsearch.ipynb (#7318)
Simple typo fix
1 year ago
Subsegment 152dc59060
docs : add cnosdb to Ecosystem Integrations (#7316)
- Implement a `from_cnosdb` method for the `SQLDatabase` class
  - Write CnosDB documentation and add it to Ecosystem Integrations
1 year ago
Bagatur a6b39afe0e
rm side nav (#7297) 1 year ago
Leonid Ganeline 6ff9e9b34a
updated `huggingface_hub` examples (#7292)
Added examples for models:
- Google `Flan`
- TII `Falcon`
- Salesforce `XGen`
1 year ago
Dídac Sabatés e0cb3ea90c
Fix sql_database.ipynb link (#6525)
Looks like the
[SQLDatabaseChain](https://langchain.readthedocs.io/en/latest/modules/chains/examples/sqlite.html)
in the SQL Database Agent page was broken I've change it to the SQL
Chain page
1 year ago
Leonid Ganeline 4450791edd
docs: tutorials update (#7230)
updated `tutorials.mdx`:
- added a link to new `Deeplearning AI` course on LangChain
- added links to other tutorial videos
- fixed format

@baskaryan, @hwchase17
1 year ago
hayao-k c23e16c459
docs: Fixed typos in Amazon Kendra Retriever documentation (#7261)
## Description
Fixed to the official service name Amazon Kendra.

## Tag maintainer
@baskaryan
1 year ago
zhaoshengbo e8f24164f0
Improve the alibaba cloud opensearch vector store documentation (#6964)
Based on user feedback, we have improved the Alibaba Cloud OpenSearch
vector store documentation.

Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>
1 year ago
emarco177 b9d6d4cd4c
added template repo for CI/CD deployment on Google Cloud Run (#7218)
Replace this comment with:
- Description: added documentation for a template repo that helps
dockerizing and deploying a LangChain using a Cloud Build CI/CD pipeline
to Google Cloud build serverless
  - Issue: None,
  - Dependencies: None,
  - Tag maintainer: @baskaryan,
  - Twitter handle: EdenEmarco177

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.
1 year ago
Stefano Lottini e61cfb6e99
FLARE Example notebook: switch to named arg to pass pydantic validation (#7267)
Adding the name of the parameter to comply with latest requirements by
Pydantic usage for BaseModels.
1 year ago
os1ma b151d4257a
docs: Update documentation for Wikipedia tool to use WikipediaQueryRun (#7258)
**Description**
In the following page, "Wikipedia" tool is explained.

https://python.langchain.com/docs/modules/agents/tools/integrations/wikipedia

However, the WikipediaAPIWrapper being used is not a tool. This PR
updated the documentation to use a tool WikipediaQueryRun.

**Issue**
None

**Tag maintainer**
Agents / Tools / Toolkits: @hinthornw
1 year ago
Jeroen Van Goey 887bb12287
Use correct Language for html_splitter (#7274)
`html_splitter` was using `Language.MARKDOWN`.
1 year ago
Shantanu Nair f773c21723
Update supabase match_docs ddl and notebook to use expected id type (#7257)
- Description: Switch supabase match function DDL to use expected uuid
type instead of bigint
- Issue: https://github.com/hwchase17/langchain/issues/6743,
https://github.com/hwchase17/langchain/issues/7179
  - Tag maintainer:  @rlancemartin, @eyurtsev
  - Twitter handle: https://twitter.com/ShantanuNair
1 year ago
Myeongseop Kim 0e878ccc2d
Add HumanInputChatModel (#7256)
- Description: This is a chat model equivalent of HumanInputLLM. An
example notebook is also added.
  - Tag maintainer: @hwchase17, @baskaryan
  - Twitter handle: N/A
1 year ago
Harrison Chase 52b016920c
Harrison/update anthropic (#7237)
Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
1 year ago
Hashem Alsaket 6aa66fd2b0
Update Hugging Face Hub notebook (#7236)
Description: `flan-t5-xl` hangs, updated to `flan-t5-xxl`. Tested all
stabilityai LLMs- all hang so removed from tutorial. Temperature > 0 to
prevent unintended determinism.
Issue: #3275 
Tag maintainer: @baskaryan
1 year ago
Harrison Chase d6541da161
remove arize nb (#7238)
was causing some issues with docs build
1 year ago
Mike Nitsenko d669b9ece9
Document loader for Cube Semantic Layer (#6882)
### Description

This pull request introduces the "Cube Semantic Layer" document loader,
which demonstrates the retrieval of Cube's data model metadata in a
format suitable for passing to LLMs as embeddings. This enhancement aims
to provide contextual information and improve the understanding of data.

Twitter handle:
@the_cube_dev

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago
Tom e533da8bf2
Adding Marqo to vectorstore ecosystem (#7068)
This PR brings in a vectorstore interface for
[Marqo](https://www.marqo.ai/).

The Marqo vectorstore exposes some of Marqo's functionality in addition
the the VectorStore base class. The Marqo vectorstore also makes the
embedding parameter optional because inference for embeddings is an
inherent part of Marqo.

Docs, notebook examples and integration tests included.

Related PR:
https://github.com/hwchase17/langchain/pull/2807

---------

Co-authored-by: Tom Hamer <tom@marqo.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Harrison Chase 47e7d09dff
fix arize nb (#7227) 1 year ago
Harrison Chase 6711854e30
Harrison/dataforseo (#7214)
Co-authored-by: Alexander <sune357@gmail.com>
1 year ago
Hakan Tekgul 61938a02a1
Create arize_llm_observability.ipynb (#7000)
Adding documentation and notebook for Arize callback handler. 

  - @dev2049
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
1 year ago
Leonid Ganeline ecee4d6e92
docs: update `youtube` videos and tutorials (#6515)
added tutorials.mdx; updated youtube.mdx

Rationale: the Tutorials section in the documentation is top-priority.
(for example, https://pytorch.org/docs/stable/index.html) Not every
project has resources to make tutorials. We have such a privilege.
Community experts created several tutorials on YouTube. But the tutorial
links are now hidden on the YouTube page and not easily discovered by
first-time visitors.

- Added new videos and tutorials that were created since the last
update.
- Made some reprioritization between videos on the base of the view
numbers.

#### Who can review?

  - @hwchase17
    - @dev2049
1 year ago
Josh Reini 30d8d1d3d0
add trulens integration (#7096)
Description: Add TruLens integration.

Twitter: @trulensml

For review:
  - Tracing: @agola11
  - Tools: @hinthornw
1 year ago
Conrad Fernandez 6eff0fa2ca
Added documentation for add_texts function for Pinecone integration (#7134)
- Description: added some documentation to the Pinecone vector store
docs page.
- Issue: #7126 
- Dependencies: None
- Tag maintainer: @baskaryan 

I can add more documentation on the Pinecone integration functions as I
am going to go in great depth into this area. Just wanted to check with
the maintainers is if this is all good.
1 year ago
felixocker db98c44f8f
Support for SPARQL (#7165)
# [SPARQL](https://www.w3.org/TR/rdf-sparql-query/) for
[LangChain](https://github.com/hwchase17/langchain)

## Description
LangChain support for knowledge graphs relying on W3C standards using
RDFlib: SPARQL/ RDF(S)/ OWL with special focus on RDF \
* Works with local files, files from the web, and SPARQL endpoints
* Supports both SELECT and UPDATE queries
* Includes both a Jupyter notebook with an example and integration tests

## Contribution compared to related PRs and discussions
* [Wikibase agent](https://github.com/hwchase17/langchain/pull/2690) -
uses SPARQL, but specifically for wikibase querying
* [Cypher qa](https://github.com/hwchase17/langchain/pull/5078) - graph
DB question answering for Neo4J via Cypher
* [PR 6050](https://github.com/hwchase17/langchain/pull/6050) - tries
something similar, but does not cover UPDATE queries and supports only
RDF
* Discussions on [w3c mailing list](mailto:semantic-web@w3.org) related
to the combination of LLMs (specifically ChatGPT) and knowledge graphs

## Dependencies
* [RDFlib](https://github.com/RDFLib/rdflib)

## Tag maintainer
Graph database related to memory -> @hwchase17
1 year ago
Prakul Agarwal 38f853dfa3
Fixed typos in MongoDB Atlas Vector Search documentation (#7174)
Fix for typos in MongoDB Atlas Vector Search documentation
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
1 year ago
Raouf Chebri 6fc24743b7
Add pg_hnsw vectorstore integration (#6893)
Hi @rlancemartin, @eyurtsev!

- Description: Adding HNSW extension support for Postgres. Similar to
pgvector vectorstore, with 3 differences
      1. it uses HNSW extension for exact and ANN searches, 
      2. Vectors are of type array of real
      3. Only supports L2
      
- Dependencies: [HNSW](https://github.com/knizhnik/hnsw) extension for
Postgres
  
  - Example:
  ```python
    db = HNSWVectoreStore.from_documents(
      embedding=embeddings,
      documents=docs,
      collection_name=collection_name,
      connection_string=connection_string
  )
  
  query = "What did the president say about Ketanji Brown Jackson"
docs_with_score: List[Tuple[Document, float]] =
db.similarity_search_with_score(query)
  ```

The example notebook is in the PR too.
1 year ago
Max Cembalest 2984803597
cleaned Arthur tracking demo notebook (#7147)
Cleaned title and reduced clutter for integration demo notebook for the
Arthur callback handler
1 year ago
Deepankar Mahapatro da69a6771f
docs: update Jina ecosystem (#7149)
Documentation update for [Jina
ecosystem](https://python.langchain.com/docs/ecosystem/integrations/jina)
and `langchain-serve` in the deployments section to latest features.

@hwchase17 

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
1 year ago
Simon Cheung 81eebc4070
Add HugeGraphQAChain to support gremlin generating chain (#7132)
[Apache HugeGraph](https://github.com/apache/incubator-hugegraph) is a
convenient, efficient, and adaptable graph database, compatible with the
Apache TinkerPop3 framework and the Gremlin query language.

In this PR, the HugeGraph and HugeGraphQAChain provide the same
functionality as the existing integration with Neo4j and enables query
generation and question answering over HugeGraph database. The
difference is that the graph query language supported by HugeGraph is
not cypher but another very popular graph query language
[Gremlin](https://tinkerpop.apache.org/gremlin.html).

A notebook example and a simple test case have also been added.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Saverio Proto 5585607654
Improve Bing Search example (#7128)
# Description

Improve Bing Search example:
1 year ago
Lance Martin 265c285057
Fix GPT4All bug w/ "n_ctx" param (#7093)
Running `GPT4All` per the
[docs](https://python.langchain.com/docs/modules/model_io/models/llms/integrations/gpt4all),
I see:

```
$ from langchain.llms import GPT4All
$ model = GPT4All(model=local_path)
$ model("The capital of France is ", max_tokens=10)
TypeError: generate() got an unexpected keyword argument 'n_ctx'
```

It appears `n_ctx` is [no longer a supported
param](https://docs.gpt4all.io/gpt4all_python.html#gpt4all.gpt4all.GPT4All.generate)
in the GPT4All API from https://github.com/nomic-ai/gpt4all/pull/1090.

It now uses `max_tokens`, so I set this.

And I also set other defaults used in GPT4All client
[here](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-bindings/python/gpt4all/gpt4all.py).

Confirm it now works:
```
$ from langchain.llms import GPT4All
$ model = GPT4All(model=local_path)
$ model("The capital of France is ", max_tokens=10)
< Model logging > 
"....Paris."
```

---------

Co-authored-by: R. Lance Martin <rlm@Rs-MacBook-Pro.local>
1 year ago
Stefano Lottini 6631fd5168
Align cassio versions between examples for Cassandra integration (#7099)
Just reducing confusion by requiring cassio>=0.0.7 consistently across
examples.
1 year ago
Ruixi Fan 0b69a7e9ab
[Document fix] Fix an expired link qa_benchmarking_pg.ipynb (#7110)
## Change description

- Description: Fix an expired link that points to the readthedocs site.
  - Dependencies: No
1 year ago
Lance Martin 9ca4c54428
Minor updates to notebook for MultiQueryRetriever (#7102)
* Add an easier-to-run example.
* Add logging per https://github.com/hwchase17/langchain/pull/6891.
* Updated params per https://github.com/hwchase17/langchain/pull/5962.

---------

Co-authored-by: R. Lance Martin <rlm@Rs-MacBook-Pro.local>
Co-authored-by: Lance Martin <lance@langchain.dev>
1 year ago
Nicolas 490fcf9d98
docs: New experimental UI for Mendable Search (#6558)
This PR introduces a new Mendable UI tailored to a better search
experience.

We're more closely integrating our traditional search with our AI
generation.
With this change, you won't have to tab back and forth between the
mendable bot and the keyword search. Both types of search are handled in
the same bar. This should make the docs easier to navigate. while still
letting users get code generations or AI-summarized answers if they so
wish. Also, it should reduce the cost.

Would love to hear your feedback :)

Cc: @dev2049 @hwchase17
1 year ago
genewoo e49abd1277
Add Metal support to llama.cpp doc (#7092)
- Description: Add Metal support to llama.cpp doc
  - Issue: #7091 
  - Dependencies: N/A
  - Twitter handle: gene_wu
1 year ago
rjarun8 e2d61ab85a
Add SpacyEmbeddings class (#6967)
- Description: Added a new SpacyEmbeddings class for generating
embeddings using the Spacy library.
- Issue: Sentencebert/Bert/Spacy/Doc2vec embedding support #6952
- Dependencies: This change requires the Spacy library and the
'en_core_web_sm' Spacy model.
- Tag maintainer: @dev2049
- Twitter handle: N/A

This change includes a new SpacyEmbeddings class, but does not include a
test or an example notebook.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Leonid Ganeline 16fbd528c5
docs: commented out `editUrl` option (#6440) 1 year ago
adam91holt 80e86b602e
Remove duplicate mongodb integration doc (#7006) 1 year ago
joaomsimoes c669d98693
Update get_started.mdx (#7005)
typo in chat = ChatOpenAI(open_api_key="...") should be openai_api_key
1 year ago
Johnny Lim a081e419a0
Fix sample in FAISS section (#7050)
This PR fixes a sample in the FAISS section in the reference docs.
1 year ago
Leonid Ganeline 200be43da6
added `Brave Search` document_loader (#6989)
- Added `Brave Search` document loader.
- Refactored BraveSearch wrapper
- Added a Jupyter Notebook example
- Added `Ecosystem/Integrations` BraveSearch page 

Please review:
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
1 year ago
Sergey Kozlov 6d15854cda
Add JSON Lines support to JSONLoader (#6913)
**Description**:

The JSON Lines format is used by some services such as OpenAI and
HuggingFace. It's also a convenient alternative to CSV.

This PR adds JSON Lines support to `JSONLoader` and also updates related
tests.

**Tag maintainer**: @rlancemartin, @eyurtsev.

PS I was not able to build docs locally so didn't update related
section.
1 year ago
Ofer Mendelevitch 153b56d19b
Vectara upd2 (#6506)
Update to Vectara integration 
- By user request added "add_files" to take advantage of Vectara
capabilities to process files on the backend, without the need for
separate loading of documents and chunking in the chain.
- Updated vectara.ipynb example notebook to be broader and added testing
of add_file()
 
  @hwchase17 - project lead

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago
Leonid Ganeline 77ae8084a0
docstrings `document_loaders` 1 (#6847)
- Updated docstrings in `document_loaders`
- several code fixes.
- added `docs/extras/ecosystem/integrations/airtable.md`

@rlancemartin, @eyurtsev
1 year ago
Bagatur 7acd524210
Rm retriever kwargs (#7013)
Doesn't actually limit the Retriever interface but hopefully in practice
it does
1 year ago
Johnny Lim 9dc77614e3
Polish reference docs (#7045)
This PR fixes broken links in the reference docs.
1 year ago
Johnny Lim 052c797429
Fix typo (#7023)
This PR fixes a typo.
1 year ago
Stefano Lottini 8d2281a8ca
Second Attempt - Add concurrent insertion of vector rows in the Cassandra Vector Store (#7017)
Retrying with the same improvements as in #6772, this time trying not to
mess up with branches.

@rlancemartin doing a fresh new PR from a branch with a new name. This
should do. Thank you for your help!

---------

Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago
Matt Robinson 0498dad562
feat: enable `UnstructuredEmailLoader` to process attachments (#6977)
### Summary

Updates `UnstructuredEmailLoader` so that it can process attachments in
addition to the e-mail content. The loader will process attachments if
the `process_attachments` kwarg is passed when the loader is
instantiated.

### Testing

```python

file_path = "fake-email-attachment.eml"
loader = UnstructuredEmailLoader(
    file_path, mode="elements", process_attachments=True
)
docs = loader.load()
docs[-1]
```

### Reviewers

-  @rlancemartin 
-  @eyurtsev
- @hwchase17
1 year ago
Matthew Foster Walsh 59697b406d
Fix typo in quickstart.mdx (#6985)
Removed an extra "to" from a sentence. @dev2049 very minor documentation
fix.
1 year ago
Paul Grillenberger aa37b10b28
Fix: Correct typo (#6988)
Description: Correct a minor typo in the docs. @dev2049
1 year ago
Zander Chase b0859c9b18
Add New Retriever Interface with Callbacks (#5962)
Handle the new retriever events in a way that (I think) is entirely
backwards compatible? Needs more testing for some of the chain changes
and all.

This creates an entire new run type, however. We could also just treat
this as an event within a chain run presumably (same with memory)

Adds a subclass initializer that upgrades old retriever implementations
to the new schema, along with tests to ensure they work.

First commit doesn't upgrade any of our retriever implementations (to
show that we can pass the tests along with additional ones testing the
upgrade logic).

Second commit upgrades the known universe of retrievers in langchain.

- [X] Add callback handling methods for retriever start/end/error (open
to renaming to 'retrieval' if you want that)
- [X] Update BaseRetriever schema to support callbacks
- [X] Tests for upgrading old "v1" retrievers for backwards
compatibility
- [X] Update existing retriever implementations to implement the new
interface
- [X] Update calls within chains to .{a]get_relevant_documents to pass
the child callback manager
- [X] Update the notebooks/docs to reflect the new interface
- [X] Test notebooks thoroughly


Not handled:
- Memory pass throughs: retrieval memory doesn't have a parent callback
manager passed through the method

---------

Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
1 year ago
William FH a5b206caf3
Remove Promptlayer Notebook (#6996)
It's breaking our docs build
1 year ago
Daniel Chalef b26cca8008
Zep Authentication (#6728)
## Description: Add Zep API Key argument to ZepChatMessageHistory and
ZepRetriever
- correct docs site links
- add zep api_key auth to constructors

ZepChatMessageHistory: @hwchase17, 
ZepRetriever: @rlancemartin, @eyurtsev
1 year ago
William FH e4625846e5
Add Flyte Callback Handler (#6139) (#6986)
Signed-off-by: Samhita Alla <aallasamhita@gmail.com>
Co-authored-by: Samhita Alla <aallasamhita@gmail.com>
1 year ago
Davis Chase eb180e321f
Page per class-style api reference (#6560)
can make it prettier, but what do we think of overall structure?

https://api.python.langchain.com/en/dev2049-page_per_class/api_ref.html

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
1 year ago
William FH 64039b9f11
Promptlayer Callback (#6975)
Co-authored-by: Saleh Hindi <saleh.hindi.one@gmail.com>
Co-authored-by: jped <jonathanped@gmail.com>
1 year ago
William FH 13c62cf6b1
Arthur Callback (#6972)
Co-authored-by: Max Cembalest <115359769+arthuractivemodeling@users.noreply.github.com>
1 year ago
William FH 8c73037dff
Simplify eval arg names (#6944)
It'll be easier to switch between these if the names of predictions are
consistent
1 year ago
Davis Chase bd6a0ee9e9
Redirect vecstores (#6948) 1 year ago
Davis Chase f780678910
Add back in clickhouse mongo vecstore notebooks (#6949) 1 year ago
Jacob Lee 73831ef3d8
Change code block color scheme (#6945)
Adds contrast, makes code blocks more readable.
1 year ago
Hashem Alsaket 5861770a53
Updated QA notebook (#6801)
Description: `all_metadatas` was not defined, `OpenAIEmbeddings` was not
imported,
Issue: #6723 the issue # it fixes (if applicable),
Dependencies: lark,
Tag maintainer: @vowelparrot , @dev2049

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago
Kacper Łukawski 140ba682f1
Support named vectors in Qdrant (#6871)
# Description

This PR makes it possible to use named vectors from Qdrant in Langchain.
That was requested multiple times, as people want to reuse externally
created collections in Langchain. It doesn't change anything for the
existing applications. The changes were covered with some integration
tests and included in the docs.

## Example

```python
Qdrant.from_documents(
    docs,
    embeddings,
    location=":memory:",
    collection_name="my_documents",
    vector_name="custom_vector",
)
```

### Issue: #2594 

Tagging @rlancemartin & @eyurtsev. I'd appreciate your review.
1 year ago
corranmac 20c6ade2fc
Grobid parser for Scientific Articles from PDF (#6729)
### Scientific Article PDF Parsing via Grobid

`Description:`
This change adds the GrobidParser class, which uses the Grobid library
to parse scientific articles into a universal XML format containing the
article title, references, sections, section text etc. The GrobidParser
uses a local Grobid server to return PDFs document as XML and parses the
XML to optionally produce documents of individual sentences or of whole
paragraphs. Metadata includes the text, paragraph number, pdf relative
bboxes, pages (text may overlap over two pages), section title
(Introduction, Methodology etc), section_number (i.e 1.1, 2.3), the
title of the paper and finally the file path.
      
Grobid parsing is useful beyond standard pdf parsing as it accurately
outputs sections and paragraphs within them. This allows for
post-fitering of results for specific sections i.e. limiting results to
the methodology section or results. While sections are split via
headings, ideally they could be classified specifically into
introduction, methodology, results, discussion, conclusion. I'm
currently experimenting with chatgpt-3.5 for this function, which could
later be implemented as a textsplitter.

`Dependencies:`
For use, the grobid repo must be cloned and Java must be installed, for
colab this is:

```
!apt-get install -y openjdk-11-jdk -q
!update-alternatives --set java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
!git clone https://github.com/kermitt2/grobid.git
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.chdir('grobid')
!./gradlew clean install
```

Once installed the server is ran on localhost:8070 via
```
get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')
```

@rlancemartin, @eyurtsev

Twitter Handle: @Corranmac

Grobid Demo Notebook is
[here](https://colab.research.google.com/drive/1X-St_mQRmmm8YWtct_tcJNtoktbdGBmd?usp=sharing).

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago
Harrison Chase 0ba175e13f
move octo notebook (#6901) 1 year ago
Stefano Lottini 75fb9d2fdc
Cassandra support for chat history using CassIO library (#6771)
### Overview

This PR aims at building on #4378, expanding the capabilities and
building on top of the `cassIO` library to interface with the database
(as opposed to using the core drivers directly).

Usage of `cassIO` (a library abstracting Cassandra access for
ML/GenAI-specific purposes) is already established since #6426 was
merged, so no new dependencies are introduced.

In the same spirit, we try to uniform the interface for using Cassandra
instances throughout LangChain: all our appreciation of the work by
@jj701 notwithstanding, who paved the way for this incremental work
(thank you!), we identified a few reasons for changing the way a
`CassandraChatMessageHistory` is instantiated. Advocating a syntax
change is something we don't take lighthearted way, so we add some
explanations about this below.

Additionally, this PR expands on integration testing, enables use of
Cassandra's native Time-to-Live (TTL) features and improves the phrasing
around the notebook example and the short "integrations" documentation
paragraph.

We would kindly request @hwchase to review (since this is an elaboration
and proposed improvement of #4378 who had the same reviewer).

### About the __init__ breaking changes

There are
[many](https://docs.datastax.com/en/developer/python-driver/3.28/api/cassandra/cluster/)
options when creating the `Cluster` object, and new ones might be added
at any time. Choosing some of them and exposing them as `__init__`
parameters `CassandraChatMessageHistory` will prove to be insufficient
for at least some users.

On the other hand, working through `kwargs` or adding a long, long list
of arguments to `__init__` is not a desirable option either. For this
reason, (as done in #6426), we propose that whoever instantiates the
Chat Message History class provide a Cassandra `Session` object, ready
to use. This also enables easier injection of mocks and usage of
Cassandra-compatible connections (such as those to the cloud database
DataStax Astra DB, obtained with a different set of init parameters than
`contact_points` and `port`).

We feel that a breaking change might still be acceptable since LangChain
is at `0.*`. However, while maintaining that the approach we propose
will be more flexible in the future, room could be made for a
"compatibility layer" that respects the current init method. Honestly,
we would to that only if there are strong reasons for it, as that would
entail an additional maintenance burden.

### Other changes

We propose to remove the keyspace creation from the class code for two
reasons: first, production Cassandra instances often employ RBAC so that
the database user reading/writing from tables does not necessarily (and
generally shouldn't) have permission to create keyspaces, and second
that programmatic keyspace creation is not a best practice (it should be
done more or less manually, with extra care about schema mismatched
among nodes, etc). Removing this (usually unnecessary) operation from
the `__init__` path would also improve initialization performance
(shorter time).

We suggest, likewise, to remove the `__del__` method (which would close
the database connection), for the following reason: it is the
recommended best practice to create a single Cassandra `Session` object
throughout an application (it is a resource-heavy object capable to
handle concurrency internally), so in case Cassandra is used in other
ways by the app there is the risk of truncating the connection for all
usages when the history instance is destroyed. Moreover, the `Session`
object, in typical applications, is best left to garbage-collect itself
automatically.

As mentioned above, we defer the actual database I/O to the `cassIO`
library, which is designed to encode practices optimized for LLM
applications (among other) without the need to expose LangChain
developers to the internals of CQL (Cassandra Query Language). CassIO is
already employed by the LangChain's Vector Store support for Cassandra.

We added a few more connection options in the companion notebook example
(most notably, Astra DB) to encourage usage by anyone who cannot run
their own Cassandra cluster.

We surface the `ttl_seconds` option for automatic handling of an
expiration time to chat history messages, a likely useful feature given
that very old messages generally may lose their importance.

We elaborated a bit more on the integration testing (Time-to-live,
separation of "session ids", ...).

### Remarks from linter & co.

We reinstated `cassio` as a dependency both in the "optional" group and
in the "integration testing" group of `pyproject.toml`. This might not
be the right thing do to, in which case the author of this PR offer his
apologies (lack of confidence with Poetry - happy to be pointed in the
right direction, though!).

During linter tests, we were hit by some errors which appear unrelated
to the code in the PR. We left them here and report on them here for
awareness:

```
langchain/vectorstores/mongodb_atlas.py:137: error: Argument 1 to "insert_many" of "Collection" has incompatible type "List[Dict[str, Sequence[object]]]"; expected "Iterable[Union[MongoDBDocumentType, RawBSONDocument]]"  [arg-type]
langchain/vectorstores/mongodb_atlas.py:186: error: Argument 1 to "aggregate" of "Collection" has incompatible type "List[object]"; expected "Sequence[Mapping[str, Any]]"  [arg-type]

langchain/vectorstores/qdrant.py:16: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:19: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:20: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:22: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:23: error: Name "grpc" is not defined  [name-defined]
```

In the same spirit, we observe that to even get `import langchain` run,
it seems that a `pip install bs4` is missing from the minimal package
installation path.

Thank you!
1 year ago
Harrison Chase 3ac08c3de4
Harrison/octo ml (#6897)
Co-authored-by: Bassem Yacoube <125713079+AI-Bassem@users.noreply.github.com>
Co-authored-by: Shotaro Kohama <khmshtr28@gmail.com>
Co-authored-by: Rian Dolphin <34861538+rian-dolphin@users.noreply.github.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Shashank Deshpande <shashankdeshpande18@gmail.com>
1 year ago
Shashank Deshpande 99cfe192da
added example notebook - use custom functions with openai agent (#6865)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
1 year ago
Robert Lewis c9c8d2599e
Update Zapier Jupyter notebook to include brief OAuth example (#6892)
Description: Adds a brief example of using an OAuth access token with
the Zapier wrapper. Also links to the Zapier documentation to learn more
about OAuth flows.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Davis Chase f07dd02b50
Docs /redirects (#6790)
Auto-generated a bunch of redirects from initial docs refactor commit
1 year ago
Yaohui Wang 9d1bd18596
feat (documents): add LarkSuite document loader (#6420)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

### Summary

This PR adds a LarkSuite (FeiShu) document loader. 
> [LarkSuite](https://www.larksuite.com/) is an enterprise collaboration
platform developed by ByteDance.

### Tests

- an integration test case is added
- an example notebook showing usage is added. [Notebook
preview](https://github.com/yaohui-wyh/langchain/blob/master/docs/extras/modules/data_connection/document_loaders/integrations/larksuite.ipynb)

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

### Who can review?

- PTAL @eyurtsev @hwchase17

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: Yaohui Wang <wangyaohui.01@bytedance.com>
1 year ago