Commit Graph

1162 Commits (5d5298087f5e58a0ed8cc9a26ad72f385af163c6)

Author SHA1 Message Date
lonestriker 6f36f0f930
Add oobabooga/text-generation-webui support as a llm (#5997)
Add oobabooga/text-generation-webui support as an LLM. Currently,
supports using text-generation-webui's non-streaming API interface.
Allows users who already have text-gen running to use the same models
with langchain.

#### Before submitting

Simple usage, similar to existing LLM supported:

```
from langchain.llms import TextGen
llm = TextGen(model_url = "http://localhost:5000")
```
#### Who can review?

 @hwchase17 - project lead

---------

Co-authored-by: Hien Ngo <Hien.Ngo@adia.ae>
1 year ago
Saba Sturua 427551eabf
DocArray as a Retriever (#6031)
## DocArray as a Retriever

[DocArray](https://github.com/docarray/docarray) is an open-source tool
for managing your multi-modal data. It offers flexibility to store and
search through your data using various document index backends. This PR
introduces `DocArrayRetriever` - which works with any available backend
and serves as a retriever for Langchain apps.

Also, I added 2 notebooks:
DocArray Backends - intro to all 5 currently supported backends, how to
initialize, index, and use them as a retriever
DocArray Usage - showcasing what additional search parameters you can
pass to create versatile retrievers

Example:
```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers import DocArrayRetriever


# define document schema
class MyDoc(BaseDoc):
    description: str
    description_embedding: NdArray[1536]


embeddings = OpenAIEmbeddings()
# create documents
descriptions = ["description 1", "description 2"]
desc_embeddings = embeddings.embed_documents(texts=descriptions)
docs = DocList[MyDoc](
    [
        MyDoc(description=desc, description_embedding=embedding)
        for desc, embedding in zip(descriptions, desc_embeddings)
    ]
)

# initialize document index with data
db = InMemoryExactNNIndex[MyDoc](docs)

# create a retriever
retriever = DocArrayRetriever(
    index=db,
    embeddings=embeddings,
    search_field="description_embedding",
    content_field="description",
)

# find the relevant document
doc = retriever.get_relevant_documents("action movies")
print(doc)
```

#### Who can review?

@dev2049

---------

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
1 year ago
Masafumi Mori 7bb437146d
fix links to prompt templates and example selectors (#6332)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # 
links to prompt templates and example selectors on the
[Prompts](https://python.langchain.com/docs/modules/model_io/prompts/)
page are invalid.

#### Before submitting
Just a small note that I tried to run `make docs_clean` and other
related commands before PR written
[here](https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md#build-documentation-locally),
it gives me an error:
```bash
langchain % make docs_clean
Traceback (most recent call last):
  File "/Users/masafumi/Downloads/langchain/.venv/bin/make", line 5, in <module>
    from scripts.proto import main
ModuleNotFoundError: No module named 'scripts'
make: *** [docs_clean] Error 1
# Poetry (version 1.5.1)
# Python 3.9.13
```
I couldn't figure out how to fix this, so I didn't run those command.
But links should work.

#### Who can review?

Tag maintainers/contributors who might be interested:
@hwchase17

Similar issue #6323

Co-authored-by: masafumimori <m.masafumimori@outlook.com>
1 year ago
Francisco Ingham 83eea230f3
changed height in the nb example (#6327)
changed height in the example to a more reasonable number (from 9 feet
to 6 feet)
1 year ago
Harrison Chase af18413d97
Harrison/deeplake new features (#6263)
Co-authored-by: adilkhan <adilkhan.sarsen@nu.edu.kz>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Davis Chase 6640293087
fix eval guide links (#6319) 1 year ago
ljeagle ad324a39ae
Improve the performance of add_texts interface and upgrade the AwaDB from 0.3.2 to 0.3.3 (#6316)
1. Changed the implementation of add_texts interface for the AwaDB
vector store in order to improve the performance
2. Upgrade the AwaDB from 0.3.2 to 0.3.3

---------

Co-authored-by: vincent <awadb.vincent@gmail.com>
1 year ago
Davis Chase 24b2af5218
nit (#6305) 1 year ago
Davis Chase 03b5891cf7
more redirect (#6314) 1 year ago
Davis Chase eaee492dbc
basic redirect (#6309) 1 year ago
Davis Chase 2f47e5c766
update api link (#6303) 1 year ago
Davis Chase d558bcfad8
rm ignore_vercel (#6302) 1 year ago
Davis Chase 87e502c6bc
Doc refactor (#6300)
Co-authored-by: jacoblee93 <jacoblee93@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Harrison Chase 6aafb46807
Harrison/openai functions (#6223)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
1 year ago
Alon Roth 0013256e81
Support chat history persistence in AutoGPT (#5716)
**Short Description**
Added a new argument to AutoGPT class which allows to persist the chat
history to a file.

**Changes**
1. Removed the `self.full_message_history: List[BaseMessage] = []`
2. Replaced it with `chat_history_memory` which can take any subclasses
of `BaseChatMessageHistory`

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Martin Antos 1913320cbe
Feature/add acreom loader (#5780)
adding new loader for [acreom](https://acreom.com) vaults. It's based on
the Obsidian loader with some additional text processing for acreom
specific markdown elements.

 @eyurtsev please take a look!

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
1 year ago
Harrison Chase e82687ddf4
Harrison/use functions agent (#6185)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
1 year ago
Ryo Kanazawa 7d2b946d0b
Fix typo `pandocs` to `pandoc` (#6203)
Fixes https://github.com/hwchase17/langchain/issues/6204

### Context

An typo issue with `pandoc`.

#### Who can review?
@hwchase17
1 year ago
0xJordan c5a46e7435
feat: Add support for the Solidity language (#6054)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

## Add Solidity programming language support for code splitter.

Twitter: @0xjord4n_

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
#### Who can review?

Tag maintainers/contributors who might be interested:
@hwchase17

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
1 year ago
Nuno Campos 17c4ec4812
Add docs for tags (#6155)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
1 year ago
thiswillbeyourgithub 4a649e3b14
typo: 'following following' to 'following' (#6163)
Co-authored-by: thiswillbeyourgithub <github@32mail.33mail.com>
1 year ago
Maciej Bryński 8a44c879c6
Update readthedocs_documentation.ipynb (#6148)
Minor fix in documentation. 
Change URL in wget call to proper one.
1 year ago
Harrison Chase 6ac120f299
bump ver to 200 (#6130) 1 year ago
Harrison Chase e41f0b341c
add functions agent (#6113) 1 year ago
Harrison Chase 1281fdf0f2
Harrison/notebook functions (#6103) 1 year ago
Wenchen Li f9edf76e7c
Implement `max_marginal_relevance_search` in `VectorStore` of Pinecone (#6056)
This adds implementation of MMR search in pinecone; and I have two
semi-related observations about this vector store class:
- Maybe we should also have a
`similarity_search_by_vector_returning_embeddings` like in supabase, but
it's not in the base `VectorStore` class so I didn't implement
- Talking about the base class, there's
`similarity_search_with_relevance_scores`, but in pinecone it is called
`similarity_search_with_score`; maybe we should consider renaming it to
align with other `VectorStore` base and sub classes (or add that as an
alias for backward compatibility)

#### Who can review?

Tag maintainers/contributors who might be interested:
 - VectorStores / Retrievers / Memory - @dev2049
1 year ago
Lance Martin ee3d0513ad
Add tests and update notebook for MarkdownHeaderTextSplitter (#6069)
Add test and update notebook for `MarkdownHeaderTextSplitter`.
1 year ago
Julius Lipp 5b6bbf4ab2
Add embaas document extraction api endpoints (#6048)
# Introduces embaas document extraction api endpoints

In this PR, we add support for embaas document extraction endpoints to
Text Embedding Models (with LLMs, in different PRs coming). We currently
offer the MTEB leaderboard top performers, will continue to add top
embedding models and soon add support for customers to deploy thier own
models. Additional Documentation + Infomation can be found
[here](https://embaas.io).

While developing this integration, I closely followed the patterns
established by other langchain integrations. Nonetheless, if there are
any aspects that require adjustments or if there's a better way to
present a new integration, let me know! :)

Additionally, I fixed some docs in the embeddings integration.

Related PR: #5976 

#### Who can review?
  DataLoaders
  - @eyurtsev
1 year ago
Lance Martin b023f0c0f2
Text splitter for Markdown files by header (#5860)
This creates a new kind of text splitter for markdown files.

The user can supply a set of headers that they want to split the file
on.

We define a new text splitter class, `MarkdownHeaderTextSplitter`, that
does a few things:

(1) For each line, it determines the associated set of user-specified
headers
(2) It groups lines with common headers into splits

See notebook for example usage and test cases.
1 year ago
Harrison Chase 5922742d56 comment out 1 year ago
Harrison Chase 681ba6d520 embaas title 1 year ago
Ben Flast 7a5e36f3f5
Mongo db doc fix (#6042)
I missed a few errors in my initial fix @hwchase1.  Thanks!
1 year ago
Harrison Chase d1561b74eb
Harrison/cognitive search (#6011)
Co-authored-by: Fabrizio Ruocco <ruoccofabrizio@gmail.com>
1 year ago
wenmeng zhou bb7ac9edb5
add dashscope text embedding (#5929)
#### What I do
Adding embedding api for
[DashScope](https://help.aliyun.com/product/610100.html), which is the
DAMO Academy's multilingual text unified vector model based on the LLM
base. It caters to multiple mainstream languages worldwide and offers
high-quality vector services, helping developers quickly transform text
data into high-quality vector data. Currently supported languages
include Chinese, English, Spanish, French, Portuguese, Indonesian, and
more.

#### Who can review?

  Models
  - @hwchase17
  - @agola11

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Ben Flast 010d0bfeea
Update MongoDB Atlas support docs (#6022)
Updating MongoDB Atlas support docs @hwchase17 let me know if you have
any questions
1 year ago
Harrison Chase e05997c25e
Harrison/hologres (#6012)
Co-authored-by: Changgeng Zhao <changgeng@nyu.edu>
Co-authored-by: Changgeng Zhao <zhaochanggeng.zcg@alibaba-inc.com>
1 year ago
ju-bezdek 18f5c985d9
Langchain decorators (#6017)
Added description of LangChain Decorators  into the integration section

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->


#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

@hwchase17 

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
1 year ago
Harrison Chase a7227ee01b
Harrison/embaas (#6010)
Co-authored-by: Julius Lipp <43986145+juliuslipp@users.noreply.github.com>
1 year ago
Akhil Vempali d7d629911b
feat: Added filtering option to FAISS vectorstore (#5966)
Inspired by the filtering capability available in ChromaDB, added the
same functionality to the FAISS vectorestore as well. Since FAISS does
not have an inbuilt method of filtering used the approach suggested in
this [thread](https://github.com/facebookresearch/faiss/issues/1079)
Langchain Issue inspiration:
https://github.com/hwchase17/langchain/issues/4572

- [x] Added filtering capability to semantic similarly and MMR
- [x] Added test cases for filtering in
`tests/integration_tests/vectorstores/test_faiss.py`

#### Who can review?

Tag maintainers/contributors who might be interested:

  VectorStores / Retrievers / Memory
  - @dev2049
  - @hwchase17
1 year ago
Ikko Eltociear Ashimine c868a3eef3
Update databricks.md (#6006)
HuggingFace -> Hugging Face


#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?
1 year ago
Satheesh Valluru d2270a2261
Fix: Grammer fix in documentation (#5925)
Fix for grammatical errors in the documentation of `vectorstore`.  
@vowelparrot
1 year ago
Ofer Mendelevitch f8cf09a230
Update to Vectara integration (#5950)
This PR updates the Vectara integration (@hwchase17 ):
* Adds reuse of requests.session to imrpove efficiency and speed.
* Utilizes Vectara's low-level API (instead of standard API) to better
match user's specific chunking with LangChain
* Now add_texts puts all the texts into a single Vectara document so
indexing is much faster.
* updated variables names from alpha to lambda_val (to be consistent
with Vectara docs) and added n_context_sentence so it's available to use
if needed.
* Updates to documentation and tests

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
qued e4224a396b
feat: Add `UnstructuredXMLLoader` for `.xml` files (#5955)
# Unstructured XML Loader
Adds an `UnstructuredXMLLoader` class for .xml files. Works with
unstructured>=0.6.7. A plain text representation of the text with the
XML tags will be available under the `page_content` attribute in the
doc.

### Testing
```python
from langchain.document_loaders import UnstructuredXMLLoader

loader = UnstructuredXMLLoader(
    "example_data/factbook.xml",
)
docs = loader.load()
```


## Who can review?

@hwchase17 
@eyurtsev
1 year ago
Lance Martin 21bd16bb59
Create Airtable loader (#5958)
Create document loader for Airtable
1 year ago
Harrison Chase 9218684759
Add a new vector store - AwaDB (#5971) (#5992)
Added AwaDB vector store, which is a wrapper over the AwaDB, that can be
used as a vector storage and has an efficient similarity search. Added
integration tests for the vector store
Added jupyter notebook with the example

Delete a unneeded empty file and resolve the
conflict(https://github.com/hwchase17/langchain/pull/5886)

Please check, Thanks!

@dev2049
@hwchase17

---------

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: ljeagle <vincent_jieli@yeah.net>
Co-authored-by: vincent <awadb.vincent@gmail.com>
1 year ago
Tomaz Bratanic d5819a7ca7
Add additional parameters to Graph Cypher Chain (#5979)
Based on the inspiration from the SQL chain, the following three
parameters are added to Graph Cypher Chain.

- top_k: Limited the number of results from the database to be used as
context
- return_direct: Return database results without transforming them to
natural language
- return_intermediate_steps: Return intermediate steps
1 year ago
constDave 5f356b9993
Fixed typo missing "use" (#5991)
<!--
Fixed a simple typo on
https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/vectorstore.html
where the word "use" was missing.

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
1 year ago
German Martin 736a1819aa
LOTR: Lord of the Retrievers. A retriever that merge several retrievers together applying document_formatters to them. (#5798)
"One Retriever to merge them all, One Retriever to expose them, One
Retriever to bring them all and in and process them with Document
formatters."

Hi @dev2049! Here bothering people again!

I'm using this simple idea to deal with merging the output of several
retrievers into one.
I'm aware of DocumentCompressorPipeline and
ContextualCompressionRetriever but I don't think they allow us to do
something like this. Also I was getting in trouble to get the pipeline
working too. Please correct me if i'm wrong.

This allow to do some sort of "retrieval" preprocessing and then using
the retrieval with the curated results anywhere you could use a
retriever.
My use case is to generate diff indexes with diff embeddings and sources
for a more colorful results then filtering them with one or many
document formatters.

I saw some people looking for something like this, here:
https://github.com/hwchase17/langchain/issues/3991
and something similar here:
https://github.com/hwchase17/langchain/issues/5555

This is just a proposal I know I'm missing tests , etc. If you think
this is a worth it idea I can work on tests and anything you want to
change.
Let me know!

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Harrison Chase 7af186fddf
fixes to docs (#5919) 1 year ago
Rubén Martínez db7ef635c0
Add support for the endpoint URL in DynamoDBChatMesasgeHistory (#5836)
This PR adds the possibility of specifying the endpoint URL to AWS in
the DynamoDBChatMessageHistory, so that it is possible to target not
only the AWS cloud services, but also a local installation.

Specifying the endpoint URL, which is normally not done when addressing
the cloud services, is very helpful when targeting a local instance
(like [Localstack](https://localstack.cloud/)) when running local tests.

Fixes #5835

#### Who can review?

Tag maintainers/contributors who might be interested: @dev2049

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago