Commit Graph

3040 Commits

Author SHA1 Message Date
eLafo
db8b13df4c
adds doc_content_chars_max argument to WikipediaLoader (#6645)
# Description
It adds a new initialization param in `WikipediaLoader` so we can
override the `doc_content_chars_max` param used in `WikipediaAPIWrapper`
under the hood, e.g:

```python
from langchain.document_loaders import WikipediaLoader

# doc_content_chars_max is the new init param
loader = WikipediaLoader(query="python", doc_content_chars_max=90000)
```

## Decisions
`doc_content_chars_max` default value will be 4000, because it's the
current value
I have added pycode comments

# Issue
#6639

# Dependencies
None


# Twitter handle
[@elafo](https://twitter.com/elafo)
2023-06-23 15:22:09 -07:00
Davis Chase
5e5b30b74f
openapi -> openai nit (#6667) 2023-06-23 15:09:02 -07:00
Jeff Huber
2acf109c4b
update chroma notebook (#6664)
@rlancemartin I updated the notebook for Chroma to hopefully be a lot
easier for users.
2023-06-23 15:03:06 -07:00
Eduard van Valkenburg
48381f1f78
PowerBI: catch outdated token (#6634)
This adds just a small tweak to catch the error that says the token is
expired rather then retrying.
2023-06-23 15:01:08 -07:00
Piyush Jain
b1de927f1b
Kendra retriever api (#6616)
## Description
Replaces [Kendra
Retriever](https://github.com/hwchase17/langchain/blob/master/langchain/retrievers/aws_kendra_index_retriever.py)
with an updated version that uses the new [retriever
API](https://docs.aws.amazon.com/kendra/latest/dg/searching-retrieve.html)
which is better suited for retrieval augmented generation (RAG) systems.

**Note**: This change requires the latest version (1.26.159) of boto3 to
work. `pip install -U boto3` to upgrade the boto3 version.

cc @hupe1980
cc @dev2049
2023-06-23 14:59:35 -07:00
ChrisLovejoy
4e5d78579b
fix minor typo in vector_db_qa.mdx (#6604)
- Description: minor typo fixed - doesn't instead of does. No other
changes.
2023-06-23 14:57:37 -07:00
Ikko Eltociear Ashimine
73da193a4b
Fix typo in myscale_self_query.ipynb (#6601) 2023-06-23 14:57:12 -07:00
Saarthak Maini
ba256b23f2
Fix Typo (#6595)
Resolves #6582
2023-06-23 14:56:54 -07:00
kourosh hakhamaneshi
f6fdabd20b
Fix ray-project/Aviary integration (#6607)
- Description: The aviary integration has changed url link. This PR
provide fix for those changes and also it makes providing the input URL
optional to the API (since they can be set via env variables).
  - Issue: N/A
  - Dependencies: N/A
  - Twitter handle: N/A

---------

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2023-06-23 14:49:53 -07:00
northern-64bit
dbe1d029ec
Fix grammar mistake in base.py in planners (#6611)
Fix a typo in
`langchain/experimental/plan_and_execute/planners/base.py`, by changing
"Given input, decided what to do." to "Given input, decide what to do."

This is in the docstring for functions running LLM chains which shall
create a plan, "decided" does not make any sense in this context.
2023-06-23 14:47:10 -07:00
Aaron Pham
082976d8d0
fix(docs): broken link for OpenLLM (#6622)
This link for the notebook of OpenLLM is not migrated to the new format

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change,
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-23 13:59:17 -07:00
Davis Chase
fe828185ed
Dev2049/bump 212 (#6665) 2023-06-23 13:48:02 -07:00
Hassan Ouda
9e52134d30
ChatVertexAI broken - Fix error with sending context in params (#6652)
vertex Ai chat is broken right now. That is because context is in params
and chat.send_message doesn't accept that as a params.

- Closes issue [ChatVertexAI Error: _ChatSessionBase.send_message() got
an unexpected keyword argument 'context'
#6610](https://github.com/hwchase17/langchain/issues/6610)
2023-06-23 13:38:21 -07:00
Lance Martin
c2b25c17c5
Recursive URL loader (#6455)
We may want to process load all URLs under a root directory.

For example, let's look at the [LangChain JS
documentation](https://js.langchain.com/docs/).

This has many interesting child pages that we may want to read in bulk.

Of course, the `WebBaseLoader` can load a list of pages. 

But, the challenge is traversing the tree of child pages and actually
assembling that list!
 
We do this using the `RecusiveUrlLoader`.

This also gives us the flexibility to exclude some children (e.g., the
`api` directory with > 800 child pages).
2023-06-23 13:09:00 -07:00
Lance Martin
be02572d58
Add delete and ensure add_texts performs upsert (w/ ID optional) (#6126)
## Goal 

We want to ensure consistency across vectordbs:
1/ add `delete` by ID method to the base vectorstore class
2/ ensure `add_texts` performs `upsert` with ID optionally passed

## Testing
- [x] Pinecone: notebook test w/ `langchain_test` vectorstore.
- [x] Chroma: Review by @jeffchuber, notebook test w/ in memory
vectorstore.
- [x] Supabase: Review by @copple, notebook test w/ `langchain_test`
table.
- [x] Weaviate: Notebook test w/ `langchain_test` index. 
- [x] Elastic: Revied by @vestal. Notebook test w/ `langchain_test`
table.
- [ ] Redis: Asked for review from owner of recent `delete` method
https://github.com/hwchase17/langchain/pull/6222
2023-06-23 13:03:10 -07:00
Lance Martin
393f469eb3
Create merge loader that combines documents from a set of loaders (#6659)
Simple utility loader that combines documents from a set of specified
loaders.
2023-06-23 13:02:48 -07:00
Davis Chase
6988039975
openapi_openai docstring (#6661) 2023-06-23 11:38:33 -07:00
Davis Chase
b25933b607
bump 211 (#6660) 2023-06-23 11:10:48 -07:00
Davis Chase
e013459b18
Openapi to openai (#6658) 2023-06-23 11:00:34 -07:00
Davis Chase
b062a3f938
bump 210 (#6656) 2023-06-23 09:37:58 -07:00
Alejandra De Luna
980c865174
fix: remove callbacks arg from Tool and StructuredTool inferred schema (#6483)
Fixes #5456 

This PR removes the `callbacks` argument from a tool's schema when
creating a `Tool` or `StructuredTool` with the `from_function` method
and `infer_schema` is set to `True`. The `callbacks` argument is now
removed in the `create_schema_from_function` and `_get_filtered_args`
methods. As suggested by @vowelparrot, this fix provides a
straightforward solution that minimally affects the existing
implementation.

A test was added to verify that this change enables the expected use of
`Tool` and `StructuredTool` when using a `CallbackManager` and inferring
the tool's schema.

  - @hwchase17
2023-06-23 01:48:27 -07:00
Zander Chase
b4fe7f3a09
Session to project (#6249)
Sessions are being renamed to projects in the tracer
2023-06-23 01:11:01 -07:00
Zander Chase
9c09861946
Add tags in agent initialization (#6559)
Add better docstrings for agent executor as well

Inspo: https://github.com/hwchase17/langchainjs/pull/1722

![image](https://github.com/hwchase17/langchain/assets/130414180/d11662bc-0c0e-4166-9ff3-354d41a9144a)
2023-06-22 22:35:00 -07:00
Lance Martin
6e69bfbb28
Loader for OpenCityData and minor cleanups to Pandas, Airtable loaders (#6301)
Many cities have open data portals for events like crime, traffic, etc.

Socrata provides an API for many, including SF (e.g., see
[here](https://dev.socrata.com/foundry/data.sfgov.org/tmnf-yvry)).

This is a new data loader for city data that uses Socrata API.
2023-06-22 22:20:42 -07:00
Christoph Kahl
9d42621fa4
added redis method to delete entries by keys (#6222)
In addition to my last pr (return keys of added entries), we also need a
method to delete the entries by keys.

@dev2049
2023-06-22 13:26:47 -07:00
Tim Conkling
c28990d871
StreamlitCallbackHandler (#6315)
A new implementation of `StreamlitCallbackHandler`. It formats Agent
thoughts into Streamlit expanders.

You can see the handler in action here:
https://langchain-mrkl.streamlit.app/

Per a discussion with Harrison, we'll be adding a
`StreamlitCallbackHandler` implementation to an upcoming
[Streamlit](https://github.com/streamlit/streamlit) release as well, and
will be updating it as we add new LLM- and LangChain-specific features
to Streamlit.

The idea with this PR is that the LangChain `StreamlitCallbackHandler`
will "auto-update" in a way that keeps it forward- (and backward-)
compatible with Streamlit. If the user has an older Streamlit version
installed, the LangChain `StreamlitCallbackHandler` will be used; if
they have a newer Streamlit version that has an updated
`StreamlitCallbackHandler`, that implementation will be used instead.

(I'm opening this as a draft to get the conversation going and make sure
we're on the same page. We're really excited to land this into
LangChain!)

#### Who can review?

@agola11, @hwchase17
2023-06-22 13:14:28 -07:00
Nuno Campos
74ac6fb6b9
Allow callback handlers to opt into being run inline (#6424)
This is useful eg for callback handlers that use context vars (like open
telemetry)

See https://github.com/hwchase17/langchain/pull/6095
2023-06-22 11:36:19 -07:00
Harrison Chase
a9108c1809
add mongo (HOLD) (#6437)
do not merge in
2023-06-22 11:08:12 -07:00
Lance Martin
30f7288082
MD header text splitter returns Documents (#6571)
Return `Documents` from MD header text splitter to simplify UX.

Updates the test as well as example notebooks.
2023-06-22 09:25:38 -07:00
Rogério Chaves
3436da65a4
Fix callback forwarding in async plan method for OpenAI function agent (#6584)
The callback argument was missing, preventing me to get callbacks to
work properly when using it async
2023-06-22 08:18:31 -07:00
Davis Chase
b909bc8b58
bump 209 (#6593) 2023-06-22 08:18:19 -07:00
minhajul-clarifai
6e57306a13
Clarifai integration (#5954)
# Changes
This PR adds [Clarifai](https://www.clarifai.com/) integration to
Langchain. Clarifai is an end-to-end AI Platform. Clarifai offers user
the ability to use many types of LLM (OpenAI, cohere, ect and other open
source models). As well, a clarifai app can be treated as a vector
database to upload and retrieve data. The integrations includes:
- Clarifai LLM integration: Clarifai supports many types of language
model that users can utilize for their application
- Clarifai VectorDB: A Clarifai application can hold data and
embeddings. You can run semantic search with the embeddings

#### Before submitting
- [x] Added integration test for LLM 
- [x] Added integration test for VectorDB 
- [x] Added notebook for LLM 
- [x] Added notebook for VectorDB 

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-22 08:00:15 -07:00
Jeroen Van Goey
7f6f5c2a6a
Add missing word in comment (#6587)
Changed

```
# Do this so we can exactly what's going on under the hood
```
to
```
# Do this so we can see exactly what's going on under the hood
```
2023-06-22 07:54:28 -07:00
Davis Chase
d50de2728f
Add AzureML endpoint LLM wrapper (#6580)
### Description

We have added a new LLM integration `azureml_endpoint` that allows users
to leverage models from the AzureML platform. Microsoft recently
announced the release of [Azure Foundation

Models](https://learn.microsoft.com/en-us/azure/machine-learning/concept-foundation-models?view=azureml-api-2)
which users can find in the AzureML Model Catalog. The Model Catalog
contains a variety of open source and Hugging Face models that users can
deploy on AzureML. The `azureml_endpoint` allows LangChain users to use
the deployed Azure Foundation Models.

### Dependencies

No added dependencies were required for the change.

### Tests

Integration tests were added in
`tests/integration_tests/llms/test_azureml_endpoint.py`.

### Notebook

A Jupyter notebook demonstrating how to use `azureml_endpoint` was added
to `docs/modules/llms/integrations/azureml_endpoint_example.ipynb`.

### Twitters

[Prakhar Gupta](https://twitter.com/prakhar_in)
[Matthew DeGuzman](https://twitter.com/matthew_d13)

---------

Co-authored-by: Matthew DeGuzman <91019033+matthewdeguzman@users.noreply.github.com>
Co-authored-by: prakharg-msft <75808410+prakharg-msft@users.noreply.github.com>
2023-06-22 01:46:01 -07:00
Davis Chase
4fabd02d25
Add OpenLLM wrapper(#6578)
LLM wrapper for models served with OpenLLM

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Chaoyu <paranoyang@gmail.com>
2023-06-22 01:18:14 -07:00
Brendan Graham
d718f3b6d0
feat: interfaces for async embeddings, implement async openai (#6563)
Since it seems like #6111 will be blocked for a bit, I've forked
@tyree731's fork and implemented the requested changes.

This change adds support to the base Embeddings class for two methods,
aembed_query and aembed_documents, those two methods supporting async
equivalents of embed_query and
embed_documents respectively. This ever so slightly rounds out async
support within langchain, with an initial implementation of this
functionality being implemented for openai.

Implements https://github.com/hwchase17/langchain/issues/6109

---------

Co-authored-by: Stephen Tyree <tyree731@gmail.com>
2023-06-21 23:16:33 -07:00
ljeagle
ca24dc2d5f
Upgrade the version of AwaDB and add some new interfaces (#6565)
1. upgrade the version of AwaDB
2. add some new interfaces
3. fix bug of packing page content error

@dev2049  please review, thanks!

---------

Co-authored-by: vincent <awadb.vincent@gmail.com>
2023-06-21 23:15:18 -07:00
Harrison Chase
937a7e93f2
add motherduck docs (#6572) 2023-06-21 23:13:45 -07:00
Muhammad Vaid
ae81b96b60
Detailed using the Twilio tool to send messages with 3rd party apps incl. WhatsApp (#6562)
Everything needed to support sending messages over WhatsApp Business
Platform (GA), Facebook Messenger (Public Beta) and Google Business
Messages (Private Beta) was present. Just added some details on
leveraging it.
2023-06-21 19:26:50 -07:00
Kenzie Mihardja
b8d78424ab
Change Data Loader Namespace (#6568)
Description:
Update the artifact name of the xml file and the namespaces. Co-authored
with @tjaffri
Co-authored-by: Kenzie Mihardja <kenzie@docugami.com>
2023-06-21 19:24:04 -07:00
Gengliang Wang
0673245d0c
Remove duplicate databricks entries in ecosystem integrations (#6569)
Currently, there are two Databricks entries in
https://python.langchain.com/docs/ecosystem/integrations/
<img width="277" alt="image"
src="https://github.com/hwchase17/langchain/assets/1097932/86ab4ad2-6bce-4459-9d56-1ab2fbb69f6d">

The reason is that there are duplicated notebooks for Databricks
integration:
*
https://github.com/hwchase17/langchain/blob/master/docs/extras/ecosystem/integrations/databricks.ipynb
*
https://github.com/hwchase17/langchain/blob/master/docs/extras/ecosystem/integrations/databricks/databricks.ipynb

This PR is to remove the second one for simplicity.
2023-06-21 19:14:33 -07:00
Suri Chen
14b9418cc5
Fix whatsappchatloader - enable parsing new datetime format on WhatsApp chat (#6555)
- Description: observed new format on WhatsApp exported chat - example:
`[2023/5/4, 16:17:13] ~ Carolina: 🥺`
  - Dependencies: no additional dependencies required
  - Tag maintainer: @rlancemartin, @eyurtsev
2023-06-21 19:11:49 -07:00
Zander Chase
5322bac5fc
Wait for all futures (#6554)
- Expose method to wait for all futures
- Wait for submissions in the run_on_dataset functions to ensure runs
are fully submitted before cleaning up
2023-06-21 18:20:17 -07:00
HenriZuber
e0605b464b
feat: faiss filter from list (#6537)
### Feature

Using FAISS on a retrievalQA task, I found myself wanting to allow in
multiple sources. From what I understood, the filter feature takes in a
dict of form {key: value} which then will check in the metadata for the
exact value linked to that key.
I added some logic to be able to pass a list which will be checked
against instead of an exact value. Passing an exact value will also
work.

Here's an example of how I could then use it in my own project:

```
    pdfs_to_filter_in = ["file_A", "file_B"]
    filter_dict = {
        "source": [f"source_pdfs/{pdf_name}.pdf" for pdf_name in pdfs_to_filter_in]
    }
    retriever = db.as_retriever()
    retriever.search_kwargs = {"filter": filter_dict}
```

I added an integration test based on the other ones I found in
`tests/integration_tests/vectorstores/test_faiss.py` under
`test_faiss_with_metadatas_and_list_filter()`.

It doesn't feel like this is worthy of its own notebook or doc, but I'm
open to suggestions if needed.

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-21 10:49:01 -07:00
Davis Chase
00a7403236
update pr tmpl (#6552) 2023-06-21 10:03:52 -07:00
Jeroen Van Goey
57b5f42847
Remove unintended double negation in docstring (#6541)
Small typo fix.

`ImportError: If importing vertexai SDK didn't not succeed.` ->
`ImportError: If importing vertexai SDK did not succeed.`.
2023-06-21 10:01:28 -07:00
Andrey E. Vedishchev
a2a0715bd4
Minor Grammar Fixes in Docs and Comments (#6536)
Just some grammar fixes: I found "retriver" instead of "retriever" in
several comments across the documentation and in the comments. I fixed
it.


Co-authored-by: andrey.vedishchev <andrey.vedishchev@rgigroup.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-21 09:53:31 -07:00
dirtysalt
57cc3d1d3d
[Feature][VectorStore] Support StarRocks as vector db (#6119)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

Here are some examples to use StarRocks as vectordb

```
from langchain.vectorstores import StarRocks
from langchain.vectorstores.starrocks import StarRocksSettings

embeddings = OpenAIEmbeddings()

# conifgure starrocks settings
settings = StarRocksSettings()
settings.port = 41003
settings.host = '127.0.0.1'
settings.username = 'root'
settings.password = ''
settings.database = 'zya'

# to fill new embeddings
docsearch = StarRocks.from_documents(split_docs, embeddings, config = settings)   


# or to use already-built embeddings in database.
docsearch = StarRocks(embeddings, settings)
```

#### Who can review?

Tag maintainers/contributors who might be interested:

@dev2049 

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-21 09:02:33 -07:00
Zander Chase
7a4ff424fc
Relax string input mapper check (#6544)
for run evaluator. It could be that an evalutor doesn't need the output
2023-06-21 08:01:42 -07:00
Harrison Chase
ace442b992
bump to ver 208 (#6540) 2023-06-21 07:32:36 -07:00