Commit Graph

512 Commits (1f11f806413837b2f41046c46b8963b8e0ae6b5f)

Author SHA1 Message Date
Matt Robinson bf3f554357
feat: batch multiple files in a single Unstructured API request (#4525)
### Submit Multiple Files to the Unstructured API

Enables batching multiple files into a single Unstructured API requests.
Support for requests with multiple files was added to both
`UnstructuredAPIFileLoader` and `UnstructuredAPIFileIOLoader`. Note that
if you submit multiple files in "single" mode, the result will be
concatenated into a single document. We recommend using this feature in
"elements" mode.

### Testing

The following should load both documents, using two of the example docs
from the integration tests folder.

```python
    from langchain.document_loaders import UnstructuredAPIFileLoader

    file_paths = ["examples/layout-parser-paper.pdf",  "examples/whatsapp_chat.txt"]

    loader = UnstructuredAPIFileLoader(
        file_paths=file_paths,
        api_key="FAKE_API_KEY",
        strategy="fast",
        mode="elements",
    )
    docs = loader.load()
```
1 year ago
Harrison Chase b0431c672b
Harrison/psychic (#5063)
Co-authored-by: Ayan Bandyopadhyay <ayanb9440@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Davis Chase 3bc0bf0079
fix prompt saving (#4987)
will add unit tests
1 year ago
Davis Chase 080eb1b3fc
Fix graphql tool (#4984)
Fix construction and add unit test.
1 year ago
Mike McGarry ddd595fe81
feature/4493 Improve Evernote Document Loader (#4577)
# Improve Evernote Document Loader

When exporting from Evernote you may export more than one note.
Currently the Evernote loader concatenates the content of all notes in
the export into a single document and only attaches the name of the
export file as metadata on the document.

This change ensures that each note is loaded as an independent document
and all available metadata on the note e.g. author, title, created,
updated are added as metadata on each document.

It also uses an existing optional dependency of `html2text` instead of
`pypandoc` to remove the need to download the pandoc application via
`download_pandoc()` to be able to use the `pypandoc` python bindings.

Fixes #4493 

Co-authored-by: Mike McGarry <mike.mcgarry@finbourne.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Eugene Yurtsev 0ff59569dc
Adds 'IN' metadata filter for pgvector for checking set presence (#4982)
# Adds "IN" metadata filter for pgvector to all checking for set
presence

PGVector currently supports metadata filters of the form:
```
{"filter": {"key": "value"}}
```
which will return documents where the "key" metadata field is equal to
"value".

This PR adds support for metadata filters of the form:
```
{"filter": {"key": { "IN" : ["list", "of", "values"]}}}
```

Other vector stores support this via an "$in" syntax. I chose to use
"IN" to match postgres' syntax, though happy to switch.
Tested locally with PGVector and ChatVectorDBChain.


@dev2049

---------

Co-authored-by: jade@spanninglabs.com <jade@spanninglabs.com>
1 year ago
Eugene Yurtsev 06e524416c
power bi api wrapper integration tests & bug fix (#4983)
# Powerbi API wrapper bug fix + integration tests

- Bug fix by removing `TYPE_CHECKING` in in utilities/powerbi.py
- Added integration test for power bi api in
utilities/test_powerbi_api.py
- Added integration test for power bi agent in
agent/test_powerbi_agent.py
- Edited .env.examples to help set up power bi related environment
variables
- Updated demo notebook with working code in
docs../examples/powerbi.ipynb - AzureOpenAI -> ChatOpenAI

Notes: 

Chat models (gpt3.5, gpt4) are much more capable than davinci at writing
DAX queries, so that is important to getting the agent to work properly.
Interestingly, gpt3.5-turbo needed the examples=DEFAULT_FEWSHOT_EXAMPLES
to write consistent DAX queries, so gpt4 seems necessary as the smart
llm.

Fixes #4325

## Before submitting

Azure-core and Azure-identity are necessary dependencies

check integration tests with the following:
`pytest tests/integration_tests/utilities/test_powerbi_api.py`
`pytest tests/integration_tests/agent/test_powerbi_agent.py`

You will need a power bi account with a dataset id + table name in order
to test. See .env.examples for details.

## Who can review?
@hwchase17
@vowelparrot

---------

Co-authored-by: aditya-pethe <adityapethe1@gmail.com>
1 year ago
Harrison Chase 88a3a56c1a
Add Spark SQL support (#4602) (#4956)
# Add Spark SQL support 
* Add Spark SQL support. It can connect to Spark via building a
local/remote SparkSession.
* Include a notebook example

I tried some complicated queries (window function, table joins), and the
tool works well.
Compared to the [Spark Dataframe

agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/spark.html),
this tool is able to generate queries across multiple tables.

---------

# Your PR Title (What it does)

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->

---------

Co-authored-by: Gengliang Wang <gengliang@apache.org>
Co-authored-by: Mike W <62768671+skcoirz@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
Co-authored-by: 张城铭 <z@hyperf.io>
Co-authored-by: assert <zhangchengming@kkguan.com>
Co-authored-by: blob42 <spike@w530>
Co-authored-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Co-authored-by: Richard He <he.yucheng@outlook.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
Co-authored-by: Alexey Nominas <60900649+Chae4ek@users.noreply.github.com>
Co-authored-by: elBarkey <elbarkey@gmail.com>
Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
Co-authored-by: Jeffrey D <1289344+verygoodsoftwarenotvirus@users.noreply.github.com>
Co-authored-by: so2liu <yangliu35@outlook.com>
Co-authored-by: Viswanadh Rayavarapu <44315599+vishwa-rn@users.noreply.github.com>
Co-authored-by: Chakib Ben Ziane <contact@blob42.xyz>
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
Co-authored-by: Jari Bakken <jari.bakken@gmail.com>
Co-authored-by: escafati <scafatieugenio@gmail.com>
1 year ago
Daniel Chalef c8c2276ccb
Zep Retriever - Vector Search Over Chat History (#4533)
# Zep Retriever - Vector Search Over Chat History with the Zep Long-term
Memory Service

More on Zep: https://github.com/getzep/zep

Note: This PR is related to and relies on
https://github.com/hwchase17/langchain/pull/4834. I did not want to
modify the `pyproject.toml` file to add the `zep-python` dependency a
second time.

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
1 year ago
Davis Chase 55baa0d153
Update redis integration tests (#4937) 1 year ago
Harrison Chase c9a362e482
add alias for model (#4553)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Eugene Yurtsev e46202829f
feat #4479: TextLoader auto detect encoding and improved exceptions (#4927)
# TextLoader auto detect encoding and enhanced exception handling

- Add an option to enable encoding detection on `TextLoader`. 
- The detection is done using `chardet`
- The loading is done by trying all detected encodings by order of
confidence or raise an exception otherwise.

### New Dependencies:
- `chardet`

Fixes #4479 

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

- @eyurtsev

---------

Co-authored-by: blob42 <spike@w530>
1 year ago
Harrison Chase 9e2227ba11
Harrison/serper api bug (#4902)
Co-authored-by: Jerry Luan <xmaswillyou@gmail.com>
1 year ago
Davis Chase 8966f61ca5
Zep memory (#4898)
Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
1 year ago
Davis Chase e28bdf4453
Cadlabs/python tool sanitization (#4754)
Co-authored-by: BenSchZA <BenSchZA@users.noreply.github.com>
1 year ago
Eugene Yurtsev 0dc304ca80
Add html parsers (#4874)
# Add bs4 html parser

* Some minor refactors
* Extract the bs4 html parsing code from the bs html loader
* Move some tests from integration tests to unit tests
1 year ago
Eugene Yurtsev 8e41143bf5
Add a generic document loader (#4875)
# Add generic document loader

* This PR adds a generic document loader which can assemble a loader
from a blob loader and a parser
* Adds a registry for parsers
* Populate registry with a default mimetype based parser


## Expected changes

- Parsing involves loading content via IO so can be sped up via:
  * Threading in sync
  * Async  
- The actual parsing logic may be computatinoally involved: may need to
figure out to add multi-processing support
- May want to add suffix based parser since suffixes are easier to
specify in comparison to mime types

## Before submitting

No notebooks yet, we first need to get a few of the basic parsers up
(prior to advertising the interface)
1 year ago
yujiosaka 2f8eb95a91
Remove unnecessary comment (#4845)
# Remove unnecessary comment

Remove unnecessary comment accidentally included in #4800

## Before submitting

- no test
- no document

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
1 year ago
Zander Chase 8dcad0f272
Add Support for Flexible Input Format for LLM and Chat Model Runs (#4805)
Previously, the client expected a strict 'prompt' or 'messages' format
and wouldn't permit running a chat model or llm on prompts or messages
(respectively).

Since many datasets may want to specify custom key: string , relax this
requirement.
Also, add support for running a chat model on raw prompts and LLM on
chat messages through their respective fallbacks.
1 year ago
Ankush Gola aa73a888fa
Some notebook and client fixes (add retries, clean up docs, etc) (#4820)
# Your PR Title (What it does)

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
1 year ago
charosen 75fe9d3555
Add from_file method to message prompt template (#4713)
**Feature**: This PR adds `from_template_file` class method to
BaseStringMessagePromptTemplate. This is useful to help user to create
message prompt templates directly from template files, including
`ChatMessagePromptTemplate`, `HumanMessagePromptTemplate`,
`AIMessagePromptTemplate` & `SystemMessagePromptTemplate`.

**Tests**: Unit tests have been added in this PR.

Co-authored-by: charosen <charosen@bupt.cn>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
yujiosaka 6561efebb7
Accept uuids kwargs for weaviate (#4800)
# Accept uuids kwargs for weaviate

Fixes #4791
1 year ago
Adam Quigley e78c9be312
Add Confluence Loader unit tests (#3333)
Adds some basic unit tests for the ConfluenceLoader that can be extended
later. Ports this [PR from
llama-hub](https://github.com/emptycrown/llama-hub/pull/208) and adapts
it to `langchain`.

@Jflick58 and @zywilliamli adding you here as potential reviewers

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Magnus Friberg d126276693
Specify which data to return from chromadb (#4393)
# Improve the Chroma get() method by adding the optional "include"
parameter.

The Chroma get() method excludes embeddings by default. You can
customize the response by specifying the "include" parameter to
selectively retrieve the desired data from the collection.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Raduan Al-Shedivat 00c6ec8a2d
fix(document_loaders/telegram): fix pandas calls + add tests (#4806)
# Fix Telegram API loader + add tests.
I was testing this integration and it was broken with next error:
```python
message_threads = loader._get_message_threads(df)
KeyError: False
```
Also, this particular loader didn't have any tests / related group in
poetry, so I added those as well.

@hwchase17 / @eyurtsev please take a look on this fix PR.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Eugene Yurtsev 255690d78e
Catch changes to test group (#4802)
# Catch changes to test group

Add test to catch changes to test group.
1 year ago
Eugene Yurtsev c3b6129beb
Block sockets for unit-tests (#4803)
# Block usage of sockets during unit tests

Catch any tests that attempt to use the network.
1 year ago
了空 f7e3d97b19
Remove unnecessary spaces from document object’s page_content of BiliBiliLoader (#4619)
- Remove unnecessary spaces from document object’s page_content of
BiliBiliLoader
- Fix BiliBiliLoader document and test file
1 year ago
Harrison Chase a7af32c274
Cassandra support for chat history (#4378) (#4764)
# Cassandra support for chat history

### Description

- Store chat messages in cassandra

### Dependency

- cassandra-driver - Python Module

## Before submitting

- Added Integration Test

## Who can review?

@hwchase17
@agola11

# Your PR Title (What it does)

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->

Co-authored-by: Jinto Jose <129657162+jj701@users.noreply.github.com>
1 year ago
Anirudh Suresh 03ac39368f
Fixing DeepLake Overwrite Flag (#4683)
# Fix DeepLake Overwrite Flag Issue

Fixes Issue #4682: essentially, setting overwrite to False in the
DeepLake constructor still triggers an overwrite, because the logic is
just checking for the presence of "overwrite" in kwargs. The fix is
simple--just add some checks to inspect if "overwrite" in kwargs AND
kwargs["overwrite"]==True.

Added a new test in
tests/integration_tests/vectorstores/test_deeplake.py to reflect the
desired behavior.


Co-authored-by: Anirudh Suresh <ani@Anirudhs-MBP.cable.rcn.com>
Co-authored-by: Anirudh Suresh <ani@Anirudhs-MacBook-Pro.local>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
whuwxl 3f0357f94a
Add summarization task type for HuggingFace APIs (#4721)
# Add summarization task type for HuggingFace APIs

Add summarization task type for HuggingFace APIs.
This task type is described by [HuggingFace inference
API](https://huggingface.co/docs/api-inference/detailed_parameters#summarization-task)

My project utilizes LangChain to connect multiple LLMs, including
various HuggingFace models that support the summarization task.
Integrating this task type is highly convenient and beneficial.

Fixes #4720
1 year ago
Roma cb802edf75
[Feature] Add GraphQL Query Tool (#4409)
# Add GraphQL Query Support

This PR introduces a GraphQL API Wrapper tool that allows LLM agents to
query GraphQL databases. The tool utilizes the httpx and gql Python
packages to interact with GraphQL APIs and provides a simple interface
for running queries with LLM agents.

@vowelparrot

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Eugene Yurtsev 09587a3201
Clean up tests for pdf parsers (#4595)
# Organize tests for pdf parsers

Clean up tests for pdf parsers, remove duplicate tests, convert to unit
tests.
1 year ago
Eugene Yurtsev d3300bd799
YouTube Loader: Replace regexp with built-in parsing (#4729) 1 year ago
Eugene Yurtsev 3c490b5ba3
Docugami DataLoader (#4727)
### Adds a document loader for Docugami

Specifically:

1. Adds a data loader that talks to the [Docugami](http://docugami.com)
API to download processed documents as semantic XML
2. Parses the semantic XML into chunks, with additional metadata
capturing chunk semantics
3. Adds a detailed notebook showing how you can use additional metadata
returned by Docugami for techniques like the [self-querying
retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html)
4. Adds an integration test, and related documentation

Here is an example of a result that is not possible without the
capabilities added by Docugami (from the notebook):

<img width="1585" alt="image"
src="https://github.com/hwchase17/langchain/assets/749277/bb6c1ce3-13dc-4349-a53b-de16681fdd5b">

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>
1 year ago
KNiski c2761aa8f4
Improve video_id extraction in YoutubeLoader (#4452)
# Improve video_id extraction in `YoutubeLoader`

`YoutubeLoader.from_youtube_url` can only deal with one specific url
format. I've introduced `YoutubeLoader.extract_video_id` which can
extract video id from common YT urls.

Fixes #4451 


@eyurtsev

---------

Co-authored-by: Kamil Niski <kamil.niski@gmail.com>
1 year ago
Lester Yang cd3f9865f3
Feature: pdfplumber PDF loader with BaseBlobParser (#4552)
# Feature: pdfplumber PDF loader with BaseBlobParser

* Adds pdfplumber as a PDF loader
* Adds pdfplumber as a blob parser.
1 year ago
Harrison Chase b6e3ac17c4
Harrison/sitemap local (#4704)
Co-authored-by: Lukas Bauer <lukas.bauer@mayflower.de>
1 year ago
Harrison Chase 12b4ee1fc7
Harrison/telegram chat loader (#4698)
Co-authored-by: Akinwande Komolafe <47945512+Sensei-akin@users.noreply.github.com>
Co-authored-by: Akinwande Komolafe <akhinoz@gmail.com>
1 year ago
Li Yuanzheng 3b6206af49
Respect User-Specified User-Agent in WebBaseLoader (#4579)
# Respect User-Specified User-Agent in WebBaseLoader
This pull request modifies the `WebBaseLoader` class initializer from
the `langchain.document_loaders.web_base` module to preserve any
User-Agent specified by the user in the `header_template` parameter.
Previously, even if a User-Agent was specified in `header_template`, it
would always be overridden by a random User-Agent generated by the
`fake_useragent` library.

With this change, if a User-Agent is specified in `header_template`, it
will be used. Only in the case where no User-Agent is specified will a
random User-Agent be generated and used. This provides additional
flexibility when using the `WebBaseLoader` class, allowing users to
specify their own User-Agent if they have a specific need or preference,
while still providing a reasonable default for cases where no User-Agent
is specified.

This change has no impact on existing users who do not specify a
User-Agent, as the behavior in this case remains the same. However, for
users who do specify a User-Agent, their choice will now be respected
and used for all subsequent requests made using the `WebBaseLoader`
class.


Fixes #4167

## Before submitting

============================= test session starts
==============================
collecting ... collected 1 item


test_web_base.py::TestWebBaseLoader::test_respect_user_specified_user_agent

============================== 1 passed in 3.64s
===============================
PASSED [100%]

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested: @eyurtsev

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
1 year ago
Samuli Rauatmaa 66828ad231
add the existing OpenWeatherMap tool to the public api (#4292)
[OpenWeatherMapAPIWrapper](f70e18a5b3/docs/modules/agents/tools/examples/openweathermap.ipynb)
works wonderfully, but the _tool_ itself can't be used in master branch.

- added OpenWeatherMap **tool** to the public api, to be loadable with
`load_tools` by using "openweathermap-api" tool name (that name is used
in the existing
[docs](aff33d52c5/docs/modules/agents/tools/getting_started.md),
at the bottom of the page)
- updated OpenWeatherMap tool's **description** to make the input format
match what the API expects (e.g. `London,GB` instead of `'London,GB'`)
- added [ecosystem documentation page for
OpenWeatherMap](f9c41594fe/docs/ecosystem/openweathermap.md)
- added tool usage example to [OpenWeatherMap's
notebook](f9c41594fe/docs/modules/agents/tools/examples/openweathermap.ipynb)

Let me know if there's something I missed or something needs to be
updated! Or feel free to make edits yourself if that makes it easier for
you 🙂
1 year ago
Harrison Chase cdc20d1203
Harrison/json loader fix (#4686)
Co-authored-by: Triet Le <112841660+triet-lq-holistics@users.noreply.github.com>
1 year ago
Harrison Chase f2f2aced6d
allow partials in from_template (#4638) 1 year ago
Harrison Chase fbfa49f2c1
agent serialization (#4642) 1 year ago
Harrison Chase f5e2f70115
Harrison/json new line (#4646)
Co-authored-by: David Chen <davidchen@gliacloud.com>
1 year ago
Harrison Chase 279605b4d3
Harrison/metaphor search (#4657)
Co-authored-by: Jeffrey Wang <jeffreyzhiyuanwang@gmail.com>
1 year ago
Prerit Das 2747ccbcf1
Allow custom base Zapier prompt (#4213)
Currently, all Zapier tools are built using the pre-written base Zapier
prompt. These small changes (that retain default behavior) will allow a
user to create a Zapier tool using the ZapierNLARunTool while providing
their own base prompt.

Their prompt must contain input fields for zapier_description and
params, checked and enforced in the tool's root validator.

An example of when this may be useful: user has several, say 10, Zapier
tools enabled. Currently, the long generic default Zapier base prompt is
attached to every single tool, using an extreme number of tokens for no
real added benefit (repeated). User prompts LLM on how to use Zapier
tools once, then overrides the base prompt.

Or: user has a few specific Zapier tools and wants to maximize their
success rate. So, user writes prompts/descriptions for those tools
specific to their use case, and provides those to the ZapierNLARunTool.

A consideration - this is the simplest way to implement this I could
think of... though ideally custom prompting would be possible at the
Toolkit level as well. For now, this should be sufficient in solving the
concerns outlined above.
1 year ago
Paresh Mathur e2bc836571
Fix #4087 by setting the correct csv dialect (#4103)
The error in #4087 was happening because of the use of csv.Dialect.*
which is just an empty base class. we need to make a choice on what is
our base dialect. I usually use excel so I put it as excel, if
maintainers have other preferences do let me know.

Open Questions:
1. What should be the default dialect?
2. Should we rework all tests to mock the open function rather than the
csv.DictReader?
3. Should we make a separate input for `dialect` like we have for
`encoding`?

---------

Co-authored-by: = <=>
1 year ago
Zander Chase 928cdd57a4
[Breaking] Refactor Base Tracer(#4549)
### Refactor the BaseTracer
- Remove the 'session' abstraction from the BaseTracer
- Rename 'RunV2' object(s) to be called 'Run' objects (Rename previous
Run objects to be RunV1 objects)
- Ditto for sessions: TracerSession*V2 -> TracerSession*
- Remove now deprecated conversion from v1 run objects to v2 run objects
in LangChainTracerV2
- Add conversion from v2 run objects to v1 run objects in V1 tracer
1 year ago
Harrison Chase 485ecc3580
option for csv agent to not include df in prompt (#4610) 1 year ago
Zander Chase 0c6ed657ef
Convert Chain to a Chain Factory (#4605)
## Change Chain argument in client to accept a chain factory

The `run_over_dataset` functionality seeks to treat each iteration of an
example as an independent trial.
Chains have memory, so it's easier to permit this type of behavior if we
accept a factory method rather than the chain object directly.

There's still corner cases / UX pains people will likely run into, like:
- Caching may cause issues
- if memory is persisted to a shared object (e.g., same redis queue) ,
this could impact what is retrieved
- If we're running the async methods with concurrency using local
models, if someone naively instantiates the chain and loads each time,
it could lead to tons of disk I/O or OOM
1 year ago
Davis Chase 36f9e9a0ba
Skip flaky unit test (#4591) 1 year ago
Eugene Yurtsev 08ed927c32
Turn on extended tests (#4588)
# Turn on strict extended tests

This PR turns on strict testing for extended tests.
1 year ago
Zander Chase d96f6a106b
Add Steamship Image Generation Tool (#4580)
Co-authored-by: Enias Cailliau <enias@steamship.com>
1 year ago
Eugene Yurtsev a5371a0fa2
Add pytest --only-extended and --only-core options (#4494)
# Adds testing options to pytest

This PR adds the following options: 

* `--only-core` will skip all extended tests, running all core tests.
* `--only-extended` will skip all core tests. Forcing alll extended
tests to be run.

Running `py.test` without specifying either option will remain
unaffected. Run
all tests that can be run within the unit_tests direction. Extended
tests will
run if required packages are installed.

## Before submitting

## Who can review?
1 year ago
Leonid Ganeline e17d0319d5
Add `arxiv` retriever (#4538) 1 year ago
SimFG 7bcf238a1a
Optimize the initialization method of GPTCache (#4522)
Optimize the initialization method of GPTCache, so that users can use GPTCache more quickly.
1 year ago
Zander Chase f4d3cf2dfb
Add Invocation Params (#4509)
### Add Invocation Params to Logged Run


Adds an llm type to each chat model as well as an override of the dict()
method to log the invocation parameters for each call

---------

Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
1 year ago
Zander Chase 4ee47926ca
Add on_chat_message_start (#4499)
### Add on_chat_message_start to callback manager and base tracer

Goal: trace messages directly to permit reloading as chat messages
(store in an integration-agnostic way)

Add an `on_chat_message_start` method. Fall back to `on_llm_start()` for
handlers that don't have it implemented.

Does so in a non-backwards-compat breaking way (for now)
1 year ago
Sunish Sheth 812e5f43f5
Add _type for all parsers (#4189)
Used for serialization. Also add test that recurses through
our subclasses to check they have them implemented

Would fix https://github.com/hwchase17/langchain/issues/3217
Blocking: https://github.com/mlflow/mlflow/pull/8297

---------

Signed-off-by: Sunish Sheth <sunishsheth2009@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
kYLe 0d51a1f12b
Add LLMs support for Anyscale Service (#4350)
Add Anyscale service integration under LLM

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Evan Jones f668251948
parameterized distance metrics; lint; format; tests (#4375)
# Parameterize Redis vectorstore index

Redis vectorstore allows for three different distance metrics: `L2`
(flat L2), `COSINE`, and `IP` (inner product). Currently, the
`Redis._create_index` method hard codes the distance metric to COSINE.

I've parameterized this as an argument in the `Redis.from_texts` method
-- pretty simple.

Fixes #4368 

## Before submitting

I've added an integration test showing indexes can be instantiated with
all three values in the `REDIS_DISTANCE_METRICS` literal. An example
notebook seemed overkill here. Normal API documentation would be more
appropriate, but no standards are in place for that yet.

## Who can review?

Not sure who's responsible for the vectorstore module... Maybe @eyurtsev
/ @hwchase17 / @agola11 ?
1 year ago
Zander Chase d969f43ed8
Load HuggingFace Tool (#4475)
# Add option to `load_huggingface_tool`

Expose a method to load a huggingface Tool from the HF hub

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Davis Chase 9ec60ad832
Add azure cognitive search retriever (#4467)
All credit to @UmerHA, made a couple small changes

---------

Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
1 year ago
Davis Chase 46b100ea63
Add DocArray vector stores (#4483)
Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made
few small changes to get it across the finish line

---------

Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Co-authored-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>
1 year ago
Harrison Chase b2f920e891
add tracing v2 env var (#4465)
Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
1 year ago
Davis Chase 04475bea7d
Mv plan and execute to experimental (#4459) 1 year ago
Eugene Yurtsev 80558b5b27
Add workflow for testing with all deps (#4410)
# Add action to test with all dependencies installed

PR adds a custom action for setting up poetry that allows specifying a
cache key:
https://github.com/actions/setup-python/issues/505#issuecomment-1273013236

This makes it possible to run 2 types of unit tests: 

(1) unit tests with only core dependencies
(2) unit tests with extended dependencies (e.g., those that rely on an
optional pdf parsing library)


As part of this PR, we're moving some pdf parsing tests into the
unit-tests section and making sure that these unit tests get executed
when running with extended dependencies.
1 year ago
Matt Robinson 3637d6da6e
feat: add loader for open office odt files (#4405)
# ODF File Loader

Adds a data loader for handling Open Office ODT files. Requires
`unstructured>=0.6.3`.

### Testing

The following should work using the `fake.odt` example doc from the
[`unstructured` repo](https://github.com/Unstructured-IO/unstructured).

```python
from langchain.document_loaders import UnstructuredODTLoader

loader = UnstructuredODTLoader(file_path="fake.odt", mode="elements")
loader.load()

loader = UnstructuredODTLoader(file_path="fake.odt", mode="single")
loader.load()
```
1 year ago
Harrison Chase 6b8d144ccc
Harrison/plan and solve (#4422) 1 year ago
Rukmani 2b14036126
Update WhatsAppChatLoader to include the character ~ in the sender name (#4420)
Fixes #4153

If the sender of a message in a group chat isn't in your contact list,
they will appear with a ~ prefix in the exported chat. This PR adds
support for parsing such lines.
1 year ago
Zander Chase f2150285a4
Fix nested runs example ID (#4413)
#### Only reference example ID on the parent run

Previously, I was assigning the example ID to every child run. 
Adds a test.
1 year ago
Aivin V. Solatorio 6335cb5b3a
Add support for Qdrant nested filter (#4354)
# Add support for Qdrant nested filter

This extends the filter functionality for the Qdrant vectorstore. The
current filter implementation is limited to a single-level metadata
structure; however, Qdrant supports nested metadata filtering. This
extends the functionality for users to maximize the filter functionality
when using Qdrant as the vectorstore.

Reference: https://qdrant.tech/documentation/filtering/#nested-key

---------

Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
1 year ago
Martin Holzhauer 872605a5c5
Add an option to extract more metadata from crawled websites (#4347)
This pr makes it possible to extract more metadata from websites for
later use.

my usecase:
parsing ld+json or microdata from sites and store it as structured data
in the metadata field
1 year ago
Leonid Ganeline ce15ffae6a
added `Wikipedia` retriever (#4302)
- added `Wikipedia` retriever. It is effectively a wrapper for
`WikipediaAPIWrapper`. It wrapps load() into get_relevant_documents()
- sorted `__all__` in the `retrievers/__init__`
- added integration tests for the WikipediaRetriever
- added an example (as Jupyter notebook) for the WikipediaRetriever
1 year ago
Eugene Yurtsev 2ceb807da2
Add PDF parser implementations (#4356)
# Add PDF parser implementations

This PR separates the data loading from the parsing for a number of
existing PDF loaders.

Parser tests have been designed to help encourage developers to create a
consistent interface for parsing PDFs.

This interface can be made more consistent in the future by adding
information into the initializer on desired behavior with respect to splitting by
page etc.

This code is expected to be backwards compatible -- with the exception
of a bug fix with pymupdf parser which was returning `bytes` in the page
content rather than strings.

Also changing the lazy parser method of document loader to return an
Iterator rather than Iterable over documents.

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoader Abstractions
        - @eyurtsev

        LLM/Chat Wrappers
        - @hwchase17
        - @agola11

        Tools / Toolkits
        - @vowelparrot
 -->
1 year ago
Eugene Yurtsev ae0c3382dd
Add MimeType based parser (#4376)
# Add MimeType Based Parser

This PR adds a MimeType Based Parser. The parser inspects the mime-type
of the blob it is parsing and based on the mime-type can delegate to the sub
parser.

## Before submitting

Waiting on adding notebooks until more implementations are landed. 

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:


@hwchase17
@vowelparrot
1 year ago
Davis Chase ba0057c077
Check OpenAI model kwargs (#4366)
Handle duplicate and incorrectly specified OpenAI params

Thanks @PawelFaron for the fix! Made small update

Closes #4331

---------

Co-authored-by: PawelFaron <42373772+PawelFaron@users.noreply.github.com>
Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
1 year ago
Davis Chase 02ebb15c4a
Fix TextSplitter.from_tiktoken(#4361)
Thanks to @danb27 for the fix! Minor update

Fixes https://github.com/hwchase17/langchain/issues/4357

---------

Co-authored-by: Dan Bianchini <42096328+danb27@users.noreply.github.com>
1 year ago
Naveen Tatikonda 782df1db10
OpenSearch: Add Similarity Search with Score (#4089)
### Description
Add `similarity_search_with_score` method for OpenSearch to return
scores along with documents in the search results

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
1 year ago
Eugene Yurtsev aa11f7c89b
Add progress bar to filesystemblob loader, update pytest config for unit tests (#4212)
This PR adds:

* Option to show a tqdm progress bar when using the file system blob loader
* Update pytest run configuration to be stricter
* Adding a new marker that checks that required pkgs exist
1 year ago
Zander Chase 8b284f9ad0
Pass parsed inputs through to tool _run (#4309) 1 year ago
Zander Chase 35c9e6ab40
Pass Callbacks through load_tools (#4298)
- Update the load_tools method to properly accept `callbacks` arguments.
- Add a deprecation warning when `callback_manager` is passed
- Add two unit tests to check the deprecation warning is raised and to
confirm the callback is passed through.

Closes issue #4096
1 year ago
Jinto Jose 8a338412fa
mongodb support for chat history (#4266) 1 year ago
Harrison Chase c8b0b6e6c1
add youtube tools (#4320) 1 year ago
Leonid Ganeline 9544b30821
added `Wikipedia` document loader (#4141)
- Added the `Wikipedia` document loader. It is based on the existing
`unilities/WikipediaAPIWrapper`
- Added a respective ut-s and example notebook
- Sorted list of classes in __init__
1 year ago
Eugene Yurtsev 423f497168
Add BlobParser abstraction (#3979)
This PR adds the BlobParser abstraction.

It follows the proposal described here:
https://github.com/hwchase17/langchain/pull/2833#issuecomment-1509097756
1 year ago
Davis Chase 5ca13cc1f0
Dev2049/pypdfium2 (#4209)
thanks @jerrytigerxu for the addition!

---------

Co-authored-by: Jere Xu <jtxu2008@gmail.com>
Co-authored-by: jerrytigerxu <jere.tiger.xu@gmailc.om>
1 year ago
George 2324f19c85
Update qdrant interface (#3971)
Hello

1) Passing `embedding_function` as a callable seems to be outdated and
the common interface is to pass `Embeddings` instance

2) At the moment `Qdrant.add_texts` is designed to be used with
`embeddings.embed_query`, which is 1) slow 2) causes ambiguity due to 1.
It should be used with `embeddings.embed_documents`

This PR solves both problems and also provides some new tests
1 year ago
Zander Chase 1017e5cee2
Add LCP Client (#4198)
Adding a client to fetch datasets, examples, and runs from a LCP
instance and run objects over them.
1 year ago
Zander Chase a30f42da4e
Update V2 Tracer (#4193)
- Update the RunCreate object to work with recent changes
- Add optional Example ID to the tracer
- Adjust default persist_session behavior to attempt to load the session
if it exists
- Raise more useful HTTP errors for logging
- Add unit testing
- Fix the default ID to be a UUID for v2 tracer sessions


Broken out from the big draft here:
https://github.com/hwchase17/langchain/pull/4061
1 year ago
Mike Wang c3044b1bf0
[test] Add integration_test for PandasAgent (#4056)
- confirm creation
- confirm functionality with a simple dimension check.

The test now is calling OpenAI API directly, but learning from
@vowelparrot that we’re caching the requests, so that it’s not that
expensive. I also found we’re calling OpenAI api in other integration
tests. Please lmk if there is any concern of real external API calls. I
can alternatively make a fake LLM for this test. Thanks
1 year ago
Aivin V. Solatorio 6567b73e1a
JSON loader (#4067)
This implements a loader of text passages in JSON format. The `jq`
syntax is used to define a schema for accessing the relevant contents
from the JSON file. This requires dependency on the `jq` package:
https://pypi.org/project/jq/.

---------

Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
1 year ago
hp0404 2a3c5f8353
Update WhatsAppChatLoader regex to handle multiple date-time formats (#4186)
This PR updates the `message_line_regex` used by `WhatsAppChatLoader` to
support different date-time formats used in WhatsApp chat exports;
resolves #4153.

The new regex handles the following input formats:
```terminal
[05.05.23, 15:48:11] James: Hi here
[11/8/21, 9:41:32 AM] User name: Message 123
1/23/23, 3:19 AM - User 2: Bye!
1/23/23, 3:22_AM - User 1: And let me know if anything changes
```

Tests have been added to verify that the loader works correctly with all
formats.
1 year ago
Zander Chase 84cfa76e00
Update Cohere Reranker (#4180)
The forward ref annotations don't get updated if we only iimport with
type checking

---------

Co-authored-by: Abhinav Verma <abhinav_win12@yahoo.co.in>
1 year ago
Zander Chase 6032a051e9
Add Tenant ID to V2 Tracer (#4135)
Update the V2 tracer to
- use UUIDs instead of int's
- load a tenant ID and use that when saving sessions
1 year ago
Zander Chase 2f087d63af
Fix Python RePL Tool (#4137)
Filter out kwargs from inferred schema when determining if a tool is
single input.

Add a couple unit tests.

Move tool unit tests to the tools dir
1 year ago
Harrison Chase d4cf1eb60a
Add firestore memory (#3792) (#3941)
If you have any other suggestions or feedback, please let me know.

---------

Co-authored-by: yakigac <10434946+yakigac@users.noreply.github.com>
1 year ago
Mike Wang 67db495fcf
[agent] Add Spark Agent (#4020)
- added support for spark through pyspark library.
- added jupyter notebook as example.
1 year ago
rogerserper b1446bea5f
google-serper: async + full json results + support for Google Images, Places and News (#4078)
* implemented arun, results, and aresults. Reuses aiosession if
available.
* helper tools GoogleSerperRun and GoogleSerperResults
* support for Google Images, Places and News (examples given) and
filtering based on time (e.g. past hour)
* updated docs
1 year ago