Commit Graph

285 Commits

Author SHA1 Message Date
Joe Reuter
8f0cd91d57
Airbyte based loaders (#8586)
This PR adds 8 new loaders:
* `AirbyteCDKLoader`: this loader can wrap and run all Python-based
Airbyte source connectors.
* Separate loaders for the most commonly used APIs:
  * `AirbyteGongLoader`
  * `AirbyteHubspotLoader`
  * `AirbyteSalesforceLoader`
  * `AirbyteShopifyLoader`
  * `AirbyteStripeLoader`
  * `AirbyteTypeformLoader`
  * `AirbyteZendeskSupportLoader`

## Documentation and getting started
I added the basic shape of the config to the notebooks. This increases
the maintenance effort a bit, but I think it's worth it to make sure
people can get started quickly with these important connectors. This is
also why I linked the spec and the documentation page in the readme as
these two contain all the information to configure a source correctly
(e.g. it won't suggest using oauth if that's avoidable even if the
connector supports it).

## Document generation
The "documents" produced by these loaders won't have a text part
(instead, all the record fields are put into the metadata). If the use
case requires text, the caller needs to apply a custom transformation
that builds it from the metadata.

## Incremental sync
All loaders support incremental syncs if the underlying streams support
it. If you store the `last_state` from the loader instance and pass it
in on the next load, only updated records are loaded.
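
The incremental pattern described above can be sketched with a stand-in loader. `FakeIncrementalLoader`, its `state` argument, and the `updated_at` cursor field are hypothetical names invented for illustration; only the `last_state` attribute name comes from this commit message, and the real loaders may differ.

```python
from typing import Any, Dict, List, Optional


class FakeIncrementalLoader:
    """Hypothetical stand-in for an Airbyte-style loader (not the real API)."""

    def __init__(
        self,
        records: List[Dict[str, Any]],
        state: Optional[Dict[str, int]] = None,
    ) -> None:
        self._records = records
        # Cursor from a previous run; -1 means "load everything".
        self._cursor = state["updated_at"] if state else -1
        self.last_state: Optional[Dict[str, int]] = state

    def load(self) -> List[Dict[str, Any]]:
        # Return only the records that are newer than the stored cursor.
        fresh = [r for r in self._records if r["updated_at"] > self._cursor]
        if fresh:
            self._cursor = max(r["updated_at"] for r in fresh)
            self.last_state = {"updated_at": self._cursor}
        return fresh


source = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
first = FakeIncrementalLoader(source)
first_batch = first.load()        # full load on the first run
saved_state = first.last_state    # persist this between runs

source.append({"id": 3, "updated_at": 30})
second = FakeIncrementalLoader(source, state=saved_state)
second_batch = second.load()      # only the record added since the first run
```

In a real pipeline the saved state would be written to disk or a database between runs rather than kept in a variable.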

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-08 14:49:25 -07:00
Eugene Yurtsev
15f650ae8c
Add base storage interface, 2 implementations and utility encoder (#8895)
This PR defines an abstract interface for key value stores.

It provides 2 implementations: 
1. Local File System
2. In memory -- used to facilitate testing

It also provides an encoder utility that handles serialization
from arbitrary data to data that the given store can hold.
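
A minimal sketch of what such an interface could look like, assuming a byte-valued store with batch operations. The class names, method names (`mget`, `mset`, `mdelete`, `yield_keys`), and the JSON-based `encode`/`decode` helpers are assumptions made for illustration and are not taken from the PR itself.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Optional, Sequence, Tuple


class KeyValueStore(ABC):
    """Hypothetical abstract interface for a byte-valued key-value store."""

    @abstractmethod
    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        """Return the value for each key, or None where a key is missing."""

    @abstractmethod
    def mset(self, pairs: Sequence[Tuple[str, bytes]]) -> None:
        """Store each (key, value) pair, overwriting existing keys."""

    @abstractmethod
    def mdelete(self, keys: Sequence[str]) -> None:
        """Remove the given keys; missing keys are ignored."""

    @abstractmethod
    def yield_keys(self) -> Iterator[str]:
        """Iterate over all stored keys."""


class InMemoryStore(KeyValueStore):
    """Dictionary-backed implementation, convenient for tests."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        return [self._data.get(key) for key in keys]

    def mset(self, pairs: Sequence[Tuple[str, bytes]]) -> None:
        for key, value in pairs:
            self._data[key] = value

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self._data.pop(key, None)

    def yield_keys(self) -> Iterator[str]:
        yield from list(self._data)


def encode(value: Any) -> bytes:
    """Serialize JSON-compatible data to bytes the store can hold."""
    return json.dumps(value).encode("utf-8")


def decode(raw: bytes) -> Any:
    """Inverse of encode."""
    return json.loads(raw.decode("utf-8"))


store = InMemoryStore()
store.mset([("doc-1", encode({"title": "hello"}))])
restored = decode(store.mget(["doc-1"])[0])
```

Keeping the encoder separate from the store means a file-system implementation can reuse the same serialization without knowing anything about the data it holds.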
2023-08-08 17:29:06 -04:00
Harrison Chase
7543a3d70e
Harrison/image (#845)
Co-authored-by: Ashutosh Sanzgiri <sanzgiri@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-08 13:58:27 -07:00
Bagatur
ab193338aa
bump 258 (#8932) 2023-08-08 12:54:51 -07:00
Eugene Yurtsev
bb12184551
Internal code deprecation API (#8763)
Proposal for an internal API to deprecate LangChain code.

This PR is heavily based on:
https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/_api/deprecation.py

This PR only includes deprecation functionality (no renaming etc.).
Additional functionality (e.g., renaming parameters) can be added as
needed, but it is best to roll this out as an MVP first to test it
out.

DeprecationWarnings are ignored by default. We can change the policy for
the deprecation warnings, but we'll need to make sure we're not creating
noise for users due to internal code invoking deprecated functionality.
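
As a rough illustration of the idea, a deprecation decorator can be sketched as follows. The `deprecated` name, its `since` and `alternative` parameters, and the version string are hypothetical; this is not the API added by the PR, only the general pattern built on the standard `warnings` module.

```python
import functools
import warnings
from typing import Any, Callable


def deprecated(since: str, alternative: str = "") -> Callable:
    """Hypothetical decorator that marks a function as deprecated."""

    def decorator(func: Callable) -> Callable:
        message = f"{func.__name__} is deprecated since {since}."
        if alternative:
            message += f" Use {alternative} instead."

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated(since="1.2.0", alternative="new_helper")  # version is illustrative
def old_helper(x: int) -> int:
    return x + 1


# Record the warning explicitly so it is visible regardless of the
# interpreter's default warning filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_helper(1)
```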
2023-08-08 15:42:22 -04:00
Leonid Ganeline
33a2f58fbf
tensorflow_datasets document loader (#8721)
This PR adds the `tensorflow_datasets` document loader
2023-08-08 15:19:28 -04:00
Holt Skinner
fad26e79a3
fix: Resolve AttributeError in Google Cloud Enterprise Search retriever (#8872)
- Reverting some of the changes made in
https://github.com/langchain-ai/langchain/pull/8369
2023-08-08 12:11:12 -07:00
William FH
b2eb4ff0fc
Relax Validation in Eval (#8902)
Just check for missing keys
2023-08-08 11:59:30 -07:00
Leonid Ganeline
2d078c7767
PubMed document loader (#8893)
- added `PubMed Document Loader` artifacts, unit tests, and examples
- fixed `PubMed utility` and its unit tests

@hwchase17
2023-08-08 14:26:03 -04:00
Ofer Mendelevitch
a7824f16f2
Added consistent timeout for Vectara calls (#8892)
- Description: consistent timeout at 60s for all calls to Vectara API
- Tag maintainer: @rlancemartin, @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-08 11:10:32 -07:00
Bagatur
642b57c7ff
nit (#8927) 2023-08-08 10:54:25 -07:00
manmax31
4a07fba9f0
Improve query prompt of BGE embeddings (#8908)
- Description: Improved the query prompt of BGE embeddings after talking
with the developers of BGE embeddings
  - Tag maintainer: @hwchase17
  - Twitter handle: @ManabChetia3

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2023-08-08 10:20:37 -07:00
Chris Pappalardo
beab637f04
added filter kwarg to VectorStoreIndexWrapper query and query_with_so… (#8844)
- Description: added filter to query methods in VectorStoreIndexWrapper
for filtering by metadata (i.e. search_kwargs)
- Tag maintainer: @rlancemartin, @eyurtsev

Updated the doc snippet on this topic as well. It took me a long while
to figure out how to filter the vectorstore by filename, so this might
help someone else out.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-08 10:10:45 -07:00
David vonThenen
bf4a112aa6
Fixes to the Nebula LLM Integration (#8918)
This addresses some issues with introducing the Nebula LLM to LangChain
in this PR:
https://github.com/langchain-ai/langchain/pull/8876

This fixes the following:
- Removes `SYMBLAI` from variable names
- Fixes bug with `Bearer` for the API KEY


Thanks again in advance for your help!
cc: @hwchase17, @baskaryan

---------

Co-authored-by: dvonthenen <david.vonthenen@gmail.com>
2023-08-08 10:04:43 -07:00
Marie-Philippe Gill
6b9f266837
Add user_context to AmazonKendraRetriever (#8869)
### Description 

Now, we can pass information like a JWT token using user_context:  

```python
self.retriever = AmazonKendraRetriever(index_id=kendraIndexId, user_context={"Token": jwt_token})
```

- [x] `make lint`
- [x] `make format`
- [x] `make test`

Also tested by pip installing in my own project, and it allows access
through the token.

### Maintainers 

 @rlancemartin, @eyurtsev

### My twitter handle 

[girlknowstech](https://twitter.com/girlknowstech)
2023-08-08 08:37:03 -07:00
GitHub-L
67718c1d6b
Update OpenAPI code to use the requestBody
- Description: The API doc passed to the LLM only included the content of
responses but did not include the content of requestBody, so the
agent could not construct the correct request parameters from the
requestBody information. Adding two lines of code fixed the bug.
  - Tag maintainer: @hinthornw
2023-08-08 10:33:21 -04:00
Leonid Kuligin
52d6b91c18
Fixed a source for documents uploaded from GCS (#8912)
Sets the source for documents uploaded from GCS to the source on GCS
#8911

Co-authored-by: Leonid Kuligin <kuligin@google.com>
2023-08-08 09:34:43 -04:00
Bagatur
022ef170f8
bump 257 (#8903) 2023-08-08 01:16:33 -07:00
Jacob Lee
fa30a57034
Adds Ollama as an LLM (#8829)
Adds Ollama as an LLM. Ollama can run various open source models locally
e.g. Llama 2 and Vicuna, automatically configuring and GPU-optimizing
them.

@rlancemartin @hwchase17

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
2023-08-07 21:19:22 -07:00
Ash Vardanian
1f9124ceaa
Add: USearch Vector Store (#8835)
## Description

I am excited to propose an integration with USearch, a lightweight
vector-search engine available for both Python and JavaScript, among
other languages.

## Dependencies

It introduces a new PyPi dependency - `usearch`. I am unsure if it must
be added to the Poetry file, as this would make the PR too clunky.
Please let me know.

## Profiles

- Maintainers: @ashvardanian @davvard
- Twitter handles: @ashvardanian @unum_cloud

---------

Co-authored-by: Davit Vardanyan <78792753+davvard@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-07 20:41:00 -07:00
Leonid Kuligin
b52a3785c9
Allow to specify a custom loader for GcsFileLoader (#8868)
Co-authored-by: Leonid Kuligin <kuligin@google.com>
2023-08-07 22:57:31 -04:00
Bruno Bornsztein
d56eff042a
Make json output parser handle newlines inside markdown code blocks (#8682)
Update to #8528

Newlines and other special characters within markdown code blocks
returned as `action_input` should be handled correctly (in particular,
unescaped `"` => `\"` and `\n` => `\\n`) so they don't break JSON
parsing.
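
A minimal sketch of this kind of pre-processing, assuming the `action_input` string is the last value in the JSON object. The function names are hypothetical and the regex is a simplification, not necessarily the parser's actual implementation.

```python
import json
import re


def _escape_action_input(match: "re.Match[str]") -> str:
    value = match.group(2)
    # Escape double quotes that are not already escaped.
    value = re.sub(r'(?<!\\)"', r'\\"', value)
    # Turn raw control characters into their JSON escape sequences.
    value = value.replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t")
    return match.group(1) + value + match.group(3)


def parse_json_with_multiline_input(text: str) -> dict:
    """Hypothetical helper: repair `action_input`, then parse as JSON."""
    fixed = re.sub(
        r'("action_input"\s*:\s*")(.*)(")',
        _escape_action_input,
        text,
        flags=re.DOTALL,  # let `.` span the raw newlines inside the value
    )
    return json.loads(fixed)


# The value contains a raw newline and unescaped quotes, so json.loads
# would reject the text without the repair step.
raw = '{\n  "action": "run",\n  "action_input": "line one\nsay "hi""\n}'
parsed = parse_json_with_multiline_input(raw)
```

The greedy `(.*)` is what restricts this sketch to inputs where `action_input` comes last; a more robust parser would need to track string boundaries.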

@baskaryan
2023-08-07 15:49:54 -07:00
Oege Dijk
cff52638b2
when encountering error during fetch return "" in web_base.py (#8753)
when e.g. downloading a sitemap with a malformed url (e.g.
"ttp://example.com/index.html" with the h omitted at the beginning of
the url), this will ensure that the sitemap download does not crash, but
just emits a warning. (maybe should be optional with e.g. a
`skip_faulty_urls:bool=True` parameter, but this was the most
straightforward fix)
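
The behavior, including the optional flag suggested above, could be sketched like this. `fetch_or_empty` and `fake_fetch` are hypothetical names, and the fetcher is injected so that the example runs without network access; the actual change lives in `web_base.py` and may look different.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def fetch_or_empty(
    url: str,
    fetch: Callable[[str], str],
    skip_faulty_urls: bool = True,
) -> str:
    """Return the page text, or "" with a warning if fetching fails."""
    try:
        return fetch(url)
    except Exception as exc:
        if not skip_faulty_urls:
            raise
        logger.warning("Error fetching %s, skipping: %s", url, exc)
        return ""


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP client; rejects malformed schemes.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"unsupported URL scheme: {url}")
    return f"<html>{url}</html>"


good = fetch_or_empty("http://example.com/index.html", fake_fetch)
bad = fetch_or_empty("ttp://example.com/index.html", fake_fetch)
```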

@rlancemartin, @eyurtsev
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-07 15:35:41 -07:00
Bennji94
33cdb06b5c
Async RetryOutputParser, RetryWithErrorOutputParser and OutputFixingParser (#8776)
Added async parsing functions for RetryOutputParser,
RetryWithErrorOutputParser and OutputFixingParser.

The async parse functions call the arun methods of the used LLMChains.

Fix for #7989

---------

Co-authored-by: Benjamin May <benjamin.may94@gmail.com>
2023-08-07 14:42:48 -07:00
Joshua Sundance Bailey
7fc07ba5df
Create ChatAnyscale (#8770)
- Description: Adds the ChatAnyscale class with llama-2 7b, llama-2 13b,
and llama-2 70b on [Anyscale
Endpoints](https://app.endpoints.anyscale.com/)
- It inherits from ChatOpenAI and requires openai (probably unnecessary
but it made for a quick and easy implementation)
- Inspired by https://github.com/langchain-ai/langchain/pull/8434
(@kylehh and @baskaryan )
2023-08-07 13:21:05 -07:00
idcore
fe78aff1f2
Add new parameter forced_decoder_ids to OpenAIWhisperParserLocal + small bug fix (#8793)
- Description: new parameter forced_decoder_ids for
OpenAIWhisperParserLocal to force input language, and enable optional
translate mode. Usage example:
processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="transcribe")
# forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="translate")
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParserLocal(lang_model="openai/whisper-medium", forced_decoder_ids=forced_decoder_ids))
  - Issue #8792
  - Tag maintainer: @rlancemartin, @eyurtsev

---------

Co-authored-by: idcore <eugene.novozhilov@gmail.com>
2023-08-07 13:17:58 -07:00
David vonThenen
40079d4936
Introduce Nebula LLM to LangChain (#8876)
## Description

This PR adds Nebula to the available LLMs in LangChain.

Nebula is an LLM focused on conversation understanding and enables users
to extract conversation insights from video, audio, text, and chat-based
conversations. These conversations can occur between any mix of human or
AI participants.

Examples of some questions you could ask Nebula from a given
conversation are:
- What could be the customer’s pain points based on the conversation?
- What sales opportunities can be identified from this conversation?
- What best practices can be derived from this conversation for future
customer interactions?

You can read more about Nebula here:

https://symbl.ai/blog/extract-insights-symbl-ai-generative-ai-recall-ai-meetings/

#### Integration Test 

An integration test is added, but it requires network access. Since
Nebula is fully managed like OpenAI, network access is required to
exercise the integration test.

#### Linting

- [x] make lint
- [x] make test (TODO: there seems to be a failure in another
non-related test??? Need to check on this.)
- [x] make format

### Dependencies

No new dependencies were introduced.

### Twitter handle

[@symbldotai](https://twitter.com/symbldotai)
[@dvonthenen](https://twitter.com/dvonthenen)


If you have any questions, please let me know.

cc: @hwchase17, @baskaryan

---------

Co-authored-by: dvonthenen <david.vonthenen@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-07 13:15:26 -07:00
Eugene Yurtsev
f616aee35a
JsonOutputFunctionParser: Fix mutation in place bug (#8758)
Fixes mutation in place in the JsonOutputFunctionParser. This causes
issues when trying to re-use the original AI message.
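
The general shape of such a fix is to copy the payload before modifying it. The sketch below uses a plain dictionary as a stand-in for the AI message, and `parse_function_call` is a hypothetical name, so it illustrates the pattern rather than the parser's real code.

```python
import copy
import json


def parse_function_call(message: dict) -> dict:
    """Parse the JSON arguments without touching the caller's message."""
    # Deep-copy first so the nested dict in `message` is left unchanged.
    call = copy.deepcopy(message["function_call"])
    call["arguments"] = json.loads(call["arguments"])
    return call


message = {"function_call": {"name": "search", "arguments": '{"query": "cats"}'}}
parsed = parse_function_call(message)
```

Because the original message still holds the raw JSON string, it can be passed back to the model or parsed again later.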
2023-08-07 14:32:46 -04:00
shibuiwilliam
ab47557db3
fix evaluation parse test (#8859)
# What
- fix evaluation parse test

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: Fix evaluation parse test
  - Issue: None
  - Dependencies: None
  - Tag maintainer: @baskaryan
  - Twitter handle: @MLOpsJ

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-08-07 11:15:41 -07:00
manmax31
40096c73cd
Add BGE embeddings support (#8848)
- Description: [BGE-large](https://huggingface.co/BAAI/bge-large-en)
embeddings from BAAI are at the top of [MTEB
leaderboard](https://huggingface.co/spaces/mteb/leaderboard). Hence
adding support for it.
- Tag maintainer: @baskaryan
- Twitter handle: @ManabChetia3

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-08-07 11:15:30 -07:00
shibuiwilliam
fbc83dfdbb
Fix/abstract add message (#8856)
- Description: Fix/abstract add message
- Issue: None
- Dependencies: None
- Twitter handle: @MLOpsJ
2023-08-07 11:02:19 -07:00
William FH
91be7eee66
Add concurrency support for run_on_dataset (#8841)
Long-term, would be better to use the lower-level batch() method(s) but
it may take me a bit longer to clean up. This unblocks in the meantime,
though it may fail when the evaluated chain raises a
`NotImplementedError` for a corresponding async method
2023-08-07 09:24:48 -07:00
Bagatur
fc2f450f2d
bump 256 (#8870) 2023-08-07 08:29:02 -07:00
Tudor Golubenco
aeaef8f3a3
Add support for Xata as a vector store (#8822)
This adds support for [Xata](https://xata.io) (data platform based on
Postgres) as a vector store. We have recently added [Xata to
Langchain.js](https://github.com/hwchase17/langchainjs/pull/2125) and
would love to have the equivalent in the Python project as well.

The PR includes integration tests and a Jupyter notebook as docs. Please
let me know if anything else would be needed or helpful.

I have added the xata python SDK as an optional dependency.

## To run the integration tests

You will need to create a DB in xata (see the docs), then run something
like:

```
OPENAI_API_KEY=sk-... XATA_API_KEY=xau_... XATA_DB_URL='https://....xata.sh/db/langchain'  poetry run pytest tests/integration_tests/vectorstores/test_xata.py
```

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-07 08:14:52 -07:00
Leonid Kuligin
6e3fa59073
Added chat history to codey models (#8831)
#7469

Since 1.29.0, the Vertex SDK supports passing a chat history to a codey
chat model.

Co-authored-by: Leonid Kuligin <kuligin@google.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-08-07 07:34:35 -07:00
Massimiliano Pronesti
a616e19975
feat(llms): add support for vLLM (#8806)
Hello langchain maintainers, 
this PR aims at integrating
[vllm](https://vllm.readthedocs.io/en/latest/#) into langchain. This PR
closes #8729.

This feature clearly depends on `vllm`, but I've seen other models
supported here depend on packages that are not included in the
pyproject.toml (e.g. `gpt4all`, `text-generation`) so I thought it was
the case for this as well.

@hwchase17, @baskaryan

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-08-07 07:32:02 -07:00
Bagatur
100d9ce4c7
bump 255 (#8865) 2023-08-07 07:25:23 -07:00
Vic Cao
c9da300e4d
fix: overwrite stream for ChatOpenAI in runtime (#8288)
@hwchase17, @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-08-07 10:18:30 +01:00
Karthik Raja A
5a9765b1b5
MultiOn client toolkit update 2.0 (#8750)
- Updated to use newer better function interaction
 - Previous version had only one callback
 - @hinthornw @hwchase17  Can you look into this
 -  Shout out to @MultiON_AI @DivGarg9 on twitter

---------

Co-authored-by: Naman Garg <ngarg3@binghamton.edu>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-08-06 22:24:10 -07:00
Emre
454998c1fb
Fix invalid escape sequence warnings (#8771)
Description: The lines I changed look incorrectly escaped for regex.
In Python 3.11, I receive a DeprecationWarning for these lines. You
don't see any warnings unless you explicitly run Python with the
`-W always::DeprecationWarning` flag. So, this is my attempt to fix it.

Here are the warnings from log files:

```
/usr/local/lib/python3.11/site-packages/langchain/text_splitter.py:919: DeprecationWarning: invalid escape sequence '\s'
/usr/local/lib/python3.11/site-packages/langchain/text_splitter.py:918: DeprecationWarning: invalid escape sequence '\s'
/usr/local/lib/python3.11/site-packages/langchain/text_splitter.py:917: DeprecationWarning: invalid escape sequence '\s'
/usr/local/lib/python3.11/site-packages/langchain/text_splitter.py:916: DeprecationWarning: invalid escape sequence '\c'
/usr/local/lib/python3.11/site-packages/langchain/text_splitter.py:903: DeprecationWarning: invalid escape sequence '\*'
/usr/local/lib/python3.11/site-packages/langchain/text_splitter.py:804: DeprecationWarning: invalid escape sequence '\*'
/usr/local/lib/python3.11/site-packages/langchain/text_splitter.py:804: DeprecationWarning: invalid escape sequence '\*'
```
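
For context, warnings like these typically go away once the pattern is written as a raw string, so that the backslash reaches the regex engine untouched. The patterns below are illustrative and are not the exact ones from `text_splitter.py`.

```python
import re

# Writing "\s+" in a normal string relies on Python passing the unknown
# escape through unchanged, which is what triggers the warning.
# A raw string avoids the problem without changing the pattern.
whitespace = re.compile(r"\s+")
bold = re.compile(r"\*\*(.+?)\*\*")

tokens = whitespace.split("split  on\twhitespace")
emphasis = bold.findall("some **bold** text")
```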

cc @baskaryan

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-08-06 17:01:18 -07:00
Harrison Chase
0adc282d70
Harrison/as retriever docstring (#8840)
Co-authored-by: Bytestorm <31070777+Bytestorm5@users.noreply.github.com>
2023-08-06 17:00:57 -07:00
Zend
bd4865b6fe
Async Recursive URL loader (#8502)
Description: This PR improves recursive_url_loader, for example by
limiting the access depth and supporting customizable extractors (from
the raw webpage to the text of the Document object), so that users can
use other tools to extract the webpage. This PR also includes the
documentation and tests for the new loader.
Old PR closed due to project structure change. #7756

Because socket requests are not allowed, the old unit test was removed.
Issue: N/A
Dependencies: asyncio, aiohttp
Tag maintainer: @rlancemartin
Twitter handle: @ Zend_Nihility

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
2023-08-06 16:22:31 -07:00
fqassemi
485d716c21
Feature faiss delete (#8135)
- Description: The docstore had two main methods, add and search.
However, working with a docstore sometimes requires deleting an entry,
so I have added a simple delete method that deletes items from the
docstore. I have also added a delete method to the faiss vectorstore
for the same reason.
  - Issue: NA
  - Dependencies: NA
  - Tag maintainer:  @rlancemartin, @eyurtsev
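
A docstore with add, search, and delete can be sketched as a thin wrapper around a dictionary. `SimpleDocstore` and its error behavior are assumptions made for illustration and may not match the merged implementation.

```python
from typing import Dict, List


class SimpleDocstore:
    """Hypothetical in-memory docstore with add, search, and delete."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def add(self, texts: Dict[str, str]) -> None:
        overlapping = set(texts) & set(self._docs)
        if overlapping:
            raise ValueError(f"Tried to add ids that already exist: {overlapping}")
        self._docs.update(texts)

    def search(self, doc_id: str) -> str:
        return self._docs.get(doc_id, f"ID {doc_id} not found.")

    def delete(self, ids: List[str]) -> None:
        # Validate first so a bad id does not leave a partial deletion.
        missing = set(ids) - set(self._docs)
        if missing:
            raise ValueError(f"Tried to delete ids that do not exist: {missing}")
        for doc_id in ids:
            del self._docs[doc_id]


docstore = SimpleDocstore()
docstore.add({"a": "first document", "b": "second document"})
docstore.delete(["a"])
after_delete = docstore.search("a")
still_there = docstore.search("b")
```

A vector store built on top of such a docstore would also need to remove the matching vectors from its index and keep its id mapping consistent.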

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-08-06 15:46:30 -07:00
Pierre Alexandre SCHEMBRI
4a7ebb7184
Fix issue #7616 (#7617)
Fix Issue #7616 with a simpler approach to extract function names (use
`__name__` attribute)
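
A short illustration of the `__name__` approach; the `function_names` helper is a hypothetical name, not code from the PR.

```python
from typing import Callable, List


def function_names(functions: List[Callable]) -> List[str]:
    """Collect the declared name of each callable."""
    return [func.__name__ for func in functions]


def search(query: str) -> str:
    return f"results for {query}"


names = function_names([search, len])
```

Note that some callables, such as `functools.partial` objects, do not define `__name__`, so a caller may want a fallback for those.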

@hwchase17

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-08-06 15:12:03 -07:00
Ankur Agarwal
797c9e92c8
#8786 Fixed: Callback handler disconnect in between (#8787)
Fixes for  #8786 @agola11 

- Description: The callback flow breaks before reaching the last chain
because callbacks are dropped between chains along a nested path. This
fix helps produce a full trace and correlate parent-child relationships
across all nested chains.

  - Issue: the issue #8786 
  - Dependencies: NA
  - Tag maintainer: @agola11 
  - Twitter handle: Agarwal_Ankur
2023-08-06 15:11:45 -07:00
William FH
983678dedc
Add Dist Metrics for String Distance Evaluation (#8837)
Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>
2023-08-06 14:05:00 -07:00
William FH
f76d50d8dc
fix exception inconsistencies (#8812) (#8839)
Merge #8812 with main to fix unrelated test failure

Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>
2023-08-06 14:04:49 -07:00
Bagatur
15c271e7b3
bump 254 (#8834) 2023-08-06 11:34:54 -07:00
Bagatur
d7b613a293
Bagatur/revert revert nuclia (#8833) 2023-08-06 11:24:36 -07:00
Bagatur
2f309a4ce6
Revert "Bagatur/nuclia (#8404)" (#8832) 2023-08-06 11:14:01 -07:00