Commit Graph

1506 Commits

Author SHA1 Message Date
Samuel Berthe
2be3515a66
SQLDatabase: adding security disclamer (#7710)
It might be obvious to most engineers, but I think everybody should be
cautious when using such a chain.

![image](https://github.com/hwchase17/langchain/assets/2951285/a1df6567-9d56-4c12-98ea-767401ae2ac8)
2023-07-14 13:38:16 -04:00
Bagatur
bae93682f6
update docs (#7714) 2023-07-14 11:49:09 -04:00
Bagatur
b065da6933
Bagatur/docs nit (#7712) 2023-07-14 11:13:02 -04:00
Bagatur
87d81b6acc
Redirect old text splitter page (#7708)
related to #7665
2023-07-14 11:12:18 -04:00
Aarav Borthakur
210296a71f
Integrate Rockset as a document loader (#7681)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

Integrate [Rockset](https://rockset.com/docs/) as a document loader.

Issue: None
Dependencies: Nothing new (rockset's dependency was already added
[here](https://github.com/hwchase17/langchain/pull/6216))
Tag maintainer: @rlancemartin

I have added a test for the integration and an example notebook showing
its use. I ran `make lint` and everything looks good.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-14 07:58:13 -07:00
Samuel Berthe
7d4843fe84
feat(chains): adding ElasticsearchDatabaseChain for interacting with analytics database (#7686)
This pull request adds a ElasticsearchDatabaseChain chain for
interacting with analytics database, in the manner of the
SQLDatabaseChain.

Maintainer: @samber
Twitter handler: samuelberthe

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-14 10:30:57 -04:00
Daniel
6d88b23ef7
Update pgembedding.ipynb (#7699)
Update the extension name. It changed from pg_hnsw to pg_embedding.

Thank you. I missed this in my previous commit.
2023-07-14 08:39:01 -04:00
Richy Wang
45bb414be2
Add LLM for Alibaba's Damo Academy's Tongyi Qwen API (#7477)
- Add langchain.llms.Tonyi for text completion, in examples into the
Tonyi Text API,
- Add system tests.

Note async completion for the Text API is not yet supported and will be
included in a future PR.

Dependencies: dashscope. It will be installed manually cause it is not
need by everyone.

Happy for feedback on any aspect of this PR @hwchase17 @baskaryan.
2023-07-14 01:58:22 -04:00
Lance Martin
6325a3517c
Make recursive loader yield while crawling (#7568)
Support actual lazy_load since it can take a while to crawl larger
directories.
2023-07-13 21:55:20 -07:00
UmerHA
82f3e32d8d
[Small upgrade] Allow document limit in AzureCognitiveSearchRetriever (#7690)
Multiple people have asked in #5081 for a way to limit the documents
returned from an AzureCognitiveSearchRetriever. This PR adds the `top_n`
parameter to allow that.


Twitter handle:
 [@UmerHAdil](twitter.com/umerHAdil)
2023-07-13 23:04:40 -04:00
Daniel
854f3fe9b1
Update pgembedding.ipynb (#7682)
Correct links to the pg_embedding repository and the Neon documentation.
2023-07-13 19:54:07 -04:00
William FH
051fac1e66
Improve walkthrough links for sphinx (#7672)
Co-authored-by: Ankush Gola <9536492+agola11@users.noreply.github.com>
2023-07-13 16:08:31 -07:00
Bagatur
5db4dba526
add integrations hub link to docs (#7675) 2023-07-13 18:44:10 -04:00
Jasper
fbc97a77ed
add browserless loader (#7562)
# Browserless

Added support for Browserless' `/content` endpoint as a document loader.

### About Browserless

Browserless is a cloud service that provides access to headless Chrome
browsers via a REST API. It allows developers to automate Chromium in a
serverless fashion without having to configure and maintain their own
Chrome infrastructure.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Lance Martin <lance@langchain.dev>
2023-07-13 13:18:28 -07:00
frangin2003
c7b687e944
Simplify GraphQL Tool Initialization documentation by Removing 'llm' Argument (#7651)
This PR is aimed at enhancing the clarity of the documentation in the
langchain project.

**Description**:
In the graphql.ipynb file, I have removed the unnecessary 'llm' argument
from the initialization process of the GraphQL tool (of type
_EXTRA_OPTIONAL_TOOLS). The 'llm' argument is not required for this
process. Its presence could potentially confuse users. This modification
simplifies the understanding of tool initialization and minimizes
potential confusion.

**Issue**: Not applicable, as this is a documentation improvement.

**Dependencies**: None.

**I kindly request a review from the following maintainer**: @hinthornw,
who is responsible for Agents / Tools / Toolkits.

No new integration is being added in this PR, hence no need for a test
or an example notebook.

Please see the changes for more detail and let me know if any further
modification is necessary.
2023-07-13 14:52:07 -04:00
William FH
a673a51efa
[Breaking] Update Evaluation Functionality (#7388)
- Migrate from deprecated langchainplus_sdk to `langsmith` package
- Update the `run_on_dataset()` API to use an eval config
- Update a number of evaluators, as well as the loading logic
- Update docstrings / reference docs
- Update tracer to share single HTTP session
2023-07-13 02:13:06 -07:00
Matt Adams
98e1bbfbbd
Add missing dependencies to apify.ipynb (#6331)
Fixes errors caused by missing dependencies when running the notebook.
2023-07-13 03:02:23 -04:00
Francisco Ingham
488d2d5da9
Entity extraction improvements (#6342)
Added fix to avoid irrelevant attributes being returned plus an example
of extracting unrelated entities and an exampe of using an 'extra_info'
attribute to extract unstructured data for an entity.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-13 02:16:05 -04:00
Bagatur
7f8ff2a317
add tagger nb (#7637) 2023-07-13 01:48:23 -04:00
Jacob Lee
cdb93ab5ca
Adds OpenAI functions powered document metadata tagger (#7521)
Adds a new document transformer that automatically extracts metadata for
a document based on an input schema. I also moved
`document_transformers.py` to `document_transformers/__init__.py` to
group it with this new transformer - it didn't seem to cause issues in
the notebook, but let me know if I've done something wrong there.

Also had a linter issue I couldn't figure out:

```
MacBook-Pro:langchain jacoblee$ make lint
poetry run mypy .
docs/dist/conf.py: error: Duplicate module named "conf" (also at "./docs/api_reference/conf.py")
docs/dist/conf.py: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#mapping-file-paths-to-modules for more info
docs/dist/conf.py: note: Common resolutions include: a) using `--exclude` to avoid checking one of them, b) adding `__init__.py` somewhere, c) using `--explicit-package-bases` or adjusting MYPYPATH
Found 1 error in 1 file (errors prevented further checking)
make: *** [lint] Error 2
```

@rlancemartin @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-13 01:12:41 -04:00
Jason Fan
8effd90be0
Add new types of document transformers (#7379)
- Description: Add two new document transformers that translates
documents into different languages and converts documents into q&a
format to improve vector search results. Uses OpenAI function calling
via the [doctran](https://github.com/psychic-api/doctran/tree/main)
library.
  - Issue: N/A
  - Dependencies: `doctran = "^0.0.5"`
  - Tag maintainer: @rlancemartin @eyurtsev @hwchase17 
  - Twitter handle: @psychicapi or @jfan001

Notes
- Adheres to the `DocumentTransformer` abstraction set by @dev2049 in
#3182
- refactored `EmbeddingsRedundantFilter` to put it in a file under a new
`document_transformers` module
- Added basic docs for `DocumentInterrogator`, `DocumentTransformer` as
well as the existing `EmbeddingsRedundantFilter`

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 23:53:30 -04:00
Jamie Broomall
0e1d7a27c6
WhyLabsCallbackHandler updates (#7621)
Updates to the WhyLabsCallbackHandler and example notebook
- Update dependency to langkit 0.0.6 which defines new helper methods
for callback integrations
- Update WhyLabsCallbackHandler to use the new `get_callback_instance`
so that the callback is mostly defined in langkit
- Remove much of the implementation of the WhyLabsCallbackHandler here
in favor of the callback instance

This does not change the behavior of the whylabs callback handler
implementation but is a reorganization that moves some of the
implementation externally to our optional dependency package, and should
make future updates easier.

@agola11
2023-07-12 23:46:56 -04:00
Gaurang Pawar
53722dcfdc
Fixed a typo in pinecone_hybrid_search.ipynb (#7627)
Fixed a small typo in documentation
2023-07-12 23:46:41 -04:00
Bagatur
ee70d4a0cd
mv tutorials (#7614) 2023-07-12 17:33:36 -04:00
Yaohui Wang
d85c33a5c3
Fix the markdown rendering issue with a code block inside a markdown code block (#6625)
### Description

- Fix the markdown rendering issue with a code block inside a markdown,
using a different number of backticks for the delimiters.

Current doc site:
<https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/code_splitter#markdown>

After fix:
<img width="480" alt="image"
src="https://github.com/hwchase17/langchain/assets/3115235/d9921d59-64e6-4a34-9c62-79743667f528">


### Who can review

PTAL @dev2049 

Co-authored-by: Yaohui Wang <wangyaohui.01@bytedance.com>
2023-07-12 16:29:25 -04:00
Yaroslav Halchenko
0d92a7f357
codespell: workflow, config + some (quite a few) typos fixed (#6785)
Probably the most  boring PR to review ;)

Individual commits might be easier to digest

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2023-07-12 16:20:08 -04:00
Sam
931e68692e
Adds a chain around sympy for symbolic math (#6834)
- Description: Adds a new chain that acts as a wrapper around Sympy to
give LLMs the ability to do some symbolic math.
- Dependencies: SymPy

---------

Co-authored-by: sreiswig <sreiswig@github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 15:17:32 -04:00
Subsegment
6e1000dc8d
docs : Use more meaningful cnosdb examples (#7587)
This change makes the ecosystem integrations cnosdb documentation more
realistic and easy to understand.

- change examples of question and table
- modify typo and format
2023-07-12 10:31:55 -04:00
ausboss
50316f6477
Adding LLM wrapper for Kobold AI (#7560)
- Description: add wrapper that lets you use KoboldAI api in langchain
  - Issue: n/a
  - Dependencies: none extra, just what exists in lanchain
  - Tag maintainer: @baskaryan 
  - Twitter handle: @zanzibased
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 03:48:12 -04:00
os1ma
2667ddc686
Fix make docs_build and related scripts (#7276)
**Description: a description of the change**

Fixed `make docs_build` and related scripts which caused errors. There
are several changes.

First, I made the build of the documentation and the API Reference into
two separate commands. This is because it takes less time to build. The
commands for documents are `make docs_build`, `make docs_clean`, and
`make docs_linkcheck`. The commands for API Reference are `make
api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`.

It looked like `docs/.local_build.sh` could be used to build the
documentation, so I used that. Since `.local_build.sh` was also building
API Rerefence internally, I removed that process. `.local_build.sh` also
added some Bash options to stop in error or so. Futher more added `cd
"${SCRIPT_DIR}"` at the beginning so that the script will work no matter
which directory it is executed in.

`docs/api_reference/api_reference.rst` is removed, because which is
generated by `docs/api_reference/create_api_rst.py`, and added it to
.gitignore.

Finally, the description of CONTRIBUTING.md was modified.

**Issue: the issue # it fixes (if applicable)**

https://github.com/hwchase17/langchain/issues/6413

**Dependencies: any dependencies required for this change**

`nbdoc` was missing in group docs so it was added. I installed it with
the `poetry add --group docs nbdoc` command. I am concerned if any
modifications are needed to poetry.lock. I would greatly appreciate it
if you could pay close attention to this file during the review.

**Tag maintainer**
- General / Misc / if you don't know who to tag: @baskaryan

If this PR needs any additional changes, I'll be happy to make them!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 22:05:14 -04:00
schop-rob
e811c5e8c6
Add OpenAI organization ID to docs (#7398)
Description: I added an example of how to reference the OpenAI API
Organization ID, because I couldn't find it before. In the example, it
is mentioned how to achieve this using environment variables as well as
parameters for the OpenAI()-class
Issue: -
Dependencies: -
Twitter @schop-rob
2023-07-11 20:51:58 -04:00
Kenny
8741e55e7c
Template formats documentation (#7404)
Simple addition to the documentation, adding the correct import
statement & showcasing using Python FStrings.
2023-07-11 18:24:24 -04:00
OwenElliott
9cb2347453
Fix broken link from Marqo Ecosystem (#7510)
Small fix to a link from the Marqo page in the ecosystem.

The link was not updated correctly when the documentation structure
changed to html pages instead of links to notebooks.
2023-07-11 17:15:15 -04:00
Kacper Łukawski
1f83b5f47e
Reuse the existing collection if configured properly in Qdrant.from_texts (#7530)
This PR changes the behavior of `Qdrant.from_texts` so the collection is
reused if not requested to recreate it. Previously, calling
`Qdrant.from_texts` or `Qdrant.from_documents` resulted in removing the
old data which was confusing for many.
2023-07-11 16:24:35 -04:00
Felix Brockmeier
406a9dc11f
Add notebook example for Lemon AI NLP Workflow Automation (#7556)
- Description: Added notebook to LangChain docs that explains how to use
Lemon AI NLP Workflow Automation tool with Langchain
  
- Issue: not applicable
  
- Dependencies: not applicable
  
- Tag maintainer: @agola11
  
- Twitter handle: felixbrockm
2023-07-11 15:15:11 -04:00
Lance Martin
9e067b8cc9
Add env setup (#7550)
Include setup
2023-07-11 09:48:40 -07:00
Bagatur
d2137eea9f
fix cpal docs (#7545) 2023-07-11 11:07:45 -04:00
Boris
9129318466
CPAL (#6255)
# Causal program-aided language (CPAL) chain

## Motivation

This builds on the recent [PAL](https://arxiv.org/abs/2211.10435) to
stop LLM hallucination. The problem with the
[PAL](https://arxiv.org/abs/2211.10435) approach is that it hallucinates
on a math problem with a nested chain of dependence. The innovation here
is that this new CPAL approach includes causal structure to fix
hallucination.

For example, using the below word problem, PAL answers with 5, and CPAL
answers with 13.

    "Tim buys the same number of pets as Cindy and Boris."
    "Cindy buys the same number of pets as Bill plus Bob."
    "Boris buys the same number of pets as Ben plus Beth."
    "Bill buys the same number of pets as Obama."
    "Bob buys the same number of pets as Obama."
    "Ben buys the same number of pets as Obama."
    "Beth buys the same number of pets as Obama."
    "If Obama buys one pet, how many pets total does everyone buy?"

The CPAL chain represents the causal structure of the above narrative as
a causal graph or DAG, which it can also plot, as shown below.


![complex-graph](https://github.com/hwchase17/langchain/assets/367522/d938db15-f941-493d-8605-536ad530f576)

.

The two major sections below are:

1. Technical overview
2. Future application

Also see [this jupyter
notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb)
doc.


## 1. Technical overview

### CPAL versus PAL

Like [PAL](https://arxiv.org/abs/2211.10435), CPAL intends to reduce
large language model (LLM) hallucination.

The CPAL chain is different from the PAL chain for a couple of reasons. 

* CPAL adds a causal structure (or DAG) to link entity actions (or math
expressions).
* The CPAL math expressions are modeling a chain of cause and effect
relations, which can be intervened upon, whereas for the PAL chain math
expressions are projected math identities.

PAL's generated python code is wrong. It hallucinates when complexity
increases.

```python
def solution():
    """Tim buys the same number of pets as Cindy and Boris.Cindy buys the same number of pets as Bill plus Bob.Boris buys the same number of pets as Ben plus Beth.Bill buys the same number of pets as Obama.Bob buys the same number of pets as Obama.Ben buys the same number of pets as Obama.Beth buys the same number of pets as Obama.If Obama buys one pet, how many pets total does everyone buy?"""
    obama_pets = 1
    tim_pets = obama_pets
    cindy_pets = obama_pets + obama_pets
    boris_pets = obama_pets + obama_pets
    total_pets = tim_pets + cindy_pets + boris_pets
    result = total_pets
    return result  # math result is 5
```

CPAL's generated python code is correct.

```python
story outcome data
    name                                   code  value      depends_on
0  obama                                   pass    1.0              []
1   bill               bill.value = obama.value    1.0         [obama]
2    bob                bob.value = obama.value    1.0         [obama]
3    ben                ben.value = obama.value    1.0         [obama]
4   beth               beth.value = obama.value    1.0         [obama]
5  cindy   cindy.value = bill.value + bob.value    2.0     [bill, bob]
6  boris   boris.value = ben.value + beth.value    2.0     [ben, beth]
7    tim  tim.value = cindy.value + boris.value    4.0  [cindy, boris]

query data
{
    "question": "how many pets total does everyone buy?",
    "expression": "SELECT SUM(value) FROM df",
    "llm_error_msg": ""
}
# query result is 13
```

Based on the comments below, CPAL's intended location in the library is
`experimental/chains/cpal` and PAL's location is`chains/pal`.

### CPAL vs Graph QA

Both the CPAL chain and the Graph QA chain extract entity-action-entity
relations into a DAG.

The CPAL chain is different from the Graph QA chain for a few reasons.

* Graph QA does not connect entities to math expressions
* Graph QA does not associate actions in a sequence of dependence.
* Graph QA does not decompose the narrative into these three parts:
  1. Story plot or causal model
  4. Hypothetical question
  5. Hypothetical condition 

### Evaluation

Preliminary evaluation on simple math word problems shows that this CPAL
chain generates less hallucination than the PAL chain on answering
questions about a causal narrative. Two examples are in [this jupyter
notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb)
doc.

## 2. Future application

### "Describe as Narrative, Test as Code"

The thesis here is that the Describe as Narrative, Test as Code approach
allows you to represent a causal mental model both as code and as a
narrative, giving you the best of both worlds.

#### Why describe a causal mental mode as a narrative?

The narrative form is quick. At a consensus building meeting, people use
narratives to persuade others of their causal mental model, aka. plan.
You can share, version control and index a narrative.

#### Why test a causal mental model as a code?

Code is testable, complex narratives are not. Though fast, narratives
are problematic as their complexity increases. The problem is LLMs and
humans are prone to hallucination when predicting the outcomes of a
narrative. The cost of building a consensus around the validity of a
narrative outcome grows as its narrative complexity increases. Code does
not require tribal knowledge or social power to validate.

Code is composable, complex narratives are not. The answer of one CPAL
chain can be the hypothetical conditions of another CPAL Chain. For
stochastic simulations, a composable plan can be integrated with the
[DoWhy library](https://github.com/py-why/dowhy). Lastly, for the
futuristic folk, a composable plan as code allows ordinary community
folk to design a plan that can be integrated with a blockchain for
funding.

An explanation of a dependency planning application is
[here.](https://github.com/borisdev/cpal-llm-chain-demo)

--- 
Twitter handle: @boris_dev

---------

Co-authored-by: Boris Dev <borisdev@Boriss-MacBook-Air.local>
2023-07-11 10:11:21 -04:00
Alejandra De Luna
2e4047e5e7
feat: support generate as an early stopping method for OpenAIFunctionsAgent (#7229)
This PR proposes an implementation to support `generate` as an
`early_stopping_method` for the new `OpenAIFunctionsAgent` class.

The motivation behind is to facilitate the user to set a maximum number
of actions the agent can take with `max_iterations` and force a final
response with this new agent (as with the `Agent` class).

The following changes were made:

- The `OpenAIFunctionsAgent.return_stopped_response` method was
overwritten to support `generate` as an `early_stopping_method`
- A boolean `with_functions` parameter was added to the
`OpenAIFunctionsAgent.plan` method

This way the `OpenAIFunctionsAgent.return_stopped_response` method can
call the `OpenAIFunctionsAgent.plan` method with `with_function=False`
when the `early_stopping_method` is set to `generate`, making a call to
the LLM with no functions and forcing a final response from the
`"assistant"`.

  - Relevant maintainer: @hinthornw
  - Twitter handle: @aledelunap

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 09:25:02 -04:00
Lance Martin
4a94f56258
Minor edits to QA docs (#7507)
Small clean-ups
2023-07-10 22:15:05 -07:00
Lance Martin
bd0c6381f5
Minor update to clarify map-reduce custom prompt usage (#7453)
Update docs for map-reduce custom prompt usage
2023-07-10 16:43:44 -07:00
Lance Martin
28d2b213a4
Update landing page for "question answering over documents" (#7152)
Improve documentation for a central use-case, qa / chat over documents.

This will be merged as an update to `index.mdx`
[here](https://python.langchain.com/docs/use_cases/question_answering/).

Testing w/ local Docusaurus server:

```
From `docs` directory:
mkdir _dist
cp -r {docs_skeleton,snippets} _dist
cp -r extras/* _dist/docs_skeleton/docs
cd _dist/docs_skeleton
yarn install
yarn start
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 14:15:13 -07:00
Adilkhan Sarsen
5debd5043e
Added deeplake use case examples of the new features (#6528)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
 
 1. Added use cases of the new features
 2. Done some code refactoring

---------

Co-authored-by: Ivo Stranic <istranic@gmail.com>
2023-07-10 07:04:29 -07:00
Kazuki Maeda
92b4418c8c
Datadog logs loader (#7356)
### Description
Created a Loader to get a list of specific logs from Datadog Logs.

### Dependencies
`datadog_api_client` is required.

### Twitter handle
[kzk_maeda](https://twitter.com/kzk_maeda)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 04:27:55 -04:00
Yifei Song
7d29bb2c02
Add Xorbits Dataframe as a Document Loader (#7319)
- [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source
computing framework that makes it easy to scale data science and machine
learning workloads in parallel. Xorbits can leverage multi cores or GPUs
to accelerate computation on a single machine, or scale out up to
thousands of machines to support processing terabytes of data.

- This PR added support for the Xorbits document loader, which allows
langchain to leverage Xorbits to parallelize and distribute the loading
of data.
- Dependencies: This change requires the Xorbits library to be installed
in order to be used.
`pip install xorbits`
- Request for review: @rlancemartin, @eyurtsev
- Twitter handle: https://twitter.com/Xorbitsio

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 04:24:47 -04:00
Paul-Emile Brotons
d2cf0d16b3
adding max_marginal_relevance_search method to MongoDBAtlasVectorSearch (#7310)
Adding a maximal_marginal_relevance method to the
MongoDBAtlasVectorSearch vectorstore enhances the user experience by
providing more diverse search results

Issue: #7304
2023-07-10 04:04:19 -04:00
Matt Robinson
bcab894f4e
feat: Add UnstructuredTSVLoader (#7367)
### Summary

Adds an `UnstructuredTSVLoader` for TSV files. Also updates the doc
strings for `UnstructuredCSV` and `UnstructuredExcel` loaders.

### Testing

```python
from langchain.document_loaders.tsv import UnstructuredTSVLoader

loader = UnstructuredTSVLoader(
    file_path="example_data/mlb_teams_2012.csv", mode="elements"
)
docs = loader.load()
```
2023-07-10 03:07:10 -04:00
nikkie
dfc3f83b0f
docs(vectorstores/integrations/chroma): Fix loading and saving (#7437)
- Description: Fix loading and saving code about Chroma
- Issue: the issue #7436 
- Dependencies: -
- Twitter handle: https://twitter.com/ftnext
2023-07-10 02:05:15 -04:00
Daniel Chalef
c7f7788d0b
Add ZepMemory; improve ZepChatMessageHistory handling of metadata; Fix bugs (#7444)
Hey @hwchase17 - 

This PR adds a `ZepMemory` class, improves handling of Zep's message
metadata, and makes it easier for folks building custom chains to
persist metadata alongside their chat history.

We've had plenty confused users unfamiliar with ChatMessageHistory
classes and how to wrap the `ZepChatMessageHistory` in a
`ConversationBufferMemory`. So we've created the `ZepMemory` class as a
light wrapper for `ZepChatMessageHistory`.

Details:
- add ZepMemory, modify notebook to demo use of ZepMemory
- Modify summary to be SystemMessage
- add metadata argument to add_message; add Zep metadata to
Message.additional_kwargs
- support passing in metadata
2023-07-10 01:53:49 -04:00
Saurabh Chaturvedi
8f8e8d701e
Fix info about YouTube (#7447)
(Unintentionally mean 😅) nit: YouTube wasn't created by Google, this PR
fixes the mention in docs.
2023-07-10 01:52:55 -04:00
Jeroen Van Goey
f5bd88757e
Fix typo (#7416)
`quesitons` -> `questions`.
2023-07-09 00:54:48 -04:00
Nolan
5da9f9abcb
docs(agents/toolkits): Fix error in document_comparison_toolkit.ipynb (#7417)
Replace this comment with:
- Description: Removes unneeded output warning in documentation at
https://python.langchain.com/docs/modules/agents/toolkits/document_comparison_toolkit
  - Issue: -
  - Dependencies: -
  - Tag maintainer: @baskaryan
  - Twitter handle: @finnless
2023-07-08 19:51:08 -04:00
nikkie
2eb4a2ceea
docs(retrievers/get-started): Fix broken state_of_the_union.txt link (#7399)
Thank you for this awesome library.

- Description: Fix broken link in documentation 
- Issue:
-
https://python.langchain.com/docs/modules/data_connection/retrievers/#get-started
- the URL:
https://github.com/hwchase17/langchain/blob/master/docs/modules/state_of_the_union.txt
- I think the right one is
https://github.com/hwchase17/langchain/blob/master/docs/extras/modules/state_of_the_union.txt
- Dependencies: -
- Tag maintainer: @baskaryan
- Twitter handle: -
2023-07-08 11:11:05 -04:00
Delgermurun
a1603fccfb
integrate JinaChat (#6927)
Integration with https://chat.jina.ai/api. It is OpenAI compatible API.

- Twitter handle:
[https://twitter.com/JinaAI_](https://twitter.com/JinaAI_)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-08 02:17:04 -04:00
Roger Yu
633b673b85
Update pinecone.ipynb (#7382)
Fix typo
2023-07-08 01:48:03 -04:00
William FH
4789c99bc2
Add String Distance and Embedding Evaluators (#7123)
Add a string evaluator and pairwise string evaluator implementation for:
- Embedding distance
- String distance

Update docs
2023-07-07 21:44:31 -07:00
Bagatur
1ac347b4e3
update databerry-chaindesk redirect (#7378) 2023-07-07 19:11:46 -04:00
Joshua Carroll
705d2f5b92
Update the API Reference link in Streamlit integration docs (#7377)
This page:


https://python.langchain.com/docs/modules/callbacks/integrations/streamlit

Has a bad API Reference link currently. This PR fixes it to the correct
link.

Also updates the embedded app link to
https://langchain-mrkl.streamlit.app/ (better name) which is hosted in
langchain-ai/streamlit-agent repo
2023-07-07 17:35:57 -04:00
Georges Petrov
ec033ae277
Rename Databerry to Chaindesk (#7022)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-07 17:28:04 -04:00
Harrison Chase
7cdf97ba9b
Harrison/add to imports (#7370)
pgvector cleanup
2023-07-07 16:27:44 -04:00
Bagatur
4d427b2397
Base language model docstrings (#7104) 2023-07-07 16:09:10 -04:00
Alex Gamble
df746ad821
Add a callback handler for Context (https://getcontext.ai) (#7151)
### Description

Adding a callback handler for Context. Context is a product analytics
platform for AI chat experiences to help you understand how users are
interacting with your product.

I've added the callback library + an example notebook showing its use.

### Dependencies

Requires the user to install the `context-python` library. The library
is lazily-loaded when the callback is instantiated.

### Announcing the feature

We spoke with Harrison a few weeks ago about also doing a blog post
announcing our integration, so will coordinate this with him. Our
Twitter handle for the company is @getcontextai, and the founders are
@_agamble and @HenrySG.

Thanks in advance!
2023-07-07 15:33:29 -04:00
German Martin
3ce4e46c8c
The Fellowship of the Vectors: New Embeddings Filter using clustering. (#7015)
Continuing with Tolkien inspired series of langchain tools. I bring to
you:
**The Fellowship of the Vectors**, AKA EmbeddingsClusteringFilter.
This document filter uses embeddings to group vectors together into
clusters, then allows you to pick an arbitrary number of documents
vector based on proximity to the cluster centers. That's a
representative sample of the cluster.

The original idea is from [Greg Kamradt](https://github.com/gkamradt)
from this video (Level4):
https://www.youtube.com/watch?v=qaPMdcCqtWk&t=365s

I added few tricks to make it a bit more versatile, so you can
parametrize what to do with duplicate documents in case of cluster
overlap: replace the duplicates with the next closest document or remove
it. This allow you to use it as an special kind of redundant filter too.
Additionally you can choose 2 diff orders: grouped by cluster or
respecting the original retriever scores.
In my use case I was using the docs grouped by cluster to run refine
chains per cluster to generate summarization over a large corpus of
documents.
Let me know if you want to change anything!

@rlancemartin, @eyurtsev, @hwchase17,

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-07 10:28:17 -07:00
Leonid Ganeline
b489466488
docs: dependents update 4 (#7360)
Updated links and counters of the `dependents` page.
2023-07-07 13:22:30 -04:00
Bagatur
d1c7237034
openai fn update nb (#7352) 2023-07-07 11:52:21 -04:00
Bagatur
1c8cff32f1
Generic OpenAI fn chain (#7270)
Add loading functions for openai function chains and add docs page
2023-07-07 05:44:53 -04:00
Bagatur
fd7145970f
Output parser redirect (#7330)
Related to ##7311
2023-07-07 04:26:34 -04:00
OwenElliott
3074306ae1
Marqo Vector Store Examples & Type Hints (#7326)
This PR improves the example notebook for the Marqo vectorstore
implementation by adding a new RetrievalQAWithSourcesChain example. The
`embedding` parameter in `from_documents` has its type updated to
`Union[Embeddings, None]` and a default parameter of None because this
is ignored in Marqo.

This PR also upgrades the Marqo version to 0.11.0 to remove the device
parameter after a breaking change to the API.

Related to #7068 @tomhamer @hwchase17

---------

Co-authored-by: Tom Hamer <tom@marqo.ai>
2023-07-07 04:11:20 -04:00
Bagatur
a9c5b4bcea
Bagatur/clarifai update (#7324)
This PR improves upon the Clarifai LangChain integration with improved docs, errors, args and the addition of embedding model support in LancChain for Clarifai's embedding models and an overview of the various ways you can integrate with Clarifai added to the docs.

---------

Co-authored-by: Matthew Zeiler <zeiler@clarifai.com>
2023-07-07 02:23:20 -04:00
John Landahl
e047541b5f
Corrected a typo in elasticsearch.ipynb (#7318)
Simple typo fix
2023-07-07 01:35:32 -04:00
Subsegment
152dc59060
docs : add cnosdb to Ecosystem Integrations (#7316)
- Implement a `from_cnosdb` method for the `SQLDatabase` class
  - Write CnosDB documentation and add it to Ecosystem Integrations
2023-07-07 01:35:22 -04:00
Bagatur
a6b39afe0e
rm side nav (#7297) 2023-07-06 15:19:29 -04:00
Leonid Ganeline
6ff9e9b34a
updated huggingface_hub examples (#7292)
Added examples for models:
- Google `Flan`
- TII `Falcon`
- Salesforce `XGen`
2023-07-06 15:04:37 -04:00
Dídac Sabatés
e0cb3ea90c
Fix sql_database.ipynb link (#6525)
Looks like the
[SQLDatabaseChain](https://langchain.readthedocs.io/en/latest/modules/chains/examples/sqlite.html)
in the SQL Database Agent page was broken I've change it to the SQL
Chain page
2023-07-06 13:07:37 -04:00
Leonid Ganeline
4450791edd
docs: tutorials update (#7230)
updated `tutorials.mdx`:
- added a link to new `Deeplearning AI` course on LangChain
- added links to other tutorial videos
- fixed format

@baskaryan, @hwchase17
2023-07-06 12:44:23 -04:00
hayao-k
c23e16c459
docs: Fixed typos in Amazon Kendra Retriever documentation (#7261)
## Description
Fixed to the official service name Amazon Kendra.

## Tag maintainer
@baskaryan
2023-07-06 11:56:52 -04:00
zhaoshengbo
e8f24164f0
Improve the alibaba cloud opensearch vector store documentation (#6964)
Based on user feedback, we have improved the Alibaba Cloud OpenSearch
vector store documentation.

Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>
2023-07-06 09:47:49 -04:00
emarco177
b9d6d4cd4c
added template repo for CI/CD deployment on Google Cloud Run (#7218)
Replace this comment with:
- Description: added documentation for a template repo that helps
dockerizing and deploying a LangChain using a Cloud Build CI/CD pipeline
to Google Cloud build serverless
  - Issue: None,
  - Dependencies: None,
  - Tag maintainer: @baskaryan,
  - Twitter handle: EdenEmarco177

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.
2023-07-06 09:38:38 -04:00
Stefano Lottini
e61cfb6e99
FLARE Example notebook: switch to named arg to pass pydantic validation (#7267)
Adding the name of the parameter to comply with latest requirements by
Pydantic usage for BaseModels.
2023-07-06 09:32:00 -04:00
os1ma
b151d4257a
docs: Update documentation for Wikipedia tool to use WikipediaQueryRun (#7258)
**Description**
In the following page, "Wikipedia" tool is explained.

https://python.langchain.com/docs/modules/agents/tools/integrations/wikipedia

However, the WikipediaAPIWrapper being used is not a tool. This PR
updated the documentation to use a tool WikipediaQueryRun.

**Issue**
None

**Tag maintainer**
Agents / Tools / Toolkits: @hinthornw
2023-07-06 09:29:38 -04:00
Jeroen Van Goey
887bb12287
Use correct Language for html_splitter (#7274)
`html_splitter` was using `Language.MARKDOWN`.
2023-07-06 09:24:25 -04:00
Shantanu Nair
f773c21723
Update supabase match_docs ddl and notebook to use expected id type (#7257)
- Description: Switch supabase match function DDL to use expected uuid
type instead of bigint
- Issue: https://github.com/hwchase17/langchain/issues/6743,
https://github.com/hwchase17/langchain/issues/7179
  - Tag maintainer:  @rlancemartin, @eyurtsev
  - Twitter handle: https://twitter.com/ShantanuNair
2023-07-06 09:22:41 -04:00
Myeongseop Kim
0e878ccc2d
Add HumanInputChatModel (#7256)
- Description: This is a chat model equivalent of HumanInputLLM. An
example notebook is also added.
  - Tag maintainer: @hwchase17, @baskaryan
  - Twitter handle: N/A
2023-07-06 09:21:03 -04:00
Harrison Chase
52b016920c
Harrison/update anthropic (#7237)
Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
2023-07-05 21:02:35 -04:00
Hashem Alsaket
6aa66fd2b0
Update Hugging Face Hub notebook (#7236)
Description: `flan-t5-xl` hangs, updated to `flan-t5-xxl`. Tested all
stabilityai LLMs- all hang so removed from tutorial. Temperature > 0 to
prevent unintended determinism.
Issue: #3275 
Tag maintainer: @baskaryan
2023-07-05 20:45:02 -04:00
Harrison Chase
d6541da161
remove arize nb (#7238)
was causing some issues with docs build
2023-07-05 18:34:20 -04:00
Mike Nitsenko
d669b9ece9
Document loader for Cube Semantic Layer (#6882)
### Description

This pull request introduces the "Cube Semantic Layer" document loader,
which demonstrates the retrieval of Cube's data model metadata in a
format suitable for passing to LLMs as embeddings. This enhancement aims
to provide contextual information and improve the understanding of data.

Twitter handle:
@the_cube_dev

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-05 15:18:12 -07:00
Tom
e533da8bf2
Adding Marqo to vectorstore ecosystem (#7068)
This PR brings in a vectorstore interface for
[Marqo](https://www.marqo.ai/).

The Marqo vectorstore exposes some of Marqo's functionality in addition
the the VectorStore base class. The Marqo vectorstore also makes the
embedding parameter optional because inference for embeddings is an
inherent part of Marqo.

Docs, notebook examples and integration tests included.

Related PR:
https://github.com/hwchase17/langchain/pull/2807

---------

Co-authored-by: Tom Hamer <tom@marqo.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-05 14:44:12 -07:00
Harrison Chase
47e7d09dff
fix arize nb (#7227) 2023-07-05 16:55:48 -04:00
Harrison Chase
6711854e30
Harrison/dataforseo (#7214)
Co-authored-by: Alexander <sune357@gmail.com>
2023-07-05 16:02:02 -04:00
Hakan Tekgul
61938a02a1
Create arize_llm_observability.ipynb (#7000)
Adding documentation and notebook for Arize callback handler. 

  - @dev2049
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
2023-07-05 15:55:47 -04:00
Leonid Ganeline
ecee4d6e92
docs: update youtube videos and tutorials (#6515)
added tutorials.mdx; updated youtube.mdx

Rationale: the Tutorials section in the documentation is top-priority.
(for example, https://pytorch.org/docs/stable/index.html) Not every
project has resources to make tutorials. We have such a privilege.
Community experts created several tutorials on YouTube. But the tutorial
links are now hidden on the YouTube page and not easily discovered by
first-time visitors.

- Added new videos and tutorials that were created since the last
update.
- Made some reprioritization between videos on the base of the view
numbers.

#### Who can review?

  - @hwchase17
    - @dev2049
2023-07-05 12:50:31 -07:00
Josh Reini
30d8d1d3d0
add trulens integration (#7096)
Description: Add TruLens integration.

Twitter: @trulensml

For review:
  - Tracing: @agola11
  - Tools: @hinthornw
2023-07-05 14:04:55 -04:00
Conrad Fernandez
6eff0fa2ca
Added documentation for add_texts function for Pinecone integration (#7134)
- Description: added some documentation to the Pinecone vector store
docs page.
- Issue: #7126 
- Dependencies: None
- Tag maintainer: @baskaryan 

I can add more documentation on the Pinecone integration functions as I
am going to go in great depth into this area. Just wanted to check with
the maintainers is if this is all good.
2023-07-05 13:11:37 -04:00
felixocker
db98c44f8f
Support for SPARQL (#7165)
# [SPARQL](https://www.w3.org/TR/rdf-sparql-query/) for
[LangChain](https://github.com/hwchase17/langchain)

## Description
LangChain support for knowledge graphs relying on W3C standards using
RDFlib: SPARQL/ RDF(S)/ OWL with special focus on RDF \
* Works with local files, files from the web, and SPARQL endpoints
* Supports both SELECT and UPDATE queries
* Includes both a Jupyter notebook with an example and integration tests

## Contribution compared to related PRs and discussions
* [Wikibase agent](https://github.com/hwchase17/langchain/pull/2690) -
uses SPARQL, but specifically for wikibase querying
* [Cypher qa](https://github.com/hwchase17/langchain/pull/5078) - graph
DB question answering for Neo4J via Cypher
* [PR 6050](https://github.com/hwchase17/langchain/pull/6050) - tries
something similar, but does not cover UPDATE queries and supports only
RDF
* Discussions on [w3c mailing list](mailto:semantic-web@w3.org) related
to the combination of LLMs (specifically ChatGPT) and knowledge graphs

## Dependencies
* [RDFlib](https://github.com/RDFLib/rdflib)

## Tag maintainer
Graph database related to memory -> @hwchase17
2023-07-05 13:00:16 -04:00
Prakul Agarwal
38f853dfa3
Fixed typos in MongoDB Atlas Vector Search documentation (#7174)
Fix for typos in MongoDB Atlas Vector Search documentation
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-07-05 12:48:00 -04:00
Raouf Chebri
6fc24743b7
Add pg_hnsw vectorstore integration (#6893)
Hi @rlancemartin, @eyurtsev!

- Description: Adding HNSW extension support for Postgres. Similar to
pgvector vectorstore, with 3 differences
      1. it uses HNSW extension for exact and ANN searches, 
      2. Vectors are of type array of real
      3. Only supports L2
      
- Dependencies: [HNSW](https://github.com/knizhnik/hnsw) extension for
Postgres
  
  - Example:
  ```python
    db = HNSWVectoreStore.from_documents(
      embedding=embeddings,
      documents=docs,
      collection_name=collection_name,
      connection_string=connection_string
  )
  
  query = "What did the president say about Ketanji Brown Jackson"
docs_with_score: List[Tuple[Document, float]] =
db.similarity_search_with_score(query)
  ```

The example notebook is in the PR too.
2023-07-05 08:10:10 -07:00
Max Cembalest
2984803597
cleaned Arthur tracking demo notebook (#7147)
Cleaned title and reduced clutter for integration demo notebook for the
Arthur callback handler
2023-07-04 18:15:25 -04:00
Deepankar Mahapatro
da69a6771f
docs: update Jina ecosystem (#7149)
Documentation update for [Jina
ecosystem](https://python.langchain.com/docs/ecosystem/integrations/jina)
and `langchain-serve` in the deployments section to latest features.

@hwchase17 

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-07-04 18:07:50 -04:00
Simon Cheung
81eebc4070
Add HugeGraphQAChain to support gremlin generating chain (#7132)
[Apache HugeGraph](https://github.com/apache/incubator-hugegraph) is a
convenient, efficient, and adaptable graph database, compatible with the
Apache TinkerPop3 framework and the Gremlin query language.

In this PR, the HugeGraph and HugeGraphQAChain provide the same
functionality as the existing integration with Neo4j and enables query
generation and question answering over HugeGraph database. The
difference is that the graph query language supported by HugeGraph is
not cypher but another very popular graph query language
[Gremlin](https://tinkerpop.apache.org/gremlin.html).

A notebook example and a simple test case have also been added.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-04 10:21:21 -06:00
Saverio Proto
5585607654
Improve Bing Search example (#7128)
# Description

Improve Bing Search example:
2023-07-04 09:58:03 -06:00
Lance Martin
265c285057
Fix GPT4All bug w/ "n_ctx" param (#7093)
Running `GPT4All` per the
[docs](https://python.langchain.com/docs/modules/model_io/models/llms/integrations/gpt4all),
I see:

```
$ from langchain.llms import GPT4All
$ model = GPT4All(model=local_path)
$ model("The capital of France is ", max_tokens=10)
TypeError: generate() got an unexpected keyword argument 'n_ctx'
```

It appears `n_ctx` is [no longer a supported
param](https://docs.gpt4all.io/gpt4all_python.html#gpt4all.gpt4all.GPT4All.generate)
in the GPT4All API from https://github.com/nomic-ai/gpt4all/pull/1090.

It now uses `max_tokens`, so I set this.

And I also set other defaults used in GPT4All client
[here](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-bindings/python/gpt4all/gpt4all.py).

Confirm it now works:
```
$ from langchain.llms import GPT4All
$ model = GPT4All(model=local_path)
$ model("The capital of France is ", max_tokens=10)
< Model logging > 
"....Paris."
```

---------

Co-authored-by: R. Lance Martin <rlm@Rs-MacBook-Pro.local>
2023-07-04 08:53:52 -07:00
Stefano Lottini
6631fd5168
Align cassio versions between examples for Cassandra integration (#7099)
Just reducing confusion by requiring cassio>=0.0.7 consistently across
examples.
2023-07-04 04:21:48 -06:00
Ruixi Fan
0b69a7e9ab
[Document fix] Fix an expired link qa_benchmarking_pg.ipynb (#7110)
## Change description

- Description: Fix an expired link that points to the readthedocs site.
  - Dependencies: No
2023-07-03 19:03:16 -06:00
Lance Martin
9ca4c54428
Minor updates to notebook for MultiQueryRetriever (#7102)
* Add an easier-to-run example.
* Add logging per https://github.com/hwchase17/langchain/pull/6891.
* Updated params per https://github.com/hwchase17/langchain/pull/5962.

---------

Co-authored-by: R. Lance Martin <rlm@Rs-MacBook-Pro.local>
Co-authored-by: Lance Martin <lance@langchain.dev>
2023-07-03 17:32:50 -07:00
Nicolas
490fcf9d98
docs: New experimental UI for Mendable Search (#6558)
This PR introduces a new Mendable UI tailored to a better search
experience.

We're more closely integrating our traditional search with our AI
generation.
With this change, you won't have to tab back and forth between the
mendable bot and the keyword search. Both types of search are handled in
the same bar. This should make the docs easier to navigate. while still
letting users get code generations or AI-summarized answers if they so
wish. Also, it should reduce the cost.

Would love to hear your feedback :)

Cc: @dev2049 @hwchase17
2023-07-03 20:52:13 +01:00
genewoo
e49abd1277
Add Metal support to llama.cpp doc (#7092)
- Description: Add Metal support to llama.cpp doc
  - Issue: #7091 
  - Dependencies: N/A
  - Twitter handle: gene_wu
2023-07-03 13:35:39 -06:00
rjarun8
e2d61ab85a
Add SpacyEmbeddings class (#6967)
- Description: Added a new SpacyEmbeddings class for generating
embeddings using the Spacy library.
- Issue: Sentencebert/Bert/Spacy/Doc2vec embedding support #6952
- Dependencies: This change requires the Spacy library and the
'en_core_web_sm' Spacy model.
- Tag maintainer: @dev2049
- Twitter handle: N/A

This change includes a new SpacyEmbeddings class, but does not include a
test or an example notebook.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-03 09:38:31 -06:00
Leonid Ganeline
16fbd528c5
docs: commented out editUrl option (#6440) 2023-07-03 07:59:11 -07:00
adam91holt
80e86b602e
Remove duplicate mongodb integration doc (#7006) 2023-07-03 02:23:33 -06:00
joaomsimoes
c669d98693
Update get_started.mdx (#7005)
typo in chat = ChatOpenAI(open_api_key="...") should be openai_api_key
2023-07-03 02:23:12 -06:00
Johnny Lim
a081e419a0
Fix sample in FAISS section (#7050)
This PR fixes a sample in the FAISS section in the reference docs.
2023-07-03 02:18:32 -06:00
Leonid Ganeline
200be43da6
added Brave Search document_loader (#6989)
- Added `Brave Search` document loader.
- Refactored BraveSearch wrapper
- Added a Jupyter Notebook example
- Added `Ecosystem/Integrations` BraveSearch page 

Please review:
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
2023-07-02 19:01:24 -07:00
Sergey Kozlov
6d15854cda
Add JSON Lines support to JSONLoader (#6913)
**Description**:

The JSON Lines format is used by some services such as OpenAI and
HuggingFace. It's also a convenient alternative to CSV.

This PR adds JSON Lines support to `JSONLoader` and also updates related
tests.

**Tag maintainer**: @rlancemartin, @eyurtsev.

PS I was not able to build docs locally so didn't update related
section.
2023-07-02 12:32:41 -07:00
Ofer Mendelevitch
153b56d19b
Vectara upd2 (#6506)
Update to Vectara integration 
- By user request added "add_files" to take advantage of Vectara
capabilities to process files on the backend, without the need for
separate loading of documents and chunking in the chain.
- Updated vectara.ipynb example notebook to be broader and added testing
of add_file()
 
  @hwchase17 - project lead

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-02 12:15:50 -07:00
Leonid Ganeline
77ae8084a0
docstrings document_loaders 1 (#6847)
- Updated docstrings in `document_loaders`
- several code fixes.
- added `docs/extras/ecosystem/integrations/airtable.md`

@rlancemartin, @eyurtsev
2023-07-02 12:13:04 -07:00
Bagatur
7acd524210
Rm retriever kwargs (#7013)
Doesn't actually limit the Retriever interface but hopefully in practice
it does
2023-07-02 08:22:24 -06:00
Johnny Lim
9dc77614e3
Polish reference docs (#7045)
This PR fixes broken links in the reference docs.
2023-07-02 08:08:51 -06:00
Johnny Lim
052c797429
Fix typo (#7023)
This PR fixes a typo.
2023-07-02 01:17:30 -06:00
Stefano Lottini
8d2281a8ca
Second Attempt - Add concurrent insertion of vector rows in the Cassandra Vector Store (#7017)
Retrying with the same improvements as in #6772, this time trying not to
mess up with branches.

@rlancemartin doing a fresh new PR from a branch with a new name. This
should do. Thank you for your help!

---------

Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-01 11:09:52 -07:00
Matt Robinson
0498dad562
feat: enable UnstructuredEmailLoader to process attachments (#6977)
### Summary

Updates `UnstructuredEmailLoader` so that it can process attachments in
addition to the e-mail content. The loader will process attachments if
the `process_attachments` kwarg is passed when the loader is
instantiated.

### Testing

```python

file_path = "fake-email-attachment.eml"
loader = UnstructuredEmailLoader(
    file_path, mode="elements", process_attachments=True
)
docs = loader.load()
docs[-1]
```

### Reviewers

-  @rlancemartin 
-  @eyurtsev
- @hwchase17
2023-07-01 06:09:26 -07:00
Matthew Foster Walsh
59697b406d
Fix typo in quickstart.mdx (#6985)
Removed an extra "to" from a sentence. @dev2049 very minor documentation
fix.
2023-07-01 02:53:52 -06:00
Paul Grillenberger
aa37b10b28
Fix: Correct typo (#6988)
Description: Correct a minor typo in the docs. @dev2049
2023-07-01 02:53:34 -06:00
Zander Chase
b0859c9b18
Add New Retriever Interface with Callbacks (#5962)
Handle the new retriever events in a way that (I think) is entirely
backwards compatible? Needs more testing for some of the chain changes
and all.

This creates an entire new run type, however. We could also just treat
this as an event within a chain run presumably (same with memory)

Adds a subclass initializer that upgrades old retriever implementations
to the new schema, along with tests to ensure they work.

First commit doesn't upgrade any of our retriever implementations (to
show that we can pass the tests along with additional ones testing the
upgrade logic).

Second commit upgrades the known universe of retrievers in langchain.

- [X] Add callback handling methods for retriever start/end/error (open
to renaming to 'retrieval' if you want that)
- [X] Update BaseRetriever schema to support callbacks
- [X] Tests for upgrading old "v1" retrievers for backwards
compatibility
- [X] Update existing retriever implementations to implement the new
interface
- [X] Update calls within chains to .{a]get_relevant_documents to pass
the child callback manager
- [X] Update the notebooks/docs to reflect the new interface
- [X] Test notebooks thoroughly


Not handled:
- Memory pass throughs: retrieval memory doesn't have a parent callback
manager passed through the method

---------

Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
2023-06-30 14:44:03 -07:00
William FH
a5b206caf3
Remove Promptlayer Notebook (#6996)
It's breaking our docs build
2023-06-30 14:30:24 -07:00
Daniel Chalef
b26cca8008
Zep Authentication (#6728)
## Description: Add Zep API Key argument to ZepChatMessageHistory and
ZepRetriever
- correct docs site links
- add zep api_key auth to constructors

ZepChatMessageHistory: @hwchase17, 
ZepRetriever: @rlancemartin, @eyurtsev
2023-06-30 14:24:26 -07:00
William FH
e4625846e5
Add Flyte Callback Handler (#6139) (#6986)
Signed-off-by: Samhita Alla <aallasamhita@gmail.com>
Co-authored-by: Samhita Alla <aallasamhita@gmail.com>
2023-06-30 12:25:22 -07:00
Davis Chase
eb180e321f
Page per class-style api reference (#6560)
can make it prettier, but what do we think of overall structure?

https://api.python.langchain.com/en/dev2049-page_per_class/api_ref.html

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-06-30 09:23:32 -07:00
William FH
64039b9f11
Promptlayer Callback (#6975)
Co-authored-by: Saleh Hindi <saleh.hindi.one@gmail.com>
Co-authored-by: jped <jonathanped@gmail.com>
2023-06-30 08:32:42 -07:00
William FH
13c62cf6b1
Arthur Callback (#6972)
Co-authored-by: Max Cembalest <115359769+arthuractivemodeling@users.noreply.github.com>
2023-06-30 07:48:02 -07:00
William FH
8c73037dff
Simplify eval arg names (#6944)
It'll be easier to switch between these if the names of predictions are
consistent
2023-06-30 07:47:53 -07:00
Davis Chase
bd6a0ee9e9
Redirect vecstores (#6948) 2023-06-29 19:22:21 -07:00
Davis Chase
f780678910
Add back in clickhouse mongo vecstore notebooks (#6949) 2023-06-29 19:21:47 -07:00
Jacob Lee
73831ef3d8
Change code block color scheme (#6945)
Adds contrast, makes code blocks more readable.
2023-06-29 19:21:11 -07:00
Hashem Alsaket
5861770a53
Updated QA notebook (#6801)
Description: `all_metadatas` was not defined, `OpenAIEmbeddings` was not
imported,
Issue: #6723 the issue # it fixes (if applicable),
Dependencies: lark,
Tag maintainer: @vowelparrot , @dev2049

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-29 15:41:53 -07:00
Kacper Łukawski
140ba682f1
Support named vectors in Qdrant (#6871)
# Description

This PR makes it possible to use named vectors from Qdrant in Langchain.
That was requested multiple times, as people want to reuse externally
created collections in Langchain. It doesn't change anything for the
existing applications. The changes were covered with some integration
tests and included in the docs.

## Example

```python
Qdrant.from_documents(
    docs,
    embeddings,
    location=":memory:",
    collection_name="my_documents",
    vector_name="custom_vector",
)
```

### Issue: #2594 

Tagging @rlancemartin & @eyurtsev. I'd appreciate your review.
2023-06-29 15:14:22 -07:00
corranmac
20c6ade2fc
Grobid parser for Scientific Articles from PDF (#6729)
### Scientific Article PDF Parsing via Grobid

`Description:`
This change adds the GrobidParser class, which uses the Grobid library
to parse scientific articles into a universal XML format containing the
article title, references, sections, section text etc. The GrobidParser
uses a local Grobid server to return PDFs document as XML and parses the
XML to optionally produce documents of individual sentences or of whole
paragraphs. Metadata includes the text, paragraph number, pdf relative
bboxes, pages (text may overlap over two pages), section title
(Introduction, Methodology etc), section_number (i.e 1.1, 2.3), the
title of the paper and finally the file path.
      
Grobid parsing is useful beyond standard pdf parsing as it accurately
outputs sections and paragraphs within them. This allows for
post-fitering of results for specific sections i.e. limiting results to
the methodology section or results. While sections are split via
headings, ideally they could be classified specifically into
introduction, methodology, results, discussion, conclusion. I'm
currently experimenting with chatgpt-3.5 for this function, which could
later be implemented as a textsplitter.

`Dependencies:`
For use, the grobid repo must be cloned and Java must be installed, for
colab this is:

```
!apt-get install -y openjdk-11-jdk -q
!update-alternatives --set java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
!git clone https://github.com/kermitt2/grobid.git
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.chdir('grobid')
!./gradlew clean install
```

Once installed the server is ran on localhost:8070 via
```
get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')
```

@rlancemartin, @eyurtsev

Twitter Handle: @Corranmac

Grobid Demo Notebook is
[here](https://colab.research.google.com/drive/1X-St_mQRmmm8YWtct_tcJNtoktbdGBmd?usp=sharing).

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-29 14:29:29 -07:00
Harrison Chase
0ba175e13f
move octo notebook (#6901) 2023-06-29 12:20:55 -07:00
Stefano Lottini
75fb9d2fdc
Cassandra support for chat history using CassIO library (#6771)
### Overview

This PR aims at building on #4378, expanding the capabilities and
building on top of the `cassIO` library to interface with the database
(as opposed to using the core drivers directly).

Usage of `cassIO` (a library abstracting Cassandra access for
ML/GenAI-specific purposes) is already established since #6426 was
merged, so no new dependencies are introduced.

In the same spirit, we try to uniform the interface for using Cassandra
instances throughout LangChain: all our appreciation of the work by
@jj701 notwithstanding, who paved the way for this incremental work
(thank you!), we identified a few reasons for changing the way a
`CassandraChatMessageHistory` is instantiated. Advocating a syntax
change is something we don't take lighthearted way, so we add some
explanations about this below.

Additionally, this PR expands on integration testing, enables use of
Cassandra's native Time-to-Live (TTL) features and improves the phrasing
around the notebook example and the short "integrations" documentation
paragraph.

We would kindly request @hwchase to review (since this is an elaboration
and proposed improvement of #4378 who had the same reviewer).

### About the __init__ breaking changes

There are
[many](https://docs.datastax.com/en/developer/python-driver/3.28/api/cassandra/cluster/)
options when creating the `Cluster` object, and new ones might be added
at any time. Choosing some of them and exposing them as `__init__`
parameters `CassandraChatMessageHistory` will prove to be insufficient
for at least some users.

On the other hand, working through `kwargs` or adding a long, long list
of arguments to `__init__` is not a desirable option either. For this
reason, (as done in #6426), we propose that whoever instantiates the
Chat Message History class provide a Cassandra `Session` object, ready
to use. This also enables easier injection of mocks and usage of
Cassandra-compatible connections (such as those to the cloud database
DataStax Astra DB, obtained with a different set of init parameters than
`contact_points` and `port`).

We feel that a breaking change might still be acceptable since LangChain
is at `0.*`. However, while maintaining that the approach we propose
will be more flexible in the future, room could be made for a
"compatibility layer" that respects the current init method. Honestly,
we would to that only if there are strong reasons for it, as that would
entail an additional maintenance burden.

### Other changes

We propose to remove the keyspace creation from the class code for two
reasons: first, production Cassandra instances often employ RBAC so that
the database user reading/writing from tables does not necessarily (and
generally shouldn't) have permission to create keyspaces, and second
that programmatic keyspace creation is not a best practice (it should be
done more or less manually, with extra care about schema mismatched
among nodes, etc). Removing this (usually unnecessary) operation from
the `__init__` path would also improve initialization performance
(shorter time).

We suggest, likewise, to remove the `__del__` method (which would close
the database connection), for the following reason: it is the
recommended best practice to create a single Cassandra `Session` object
throughout an application (it is a resource-heavy object capable to
handle concurrency internally), so in case Cassandra is used in other
ways by the app there is the risk of truncating the connection for all
usages when the history instance is destroyed. Moreover, the `Session`
object, in typical applications, is best left to garbage-collect itself
automatically.

As mentioned above, we defer the actual database I/O to the `cassIO`
library, which is designed to encode practices optimized for LLM
applications (among other) without the need to expose LangChain
developers to the internals of CQL (Cassandra Query Language). CassIO is
already employed by the LangChain's Vector Store support for Cassandra.

We added a few more connection options in the companion notebook example
(most notably, Astra DB) to encourage usage by anyone who cannot run
their own Cassandra cluster.

We surface the `ttl_seconds` option for automatic handling of an
expiration time to chat history messages, a likely useful feature given
that very old messages generally may lose their importance.

We elaborated a bit more on the integration testing (Time-to-live,
separation of "session ids", ...).

### Remarks from linter & co.

We reinstated `cassio` as a dependency both in the "optional" group and
in the "integration testing" group of `pyproject.toml`. This might not
be the right thing do to, in which case the author of this PR offer his
apologies (lack of confidence with Poetry - happy to be pointed in the
right direction, though!).

During linter tests, we were hit by some errors which appear unrelated
to the code in the PR. We left them here and report on them here for
awareness:

```
langchain/vectorstores/mongodb_atlas.py:137: error: Argument 1 to "insert_many" of "Collection" has incompatible type "List[Dict[str, Sequence[object]]]"; expected "Iterable[Union[MongoDBDocumentType, RawBSONDocument]]"  [arg-type]
langchain/vectorstores/mongodb_atlas.py:186: error: Argument 1 to "aggregate" of "Collection" has incompatible type "List[object]"; expected "Sequence[Mapping[str, Any]]"  [arg-type]

langchain/vectorstores/qdrant.py:16: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:19: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:20: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:22: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:23: error: Name "grpc" is not defined  [name-defined]
```

In the same spirit, we observe that to even get `import langchain` run,
it seems that a `pip install bs4` is missing from the minimal package
installation path.

Thank you!
2023-06-29 10:50:34 -07:00
Harrison Chase
3ac08c3de4
Harrison/octo ml (#6897)
Co-authored-by: Bassem Yacoube <125713079+AI-Bassem@users.noreply.github.com>
Co-authored-by: Shotaro Kohama <khmshtr28@gmail.com>
Co-authored-by: Rian Dolphin <34861538+rian-dolphin@users.noreply.github.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Shashank Deshpande <shashankdeshpande18@gmail.com>
2023-06-28 23:04:11 -07:00
Shashank Deshpande
99cfe192da
added example notebook - use custom functions with openai agent (#6865)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-06-28 22:07:33 -07:00
Robert Lewis
c9c8d2599e
Update Zapier Jupyter notebook to include brief OAuth example (#6892)
Description: Adds a brief example of using an OAuth access token with
the Zapier wrapper. Also links to the Zapier documentation to learn more
about OAuth flows.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-28 18:06:22 -07:00
Davis Chase
f07dd02b50
Docs /redirects (#6790)
Auto-generated a bunch of redirects from initial docs refactor commit
2023-06-28 17:07:53 -07:00
Yaohui Wang
9d1bd18596
feat (documents): add LarkSuite document loader (#6420)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

### Summary

This PR adds a LarkSuite (FeiShu) document loader. 
> [LarkSuite](https://www.larksuite.com/) is an enterprise collaboration
platform developed by ByteDance.

### Tests

- an integration test case is added
- an example notebook showing usage is added. [Notebook
preview](https://github.com/yaohui-wyh/langchain/blob/master/docs/extras/modules/data_connection/document_loaders/integrations/larksuite.ipynb)

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

### Who can review?

- PTAL @eyurtsev @hwchase17

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: Yaohui Wang <wangyaohui.01@bytedance.com>
2023-06-27 23:08:05 -07:00
Jingsong Gao
a435a436c1
feat(document_loaders): add tencent cos directory and file loader (#6401)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

- add tencent cos directory and file support for document-loader

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

@eyurtsev
2023-06-27 23:07:20 -07:00
Shashank Deshpande
1db266b20d
Update link in apis.mdx (#6812)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-06-27 23:00:26 -07:00
Lance Martin
3f9900a864
Create MultiQueryRetriever (#6833)
Distance-based vector database retrieval embeds (represents) queries in
high-dimensional space and finds similar embedded documents based on
"distance". But, retrieval may produce difference results with subtle
changes in query wording or if the embeddings do not capture the
semantics of the data well. Prompt engineering / tuning is sometimes
done to manually address these problems, but can be tedious.

The `MultiQueryRetriever` automates the process of prompt tuning by
using an LLM to generate multiple queries from different perspectives
for a given user input query. For each query, it retrieves a set of
relevant documents and takes the unique union across all queries to get
a larger set of potentially relevant documents. By generating multiple
perspectives on the same question, the `MultiQueryRetriever` might be
able to overcome some of the limitations of the distance-based retrieval
and get a richer set of results.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-27 22:59:40 -07:00
Tim Asp
3ca1a387c2
Web Loader: Add proxy support (#6792)
Proxies are helpful, especially when you start querying against more
anti-bot websites.

[Proxy
services](https://developers.oxylabs.io/advanced-proxy-solutions/web-unblocker/making-requests)
(of which there are many) and `requests` make it easy to rotate IPs to
prevent banning by just passing along a simple dict to `requests`.

CC @rlancemartin, @eyurtsev
2023-06-27 22:27:49 -07:00
Matt Robinson
dd2a151543
Docs/unstructured api key (#6781)
### Summary

The Unstructured API will soon begin requiring API keys. This PR updates
the Unstructured integrations docs with instructions on how to generate
Unstructured API keys.

### Reviewers

@rlancemartin
@eyurtsev
@hwchase17
2023-06-27 16:54:15 -07:00
Matt Robinson
b24472eae3
feat: Add UnstructuredOrgModeLoader (#6842)
### Summary

Adds `UnstructuredOrgModeLoader` for processing
[Org-mode](https://en.wikipedia.org/wiki/Org-mode) documents.

### Testing

```python
from langchain.document_loaders import UnstructuredOrgModeLoader

loader = UnstructuredOrgModeLoader(
    file_path="example_data/README.org", mode="elements"
)
docs = loader.load()
print(docs[0])
```

### Reviewers

- @rlancemartin
- @eyurtsev
- @hwchase17
2023-06-27 16:34:17 -07:00