Commit Graph

3639 Commits

Author SHA1 Message Date
Wilson Leao Neto
179a39954d
Provides access to a Document page_content formatter in the AmazonKendraRetriever (#8034)
- Description: 
- Provides a new attribute in the AmazonKendraRetriever which processes
a ResultItem and returns a string that will be used as page_content;
- The excerpt metadata should not be changed, it will be kept as was
retrieved. But it is cleaned when composing the page_content;
    - Refactors the AmazonKendraRetriever to improve code reusability;
- Issue: #7787 
- Tag maintainer: @3coins @baskaryan
- Twitter handle: wilsonleao

**Why?**

Some use cases need to adjust the page_content by dynamically combining
the ResultItem attributes depending on the context of the item.
2023-08-03 20:54:49 -07:00
Ilya
6f0bccfeb5
Add regex control over separators in character text splitter (#7933)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

Please make sure you're PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
#7854

Added the ability to use the `separator` ase a regex or a simple
character.
Fixed a bug where `start_index` was incorrectly counting from -1.

Who can review?
@eyurtsev
@hwchase17 
@mmz-001
2023-08-03 20:25:23 -07:00
Vasileios Mansolas
e68a1d73d0
Fix Issue #6650: Enable Azure Active Directory token-based auth access for AzureChatOpenAI (#8622)
When using AzureChatOpenAI the openai_api_type defaults to "azure". The
utils' get_from_dict_or_env() function triggered by the root validator
does not look for user provided values from environment variables
OPENAI_API_TYPE, so other values like "azure_ad" are replaced with
"azure". This does not allow the use of token-based auth.

By removing the "default" value, this allows environment variables to be
pulled at runtime for the openai_api_type and thus enables the other
api_types which are expected to work.

This fixes #6650

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-03 20:21:41 -07:00
Ofer Mendelevitch
29f51055e8
Updates to Vectara documentation (#8699)
- Description: updates to Vectara documentation with more details on how
to get started.
- Issue: NA
- Dependencies: NA
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: @vectara, @ofermend

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-03 20:21:17 -07:00
Alec Flett
5d765408ce
propagate callbacks through load_summarize_chain (#7565)
This lets you pass callbacks when you create the summarize chain:

```
summarize = load_summarize_chain(llm, chain_type="map_reduce", callbacks=[my_callbacks])
summary = summarize(documents)
```
See #5572 for a similar surgical fix.

tagging @hwchase17 for callbacks work

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-08-03 20:12:34 -07:00
Alec Flett
404d103c41
propagate RetrievalQA chain callbacks through its own LLMChain and StuffDocumentsChain (#7853)
This is another case, similar to #5572 and #7565 where the callbacks are
getting dropped during construction of the chains.

tagging @hwchase17 and @agola11 for callbacks propagation

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-08-03 20:11:58 -07:00
Bal Narendra Sapa
47eea32f6a
add serializer methods (#7914)
Description: I have added two methods serializer and deserializer
methods. There was method called save local but it saves the to the
local disk. I wanted the vectorstore in the format using which i can
push it to the sql database's blob field. I have used this while i was
working on something

@rlancemartin, @eyurtsev

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-08-03 20:10:35 -07:00
Ryan Sloan
b786335dd1
fix RecursiveUrlLoader (#8582)
Description: the recursive url loader does not fully crawl for all urls
under base url
Maintainer: @baskaryan
2023-08-03 16:51:57 -07:00
William FH
f81e613086
Fix Async Retry Event Handling (#8659)
It fails currently because the event loop is already running.

The `retry` decorator alraedy infers an `AsyncRetrying` handler for
coroutines (see [tenacity
line](aa6f8f0a24/tenacity/__init__.py (L535)))
However before_sleep always gets called synchronously (see [tenacity
line](aa6f8f0a24/tenacity/__init__.py (L338))).


Instead, check for a running loop and use that it exists. Of course,
it's running an async method synchronously which is not _nice_. Given
how important LLMs are, it may make sense to have a task list or
something but I'd want to chat with @nfcampos on where that would live.

This PR also fixes the unit tests to check the handler is called and to
make sure the async test is run (it looks like it's just been being
skipped). It would have failed prior to the proposed fixes but passes
now.
2023-08-03 15:02:16 -07:00
ruze
8ef7e14a85
RSS Feed / OPML loader (#8694)
Replace this comment with:
- Description: added a document loader for a list of RSS feeds or OPML.
It iterates through the list and uses NewsURLLoader to load each
article.
  - Issue: N/A
  - Dependencies: feedparser, listparser
  - Tag maintainer: @rlancemartin, @eyurtsev
  - Twitter handle: @ruze

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-03 14:58:06 -07:00
sumandeng
53e4148a1b
add model_revison parameter to ModelScopeEmbeddings (#8669)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-03 14:17:48 -07:00
Yoshi
4e8f11b36a
Deterministic Fake Embedding Model (#8706)
Solves #8644 
This embedding models output identical random embedding vectors, given
the input texts are identical.
Useful when used in unittest.
@baskaryan
2023-08-03 13:36:45 -07:00
Leonid Kuligin
2928a1a3c9
added minimum expected version of SDK to the error description (#8712)
#7932

Co-authored-by: Leonid Kuligin <kuligin@google.com>
2023-08-03 13:28:42 -07:00
Harrison Chase
814faa9de5
relax deps for yaml (#8713)
context: https://github.com/yaml/pyyaml/issues/724

I think this is fine? I don't think we use yaml too heavily
2023-08-03 13:22:17 -07:00
Holt Skinner
8a8917e0d9
feat: Add Spell Correction Spec to Google Cloud Enterprise Search connector (#8705) 2023-08-03 13:38:45 -04:00
Bagatur
b2b71b0d35
Bagatur/eden llm (#8670)
Co-authored-by: RedhaWassim <rwasssim@gmail.com>
Co-authored-by: KyrianC <ckyrian@protonmail.com>
Co-authored-by: sam <melaine.samy@gmail.com>
2023-08-03 10:24:51 -07:00
William FH
8022293124
lint (#8702) 2023-08-03 09:33:28 -07:00
axa99
1f54ec899b
updated interface jupyter notebook explanations (#8689)
Updated the documentation in the interface.ipynb to clearly show the
_input_ and _output_ types for various components @baskaryan
2023-08-03 11:53:31 -04:00
William FH
a137492b53
Permit none key in chain mapper (#8696) 2023-08-03 08:50:36 -07:00
Bagatur
e283dc8d50
bump 251 (#8690) 2023-08-03 06:28:36 -07:00
Eugene Yurtsev
81e0cbf2d5
Minor typo fix (#8657)
Fix typo in doc-string.
2023-08-02 23:20:25 -07:00
Lance Martin
37aade19da
Minor formatting and additional figure for summarization use case (#8663) 2023-08-02 21:52:29 -07:00
Harrison Chase
43dffe39fb
Harrison/conversational retrieval agent (#8639)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-02 18:05:15 -07:00
ruze
71f98db2fe
Newspaper (#8647)
- Description: Added newspaper3k based news article loader. Provide a
list of urls.
  - Issue: N/A
  - Dependencies: newspaper3k,
  - Tag maintainer: @rlancemartin , @eyurtsev 
  - Twitter handle: @ruze

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-02 17:56:08 -07:00
shibuiwilliam
f68f3b23d7
add missing RemoteLangChainRetriever _get_relevant_documents test (#8628)
# What
- Add missing RemoteLangChainRetriever _get_relevant_documents test

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-02 17:20:40 -07:00
William FH
206901fa01
Use salt instead of datetime (#8653)
If you want to kick off two runs at the same time it'll cause errors.
Use a uuid instead
2023-08-02 17:15:50 -07:00
William FH
7ea2b08d1f
Use call directly for chain (#8655)
for run_on_dataset since the `run()` method requires a single output
2023-08-02 17:11:39 -07:00
William FH
368aa4ede7
fix enum error message (#8652)
could be a string so don't directly call value
2023-08-02 17:11:27 -07:00
millerick
5018af8839
docs: fix some grammar (#8654)
### Description
Fixes a grammar issue I noticed when reading through the documentation.

### Maintainers
@baskaryan

Co-authored-by: mmillerick <mmillerick@blend.com>
2023-08-02 16:48:01 -07:00
Erick Friis
96b0ff182e
Enterprise support form wording (#8641) 2023-08-02 15:18:20 -07:00
Lance Martin
59194c2214
Add summarization use-case (#8376)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-02 14:25:11 -07:00
Will Thompson
ee1d13678e
🐛 Docs Fixes [2 one-liners, examples broken] (#8519)
## Description: 
   
1)Map reduce example in docs is missing an important import statement.
Figured other people would benefit from being able to copy 🍝 the code.

2)RefineDocumentsChain example also broken.

## Issue: 

None

## Dependencies:

None. One liner.

## Tag maintainer:

@baskaryan

## Twitter handle: 

I mean, it's a one line fix lol. But @will_thompson_k is my twitter
handle.
2023-08-02 13:39:41 -07:00
Leonid Ganeline
1335f2b9f8
MLflow examples (#8642)
Updated `MLflow` examples with links to the examples from MLflow

 @baskaryan
2023-08-02 13:30:28 -07:00
Kacper Łukawski
16551536e3
Refactor Qdrant integration (#8634)
This small PR introduces new parameters into Qdrant (`on_disk`), fixes
some tests and changes the error message to be more clear.

Tagging: @baskaryan, @rlancemartin, @eyurtsev
2023-08-02 10:30:18 -07:00
Erick Friis
c5fb3b6069
Enterprise support form in airtable (#8607) 2023-08-02 09:49:59 -07:00
Eugene Yurtsev
1ec0b18379
Re-add __add__ functionality for messages (revert #8245) (#8489)
This PR reverts #8245, so `__add__` is defined on base messages.

Resolves issue: https://github.com/langchain-ai/langchain/issues/8472
2023-08-02 10:51:44 -04:00
Bagatur
f31047a394
bump 250 (#8632) 2023-08-02 07:47:36 -07:00
Comendeiro
5c516945d0
Add local support for audio models (PR #7329) (#7591)
- Description: run the poetry dependencies
  - Issue: #7329 
  - Dependencies: any dependencies required for this change,
  - Tag maintainer: @rlancemartin

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-02 01:24:53 -07:00
Naveen Tatikonda
d2adec3818
[Opensearch] : Fix the service validation in http_auth (#8609)
### Description
OpenSearch supports validation using both Master Credentials (Username
and password) and IAM. For Master Credentials users will not pass the
argument `service` in `http_auth` and the existing code will break. To
fix this, I have updated the condition to check if service attribute is
present in http_auth before accessing it.

### Maintainers
@baskaryan @navneet1v

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2023-08-02 01:16:38 -07:00
Harrison Chase
7c5c0557cb
cast to string when measuring token length (#8617) 2023-08-02 00:12:59 -07:00
rjanardhan3
68113348cc
Fireworks integration (#8322)
Description - Integrates Fireworks within Langchain LLMs to allow users
to use Fireworks models with Langchain, mainly for summarization.

Issue - Not applicable
Dependencies - None
Tag maintainer - @rlancemartin

---------

Co-authored-by: Raj Janardhan <rajjanardhan@Rajs-Laptop.attlocal.net>
2023-08-01 21:17:26 -07:00
Bagatur
b574507c51
normalized openai embeddings embed_query (#8604)
we weren't normalizing when embedding queries
2023-08-01 17:12:10 -07:00
Neil Murphy
31820a31e4
Add firestore_client param to FirestoreChatMessageHistory if caller already has one; also lets them specify GCP project, etc. (#8601)
Existing implementation requires that you install `firebase-admin`
package, and prevents you from using an existing Firestore client
instance if available.

This adds optional `firestore_client` param to
`FirestoreChatMessageHistory`, so users can just use their existing
client/settings. If not passed, existing logic executes to initialize a
`firestore_client`.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-01 15:42:13 -07:00
Naveen Tatikonda
13ccf202de
[OpenSearch] : Fix AOSS Initialization (#8600)
### Description
This PR fixes the AOSS Initialization in Opensearch.

### Maintainers
@rlancemartin, @eyurtsev, @navneet1v

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2023-08-01 15:33:51 -07:00
Joshua Carroll
6705928b9d
Add StreamlitChatMessageHistory (#8497)
Add a StreamlitChatMessageHistory class that stores chat messages in
[Streamlit's Session
State](https://docs.streamlit.io/library/api-reference/session-state).

Note: The integration test uses a currently-experimental Streamlit
testing framework to simulate the execution of a Streamlit app. Marking
this PR as draft until I confirm with the Streamlit team that we're
comfortable supporting it.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-01 14:28:15 -07:00
Matt Robinson
8961c720b8
docs: update unstructured install instructions (#8596)
### Summary

Updates the `unstructured` install instructions. For
`unstructured>=0.9.0`, dependencies are broken out by document type and
the base `unstructured` package includes fewer dependencies. `pip
install "unstructured[local-inference]"` has been replace by `pip
install "unstructured[all-docs]"`, though the `local-inference` extra is
still supported for the time being.

### Reviewers

- @rlancemartin
- @eyurtsev
- @hwchase17
2023-08-01 14:17:49 -07:00
Bagatur
73072d3db8
mv (#8595) 2023-08-01 14:17:04 -07:00
brettdbrewer
2de028834f
updated to use new llm_util query (#8591)
- Description: added memgraph_graph.py which defines the MemgraphGraph
class, subclassing off the existing Neo4jGraph class. This lets you
query the Memgraph graph database using natural language. It leverages
the Neo4j drivers and the bolt protocol.
- Dependencies: since it is a subclass off of Neo4jGraph, it is
dependent on it and the GraphCypherQA Chain implementations. It is
dependent on the Neo4j drivers being present. It is dependent on having
a running Memgraph instance to connect to.
  - Tag maintainer: @baskaryan
  - Twitter handle: @villageideate
- example usage can be seen in this repo
https://github.com/brettdbrewer/MemgraphGraph/

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-01 14:16:15 -07:00
Tesfagabir Meharizghi
a7000ee89e
Callback handler for Amazon SageMaker Experiments (#8587)
## Description

This PR implements a callback handler for SageMaker Experiments which is
similar to that of mlflow.
* When creating the callback handler, it takes the experiment's run
object as an argument. All the callback outputs are then logged to the
run object.
* The output of each callback action (e.g., `on_llm_start`) is saved to
S3 bucket as json file.
* Optionally, you can also log additional information such as the LLM
hyper-parameters to the same run object.
* Once the callback object is no more needed, you will need to call the
`flush_tracker()` method. This makes sure that any intermediate files
are deleted.
* A separate notebook example is provided to show how the callback is
used.

@3coins  @agola11

---------

Co-authored-by: Tesfagabir Meharizghi <mehariz@amazon.com>
2023-08-01 13:47:08 -07:00
Harrison Chase
9c2b29a1cb
Harrison/loader bug (#8559)
Co-authored-by: ddroghini <d.droghini@mflgroup.com>
Co-authored-by: Buckler89 <Droghini.diego@gmail.com>
2023-08-01 13:31:49 -07:00