Commit Graph

5065 Commits (8f50b616c5093b4a8a4162b29c01538cd5f980bf)
 

Author SHA1 Message Date
Erick Friis 8f50b616c5
Remove optional from vectara source (#11493)
fyi @ofermend

---------

Co-authored-by: Ofer Mendelevitch <ofer@vectara.com>
Co-authored-by: Ofer Mendelevitch <ofermend@gmail.com>
11 months ago
Maciej Dzieżyc bcd308c368
Fix Open in Colab link for ClearML docs 2 (#11491)
Description: Fixed the Open in Colab link for ClearML docs
Issue: https://github.com/allegroai/clearml/issues/1125
Twitter handle: DziezycMaciej
11 months ago
Bagatur 88ab69c288
mv docs extras (#11399) 11 months ago
Bagatur 53887242a1
bump 310 (#11486) 11 months ago
Bagatur 1bf8ef1a4f
rm brave (#11482) 11 months ago
Jesús Vélez Santiago a1c7532298
Add async sql record manager and async indexing API (#10726)
- **Description:** Add support for a SQLRecordManager in async
environments. It includes the creation of the `RecordManagerAsync` abstract
class.
- **Issue:** None
- **Dependencies:** Optional `aiosqlite`.
- **Tag maintainer:** @nfcampos 
- **Twitter handle:** @jvelezmagic

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
11 months ago
Qihui Xie 57ade13b2b
fix llm_inputs duplication problem in intermediate_steps in SQLDatabaseChain (#10279)
Use `.copy()` to fix the bug that the first `llm_inputs` element is
overwritten by the second `llm_inputs` element in `intermediate_steps`.

***Problem description:***
In [line 127](c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L127C17-L127C17)),
the `llm_inputs` of the sql generation step is appended as the first
element of `intermediate_steps`:
```
            intermediate_steps.append(llm_inputs)  # input: sql generation
```

However, `llm_inputs` is a mutable dict, and it is updated in [line
179](https://github.com/langchain-ai/langchain/blob/master/libs/experimental/langchain_experimental/sql/base.py#L179)
for the final answer step:
```
                llm_inputs["input"] = input_text
```
Then, the updated `llm_inputs` is appended as another element of
`intermediate_steps` in [line
180](c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L180)):
```
                intermediate_steps.append(llm_inputs)  # input: final answer
```

As a result, the final `intermediate_steps` returned in [line
189](c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L189C43-L189C43))
actually contains two identical `llm_inputs` elements, i.e., the `llm_inputs`
for the sql generation step is mistakenly overwritten by the one for the
final answer step. Users are not able to get the actual `llm_inputs` for the
sql generation step from `intermediate_steps`.

Simply calling `.copy()` when appending `llm_inputs` to
`intermediate_steps` solves this problem.
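
The aliasing described above can be reproduced without the chain itself. The following is a minimal sketch, assuming a plain dict stands in for `llm_inputs`; `run_steps` and its `use_copy` flag are hypothetical names used only for illustration and are not part of `SQLDatabaseChain`.

```python
def run_steps(use_copy: bool) -> list:
    """Mimic the two appends described above (hypothetical helper, not the chain's code)."""
    intermediate_steps = []
    llm_inputs = {"input": "sql generation prompt"}

    # First append: input for the sql generation step.
    intermediate_steps.append(llm_inputs.copy() if use_copy else llm_inputs)

    # The same dict object is then mutated for the final answer step.
    llm_inputs["input"] = "final answer prompt"
    intermediate_steps.append(llm_inputs.copy() if use_copy else llm_inputs)
    return intermediate_steps


buggy = run_steps(use_copy=False)
fixed = run_steps(use_copy=True)

print(buggy[0]["input"])  # overwritten by the final answer step
print(fixed[0]["input"])  # preserved by the shallow copy
```

A shallow copy is enough in this sketch because only a top-level key is reassigned; nested mutable values would still be shared.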
11 months ago
Florian d78f418c0d
Extract abstracts from Pubmed articles, even if they have no extra label (#10245)
### Description
This pull request involves modifications to the extraction method for
abstracts/summaries within the PubMed utility. A condition has been
added to verify the presence of unlabeled abstracts. Now an abstract
will be extracted even if it does not have a subtitle. In addition, the
extraction of the abstract was extended to books.

### Issue
The PubMed utility occasionally returns an empty result when extracting
abstracts from articles, despite the presence of an abstract for the
paper on PubMed. This issue arises due to the varying structure of
articles; some articles follow a "subtitle/label: text" format, while
others do not include subtitles in their abstracts. An example of the
latter case can be found at:
https://pubmed.ncbi.nlm.nih.gov/37666905/

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Viktor Zhemchuzhnikov fd9da60aea
Add async support to SelfQueryRetriever (#10175)
### Description

SelfQueryRetriever is missing async support, so I am adding it.
I also removed deprecated predict_and_parse method usage here, and added
some tests.

### Issue
N/A

### Tag maintainer
Not yet

### Twitter handle
N/A
11 months ago
Theron Tau 35297ca0d3
Add feature for extracting images from pdf and recognizing text from images. (#10653)
**Description**

This addresses #10423, which requests the ability to extract images
from PDFs and recognize the text in them. I have implemented it for
`PyPDFLoader`, `PyPDFium2Loader`, `PyPDFDirectoryLoader`,
`PyMuPDFLoader`, `PDFMinerLoader`, and `PDFPlumberLoader`.
[RapidOCR](https://github.com/RapidAI/RapidOCR.git) is used to recognize
text in the extracted images. Because OCR is time-consuming, a boolean
parameter `extract_images` controls whether to extract and recognize
images. I measured the time taken by each parser with unit tests on my
own laptop (ThinkBook 14+ with AMD R7-6800H); the results are:

| extract_images | PyPDFParser | PDFMinerParser | PyMuPDFParser | PyPDFium2Parser | PDFPlumberParser |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| False | 0.27s | 0.39s | 0.06s | 0.08s | 1.01s |
| True | 17.01s | 20.67s | 20.32s | 19.75s | 20.55s |

**Issue**

#10423 

**Dependencies**

rapidocr_onnxruntime in
[RapidOCR](https://github.com/RapidAI/RapidOCR/tree/main)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Bagatur 8e3fbc97ca
Add vowpal_wabbit RL chain (#11462) 11 months ago
Haris Wang f1269830a0
Fix bug in MarkdownHeaderTextSplitter for codeblock (#10262)
- Description: The previous version of the MarkdownHeaderTextSplitter
did not take into account the possibility of '#' appearing within code
blocks, which caused segmentation anomalies in these situations. This PR
has fixed this issue.
  - Issue: 
  - Dependencies: No
  - Tag maintainer: 
  - Twitter handle: 
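
The fix described above amounts to tracking whether the current line sits inside a fenced code block before treating a leading '#' as a header. The following is a minimal sketch of that idea, not the actual `MarkdownHeaderTextSplitter` code; `find_headers` is a hypothetical helper, and it assumes code blocks are opened and closed by lines starting with three backticks or three tildes.

```python
BACKTICK_FENCE = "`" * 3
TILDE_FENCE = "~" * 3


def find_headers(markdown: str) -> list:
    """Return header lines, skipping '#' lines inside fenced code blocks."""
    headers = []
    open_fence = None  # marker that opened the current code block, if any
    for line in markdown.splitlines():
        stripped = line.strip()
        if open_fence is None:
            if stripped.startswith(BACKTICK_FENCE):
                open_fence = BACKTICK_FENCE
            elif stripped.startswith(TILDE_FENCE):
                open_fence = TILDE_FENCE
            elif stripped.startswith("#"):
                headers.append(stripped)
        elif stripped.startswith(open_fence):
            open_fence = None  # closing fence ends the code block
    return headers


doc = "\n".join(
    [
        "# Title",
        BACKTICK_FENCE + "python",
        "# just a comment, not a header",
        BACKTICK_FENCE,
        "## Section",
    ]
)
print(find_headers(doc))
```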

cc @baskaryan @eyurtsev  @rlancemartin

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Eddie Cohen 656d2303f7
add in, nin for pinecone (#10303)
Description: Adds the in and nin comparators for pinecone seen
[here](https://docs.pinecone.io/docs/metadata-filtering#metadata-query-language)
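
For illustration, the Pinecone metadata query language linked above expresses these comparators as the `$in` and `$nin` operators inside a filter dict. The following is a minimal sketch of building such a filter; `to_pinecone_filter` is a hypothetical helper and not the translator code added by this PR.

```python
SUPPORTED_COMPARATORS = {"eq", "ne", "gt", "gte", "lt", "lte", "in", "nin"}


def to_pinecone_filter(attribute: str, comparator: str, value) -> dict:
    """Build a metadata filter dict (hypothetical helper for illustration)."""
    if comparator not in SUPPORTED_COMPARATORS:
        raise ValueError(f"Unsupported comparator: {comparator}")
    # Pinecone operators are the comparator name prefixed with '$'.
    return {attribute: {"$" + comparator: value}}


print(to_pinecone_filter("genre", "in", ["comedy", "drama"]))
print(to_pinecone_filter("genre", "nin", ["horror"]))
```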

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Bagatur a3a2ce623e Revise vowpal_wabbit notebook 11 months ago
Bagatur 8fafa1af91 merge 11 months ago
olgavrou 3b07c0cf3d
RL Chain with VowpalWabbit (#10242)
- Description: This PR adds a new chain, `rl_chain.PickBest`, for learned
prompt variable injection. A detailed description and usage can be found
in the added example notebook. It essentially adds a
[VowpalWabbit](https://github.com/VowpalWabbit/vowpal_wabbit) layer
before the llm call in order to learn or personalize prompt variable
selections.

Most of the code is to make the API simple and provide lots of defaults
and data wrangling that is needed to use Vowpal Wabbit, so that the user
of the chain doesn't have to worry about it.

- Dependencies:
  - [vowpal-wabbit-next](https://pypi.org/project/vowpal-wabbit-next/)
  - sentence-transformers (already a dep)
  - numpy (already a dep)
- tagging @ataymano who contributed to this chain
- Tag maintainer: @baskaryan
- Twitter handle: @olgavrou


Added example notebook and unit tests
11 months ago
Manikanta5112 56048b909f
added ContentFormatter escape special characters for message content (#10319)
---------

Co-authored-by: Manikanta5112 <42089393+mani5112@users.noreply.github.com>
11 months ago
Leonid Ganeline d17416ec79
docstrings `callbacks` (#11456)
Added missing docstrings to `callbacks/`

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
11 months ago
Ofer Mendelevitch 3c7653bf0f
"source" argument in constructor of Vectara (#11454)
- **Description:** minor update to constructor to allow for
specification of "source"
  - **Tag maintainer:** @baskaryan
  - **Twitter handle:** @ofermend
11 months ago
Eugene Yurtsev d9018ae5f1
Improve CLI ux (#11452)
Improve UX for cli
11 months ago
Jaikanth J 9f85f7c543
fix(cache): use dumps for RedisCache (#10408)
# Description
Attempts to fix RedisCache for ChatGenerations using the `loads` and `dumps`
used in the SQLAlchemy cache by @hwchase17. This is safer than a pickle
dump because it won't execute any arbitrary code during
de-serialisation.
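
To illustrate the safety argument above: unpickling can invoke an arbitrary callable chosen by whoever produced the bytes, whereas a JSON-style load only rebuilds data. The following is a minimal sketch using the standard library `json` module as a stand-in for the `loads`/`dumps` helpers mentioned above; the `Payload` class is hypothetical and deliberately harmless.

```python
import json
import pickle


class Payload:
    """Hypothetical object whose pickle form asks the loader to call eval()."""

    def __reduce__(self):
        # On unpickling, pickle calls eval("1 + 1") and returns the result.
        return (eval, ("1 + 1",))


blob = pickle.dumps(Payload())
unpickled = pickle.loads(blob)  # code ran during loading
print(unpickled)

# A JSON round trip only reconstructs data; the string is never evaluated.
restored = json.loads(json.dumps({"text": "1 + 1"}))
print(restored)
```

Here the expression is benign, but the same mechanism lets untrusted pickle data run any importable callable, which is the risk the PR description refers to.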

# Issues
#7722 & #8666 

# Dependencies
None, but removes the warning introduced in #8041 by @baskaryan

Handle: @jaikanthjay46
11 months ago
rodrigo-clickup 5944c1851b
Add ClickUp Toolkit (#10662)
- **Description:** Adds a toolkit to interact with the
[ClickUp](https://clickup.com/) [Public API](https://clickup.com/api/)
- **Dependencies:** None
- **Tag maintainer:** @rodrigo-georgian, @rodrigo-clickup,
@aiswaryasankarwork
- **Twitter handle:** 
- Aiswarya (https://twitter.com/Aiswarya_Sankar,
https://www.linkedin.com/in/sankaraiswarya/)
   - Rodrigo (https://www.linkedin.com/in/rodrigo-ceballos-lentini/)


---------

Co-authored-by: Aiswarya Sankar <aiswaryasankar@Aiswaryas-MacBook-Pro.local>
Co-authored-by: aiswaryasankarwork <143119412+aiswaryasankarwork@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
John Reynolds 68901e1e40
Update output_parser.py (#10430)
- Description: Updated output parser for mrkl to remove any
hallucination actions after the final answer; this was encountered when
using Anthropic claude v2 for planning; reopening PR with updated unit
tests
- Issue: #10278 
- Dependencies: N/A
- Twitter handle: @johnreynolds
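
The idea described above can be sketched as follows: once a final answer is present, any action text that appears after it is discarded. This is an illustrative sketch only and not the parser code from this PR; the function name, the `Final Answer:` marker, and the action regex are assumptions.

```python
import re

FINAL_ANSWER = "Final Answer:"
# Assumed shape of a trailing hallucinated action, e.g. "\nAction: search".
ACTION_PATTERN = re.compile(r"\n\s*Action\s*\d*\s*:")


def final_answer_without_trailing_actions(text: str):
    """Return the final answer with any later action text cut off, or None."""
    start = text.find(FINAL_ANSWER)
    if start == -1:
        return None  # no final answer in this output
    answer = text[start + len(FINAL_ANSWER):]
    match = ACTION_PATTERN.search(answer)
    if match:
        answer = answer[: match.start()]  # drop everything from the action on
    return answer.strip()


output = "Thought: done\nFinal Answer: 42\n\nAction: search\nAction Input: more"
print(final_answer_without_trailing_actions(output))
```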
11 months ago
Joshua Sundance Bailey 790010703b
ArcGISLoader: Limit number of results in query (#10615)
Description: this PR changes the `ArcGISLoader` to set
`return_all_records` to `False` when `result_record_count` is provided
as a keyword argument. Previously, `return_all_records` was `True` by
default and this made the API ignore `result_record_count`.

Issue: `ArcGISLoader` would ignore `result_record_count` unless user
also passed `return_all_records=False`.
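
The defaulting behaviour described above can be sketched as a small keyword-argument resolver. This is a hypothetical helper, not the `ArcGISLoader` implementation; in particular, leaving an explicitly passed `return_all_records` untouched is an assumption.

```python
def resolve_query_kwargs(**kwargs) -> dict:
    """Fill in return_all_records based on whether a record limit was given."""
    params = dict(kwargs)
    if "return_all_records" not in params:
        # Only return everything when the caller did not ask for a limit.
        params["return_all_records"] = "result_record_count" not in params
    return params


print(resolve_query_kwargs(result_record_count=10))
print(resolve_query_kwargs())
```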
11 months ago
Beck Bekmyradov f9df55f7d2
Fix a Typo in Documentation (#11453)
- **Description:** This commit corrects a minor typo in the
documentation. It changes "frum" to "from" in the sentence: "The results
from search are passed back to the LLM for synthesis into an answer" in
the file `docs/extras/use_cases/more/agents/agents.ipynb`. This typo fix
enhances the clarity and accuracy of the documentation.
- **Tag maintainer:** @baskaryan
11 months ago
Bagatur f5ce286932
fix api docs build (#11445) 11 months ago
mrbean 9903a70379
Add youdotcom retriever (#11304)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
ashish-dahal 1655ff2ded
Fix PyMuPDFLoader kwargs (#11434)
- **Description:** Fix the `PyMuPDFLoader` to accept `loader_kwargs`
from the document loader's `loader_kwargs` option. This provides more
flexibility in formatting the output from documents.

- **Issue:** The `loader_kwargs` is not passed into the `load` method
from the document loader, which limits configuration options.

- **Dependencies:**  None

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Leonid Kuligin e4a46747dc
integration test for DocAI parser (#11424)
- **Description:** added an integration test
  - **Issue:** #11407 

@baskaryan
11 months ago
Aashish Saini 2abbdc6ecb
Update bageldb.py (#11421)
I have restructured the code to ensure uniform handling of ImportError.
In place of previously used ValueError, I've adopted the standard
practice of raising ImportError with explanatory messages. This
modification enhances code readability and clarifies that any problems
stem from module importation.
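
A minimal sketch of the pattern described above, raising `ImportError` with an explanatory message instead of `ValueError`. The helper name `import_optional` and the install hint are assumptions for illustration; the actual change lives in `bageldb.py`.

```python
import importlib


def import_optional(module_name: str, pip_name: str):
    """Import a module or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import {module_name}. "
            f"Please install it with `pip install {pip_name}`."
        ) from exc


json_module = import_optional("json", "json")  # a module that is always available
print(json_module.dumps({"ok": True}))
```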
11 months ago
Syed Ather Rizvi bfd48925e5
Feature/csharp text splitter doc (#10571)
- **Description:** Just docs related to csharp code splitter
   
- **Issue:** It's related to a request made by @baskaryan in a comment
on my previous PR #10350
  - **Dependencies:** None
  - **Twitter handle:** @ather19

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Nuno Campos 2c11302598
Update langchain_release.yml (#11444) 11 months ago
maks-operlejn-ds 2aae1102b0
Instance anonymization (#10501)
### Description

Add instance anonymization: if `John Doe` appears twice in the
text, it will be treated as the same entity.
The difference between `PresidioAnonymizer` and
`PresidioReversibleAnonymizer` is that only the second one has a
built-in memory, so it will remember the anonymization mapping across
multiple texts:

```
>>> anonymizer = PresidioAnonymizer()
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Brett Russell. Hi Brett Russell!'
```
```
>>> anonymizer = PresidioReversibleAnonymizer()
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
```

### Twitter handle
@deepsense_ai / @MaksOpp

### Tag maintainer
@baskaryan @hwchase17 @hinthornw

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Kyle Pancamo 203258b4d6
Update pdf.py comment for PyPDFLoader (#10495)
Description: To my understanding, PyPDF does not chunk at the character
level, but instead breaks up content by page. This PR fixes the comment
accordingly.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Juan Daza 4236ae3851
Added Streaming Capability to SageMaker LLMs (#10535)
This PR adds the ability to declare a Streaming response in the
SageMaker LLM by leveraging the `invoke_endpoint_with_response_stream`
capability in `boto3`. It is heavily based on the AWS Blog Post
announcement linked
[here](https://aws.amazon.com/blogs/machine-learning/elevating-the-generative-ai-experience-introducing-streaming-support-in-amazon-sagemaker-hosting/).

It does not add any additional dependencies since it uses the existing
`boto3` version.
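
As a rough illustration of consuming such a stream: the `Body` of the `invoke_endpoint_with_response_stream` response is an event stream whose events carry raw bytes under `PayloadPart` and `Bytes`. The sketch below only covers decoding those events and is not the code added by this PR; `iter_text_chunks` is a hypothetical helper, and the endpoint name in the comment is a placeholder.

```python
import codecs


def iter_text_chunks(event_stream):
    """Yield decoded text from PayloadPart events as they arrive."""
    # An incremental decoder copes with multi-byte characters split across events.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for event in event_stream:
        part = event.get("PayloadPart")
        if part is None:
            continue  # skip events that carry no payload
        text = decoder.decode(part["Bytes"])
        if text:
            yield text


# With real credentials the stream would come from boto3, roughly:
#   client = boto3.client("sagemaker-runtime")
#   response = client.invoke_endpoint_with_response_stream(
#       EndpointName="my-endpoint", Body=b"...", ContentType="application/json")
#   for chunk in iter_text_chunks(response["Body"]): ...
fake_events = [
    {"PayloadPart": {"Bytes": b"Hel"}},
    {"PayloadPart": {"Bytes": b"lo"}},
]
print("".join(iter_text_chunks(fake_events)))
```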

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Laurentiu Piciu d9670a5945
openai_functions_multi_agent: solved the case when the "arguments" is valid JSON but it does not contain `actions` key (#10543)
Description: There are cases when the output from the LLM comes fine
(i.e. function_call["arguments"] is a valid JSON object), but it does
not contain the key "actions". So I split the validation in 2 steps:
loading arguments as JSON and then checking for "actions" in it.
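
The two-step validation described above can be sketched as follows. This is an illustrative sketch rather than the agent's parser; `parse_actions` is a hypothetical name, and `ValueError` stands in for whatever exception the real code raises.

```python
import json


def parse_actions(arguments: str) -> list:
    """Validate function-call arguments in two separate steps."""
    # Step 1: the arguments must be valid JSON.
    try:
        payload = json.loads(arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Arguments are not valid JSON: {arguments!r}") from exc

    # Step 2: the decoded JSON must contain the "actions" key.
    try:
        return payload["actions"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"Arguments lack an 'actions' key: {arguments!r}") from exc


print(parse_actions('{"actions": [{"action_name": "search"}]}'))
```

Separating the steps lets the error message say which check failed, which is the case the PR title calls out.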

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
Eugene Yurtsev fcccde406d
Add SymbolicMathChain to experiment in preparation for deprecation (#11129)
Move symbolic math chain to experimental
11 months ago
Holt Skinner 9f73fec057
fix: Update Google Cloud Enterprise Search to Vertex AI Search (#10513)
- Description: Google Cloud Enterprise Search was renamed to Vertex AI
  Search
  - https://cloud.google.com/blog/products/ai-machine-learning/vertex-ai-search-and-conversation-is-now-generally-available
- This PR updates the documentation and Retriever class to use the new
  terminology.
  - Changed retriever class from `GoogleCloudEnterpriseSearchRetriever` to
    `GoogleVertexAISearchRetriever`
  - Updated documentation to specify that `extractive_segments` requires
    the new [Enterprise
    edition](https://cloud.google.com/generative-ai-app-builder/docs/about-advanced-features#enterprise-features)
    to be enabled.
  - Fixed spelling errors in documentation.
- Change parameter for Retriever from `search_engine_id` to
  `data_store_id`
  - When this retriever was originally implemented, there was no
    distinction between a data store and search engine, but now these have
    been split.
- Fixed an issue blocking some users where the api_endpoint can't be set
11 months ago
Patrick Randell 1d678f805f
Additional Weaviate Filter Comparators (#10522)
### Description
When using Weaviate Self-Retrievers, certain common filter comparators
generated by user queries were unimplemented, resulting in errors. This
PR implements some of them. All linting and format commands have been
run and tests passed.
### Issue
#10474
### Dependencies
timestamp module

---------

Co-authored-by: Patrick Randell <prandell@deloitte.com.au>
11 months ago
Nuno Campos 79011f835f
Remove str() from RunnableConfigurableAlternatives (#11446) 11 months ago
Mateusz Wosinski 656480feb6
Add language detection example (#10540)
### Description

Adds language detection examples based on
[langdetect](https://github.com/Mimino666/langdetect/tree/master/langdetect)
and [fasttext](https://github.com/facebookresearch/fastText/) libraries.
These frameworks can be especially useful together with components that
require selection of the language (e.g. data-anonymizer)

### Twitter handle

@deepsense_ai, @matt_wosinski
11 months ago
Harrison Chase 31d5bd84d7
make vectorstores optional (#11393) 11 months ago
Eugene Yurtsev 8aa545901a
Update agent type docs (#11137)
In code docs for agent types
11 months ago
Eugene Yurtsev 3e31d6e35f
Start deprecation of LLMBashChain (#11300)
In preparation for migrating LLMBashChain and related tools, add a
deprecation warning to the code.
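
A minimal sketch of what emitting such a warning can look like with the standard library. `LegacyBashChain` is a hypothetical stand-in, and the message text is an assumption rather than the wording used in this commit.

```python
import warnings


class LegacyBashChain:
    """Hypothetical stand-in for a chain that is being migrated."""

    def __init__(self) -> None:
        warnings.warn(
            "LegacyBashChain is deprecated and will be moved in a future release.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this constructor
        )


# Capture the warning so the example shows it regardless of active filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LegacyBashChain()

print(caught[0].category.__name__, caught[0].message)
```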
11 months ago
Bagatur 8b6b8bf68c
bump 309 (#11443) 11 months ago
billytrend-cohere 2ff91a46c0
Add cohere /chat integration (#11389)
Add cohere /chat integration and an iPython notebook to demonstrate the
addition.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
adrienohana ca346011b7
added interactive login for azure cognitive search vector store (#11360)
**Description:** Previously, if access to Azure Cognitive Search was
not done via an API key, the default credential was used, which does not
allow an interactive login. I added the option to use
"INTERACTIVE" as the key name, which launches a login window upon
initialization of the AzureSearch object.
11 months ago
ElliotKetchup 53d4f1554a
Update aws.mdx (#11431) 11 months ago
Lance Martin 211a74941a
Update QA doc w/ Runnables (#11401)
11 months ago
Eugene Yurtsev 5a1f614175
Add docker compose to CLI (#11406)
Add docker compose to cli
11 months ago