Commit Graph

4434 Commits

Author SHA1 Message Date
olgavrou
d6320cc2c0 .. 2023-09-04 23:47:26 -04:00
olgavrou
7a4387c60d notebook fix 2023-09-04 23:46:04 -04:00
olgavrou
235dacc74a
Merge branch 'langchain-ai:master' into master 2023-09-05 11:14:08 -04:00
Bagatur
c8d7ee62ba
bump 282 (#10233) 2023-09-05 07:58:00 -07:00
Predrag Gruevski
e34ad6fefd
Temporarily disable step that seems to be transiently failing. (#10234) 2023-09-05 10:55:47 -04:00
Nuno Campos
5d8673a3c1
Fix usage of AsyncHtmlLoader with an already running event loop (#10220) 2023-09-05 07:25:28 -07:00
olgavrou
3a4c895280
Merge pull request #11 from VowpalWabbit/add_notebook
add random policy and notebook example
2023-09-05 09:36:20 -04:00
vintro
ac2310a405
add NumberedListOutputParser to output_parser init (#10204)
`from langchain.output_parsers import NumberedListOutputParser` did not
work, needed to add it to the init file
2023-09-05 01:12:41 -07:00
Junlin Zhou
8b95dabfe3
update(llms/TGI): Allow None as temperature value (#10212)
Text Generation Inference's client permits the use of a None temperature
as seen
[here](033230ae66/clients/python/text_generation/client.py (L71C9-L71C20)).
While I haved dived into TGI's server code and don't know about the
implications of using None as a temperature setting, I think we should
grant users the option to pass None as a temperature parameter to TGI.
2023-09-05 01:07:57 -07:00
William FH
be152b6a56
Better ls info (#10202) 2023-09-04 18:21:15 -07:00
olgavrou
9c45d5a27e restore hash keys 2023-09-04 20:58:05 -04:00
olgavrou
f22fcb8bcd no cache 2023-09-04 20:52:18 -04:00
olgavrou
8dc5365ee2 no cache key 2023-09-04 20:50:25 -04:00
olgavrou
5b6ebbc825 fixes in notebook 2023-09-04 19:42:43 -04:00
Christophe Bornet
f389c4fcab
Fix S3DirectoryLoader exception (#10193)
#9304 introduced a critical bug. The S3DirectoryLoader fails completely
because boto3 checks the naming of kw arguments and one of the args is
badly named (very sorry for that)

cc @baskaryan
2023-09-04 15:59:22 -07:00
olgavrou
5c2069890f policy fixes 2023-09-04 18:46:45 -04:00
olgavrou
736e0dd46e fix 2023-09-04 18:40:53 -04:00
olgavrou
5b1812f95b fix linting checks 2023-09-04 18:35:59 -04:00
olgavrou
f1d144cd6c run notebook and change location 2023-09-04 18:33:05 -04:00
Manuel Soria
dde1992fdd
Adding custom tools to SQL Agent (#10198)
Changes in:
- `create_sql_agent` function so that user can easily add custom tools
as complement for the toolkit.
- updating **sql use case** notebook to showcase 2 examples of extra
tools.

Motivation for these changes is having the possibility of including
domain expert knowledge to the agent, which improves accuracy and
reduces time/tokens.

---------

Co-authored-by: Manuel Soria <manuel.soria@greyscaleai.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-09-04 15:28:28 -07:00
olgavrou
62cf108700 add random policy and notebook 2023-09-04 18:08:46 -04:00
olgavrou
af4b560b86 fix poetry after merge 2023-09-04 17:28:11 -04:00
ElReyZero
5dbae94e04
OpenAIEmbeddings: Add optional an optional parameter to skip empty embeddings (#10196)
## Description

### Issue
This pull request addresses a lingering issue identified in PR #7070. In
that previous pull request, an attempt was made to address the problem
of empty embeddings when using the `OpenAIEmbeddings` class. While PR
#7070 introduced a mechanism to retry requests for embeddings, it didn't
fully resolve the issue as empty embeddings still occasionally
persisted.

### Problem
In certain specific use cases, empty embeddings can be encountered when
requesting data from the OpenAI API. In some cases, these empty
embeddings can be skipped or removed without affecting the functionality
of the application. However, they might not always be resolved through
retries, and their presence can adversely affect the functionality of
applications relying on the `OpenAIEmbeddings` class.

### Solution
To provide a more robust solution for handling empty embeddings, we
propose the introduction of an optional parameter, `skip_empty`, in the
`OpenAIEmbeddings` class. When set to `True`, this parameter will enable
the behavior of automatically skipping empty embeddings, ensuring that
problematic empty embeddings do not disrupt the processing flow. The
developer will be able to optionally toggle this behavior if needed
without disrupting the application flow.

## Changes Made
- Added an optional parameter, `skip_empty`, to the `OpenAIEmbeddings`
class.
- When `skip_empty` is set to `True`, empty embeddings are automatically
skipped without causing errors or disruptions.

### Example Usage
```python
from openai.embeddings import OpenAIEmbeddings

# Initialize the OpenAIEmbeddings class with skip_empty=True
embeddings = OpenAIEmbeddings(api_key="your_api_key", skip_empty=True)

# Request embeddings, empty embeddings are automatically skipped. docs is a variable containing the already splitted text.
results = embeddings.embed_documents(docs)

# Process results without interruption from empty embeddings
```
2023-09-04 14:10:36 -07:00
Lance Martin
8998060d85
Update docs w/ prompt hub (#10197)
Small updates to docs
2023-09-04 14:09:08 -07:00
olgavrou
00d56fb0fc merge from upstream 2023-09-04 16:48:59 -04:00
olgavrou
b59e2b5afa
Merge pull request #10 from VowpalWabbit/dot_prods_auto_embed
Dot prods auto embed
2023-09-05 05:01:42 -04:00
olgavrou
ae5edefdcd cleanup 2023-09-04 16:36:29 -04:00
Bagatur
a94dc6ee44
model garden nit (#10194) 2023-09-04 11:42:35 -07:00
Louis
bb8c095127
Add 'download_dir' argument to VLLM (#9754)
- Description:
Add a 'download_dir' argument to VLLM model (to change the cache
download directotu when retrieving a model from HF hub)
- Issue:
On some remote machine, I want the cache dir to be in a volume where I
have space (models are heavy nowadays). Sometimes the default HF cache
dir might not be what we want.
- Dependencies:
None

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-09-04 10:53:48 -07:00
Aashish Saini
8bba69ffd0
Fixed some grammatical typos in doc files (#10191)
Fixed some grammatical typos in doc files
CC: @baskaryan, @eyurtsev, @rlancemartin.

---------

Co-authored-by: Aashish Saini <141953346+AashishSainiShorthillsAI@users.noreply.github.com>
Co-authored-by: AryamanJaiswalShorthillsAI <142397527+AryamanJaiswalShorthillsAI@users.noreply.github.com>
Co-authored-by: Adarsh Shrivastav <142413097+AdarshKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: ChetnaGuptaShorthillsAI <142381084+ChetnaGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: PankajKumarShorthillsAI <142473460+PankajKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: AbhishekYadavShorthillsAI <142393903+AbhishekYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: AmitSinghShorthillsAI <142410046+AmitSinghShorthillsAI@users.noreply.github.com>
Co-authored-by: Md Nazish Arman <142379599+MdNazishArmanShorthillsAI@users.noreply.github.com>
Co-authored-by: KamalSharmaShorthillsAI <142474019+KamalSharmaShorthillsAI@users.noreply.github.com>
Co-authored-by: Lakshya <lakshyagupta87@yahoo.com>
Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com>
Co-authored-by: AnujMauryaShorthillsAI <142393269+AnujMauryaShorthillsAI@users.noreply.github.com>
2023-09-04 10:48:08 -07:00
Bagatur
098b4aa465
bump 281 (#10189) 2023-09-04 08:51:50 -07:00
Aashish Saini
699f58fb83
Fixed Import Error type (#10168)
I have restructured the code to ensure uniform handling of ImportError.
In place of previously used ValueError, I've adopted the standard
practice of raising ImportError with explanatory messages. This
modification enhances code readability and clarifies that any problems
stem from module importation.

---------

Co-authored-by: Aashish Saini <141953346+AashishSainiShorthillsAI@users.noreply.github.com>
Co-authored-by: AryamanJaiswalShorthillsAI <142397527+AryamanJaiswalShorthillsAI@users.noreply.github.com>
Co-authored-by: Adarsh Shrivastav <142413097+AdarshKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: ChetnaGuptaShorthillsAI <142381084+ChetnaGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: PankajKumarShorthillsAI <142473460+PankajKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: AbhishekYadavShorthillsAI <142393903+AbhishekYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: AmitSinghShorthillsAI <142410046+AmitSinghShorthillsAI@users.noreply.github.com>
Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com>
Co-authored-by: AnujMauryaShorthillsAI <142393269+AnujMauryaShorthillsAI@users.noreply.github.com>
2023-09-04 08:43:28 -07:00
刘 方瑞
de9e545542
MyScale hot fix on type check (#10180)
Previous PR #9353 has incomplete type checks and deprecation warnings.
This PR will fix those type check and add deprecation warning to myscale
vectorstore
2023-09-04 08:40:58 -07:00
JunXiang
cb928ed3d5
Fix: the duplicate characters wrong results when using pdfplumber loader (#10165)
(Reopen PR #7706, hope this problem can fix.)

When using `pdfplumber`, some documents may be parsed incorrectly,
resulting in **duplicated characters**.

Taking the
[linked](https://bruusgaard.no/wp-content/uploads/2021/05/Datasheet1000-series.pdf)
document as an example:

## Before
```python
from langchain.document_loaders import PDFPlumberLoader

pdf_file = 'file.pdf'
loader = PDFPlumberLoader(pdf_file)
docs = loader.load()
print(docs[0].page_content)
```

Results:
```
11000000 SSeerriieess
PPoorrttaabbllee ssiinnggllee ggaass ddeetteeccttoorrss ffoorr HHyyddrrooggeenn aanndd CCoommbbuussttiibbllee ggaasseess
TThhee RRiikkeenn KKeeiikkii GGPP--11000000 iiss aa ccoommppaacctt aanndd
lliigghhttwweeiigghhtt ggaass ddeetteeccttoorr wwiitthh hhiigghh sseennssiittiivviittyy ffoorr
tthhee ddeetteeccttiioonn ooff hhyyddrrooccaarrbboonnss.. TThhee mmeeaassuurreemmeenntt
iiss ppeerrffoorrmmeedd ffoorr tthhiiss ppuurrppoossee bbyy mmeeaannss ooff ccaattaallyyttiicc
sseennssoorr.. TThhee GGPP--11000000 hhaass aa bbuuiilltt--iinn ppuummpp wwiitthh
ppuummpp bboooosstteerr ffuunnccttiioonn aanndd aa ddiirreecctt sseelleeccttiioonn ffrroomm
aa lliisstt ooff 2255 hhyyddrrooccaarrbboonnss ffoorr eexxaacctt aalliiggnnmmeenntt ooff tthhee
ttaarrggeett ggaass -- OOnnllyy ccaalliibbrraattiioonn oonn CCHH iiss nneecceessssaarryy..
44
FFeeaattuurreess
TThhee RRiikkeenn KKeeiikkii 110000vvvvttaabbllee ssiinnggllee HHyyddrrooggeenn aanndd
CCoommbbuussttiibbllee ggaass ddeetteeccttoorrss..
TThheerree aarree 33 ssttaannddaarrdd mmooddeellss::
GGPP--11000000:: 00--1100%%LLEELL // 00--110000%%LLEELL ›› LLEELL ddeetteeccttoorr
NNCC--11000000:: 00--11000000ppppmm // 00--1100000000ppppmm ›› PPPPMM
ddeetteeccttoorr
DDiirreecctt rreeaaddiinngg ooff tthhee ccoonncceennttrraattiioonn vvaalluueess ooff
ccoommbbuussttiibbllee ggaasseess ooff 2255 ggaasseess ((55 NNPP--11000000))..
EEaassyy ooppeerraattiioonn ffeeaattuurree ooff cchhaannggiinngg tthhee ggaass nnaammee
ddiissppllaayy wwiitthh 11 sswwiittcchh bbuuttttoonn..
LLoonngg ddiissttaannccee ddrraawwiinngg ppoossssiibbllee wwiitthh tthhee ppuummpp
bboooosstteerr ffuunnccttiioonn..
VVaarriioouuss ccoommbbuussttiibbllee ggaasseess ccaann bbee mmeeaassuurreedd bbyy tthhee
ppppmm oorrddeerr wwiitthh NNCC--11000000..
www.bruusgaard.no postmaster@bruusgaard.no +47 67 54 93 30 Rev: 446-2
```

We can see that there are a large number of duplicated characters in the
text, which can cause issues in subsequent applications.

## After

Therefore, based on the
[solution](https://github.com/jsvine/pdfplumber/issues/71) provided by
the `pdfplumber` source project. I added the `"dedupe_chars()"` method
to address this problem. (Just pass the parameter `dedupe` to `True`)

```python
from langchain.document_loaders import PDFPlumberLoader

pdf_file = 'file.pdf'
loader = PDFPlumberLoader(pdf_file, dedupe=True)
docs = loader.load()
print(docs[0].page_content)
```

Results:

```
1000 Series
Portable single gas detectors for Hydrogen and Combustible gases
The Riken Keiki GP-1000 is a compact and
lightweight gas detector with high sensitivity for
the detection of hydrocarbons. The measurement
is performed for this purpose by means of catalytic
sensor. The GP-1000 has a built-in pump with
pump booster function and a direct selection from
a list of 25 hydrocarbons for exact alignment of the
target gas - Only calibration on CH is necessary.
4
Features
The Riken Keiki 100vvtable single Hydrogen and
Combustible gas detectors.
There are 3 standard models:
GP-1000: 0-10%LEL / 0-100%LEL › LEL detector
NC-1000: 0-1000ppm / 0-10000ppm › PPM
detector
Direct reading of the concentration values of
combustible gases of 25 gases (5 NP-1000).
Easy operation feature of changing the gas name
display with 1 switch button.
Long distance drawing possible with the pump
booster function.
Various combustible gases can be measured by the
ppm order with NC-1000.
www.bruusgaard.no postmaster@bruusgaard.no +47 67 54 93 30 Rev: 446-2
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-09-04 08:37:00 -07:00
olgavrou
e10980d445 fix linting error 2023-09-04 08:56:34 -04:00
olgavrou
0f7cde023b fix linting errors 2023-09-04 08:43:48 -04:00
olgavrou
4e9aecda90 formatting 2023-09-04 08:35:29 -04:00
olgavrou
67dc1a9dd2 cleanup 2023-09-04 07:36:47 -04:00
olgavrou
ca163f0ee6 fixes and tests 2023-09-04 07:10:44 -04:00
olgavrou
b162f1c8e1 dot product of encodings as default auto_embed 2023-09-04 05:50:15 -04:00
Aashish Saini
27944cb611
Fixed Import Error (#10167)
I have restructured the code to ensure uniform handling of ImportError.
In place of previously used ValueError, I've adopted the standard
practice of raising ImportError with explanatory messages. This
modification enhances code readability and clarifies that any problems
stem from module importation.

---------

Co-authored-by: Aashish Saini <141953346+AashishSainiShorthillsAI@users.noreply.github.com>
Co-authored-by: AryamanJaiswalShorthillsAI <142397527+AryamanJaiswalShorthillsAI@users.noreply.github.com>
Co-authored-by: Adarsh Shrivastav <142413097+AdarshKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: ChetnaGuptaShorthillsAI <142381084+ChetnaGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: PankajKumarShorthillsAI <142473460+PankajKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: AbhishekYadavShorthillsAI <142393903+AbhishekYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: AmitSinghShorthillsAI <142410046+AmitSinghShorthillsAI@users.noreply.github.com>
Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com>
Co-authored-by: AnujMauryaShorthillsAI <142393269+AnujMauryaShorthillsAI@users.noreply.github.com>
2023-09-04 00:32:09 -07:00
Massimiliano Pronesti
10e0431e48
feat(llms): add model_kwargs to hf tgi (#10139)
@baskaryan
Following what we discussed in #9724 and your suggestion, I've added a
`model_kwargs` parameter to hf tgi.
2023-09-04 00:24:13 -07:00
Eugene Yurtsev
e0f6ba08d6
FileSysteBlobLoader: Expand user path (#10133)
Fix for: https://github.com/langchain-ai/langchain/issues/10019

Verified fix manually
2023-09-04 00:21:33 -07:00
Krish Dholakia
31bbe80758
add additional model support to chatlitellm (#10134)
---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-09-04 00:16:40 -07:00
IlyaKIS1
de3322609e
Implemented Milvus translator for self-querying (#10162)
- Implemented the MilvusTranslator for self-querying using Milvus vector
store
- Made unit tests to test its functionality
- Documented the Milvus self-querying
2023-09-04 00:16:18 -07:00
Aashish Saini
7403faa063
Fixed typo in get_started.mdx (#10163)
Fix typo: 'Whats up' -> 'What's up'

Thanks
CC: @baskaryan, @eyurtsev, @rlancemartin.

---------

Co-authored-by: Aashish Saini <141953346+AashishSainiShorthillsAI@users.noreply.github.com>
Co-authored-by: AryamanJaiswalShorthillsAI <142397527+AryamanJaiswalShorthillsAI@users.noreply.github.com>
Co-authored-by: Adarsh Shrivastav <142413097+AdarshKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: ChetnaGuptaShorthillsAI <142381084+ChetnaGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: PankajKumarShorthillsAI <142473460+PankajKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: AbhishekYadavShorthillsAI <142393903+AbhishekYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: AmitSinghShorthillsAI <142410046+AmitSinghShorthillsAI@users.noreply.github.com>
Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com>
2023-09-04 00:09:50 -07:00
Aashish Saini
f6f0b0f975
Fixed typo in bittensor.mdx (#10160)
Fixed Typo in bittenaor.mdx

---------

Co-authored-by: Aashish Saini <141953346+AashishSainiShorthillsAI@users.noreply.github.com>
Co-authored-by: AryamanJaiswalShorthillsAI <142397527+AryamanJaiswalShorthillsAI@users.noreply.github.com>
Co-authored-by: Adarsh Shrivastav <142413097+AdarshKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: ChetnaGuptaShorthillsAI <142381084+ChetnaGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: PankajKumarShorthillsAI <142473460+PankajKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: AbhishekYadavShorthillsAI <142393903+AbhishekYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com>
2023-09-03 21:49:33 -07:00
Christophe Bornet
803d0d9656
Add the possibility to configure boto3 in the S3 loaders (#9304)
- Description: this PR adds the possibility to configure boto3 in the S3
loaders. Any named argument you add will be used to create the Boto3
session. This is useful when the AWS credentials can't be passed as env
variables or can't be read from the credentials file.
  - Issue: N/A
  - Dependencies: N/A
  - Tag maintainer: ?
  - Twitter handle: cbornet_

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-09-03 21:06:49 -07:00
Leonid Ganeline
03174c91d0
docs: MLflow API and examples (#9547)
Added docs and links to the API and examples provided by MLflow itself
2023-09-03 20:52:20 -07:00
Xiaoyu Xee
9bcfd58580
Add dashvector self query retriever (#9684)
## Description
Add `Dashvector` retriever and self-query retriever

## How to use
```python
from langchain.vectorstores.dashvector import DashVector

vectorstore = DashVector.from_documents(docs, embeddings)
retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)
```

---------

Co-authored-by: smallrain.xuxy <smallrain.xuxy@alibaba-inc.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-09-03 20:51:04 -07:00