Commit Graph

2990 Commits

Author SHA1 Message Date
Dídac Sabatés
e0cb3ea90c
Fix sql_database.ipynb link (#6525)
Looks like the
[SQLDatabaseChain](https://langchain.readthedocs.io/en/latest/modules/chains/examples/sqlite.html)
in the SQL Database Agent page was broken I've change it to the SQL
Chain page
2023-07-06 13:07:37 -04:00
Leonid Ganeline
4450791edd
docs: tutorials update (#7230)
updated `tutorials.mdx`:
- added a link to new `Deeplearning AI` course on LangChain
- added links to other tutorial videos
- fixed format

@baskaryan, @hwchase17
2023-07-06 12:44:23 -04:00
Diego Machado
a7ae35fe4e
Fix duplicated sentence in documentation's introduction (#6351)
Fix duplicated sentence in documentation's introduction
2023-07-06 12:12:18 -04:00
Bagatur
681f2678a3
add elasticknn to init (#7284) 2023-07-06 11:58:24 -04:00
hayao-k
c23e16c459
docs: Fixed typos in Amazon Kendra Retriever documentation (#7261)
## Description
Fixed to the official service name Amazon Kendra.

## Tag maintainer
@baskaryan
2023-07-06 11:56:52 -04:00
zhujiangwei
8c371e12eb
refactor BedrockEmbeddings class (#7266)
#### Description
refactor BedrockEmbeddings class to clean code as below:

1. inline content type and accept
2. rewrite input_body as a dictionary literal
3. no need to declare embeddings variable, so remove it
2023-07-06 11:56:30 -04:00
Chui
c7cf11b8ab
Remove whitespace in filename (#7264) 2023-07-06 11:55:42 -04:00
Jan Kubica
fed64ae060
Chroma: add vector search with scores (#6864)
- Description: Adding to Chroma integration the option to run a
similarity search by a vector with relevance scores. Fixing two minor
typos.
  
  - Issue: The "lambda_mult" typo is related to #4861 
  
  - Maintainer: @rlancemartin, @eyurtsev
2023-07-06 10:01:55 -04:00
William FH
576880abc5
Re-use Trajectory Evaluator (#7248)
Use the trajectory eval chain in the run evaluation implementation and
update the prepare inputs method to apply to both asynca nd sync
2023-07-06 07:00:24 -07:00
zhaoshengbo
e8f24164f0
Improve the alibaba cloud opensearch vector store documentation (#6964)
Based on user feedback, we have improved the Alibaba Cloud OpenSearch
vector store documentation.

Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>
2023-07-06 09:47:49 -04:00
Eduard van Valkenburg
ae5aa496ee
PowerBI updates (#7143)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

Several updates for the PowerBI tools:

- Handle 0 records returned by requesting redo with different filtering
- Handle too large results by optionally tokenizing the result and
comparing against a max (change in signature, non-breaking)
- Implemented LLMChain with Chat for chat models for the tools. 
- Updates to the main prompt including tables
- Update to Tool prompt with TOPN function
- Split the tool prompt to allow the LLMChain with ChatPromptTemplate

Smaller fixes for stability.

For visibility: @hinthornw
2023-07-06 09:39:23 -04:00
emarco177
b9d6d4cd4c
added template repo for CI/CD deployment on Google Cloud Run (#7218)
Replace this comment with:
- Description: added documentation for a template repo that helps
dockerizing and deploying a LangChain using a Cloud Build CI/CD pipeline
to Google Cloud build serverless
  - Issue: None,
  - Dependencies: None,
  - Tag maintainer: @baskaryan,
  - Twitter handle: EdenEmarco177

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.
2023-07-06 09:38:38 -04:00
Leonid Kuligin
8b19f6a0da
Added retries for Vertex LLM (#7219)
#7217

---------

Co-authored-by: Leonid Kuligin <kuligin@google.com>
2023-07-06 09:38:01 -04:00
William FH
ec66d5188c
Add Better Errors for Comparison Chain (#7033)
+ change to ABC - this lets us add things like the evaluation name for
loading
2023-07-06 06:37:04 -07:00
Stefano Lottini
e61cfb6e99
FLARE Example notebook: switch to named arg to pass pydantic validation (#7267)
Adding the name of the parameter to comply with latest requirements by
Pydantic usage for BaseModels.
2023-07-06 09:32:00 -04:00
Sasmitha Manathunga
0c7a5cb206
Fix inconsistent behavior of CharacterTextSplitter when changing keep_separator (#7263)
- Description:
- When `keep_separator` is `True` the `_split_text_with_regex()` method
in `text_splitter` uses regex to split, but when `keep_separator` is
`False` it uses `str.split()`. This causes problems when the separator
is a special regex character like `.` or `*`. This PR fixes that by
using `re.split()` in both cases.
- Issue: #7262 
- Tag maintainer: @baskaryan
2023-07-06 09:30:03 -04:00
os1ma
b151d4257a
docs: Update documentation for Wikipedia tool to use WikipediaQueryRun (#7258)
**Description**
In the following page, "Wikipedia" tool is explained.

https://python.langchain.com/docs/modules/agents/tools/integrations/wikipedia

However, the WikipediaAPIWrapper being used is not a tool. This PR
updated the documentation to use a tool WikipediaQueryRun.

**Issue**
None

**Tag maintainer**
Agents / Tools / Toolkits: @hinthornw
2023-07-06 09:29:38 -04:00
Jeroen Van Goey
887bb12287
Use correct Language for html_splitter (#7274)
`html_splitter` was using `Language.MARKDOWN`.
2023-07-06 09:24:25 -04:00
Shantanu Nair
f773c21723
Update supabase match_docs ddl and notebook to use expected id type (#7257)
- Description: Switch supabase match function DDL to use expected uuid
type instead of bigint
- Issue: https://github.com/hwchase17/langchain/issues/6743,
https://github.com/hwchase17/langchain/issues/7179
  - Tag maintainer:  @rlancemartin, @eyurtsev
  - Twitter handle: https://twitter.com/ShantanuNair
2023-07-06 09:22:41 -04:00
Myeongseop Kim
0e878ccc2d
Add HumanInputChatModel (#7256)
- Description: This is a chat model equivalent of HumanInputLLM. An
example notebook is also added.
  - Tag maintainer: @hwchase17, @baskaryan
  - Twitter handle: N/A
2023-07-06 09:21:03 -04:00
Myeongseop Kim
57d8a3d1e8
Make tqdm for OpenAIEmbeddings optional (#7247)
- Description: I have added a `show_progress_bar` parameter (defaults.to
`False`) to the `OpenAIEmbeddings`. If the user sets `show_progress_bar`
to `True`, a progress bar will be displayed.
  - Issue: #7246
  - Dependencies: N/A
  - Tag maintainer: @hwchase17, @baskaryan
  - Twitter handle: N/A
2023-07-05 23:36:01 -04:00
Harrison Chase
c36f852846
fix conversational retrieval docs (#7245) 2023-07-05 21:51:33 -04:00
Harrison Chase
035ad33a5b
bump ver to 225 (#7244) 2023-07-05 21:22:18 -04:00
Shantanu Nair
cabd358c3a
Add missing token_max in reduce.py acombine_docs (#7241)
Replace this comment with:
- Description: reduce.py reduce chain implementation's acombine_docs
call does not propagate token_max. Without this, the async call will end
up using 3000 tokens, the default, for the collapse chain.
  - Tag maintainer: @hwchase17 @agola11 @baskaryan 
  - Twitter handle: https://twitter.com/ShantanuNair

Related PR: https://github.com/hwchase17/langchain/pull/7201 and
https://github.com/hwchase17/langchain/pull/7204

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-05 21:02:45 -04:00
Harrison Chase
52b016920c
Harrison/update anthropic (#7237)
Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
2023-07-05 21:02:35 -04:00
Harrison Chase
695e7027e6
Harrison/parameter (#7081)
add parameter to use original question or not

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-05 20:51:25 -04:00
Yevgnen
930e319ca7
Add concurrency to GitbookLoader (#7069)
- Description: Fetch all pages concurrently.
- Dependencies: `scrape_all` -> `fetch_all` -> `_fetch_with_rate_limit`
-> `_fetch` (might be broken currently:
https://github.com/hwchase17/langchain/pull/6519)
  - Tag maintainer: @rlancemartin, @eyurtsev

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-05 20:51:10 -04:00
Hashem Alsaket
6aa66fd2b0
Update Hugging Face Hub notebook (#7236)
Description: `flan-t5-xl` hangs, updated to `flan-t5-xxl`. Tested all
stabilityai LLMs- all hang so removed from tutorial. Temperature > 0 to
prevent unintended determinism.
Issue: #3275 
Tag maintainer: @baskaryan
2023-07-05 20:45:02 -04:00
Mykola Zomchak
8afc8e6f5d
Fix web_base.py (#6519)
Fix for bug in SitemapLoader

`aiohttp` `get` does not accept `verify` argument, and currently throws
error, so SitemapLoader is not working

This PR fixes it by removing `verify` param for `get` function call

Fixes #6107

#### Who can review?

Tag maintainers/contributors who might be interested:

@eyurtsev

---------

Co-authored-by: techcenary <127699216+techcenary@users.noreply.github.com>
2023-07-05 16:53:57 -07:00
William FH
f891f7d69f
Skip evaluation of unfinished runs (#7235)
Cut down on errors logged

Co-authored-by: Ankush Gola <9536492+agola11@users.noreply.github.com>
2023-07-05 16:35:20 -07:00
William FH
83cf01683e
Add 'eval' tag (#7209)
Add an "eval" tag to traced evaluation runs

Most of this PR is actually
https://github.com/hwchase17/langchain/pull/7207 but I can't diff off
two separate PRs

---------

Co-authored-by: Ankush Gola <9536492+agola11@users.noreply.github.com>
2023-07-05 16:28:34 -07:00
William FH
607708a411
Add tags support for langchaintracer (#7207) 2023-07-05 16:19:04 -07:00
William FH
75aa408f10
Send evaluator logs to new session (#7206)
Also stop specifying "eval" mode since explicit project modes are
deprecated
2023-07-05 16:15:29 -07:00
Harrison Chase
0dc700eebf
Harrison/scene xplain (#7228)
Co-authored-by: Kevin Pham <37129444+deoxykev@users.noreply.github.com>
2023-07-05 18:34:50 -04:00
Harrison Chase
d6541da161
remove arize nb (#7238)
was causing some issues with docs build
2023-07-05 18:34:20 -04:00
Mike Nitsenko
d669b9ece9
Document loader for Cube Semantic Layer (#6882)
### Description

This pull request introduces the "Cube Semantic Layer" document loader,
which demonstrates the retrieval of Cube's data model metadata in a
format suitable for passing to LLMs as embeddings. This enhancement aims
to provide contextual information and improve the understanding of data.

Twitter handle:
@the_cube_dev

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-05 15:18:12 -07:00
Tom
e533da8bf2
Adding Marqo to vectorstore ecosystem (#7068)
This PR brings in a vectorstore interface for
[Marqo](https://www.marqo.ai/).

The Marqo vectorstore exposes some of Marqo's functionality in addition
the the VectorStore base class. The Marqo vectorstore also makes the
embedding parameter optional because inference for embeddings is an
inherent part of Marqo.

Docs, notebook examples and integration tests included.

Related PR:
https://github.com/hwchase17/langchain/pull/2807

---------

Co-authored-by: Tom Hamer <tom@marqo.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-05 14:44:12 -07:00
Filip Haltmayer
836d2009cb
Update milvus and zilliz docstring (#7216)
Description:

Updating the docstrings for Milvus and Zilliz so that they appear
correctly on https://integrations.langchain.com/vectorstores. No changes
done to code.

Maintainer: 

@baskaryan

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
2023-07-05 17:03:51 -04:00
Matt Robinson
d65b1951bd
docs: update docs strings for base unstructured loaders (#7222)
### Summary

Updates the docstrings for the unstructured base loaders so more useful
information appears on the integrations page. If these look good, will
add similar docstrings to the other loaders.

### Reviewers
  - @rlancemartin
  - @eyurtsev
  - @hwchase17
2023-07-05 17:02:26 -04:00
Mike Salvatore
265f05b10e
Enable InMemoryDocstore to be constructed without providing a dict (#6976)
- Description: Allow `InMemoryDocstore` to be created without passing a
dict to the constructor; the constructor can create a dict at runtime if
one isn't provided.
- Tag maintainer: @dev2049
2023-07-05 16:56:31 -04:00
Harrison Chase
47e7d09dff
fix arize nb (#7227) 2023-07-05 16:55:48 -04:00
Feras Almannaa
79b59a8e06
optimize pgvector add_texts (#7185)
- Description: At the moment, inserting new embeddings to pgvector is
querying all embeddings every time as the defined `embeddings`
relationship is using the default params, which sets `lazy="select"`.
This change drastically improves the performance and adds a few
additional cleanups:
* remove `collection.embeddings.append` as it was querying all
embeddings on insert, replace with `collection_id` param
* centralize storing logic in add_embeddings function to reduce
duplication
  * remove boilerplate

- Issue: No issue was opened.
- Dependencies: None.
- Tag maintainer: this is a vectorstore update, so I think
@rlancemartin, @eyurtsev
- Twitter handle: @falmannaa
2023-07-05 13:19:42 -07:00
Harrison Chase
6711854e30
Harrison/dataforseo (#7214)
Co-authored-by: Alexander <sune357@gmail.com>
2023-07-05 16:02:02 -04:00
Richy Wang
cab7d86f23
Implement delete interface of vector store on AnalyticDB (#7170)
Hi, there
  This pull request contains two commit:
**1. Implement delete interface with optional ids parameter on
AnalyticDB.**
**2. Allow customization of database connection behavior by exposing
engine_args parameter in interfaces.**
- This commit adds the `engine_args` parameter to the interfaces,
allowing users to customize the behavior of the database connection. The
`engine_args` parameter accepts a dictionary of additional arguments
that will be passed to the create_engine function. Users can now modify
various aspects of the database connection, such as connection pool size
and recycle time. This enhancement provides more flexibility and control
to users when interacting with the database through the exposed
interfaces.

This commit is related to VectorStores @rlancemartin @eyurtsev 

Thank you for your attention and consideration.
2023-07-05 13:01:00 -07:00
Mike Salvatore
3ae11b7582
Handle kwargs in FAISS.load_local() (#6987)
- Description: This allows parameters such as `relevance_score_fn` to be
passed to the `FAISS` constructor via the `load_local()` class method.
-  Tag maintainer: @rlancemartin @eyurtsev
2023-07-05 15:56:40 -04:00
Jamal
a2f191a322
Replace JIRA Arbitrary Code Execution vulnerability with finer grain API wrapper (#6992)
This fixes #4833 and the critical vulnerability
https://nvd.nist.gov/vuln/detail/CVE-2023-34540

Previously, the JIRA API Wrapper had a mode that simply pipelined user
input into an `exec()` function.
[The intended use of the 'other' mode is to cover any of Atlassian's API
that don't have an existing
interface](cc33bde74f/langchain/tools/jira/prompt.py (L24))

Fortunately all of the [Atlassian JIRA API methods are subfunctions of
their `Jira`
class](https://atlassian-python-api.readthedocs.io/jira.html), so this
implementation calls these subfunctions directly.

As well as passing a string representation of the function to call, the
implementation flexibly allows for optionally passing args and/or
keyword-args. These are given as part of the dictionary input. Example:
```
    {
        "function": "update_issue_field",   #function to execute
        "args": [                           #list of ordered args similar to other examples in this JiraAPIWrapper
            "key",
            {"summary": "New summary"}
        ],
        "kwargs": {}                        #dict of key value keyword-args pairs
    }
```

the above is equivalent to `self.jira.update_issue_field("key",
{"summary": "New summary"})`

Alternate query schema designs are welcome to make querying easier
without passing and evaluating arbitrary python code. I considered
parsing (without evaluating) input python code and extracting the
function, args, and kwargs from there and then pipelining them into the
callable function via `*f(args, **kwargs)` - but this seemed more
direct.

@vowelparrot @dev2049

---------

Co-authored-by: Jamal Rahman <jamal.rahman@builder.ai>
2023-07-05 15:56:01 -04:00
Hakan Tekgul
61938a02a1
Create arize_llm_observability.ipynb (#7000)
Adding documentation and notebook for Arize callback handler. 

  - @dev2049
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
2023-07-05 15:55:47 -04:00
Leonid Ganeline
ecee4d6e92
docs: update youtube videos and tutorials (#6515)
added tutorials.mdx; updated youtube.mdx

Rationale: the Tutorials section in the documentation is top-priority.
(for example, https://pytorch.org/docs/stable/index.html) Not every
project has resources to make tutorials. We have such a privilege.
Community experts created several tutorials on YouTube. But the tutorial
links are now hidden on the YouTube page and not easily discovered by
first-time visitors.

- Added new videos and tutorials that were created since the last
update.
- Made some reprioritization between videos on the base of the view
numbers.

#### Who can review?

  - @hwchase17
    - @dev2049
2023-07-05 12:50:31 -07:00
Santiago Delgado
fa55c5a16b
Fixed Office365 tool __init__.py files, tests, and get_tools() function (#7046)
## Description
Added Office365 tool modules to `__init__.py` files
## Issue
As described in Issue
https://github.com/hwchase17/langchain/issues/6936, the Office365
toolkit can't be loaded easily because it is not included in the
`__init__.py` files.
## Reviewer
@dev2049
2023-07-05 15:46:21 -04:00
wewebber-merlin
8a7c95e555
Retryable exception for empty OpenAI embedding. (#7070)
Description:

The OpenAI "embeddings" API intermittently falls into a failure state
where an embedding is returned as [ Nan ], rather than the expected 1536
floats. This patch checks for that state (specifically, for an embedding
of length 1) and if it occurs, throws an ApiError, which will cause the
chunk to be retried.

Issue:

I have been unable to find an official langchain issue for this problem,
but it is discussed (by another user) at
https://stackoverflow.com/questions/76469415/getting-embeddings-of-length-1-from-langchain-openaiembeddings

Maintainer: @dev2049

Testing: 

Since this is an intermittent OpenAI issue, I have not provided a unit
or integration test. The provided code has, though, been run
successfully over several million tokens.

---------

Co-authored-by: William Webber <william@williamwebber.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-05 15:23:45 -04:00