Commit Graph

4043 Commits (9e1dbd4b490d423b8ed4fc699975d91cef2e7cfc)
 

Author SHA1 Message Date
Bagatur 4999e8af7e
pin pydantic api ref build (#9556) 1 year ago
Predrag Gruevski 0565d81dc5
Update `SECURITY.md` email address. (#9558) 1 year ago
Predrag Gruevski 9f08d29bc8
Use PyPI Trusted Publishing to publish langchain packages. (#9467)
Trusted Publishing is the current best practice for publishing Python
packages. Rather than long-lived secret keys, it uses OpenID Connect
(OIDC) to allow our GitHub runner to directly authenticate itself to
PyPI and get a short-lived publishing token. This locks down publishing
quite a bit:
- There's no long-lived publish key to steal anymore.
- Publishing is *only* allowed via the *specifically designated* GitHub
workflow in the designated repo.

It also is operationally easier: no keys means there's nothing that
needs to be periodically rotated, nothing to worry about leaking, and
nobody can accidentally publish a release from their laptop because they
happened to have PyPI keys set up.

After this gets merged, we'll need to configure PyPI to start expecting
trusted publishing. It's only a few clicks and should only take a
minute; instructions are here:
https://docs.pypi.org/trusted-publishers/adding-a-publisher/

More info:
- https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
- https://github.com/pypa/gh-action-pypi-publish
1 year ago
Predrag Gruevski 249752e8ee
Require manually triggering release workflows. (#9552) 1 year ago
Raynor Chavez 973866c894
fix: Updated marqo integration for marqo version 1.0.0+ (#9521)
- Description: Updated marqo integration to use tensor_fields instead of
non_tensor_fields. Upgraded marqo version to 1.2.4
  - Dependencies: marqo 1.2.4

---------

Co-authored-by: Raynor Kirkson E. Chavez <raynor.chavez@192.168.254.171>
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Predrag Gruevski b2e6d01e8f
Add `SECURITY.md` file to the repo. (#9551) 1 year ago
Predrag Gruevski 875ea4b4c6
Fix conditional that erroneously always runs. (#9543)
The input it means to test for is `"libs/langchain"` and not
`"langchain"`.
1 year ago
Bagatur c7a5bb6031
bump 270 (#9549) 1 year ago
Nuno Campos 28e1ee4891
Nc/small fixes 21aug (#9542)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. These live is docs/extras
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17, @rlancemartin.
 -->
1 year ago
Predrag Gruevski a7eba8b006
Release on push to `master` instead of on closed PRs targeting it. (#9544)
This is safer than the prior approach, since it's safe by default: the
release workflows never get triggered for non-merged PRs, so there's no
possibility of a buggy conditional accidentally letting a workflow
proceed when it shouldn't have.

The only loss is that publishing no longer requires a `release` label on
the merged PR that bumps the version. We can add a separate CI step that
enforces that part as a condition for merging into `master`, if
desirable.
1 year ago
Bagatur d11841d760
bump 269 (#9487) 1 year ago
axiangcoding 05aa02005b
feat(llms): support ERNIE Embedding-V1 (#9370)
- Description: support [ERNIE
Embedding-V1](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/alj562vvu),
which is part of ERNIE ecology
- Issue: None
- Dependencies: None
- Tag maintainer: @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
José Ferraz Neto f116e10d53
Add SharePoint Loader (#4284)
- Added a loader (`SharePointLoader`) that can pull documents (`pdf`,
`docx`, `doc`) from the [SharePoint Document
Library](https://support.microsoft.com/en-us/office/what-is-a-document-library-3b5976dd-65cf-4c9e-bf5a-713c10ca2872).
- Added a Base Loader (`O365BaseLoader`) to be used for all Loaders that
use [O365](https://github.com/O365/python-o365) Package
- Code refactoring on `OneDriveLoader` to use the new `O365BaseLoader`.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Utku Ege Tuluk bb4f7936f9
feat(llms): add streaming support to textgen (#9295)
- Description: Added streaming support to the textgen component in the
llms module.
  - Dependencies: websocket-client = "^1.6.1"
1 year ago
Predrag Gruevski a03003f5fd
Upgrade CI poetry version to 1.5.1. (#9479)
Poetry v1.5.1 was released on May 29, almost 3 months ago. Probably a
safe upgrade.
1 year ago
Yuki Miyake 85a1c6d0b7
🐛 fix unexpected run of release workflow (#9494)
I have discovered a bug located within `.github/workflows/_release.yml`
which is the primary cause of continuous integration (CI) errors. The
problem can be solved; therefore, I have constructed a PR to address the
issue.

## The Issue

Access the following link to view the exact errors: [Langhain Release
Workflow](https://github.com/langchain-ai/langchain/actions/workflows/langchain_release.yml)

The instances of these errors take place for **each PR** that updates
`pyproject.toml`, excluding those specifically associated with bumping
PRs.

See below for the specific error message:

```
Error: Error 422: Validation Failed: {"resource":"Release","code":"already_exists","field":"tag_name"}
```

An image of the error can be viewed here:

![Image](https://github.com/langchain-ai/langchain/assets/13769670/13125f73-9b53-49b7-a83e-653bb01a1da1)

The `_release.yml` document contains the following if-condition:

```yaml
    if: |
        ${{ github.event.pull_request.merged == true }}
        && ${{ contains(github.event.pull_request.labels.*.name, 'release') }}
```

## The Root Cause

The above job constantly runs as the `if-condition` is always identified
as `true`.

## The Logic

The `if-condition` can be defined as `if: ${{ b1 }} && ${{ b2 }}`, where
`b1` and `b2` are boolean values. However, in terms of condition
evaluation with GitHub Actions, `${{ false }}` is identified as a string
value, thereby rendering it as truthy as per the [official
documentation](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idif).

I have run some tests regarding this behavior within my forked
repository. You can consult my [debug
PR](https://github.com/zawakin/langchain/pull/1) for reference.

Here is the result of the tests:

|If-Condition|Outcome|
|:--:|:--:|
|`if: true && ${{ false }}`|Execution|
|`if: ${{ false }}` |Skipped|
|`if: true && false` |Skipped|
|`if: false`|Skipped|
|`if: ${{ true && false }}` |Skipped|

In view of the first and second results, we can infer that `${{ false
}}` can only be interpreted as `true` for conditions composed of some
expressions.
It is consistent that the condition of `if: ${{ inputs.working-directory
== 'libs/langchain' }}` works.

It is surprised to be skipped for the second case but it seems the spec
of GitHub Actions 😓

Anyway, the PR would fix these errors, I believe 👍 

Could you review this? @hwchase17 or @shoelsch , who is the author of
[PR](https://github.com/langchain-ai/langchain/pull/360).
1 year ago
Harrison Chase 9930ddc555
beef up retrieval docs (#9518) 1 year ago
Eugene Yurtsev 02c5c13a6e
Fast linters go first (#9501)
Proposal to reverse the order of linters based on the principle of
running the
fast ones first.
1 year ago
Leonid Ganeline fdbeb52756
`Qwen` model example (#9516)
added an example for `Qwen-7B` model on `HugginfFaceHub` 🤗
1 year ago
Martin Schade 0c8a88b3fa
AmazonTextractPDFLoader documentation updates (#9415)
Description: Updating documentation to add AmazonTextractPDFLoader
according to
[comment](https://github.com/langchain-ai/langchain/pull/8661#issuecomment-1666572992)
from [baskaryan](https://github.com/baskaryan)

Adding one notebook and instructions to the
modules/data_connection/document_loaders/pdf.mdx

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Asif Ahmad 08feed3332
Changed the NIBittensorLLM API URL to the correct one (#9419)
Changed https://api.neuralinterent.ai/ to https://api.neuralinternet.ai/
which is the valid URL for the API of NIBittensorLLM.
1 year ago
Ofer Mendelevitch a758496236
Fixed issue with metadata in query (#9500)
- Description: Changed metadata retrieval so that it combines Vectara
doc level and part level metadata
  - Tag maintainer: @rlancemartin
  - Twitter handle: @ofermend
1 year ago
EpixMan 103094286e
Fixing class calling error in the documentation of connecting_to_a_feature_store.ipynb (#9508) 1 year ago
IlyaKIS1 fd8fe209cb
Added In-Depth Langchain Agent Execution Guide (#9507)
Made the notion document of how Langchain executes agents method by
method in the codebase.
Can be helpful for developers that just started working with the
Langchain codebase.
1 year ago
Eugene Yurtsev e51bccdb28
Add strict flag to the JSON parser (#9471)
This updates the default configuration since I think it's almost always
what we want to happen. But we should evaluate whether there are any issues.
1 year ago
Rosário P. Fernandes 09a92bb9bf
chatbots use case - fix broken collab URL (#9491)
The current Collab URL returns a 404, since there is no `chatbots`
directory under `use_cases`.

<!-- Thank you for contributing to LangChain!

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17, @rlancemartin.
 -->
1 year ago
Stan Girard a214fe8a2d
docs(readme): fixed badges with new github url (#9493)
Mainly created for the code space url that was broken but fixed the
others in the same PR.
1 year ago
bsenst a956b69720
fix typo in huggingface_hub.ipynb (#9499) 1 year ago
Bagatur d87cfd33e8
Update pydantic compatibility guide (#9496) 1 year ago
Taqi Jaffri 069c0a041f comment update for poetry install 1 year ago
Taqi Jaffri 5cd244e9b7 CR feedback 1 year ago
Predrag Gruevski be9bc62f8b
Fix bash test regex for Linux under WSL2. (#9475)
It fails with `Permission denied` and not `not found`. Both seem
reasonable.
1 year ago
Ikko Eltociear Ashimine 0808949e54
Fix typo in apis.ipynb (#9490)
funtions -> functions
1 year ago
RajneeshSinghShorthillsAI 129d056085
fixed spelling mistake and added missing bracket in parent_document_r… (#9380)
…etriever.ipynb


Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
1 year ago
Lorenzo 5b3dbf12a5
Uniform valid suffixes and clarify exceptions (#9463)
**Description**:
- Uniformed the current valid suffixes (file formats) for loading agents
from hubs and files (to better handle future additions);
 - Clarified exception messages (also in unit test).
1 year ago
Brendan Collins 9f545825b7
Added Geometry Validation, Geometry Metadata, and WKT instead of Python str() to GeoDataFrame Loader (#9466)
@rlancemartin The current implementation within `Geopandas.GeoDataFrame`
loader uses the python builtin `str()` function on the input geometries.
While this looks very close to WKT (Well known text), Python's str
function doesn't guarantee that.

In the interest of interop., I've changed to the of use `wkt` property
on the Shapely geometries for generating the text representation of the
geometries.

Also, included here:
- validation of the input `page_content_column` as being a GeoSeries.
- geometry `crs` (Coordinate Reference System) / bounds
(xmin/ymin/xmax/ymax) added to Document metadata. Having the CRS is
critical... having the bounds is just helpful!

I think there is a larger question of "Should the geometry live in the
`page_content`, or should the record be better summarized and tuck the
geom into metadata?" ...something for another day and another PR.
1 year ago
Kacper Łukawski 616e728ef9
Enhance qdrant vs using async embed documents (#9462)
This is an extension of #8104. I updated some of the signatures so all
the tests pass.

@danhnn I couldn't commit to your PR, so I created a new one. Thanks for
your contribution!

@baskaryan Could you please merge it?

---------

Co-authored-by: Danh Nguyen <dnncntt@gmail.com>
1 year ago
Matt Robinson 83d2a871eb
fix: apply unstructured preprocess functions (#9473)
### Summary

Fixes a bug from #7850 where post processing functions in Unstructured
loaders were not apply. Adds a assertion to the test to verify the post
processing function was applied and also updates the explanation in the
example notebook.
1 year ago
William FH 292ae8468e
Let you specify run id in trace as chain group (#9484)
I think we'll deprecate this soon anyway but still nice to be able to
fetch the run id
1 year ago
NavanitDubeyShorthillsAI b58d492e05
Update pydantic_compatibility.md (#9382)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
1 year ago
Predrag Gruevski df8e35fd81
Remove incorrect ABC from two Elasticsearch classes. (#9470)
Neither is an ABC because their own example code instantiates them directly.
1 year ago
bsenst 083726ecda
fix small typo (#9464) 1 year ago
Predrag Gruevski 82f28ca9ef
`ChatPromptTemplate` is not an `ABC`, it's instantiated directly. (#9468)
Its own `__add__` method constructs `ChatPromptTemplate` objects
directly, it cannot be abstract.

Found while debugging something else with @nfcampos.
1 year ago
vamseeyarla 82fb56b79c
Issue 9401 - SequentialChain runs the same callbacks over and over in async mode (#9452)
Issue: https://github.com/langchain-ai/langchain/issues/9401

In the Async mode, SequentialChain implementation seems to run the same
callbacks over and over since it is re-using the same callbacks object.

Langchain version: 0.0.264, master

The implementation of this aysnc route differs from the sync route and
sync approach follows the right pattern of generating a new callbacks
object instead of re-using the old one and thus avoiding the cascading
run of callbacks at each step.

Async mode:
```
        _run_manager = run_manager or AsyncCallbackManagerForChainRun.get_noop_manager()
        callbacks = _run_manager.get_child()
        ...
        for i, chain in enumerate(self.chains):
            _input = await chain.arun(_input, callbacks=callbacks)
            ...
```

Regular mode:
```
        _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
        for i, chain in enumerate(self.chains):
            _input = chain.run(_input, callbacks=_run_manager.get_child(f"step_{i+1}"))
            ...
```

Notice how we are reusing the callbacks object in the Async code which
will have a cascading effect as we run through the chain. It runs the
same callbacks over and over resulting in issues.

Solution:
Define the async function in the same pattern as the regular one and
added tests.
---------

Co-authored-by: vamsee_yarlagadda <vamsee.y@airbnb.com>
1 year ago
Leonid Ganeline 99e5eaa9b1
`InternLM` example (#9465)
Added `InternML` model example to the HubbingFace Hub notebook
1 year ago
William FH d4f790fd40
Fix imports in notebook (#9458) 1 year ago
William FH c29fbede59
Wfh/rm num repetitions (#9425)
Makes it hard to do test run comparison views and we'd probably want to
just run multiple runs right now
1 year ago
Predrag Gruevski eee0d1d0dd
Update repository links in the package metadata. (#9454) 1 year ago
Predrag Gruevski ade683c589
Rely on `WORKDIR` env var to avoid ugly ternary operators in workflows. (#9456)
Ternary operators in GitHub Actions syntax are pretty ugly and hard to
read: `inputs.working-directory == '' && '.' ||
inputs.working-directory` means "if the condition is true, use `'.'` and
otherwise use the expression after the `||`".

This PR performs the ternary as few times as possible, assigning its
outcome to an env var we can then reuse as needed.
1 year ago
Bagatur 50b8f4dcc7
bump 268 (#9455) 1 year ago