langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-18 09:25:54 +00:00

Author	SHA1	Message	Date
Predrag Gruevski	9f08d29bc8	Use PyPI Trusted Publishing to publish langchain packages. (#9467 ) Trusted Publishing is the current best practice for publishing Python packages. Rather than long-lived secret keys, it uses OpenID Connect (OIDC) to allow our GitHub runner to directly authenticate itself to PyPI and get a short-lived publishing token. This locks down publishing quite a bit: - There's no long-lived publish key to steal anymore. - Publishing is only allowed via the specifically designated GitHub workflow in the designated repo. It also is operationally easier: no keys means there's nothing that needs to be periodically rotated, nothing to worry about leaking, and nobody can accidentally publish a release from their laptop because they happened to have PyPI keys set up. After this gets merged, we'll need to configure PyPI to start expecting trusted publishing. It's only a few clicks and should only take a minute; instructions are here: https://docs.pypi.org/trusted-publishers/adding-a-publisher/ More info: - https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/ - https://github.com/pypa/gh-action-pypi-publish	2023-08-21 14:44:29 -04:00
Predrag Gruevski	249752e8ee	Require manually triggering release workflows. (#9552 )	2023-08-21 13:54:44 -04:00
Raynor Chavez	973866c894	fix: Updated marqo integration for marqo version 1.0.0+ (#9521 ) - Description: Updated marqo integration to use tensor_fields instead of non_tensor_fields. Upgraded marqo version to 1.2.4 - Dependencies: marqo 1.2.4 --------- Co-authored-by: Raynor Kirkson E. Chavez <raynor.chavez@192.168.254.171> Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-21 10:43:15 -07:00
Predrag Gruevski	b2e6d01e8f	Add `SECURITY.md` file to the repo. (#9551 )	2023-08-21 13:39:59 -04:00
Predrag Gruevski	875ea4b4c6	Fix conditional that erroneously always runs. (#9543 ) The input it means to test for is `"libs/langchain"` and not `"langchain"`.	2023-08-21 13:24:33 -04:00
Bagatur	c7a5bb6031	bump 270 (#9549 )	2023-08-21 10:18:46 -07:00
Nuno Campos	28e1ee4891	Nc/small fixes 21aug (#9542 )	2023-08-21 18:01:20 +01:00
Predrag Gruevski	a7eba8b006	Release on push to `master` instead of on closed PRs targeting it. (#9544 ) This is safer than the prior approach, since it's safe by default: the release workflows never get triggered for non-merged PRs, so there's no possibility of a buggy conditional accidentally letting a workflow proceed when it shouldn't have. The only loss is that publishing no longer requires a `release` label on the merged PR that bumps the version. We can add a separate CI step that enforces that part as a condition for merging into `master`, if desirable.	2023-08-21 12:57:40 -04:00
Bagatur	d11841d760	bump 269 (#9487 )	2023-08-21 08:34:16 -07:00
axiangcoding	05aa02005b	feat(llms): support ERNIE Embedding-V1 (#9370 ) - Description: support [ERNIE Embedding-V1](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/alj562vvu), which is part of ERNIE ecology - Issue: None - Dependencies: None - Tag maintainer: @baskaryan --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-21 07:52:25 -07:00
José Ferraz Neto	f116e10d53	Add SharePoint Loader (#4284 ) - Added a loader (`SharePointLoader`) that can pull documents (`pdf`, `docx`, `doc`) from the [SharePoint Document Library](https://support.microsoft.com/en-us/office/what-is-a-document-library-3b5976dd-65cf-4c9e-bf5a-713c10ca2872). - Added a Base Loader (`O365BaseLoader`) to be used for all Loaders that use [O365](https://github.com/O365/python-o365) Package - Code refactoring on `OneDriveLoader` to use the new `O365BaseLoader`. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2023-08-21 07:49:07 -07:00
Utku Ege Tuluk	bb4f7936f9	feat(llms): add streaming support to textgen (#9295 ) - Description: Added streaming support to the textgen component in the llms module. - Dependencies: websocket-client = "^1.6.1"	2023-08-21 07:39:14 -07:00
Predrag Gruevski	a03003f5fd	Upgrade CI poetry version to 1.5.1. (#9479 ) Poetry v1.5.1 was released on May 29, almost 3 months ago. Probably a safe upgrade.	2023-08-21 10:35:56 -04:00
Yuki Miyake	85a1c6d0b7	🐛 fix unexpected run of release workflow (#9494 ) I have discovered a bug located within `.github/workflows/_release.yml` which is the primary cause of continuous integration (CI) errors. The problem can be solved; therefore, I have constructed a PR to address the issue. ## The Issue Access the following link to view the exact errors: [LangChain Release Workflow](https://github.com/langchain-ai/langchain/actions/workflows/langchain_release.yml) The instances of these errors take place for each PR that updates `pyproject.toml`, excluding those specifically associated with bumping PRs. See below for the specific error message: ``` Error: Error 422: Validation Failed: {"resource":"Release","code":"already_exists","field":"tag_name"} ``` An image of the error can be viewed here: ![Image](https://github.com/langchain-ai/langchain/assets/13769670/13125f73-9b53-49b7-a83e-653bb01a1da1) The `_release.yml` document contains the following if-condition: ```yaml if: \| ${{ github.event.pull_request.merged == true }} && ${{ contains(github.event.pull_request.labels.*.name, 'release') }} ``` ## The Root Cause The above job constantly runs as the `if-condition` is always identified as `true`. ## The Logic The `if-condition` can be defined as `if: ${{ b1 }} && ${{ b2 }}`, where `b1` and `b2` are boolean values. However, in terms of condition evaluation with GitHub Actions, `${{ false }}` is identified as a string value, thereby rendering it as truthy as per the [official documentation](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idif). I have run some tests regarding this behavior within my forked repository. You can consult my [debug PR](https://github.com/zawakin/langchain/pull/1) for reference. 
Here is the result of the tests: \|If-Condition\|Outcome\| \|:--:\|:--:\| \|`if: true && ${{ false }}`\|Executed\| \|`if: ${{ false }}` \|Skipped\| \|`if: true && false` \|Skipped\| \|`if: false`\|Skipped\| \|`if: ${{ true && false }}` \|Skipped\| In view of the first and second results, we can infer that `${{ false }}` is interpreted as `true` only in conditions composed of several expressions. This is consistent with the condition `if: ${{ inputs.working-directory == 'libs/langchain' }}` working as intended. It is surprising that the second case is skipped, but that seems to be the GitHub Actions spec 😓 Anyway, the PR would fix these errors, I believe 👍 Could you review this? @hwchase17 or @shoelsch , who is the author of [PR](https://github.com/langchain-ai/langchain/pull/360).	2023-08-21 10:34:03 -04:00
Harrison Chase	9930ddc555	beef up retrieval docs (#9518 )	2023-08-21 07:22:22 -07:00
Eugene Yurtsev	02c5c13a6e	Fast linters go first (#9501 ) Proposal to reverse the order of linters based on the principle of running the fast ones first.	2023-08-21 00:20:54 -07:00
Leonid Ganeline	fdbeb52756	`Qwen` model example (#9516 ) added an example for `Qwen-7B` model on `HuggingFaceHub` 🤗	2023-08-20 17:21:45 -07:00
Martin Schade	0c8a88b3fa	AmazonTextractPDFLoader documentation updates (#9415 ) Description: Updating documentation to add AmazonTextractPDFLoader according to [comment](https://github.com/langchain-ai/langchain/pull/8661#issuecomment-1666572992) from [baskaryan](https://github.com/baskaryan) Adding one notebook and instructions to the modules/data_connection/document_loaders/pdf.mdx --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-08-20 16:40:15 -07:00
Asif Ahmad	08feed3332	Changed the NIBittensorLLM API URL to the correct one (#9419 ) Changed https://api.neuralinterent.ai/ to https://api.neuralinternet.ai/ which is the valid URL for the API of NIBittensorLLM.	2023-08-20 16:25:19 -07:00
Ofer Mendelevitch	a758496236	Fixed issue with metadata in query (#9500 ) - Description: Changed metadata retrieval so that it combines Vectara doc level and part level metadata - Tag maintainer: @rlancemartin - Twitter handle: @ofermend	2023-08-20 16:00:14 -07:00
EpixMan	103094286e	Fixing class calling error in the documentation of connecting_to_a_feature_store.ipynb (#9508 )	2023-08-20 15:59:40 -07:00
IlyaKIS1	fd8fe209cb	Added In-Depth Langchain Agent Execution Guide (#9507 ) Added a Notion document describing how Langchain executes agents, method by method, in the codebase. It can be helpful for developers who have just started working with the Langchain codebase.	2023-08-20 15:59:01 -07:00
Eugene Yurtsev	e51bccdb28	Add strict flag to the JSON parser (#9471 ) This updates the default configuration since I think it's almost always what we want to happen. But we should evaluate whether there are any issues.	2023-08-19 22:02:12 -04:00
Rosário P. Fernandes	09a92bb9bf	chatbots use case - fix broken collab URL (#9491 ) The current Collab URL returns a 404, since there is no `chatbots` directory under `use_cases`.	2023-08-19 14:53:54 -07:00
Stan Girard	a214fe8a2d	docs(readme): fixed badges with new github url (#9493 ) Mainly created for the code space url that was broken but fixed the others in the same PR.	2023-08-19 14:51:38 -07:00
bsenst	a956b69720	fix typo in huggingface_hub.ipynb (#9499 )	2023-08-19 14:50:05 -07:00
Bagatur	d87cfd33e8	Update pydantic compatibility guide (#9496 )	2023-08-19 14:44:19 -07:00
Predrag Gruevski	be9bc62f8b	Fix bash test regex for Linux under WSL2. (#9475 ) It fails with `Permission denied` and not `not found`. Both seem reasonable.	2023-08-19 09:27:14 -04:00
Ikko Eltociear Ashimine	0808949e54	Fix typo in apis.ipynb (#9490 ) funtions -> functions	2023-08-19 09:26:08 -04:00
RajneeshSinghShorthillsAI	129d056085	fixed spelling mistake and added missing bracket in parent_document_r… (#9380 ) …etriever.ipynb Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-08-18 21:36:56 -07:00
Lorenzo	5b3dbf12a5	Uniform valid suffixes and clarify exceptions (#9463 ) Description: - Made the current valid suffixes (file formats) uniform for loading agents from hubs and files (to better handle future additions); - Clarified exception messages (also in unit test).	2023-08-18 21:35:53 -07:00
Brendan Collins	9f545825b7	Added Geometry Validation, Geometry Metadata, and WKT instead of Python str() to GeoDataFrame Loader (#9466 ) @rlancemartin The current implementation within `Geopandas.GeoDataFrame` loader uses the python builtin `str()` function on the input geometries. While this looks very close to WKT (Well known text), Python's str function doesn't guarantee that. In the interest of interop., I've changed to use the `wkt` property on the Shapely geometries for generating the text representation of the geometries. Also, included here: - validation of the input `page_content_column` as being a GeoSeries. - geometry `crs` (Coordinate Reference System) / bounds (xmin/ymin/xmax/ymax) added to Document metadata. Having the CRS is critical... having the bounds is just helpful! I think there is a larger question of "Should the geometry live in the `page_content`, or should the record be better summarized and tuck the geom into metadata?" ...something for another day and another PR.	2023-08-18 21:35:39 -07:00
Kacper Łukawski	616e728ef9	Enhance qdrant vs using async embed documents (#9462 ) This is an extension of #8104. I updated some of the signatures so all the tests pass. @danhnn I couldn't commit to your PR, so I created a new one. Thanks for your contribution! @baskaryan Could you please merge it? --------- Co-authored-by: Danh Nguyen <dnncntt@gmail.com>	2023-08-18 18:59:48 -07:00
Matt Robinson	83d2a871eb	fix: apply unstructured preprocess functions (#9473 ) ### Summary Fixes a bug from #7850 where post processing functions in Unstructured loaders were not applied. Adds an assertion to the test to verify the post processing function was applied and also updates the explanation in the example notebook.	2023-08-18 18:54:28 -07:00
William FH	292ae8468e	Let you specify run id in trace as chain group (#9484 ) I think we'll deprecate this soon anyway but still nice to be able to fetch the run id	2023-08-18 17:21:53 -07:00
NavanitDubeyShorthillsAI	b58d492e05	Update pydantic_compatibility.md (#9382 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2023-08-18 13:03:15 -07:00
Predrag Gruevski	df8e35fd81	Remove incorrect ABC from two Elasticsearch classes. (#9470 ) Neither is an ABC because their own example code instantiates them directly.	2023-08-18 15:01:02 -04:00
bsenst	083726ecda	fix small typo (#9464 )	2023-08-18 11:55:46 -07:00
Predrag Gruevski	82f28ca9ef	`ChatPromptTemplate` is not an `ABC`, it's instantiated directly. (#9468 ) Its own `__add__` method constructs `ChatPromptTemplate` objects directly, it cannot be abstract. Found while debugging something else with @nfcampos.	2023-08-18 14:37:10 -04:00
vamseeyarla	82fb56b79c	Issue 9401 - SequentialChain runs the same callbacks over and over in async mode (#9452 ) Issue: https://github.com/langchain-ai/langchain/issues/9401 In the Async mode, SequentialChain implementation seems to run the same callbacks over and over since it is re-using the same callbacks object. Langchain version: 0.0.264, master The implementation of this async route differs from the sync route, and the sync approach follows the right pattern of generating a new callbacks object instead of re-using the old one and thus avoiding the cascading run of callbacks at each step. Async mode: ``` _run_manager = run_manager or AsyncCallbackManagerForChainRun.get_noop_manager() callbacks = _run_manager.get_child() ... for i, chain in enumerate(self.chains): _input = await chain.arun(_input, callbacks=callbacks) ... ``` Regular mode: ``` _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager() for i, chain in enumerate(self.chains): _input = chain.run(_input, callbacks=_run_manager.get_child(f"step_{i+1}")) ... ``` Notice how we are reusing the callbacks object in the Async code which will have a cascading effect as we run through the chain. It runs the same callbacks over and over resulting in issues. Solution: Define the async function in the same pattern as the regular one and added tests. --------- Co-authored-by: vamsee_yarlagadda <vamsee.y@airbnb.com>	2023-08-18 11:26:12 -07:00
Leonid Ganeline	99e5eaa9b1	`InternLM` example (#9465 ) Added `InternLM` model example to the HuggingFace Hub notebook	2023-08-18 11:17:17 -07:00
William FH	d4f790fd40	Fix imports in notebook (#9458 )	2023-08-18 10:08:47 -07:00
William FH	c29fbede59	Wfh/rm num repetitions (#9425 ) Makes it hard to do test run comparison views and we'd probably want to just run multiple runs right now	2023-08-18 10:08:39 -07:00
Predrag Gruevski	eee0d1d0dd	Update repository links in the package metadata. (#9454 )	2023-08-18 12:55:43 -04:00
Predrag Gruevski	ade683c589	Rely on `WORKDIR` env var to avoid ugly ternary operators in workflows. (#9456 ) Ternary operators in GitHub Actions syntax are pretty ugly and hard to read: `inputs.working-directory == '' && '.' \|\| inputs.working-directory` means "if the condition is true, use `'.'` and otherwise use the expression after the `\|\|`". This PR performs the ternary as few times as possible, assigning its outcome to an env var we can then reuse as needed.	2023-08-18 12:55:33 -04:00
Bagatur	50b8f4dcc7	bump 268 (#9455 )	2023-08-18 08:46:39 -07:00
AmitSinghShorthillsAI	2b06792c81	Fixing spelling mistakes in fallbacks.ipynb (#9376 ) Fix spelling errors in the text: 'Therefore' and 'Retrying'. I want to stress that your feedback is invaluable to us and is genuinely cherished. With gratitude, @baskaryan @hwchase17	2023-08-18 10:33:47 -04:00
PuneetDhimanShorthillsAI	61e4a06447	Corrected Sentence in router.ipynb (#9377 ) Added missing question marks in the lines in the router.ipynb @baskaryan @hwchase17	2023-08-18 10:32:17 -04:00
呂安	ead04487fd	doc: make install from source clearer (#9433 ) Description: running just `pip install -e .` will not install anything; we have to find the right directory in which to run `pip install -e .`	2023-08-18 10:30:55 -04:00
Predrag Gruevski	8976483f3a	Lint only on the min and max supported Python versions. (#9450 ) Only lint on the min and max supported Python versions. It's extremely unlikely that there's a lint issue on any version in between that doesn't show up on the min or max versions. GitHub rate-limits how many jobs can be running at any one time. Starting new jobs is also relatively slow, so linting on fewer versions makes CI faster.	2023-08-18 10:26:38 -04:00
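
The `WORKDIR` commit (ade683c589) in the log above describes the `cond && a || b` ternary idiom used in GitHub Actions expressions. As a rough analogy only, Python's `and` / `or` also return one of their operands, so the same selection logic can be sketched as below. The function name `resolve_workdir` is hypothetical and does not appear in the repository; the sketch only mirrors the behavior stated in the commit message.

```python
def resolve_workdir(working_directory: str) -> str:
    """Mirror `inputs.working-directory == '' && '.' || inputs.working-directory`.

    Hypothetical helper for illustration; not part of the langchain workflows.
    """
    # `and` yields "." when the input is empty, otherwise False;
    # `or` then falls back to the input itself.
    # The idiom is only safe because the "true" branch (".") is itself truthy.
    return (working_directory == "" and ".") or working_directory


# Compute the value once and reuse it, as the commit does with an env var.
WORKDIR = resolve_workdir("")
print(WORKDIR)
print(resolve_workdir("libs/langchain"))
```

Computing the value a single time and reusing it keeps the hard-to-read expression in one place, which is the design choice the commit message argues for.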

3909 Commits
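
The `SequentialChain` commit (82fb56b79c) in the log above contrasts two loop shapes: the async code obtained one child callbacks object and passed it to every step, while the sync code requested a new child per step. The toy sketch below shows only that structural difference. `ToyRunManager`, `toy_step`, and both runner functions are hypothetical stand-ins, not LangChain classes, and the dictionary used as a "callbacks object" is an assumption made purely for illustration.

```python
import asyncio


class ToyRunManager:
    """Hypothetical stand-in for a run manager; not LangChain's class."""

    def get_child(self, tag=None):
        # Every call returns a brand-new child object.
        return {"tag": tag, "events": []}


async def toy_step(value, callbacks):
    # Hypothetical chain step: records an event on the object it receives.
    callbacks["events"].append(f"ran with {value}")
    return value + 1


async def run_reusing_child(steps, value):
    # Shape of the "Async mode" snippet: one child shared by all steps.
    callbacks = ToyRunManager().get_child()
    seen = []
    for _ in range(steps):
        value = await toy_step(value, callbacks)
        seen.append(callbacks)
    return value, seen


async def run_fresh_child(steps, value):
    # Shape of the "Regular mode" snippet: a new child for each step.
    manager = ToyRunManager()
    seen = []
    for i in range(steps):
        callbacks = manager.get_child(f"step_{i+1}")
        value = await toy_step(value, callbacks)
        seen.append(callbacks)
    return value, seen


_, shared = asyncio.run(run_reusing_child(3, 0))
_, fresh = asyncio.run(run_fresh_child(3, 0))
print(len(shared[0]["events"]))         # events accumulated on the shared object
print([c["tag"] for c in fresh])        # one tagged child per step
```

In the shared variant every step writes to the same object, so state accumulates across steps; in the per-step variant each child sees only its own step.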