Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- **Description:**
Support ChatMlflow.bind_tools method
Tested in Databricks:
<img width="836" alt="image"
src="https://github.com/user-attachments/assets/fa28ef50-0110-4698-8eda-4faf6f0b9ef8">
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
**Description:**
- This PR exposes some functions in VDMS vectorstore, updates VDMS
related notebooks, updates tests, and upgrade version of VDMS (>=0.0.20)
**Issue:** N/A
**Dependencies:**
- Update vdms>=0.0.20
- [ ] **Community**: "Retrievers: Product Quantization"
- [X] This PR adds Product Quantization feature to the retrievers to the
Langchain Community. PQ is one of the fastest retrieval methods if the
embeddings are rich enough in context due to the concepts of
quantization and representation through centroids
- **Description:** Adding PQ as one of the retrievers
- **Dependencies:** using the package nanopq for this PR
- **Twitter handle:** vishnunkumar_
- [X] **Add tests and docs**: If you're adding a new integration, please
include
- [X] Added unit tests for the same in the retrievers.
- [] Will add an example notebook subsequently
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/ -
done the same
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
### Description
This pull request added new document loaders to load documents of
various formats using [Dedoc](https://github.com/ispras/dedoc):
- `DedocFileLoader` (determine file types automatically and parse)
- `DedocPDFLoader` (for `PDF` and images parsing)
- `DedocAPIFileLoader` (determine file types automatically and parse
using Dedoc API without library installation)
[Dedoc](https://dedoc.readthedocs.io) is an open-source library/service
that extracts texts, tables, attached files and document structure
(e.g., titles, list items, etc.) from files of various formats. The
library is actively developed and maintained by a group of developers.
`Dedoc` supports `DOCX`, `XLSX`, `PPTX`, `EML`, `HTML`, `PDF`, images
and more.
Full list of supported formats can be found
[here](https://dedoc.readthedocs.io/en/latest/#id1).
For `PDF` documents, `Dedoc` allows to determine textual layer
correctness and split the document into paragraphs.
### Issue
This pull request extends variety of document loaders supported by
`langchain_community` allowing users to choose the most suitable option
for raw documents parsing.
### Dependencies
The PR added a new (optional) dependency `dedoc>=2.2.5` ([library
documentation](https://dedoc.readthedocs.io)) to the
`extended_testing_deps.txt`
### Twitter handle
None
### Add tests and docs
1. Test for the integration:
`libs/community/tests/integration_tests/document_loaders/test_dedoc.py`
2. Example notebook:
`docs/docs/integrations/document_loaders/dedoc.ipynb`
3. Information about the library:
`docs/docs/integrations/providers/dedoc.mdx`
### Lint and test
Done locally:
- `make format`
- `make lint`
- `make integration_tests`
- `make docs_build` (from the project root)
---------
Co-authored-by: Nasty <bogatenkova.anastasiya@mail.ru>
- **Description:** Add a `KeybertLinkExtractor` for graph vectorstores.
This allows extracting links from keywords in a Document and linking
nodes that have common keywords.
- **Issue:** None
- **Dependencies:** None.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
- **Description:** This allows extracting links between documents with
common named entities using [GLiNER](https://github.com/urchade/GLiNER).
- **Issue:** None
- **Dependencies:** None
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:**
- Updated constructors in PyPDFParser and PyPDFLoader to handle
`extraction_mode` and additional kwargs, aligning with the capabilities
of `PageObject.extract_text()` from pypdf.
- Added `test_pypdf_loader_with_layout` along with a corresponding
example text file to validate layout extraction from PDFs.
**Issue:** fixes#19735
**Dependencies:** This change requires updating the pypdf dependency
from version 3.4.0 to at least 4.0.0.
Additional changes include the addition of a new test
test_pypdf_loader_with_layout and an example text file to ensure the
functionality of layout extraction from PDFs aligns with the new
capabilities.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
You.com is releasing two new conversational APIs — Smart and Research.
This PR:
- integrates those APIs with Langchain, as an LLM
- streaming is supported
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- [x] PR title:
community: Add OCI Generative AI new model support
- [x] PR message:
- Description: adding support for new models offered by OCI Generative
AI services. This is a moderate update of our initial integration PR
16548 and includes a new integration for our chat models under
/langchain_community/chat_models/oci_generative_ai.py
- Issue: NA
- Dependencies: No new Dependencies, just latest version of our OCI sdk
- Twitter handle: NA
- [x] Add tests and docs:
1. we have updated our unit tests
2. we have updated our documentation including a new ipynb for our new
chat integration
- [x] Lint and test:
`make format`, `make lint`, and `make test` run successfully
---------
Co-authored-by: RHARPAZ <RHARPAZ@RHARPAZ-5750.us.oracle.com>
Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
Adding `UpstashRatelimitHandler` callback for rate limiting based on
number of chain invocations or LLM token usage.
For more details, see [upstash/ratelimit-py
repository](https://github.com/upstash/ratelimit-py) or the notebook
guide included in this PR.
Twitter handle: @cahidarda
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
They cause `poetry lock` to take a ton of time, and `uv pip install` can
resolve the constraints from these toml files in trivial time
(addressing problem with #19153)
This allows us to properly upgrade lockfile dependencies moving forward,
which revealed some issues that were either fixed or type-ignored (see
file comments)