Commit Graph

11 Commits (4fd1efc48f6ccd667b28fe8639d24ff47faf37b7)

Author SHA1 Message Date
Brice Fotzo 034a8c7c1b
community: support advanced text extraction options for pdf documents (#20265)
**Description:** 
- Updated constructors in PyPDFParser and PyPDFLoader to handle
`extraction_mode` and additional kwargs, aligning with the capabilities
of `PageObject.extract_text()` from pypdf.

- Added `test_pypdf_loader_with_layout` along with a corresponding
example text file to validate layout extraction from PDFs.

**Issue:** fixes #19735 

**Dependencies:** This change requires updating the pypdf dependency
from version 3.4.0 to at least 4.0.0.

The new test `test_pypdf_loader_with_layout` and its example text file
check that layout extraction from PDFs matches the new capabilities.
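
A minimal sketch of how the new option might be used, assuming `langchain-community` and pypdf 4.0.0 or later are installed. The helper name `load_pdf_pages` is hypothetical; `extraction_mode` is the constructor argument named above, and `"plain"` and `"layout"` are the modes accepted by pypdf's `PageObject.extract_text()`.

```python
from typing import List

# Modes accepted by pypdf's PageObject.extract_text(); "plain" is the default.
VALID_MODES = ("plain", "layout")


def load_pdf_pages(path: str, extraction_mode: str = "layout") -> List[str]:
    """Return the text of each page of a PDF (hypothetical helper)."""
    if extraction_mode not in VALID_MODES:
        raise ValueError(f"extraction_mode must be one of {VALID_MODES}")

    # Imported lazily so this module can be imported without the dependency.
    from langchain_community.document_loaders import PyPDFLoader

    # extraction_mode is forwarded to pypdf when each page's text is extracted.
    loader = PyPDFLoader(path, extraction_mode=extraction_mode)
    return [doc.page_content for doc in loader.load()]
```

Calling `load_pdf_pages("example.pdf")` (a placeholder path) would return one string per page, with whitespace arranged to approximate the page layout rather than collapsed into plain running text.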

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2 months ago
Erick Friis 551640a030
templates: remove lockfiles (#22920)
poetry will default to latest versions without
3 months ago
Bagatur 50186da0a1
infra: rm unused # noqa violations (#22049)
Updating #21137
4 months ago
Erick Friis d83b720c40
templates: readme langsmith not private beta (#20173) 5 months ago
Erick Friis 49f3cc0f6b
templates: bump lockfile deps (#19001) 6 months ago
Erick Friis 1317578ad1
templates: use langchain-text-splitters (#18360)
- deps
- import
7 months ago
Bagatur 5efb5c099f
text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346) 7 months ago
Erick Friis 3a2eb6e12b
infra: add print rule to ruff (#16221)
Added noqa for existing prints. These can be removed gradually, and the
rule will prevent new ones from being introduced.
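
As an illustration of the suppression described above, here is a sketch that assumes the print rule is Ruff's `T201` (the flake8-print check for `print` calls). The function name is hypothetical.

```python
def report_progress(step: int, total: int) -> str:
    """Format a progress message and print it (hypothetical helper)."""
    message = f"step {step}/{total}"
    # Pre-existing print kept for now; the inline noqa silences only T201
    # on this line, so any new print elsewhere would still be flagged.
    print(message)  # noqa: T201
    return message
```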
7 months ago
Erick Friis 64785822dc
templates: bump (#17074) 8 months ago
Sagar B Manjunath 63e2acc964
docs: Fix minor issues in NVIDIA RAG canonical template (#16189)
- **Description:** Fixes a few issues in the NVIDIA canonical RAG template's
README, and adds a notebook for the template
- **Dependencies:** Adds the pypdf dependency which is needed for
ingestion, and updates the lock file

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
8 months ago
Sagar B Manjunath e6240fecab
templates: Add NVIDIA Canonical RAG example chain (#15758)
- **Description:** Adds a RAG template that uses NVIDIA AI playground
and embedding models, along with Milvus vector store

- **Dependencies:** This template depends on the AI playground service
in NVIDIA NGC. API keys with a substantial trial compute allowance are
available (10k queries at the time of writing). This template also depends
on the Milvus vector store, which is publicly available.

Note: [A quick link to get a
key](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/codellama-13b/api)
is available once you have an NGC account. Use the Generate Key button at
the top right of the code window.

---------

Co-authored-by: Sagar B Manjunath <sbogadimanju@nvidia.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
8 months ago