langchain/libs
Erik 4e0a6ebe7d
community: Add warning when page_content is empty (#25955)
Page content is sometimes empty when PyMuPDF cannot find any text on a page.
For example, this can happen when the text of the PDF cannot be selected and
copied "by hand". In that case an OCR solution is needed, which is not integrated here.

This warning tells the user that some pages are lost
during this process.
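
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual LangChain code: the function name `collect_page_texts`, its parameters, and the warning wording are assumptions made for the example, and the page texts are passed in as plain strings so the sketch runs without PyMuPDF installed.

```python
import warnings
from typing import Iterable, List


def collect_page_texts(page_texts: Iterable[str], source: str = "example.pdf") -> List[str]:
    """Return the page texts unchanged, warning once per page that has no text.

    Hypothetical helper: `page_texts` stands in for whatever a PDF parser
    extracted from each page; `source` is only used in the warning message.
    """
    collected = []
    for page_number, text in enumerate(page_texts):
        # A page with no extractable text (e.g. a scanned image) yields an
        # empty or whitespace-only string; without OCR its content is lost.
        if not text.strip():
            warnings.warn(
                f"Empty content on page {page_number} of document {source}",
                stacklevel=2,
            )
        collected.append(text)
    return collected
```

Using `warnings.warn` rather than raising an exception keeps loading backwards compatible: documents still load, but the user is told which pages came back empty.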

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-19 05:22:09 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| cli | langchain-cli: release 0.0.31 (#26533) | 2024-09-16 12:57:24 -04:00 |
| community | community: Add warning when page_content is empty (#25955) | 2024-09-19 05:22:09 +00:00 |
| core | core: Add N(naming) ruff rules (#25362) | 2024-09-19 05:09:39 +00:00 |
| experimental | Add check for prompt based approach in llm graph transformer (#26519) | 2024-09-16 15:01:09 -07:00 |
| langchain | core[patch]: Fix "argument of type 'NoneType' is not iterable" error in LangChainTracer (#26576) | 2024-09-17 10:29:46 -07:00 |
| partners | langchain_chroma: Pass through kwargs to Chroma collection.delete (#25970) | 2024-09-19 04:21:24 +00:00 |
| standard-tests | anthropic[patch]: fix tool call and tool res image_url handling (#26587) | 2024-09-17 14:30:07 -07:00 |
| text-splitters | text-splitters: release 0.3 (#26460) | 2024-09-13 22:31:06 +00:00 |