fix: to rag-semi-structured template (#14568)

**Description:** 

Fixes to rag-semi-structured template.

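- Added the required libraries: `opencv-python`, `pandas`, `pytesseract`, `pdfminer-six`, `unstructured-pytesseract`, and `unstructured-inference`.
- `pdfminer` was causing issues when installed with pip; `pdfminer.six` works best.
- Narrowed the supported Python range from `>=3.8.1,<4.0` to `>=3.9,<3.11`.
- Changed the demo PDF from `LLaMA2.pdf` to `LLaVA.pdf` and added the missing `/` between the docs path and the file name.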
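
To confirm that the libraries this change depends on are actually installed, one option is to query package metadata. The following is a minimal sketch and not part of this PR: the helper name `missing_distributions` is hypothetical, and the distribution names are copied from the `pyproject.toml` changes in this commit.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            # Raises PackageNotFoundError when the distribution is absent.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Distribution names as they appear in this commit's pyproject.toml changes.
REQUIRED = [
    "opencv-python",
    "pandas",
    "pytesseract",
    "pdfminer-six",
    "unstructured-pytesseract",
    "unstructured-inference",
]

if __name__ == "__main__":
    # Prints the subset of REQUIRED that is not installed in this environment.
    print(missing_distributions(REQUIRED))
```

An empty list means every listed distribution was found; this checks presence only, not the version constraints.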


pull/13850/head
Shaurya Rohatgi 7 months ago committed by GitHub
parent a019183a01
commit a4992ffada
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@@ -8,7 +8,7 @@ authors = [
 readme = "README.md"
 [tool.poetry.dependencies]
-python = ">=3.8.1,<4.0"
+python = ">=3.9,<3.11"
 langchain = ">=0.0.325"
 tiktoken = ">=0.5.1"
 chromadb = ">=0.4.14"
@@ -16,6 +16,12 @@ openai = "<2"
 unstructured = ">=0.10.19"
 pdf2image = ">=1.16.3"
 pdfminer = "^20191125"
+opencv-python = "^4.8.1.78"
+pandas = "^2.1.4"
+pytesseract = "^0.3.10"
+pdfminer-six = "^20221105"
+unstructured-pytesseract = "^0.3.12"
+unstructured-inference = "^0.7.18"
 [tool.poetry.group.dev.dependencies]
 langchain-cli = ">=0.0.15"

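The dependency changes set the Python constraint to `>=3.9,<3.11`. A minimal sketch of the same bounds as a runtime check is below; the function name `python_is_supported` is hypothetical and the check is not part of the template.

```python
import sys

MIN_VERSION = (3, 9)   # inclusive lower bound, from python = ">=3.9,<3.11"
MAX_VERSION = (3, 11)  # exclusive upper bound


def python_is_supported(version=None):
    """Return True if `version` (default: the running interpreter) is in range."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch releases do not affect the range.
    return MIN_VERSION <= tuple(version[:2]) < MAX_VERSION


if __name__ == "__main__":
    print(python_is_supported())
```
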
@@ -16,7 +16,7 @@ from unstructured.partition.pdf import partition_pdf
 # Path to docs
 path = "docs"
 raw_pdf_elements = partition_pdf(
-    filename=path + "LLaMA2.pdf",
+    filename=path + "/LLaVA.pdf",
     # Unstructured first finds embedded image blocks
     extract_images_in_pdf=False,
     # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles
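
The old `filename` value joined `path` and the file name with no separator, which produces `docsLLaMA2.pdf`; the new value adds the `/` by hand. As an alternative that avoids manual separators, the path could be assembled with `pathlib`. This is a sketch, not the template's code: `pdf_path` is a hypothetical helper, and the directory and file name are the ones shown in the diff.

```python
from pathlib import Path


def pdf_path(directory, filename):
    """Join a directory and a file name, returning a forward-slash path string."""
    # Path inserts the separator itself, so a missing or trailing "/"
    # in `directory` cannot produce a malformed path.
    return (Path(directory) / filename).as_posix()


if __name__ == "__main__":
    # Plain concatenation without a separator runs the two names together.
    print("docs" + "LLaVA.pdf")
    # The joined form can be passed wherever a string file name is expected.
    print(pdf_path("docs", "LLaVA.pdf"))
```
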
