diff --git a/docs/integrations/unstructured.md b/docs/integrations/unstructured.md
index 6c07bb65..fb3ba2a5 100644
--- a/docs/integrations/unstructured.md
+++ b/docs/integrations/unstructured.md
@@ -4,8 +4,7 @@
 [Unstructured.IO](https://www.unstructured.io/) extracts clean text from raw source documents like
 PDFs and Word documents.
 This page covers how to use the [`unstructured`](https://github.com/Unstructured-IO/unstructured)
-ecosystem within LangChain.
-
+ecosystem within LangChain.
 
 ## Installation and Setup
 
@@ -20,12 +19,6 @@ its dependencies running locally.
     - `tesseract-ocr`(images and PDFs)
     - `libreoffice` (MS Office docs)
     - `pandoc` (EPUBs)
-- If you are parsing PDFs using the `"hi_res"` strategy, run the following to install the `detectron2` model, which
-  `unstructured` uses for layout detection:
-    - `pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@e2ce8dc#egg=detectron2"`
-    - If `detectron2` is not installed, `unstructured` will fallback to processing PDFs
-      using the `"fast"` strategy, which uses `pdfminer` directly and doesn't require
-      `detectron2`.
 
 If you want to get up and running with less set up, you can simply run `pip install unstructured`
 and use `UnstructuredAPIFileLoader` or
diff --git a/docs/modules/indexes/document_loaders/examples/unstructured_file.ipynb b/docs/modules/indexes/document_loaders/examples/unstructured_file.ipynb
index e391f1ac..8bccef0d 100644
--- a/docs/modules/indexes/document_loaders/examples/unstructured_file.ipynb
+++ b/docs/modules/indexes/document_loaders/examples/unstructured_file.ipynb
@@ -19,7 +19,6 @@
    "source": [
     "# # Install package\n",
     "!pip install \"unstructured[local-inference]\"\n",
-    "!pip install \"detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2\"\n",
     "!pip install layoutparser[layoutmodels,tesseract]"
    ]
   },