From 10e73a372370575f7c56d4eed03adcb7155382e0 Mon Sep 17 00:00:00 2001
From: Matt Robinson
Date: Thu, 23 Feb 2023 15:34:44 -0500
Subject: [PATCH] docs: remove nltk download steps (#1253)

### Summary

Updates the docs to remove the `nltk` download steps from `unstructured`.
As of `unstructured` `0.4.14`, this is handled automatically in the relevant
modules within `unstructured`.
---
 docs/ecosystem/unstructured.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/docs/ecosystem/unstructured.md b/docs/ecosystem/unstructured.md
index a7a32a00..1133688a 100644
--- a/docs/ecosystem/unstructured.md
+++ b/docs/ecosystem/unstructured.md
@@ -17,10 +17,6 @@ This page is broken into two parts: installation and setup, and then references
   - `poppler-utils`
   - `tesseract-ocr`
   - `libreoffice`
-- Run the following to install NLTK dependencies. `unstructured` will handle this automatically
-  soon.
-  - `python -c "import nltk; nltk.download('punkt')"`
-  - `python -c "import nltk; nltk.download('averaged_perceptron_tagger')"`
 - If you are parsing PDFs, run the following to install the `detectron2` model, which `unstructured` uses for layout detection:
   - `pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2"`