docs: remove nltk download steps (#1253)

### Summary

Updates the docs to remove the `nltk` download steps from
`unstructured`. As of `unstructured` `0.4.14`, this is handled
automatically in the relevant modules within `unstructured`.
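
For readers wondering what "handled automatically" can look like, here is a minimal sketch of one common pattern: check for the NLTK resource and download it only when it is missing. This is not taken from the `unstructured` source. The helper name `ensure_nltk_resource` and its `find`/`download` parameters are hypothetical, introduced only so the logic can be exercised without network access. The defaults assume `nltk.data.find` (which raises `LookupError` for a missing resource) and `nltk.download`.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download `package` only if `resource_path` is not already installed.

    Hypothetical helper for illustration; not the actual `unstructured` code.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        # Import lazily so the helper can be tested with stand-in callables.
        import nltk

        find = find or nltk.data.find
        download = download or nltk.download

    try:
        # nltk.data.find raises LookupError when the resource is missing.
        find(resource_path)
        return False
    except LookupError:
        download(package)
        return True
```

With `nltk` installed, calling `ensure_nltk_resource("tokenizers/punkt", "punkt")` and `ensure_nltk_resource("taggers/averaged_perceptron_tagger", "averaged_perceptron_tagger")` would cover the two resources that the removed manual steps downloaded.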
Matt Robinson 2023-02-23 15:34:44 -05:00 committed by GitHub
parent 5bc6dc076e
commit 10e73a3723

@@ -17,10 +17,6 @@ This page is broken into two parts: installation and setup, and then references
 - `poppler-utils`
 - `tesseract-ocr`
 - `libreoffice`
-- Run the following to install NLTK dependencies. `unstructured` will handle this automatically
-soon.
-- `python -c "import nltk; nltk.download('punkt')"`
-- `python -c "import nltk; nltk.download('averaged_perceptron_tagger')"`
 - If you are parsing PDFs, run the following to install the `detectron2` model, which
 `unstructured` uses for layout detection:
 - `pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2"`