Mirror of https://github.com/hwchase17/langchain, synced 2024-11-13 19:10:52 +00:00
7a9149f5dd
# OCR-based PDF loader

This implements a [Zerox](https://github.com/getomni-ai/zerox) PDF document loader. Zerox uses a simple but powerful approach to parsing PDF documents, at the cost of being slower and more expensive than the alternatives: it converts the PDF to a series of images and passes them to a vision model, requesting the contents in markdown.

It is especially suitable for complex PDFs that other loaders do not parse well.

## Example use:

```python
import os

from langchain_community.document_loaders.pdf import ZeroxPDFLoader

os.environ["OPENAI_API_KEY"] = ""  # your API key
model = "gpt-4o-mini"  # OpenAI model
pdf_url = "https://assets.ctfassets.net/f1df9zr7wr1a/soP1fjvG1Wu66HJhu3FBS/034d6ca48edb119ae77dec5ce01a8612/OpenAI_Sacra_Teardown.pdf"

loader = ZeroxPDFLoader(file_path=pdf_url, model=model)
docs = loader.load()
```

The Zerox library supports a wide range of providers/models. See the Zerox documentation for details.

- **Dependencies:** `zerox`
- **Twitter handle:** @martintriska1

If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
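
The approach described in the commit message above (render each page to an image, then ask a vision model for markdown) can be illustrated with a minimal sketch. This is not the Zerox or `ZeroxPDFLoader` implementation: `PageDocument`, `pages_to_documents`, and `fake_vision_model` are hypothetical names introduced only for illustration, and the vision model call is replaced by a stub so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class PageDocument:
    """Hypothetical stand-in for one page of loader output."""

    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def pages_to_documents(
    page_images: Iterable[bytes],
    describe_page: Callable[[bytes], str],
    source: str,
) -> List[PageDocument]:
    """Send each page image to a model callback and collect the markdown.

    `describe_page` is an assumed callback that takes the raw image bytes
    and returns markdown; in a real pipeline it would call a vision model.
    """
    docs = []
    for page_number, image in enumerate(page_images, start=1):
        markdown = describe_page(image)
        docs.append(
            PageDocument(
                page_content=markdown,
                metadata={"source": source, "page": page_number},
            )
        )
    return docs


def fake_vision_model(image: bytes) -> str:
    """Stub standing in for the vision model; returns placeholder markdown."""
    return f"# Page with {len(image)} bytes"


# Two fake "page images" stand in for the rendered PDF pages.
example_docs = pages_to_documents(
    [b"abc", b"defgh"], fake_vision_model, source="example.pdf"
)
```

Because every page requires one model call, cost and latency grow with page count, which matches the trade-off the commit message mentions.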
- api_reference
- cassettes
- data
- docs
- scripts
- src
- static
- .gitignore
- .yarnrc.yml
- babel.config.js
- docusaurus.config.js
- ignore-step.sh
- Makefile
- package.json
- README.md
- sidebars.js
- vercel_requirements.txt
- vercel.json
- yarn.lock
# LangChain Documentation

For more information on contributing to our documentation, see the Documentation Contributing Guide.