mirror of https://github.com/hwchase17/langchain synced 2024-11-13 19:10:52 +00:00

Martin Triska 7a9149f5dd community: ZeroxPDFLoader (#27800)	2024-11-07 03:14:57 +00:00

# OCR-based PDF loader

This implements a [Zerox](https://github.com/getomni-ai/zerox) PDF document loader. Zerox uses a simple but very powerful (though slower and more costly) approach to parsing PDF documents: it converts the PDF to a series of images and passes them to a vision model, requesting the contents in markdown. It is especially suitable for complex PDFs that other alternatives do not parse well.

## Example use:

```python
import os

from langchain_community.document_loaders.pdf import ZeroxPDFLoader

os.environ["OPENAI_API_KEY"] = ""  # your API key
model = "gpt-4o-mini"  # OpenAI model
pdf_url = "https://assets.ctfassets.net/f1df9zr7wr1a/soP1fjvG1Wu66HJhu3FBS/034d6ca48edb119ae77dec5ce01a8612/OpenAI_Sacra_Teardown.pdf"

loader = ZeroxPDFLoader(file_path=pdf_url, model=model)
docs = loader.load()
```

The Zerox library supports a wide range of providers/models. See the Zerox documentation for details.

- Dependencies: `zerox`
- Twitter handle: @martintriska1

Co-authored-by: Erick Friis <erickfriis@gmail.com>

api_reference	infra: remove some special cases (#27839)	2024-11-01 21:13:43 +00:00
cassettes	docs: run how-to guides in CI (#27615)	2024-10-30 12:35:38 -04:00
data	docs: 👥 Update LangChain people data (#27022)	2024-10-08 17:09:07 +00:00
docs	community: ZeroxPDFLoader (#27800)	2024-11-07 03:14:57 +00:00
scripts	infra: remove some special cases (#27839)	2024-11-01 21:13:43 +00:00
src	Add nvidia as provider for embedding, llm (#27810)	2024-11-04 19:45:51 +00:00
static	update llm graph transformer documentation (#27905)	2024-11-05 11:54:26 -05:00
.gitignore	infra: cleanup docs build (#21134)	2024-05-01 17:34:05 -07:00
.yarnrc.yml	docs[minor]: Add thumbs up/down to all docs pages (#18526)	2024-03-04 15:14:28 -08:00
babel.config.js	Restructure docs (#11620)	2023-10-10 12:55:19 -07:00
docusaurus.config.js	docs, core: error messaging [wip] (#27397)	2024-10-17 03:39:36 +00:00
ignore-step.sh	multiple: pydantic 2 compatibility, v0.3 (#26443)	2024-09-13 14:38:45 -07:00
Makefile	docs: platforms -> providers (#27285)	2024-10-16 18:27:07 +00:00
package.json	docs: add discussions with giscus (#27172)	2024-10-11 15:14:45 -07:00
README.md	docs: reorganize contributing docs (#27649)	2024-10-25 22:41:54 +00:00
sidebars.js	docs: sidebar capitalization (#27894)	2024-11-04 22:09:32 +00:00
vercel_requirements.txt	docs: add api referencs to langgraph (#26877)	2024-09-26 15:21:10 -04:00
vercel.json	docs: INVALID_CHAT_HISTORY redirect (#27845)	2024-11-01 21:35:11 +00:00
yarn.lock	docs: add discussions with giscus (#27172)	2024-10-11 15:14:45 -07:00

README.md

LangChain Documentation

For more information on contributing to our documentation, see the Documentation Contributing Guide.