langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-06 03:20:49 +00:00

Chetanya Rastogi aead062a70 Add an example tutorial for using PDFMinerPDFasHTMLLoader (#2960) Last week I added the `PDFMinerPDFasHTMLLoader`. I am adding some example code in the notebook to serve as a tutorial for how that loader can be used to create snippets of a PDF that are structured within sections. All the other loaders only provide `Document` objects segmented by page, which is a loose segmentation given the amount of other metadata that can be extracted. With the new loader, one can use the font size of the text to decide when a new section starts and segment the text more semantically, as shown in the tutorial notebook. The cell shows that we are able to find the content of the entire section under Related Work for the example PDF, which is spread across 2 pages and hence is stored as two separate documents by other loaders.		2023-04-16 08:34:39 -07:00
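
The font-size idea described in the commit message can be illustrated with a minimal sketch. This is not the code from the tutorial notebook: it uses only the Python standard library, and the sample HTML, the `FontSizeParser` class, the `split_sections` function, and the `heading_size` threshold are all hypothetical names and data chosen for illustration. It assumes the HTML carries font sizes as inline `font-size:<n>px` styles; real loader output may be structured differently.

```python
import re
from html.parser import HTMLParser


class FontSizeParser(HTMLParser):
    """Collect (font_size, text) runs from inline style attributes."""

    def __init__(self):
        super().__init__()
        self.runs = []
        self._size = None  # font size of the most recent styled tag

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        match = re.search(r"font-size:\s*(\d+)px", style)
        if match:
            self._size = int(match.group(1))

    def handle_data(self, data):
        text = data.strip()
        if text and self._size is not None:
            self.runs.append((self._size, text))


def split_sections(html, heading_size):
    """Group body text under the most recent heading.

    A run counts as a heading when its font size is at least
    `heading_size` (an assumed, document-specific threshold).
    """
    parser = FontSizeParser()
    parser.feed(html)
    sections = {}
    current = None
    for size, text in parser.runs:
        if size >= heading_size:
            current = text
            sections[current] = []
        elif current is not None:
            sections[current].append(text)
    return {heading: " ".join(body) for heading, body in sections.items()}


# Hypothetical sample: one section whose body spans two "pages".
SAMPLE_HTML = """
<div><span style="font-family: Times; font-size:14px">Related Work</span></div>
<div><span style="font-family: Times; font-size:10px">First paragraph on page one.</span></div>
<div><span style="font-family: Times; font-size:10px">Continuation on page two.</span></div>
<div><span style="font-family: Times; font-size:14px">Method</span></div>
<div><span style="font-family: Times; font-size:10px">Method details.</span></div>
"""

sections = split_sections(SAMPLE_HTML, heading_size=14)
print(sections["Related Work"])
```

Because the grouping follows headings rather than page boundaries, text that continues onto a second page stays in the same section. A fixed threshold is the simplest choice here; a real document would likely need the heading size inferred from the font sizes actually present.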
_static	docs: Quick fix to Mendable Search (#2876)	2023-04-13 23:15:57 -07:00
ecosystem	Comet callback updates (#2889)	2023-04-14 13:19:58 -07:00
getting_started	add: conda installation instructions (#2678)	2023-04-10 20:54:13 -07:00
modules	Add an example tutorial for using PDFMinerPDFasHTMLLoader (#2960)	2023-04-16 08:34:39 -07:00
reference	Adding milvus/zilliz into docs (#2686)	2023-04-10 18:08:41 -07:00
tracing	bump version to 131 (#2391)	2023-04-04 07:21:50 -07:00
use_cases	Fix typos (#2977)	2023-04-16 08:28:36 -07:00
conf.py	docs: Mendable Search integration (#2803)	2023-04-13 21:52:25 -07:00
deployments.md	Add link to repo for deploying LangChain to Digitalocean App Platform (#2894)	2023-04-14 08:55:21 -07:00
ecosystem.rst	Docs refactor (#480)	2023-01-02 08:24:09 -08:00
gallery.rst	[Docs] minor fixes to loaders links and rst warnings (#2846)	2023-04-13 10:54:40 -07:00
glossary.md	big docs refactor (#1978)	2023-03-26 19:49:46 -07:00
index.rst	cr	2023-04-09 13:10:46 -07:00
make.bat	initial commit	2022-10-24 14:51:15 -07:00
Makefile	Feature: linkcheck-action (#534) (#542)	2023-01-04 21:39:50 -08:00
model_laboratory.ipynb	big docs refactor (#1978)	2023-03-26 19:49:46 -07:00
reference.rst	Feature: linkcheck-action (#534) (#542)	2023-01-04 21:39:50 -08:00
requirements.txt	Harrison/docs reqs (#2199)	2023-03-30 08:20:30 -07:00
tracing.md	Harrison/tracing docs (#806)	2023-01-29 20:49:35 -08:00