docs: fix pdf docs hierarchy and formatting (#4593)

# Fix pdf loader docs page


![image](https://github.com/hwchase17/langchain/assets/707699/4a11f379-00ed-4f7a-9870-71f74e0cadc6)

Using h1 headings messes with the page hierarchy. This fixes that and moves the
PyPDFium2 loader out of the middle of the PDFMiner docs.
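
A quick way to spot this kind of problem is to list the markdown headings in a notebook and flag every h1 after the first. The sketch below is not part of this commit. The helper names `notebook_headings` and `extra_h1s` are hypothetical, and so is the sample notebook, including its `# PDF` title. It assumes ATX-style (`#`) headings and does not skip `#` lines inside fenced code within markdown cells.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)")


def notebook_headings(nb):
    """Return (level, title) for each markdown heading in a notebook dict."""
    headings = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", [])
        # The notebook format allows "source" to be a string or a list of strings.
        text = source if isinstance(source, str) else "".join(source)
        for line in text.splitlines():
            match = HEADING.match(line)
            if match:
                headings.append((len(match.group(1)), match.group(2).strip()))
    return headings


def extra_h1s(headings):
    """Return the titles of every h1 after the first one."""
    h1_titles = [title for level, title in headings if level == 1]
    return h1_titles[1:]


# Hypothetical notebook mirroring the pre-fix layout: a stray h1 mid-page.
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# PDF"]},
        {"cell_type": "markdown", "source": ["## Using PDFMiner"]},
        {"cell_type": "code", "source": ["data = loader.load()"]},
        {"cell_type": "markdown", "source": ["# Using PyPDFium2"]},
    ]
}

print(extra_h1s(notebook_headings(nb)))  # ['Using PyPDFium2']
```

For a real notebook, the dict could come from `json.load` on the `.ipynb` file.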
Tim Asp 1 year ago committed by GitHub
parent 36f9e9a0ba
commit ed0d557ede
GPG Key ID: 4AEE18F83AFDEB23

@@ -337,75 +337,73 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "05187b33", "id": "96351714",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "21998d18",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Using PDFMiner" "## Using PyPDFium2"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 7, "execution_count": 1,
"id": "2f0cc9ff", "id": "003fcc1d",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from langchain.document_loaders import PDFMinerLoader" "from langchain.document_loaders import PyPDFium2Loader"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 8, "execution_count": 3,
"id": "42b531e8", "id": "46766e29",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"loader = PDFMinerLoader(\"example_data/layout-parser-paper.pdf\")" "loader = PyPDFium2Loader(\"example_data/layout-parser-paper.pdf\")"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 9, "execution_count": 9,
"id": "483720b5",
"metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"data = loader.load()" "data = loader.load()"
] ],
"metadata": {
"collapsed": false
}
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "96351714",
"metadata": {},
"source": [ "source": [
"# Using PyPDFium2" "## Using PDFMiner"
] ],
"metadata": {
"collapsed": false
}
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 1, "execution_count": 7,
"id": "003fcc1d",
"metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from langchain.document_loaders import PyPDFium2Loader" "from langchain.document_loaders import PDFMinerLoader"
] ],
"metadata": {
"collapsed": false
}
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 3, "execution_count": 8,
"id": "46766e29",
"metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"loader = PyPDFium2Loader(\"example_data/layout-parser-paper.pdf\")" "loader = PDFMinerLoader(\"example_data/layout-parser-paper.pdf\")"
] ],
"metadata": {
"collapsed": false
}
}, },
{ {
"cell_type": "code", "cell_type": "code",
@@ -422,7 +420,7 @@
"id": "c90a5fe8", "id": "c90a5fe8",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Using PDFMiner to generate HTML text" "### Using PDFMiner to generate HTML text"
] ]
}, },
{ {
