docs: added missed `document_loaders` examples (#5150)

# DOCS added missed document_loader examples

Added missed examples: `JSON`, `Open Document Format (ODT)`, `Wikipedia`, `tomarkdown`.
Updated them to a consistent format.

## Who can review?

@hwchase17 @dev2049
12 months ago · 33929489b9
parent c111134a55
commit 33929489b9
4 changed files with 25 additions and 18 deletions
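
Reviewer note: the index entries added in this change are the kind of omission a small script can catch. The following is a minimal sketch, not part of the commit. It assumes the index lists notebooks as indented `./document_loaders/examples/<name>.ipynb` lines, as in the hunks of this diff, and the helper names `listed_notebooks` and `missing_notebooks` are hypothetical.

```python
import re
from pathlib import Path

# Matches indented toctree-style entries such as
# "   ./document_loaders/examples/json.ipynb" and captures the file name.
# The path prefix is taken from the diff; adjust it if the layout differs.
ENTRY_RE = re.compile(
    r"^[ \t]+\./document_loaders/examples/(\S+\.ipynb)[ \t]*$",
    re.MULTILINE,
)


def listed_notebooks(rst_text):
    """Return the set of notebook file names referenced in the index text."""
    return set(ENTRY_RE.findall(rst_text))


def missing_notebooks(rst_text, examples_dir):
    """Return sorted notebook names present on disk but absent from the index."""
    on_disk = {path.name for path in Path(examples_dir).glob("*.ipynb")}
    return sorted(on_disk - listed_notebooks(rst_text))
```

A possible use is to read `docs/modules/indexes/document_loaders.rst` with `Path.read_text()` and pass it, together with the `examples` directory, to `missing_notebooks`. Any names returned are notebooks that no index entry points at.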
--- a/docs/modules/indexes/document_loaders.rst
+++ b/docs/modules/indexes/document_loaders.rst
@@ -40,10 +40,11 @@ For detailed instructions on how to get set up with Unstructured, see installati
   ./document_loaders/examples/file_directory.ipynb
   ./document_loaders/examples/html.ipynb
   ./document_loaders/examples/image.ipynb
-   ./document_loaders/examples/jupyter_notebook.ipynb
+   ./document_loaders/examples/json.ipynb
   ./document_loaders/examples/markdown.ipynb
   ./document_loaders/examples/microsoft_powerpoint.ipynb
   ./document_loaders/examples/microsoft_word.ipynb
+   ./document_loaders/examples/odt.ipynb
   ./document_loaders/examples/pandas_dataframe.ipynb
   ./document_loaders/examples/pdf.ipynb
   ./document_loaders/examples/sitemap.ipynb
@@ -81,6 +82,7 @@ We don't need any access permissions to these datasets and services.
   ./document_loaders/examples/ifixit.ipynb
   ./document_loaders/examples/imsdb.ipynb
   ./document_loaders/examples/mediawikidump.ipynb
+   ./document_loaders/examples/wikipedia.ipynb
   ./document_loaders/examples/youtube_transcript.ipynb


@@ -131,4 +133,5 @@ We need access tokens and sometime other parameters to get access to these datas
   ./document_loaders/examples/slack.ipynb
   ./document_loaders/examples/spreedly.ipynb
   ./document_loaders/examples/stripe.ipynb
+   ./document_loaders/examples/tomarkdown.ipynb
   ./document_loaders/examples/twitter.ipynb
--- a/docs/modules/indexes/document_loaders/examples/json_loader.ipynb
+++ b/docs/modules/indexes/document_loaders/examples/json_loader.ipynb
@@ -4,28 +4,30 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# JSON Files\n",
+    "# JSON\n",
    "\n",
-    "The `JSONLoader` uses a specified [jq schema](https://en.wikipedia.org/wiki/Jq_(programming_language)) to parse the JSON files.\n",
-    "\n",
-    "This notebook shows how to use the `JSONLoader` to load [JSON](https://en.wikipedia.org/wiki/JSON) files into documents. A few examples of `jq` schema extracting different parts of a JSON file are also shown.\n",
+    ">[JSON (JavaScript Object Notation)](https://en.wikipedia.org/wiki/JSON) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values).\n",
    "\n",
+    ">The `JSONLoader` uses a specified [jq schema](https://en.wikipedia.org/wiki/Jq_(programming_language)) to parse the JSON files. It uses the `jq` python package.\n",
    "Check this [manual](https://stedolan.github.io/jq/manual/#Basicfilters) for a detailed documentation of the `jq` syntax."
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 1,
+   "metadata": {
+    "tags": []
+   },
   "outputs": [],
   "source": [
-    "!pip install jq"
+    "#!pip install jq"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
+    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
@@ -359,7 +361,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.9.16"
+   "version": "3.10.6"
  }
 },
 "nbformat": 4,
--- a/docs/modules/indexes/document_loaders/examples/odt.ipynb
+++ b/docs/modules/indexes/document_loaders/examples/odt.ipynb
@@ -5,9 +5,13 @@
   "id": "22a849cc",
   "metadata": {},
   "source": [
-    "## Unstructured ODT Loader\n",
+    "# Open Document Format (ODT)\n",
    "\n",
-    "The `UnstructuredODTLoader` can be used to load Open Office ODT files."
+    ">The [Open Document Format for Office Applications (ODF)](https://en.wikipedia.org/wiki/OpenDocument), also known as `OpenDocument`, is an open file format for word processing documents, spreadsheets, presentations and graphics and using ZIP-compressed XML files. It was developed with the aim of providing an open, XML-based file format specification for office applications.\n",
+    "\n",
+    ">The standard is developed and maintained by a technical committee in the Organization for the Advancement of Structured Information Standards (`OASIS`) consortium. It was based on the Sun Microsystems specification for OpenOffice.org XML, the default format for `OpenOffice.org` and `LibreOffice`. It was originally developed for `StarOffice` \"to provide an open standard for office documents.\"\n",
+    "\n",
+    "The `UnstructuredODTLoader` is used to load `Open Office ODT` files."
   ]
  },
  {
@@ -68,7 +72,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.8.13"
+   "version": "3.10.6"
  }
 },
 "nbformat": 4,
--- a/docs/modules/indexes/document_loaders/examples/tomarkdown.ipynb
+++ b/docs/modules/indexes/document_loaders/examples/tomarkdown.ipynb
@@ -7,7 +7,7 @@
   "source": [
    "# 2Markdown\n",
    "\n",
-    "Uses [2markdown](https://2markdown.com/) to convert any webpage into a standard markdown file"
+    ">[2markdown](https://2markdown.com/) service transforms website content into structured markdown files.\n"
   ]
  },
  {
@@ -17,7 +17,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "# You will need to get your own API key\n",
+    "# You will need to get your own API key. See https://2markdown.com/login\n",
    "\n",
    "api_key = \"\""
   ]
@@ -56,9 +56,7 @@
   "cell_type": "code",
   "execution_count": 8,
   "id": "706304e9",
-   "metadata": {
-    "scrolled": false
-   },
+   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
@@ -220,7 +218,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.9.1"
+   "version": "3.10.6"
  }
 },
 "nbformat": 4,