From 33929489b941b0e2a9a66c4dc9952a5e09919e5f Mon Sep 17 00:00:00 2001
From: Leonid Ganeline
Date: Tue, 23 May 2023 21:56:41 -0700
Subject: [PATCH] docs: added missed `document_loaders` examples (#5150)

# DOCS added missed document_loader examples

Added missed examples: `JSON`, `Open Document Format (ODT)`, `Wikipedia`, `tomarkdown`.
Updated them to a consistent format.

## Who can review?

@hwchase17
@dev2049
---
 docs/modules/indexes/document_loaders.rst          |  5 ++++-
 .../examples/{json_loader.ipynb => json.ipynb}     | 18 ++++++++++--------
 .../document_loaders/examples/odt.ipynb            | 10 +++++++---
 .../document_loaders/examples/tomarkdown.ipynb     | 10 ++++------
 4 files changed, 25 insertions(+), 18 deletions(-)
 rename docs/modules/indexes/document_loaders/examples/{json_loader.ipynb => json.ipynb} (96%)

diff --git a/docs/modules/indexes/document_loaders.rst b/docs/modules/indexes/document_loaders.rst
index 8c7306f2..80f47cd2 100644
--- a/docs/modules/indexes/document_loaders.rst
+++ b/docs/modules/indexes/document_loaders.rst
@@ -40,10 +40,11 @@ For detailed instructions on how to get set up with Unstructured, see installati
    ./document_loaders/examples/file_directory.ipynb
    ./document_loaders/examples/html.ipynb
    ./document_loaders/examples/image.ipynb
-   ./document_loaders/examples/jupyter_notebook.ipynb
+   ./document_loaders/examples/json.ipynb
    ./document_loaders/examples/markdown.ipynb
    ./document_loaders/examples/microsoft_powerpoint.ipynb
    ./document_loaders/examples/microsoft_word.ipynb
+   ./document_loaders/examples/odt.ipynb
    ./document_loaders/examples/pandas_dataframe.ipynb
    ./document_loaders/examples/pdf.ipynb
    ./document_loaders/examples/sitemap.ipynb
@@ -81,6 +82,7 @@ We don't need any access permissions to these datasets and services.
    ./document_loaders/examples/ifixit.ipynb
    ./document_loaders/examples/imsdb.ipynb
    ./document_loaders/examples/mediawikidump.ipynb
+   ./document_loaders/examples/wikipedia.ipynb
    ./document_loaders/examples/youtube_transcript.ipynb


@@ -131,4 +133,5 @@ We need access tokens and sometime other parameters to get access to these datas
    ./document_loaders/examples/slack.ipynb
    ./document_loaders/examples/spreedly.ipynb
    ./document_loaders/examples/stripe.ipynb
+   ./document_loaders/examples/tomarkdown.ipynb
    ./document_loaders/examples/twitter.ipynb
diff --git a/docs/modules/indexes/document_loaders/examples/json_loader.ipynb b/docs/modules/indexes/document_loaders/examples/json.ipynb
similarity index 96%
rename from docs/modules/indexes/document_loaders/examples/json_loader.ipynb
rename to docs/modules/indexes/document_loaders/examples/json.ipynb
index 9f009cab..31f29672 100644
--- a/docs/modules/indexes/document_loaders/examples/json_loader.ipynb
+++ b/docs/modules/indexes/document_loaders/examples/json.ipynb
@@ -4,28 +4,30 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# JSON Files\n",
+    "# JSON\n",
     "\n",
-    "The `JSONLoader` uses a specified [jq schema](https://en.wikipedia.org/wiki/Jq_(programming_language)) to parse the JSON files.\n",
-    "\n",
-    "This notebook shows how to use the `JSONLoader` to load [JSON](https://en.wikipedia.org/wiki/JSON) files into documents. A few examples of `jq` schema extracting different parts of a JSON file are also shown.\n",
+    ">[JSON (JavaScript Object Notation)](https://en.wikipedia.org/wiki/JSON) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values).\n",
     "\n",
+    ">The `JSONLoader` uses a specified [jq schema](https://en.wikipedia.org/wiki/Jq_(programming_language)) to parse the JSON files. It uses the `jq` python package.\n",
     "Check this [manual](https://stedolan.github.io/jq/manual/#Basicfilters) for a detailed documentation of the `jq` syntax."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 1,
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
-    "!pip install jq"
+    "#!pip install jq"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 1,
    "metadata": {
+    "collapsed": true,
     "jupyter": {
      "outputs_hidden": true
     }
@@ -359,7 +361,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.16"
+   "version": "3.10.6"
   }
  },
  "nbformat": 4,
diff --git a/docs/modules/indexes/document_loaders/examples/odt.ipynb b/docs/modules/indexes/document_loaders/examples/odt.ipynb
index 9bdee7aa..d0fbbe1c 100644
--- a/docs/modules/indexes/document_loaders/examples/odt.ipynb
+++ b/docs/modules/indexes/document_loaders/examples/odt.ipynb
@@ -5,9 +5,13 @@
    "id": "22a849cc",
    "metadata": {},
    "source": [
-    "## Unstructured ODT Loader\n",
+    "# Open Document Format (ODT)\n",
     "\n",
-    "The `UnstructuredODTLoader` can be used to load Open Office ODT files."
+    ">The [Open Document Format for Office Applications (ODF)](https://en.wikipedia.org/wiki/OpenDocument), also known as `OpenDocument`, is an open file format for word processing documents, spreadsheets, presentations and graphics and using ZIP-compressed XML files. It was developed with the aim of providing an open, XML-based file format specification for office applications.\n",
+    "\n",
+    ">The standard is developed and maintained by a technical committee in the Organization for the Advancement of Structured Information Standards (`OASIS`) consortium. It was based on the Sun Microsystems specification for OpenOffice.org XML, the default format for `OpenOffice.org` and `LibreOffice`. It was originally developed for `StarOffice` \"to provide an open standard for office documents.\"\n",
+    "\n",
+    "The `UnstructuredODTLoader` is used to load `Open Office ODT` files."
    ]
   },
   {
@@ -68,7 +72,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.13"
+   "version": "3.10.6"
   }
  },
  "nbformat": 4,
diff --git a/docs/modules/indexes/document_loaders/examples/tomarkdown.ipynb b/docs/modules/indexes/document_loaders/examples/tomarkdown.ipynb
index da9d262e..585f6719 100644
--- a/docs/modules/indexes/document_loaders/examples/tomarkdown.ipynb
+++ b/docs/modules/indexes/document_loaders/examples/tomarkdown.ipynb
@@ -7,7 +7,7 @@
    "source": [
     "# 2Markdown\n",
     "\n",
-    "Uses [2markdown](https://2markdown.com/) to convert any webpage into a standard markdown file"
+    ">[2markdown](https://2markdown.com/) service transforms website content into structured markdown files.\n"
    ]
   },
   {
@@ -17,7 +17,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# You will need to get your own API key\n",
+    "# You will need to get your own API key. See https://2markdown.com/login\n",
     "\n",
     "api_key = \"\""
    ]
@@ -56,9 +56,7 @@
    "cell_type": "code",
    "execution_count": 8,
    "id": "706304e9",
-   "metadata": {
-    "scrolled": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -220,7 +218,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.1"
+   "version": "3.10.6"
   }
  },
  "nbformat": 4,