docs: Updated WikipediaLoader documentation (#25647)

- The cell outputs were not included in the documentation. I have
added them.
- There is another parameter in the `WikipediaLoader` class called
`doc_content_chars_max` (Based on
[this](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.wikipedia.WikipediaLoader.html)).
I have included this in the list of parameters.
- I put the list of parameters under a new section called "Parameters"
in the documentation.
- I also included the `langchain_community` package in the installation
command.
- Some minor formatting/spelling issues were fixed.
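
For reference, the parameters listed in the new "Parameters" section can be passed to the loader together. The following is a minimal sketch, not part of the commit itself: `build_loader_kwargs` and `load_wikipedia_docs` are hypothetical helper names introduced only for illustration, and clamping `load_max_docs` to 300 is an assumption based on the hard limit stated in the documentation, not a confirmed behavior of the library.

```python
from typing import Any, Dict

# Hard limit on load_max_docs as stated in the notebook documentation.
MAX_DOCS_HARD_LIMIT = 300


def build_loader_kwargs(
    query: str,
    lang: str = "en",
    load_max_docs: int = 2,
    load_all_available_meta: bool = False,
    doc_content_chars_max: int = 4000,
) -> Dict[str, Any]:
    """Hypothetical helper: gather WikipediaLoader arguments in one dict."""
    if load_max_docs < 1:
        raise ValueError("load_max_docs must be at least 1")
    return {
        "query": query,
        "lang": lang,
        # Assumption: clamp to the documented hard limit instead of failing.
        "load_max_docs": min(load_max_docs, MAX_DOCS_HARD_LIMIT),
        "load_all_available_meta": load_all_available_meta,
        "doc_content_chars_max": doc_content_chars_max,
    }


def load_wikipedia_docs(**kwargs: Any):
    """Hypothetical wrapper: needs `langchain_community`, `wikipedia`, and network access."""
    # Imported lazily so build_loader_kwargs works without the packages installed.
    from langchain_community.document_loaders import WikipediaLoader

    return WikipediaLoader(**kwargs).load()


# Usage (performs network requests, so it is left commented out):
# docs = load_wikipedia_docs(
#     **build_loader_kwargs("HUNTER X HUNTER", doc_content_chars_max=400)
# )
# print(len(docs), docs[0].metadata["title"])
```

Keeping the argument handling separate from the network call makes it possible to check the chosen values, such as a small `load_max_docs` for experiments, before any pages are downloaded.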
Parsa Abbasi 2024-08-23 11:49:03 +03:30 committed by GitHub
parent b865ee49a0
commit 1b2ae40d45

@ -17,15 +17,9 @@
"id": "1b7a1eef-7bf7-4e7d-8bfc-c4e27c9488cb",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"id": "2abd5578-aa3d-46b9-99af-8b262f0b3df8",
"metadata": {},
"source": [
"First, you need to install `wikipedia` python package."
"## Installation\n",
"\n",
"First, you need to install the `langchain_community` and `wikipedia` packages."
]
},
{
@ -37,7 +31,22 @@
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet wikipedia"
"%pip install -qU langchain_community wikipedia"
]
},
{
"cell_type": "markdown",
"id": "98342290",
"metadata": {},
"source": [
"## Parameters\n",
"\n",
"`WikipediaLoader` has the following arguments:\n",
"- `query`: the free text which used to find documents in Wikipedia\n",
"- `lang` (optional): default=\"en\". Use it to search in a specific language part of Wikipedia\n",
"- `load_max_docs` (optional): default=100. Use it to limit number of downloaded documents. It takes time to download all 100 documents, so use a small number for experiments. There is a hard limit of 300 for now.\n",
"- `load_all_available_meta` (optional): default=False. By default only the most important fields downloaded: `title` and `summary`. If `True` then all available fields will be downloaded.\n",
"- `doc_content_chars_max` (optional): default=4000. The maximum number of characters for the document content."
]
},
{
@ -45,24 +54,12 @@
"id": "95f05e1c-195e-4e2b-ae8e-8d6637f15be6",
"metadata": {},
"source": [
"## Examples"
]
},
{
"cell_type": "markdown",
"id": "e29b954c-1407-4797-ae21-6ba8937156be",
"metadata": {},
"source": [
"`WikipediaLoader` has these arguments:\n",
"- `query`: free text which used to find documents in Wikipedia\n",
"- optional `lang`: default=\"en\". Use it to search in a specific language part of Wikipedia\n",
"- optional `load_max_docs`: default=100. Use it to limit number of downloaded documents. It takes time to download all 100 documents, so use a small number for experiments. There is a hard limit of 300 for now.\n",
"- optional `load_all_available_meta`: default=False. By default only the most important fields downloaded: `Published` (date when document was published/last updated), `title`, `Summary`. If True, other fields also downloaded."
"## Example"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 1,
"id": "9bfd5e46",
"metadata": {},
"outputs": [],
@ -72,10 +69,21 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"id": "700e4ef2",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = WikipediaLoader(query=\"HUNTER X HUNTER\", load_max_docs=2).load()\n",
"len(docs)"
@ -83,26 +91,50 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"id": "8977bac0-0042-4f23-9754-247dbd32439b",
"metadata": {
"tags": []
},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"{'title': 'Hunter × Hunter',\n",
" 'summary': 'Hunter × Hunter (pronounced \"hunter hunter\") is a Japanese manga series written and illustrated by Yoshihiro Togashi. It has been serialized in Shueisha\\'s shōnen manga magazine Weekly Shōnen Jump since March 1998, although the manga has frequently gone on extended hiatuses since 2006. Its chapters have been collected in 37 tankōbon volumes as of November 2022. The story focuses on a young boy named Gon Freecss who discovers that his father, who left him at a young age, is actually a world-renowned Hunter, a licensed professional who specializes in fantastical pursuits such as locating rare or unidentified animal species, treasure hunting, surveying unexplored enclaves, or hunting down lawless individuals. Gon departs on a journey to become a Hunter and eventually find his father. Along the way, Gon meets various other Hunters and encounters the paranormal.\\nHunter × Hunter was adapted into a 62-episode anime television series by Nippon Animation and directed by Kazuhiro Furuhashi, which ran on Fuji Television from October 1999 to March 2001. Three separate original video animations (OVAs) totaling 30 episodes were subsequently produced by Nippon Animation and released in Japan from 2002 to 2004. A second anime television series by Madhouse aired on Nippon Television from October 2011 to September 2014, totaling 148 episodes, with two animated theatrical films released in 2013. There are also numerous audio albums, video games, musicals, and other media based on Hunter × Hunter.\\nThe manga has been licensed for English release in North America by Viz Media since April 2005. 
Both television series have been also licensed by Viz Media, with the first series having aired on the Funimation Channel in 2009 and the second series broadcast on Adult Swim\\'s Toonami programming block from April 2016 to June 2019.\\nHunter × Hunter has been a huge critical and financial success and has become one of the best-selling manga series of all time, having over 84 million copies in circulation by July 2022.',\n",
" 'source': 'https://en.wikipedia.org/wiki/Hunter_%C3%97_Hunter'}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0].metadata # meta-information of the Document"
"docs[0].metadata # metadata of the first document"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"id": "46969806-45a9-4c4d-a61b-cfb9658fc9de",
"metadata": {
"tags": []
},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"'Hunter × Hunter (pronounced \"hunter hunter\") is a Japanese manga series written and illustrated by Yoshihiro Togashi. It has been serialized in Shueisha\\'s shōnen manga magazine Weekly Shōnen Jump since March 1998, although the manga has frequently gone on extended hiatuses since 2006. Its chapters have been collected in 37 tankōbon volumes as of November 2022. The story focuses on a young boy name'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0].page_content[:400] # a content of the Document"
"docs[0].page_content[:400] # a part of the page content"
]
}
],
@ -122,7 +154,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.11.9"
}
},
"nbformat": 4,