forked from Archives/langchain
373ad49157
# docs: `ecosystem_integrations` update 3 Next cycle of updating the `ecosystem/integrations` * Added an integration `template` file * Added missed integration files * Fixed several document_loaders/notebooks ## Who can review? Is it possible to assign somebody to review PRs on docs? Thanks.
948 B
948 B
MediaWikiDump
MediaWiki XML Dumps contain the content of a wiki (wiki pages with all their revisions), without the site-related data. A XML dump does not create a full backup of the wiki database, the dump does not contain user accounts, images, edit logs, etc.
Installation and Setup
We need to install several python packages.
The mediawiki-utilities
supports XML schema 0.11 in unmerged branches.
pip install -qU git+https://github.com/mediawiki-utilities/python-mwtypes@updates_schema_0.11
The mediawiki-utilities mwxml
has a bug, fix PR pending.
pip install -qU git+https://github.com/gdedrouas/python-mwxml@xml_format_0.11
pip install -qU mwparserfromhell
Document Loader
See a usage example.
from langchain.document_loaders import MWDumpLoader