langchain/docs/integrations/mediawikidump.md
Leonid Ganeline 373ad49157
docs ecosystem/integrations update 3 (#5470)
2023-05-31 17:54:05 -07:00

MediaWikiDump

MediaWiki XML dumps contain the content of a wiki (wiki pages with all their revisions), without the site-related data. An XML dump is not a full backup of the wiki database: it does not contain user accounts, images, edit logs, etc.
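
To make the "pages with all their revisions" structure concrete, here is a minimal sketch that reads a tiny dump with Python's standard library only. The sample XML is hypothetical and heavily simplified (real exports declare an XML namespace and carry many more fields), and `latest_revisions` is an illustrative helper name, not part of any MediaWiki or LangChain API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified dump: real exports declare an XML
# namespace and carry many more fields per page and revision.
SAMPLE_DUMP = """\
<mediawiki>
  <page>
    <title>Example page</title>
    <revision>
      <id>1</id>
      <text>First draft.</text>
    </revision>
    <revision>
      <id>2</id>
      <text>Second draft with ''wiki markup''.</text>
    </revision>
  </page>
</mediawiki>
"""


def latest_revisions(xml_text):
    """Map each page title to the text of its last listed revision."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter("page"):
        revisions = page.findall("revision")
        if revisions:
            # Assumption: revisions are listed oldest first, so the last is the newest.
            pages[page.findtext("title")] = revisions[-1].findtext("text")
    return pages


print(latest_revisions(SAMPLE_DUMP))
```

Note that the revision text is raw wiki markup, which is why a markup parser such as mwparserfromhell appears in the installation steps.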

Installation and Setup

We need to install several Python packages.

Support for XML schema 0.11 in the mediawiki-utilities packages currently lives in unmerged branches, so we install from one of those branches.

pip install -qU git+https://github.com/mediawiki-utilities/python-mwtypes@updates_schema_0.11

The mediawiki-utilities mwxml package has a bug; a PR with the fix is pending, so we install from the fork that contains it.

pip install -qU git+https://github.com/gdedrouas/python-mwxml@xml_format_0.11
pip install -qU mwparserfromhell

Document Loader

See a usage example.

from langchain.document_loaders import MWDumpLoader
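
A minimal usage sketch follows, assuming the packages above are installed and that `MWDumpLoader` accepts a file path plus an `encoding` keyword, as it did in the LangChain release this page was written for. The wrapper name `load_wiki_dump` is hypothetical and only there for illustration.

```python
def load_wiki_dump(file_path, encoding="utf8"):
    """Load a MediaWiki XML dump into a list of LangChain documents."""
    # Imported lazily so this sketch can be defined even when the
    # optional MediaWiki packages are not installed yet.
    from langchain.document_loaders import MWDumpLoader

    # Assumption: the constructor takes the dump path and an encoding keyword.
    loader = MWDumpLoader(file_path, encoding=encoding)
    return loader.load()
```

Calling `load_wiki_dump("example_dump.xml")`, where the file name is a placeholder for your own dump, should return a list of documents whose `page_content` holds the page text.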