mirror of https://github.com/WikiTeam/wikiteam synced 2024-11-12 07:12:41 +00:00
Emilio J. Rodríguez-Posada 6bbdf42ed0 Merge pull request #125 from balr0g/check-for-disabled-api
Don't try to download sites with disabled API
2014-06-28 10:35:31 +02:00
batchdownload Issue 97: Add new siteinfo.json to the archived 7z 2014-06-26 11:11:19 +02:00
listsofwikis adding list info file for tropicalwikis 2014-06-27 17:07:37 +02:00
research/paper-wikiteam-2014 instructions to compile LaTeX paper; 2014-01-24 20:08:16 +00:00
rewrite Adding rewrite code so others can build on top of it 2014-01-29 13:36:19 +00:00
commonschecker.py Issue 85: more cross-platform shebang on all scripts 2014-02-26 23:22:53 +00:00
commonsdownloader.py Issue 85: more cross-platform shebang on all scripts 2014-02-26 23:22:53 +00:00
commonssql.py Issue 85: more cross-platform shebang on all scripts 2014-02-26 23:22:53 +00:00
dumpgenerator.py Don't try to download sites with disabled API 2014-06-27 15:19:54 -04:00
gui.py Issue 85: more cross-platform shebang on all scripts 2014-02-26 23:22:53 +00:00
LICENSE renaming gpl.txt to LICENSE 2014-06-25 16:03:15 +02:00
README.md Ask Wikimedia Commons reseed 2014-06-27 17:41:20 +02:00
uploader.py Issue 85: more cross-platform shebang on all scripts 2014-02-26 23:22:53 +00:00
wikiadownloader.py Issue 85: more cross-platform shebang on all scripts 2014-02-26 23:22:53 +00:00
wikipediadownloader.py improving args parsing and help 2014-06-25 17:32:24 +02:00

WikiTeam

We archive wikis, from Wikipedia to the tiniest wikis

WikiTeam software is a set of tools for archiving wikis. The tools work on MediaWiki wikis, and we want to expand to other wiki engines. As of June 2014, WikiTeam has preserved more than 13,000 stand-alone wikis, several wikifarms, regular Wikipedia dumps and 24 TB of Wikimedia Commons images.

There are thousands of wikis on the Internet. Every day some of them stop being publicly available and, for lack of backups, are lost forever. Millions of people download tons of media files (movies, music, books, etc.) from the Internet, serving as a kind of distributed backup. Wikis, most of them under free licenses, disappear from time to time because nobody grabbed a copy of them. That is a shame we would like to remedy.

WikiTeam is the Archive Team (GitHub) subcommittee on wikis. It was founded and originally developed by Emilio J. Rodríguez-Posada, a veteran Wikipedia editor and amateur archivist. Many people have helped by sending suggestions, reporting bugs, writing documentation, providing help on the mailing list and making wiki backups. Thanks to all, especially to: Federico Leva, Alex Buie, Scott Boyd, Hydriz, Platonides, Ian McEwen, Mike Dupont and balrog.

Documentation | Source code | Download available backups | Community | Follow us on Twitter

Quick guide

This is a very quick guide to the most-used features of the WikiTeam tools. For further information, read the tutorial and the rest of the documentation. You can also ask on the mailing list.

Download any wiki

To download any wiki, use one of the following options:

python dumpgenerator.py --api=http://wiki.domain.org/w/api.php --xml --images (complete XML histories and images)

python dumpgenerator.py --api=http://wiki.domain.org/w/api.php --xml (complete XML histories)

python dumpgenerator.py --api=http://wiki.domain.org/w/api.php --xml --curonly (only current version of every page)

You can resume an aborted download:

python dumpgenerator.py --api=http://wiki.domain.org/w/api.php --xml --images --resume --path=/path/to/incomplete-dump

See more options:

python dumpgenerator.py --help
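When the same invocations need to be repeated for several wikis, it can help to assemble the command line in a small wrapper. The following is only a minimal sketch, not part of WikiTeam: the helper names `build_dump_command` and `run_dump` are hypothetical, and it assumes `dumpgenerator.py` sits in the current directory and that `python` is the interpreter you normally use for it. It uses only the flags shown above (`--api`, `--xml`, `--images`, `--curonly`, `--resume`, `--path`).

```python
import subprocess


def build_dump_command(api_url, images=False, curonly=False, resume_path=None,
                       python="python", script="dumpgenerator.py"):
    """Return the argument list for one dumpgenerator.py run.

    Hypothetical helper: it only mirrors the command lines from the quick
    guide and does not validate the URL or the path.
    """
    cmd = [python, script, "--api=" + api_url, "--xml"]
    if curonly:
        # Only the current version of every page.
        cmd.append("--curonly")
    if images:
        # Also download the images.
        cmd.append("--images")
    if resume_path is not None:
        # Continue an aborted download stored in resume_path.
        cmd += ["--resume", "--path=" + resume_path]
    return cmd


def run_dump(api_url, **options):
    """Run dumpgenerator.py and return its exit status."""
    return subprocess.call(build_dump_command(api_url, **options))


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    example = build_dump_command("http://wiki.domain.org/w/api.php", images=True)
    print(" ".join(example))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a path contains spaces.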

Download Wikimedia dumps

To download Wikimedia XML dumps (Wikipedia, Wikibooks, Wikinews, etc.):

python wikipediadownloader.py (download all projects)

See more options:

python wikipediadownloader.py --help

Download Wikimedia Commons images

There is a script for this, but we have already uploaded the tarballs to the Internet Archive, so it is more useful to reseed their torrents than to regenerate old ones with the script.