Commit Graph

1051 Commits (eacaf08b2ff4420d9291c983cd30e321100b4fd9)

Author SHA1 Message Date
Federico Leva 0b37b39923 Define xml header as empty first so that it can fail gracefully
Fixes https://github.com/WikiTeam/wikiteam/issues/355
4 years ago
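A minimal sketch of the pattern this commit describes, assuming a getXMLHeader() helper that may raise on connection problems (names are illustrative, not the exact dumpgenerator.py code):

    import requests

    def safe_xml_header(config, session):
        # Initialize to a safe default first, so a failed fetch cannot
        # leave the variable undefined for the code that writes the dump.
        xmlheader = ''
        try:
            xmlheader = getXMLHeader(config=config, session=session)  # hypothetical helper
        except requests.exceptions.ConnectionError as e:
            print('Could not fetch the XML header (%s); using an empty one' % e)
        return xmlheader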
Federico Leva becd01b271 Use defined requests.exceptions.ConnectionError
Fixes https://github.com/WikiTeam/wikiteam/issues/356
4 years ago
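The distinction matters because Python 2 has no builtin ConnectionError, so the exception class must come from requests itself; a sketch (URL illustrative):

    import requests

    session = requests.Session()
    try:
        r = session.get('https://wiki.example.org/api.php', timeout=10)
    except requests.exceptions.ConnectionError as e:
        # A bare "except ConnectionError" would itself raise NameError
        # under Python 2; requests.exceptions.ConnectionError is defined.
        print('Connection failed: %s' % e)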
Federico Leva f0436ee57c Make mwclient respect the provided HTTP/HTTPS scheme
Fixes https://github.com/WikiTeam/wikiteam/issues/358
4 years ago
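A sketch of what respecting the scheme can look like, assuming mwclient's Site() accepts a scheme argument (URL illustrative):

    import mwclient
    from urlparse import urlparse  # urllib.parse on Python 3

    # Take scheme, host and path from the user-supplied API URL rather
    # than letting mwclient pick its own.
    apiurl = urlparse('http://wiki.example.org/w/api.php')
    site = mwclient.Site(apiurl.netloc,
                         path=apiurl.path.replace('api.php', ''),
                         scheme=apiurl.scheme)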
Federico Leva 9ec6ce42d3 Finish xmlrevisions option for older wikis
* Actually proceed to the next page when there is no continuation (see the sketch below).
* Provide the same output as with the usual per-page export.

Tested on a MediaWiki 1.16 wiki with success.
4 years ago
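A hedged sketch of that continuation handling, using the old query-continue style that releases as old as 1.16 return (endpoint and parameters illustrative):

    import requests

    def get_revisions(api, title, session):
        params = {'action': 'query', 'prop': 'revisions', 'titles': title,
                  'rvprop': 'ids|timestamp|user|comment|content',
                  'rvlimit': 50, 'format': 'json'}
        while True:
            data = session.get(api, params=params).json()
            for page in data['query']['pages'].values():
                for rev in page.get('revisions', []):
                    yield rev
            cont = data.get('query-continue', {}).get('revisions')
            if not cont:
                break  # no continuation left: proceed to the next page
            params.update(cont)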
Federico Leva 0f35d03929 Remove rvlimit=max, fails in MediaWiki 1.16
For instance:
"Exception Caught: Internal error in ApiResult::setElement: Attempting to add element revisions=50, existing value is 500"
https://wiki.rabenthal.net/api.php?action=query&prop=revisions&titles=Hauptseite&rvprop=ids&rvlimit=max
4 years ago
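In terms of the sketch above, the fix is simply not to send rvlimit=max to such old backends (illustrative):

    params = {'action': 'query', 'prop': 'revisions', 'titles': 'Hauptseite',
              'rvprop': 'ids', 'format': 'json'}
    # No rvlimit here: 'max' triggers the ApiResult error quoted above on
    # MediaWiki 1.16, so rely on the server default or pin a low value.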
Federico Leva 6b12e20a9d Actually convert the titles query method to mwclient too 4 years ago
Federico Leva f10adb71af Don't try to add revisions if the namespace has none
Traceback (most recent call last):
  File "dumpgenerator.py", line 2362, in <module>

  File "dumpgenerator.py", line 2354, in main
    resumePreviousDump(config=config, other=other)
  File "dumpgenerator.py", line 1921, in createNewDump
    getPageTitles(config=config, session=other['session'])
  File "dumpgenerator.py", line 755, in generateXMLDump
    for xml in getXMLRevisions(config=config, session=session):
  File "dumpgenerator.py", line 861, in getXMLRevisions
    revids.append(str(revision['revid']))
IndexError: list index out of range
4 years ago
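The guard implied by the traceback, sketched (key names follow the MediaWiki API response; the surrounding code is illustrative):

    def collect_revids(data):
        # "data" is a parsed action=query&prop=revisions response.
        revids = []
        for page in data.get('query', {}).get('pages', {}).values():
            # A namespace with no revisions contributes nothing instead
            # of raising IndexError on a missing element.
            for revision in page.get('revisions', []):
                revids.append(str(revision['revid']))
        return revids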
Federico Leva 3760501f74 Add a couple comments 4 years ago
Federico Leva 11507e931e Initial switch to mwclient for the xmlrevisions option
* Still maintained and available for Python 3 as well.
* Allows the raw API requests we need (sketched below).
* Does not provide handy generators, so we need to handle continuation ourselves.
* Decides on its own which protocol and exact path to use, and sometimes gets it wrong.
* Appears to use POST by default unless asked otherwise; what to do?
4 years ago
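A sketch of raw API requests with manual continuation through mwclient's api() method (host and query illustrative):

    import mwclient

    site = mwclient.Site('wiki.example.org', path='/w/')
    cont = {}
    while True:
        # api() may POST by default, as the last bullet above observes.
        result = site.api('query', list='allpages', aplimit=50, **cont)
        for page in result['query']['allpages']:
            print(page['title'])
        cont = result.get('continue')
        if not cont:
            break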
nemobis 353f4d90a6 Merge pull request #349 from nemobis/xmlrevisions
Use GET rather than POST for API requests
4 years ago
Federico Leva 3d04dcbf5c Use GET rather than POST for API requests
* It was just an old trick to get past some barriers that do not apply to GET.
* It's not standards-conformant and doesn't play well with some redirects.
* Some recent wikis seem to not like it at all, see also issue #311.
4 years ago
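Sketched, the change is from session.post() to session.get() with the parameters carried in the query string (URL illustrative):

    import requests

    session = requests.Session()
    params = {'action': 'query', 'meta': 'siteinfo', 'format': 'json'}
    # GET keeps the parameters in the URL, so they survive redirects
    # that would drop a POST body.
    r = session.get('http://wiki.example.org/api.php', params=params)
    print(r.json()['query']['general']['sitename'])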
nemobis 128e23c3a4 Merge pull request #346 from nemobis/bug/334
Use GET rather than POST for allpages API query
4 years ago
Federico Leva 4cdc5a7784 Use GET rather than POST for allpages API query
A POST request does not survive the redirect from HTTP to HTTPS, which makes
the request (and the entire dump) fail if an API URL is passed like
http://7daystodie-de.gamepedia.com/api.php

Fixes https://github.com/WikiTeam/wikiteam/issues/334
4 years ago
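The failure mode, sketched: on a 301/302 answer to a POST, clients such as requests conventionally retry as GET and drop the body, so the redirected request reaches the API with no parameters at all (URL illustrative):

    import requests

    r = requests.post('http://wiki.example.org/api.php',  # 301s to https://
                      data={'action': 'query', 'list': 'allpages',
                            'format': 'json'})
    print(r.history)          # e.g. [<Response [301]>]
    print(r.request.method)   # 'GET': the form body is gone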
nemobis 210158473e Merge pull request #345 from nemobis/2020list
Update MediaWiki and Wikia lists
4 years ago
Federico Leva 7dad9a44cd Give up on Wikia-made dumps
There are fewer than 500 available right now, out of 400k active wikis.
4 years ago
Federico Leva accc7db019 Update list of MediaWikis
* Run checkalive.py on the "originalurl" URLs from existing items in the
  WikiTeam collection on the Internet Archive, minus dead wiki farms.
* Downloaded the list of unarchived wikis from WikiApiary.
4 years ago
Federico Leva aa0b133c1d Minimal update to list of Wikia wikis
* Change API URL to HTTPS and fandom.com.
* New output of the script (403k wikis), changed to wikia.com for diff purposes.
4 years ago
nemobis 0eeb6bfcb0 Upload all relevant wikidump.7z and history.xml.7z
Don't stop at the first 7z file found in the directory listing.
Should be fast enough for most users.

Fixes #326
4 years ago
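A sketch of the fix on the uploader side (directory layout and the upload helper are assumptions, not the actual uploader.py code):

    import glob
    import os

    dumpdir = '.'
    # Collect every wikidump.7z and history.xml.7z instead of breaking
    # out of the directory listing at the first .7z found.
    archives = sorted(f for f in glob.glob(os.path.join(dumpdir, '*.7z'))
                      if f.endswith('-wikidump.7z')
                      or f.endswith('-history.xml.7z'))
    for archive in archives:
        upload(archive)  # hypothetical upload helper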
emijrp 527401560c 2020 4 years ago
emijrp 7b03096ace update wikidot list 5 years ago
emijrp 714c9ea1f7 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 5 years ago
emijrp 6aac36ce57 wikidot wiki list 5 years ago
emijrp 61b0b1b80b Merge branch 'master' of https://github.com/WikiTeam/wikiteam 5 years ago
emijrp 0cd4efb51c better spider for wikidot 5 years ago
emijrp f6c57d59e7 . 5 years ago
emijrp 5fd980c6b7 delay 1 second 5 years ago
emijrp aecee2dc53 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 5 years ago
emijrp 33a93fd76a delay 1 second 5 years ago
emijrp 966df37c54 new url https://www.archiveteam.org/ 5 years ago
emijrp d43d017075 Update README.md 5 years ago
Emilio 080b723334 Update wikiapiary-update-ia-params.py 5 years ago
nemobis be0dcd8e55 Merge pull request #337 from zerote000/master
Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL.
5 years ago
Christoffer Popp Nørskov 83f72db6cd Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL. 5 years ago
Emilio 287b8b88a3 250,000 wikis 5 years ago
emijrp ffb39afd1e 800 wikidot sites 6 years ago
emijrp 28158f9b04 wikis 6 years ago
emijrp 7c72c27f2a wikidot 6 years ago
emijrp 4e8c92b6d2 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 6 years ago
emijrp 0ebf86caf6 update, 1.8M users, 400K wikis 6 years ago
nemobis bee34f4b1b Merge pull request #319 from TyIsI/patch-1
Updated with vancouver.hackspace.ca -> vanhack.ca domain change
6 years ago
TyIsI 09fac2aeeb Updated with vancouver.hackspace.ca domain change 6 years ago
emijrp 5aac17ea03 update 6 years ago
emijrp 72b67c74f1 randomize saving 6 years ago
emijrp ca672426bb quotes issues in titles 6 years ago
emijrp a69f44caab ignore expired wikis 6 years ago
emijrp a359984932 ++ 6 years ago
emijrp 5525a3cc4a ++ 6 years ago
emijrp 3361e4d09f Merge branch 'master' of https://github.com/WikiTeam/wikiteam 6 years ago
emijrp 94ebe5e1a3 skipping deactivated wikispaces 6 years ago
Federico Leva 83af47d6c0 Catch and raise PageMissingError when query() returns no pages 6 years ago
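Sketched (PageMissingError exists in dumpgenerator.py; its exact signature here is an assumption):

    class PageMissingError(Exception):
        def __init__(self, title, xml):
            self.title = title
            self.xml = xml

    def pages_from(result, title):
        # query() returning no 'pages' means the page is gone; raise so
        # the caller can record the miss instead of crashing later on.
        if 'pages' not in result.get('query', {}):
            raise PageMissingError(title, xml='')
        return result['query']['pages']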