Federico Leva
131e19979c
Use mwclient generator for allpages
...
Tested with MediaWiki 1.31 and 1.19.
4 years ago
nemobis
3f39a97acc
Merge pull request #359 from nemobis/xmlrevisions
...
Switch the --xmlrevisions option to mwclient and related changes
4 years ago
Federico Leva
faf0e31b4e
Don't set apfrom in initial allpages request, use suggested continuation
...
Helps with recent MediaWiki versions like 1.31 where variants of "!" can
give a bad title error and the continuation wants apcontinue anyway.
4 years ago
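A minimal sketch of the continuation approach this commit describes, not the actual dumpgenerator.py code. The first request carries no `apfrom`, and every later request merges in whatever continuation parameters the API suggested. The name `iter_allpages` and the injected `fetch` callable (takes a dict of API parameters, returns the decoded JSON response) are assumptions made for illustration.

```python
def iter_allpages(fetch, namespace=0, limit=500):
    """Yield page titles, following the continuation the API suggests.

    `fetch` is a hypothetical callable: dict of API parameters in,
    decoded JSON response (a dict) out.
    """
    # Note: no "apfrom" here; the first request starts wherever the wiki does.
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
        "aplimit": limit,
        "format": "json",
    }
    while True:
        data = fetch(dict(params))
        for page in data.get("query", {}).get("allpages", []):
            yield page["title"]
        if "continue" in data:
            # Newer format: merge every suggested parameter (e.g. apcontinue).
            params.update(data["continue"])
        elif "query-continue" in data:
            # Older format: parameters are nested under the list name.
            params.update(data["query-continue"]["allpages"])
        else:
            break
```

Passing the fetcher in keeps the loop independent of the HTTP library, so the same logic can be exercised against canned responses.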
Federico Leva
49017e3f20
Catch HTTP Error 405 and switch from POST to GET for API requests
...
Seen on http://wiki.ainigma.eu/index.php?title=Hlavn%C3%AD_strana :
HTTPError: HTTP Error 405: Method Not Allowed
4 years ago
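The fallback described in this commit could look roughly like the following. This is a sketch under assumptions: `session` is taken to behave like a `requests.Session`, and `api_request` is a hypothetical helper name, not a function from dumpgenerator.py.

```python
def api_request(session, url, params):
    """Try POST first; if the server answers 405, retry the query as GET.

    `session` is assumed to expose post(), get() and responses with
    status_code and raise_for_status(), as requests.Session does.
    """
    response = session.post(url, data=params)
    if response.status_code == 405:
        # Method Not Allowed: the server refuses POST, so resend as GET.
        response = session.get(url, params=params)
    # Any remaining HTTP error is still raised to the caller.
    response.raise_for_status()
    return response
```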
Federico Leva
8b5378f991
Fix query prop=revisions continuation in MediaWiki 1.22
...
This wiki has the old query-continue format but it's not exposed here.
4 years ago
Federico Leva
92da7388b0
Avoid asking allpages API if API not available
...
So that it doesn't have to iterate among non-existing titles.
Fixes https://github.com/WikiTeam/wikiteam/issues/348
4 years ago
Federico Leva
1645c1d832
More robust XML header fetch for getXMLHeader()
...
Avoid UnboundLocalError: local variable 'xml' referenced before assignment
If the page exists, its XML export is returned by the API; otherwise only
the header that we were looking for.
Fixes https://github.com/WikiTeam/wikiteam/issues/355
4 years ago
Federico Leva
0b37b39923
Define xml header as empty first so that it can fail gracefully
...
Fixes https://github.com/WikiTeam/wikiteam/issues/355
4 years ago
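The two getXMLHeader() fixes above can be illustrated with a short sketch. It is not the actual function: `extract_xml_header` and the injected `fetch_export` callable are hypothetical, and cutting the header at `<page>` or `</mediawiki>` is an assumption about the export layout.

```python
def extract_xml_header(fetch_export):
    """Return the part of an XML export that precedes any page content.

    `fetch_export` is a hypothetical callable returning the export text.
    """
    # Bind the name up front so a failed fetch cannot cause
    # "UnboundLocalError: local variable 'xml' referenced before assignment".
    xml = ""
    try:
        xml = fetch_export()
    except OSError:
        # Network-level failures (requests' exceptions derive from IOError,
        # which is OSError on Python 3) leave xml empty.
        pass
    # If the page exists, the export contains <page>; otherwise only the
    # header is returned, ending at the closing </mediawiki> tag.
    for marker in ("<page>", "</mediawiki>"):
        if marker in xml:
            return xml.split(marker)[0]
    return xml
```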
Federico Leva
becd01b271
Use defined requests.exceptions.ConnectionError
...
Fixes https://github.com/WikiTeam/wikiteam/issues/356
4 years ago
Federico Leva
f0436ee57c
Make mwclient respect the provided HTTP/HTTPS scheme
...
Fixes https://github.com/WikiTeam/wikiteam/issues/358
4 years ago
Federico Leva
9ec6ce42d3
Finish xmlrevisions option for older wikis
...
* Actually proceed to the next page when no continuation.
* Provide the same output as with the usual per-page export.
Tested on a MediaWiki 1.16 wiki with success.
4 years ago
Federico Leva
0f35d03929
Remove rvlimit=max, fails in MediaWiki 1.16
...
For instance:
"Exception Caught: Internal error in ApiResult::setElement: Attempting to add element revisions=50, existing value is 500"
https://wiki.rabenthal.net/api.php?action=query&prop=revisions&titles=Hauptseite&rvprop=ids&rvlimit=max
4 years ago
Federico Leva
6b12e20a9d
Actually convert the titles query method to mwclient too
4 years ago
Federico Leva
f10adb71af
Don't try to add revisions if the namespace has none
...
Traceback (most recent call last):
  File "dumpgenerator.py", line 2362, in <module>
  File "dumpgenerator.py", line 2354, in main
    resumePreviousDump(config=config, other=other)
  File "dumpgenerator.py", line 1921, in createNewDump
    getPageTitles(config=config, session=other['session'])
  File "dumpgenerator.py", line 755, in generateXMLDump
    for xml in getXMLRevisions(config=config, session=session):
  File "dumpgenerator.py", line 861, in getXMLRevisions
    revids.append(str(revision['revid']))
IndexError: list index out of range
4 years ago
Federico Leva
3760501f74
Add a couple comments
4 years ago
Federico Leva
11507e931e
Initial switch to mwclient for the xmlrevisions option
...
* Still maintained and available for python 3 as well.
* Allows raw API requests as we need.
* Does not provide handy generators, we need to do continuation.
* Decides on its own which protocol and exact path to use, fails at it.
* Appears to use POST by default unless asked otherwise, what to do?
4 years ago
nemobis
353f4d90a6
Merge pull request #349 from nemobis/xmlrevisions
...
Use GET rather than POST for API requests
4 years ago
Federico Leva
3d04dcbf5c
Use GET rather than POST for API requests
...
* It was just an old trick to get past some barriers which were waived with GET.
* It's not conformant and doesn't play well with some redirects.
* Some recent wikis seem to not like it at all, see also issue #311.
4 years ago
nemobis
128e23c3a4
Merge pull request #346 from nemobis/bug/334
...
Use GET rather than POST for allpages API query
4 years ago
Federico Leva
4cdc5a7784
Use GET rather than POST for allpages API query
...
POST does not follow the redirect from HTTP to HTTPS, which makes the
request (and the entire dump) fail if an API URL is passed like
http://7daystodie-de.gamepedia.com/api.php
Fixes https://github.com/WikiTeam/wikiteam/issues/334
4 years ago
nemobis
210158473e
Merge pull request #345 from nemobis/2020list
...
Update MediaWiki and Wikia lists
4 years ago
Federico Leva
7dad9a44cd
Give up on Wikia-made dumps
...
There are fewer than 500 available right now, out of 400k active wikis.
4 years ago
Federico Leva
accc7db019
Update list of MediaWikis
...
* Run checkalive.py on the "originalurl" URLs from existing items in the
WikiTeam collection on the Internet Archive, minus dead wiki farms.
* Downloaded the list of unarchived wikis from WikiApiary.
4 years ago
Federico Leva
aa0b133c1d
Minimal update to list of Wikia wikis
...
* Change API URL to HTTPS and fandom.com.
* New output of the script (403k wikis), changed to wikia.com for diff purposes.
4 years ago
nemobis
0eeb6bfcb0
Upload all relevant wikidump.7z and history.xml.7z
...
Don't stop at the first 7z file found in the directory listing.
Should be fast enough for most users.
Fixes #326
4 years ago
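As an illustration of "don't stop at the first 7z file", a sketch of collecting every relevant archive from a directory listing. The helper name `find_dump_archives` is hypothetical, and matching on the filename suffix is an assumption about how the uploader recognises dumps.

```python
def find_dump_archives(filenames):
    """Return every wikidump.7z and history.xml.7z file, in listing order."""
    suffixes = ("wikidump.7z", "history.xml.7z")
    # Collect all matches instead of returning at the first hit.
    return [name for name in filenames if name.endswith(suffixes)]
```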
emijrp
527401560c
2020
4 years ago
emijrp
7b03096ace
update wikidot list
5 years ago
emijrp
714c9ea1f7
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
5 years ago
emijrp
6aac36ce57
wikidot wiki list
5 years ago
emijrp
61b0b1b80b
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
5 years ago
emijrp
0cd4efb51c
better spider for wikidot
5 years ago
emijrp
f6c57d59e7
.
5 years ago
emijrp
5fd980c6b7
delay 1 second
5 years ago
emijrp
aecee2dc53
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
5 years ago
emijrp
33a93fd76a
delay 1 second
5 years ago
emijrp
966df37c54
new url https://www.archiveteam.org/
5 years ago
emijrp
d43d017075
Update README.md
5 years ago
Emilio
080b723334
Update wikiapiary-update-ia-params.py
5 years ago
nemobis
be0dcd8e55
Merge pull request #337 from zerote000/master
...
Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL.
5 years ago
Christoffer Popp Nørskov
83f72db6cd
Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL.
5 years ago
Emilio
287b8b88a3
250,000 wikis
5 years ago
emijrp
ffb39afd1e
800 wikidot sites
6 years ago
emijrp
28158f9b04
wikis
6 years ago
emijrp
7c72c27f2a
wikidot
6 years ago
emijrp
4e8c92b6d2
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
6 years ago
emijrp
0ebf86caf6
update, 1.8M users, 400K wikis
6 years ago
nemobis
bee34f4b1b
Merge pull request #319 from TyIsI/patch-1
...
Updated with vancouver.hackspace.ca -> vanhack.ca domain change
6 years ago
TyIsI
09fac2aeeb
Updated with vancouver.hackspace.ca domain change
6 years ago
emijrp
5aac17ea03
update
6 years ago
emijrp
72b67c74f1
randomize saving
6 years ago