Nicolas SAPA
eacaf08b2f
Try to fix a broken HTTP to HTTPS redirect in generateImageDump()
...
Some wikis fail to do the HTTP to HTTPS redirect correctly, so try it ourselves.
4 years ago
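A minimal sketch of the fallback idea in eacaf08b2f, assuming the fix amounts to retrying a failed download with the scheme rewritten. The names force_https and fetch_with_https_fallback, and the fetch callable, are hypothetical and do not come from the commit. The sketch uses the Python 3 standard library, while the project itself targets Python 2.7.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return url with an http scheme swapped for https; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def fetch_with_https_fallback(url, fetch):
    """Call fetch(url); if it fails with IOError, retry once over HTTPS.

    fetch is a hypothetical callable that takes a URL and returns the
    downloaded content, raising IOError when the download fails.
    """
    try:
        return fetch(url)
    except IOError:
        https_url = force_https(url)
        if https_url == url:
            # Already HTTPS (or not HTTP at all): nothing else to try.
            raise
        return fetch(https_url)
```

Passing the downloader in as a callable keeps the scheme rewrite separate from whichever HTTP library performs the request.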
Nicolas SAPA
7675b0d17c
Add exception handler for requests.exceptions.ReadTimeout in getXMLPageCore()
...
Treat a ReadTimeout the same as a ConnectionError (log the error & retry)
4 years ago
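The "log the error & retry" behaviour in 7675b0d17c can be illustrated with a generic retry helper. This is a sketch under assumptions: call_with_retries and its parameters are hypothetical, and the real getXMLPageCore() may structure its retries differently.

```python
import time


def call_with_retries(func, retry_on, max_retries=5, delay=0.0, log=print):
    """Call func(); on any exception type listed in retry_on, log it and retry.

    retry_on is a tuple of exception classes that are all handled the same
    way. After max_retries failed retries the last exception is re-raised.
    """
    attempt = 0
    while True:
        try:
            return func()
        except retry_on as err:
            attempt += 1
            if attempt > max_retries:
                raise
            log("Error: %s; retry %d of %d" % (err, attempt, max_retries))
            time.sleep(delay)
```

With the requests library, the tuple would be (requests.exceptions.ConnectionError, requests.exceptions.ReadTimeout), so that a read timeout takes the same path as a connection error.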
Nicolas SAPA
4a5eef97da
Update the default user-agent
...
A ModSecurity rule blocks the old UA, so switch to the current Firefox 78 UA.
4 years ago
nemobis
9b1996d436
Merge pull request #387 from robkam/patch-1
...
fix typo
4 years ago
Rob Kam
e6f4674b42
fix typo
4 years ago
nemobis
ee39e8f85b
Merge pull request #386 from RhinosF1/patch-1
...
Update miraheze.org list
4 years ago
RhinosF1
3b28efab80
Update miraheze.org list
...
Using https://gist.github.com/RhinosF1/18c83dfbfadb84e28ee083628c029b41
4 years ago
nemobis
85ae14419f
Merge pull request #381 from robkam/patch-1
...
Add that the script requires Python 2.7
4 years ago
Rob Kam
c563012c1c
Add that the script requires Python 2.7
4 years ago
nemobis
6e85afca82
Merge pull request #378 from nemobis/wikia
...
More efficient Wikia download and launcher.py
5 years ago
nemobis
4eae50b2fb
Merge pull request #377 from nemobis/uploaderurl
...
uploader.py: Handle protocol-relative base URL
5 years ago
Federico Leva
3ddfa85391
uploader.py: Handle protocol-relative base URL
...
Fixes https://github.com/WikiTeam/wikiteam/issues/376
5 years ago
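A protocol-relative URL starts with "//" and omits the scheme, so it has to be given one before it can be requested. A minimal sketch of that handling for 3ddfa85391, where the function name absolutize and the default scheme are assumptions rather than what uploader.py actually does:

```python
def absolutize(url, default_scheme="https"):
    """Prefix a protocol-relative URL (//host/path) with default_scheme.

    URLs that already carry a scheme are returned unchanged.
    """
    if url.startswith("//"):
        return default_scheme + ":" + url
    return url
```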
Federico Leva
abd908914f
Adapt to some more Wikia wikis edge cases
...
* Make it easy to batch requests for some wikis where millions of titles
are really just one-revision thread items and need to be gone through
as fast as possible.
* Status code error message.
5 years ago
Federico Leva
e4524b8aec
launcher.py: Avoid shell=True to consume half as many processes
...
No idea if "python2" will be converted to anything meaningful on Windows,
but then you're not really supposed to use the shell either in that dungeon.
https://docs.python.org/2.7/library/subprocess.html#subprocess.Popen
5 years ago
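The point of e4524b8aec is that shell=True starts a shell which then starts the real program, so every command costs two processes. Passing an argument list with the default shell=False executes the program directly. A sketch, with run_without_shell as a hypothetical wrapper name:

```python
import subprocess
import sys


def run_without_shell(args):
    """Run a command given as an argument list and return its exit status.

    shell defaults to False, so no intermediate shell process is created
    and the arguments need no shell quoting.
    """
    return subprocess.call(args)
```

For example, run_without_shell([sys.executable, "-c", "print('done')"]) runs the current interpreter directly instead of going through a shell.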
Federico Leva
0f5664028f
Stricter prefix matching in launcher.py
...
For instance, do not skip gleefandomcom if gleefandomcom_ru is found.
5 years ago
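The gleefandomcom example in 0f5664028f can be reproduced with a small check. This sketch assumes, hypothetically, that dump names consist of the wiki prefix followed by a "-" separator; the actual naming and matching logic in launcher.py may differ.

```python
def has_dump(prefix, names):
    """Return True if any name belongs to exactly this wiki prefix.

    Matching on prefix + "-" (an assumed separator) means that
    "gleefandomcom_ru-..." does not count as a dump of "gleefandomcom".
    """
    return any(name.startswith(prefix + "-") for name in names)
```

A bare name.startswith(prefix) would wrongly treat gleefandomcom as already done whenever gleefandomcom_ru is present.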
nemobis
573623ed16
Merge pull request #373 from nemobis/wikia
...
uploader.py logo and metadata improvements
5 years ago
Federico Leva
7de75012d1
Fix merge of the getXMLRevisions() loop
5 years ago
nemobis
8a2116699e
Merge branch 'master' into wikia
5 years ago
Federico Leva
7289225d2c
Directly catch exception for page missing in getXMLRevisions()
...
The caller cannot catch the PME exception because it doesn't know about
the title. Just log the error here.
5 years ago
Federico Leva
aabf3ea037
uploader.py: switch to requests, BytesIO, rights API
...
* Now uploads the logo again, at least in standard or Wikia skin.
* Finds license information more often.
* Translates Wikia license URL.
* More specific error reporting.
5 years ago
Federico Leva
e194077e52
uploader.py: Use requests GET, handle Wikia weird URLs
...
POST requests with urllib were getting empty responses from Wikia.
5 years ago
nemobis
e136ee5536
Merge pull request #372 from nemobis/wikia
...
Avoid launcher.py 7z failures
5 years ago
Federico Leva
20fe64e2dd
Delete temporary 7z file if compression failed, don't preserve it
...
Fixes https://github.com/WikiTeam/wikiteam/issues/366
5 years ago
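The cleanup described in 20fe64e2dd could look roughly like the following. It is a sketch only: compress_to, the ".tmp" suffix and the compress callable are hypothetical stand-ins for however launcher.py invokes 7z.

```python
import os


def compress_to(final_path, compress):
    """Compress into a temporary file and keep it only on success.

    compress is a hypothetical callable that writes the archive to the
    path it is given and returns True on success. A failed or interrupted
    run leaves no temporary file behind.
    """
    tmp_path = final_path + ".tmp"
    ok = False
    try:
        ok = compress(tmp_path)
    finally:
        # Runs on a False result and also when compress() raises.
        if not ok and os.path.exists(tmp_path):
            os.remove(tmp_path)
    if ok:
        os.rename(tmp_path, final_path)
    return ok
```

Deleting the partial archive means a later run cannot mistake a truncated file for a finished one.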
Federico Leva
8c6f05bb54
Consider status code before content in checkIndex() and checkalive.py
...
Fixes https://github.com/WikiTeam/wikiteam/issues/369
5 years ago
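One reason to look at the status code first, as 8c6f05bb54 does, is that an error page can still contain text that resembles a working wiki. A hedged sketch, where index_is_usable and the marker string are hypothetical and not the checks used by checkIndex() or checkalive.py:

```python
def index_is_usable(status_code, text, marker="MediaWiki"):
    """Reject HTTP error responses before inspecting the body.

    marker is an assumed placeholder for whatever string the real check
    looks for in the page content.
    """
    if status_code >= 400:
        return False
    return marker in text
```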
nemobis
5bde9ba4fe
Merge pull request #371 from nemobis/wikia
...
Update list of Wikia wikis
5 years ago
Federico Leva
8fb2b44fdb
Update list of Wikia wikis with today's list from the API
5 years ago
Federico Leva
ed46725a89
Sort list of Wikia wikis again
...
No change in content.
5 years ago
nemobis
add13e2a31
Merge pull request #368 from nemobis/xmlrevisions
...
Recover from more crashes: oversighted revs, resume API
5 years ago
Federico Leva
9ac1e6d0f1
Implement resume in --xmlrevisions (but not yet with list=allrevisions)
...
Tested with a partial dump over 100 MB:
https://tinyvillage.fandom.com/api.php
(grepped <title> to see the previously downloaded ones were kept and the
new ones continued from expected; did not validate a final XML).
5 years ago
Federico Leva
a664b17a9c
Handle deleted contributor name in --xmlrevisions
...
Avoids failure in https://deployment.wikimedia.beta.wmflabs.org/w/api.php
for revision https://deployment.wikimedia.beta.wmflabs.org/?oldid=2349 .
5 years ago
nemobis
912450b606
Merge pull request #367 from nemobis/xmlrevisions
...
Make --xmlrevisions work on some more wikis
5 years ago
Federico Leva
b162e7b14f
Reduce the API limit to 50 for arvlimit, gaplimit, ailimit
...
Avoids crashing on errors or warnings which some wikis return for bigger
requests, like https://www.openkm.com/wiki/api.php (MediaWiki 1.27.3).
5 years ago
Federico Leva
d543f7d4dd
Check the API URL against mwclient too, so it doesn't fail later
...
Change the protocol from HTTP to HTTPS if needed. Fixes:
http://nimiarkisto.fi/w/api.php
5 years ago
Federico Leva
d1619392f4
Force the lxml factory to pass around unicode strings
...
Not necessarily the most compatible with downstream XML parsers, but at
least should ensure that we manage to write the XML file. The encoding
declared in the header is not necessarily the same we get from the API.
See also:
https://lxml.de/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings
https://lxml.de/3.7/parsing.html#serialising-to-unicode-strings
Fixes https://github.com/WikiTeam/wikiteam/issues/363
5 years ago
Federico Leva
6dc86d1964
Actually use the next batch from prop=revisions in MediaWiki 1.19
5 years ago
nemobis
21bc71a751
Merge pull request #365 from nemobis/xmlrevisions
...
Indent the number of revisions more, consistent with page title style
5 years ago
Federico Leva
2ba69b3810
Indent the number of revisions more, consistent with page title style
5 years ago
nemobis
577389e059
Merge pull request #364 from nemobis/xmlrevisions
...
Implement continuation for --xmlrevisions with prop=revisions in MW 1.19
5 years ago
Federico Leva
8fef62d46e
Implement continuation for --xmlrevisions with prop=revisions in MW 1.19
5 years ago
nemobis
84444bee36
Merge pull request #360 from nemobis/xmlrevisions
...
Wikia API fixes
5 years ago
Federico Leva
8b58599645
Merge branch 'xmlrevisions' of github.com:nemobis/wikiteam into xmlrevisions
5 years ago
Federico Leva
17283113dd
Wikia: make getXMLHeader() check more lenient
...
Otherwise we end up using Special:Export even though the export API
would work perfectly well with --xmlrevisions.
For some reason using the general requests session always got an empty
response from the Wikia API.
May also fix images on fandom.com:
https://github.com/WikiTeam/wikiteam/issues/330
5 years ago
Federico Leva
2c21eadf7c
Wikia: make getXMLHeader() check more lenient
...
Otherwise we end up using Special:Export even though the export API
would work perfectly well with --xmlrevisions.
May also fix images on fandom.com:
https://github.com/WikiTeam/wikiteam/issues/330
5 years ago
Federico Leva
131e19979c
Use mwclient generator for allpages
...
Tested with MediaWiki 1.31 and 1.19.
5 years ago
nemobis
3f39a97acc
Merge pull request #359 from nemobis/xmlrevisions
...
Switch the --xmlrevisions option to mwclient and related changes
5 years ago
Federico Leva
faf0e31b4e
Don't set apfrom in initial allpages request, use suggested continuation
...
Helps with recent MediaWiki versions like 1.31 where variants of "!" can
give a bad title error and the continuation wants apcontinue anyway.
5 years ago
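The approach in faf0e31b4e, starting list=allpages without apfrom and then following whatever continuation the API returns, can be sketched as a generator. The query callable is a hypothetical stand-in for the function that sends the parameters and returns the decoded JSON. The sketch also assumes the newer "continue" response format rather than the older query-continue one.

```python
def iter_allpages(query, limit=50):
    """Yield page titles from list=allpages, following API continuation.

    No apfrom is sent in the first request; later requests merge in the
    "continue" values returned by the previous response.
    """
    params = {"action": "query", "list": "allpages",
              "aplimit": limit, "format": "json"}
    while True:
        result = query(params)
        for page in result["query"]["allpages"]:
            yield page["title"]
        if "continue" not in result:
            break
        # Copy the parameters and add the continuation values (e.g. apcontinue).
        params = dict(params, **result["continue"])
```

The default limit of 50 here mirrors the lower API limit chosen in b162e7b14f.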
Federico Leva
49017e3f20
Catch HTTP Error 405 and switch from POST to GET for API requests
...
Seen on http://wiki.ainigma.eu/index.php?title=Hlavn%C3%AD_strana :
HTTPError: HTTP Error 405: Method Not Allowed
5 years ago
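The POST-to-GET fallback in 49017e3f20 could be sketched as below. The function api_request and its opener parameter are hypothetical, and the sketch uses Python 3's urllib rather than the project's Python 2.7 code.

```python
import urllib.error
import urllib.parse
import urllib.request


def api_request(url, params, opener=urllib.request.urlopen):
    """Send params as a POST; on HTTP 405, resend them as a GET query string."""
    data = urllib.parse.urlencode(params)
    try:
        # A Request with a data payload is sent as POST.
        return opener(urllib.request.Request(url, data=data.encode("utf-8")))
    except urllib.error.HTTPError as err:
        if err.code != 405:
            raise
        # Method Not Allowed: put the same parameters in the URL instead.
        separator = "&" if "?" in url else "?"
        return opener(urllib.request.Request(url + separator + data))
```

Only status 405 triggers the fallback; any other HTTP error is re-raised unchanged.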
Federico Leva
8b5378f991
Fix query prop=revisions continuation in MediaWiki 1.22
...
This wiki has the old query-continue format but it's not exposed here.
5 years ago
Federico Leva
92da7388b0
Avoid asking allpages API if API not available
...
So that it doesn't have to iterate among non-existing titles.
Fixes https://github.com/WikiTeam/wikiteam/issues/348
5 years ago
Federico Leva
1645c1d832
More robust XML header fetch for getXMLHeader()
...
Avoid UnboundLocalError: local variable 'xml' referenced before assignment
If the page exists, its XML export is returned by the API; otherwise only
the header that we were looking for.
Fixes https://github.com/WikiTeam/wikiteam/issues/355
5 years ago