There are small typos in:
- dumpgenerator.py
- wikiteam/mediawiki.py
Fixes:
- Should read `inconsistencies` rather than `inconsistences`.
- Should read `partially` rather than `partialy`.
Using the API and the Special:Allpages scraper should result in the same number of titles.
Fix the detection of the next subpages on Special:Allpages.
Change the max depth to 100 and implement an anti-loop check (which could fail on non-Western wikis).
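A minimal sketch of the anti-loop idea (the helper name and regexes below are simplified stand-ins, not the actual dumpgenerator.py code): remember the continuation values already seen and stop as soon as one repeats or the depth limit is reached.

```python
import re
import requests

def scrape_allpages_titles(index_url, session=None, maxdepth=100):
    """Walk Special:Allpages following the "next page" links (sketch only).
    The seen-set is the anti-loop guard: if the same continuation value
    comes back twice, stop instead of re-fetching the same batch forever.
    The regexes are crude stand-ins for the real dumpgenerator.py parsing."""
    session = session or requests.Session()
    titles = []
    apfrom = ''
    seen = set()
    for _ in range(maxdepth):
        if apfrom in seen:
            break  # same continuation value again: we are looping
        seen.add(apfrom)
        r = session.get(index_url,
                        params={'title': 'Special:AllPages', 'from': apfrom})
        titles += re.findall(r'<a href="[^"]+" title="([^"]+)"', r.text)
        m = re.search(r'from=([^&"]+)[^>]*>Next page', r.text)
        if not m:
            break  # no "next page" link left: we reached the end
        apfrom = m.group(1)
    return titles
```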
getXMLPage() yields on "</page>", so xmlfiledesc cannot contain "</mediawiki>".
Change the search to "</page>" and inject "</mediawiki>" if it is missing, to fix up the XML.
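A sketch of the fixup, assuming the fragment returned by getXMLPage() ends at "</page>" (the function name below is illustrative):

```python
def fixup_xml_fragment(xmlfiledesc):
    """Close a per-page XML fragment as yielded by getXMLPage() (sketch).
    The fragment ends at </page>, so the root </mediawiki> tag may be missing."""
    if '</page>' in xmlfiledesc and '</mediawiki>' not in xmlfiledesc:
        xmlfiledesc += '\n</mediawiki>'
    return xmlfiledesc
```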
* Make it easy to batch requests for some wikis where millions of titles
are really just one-revision thread items and need to be gone through
as fast as possible (see the sketch after this list).
* Status code error message.
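A sketch of the batching, assuming the standard MediaWiki API limit of 50 titles per query for normal users (the helper name is illustrative):

```python
import requests

def batched_latest_revisions(api_url, titles, batch_size=50, session=None):
    """Fetch the latest revision of many titles per request (sketch).
    The MediaWiki API accepts up to 50 titles per query for normal users,
    so one-revision thread items can be exported 50 at a time instead of
    one request per title."""
    session = session or requests.Session()
    for i in range(0, len(titles), batch_size):
        chunk = titles[i:i + batch_size]
        r = session.get(api_url, params={
            'action': 'query',
            'prop': 'revisions',
            'rvprop': 'ids|timestamp|user|comment|content',
            'titles': '|'.join(chunk),
            'format': 'json',
        })
        r.raise_for_status()  # fail loudly, with the HTTP status code
        yield r.json()
```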
Tested with a partial dump of over 100 MB:
https://tinyvillage.fandom.com/api.php
(grepped <title> to check that the previously downloaded titles were kept and
the new ones continued from the expected point; did not validate the final XML).
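For reference, a rough Python equivalent of that grep check, assuming the dump is a plain XML file on disk:

```python
import re

def dump_titles(path):
    """Return the <title> values found in a (possibly truncated) XML dump,
    in file order; a rough equivalent of grepping for <title>."""
    pattern = re.compile(r'<title>(.*?)</title>')
    with open(path, encoding='utf-8') as f:
        return [m.group(1) for line in f for m in pattern.finditer(line)]
```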
Otherwise we end up using Special:Export even though the export API
would work perfectly well with --xmlrevisions.
For some reason using the general requests session always got an empty
response from the Wikia API.
May also fix images on fandom.com:
https://github.com/WikiTeam/wikiteam/issues/330
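One plausible workaround matching that description (a sketch, not necessarily the committed fix): retry outside the shared session when the body comes back empty.

```python
import requests

def get_api_json(api_url, params, session):
    """Query api.php, falling back to a one-off request when the shared
    session comes back with an empty body, as observed on the Wikia API.
    Sketch only; the committed fix may differ."""
    r = session.get(api_url, params=params)
    if not r.text.strip():
        r = requests.get(api_url, params=params)  # fresh, session-less request
    r.raise_for_status()
    return r.json()
```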
Avoid UnboundLocalError: local variable 'xml' referenced before assignment
If the page exists, its XML export is returned by the API; otherwise the API
returns only the header that we were looking for.
Fixes https://github.com/WikiTeam/wikiteam/issues/355
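A sketch of the guard, assuming the legacy format=json response shape where the export text lives under query.export['*'] (the helper name is illustrative):

```python
def export_page_xml(result):
    """Pull the <page> XML out of an action=query&export response (sketch).
    Initialising xml up front avoids the UnboundLocalError when the API
    returns only the siteinfo header because the page does not exist."""
    xml = ''  # defined even when no <page> element is found below
    export = result.get('query', {}).get('export', {}).get('*', '')
    if '</page>' in export:
        xml = export
    return xml
```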
* Actually proceed to the next page when there is no continuation.
* Provide the same output as with the usual per-page export.
Tested on a MediaWiki 1.16 wiki with success.
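The continuation handling boils down to something like this sketch, where site_api is a hypothetical callable that performs a single api.php request and returns the decoded JSON:

```python
def api_query_batches(site_api, params):
    """Yield successive batches of an API query, honouring continuation
    (sketch; site_api is a hypothetical callable doing one api.php request
    and returning the decoded JSON)."""
    while True:
        result = site_api(params)
        yield result
        if 'continue' not in result:
            break  # no continuation left: finish this page, caller moves on
        params.update(result['continue'])
```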
Traceback (most recent call last):
File "dumpgenerator.py", line 2362, in <module>
File "dumpgenerator.py", line 2354, in main
resumePreviousDump(config=config, other=other)
File "dumpgenerator.py", line 1921, in createNewDump
getPageTitles(config=config, session=other['session'])
File "dumpgenerator.py", line 755, in generateXMLDump
for xml in getXMLRevisions(config=config, session=session):
File "dumpgenerator.py", line 861, in getXMLRevisions
revids.append(str(revision['revid']))
IndexError: list index out of range
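A defensive sketch of the revid collection (not the exact committed code): skip page entries that carry no revisions at all.

```python
def collect_revids(pages):
    """Collect revision ids defensively (sketch, not the committed code):
    some page entries in the API response carry no 'revisions' list at all,
    which is what blew up in getXMLRevisions above."""
    revids = []
    for page in pages:
        for revision in page.get('revisions', []):
            if 'revid' in revision:
                revids.append(str(revision['revid']))
    return revids
```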
* Still maintained and available for Python 3 as well.
* Allows raw API requests as we need.
* Does not provide handy generators, so we need to handle continuation ourselves.
* Decides on its own which protocol and exact path to use, and fails at it.
* Appears to use POST by default unless asked otherwise; what should we do? (See the GET sketch after this list.)
* It was just an old trick to get past some barriers which were waived with GET.
* It's not conformant and doesn't play well with some redirects.
* Some recent wikis seem to not like it at all, see also issue #311.
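Independent of the client library, a raw api.php request over GET is straightforward; a minimal sketch with requests (helper name illustrative):

```python
import requests

def raw_api_get(api_url, session=None, **params):
    """Raw api.php request over GET (sketch, independent of any client
    library). GET is what recent wikis and caches expect; POST for read
    queries was only ever a workaround, as noted above."""
    session = session or requests.Session()
    params.setdefault('format', 'json')
    r = session.get(api_url, params=params)
    r.raise_for_status()
    return r.json()
```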
Warning!: "./tdicampswikiacom-20180522-wikidump" path exists
Traceback (most recent call last):
File "./dumpgenerator.py", line 2321, in <module>
main()
File "./dumpgenerator.py", line 2283, in main
while reply.lower() not in ['yes', 'y', 'no', 'n']:
UnboundLocalError: local variable 'reply' referenced before assignment
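A minimal sketch of the fix, in Python 2 style to match dumpgenerator.py; the prompt wording is illustrative, not the exact text in the script:

```python
# Give `reply` a value before the validation loop, so the loop condition
# never touches an unassigned variable when the path already exists.
reply = ''
while reply.lower() not in ['yes', 'y', 'no', 'n']:
    reply = raw_input('Path exists. Resume the previous dump? [yes/no] ')
```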