Catch unexpected API errors in getPageTitlesAPI

Apparently the initial JSON test is not enough: the JSON can also be
broken or unexpected at later points.
Fall back to the old scraper in that case.

Fixes https://github.com/WikiTeam/wikiteam/issues/295, perhaps.

If the scraper doesn't work for the wiki, the dump will fail entirely,
even if the list of titles was perhaps almost complete. A different
solution may be in order.
pull/319/head
Federico Leva 6 years ago
parent 59c4c5430e
commit 1ff5af7d44

@@ -2,7 +2,7 @@
 # -*- coding: utf-8 -*-
 # dumpgenerator.py A generator of dumps for wikis
-# Copyright (C) 2011-2016 WikiTeam developers
+# Copyright (C) 2011-2018 WikiTeam developers
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
 # the Free Software Foundation, either version 3 of the License, or
@@ -396,16 +396,11 @@ def getPageTitles(config={}, session=None):
     titles = []
     if 'api' in config and config['api']:
-        r = session.post(config['api'], params={'action': 'query', 'list': 'allpages', 'format': 'json'}, timeout=30)
         try:
-            test = getJSON(r)
+            titles = getPageTitlesAPI(config=config, session=session)
         except:
-            test = None
-        if not test or ('warnings' in test and 'allpages' in test['warnings'] and '*' in test['warnings']['allpages']
-                and test['warnings']['allpages']['*'] == 'The "allpages" module has been disabled.'):
+            print "Error: could not get page titles from the API"
             titles = getPageTitlesScraper(config=config, session=session)
-        else:
-            titles = getPageTitlesAPI(config=config, session=session)
     elif 'index' in config and config['index']:
         titles = getPageTitlesScraper(config=config, session=session)
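
The resulting control flow can be sketched as follows. This is a minimal Python 3 illustration (the real dumpgenerator.py is Python 2); the helper bodies are stand-ins, not the actual WikiTeam implementations:

```python
def get_page_titles_api(config):
    # Stand-in for getPageTitlesAPI: in the real code this walks the
    # MediaWiki API and may raise on broken or unexpected JSON.
    raise ValueError("unexpected JSON from the API")

def get_page_titles_scraper(config):
    # Stand-in for getPageTitlesScraper: the old HTML-scraping fallback.
    return ["Main Page", "Help:Contents"]

def get_page_titles(config):
    # Mirrors the post-commit logic: try the API first, and on ANY
    # failure fall back to the scraper instead of aborting the dump.
    titles = []
    if config.get("api"):
        try:
            titles = get_page_titles_api(config)
        except Exception:
            print("Error: could not get page titles from the API")
            titles = get_page_titles_scraper(config)
    elif config.get("index"):
        titles = get_page_titles_scraper(config)
    return titles
```

As the commit message notes, if the scraper also fails, the dump still fails entirely; partial API results are discarded rather than kept.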
