Apparently the initial JSON test is not enough: the JSON can be broken,
or unexpected, at other points as well.
Fall back to the old scraper in such a case.
Fixes https://github.com/WikiTeam/wikiteam/issues/295, perhaps.
If the scraper doesn't work for the wiki either, the dump will fail
entirely, even if the list of titles was almost complete. A different
solution may be in order.
Traceback (most recent call last):
  File "dumpgenerator.py", line 2214, in <module>
    print 'Trying to use path "%s"...' % (config['path'])
  File "dumpgenerator.py", line 2210, in main
    elif reply.lower() in ['no', 'n']:
  File "dumpgenerator.py", line 1977, in saveSiteInfo
  File "dumpgenerator.py", line 1711, in getJSON
    return False
  File "/usr/lib/python2.7/site-packages/requests/models.py", line 892, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib64/python2.7/site-packages/simplejson/__init__.py", line 516, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/site-packages/simplejson/decoder.py", line 374, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib64/python2.7/site-packages/simplejson/decoder.py", line 404, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
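
A minimal sketch of the guard getJSON needs; the name matches
dumpgenerator.py, but the body is illustrative rather than the committed fix:

    def getJSON(request):
        """Return None instead of raising when the reply is not JSON."""
        try:
            return request.json()
        except ValueError:
            # requests re-raises simplejson's JSONDecodeError (a
            # ValueError subclass) when the body is HTML, empty, or
            # otherwise unparseable.
            return None

Callers can then treat a None (or otherwise falsy) result as the signal
to fall back to the scraper.
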
Or if, for instance, the directory was renamed compared to the saved config:
Resuming previous dump process...
Traceback (most recent call last):
  File "./dumpgenerator.py", line 2238, in <module>
    main()
  File "./dumpgenerator.py", line 2228, in main
    resumePreviousDump(config=config, other=other)
  File "./dumpgenerator.py", line 1829, in resumePreviousDump
    if lasttitle == '--END--':
UnboundLocalError: local variable 'lasttitle' referenced before assignment
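
The resume path needs a similar default. A sketch with a hypothetical
readLastTitle helper, assuming the titles file may be missing or empty so
the loop that sets lasttitle never runs:

    def readLastTitle(titlesfile):
        # Set a default first, so an empty or missing titles file
        # cannot leave lasttitle unbound at the '--END--' check.
        lasttitle = ''
        try:
            with open(titlesfile) as f:
                for line in f:
                    lasttitle = line.strip()
        except IOError:
            # e.g. the directory was renamed and the file is gone
            pass
        return lasttitle

    if readLastTitle('titles.txt') == '--END--':
        print('Title list is complete; resume the next stage')
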
* Do not try exportnowrap first: it returns a blank page (see the
  request sketch after this list).
* Add an allpages option, which simply uses readTitles but cannot resume.
FIXME: this only exports the current revision!
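
Roughly what the export request looks like once exportnowrap is dropped;
the Special:Export parameters are MediaWiki's own, but the URL and
surrounding code are only a sketch:

    import requests

    params = {
        'title': 'Special:Export',
        'pages': 'Main Page',
        'curonly': 1,  # hence the FIXME: only the current revision
    }
    r = requests.get('http://wiki.example.org/index.php', params=params)
    # Expect a full <mediawiki>...</mediawiki> document; with
    # exportnowrap=1 some wikis return a blank page instead.
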
$ python dumpgenerator.py --xml --index=http://meritbadge.org/wiki/index.php
fails on at least one MediaWiki 1.12 wiki:
Trying generating a new dump into a new directory...
Loading page titles from namespaces = all
Excluding titles from namespaces = None
Traceback (most recent call last):
  File "dumpgenerator.py", line 2211, in <module>
    main()
  File "dumpgenerator.py", line 2203, in main
    createNewDump(config=config, other=other)
  File "dumpgenerator.py", line 1766, in createNewDump
    getPageTitles(config=config, session=other['session'])
  File "dumpgenerator.py", line 400, in getPageTitles
    test = getJSON(r)
  File "dumpgenerator.py", line 1708, in getJSON
    return request.json()
  File "/usr/lib/python2.7/site-packages/requests/models.py", line 892, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib64/python2.7/site-packages/simplejson/__init__.py", line 516, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/site-packages/simplejson/decoder.py", line 374, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib64/python2.7/site-packages/simplejson/decoder.py", line 404, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
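
With a hardened getJSON as sketched above, the API probe in getPageTitles
can drive the fallback; a sketch assuming getPageTitlesAPI and
getPageTitlesScraper keep their existing signatures:

    def getPageTitles(config={}, session=None):
        # Probe the API; getJSON returns None on broken JSON.
        r = session.get(config['api'], params={
            'action': 'query', 'list': 'allpages', 'format': 'json'})
        test = getJSON(r)
        if test and 'query' in test:
            titles = getPageTitlesAPI(config=config, session=session)
        else:
            # Broken or unexpected JSON: fall back to the old scraper.
            titles = getPageTitlesScraper(config=config, session=session)
        return titles
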
* http://biografias.bcn.cl/api.php does not like the data to be POSTed.
  Just use URL parameters (see the sketch after this list). Some wikis
  had anti-spam protections which made us POST everything, but for most
  wikis GET should be fine.
* If the index is not defined, don't fail.
* Use only the base api.php URL, not parameters, in domain2prefix.
https://github.com/WikiTeam/wikiteam/issues/314
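
A sketch of both changes, assuming requests as the HTTP layer and
treating this domain2prefix body as illustrative rather than the real one:

    import re
    import requests

    def domain2prefix(url):
        # Use only the base api.php URL: strip any '?action=...'
        # parameters and the scheme, so the same wiki always maps
        # to the same prefix.
        base = url.split('?')[0]
        base = re.sub(r'^https?://', '', base)
        return re.sub(r'[^-.A-Za-z0-9]', '_', base).strip('_')

    # Send the query in the URL instead of a POST body;
    # http://biografias.bcn.cl/api.php rejects POSTed data.
    params = {'action': 'query', 'meta': 'siteinfo', 'format': 'json'}
    r = requests.get('http://biografias.bcn.cl/api.php', params=params)
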
File "./dumpgenerator.py", line 1212, in generateImageDump
if not re.search(r'</mediawiki>', xmlfiledesc):
UnboundLocalError: local variable 'xmlfiledesc' referenced before assignment
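
The same default-first pattern applies here; a sketch with a hypothetical
saveImageDescription helper, assuming the description text is fetched just
above the failing check:

    import re
    import requests

    def saveImageDescription(session, descurl):
        # Default first: if the request fails, xmlfiledesc must
        # still exist when the </mediawiki> check runs.
        xmlfiledesc = ''
        try:
            r = session.get(descurl, timeout=30)
            xmlfiledesc = r.text
        except requests.exceptions.RequestException:
            pass
        if not re.search(r'</mediawiki>', xmlfiledesc):
            # Truncated or missing description: record a stub
            # instead of crashing the whole image dump.
            xmlfiledesc = '<error/>'
        return xmlfiledesc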