Liu
6346fd6553
Update shoutwiki.com spider and list
3 years ago
Liu
f93988e9c6
Update fandom.com to HTTPS
3 years ago
Liu
91faa34529
Update shoutwiki.com list
3 years ago
Liu
d6fe1d9ff8
Update battlestarwiki.org list
3 years ago
Federico Leva
cd6d40d5ac
Merge branch 'simonliu99-updatelists2'
3 years ago
Liu
6f8f160d75
Update fandom.com list
3 years ago
Liu
6b39402ebf
Update miraheze.org list
3 years ago
Liu
f755153de9
Update neoseeker.com list
3 years ago
Federico Leva
10ee80ca3b
Rename wikia list to fandom
3 years ago
nemobis
054397aecb
Merge pull request #425 from simonliu99/append_date
...
Display appended-date IA URL if appended
3 years ago
Liu
3638e6992f
Simplify tracking item identifier
3 years ago
Liu
94eb932b3f
Display appended-date IA URL if appended
3 years ago
nemobis
1f911c0142
Merge pull request #424 from simonliu99/master
...
Add dump date to item identifier
3 years ago
Liu
d947d7571a
Allow date append only if not admin
3 years ago
Liu
9f9df1e0aa
Update logic to only append date if identifier without date exists
3 years ago
Liu
6c137764cb
Move identifier-date option behind flag
3 years ago
Liu
990d5dfb4f
Add dump date to item identifier
3 years ago
nemobis
d7b6924845
Merge pull request #408 from shreyasminocha/fix-resume-images
...
Fix image resuming
3 years ago
nemobis
7f1f9985f6
Merge pull request #419 from timgates42/bugfix_typos
...
docs: Fix a few typos
3 years ago
Tim Gates
ecbcc6118e
docs: Fix a few typos
...
There are small typos in:
- dumpgenerator.py
- wikiteam/mediawiki.py
Fixes:
- Should read `inconsistencies` rather than `inconsistences`.
- Should read `partially` rather than `partialy`.
3 years ago
Shreyas Minocha
e55de36cb7
Fix image resuming
3 years ago
nemobis
0cfde9e9d1
Merge pull request #394 from nsapa/nico_fix_1
...
Nico's fixes (dumping wiki.dystify.com/CI fixes)
4 years ago
Nicolas SAPA
5986467b12
Cleanup of link rot
...
Many of the wikis in test_dumpgenerator.py no longer exist.
Remove them from the CI.
4 years ago
Nicolas SAPA
b289f86243
Fix getPageTitlesScraper
...
Using the API and the Special:Allpages scraper should result in the same number of titles.
Fix the detection of the next subpages on Special:Allpages.
Change the max depth to 100 and implement loop protection (could fail on non-Western wikis).
4 years ago
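The depth limit and loop protection described in b289f86243 can be illustrated with a minimal sketch. This is not the code from getPageTitlesScraper: `fetch_allpages`, the listing names, and the return shape are hypothetical stand-ins, and only the limit of 100 comes from the commit message.

```python
def collect_titles(fetch_allpages, start_url, max_depth=100):
    """Walk Special:Allpages-style listings without looping forever.

    fetch_allpages is a hypothetical callable that returns
    (titles, subpage_urls) for one listing URL.
    """
    titles = []
    seen_titles = set()
    visited = set()
    pending = [(start_url, 0)]
    while pending:
        url, depth = pending.pop(0)
        # Loop protection: never fetch the same listing twice,
        # and stop descending past max_depth.
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        page_titles, subpages = fetch_allpages(url)
        for title in page_titles:
            if title not in seen_titles:
                seen_titles.add(title)
                titles.append(title)
        for sub in subpages:
            pending.append((sub, depth + 1))
    return titles


# Fake two-page wiki whose listings link to each other (a loop).
FAKE_LISTINGS = {
    "page-a": (["Alpha", "Beta"], ["page-b"]),
    "page-b": (["Beta", "Gamma"], ["page-a"]),
}

result = collect_titles(lambda url: FAKE_LISTINGS[url], "page-a")
```

Tracking visited listings means a cycle between subpages ends the walk instead of repeating it, and de-duplicating titles keeps the count comparable with what the API would return.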
Nicolas SAPA
1048bc3275
skilledtests.com doesn't host a MediaWiki anymore
...
http://skilledtests.com/wiki/ redirects to https://simcast.com,
something 'Powered by Microsoft News'.
4 years ago
Nicolas SAPA
320115fe5a
Try to fix CI by using current URL for archiveteam.org
...
In commit 966df37c54, emijrp changed http://archiveteam.org/ to https://www.archiveteam.org/
Today, https://archiveteam.org/index.php?title=Special:Version shows a canonical URL of https://archiveteam.org/
So try to fix the CI by doing a s/www.archiveteam.org/archiveteam.org/g
4 years ago
Nicolas SAPA
e4b43927b9
Fixup description grab in generateImageDump
...
getXMLPage() yields on "</page>", so xmlfiledesc cannot contain "</mediawiki>".
Change the search to "</page>" and inject "</mediawiki>" if it is missing, to fix up the XML.
4 years ago
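The fix-up described in e4b43927b9 can be sketched roughly as follows. The helper name `close_mediawiki` and the sample fragment are assumptions made for illustration; the actual change lives inside generateImageDump and may differ in detail.

```python
def close_mediawiki(xml_fragment):
    """Append the closing root tag when a fragment stops at </page>."""
    # A fragment that ends at </page> has no root close tag yet,
    # so add one to make the description XML well-formed.
    if "</page>" in xml_fragment and "</mediawiki>" not in xml_fragment:
        return xml_fragment + "</mediawiki>\n"
    return xml_fragment


# Hypothetical fragment, shaped like output that stops at </page>.
fragment = "<mediawiki>\n<page><title>File:X.png</title></page>\n"
fixed = close_mediawiki(fragment)
```

Calling the helper on an already closed document leaves it unchanged, so it is safe to apply unconditionally.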
Nicolas SAPA
eacaf08b2f
Try to fix a broken HTTP to HTTPS redirect in generateImageDump()
...
Some wikis fail to do the HTTP to HTTPS redirect correctly, so try it ourselves.
4 years ago
Nicolas SAPA
7675b0d17c
Add exception handler for requests.exceptions.ReadTimeout in getXMLPageCore()
...
Treat a ReadTimeout the same as a ConnectionError (log the error & retry)
4 years ago
Nicolas SAPA
4a5eef97da
Update the default user-agent
...
A ModSecurity rule blocks the old UA, so switch to the current Firefox 78 UA.
4 years ago
nemobis
9b1996d436
Merge pull request #387 from robkam/patch-1
...
fix typo
4 years ago
Rob Kam
e6f4674b42
fix typo
4 years ago
nemobis
ee39e8f85b
Merge pull request #386 from RhinosF1/patch-1
...
Update miraheze.org list
4 years ago
RhinosF1
3b28efab80
Update miraheze.org list
...
Using https://gist.github.com/RhinosF1/18c83dfbfadb84e28ee083628c029b41
4 years ago
nemobis
85ae14419f
Merge pull request #381 from robkam/patch-1
...
Add that the script requires Python 2.7
4 years ago
Rob Kam
c563012c1c
Add that the script requires Python 2.7
4 years ago
nemobis
6e85afca82
Merge pull request #378 from nemobis/wikia
...
More efficient Wikia download and launcher.py
5 years ago
nemobis
4eae50b2fb
Merge pull request #377 from nemobis/uploaderurl
...
uploader.py: Handle protocol-relative base URL
5 years ago
Federico Leva
3ddfa85391
uploader.py: Handle protocol-relative base URL
...
Fixes https://github.com/WikiTeam/wikiteam/issues/376
5 years ago
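For readers unfamiliar with the problem in 3ddfa85391: a protocol-relative URL starts with `//` and carries no scheme, so it cannot be fetched as-is. A minimal sketch of one way to handle it follows. The function name `absolutize`, the `https` default, and the example.org host are assumptions, not what uploader.py necessarily does.

```python
def absolutize(base_url, default_scheme="https"):
    """Give a protocol-relative URL an explicit scheme."""
    if base_url.startswith("//"):
        # "//host/path" inherits the scheme of the embedding page;
        # outside a browser we have to pick one (assumed: https).
        return default_scheme + ":" + base_url
    return base_url


api_url = absolutize("//example.org/w/api.php")
```

URLs that already carry a scheme pass through unchanged.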
Federico Leva
abd908914f
Adapt to some more Wikia wikis edge cases
...
* Make it easy to batch requests for some wikis where millions of titles
are really just one-revision thread items and need to be gone through
as fast as possible.
* Status code error message.
5 years ago
Federico Leva
e4524b8aec
launcher.py: Avoid shell=True to consume half as many processes
...
No idea if "python2" will be converted to anything meaningful on Windows,
but then you're not really supposed to use the shell either in that dungeon.
https://docs.python.org/2.7/library/subprocess.html#subprocess.Popen
5 years ago
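The idea behind e4524b8aec can be shown with a short sketch. With `shell=True`, `subprocess.Popen` starts a shell that in turn runs the command; passing an argument list without a shell starts the program directly. The helper name `run_without_shell` is hypothetical, and `sys.executable` is used here only so the example runs anywhere, where launcher.py calls "python2".

```python
import subprocess
import sys


def run_without_shell(args):
    """Run a command given as a list, with no intermediate shell."""
    # shell defaults to False, so args[0] is executed directly.
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.returncode, out


code, out = run_without_shell([sys.executable, "-c", "print('ok')"])
```

Passing a list also avoids shell quoting problems when arguments contain spaces or special characters.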
Federico Leva
0f5664028f
Stricter prefix matching in launcher.py
...
For instance, do not skip gleefandomcom if gleefandomcom_ru is found.
5 years ago
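The difference between loose and strict matching in 0f5664028f can be sketched as below. The file naming scheme `<prefix>-<date>-wikidump.7z` and both function names are assumptions made for illustration; launcher.py may implement the check differently.

```python
def loose_match(prefix, filename):
    """Old-style check: any name starting with the prefix matches."""
    return filename.startswith(prefix)


def strict_match(prefix, filename):
    """Stricter check: the part before the first '-' must equal the prefix."""
    return filename.split("-")[0] == prefix


# Hypothetical file name for the Russian wiki's dump.
ru_dump = "gleefandomcom_ru-20200101-wikidump.7z"
loose = loose_match("gleefandomcom", ru_dump)
strict = strict_match("gleefandomcom", ru_dump)
```

Under these assumptions the loose check mistakes the gleefandomcom_ru dump for a gleefandomcom one, while the strict check does not, so gleefandomcom is no longer skipped.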
nemobis
573623ed16
Merge pull request #373 from nemobis/wikia
...
uploader.py logo and metadata improvements
5 years ago
Federico Leva
7de75012d1
Fix merge of the getXMLRevisions() loop
5 years ago
nemobis
8a2116699e
Merge branch 'master' into wikia
5 years ago
Federico Leva
7289225d2c
Directly catch exception for page missing in getXMLRevisions()
...
The caller cannot catch the PME exception because it doesn't know about
the title. Just log the error here.
5 years ago
Federico Leva
aabf3ea037
uploader.py: switch to requests, BytesIO, rights API
...
* Now uploads the logo again, at least in standard or Wikia skin.
* Finds license information more often.
* Translates Wikia license URL.
* More specific error reporting.
5 years ago
Federico Leva
e194077e52
uploader.py: Use requests GET, handle Wikia weird URLs
...
POST requests with urllib were getting empty responses from Wikia.
5 years ago
nemobis
e136ee5536
Merge pull request #372 from nemobis/wikia
...
Avoid launcher.py 7z failures
5 years ago
Federico Leva
20fe64e2dd
Delete temporary 7z file if compression failed, don't preserve it
...
Fixes https://github.com/WikiTeam/wikiteam/issues/366
5 years ago
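The behaviour described in 20fe64e2dd, deleting the temporary archive when compression fails, might look like the following sketch. The names `finish_archive` and `failing_compress` and the `.tmp` suffix are hypothetical, and the real launcher.py invokes 7z rather than a Python callable.

```python
import os
import tempfile


def finish_archive(compress, tmp_path, final_path):
    """Run compress(tmp_path) and keep the result only on success."""
    returncode = compress(tmp_path)
    if returncode == 0:
        os.rename(tmp_path, final_path)
        return True
    # Compression failed: delete the partial file instead of keeping it.
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    return False


def failing_compress(path):
    # Hypothetical stand-in for a 7z run that dies midway.
    with open(path, "wb") as handle:
        handle.write(b"partial")
    return 2


workdir = tempfile.mkdtemp()
tmp_file = os.path.join(workdir, "dump.7z.tmp")
final_file = os.path.join(workdir, "dump.7z")
ok = finish_archive(failing_compress, tmp_file, final_file)
leftovers = os.listdir(workdir)
```

Removing the partial file means a later run cannot mistake it for a finished archive.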