Commit Graph

1133 Commits (master)

Author SHA1 Message Date
Liu a96e482fd3 Add .DS_Store to .gitignore 2 years ago
Liu 4c970e358d Remove duplicates from wiki-site.com 2 years ago
Liu 74a8e9609f Update wiki-site.com spider and list 2 years ago
Liu ba7fab2e96 Add fandom-spider and update metadata and lists 2 years ago
Liu 49e41ee75d Update neoseeker.com spider and list 2 years ago
Liu 6346fd6553 Update shoutwiki.com spider and list 2 years ago
Liu f93988e9c6 Update fandom.com to HTTPS 2 years ago
Liu 91faa34529 Update shoutwiki.com list 2 years ago
Liu d6fe1d9ff8 Update battlestarwiki.org list 2 years ago
Federico Leva cd6d40d5ac Merge branch 'simonliu99-updatelists2' 2 years ago
Liu 6f8f160d75 Update fandom.com list 2 years ago
Liu 6b39402ebf Update miraheze.org list 2 years ago
Liu f755153de9 Update neoseeker.com list 2 years ago
Federico Leva 10ee80ca3b Rename wikia list to fandom 2 years ago
nemobis 054397aecb Merge pull request #425 from simonliu99/append_date
Display appended-date IA URL if appended
2 years ago
Liu 3638e6992f Simplify tracking item identifier 2 years ago
Liu 94eb932b3f Display appended-date IA URL if appended 2 years ago
nemobis 1f911c0142 Merge pull request #424 from simonliu99/master
Add dump date to item identifier
2 years ago
Liu d947d7571a Allow date append only if not admin 2 years ago
Liu 9f9df1e0aa Update logic to only append date if identifier without date exists 2 years ago
Liu 6c137764cb Move identifier-date option behind flag 2 years ago
Liu 990d5dfb4f Add dump date to item identifier 2 years ago
nemobis d7b6924845 Merge pull request #408 from shreyasminocha/fix-resume-images
Fix image resuming
2 years ago
nemobis 7f1f9985f6 Merge pull request #419 from timgates42/bugfix_typos
docs: Fix a few typos
2 years ago
Tim Gates ecbcc6118e docs: Fix a few typos
There are small typos in:
- dumpgenerator.py
- wikiteam/mediawiki.py

Fixes:
- Should read `inconsistencies` rather than `inconsistences`.
- Should read `partially` rather than `partialy`.
2 years ago
Shreyas Minocha e55de36cb7 Fix image resuming 3 years ago
nemobis 0cfde9e9d1 Merge pull request #394 from nsapa/nico_fix_1
Nico's fixes (dumping wiki.dystify.com/CI fixes)
4 years ago
Nicolas SAPA 5986467b12 Cleanup of link rot
Many wikis in test_dumpgenerator.py no longer exist.
Remove them from the CI.
4 years ago
Nicolas SAPA b289f86243 Fix getPageTitlesScraper
Using the API and the Special:Allpages scraper should result in the same number of titles.
Fix the detection of the next subpages on Special:Allpages.
Change the max depth to 100 and implement an anti-loop check (could fail on non-Western wikis).
4 years ago
Nicolas SAPA 1048bc3275 skilledtests.com doesn't host a MediaWiki anymore
http://skilledtests.com/wiki/ redirects to https://simcast.com,
something 'Powered by Microsoft News'.
4 years ago
Nicolas SAPA 320115fe5a Try to fix CI by using current URL for archiveteam.org
In commit 966df37c54, emijrp changed http://archiveteam.org/ to https://www.archiveteam.org/
Today, https://archiveteam.org/index.php?title=Special:Version shows a canonical URL of https://archiveteam.org/
So try to fix the CI by doing a s/www.archiveteam.org/archiveteam.org/g
4 years ago
Nicolas SAPA e4b43927b9 Fixup description grab in generateImageDump
getXMLPage() yields on "</page>", so xmlfiledesc cannot contain "</mediawiki>".
Change the search to "</page>" and inject "</mediawiki>" if it is missing, to fix up the XML.
4 years ago
Nicolas SAPA eacaf08b2f Try to fix a broken HTTP to HTTPS redirect in generateImageDump()
Some wikis fail to redirect from HTTP to HTTPS correctly, so try it ourselves.
4 years ago
Nicolas SAPA 7675b0d17c Add exception handler for requests.exceptions.ReadTimeout in getXMLPageCore()
Treat a ReadTimeout the same as a ConnectionError (log the error & retry)
4 years ago
Nicolas SAPA 4a5eef97da Update the default user-agent
A ModSecurity rule blocks the old UA, so switch to the current Firefox 78 UA.
4 years ago
nemobis 9b1996d436 Merge pull request #387 from robkam/patch-1
fix typo
4 years ago
Rob Kam e6f4674b42 fix typo 4 years ago
nemobis ee39e8f85b Merge pull request #386 from RhinosF1/patch-1
Update miraheze.org list
4 years ago
RhinosF1 3b28efab80 Update miraheze.org list
Using https://gist.github.com/RhinosF1/18c83dfbfadb84e28ee083628c029b41
4 years ago
nemobis 85ae14419f Merge pull request #381 from robkam/patch-1
Add that the script requires Python 2.7
4 years ago
Rob Kam c563012c1c Add that the script requires Python 2.7 4 years ago
nemobis 6e85afca82 Merge pull request #378 from nemobis/wikia
More efficient Wikia download and launcher.py
4 years ago
nemobis 4eae50b2fb Merge pull request #377 from nemobis/uploaderurl
uploader.py: Handle protocol-relative base URL
4 years ago
Federico Leva 3ddfa85391 uploader.py: Handle protocol-relative base URL
Fixes https://github.com/WikiTeam/wikiteam/issues/376
4 years ago
Federico Leva abd908914f Adapt to some more Wikia wikis edge cases
* Make it easy to batch requests for some wikis where millions of titles
  are really just one-revision thread items and need to be gone through
  as fast as possible.
* Status code error message.
4 years ago
Federico Leva e4524b8aec launcher.py: Avoid shell=True to consume half as many processes
No idea if "python2" will be converted to anything meaningful on Windows,
but then you're not really supposed to use the shell either in that dungeon.
https://docs.python.org/2.7/library/subprocess.html#subprocess.Popen
4 years ago
Federico Leva 0f5664028f Stricter prefix matching in launcher.py
For instance, do not skip gleefandomcom if gleefandomcom_ru is found.
4 years ago
nemobis 573623ed16 Merge pull request #373 from nemobis/wikia
uploader.py logo and metadata improvements
4 years ago
Federico Leva 7de75012d1 Fix merge of the getXMLRevisions() loop 4 years ago
nemobis 8a2116699e Merge branch 'master' into wikia 4 years ago