Commit Graph

1088 Commits (269841c909c7edb6f1dd77472fdc42f79b0c6799)
 

Author SHA1 Message Date
nemobis 269841c909
Merge pull request #431 from simonliu99/updatelists
Update mediawiki wikifarm lists
2 years ago
Liu d9885e0845 Update shoutwiki-spider to remove duplicates 2 years ago
Liu fcc4080b23 Update neoseeker.com.info instructions 2 years ago
Liu e7f7266550 Update fandom.com spider and remove duplicates 2 years ago
Liu 9c5c55342d Update miraheze.org spider and remove duplicates 2 years ago
Liu a96e482fd3 Add .DS_Store to .gitignore 2 years ago
Liu 4c970e358d Remove duplicates from wiki-site.com 2 years ago
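Several commits above remove duplicates from the wikifarm lists. The commits do not show how this was done. The following Python 3 sketch shows one way to deduplicate a URL list while keeping the original order. The helper name `dedupe_urls` is hypothetical. Treating a trailing slash as insignificant is an assumption, not something the commits state.

```python
def dedupe_urls(lines):
    """Return unique, non-blank URLs in first-seen order (hypothetical helper)."""
    seen = set()
    unique = []
    for line in lines:
        # Assumption: 'https://a.org/' and 'https://a.org' name the same wiki.
        url = line.strip().rstrip("/")
        if url and url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```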
Liu 74a8e9609f Update wiki-site.com spider and list 2 years ago
Liu ba7fab2e96 Add fandom-spider and update metadata and lists 2 years ago
Liu 49e41ee75d Update neoseeker.com spider and list 2 years ago
Liu 6346fd6553 Update shoutwiki.com spider and list 2 years ago
Liu f93988e9c6 Update fandom.com to HTTPS 2 years ago
Liu 91faa34529 Update shoutwiki.com list 2 years ago
Liu d6fe1d9ff8 Update battlestarwiki.org list 2 years ago
Federico Leva cd6d40d5ac Merge branch 'simonliu99-updatelists2' 2 years ago
Liu 6f8f160d75 Update fandom.com list 2 years ago
Liu 6b39402ebf Update miraheze.org list 2 years ago
Liu f755153de9 Update neoseeker.com list 2 years ago
Federico Leva 10ee80ca3b Rename wikia list to fandom 2 years ago
nemobis 054397aecb
Merge pull request #425 from simonliu99/append_date
Display appended-date IA URL if appended
2 years ago
Liu 3638e6992f Simplify tracking item identifier 2 years ago
Liu 94eb932b3f Display appended-date IA URL if appended 2 years ago
nemobis 1f911c0142
Merge pull request #424 from simonliu99/master
Add dump date to item identifier
2 years ago
Liu d947d7571a Allow date append only if not admin 2 years ago
Liu 9f9df1e0aa Update logic to only append date if identifier without date exists 2 years ago
Liu 6c137764cb Move identifier-date option behind flag 2 years ago
Liu 990d5dfb4f Add dump date to item identifier 2 years ago
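PR #424 appends the dump date to the Internet Archive item identifier, but only when an undated identifier already exists and the uploader is not an admin. The Python 3 sketch below shows that decision. Several details are assumptions for illustration: the function name, the `wiki-<name>` base form and the `YYYYMMDD` date format. `identifier_exists` stands in for an archive.org lookup. The command-line flag mentioned in the commits is not modeled.

```python
def build_identifier(wikiname, dump_date, identifier_exists, is_admin):
    """Hypothetical sketch: choose an IA item identifier for a wiki dump.

    identifier_exists(identifier) -> bool replaces a real archive.org lookup.
    """
    identifier = "wiki-" + wikiname  # assumed base form
    # Append the dump date only if the undated item is already taken
    # and the uploader is not an admin.
    if identifier_exists(identifier) and not is_admin:
        identifier += "-" + dump_date
    return identifier
```

For example, `build_identifier("examplewiki", "20210101", {"wiki-examplewiki"}.__contains__, False)` yields a dated identifier. The same call with `is_admin=True` keeps the undated one.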
nemobis d7b6924845
Merge pull request #408 from shreyasminocha/fix-resume-images
Fix image resuming
2 years ago
nemobis 7f1f9985f6
Merge pull request #419 from timgates42/bugfix_typos
docs: Fix a few typos
3 years ago
Tim Gates ecbcc6118e
docs: Fix a few typos
There are small typos in:
- dumpgenerator.py
- wikiteam/mediawiki.py

Fixes:
- Should read `inconsistencies` rather than `inconsistences`.
- Should read `partially` rather than `partialy`.
3 years ago
Shreyas Minocha e55de36cb7
Fix image resuming 3 years ago
nemobis 0cfde9e9d1
Merge pull request #394 from nsapa/nico_fix_1
Nico's fixes (dumping wiki.dystify.com/CI fixes)
4 years ago
Nicolas SAPA 5986467b12 Cleanup of link rot
Many wikis in test_dumpgenerator.py no longer exist.
Remove them from the CI.
4 years ago
Nicolas SAPA b289f86243 Fix getPageTitlesScraper
Using the API and the Special:Allpages scraper should result in the same number of titles.
Fix the detection of the next subpages on Special:Allpages.
Change the max depth to 100 and implement an anti-loop check (which could fail on non-Western wikis).
4 years ago
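The getPageTitlesScraper fix follows Special:Allpages subpages up to a depth of 100 and guards against loops. The commit's actual anti-loop logic is not shown. This Python 3 sketch uses a different, swapped-in technique: a set of already-visited subpage URLs inside a depth-bounded breadth-first walk. `fetch_page` is a hypothetical callback that returns the titles on a subpage and the links to its next subpages.

```python
def scrape_all_titles(start_url, fetch_page, max_depth=100):
    """Collect titles by following Special:Allpages subpage links (sketch).

    fetch_page(url) is hypothetical and returns
    (titles_on_that_page, urls_of_next_subpages).
    """
    titles = []          # result, in discovery order
    seen_titles = set()  # skip titles repeated across overlapping subpages
    visited = set()      # anti-loop guard: fetch each subpage URL at most once
    frontier = [start_url]
    depth = 0
    # Breadth-first walk, bounded by max_depth levels of subpages.
    while frontier and depth < max_depth:
        next_frontier = []
        for url in frontier:
            if url in visited:
                continue
            visited.add(url)
            page_titles, next_urls = fetch_page(url)
            for title in page_titles:
                if title not in seen_titles:
                    seen_titles.add(title)
                    titles.append(title)
            next_frontier.extend(next_urls)
        frontier = next_frontier
        depth += 1
    return titles
```

A subpage that links back to an earlier one, as "b" does to "a" in a toy graph, is simply skipped instead of being fetched again.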
Nicolas SAPA 1048bc3275 skilledtests.com doesn't host a MediaWiki anymore
http://skilledtests.com/wiki/ redirects to https://simcast.com,
a site 'Powered by Microsoft News'.
4 years ago
Nicolas SAPA 320115fe5a Try to fix CI by using current URL for archiveteam.org
In commit 966df37c54, emijrp changed http://archiveteam.org/ to https://www.archiveteam.org/
Today, https://archiveteam.org/index.php?title=Special:Version shows a canonical URL of https://archiveteam.org/
So try to fix the CI by applying s/www.archiveteam.org/archiveteam.org/g.
4 years ago
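The substitution above rewrites every `www.archiveteam.org` to `archiveteam.org`. Below is a Python 3 equivalent. The function name is hypothetical. The dots are escaped so that they match only a literal `.`. In an unescaped sed pattern, `.` matches any character.

```python
import re

# Escaped dots: match the literal host name only.
_WWW_ARCHIVETEAM = re.compile(r"www\.archiveteam\.org")

def drop_www_archiveteam(text):
    """Replace every www.archiveteam.org with archiveteam.org (hypothetical helper)."""
    return _WWW_ARCHIVETEAM.sub("archiveteam.org", text)
```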
Nicolas SAPA e4b43927b9 Fixup description grab in generateImageDump
getXMLPage() yields on "</page>", so xmlfiledesc cannot contain "</mediawiki>".
Change the search to "</page>" and inject "</mediawiki>" if it is missing, to fix up the XML.
4 years ago
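This commit searches for "</page>" and closes the root element when "</mediawiki>" is missing. The Python 3 sketch below shows that repair. It assumes xmlfiledesc is a string that opens with a `<mediawiki>` element and was cut right after "</page>". The helper name is hypothetical and is not the generateImageDump() code.

```python
def fixup_description_xml(xmlfiledesc):
    """Close the <mediawiki> root element of a page fragment (hypothetical helper)."""
    if "</page>" not in xmlfiledesc:
        # No complete <page> element: nothing sensible to repair.
        return xmlfiledesc
    if "</mediawiki>" not in xmlfiledesc:
        # getXMLPage() stops at '</page>', so the root element is still open.
        xmlfiledesc = xmlfiledesc.rstrip() + "\n</mediawiki>\n"
    return xmlfiledesc
```

The function is idempotent: a fragment that already ends with "</mediawiki>" is returned unchanged.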
Nicolas SAPA eacaf08b2f Try to fix a broken HTTP to HTTPS redirect in generateImageDump()
Some wikis fail to redirect HTTP to HTTPS correctly, so try it ourselves.
4 years ago
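"Try it ourselves" means rewriting an `http://` URL to its `https://` form and requesting that URL directly. The Python 3 sketch below handles only the rewrite. When to fall back, for example after a failed or looping redirect, is left to the caller. The helper name is hypothetical.

```python
from urllib.parse import urlsplit

def https_fallback(url):
    """Return the https:// form of an http:// URL, or None if not applicable."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    # Keep host, path, query and fragment; change only the scheme.
    return parts._replace(scheme="https").geturl()
```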
Nicolas SAPA 7675b0d17c Add exception handler for requests.exceptions.ReadTimeout in getXMLPageCore()
Treat a ReadTimeout the same as a ConnectionError (log the error & retry)
4 years ago
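In requests, `ReadTimeout` derives from `Timeout`, not from `ConnectionError`, so an `except` clause for `ConnectionError` alone does not catch it. This Python 3 sketch shows the "log the error and retry" pattern with the exception types passed in, so the same loop covers both. The function name, retry count and delay are assumptions. With requests, pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.ReadTimeout)`.

```python
import time

def fetch_with_retries(fetch, retry_on, max_retries=5, delay=0.0, log=print):
    """Call fetch(); on an exception listed in retry_on, log it and retry.

    Re-raises the last error after max_retries failed attempts.
    Exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except retry_on as exc:
            log("Attempt %d/%d failed: %r" % (attempt, max_retries, exc))
            if attempt == max_retries:
                raise
            time.sleep(delay)
```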
Nicolas SAPA 4a5eef97da Update the default user-agent
A ModSecurity rule blocks the old UA, so switch to the current Firefox 78 UA.
4 years ago
nemobis 9b1996d436
Merge pull request #387 from robkam/patch-1
fix typo
4 years ago
Rob Kam e6f4674b42
fix typo 4 years ago
nemobis ee39e8f85b
Merge pull request #386 from RhinosF1/patch-1
Update miraheze.org list
4 years ago
RhinosF1 3b28efab80
Update miraheze.org list
Using https://gist.github.com/RhinosF1/18c83dfbfadb84e28ee083628c029b41
4 years ago
nemobis 85ae14419f
Merge pull request #381 from robkam/patch-1
Add that the script requires Python 2.7
4 years ago
Rob Kam c563012c1c
Add that the script requires Python 2.7 4 years ago
nemobis 6e85afca82
Merge pull request #378 from nemobis/wikia
More efficient Wikia download and launcher.py
4 years ago
nemobis 4eae50b2fb
Merge pull request #377 from nemobis/uploaderurl
uploader.py: Handle protocol-relative base URL
4 years ago
Federico Leva 3ddfa85391 uploader.py: Handle protocol-relative base URL
Fixes https://github.com/WikiTeam/wikiteam/issues/376
4 years ago
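A protocol-relative base URL such as `//wiki.example.org/wiki/Main_Page` has no scheme. This Python 3 sketch resolves it against the API URL with `urllib.parse.urljoin`, so the base inherits the API's scheme. Already-absolute URLs pass through unchanged. Whether uploader.py resolves it this way is not stated in the commit. The helper name is hypothetical.

```python
from urllib.parse import urljoin

def resolve_base(api_url, base):
    """Make a possibly protocol-relative base URL absolute (hypothetical helper)."""
    # urljoin treats '//host/path' as a network-path reference and keeps
    # api_url's scheme; an absolute base URL is returned as-is.
    return urljoin(api_url, base)
```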
Federico Leva abd908914f Adapt to some more Wikia wikis edge cases
* Make it easy to batch requests for some wikis where millions of titles
  are really just one-revision thread items and need to be gone through
  as fast as possible.
* Status code error message.
4 years ago
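The commit batches requests for wikis with millions of one-revision titles but does not show the batching code. The Python 3 sketch below uses a different, swapped-in approach: it groups titles into `|`-joined chunks for the MediaWiki API `titles` parameter. That parameter normally accepts up to 50 values per request, or 500 with the apihighlimits right. The function name and default size are assumptions.

```python
def batch_titles(titles, batch_size=50):
    """Yield '|'-joined groups of at most batch_size titles (hypothetical helper)."""
    batch = []
    for title in titles:
        batch.append(title)
        if len(batch) == batch_size:
            yield "|".join(batch)
            batch = []
    if batch:  # leftover titles that did not fill a whole batch
        yield "|".join(batch)
```

Because it is a generator, it can consume a title stream lazily without holding millions of titles in memory at once.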