nemobis
269841c909
Merge pull request #431 from simonliu99/updatelists
...
Update mediawiki wikifarm lists
2 years ago
Liu
d9885e0845
Update shoutwiki-spider to remove duplicates
2 years ago
Liu
fcc4080b23
Update neoseeker.com.info instructions
2 years ago
Liu
e7f7266550
Update fandom.com spider and remove duplicates
2 years ago
Liu
9c5c55342d
Update miraheze.org spider and remove duplicates
2 years ago
Liu
a96e482fd3
Add .DS_Store to .gitignore
2 years ago
Liu
4c970e358d
Remove duplicates from wiki-site.com
2 years ago
Liu
74a8e9609f
Update wiki-site.com spider and list
2 years ago
Liu
ba7fab2e96
Add fandom-spider and update metadata and lists
2 years ago
Liu
49e41ee75d
Update neoseeker.com spider and list
2 years ago
Liu
6346fd6553
Update shoutwiki.com spider and list
2 years ago
Liu
f93988e9c6
Update fandom.com to HTTPS
2 years ago
Liu
91faa34529
Update shoutwiki.com list
2 years ago
Liu
d6fe1d9ff8
Update battlestarwiki.org list
2 years ago
Federico Leva
cd6d40d5ac
Merge branch 'simonliu99-updatelists2'
2 years ago
Liu
6f8f160d75
Update fandom.com list
2 years ago
Liu
6b39402ebf
Update miraheze.org list
2 years ago
Liu
f755153de9
Update neoseeker.com list
2 years ago
Federico Leva
10ee80ca3b
Rename wikia list to fandom
2 years ago
nemobis
054397aecb
Merge pull request #425 from simonliu99/append_date
...
Display appended-date IA URL if appended
2 years ago
Liu
3638e6992f
Simplify tracking item identifier
2 years ago
Liu
94eb932b3f
Display appended-date IA URL if appended
2 years ago
nemobis
1f911c0142
Merge pull request #424 from simonliu99/master
...
Add dump date to item identifier
2 years ago
Liu
d947d7571a
Allow date append only if not admin
2 years ago
Liu
9f9df1e0aa
Update logic to only append date if identifier without date exists
2 years ago
Liu
6c137764cb
Move identifier-date option behind flag
2 years ago
Liu
990d5dfb4f
Add dump date to item identifier
2 years ago
nemobis
d7b6924845
Merge pull request #408 from shreyasminocha/fix-resume-images
...
Fix image resuming
2 years ago
nemobis
7f1f9985f6
Merge pull request #419 from timgates42/bugfix_typos
...
docs: Fix a few typos
3 years ago
Tim Gates
ecbcc6118e
docs: Fix a few typos
...
There are small typos in:
- dumpgenerator.py
- wikiteam/mediawiki.py
Fixes:
- Should read `inconsistencies` rather than `inconsistences`.
- Should read `partially` rather than `partialy`.
3 years ago
Shreyas Minocha
e55de36cb7
Fix image resuming
3 years ago
nemobis
0cfde9e9d1
Merge pull request #394 from nsapa/nico_fix_1
...
Nico's fixes (dumping wiki.dystify.com/CI fixes)
4 years ago
Nicolas SAPA
5986467b12
Cleanup of link rot
...
Many of the wikis in test_dumpgenerator.py no longer exist.
Remove them from the CI.
4 years ago
Nicolas SAPA
b289f86243
Fix getPageTitlesScraper
...
Using the API and the Special:Allpages scraper should result in the same number of titles.
Fix the detection of the next subpages on Special:Allpages.
Change the max depth to 100 and implement an anti-loop check (could fail on non-Western wikis).
4 years ago
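A minimal sketch of the depth cap and anti-loop idea described in the commit above, not the actual getPageTitlesScraper code. It is written for Python 3, and `fetch_page` is a hypothetical callable that returns the titles on one listing page plus the key of the next page (or None). The limit of 100 comes from the commit message; everything else is an assumption.

```python
def collect_titles(fetch_page, start, max_depth=100):
    """Follow 'next page' links, stopping on a repeated page or at max_depth.

    fetch_page is a hypothetical callable: key -> (list_of_titles, next_key_or_None).
    """
    titles = []
    seen = set()      # pages already visited (the anti-loop guard)
    current = start
    depth = 0
    while current is not None and depth < max_depth:
        if current in seen:
            break     # the listing links back to a page we already read
        seen.add(current)
        page_titles, next_page = fetch_page(current)
        titles.extend(page_titles)
        current = next_page
        depth += 1
    return titles
```

With a guard like this, a listing whose "next" link points back to an earlier page ends the crawl instead of repeating it, and the depth cap bounds the work even when every page looks new.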
Nicolas SAPA
1048bc3275
skilledtests.com doesn't host a MediaWiki anymore
...
http://skilledtests.com/wiki/ now redirects to https://simcast.com,
a site 'Powered by Microsoft News'.
4 years ago
Nicolas SAPA
320115fe5a
Try to fix CI by using current URL for archiveteam.org
...
In commit 966df37c54, emijrp changed http://archiveteam.org/ to https://www.archiveteam.org/
Today, https://archiveteam.org/index.php?title=Special:Version shows a canonical URL of https://archiveteam.org/
So try to fix the CI by running s/www.archiveteam.org/archiveteam.org/g
4 years ago
Nicolas SAPA
e4b43927b9
Fixup description grab in generateImageDump
...
getXMLPage() yields on "</page>", so xmlfiledesc cannot contain "</mediawiki>".
Change the search to "</page>" and inject "</mediawiki>" if it is missing, to fix up the XML.
4 years ago
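The fix-up described in the commit above can be pictured roughly as follows. This is a hedged Python 3 sketch, not the code in generateImageDump: the function name `close_mediawiki_xml` is hypothetical, and it assumes the fragment is plain text that already contains the opening root element.

```python
def close_mediawiki_xml(xml_text):
    """Append the root closing tag when a fragment stops at </page>.

    Hypothetical helper: leaves the text untouched if it has no </page>
    or if </mediawiki> is already present.
    """
    if "</page>" in xml_text and "</mediawiki>" not in xml_text:
        return xml_text.rstrip("\n") + "\n</mediawiki>\n"
    return xml_text
```

The check is done on "</page>" because, per the commit message, the generator stops there and the root close never arrives on its own.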
Nicolas SAPA
eacaf08b2f
Try to fix a broken HTTP to HTTPS redirect in generateImageDump()
...
Some wikis fail to perform the HTTP-to-HTTPS redirect correctly, so try it ourselves.
4 years ago
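One way to "try it ourselves" is to rewrite the URL scheme before retrying the request. The Python 3 sketch below shows only that rewriting step; `force_https` is a hypothetical name, and when to call it (for example after a failed download over HTTP) is an assumption, not something the commit message states.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return url with an http scheme replaced by https; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

Only the scheme is touched, so host, path, and query string survive as they were.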
Nicolas SAPA
7675b0d17c
Add exception handler for requests.exceptions.ReadTimeout in getXMLPageCore()
...
Treat a ReadTimeout the same as a ConnectionError (log the error & retry)
4 years ago
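The "log the error and retry" pattern from the commit above, sketched in Python 3 with only the standard library. `get_with_retry`, `fetch`, and the parameter names are hypothetical; the real getXMLPageCore() is not reproduced here. The exception classes to retry on are passed in as a tuple, so a caller using requests could pass (requests.exceptions.ConnectionError, requests.exceptions.ReadTimeout).

```python
import time


def get_with_retry(fetch, retry_on, max_retries=5, delay=0.0, log=print):
    """Call fetch() until it succeeds, treating every class in retry_on alike.

    fetch is a hypothetical zero-argument callable; retry_on is a tuple of
    exception classes. The last failure is re-raised once retries run out.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except retry_on as exc:
            log("attempt %d failed: %r" % (attempt, exc))
            if attempt == max_retries:
                raise
            time.sleep(delay)  # wait before the next attempt
```

Listing both exception types in one tuple is what makes a timeout follow the same path as a connection error.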
Nicolas SAPA
4a5eef97da
Update the default user-agent
...
A ModSecurity rule blocks the old UA, so switch to the current Firefox 78 UA.
4 years ago
nemobis
9b1996d436
Merge pull request #387 from robkam/patch-1
...
fix typo
4 years ago
Rob Kam
e6f4674b42
fix typo
4 years ago
nemobis
ee39e8f85b
Merge pull request #386 from RhinosF1/patch-1
...
Update miraheze.org list
4 years ago
RhinosF1
3b28efab80
Update miraheze.org list
...
Using https://gist.github.com/RhinosF1/18c83dfbfadb84e28ee083628c029b41
4 years ago
nemobis
85ae14419f
Merge pull request #381 from robkam/patch-1
...
Add that the script requires Python 2.7
4 years ago
Rob Kam
c563012c1c
Add that the script requires Python 2.7
4 years ago
nemobis
6e85afca82
Merge pull request #378 from nemobis/wikia
...
More efficient Wikia download and launcher.py
4 years ago
nemobis
4eae50b2fb
Merge pull request #377 from nemobis/uploaderurl
...
uploader.py: Handle protocol-relative base URL
4 years ago
Federico Leva
3ddfa85391
uploader.py: Handle protocol-relative base URL
...
Fixes https://github.com/WikiTeam/wikiteam/issues/376
4 years ago
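A protocol-relative base URL starts with "//" and has no scheme of its own. The Python 3 sketch below shows one plausible way to handle it; it is not taken from uploader.py, and both the helper name `absolutize` and the default of "https" are assumptions.

```python
def absolutize(base_url, default_scheme="https"):
    """Prefix a scheme to a protocol-relative URL such as //example.org/w.

    Hypothetical helper: URLs that already carry a scheme are returned as-is.
    """
    if base_url.startswith("//"):
        return default_scheme + ":" + base_url
    return base_url
```

For example, absolutize("//example.org/w") yields "https://example.org/w", while a full "http://..." URL passes through unchanged.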
Federico Leva
abd908914f
Adapt to some more Wikia wikis edge cases
...
* Make it easy to batch requests for some wikis where millions of titles
are really just one-revision thread items and need to be gone through
as fast as possible.
* Status code error message.
4 years ago