30 Commits (ca672426bb524a913a35ad91f407740338a61e68)

Author SHA1 Message Date
Federico Leva baae839a38 Complete update of the Wikia lists
* Reduce the offset to 100, the new limit for non-bots.
* Continue listing even when we get an empty request because all
  the wikis in a batch have become inactive and are filtered out.
* Print less from curl's requests.
* Automatically write the domain names to the files here.
6 years ago
Federico Leva b8909baa3d Update Wikia list with wikia.py 6 years ago
Federico Leva 293da80da9 Add alive MediaWikis from the WikiTeam archive.org collection 6 years ago
Federico Leva 6a34bf65ea Wikia dumps now use 7z, not gz
Note that existence doesn't mean the dump is usable.
6 years ago
emijrp 0e20be9a6e sort 7 years ago
emijrp bbdaf7723b update neoseeker 7 years ago
emijrp fc48c895ae update info 7 years ago
emijrp c7d5f9bb2e update, 2244 wikis 7 years ago
emijrp 75e7628a11 now get ALL wikis, even closed ones 7 years ago
Hydriz a8270a7769 Update Miraheze wiki farm 7 years ago
Hydriz 9fd6df7a3c Scan for closed wikis as well 7 years ago
Hydriz Scholz 9f97e21503 Update Miraheze wiki farm 8 years ago
Alexia E. Smith cb766de5ff Update gamepedia.com wikis.
This is current as of 2016-04-07 and is correct at 1,120 wikis.
8 years ago
emijrp dde7eb90ba wiki.wiki info 9 years ago
emijrp 8048b92029 adding wiki.wiki wikifarm list 9 years ago
emijrp e30cd44384 new wikifarm list of wikis 9 years ago
emijrp d44db951c2 update date 9 years ago
emijrp 64c30f2b50 updating neoseeker list and sorting, +1 new wiki 9 years ago
Southparkfan ebffb99f48 Add Miraheze wiki farm 9 years ago
Hydriz Scholz 1550d3755d Update orain.org wiki list 9 years ago
Federico Leva a1921f0919 Update list of wikia.com unarchived wikis
The list of unarchived wikis was compared to the list of wikis that we
managed to download with dumpgenerator.py:
https://archive.org/details/wikia_dump_20141219
To allow the comparison, the naming format was aligned to the format
used by dumpgenerator.py for 7z files.
10 years ago
Federico Leva ce6fbfee55 Use curl --fail instead and other fixes; add list
Now tested and used to produce the list of some 300k Wikia wikis
which don't yet have a public dump. Will soon be archived.
10 years ago
Federico Leva 7471900e56 It's easier if the list has the actual domains 10 years ago
Federico Leva 8bd3373960 Add wikia.py, to list Wikia wikis we'll dump ourselves 10 years ago
Federico Leva 8cf4d4e6ea Add 30k domains from another crawler
11011 were found alive by checkalive.py (though there could be more
if one checks more subdomains and subdirectories), some thousands
more by checklive.pl (but mostly or all false positives).

Of the alive ones, about 6245 were new to WikiApiary!
https://wikiapiary.com/wiki/Category:Oct_2014_Import
10 years ago
Southparkfan 8ca9eb8757 Update date of Orain wikilist 10 years ago
Southparkfan 2e2fe9b818 Update list of Orain wikis 10 years ago
nemobis 23a60fa850 MediaWiki CamelCase 10 years ago
nemobis 31112b3a80 checkalive.py: more checks before accessing stuff 10 years ago
PiRSquared17 03ddde3702 Move wiki lists to mediawiki subdirectory 10 years ago