148 Commits (ca672426bb524a913a35ad91f407740338a61e68)

Author SHA1 Message Date
emijrp a359984932 ++ 6 years ago
emijrp 5525a3cc4a ++ 6 years ago
Federico Leva baae839a38 Complete update of the Wikia lists
* Reduce the offset to 100, the new limit for non-bots.
* Continue listing even when we get an empty request because all
  the wikis in a batch have become inactive and are filtered out.
* Print less from curl's requests.
* Automatically write the domain names to the files here.
6 years ago
Emilio 3a56037279 Merge pull request #310 from nemobis/master
Update Wikia list with wikia.py
6 years ago
emijrp 811a325756 update 6 years ago
emijrp aec3a14b7b update spider incomplete results, still running; userwikispacesXY lists instead 6 years ago
emijrp 51ebefa1c4 100,000 wikispaces 6 years ago
emijrp 7280c89b3b duckduckgo spider 6 years ago
emijrp 83158d4506 70k wikis by spider 6 years ago
emijrp 60704e3303 searching wikis with duckduckgo 6 years ago
Federico Leva b8909baa3d Update Wikia list with wikia.py 6 years ago
emijrp 60a0ba2e54 sleep 6 years ago
emijrp 061709d9e6 50,000 wikis, do not use this list, use wikispacesXY instead 6 years ago
emijrp 30a6dc268b wikispaces lists 6 years ago
emijrp 145b040784 update, 10000 wikis, still more arriving 6 years ago
Federico Leva 293da80da9 Add alive MediaWikis from the WikiTeam archive.org collection 6 years ago
Federico Leva 6a34bf65ea Wikia dumps now use 7z, not gz
Note that existence doesn't mean the dump is usable.
6 years ago
Mirko Sertic c9fc4d2105 http://www.mirkosertic.de is no longer powered by DokuWiki
Removed http://www.mirkosertic.de from the list.
7 years ago
emijrp 0e20be9a6e sort 7 years ago
emijrp bbdaf7723b update neoseeker 7 years ago
emijrp fc48c895ae update info 7 years ago
emijrp c7d5f9bb2e update, 2244 wikis 7 years ago
emijrp 75e7628a11 now get ALL wikis, even closed ones 7 years ago
Hydriz a8270a7769 Update Miraheze wiki farm 7 years ago
Hydriz 9fd6df7a3c Scan for closed wikis as well 7 years ago
Hydriz Scholz 9f97e21503 Update Miraheze wiki farm 8 years ago
emijrp fea6ab3b86 more 8 years ago
emijrp 01ccacd138 first version of wikispaces spider 8 years ago
Alexia E. Smith cb766de5ff Update gamepedia.com wikis.
This is current as of 2016-04-07 and is correct at 1,120 wikis.
8 years ago
emijrp dde7eb90ba wiki.wiki info 9 years ago
emijrp 8048b92029 adding wiki.wiki wikifarm list 9 years ago
emijrp e30cd44384 new wikifarm list of wikis 9 years ago
emijrp d44db951c2 update date 9 years ago
emijrp 64c30f2b50 updating neoseeker list and sorting, +1 new wiki 9 years ago
Southparkfan ebffb99f48 Add Miraheze wiki farm 9 years ago
Hydriz Scholz 1550d3755d Update orain.org wiki list 9 years ago
Federico Leva a1921f0919 Update list of wikia.com unarchived wikis
The list of unarchived wikis was compared to the list of wikis that we
managed to download with dumpgenerator.py:
https://archive.org/details/wikia_dump_20141219
To allow the comparison, the naming format was aligned to the format
used by dumpgenerator.py for 7z files.
9 years ago
Federico Leva ce6fbfee55 Use curl --fail instead and other fixes; add list
Now tested and used to produce the list of some 300k Wikia wikis
which don't yet have a public dump. Will soon be archived.
10 years ago
Federico Leva 7471900e56 It's easier if the list has the actual domains 10 years ago
Federico Leva 8bd3373960 Add wikia.py, to list Wikia wikis we'll dump ourselves 10 years ago
Federico Leva 8cf4d4e6ea Add 30k domains from another crawler
11011 were found alive by checkalive.py (though there could be more
if one checks more subdomains and subdirectories), some thousands
more by checklive.pl (but mostly or all false positives).

Of the alive ones, about 6245 were new to WikiApiary!
https://wikiapiary.com/wiki/Category:Oct_2014_Import
10 years ago
Federico Leva 7e0071ae7f Add some UseModWiki-looking domains 10 years ago
nemobis 6b11cef9dc A few thousand more doku.php URLs from own scraping 10 years ago
Southparkfan 8ca9eb8757 Update date of Orain wikilist 10 years ago
Southparkfan 2e2fe9b818 Update list of Orain wikis 10 years ago
nemobis 23a60fa850 MediaWiki CamelCase 10 years ago
nemobis 31112b3a80 checkalive.py: more checks before accessing stuff 10 years ago
nemobis 225c3eb478 A thousand more doku.php URLs from search 10 years ago
nemobis 3fc7dcb5de Add some more doku.php URLs 10 years ago
PiRSquared17 56c2177106 Add (incomplete) list of dokuwikis 10 years ago