2
0
mirror of https://github.com/WikiTeam/wikiteam synced 2024-11-10 13:10:27 +00:00
Commit Graph

9 Commits

Author SHA1 Message Date
nemobis
cbd0905cba Add 2k more URLs from another crawl
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@950 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
2014-02-14 19:04:40 +00:00
nemobis
eb60580e91 New URLs from Incola
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@946 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
2014-02-11 12:29:00 +00:00
nemobis
a1c89623a4 Another intermediate update with results from one more run
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@943 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
2014-02-03 08:52:59 +00:00
nemobis
01b177bcf2 Update raw list with scraper run by odder
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@921 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
2014-01-27 10:16:59 +00:00
nemobis
500e8ef350 Remove some more obvious duplicates including trailing slash
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@890 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
2014-01-03 17:24:53 +00:00
nemobis
31ea06ff86 Remove also wiki/[A-Z].+$
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@889 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
2014-01-03 17:17:42 +00:00
nemobis
664fa18ea3 Remove sourceforge wikis and URLs with parameters to index.php
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@888 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
2014-01-03 17:13:46 +00:00
nemobis
3624c02852 Remove Wikimedia Foundation wikis, other 'wikimedia' URLs cleanup
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@887 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
2014-01-03 17:04:29 +00:00
nemobis
ccfc95e9f4 Issue 59: Add first dirty list of possible MediaWiki sitesFirst passes of the script, now going on with all TLDs.Just sorted and cleaned of biggest noises like mailing lists, github and stackoverflow.
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@886 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
2014-01-03 16:57:37 +00:00