nemobis
|
cbd0905cba
|
Add 2k more URLs from another crawl
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@950 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
|
2014-02-14 19:04:40 +00:00 |
|
nemobis
|
eb60580e91
|
New URLs from Incola
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@946 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
|
2014-02-11 12:29:00 +00:00 |
|
nemobis
|
a1c89623a4
|
Another intermediate update with results from one more run
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@943 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
|
2014-02-03 08:52:59 +00:00 |
|
nemobis
|
01b177bcf2
|
Update raw list with scraper run by odder
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@921 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
|
2014-01-27 10:16:59 +00:00 |
|
nemobis
|
500e8ef350
|
Remove some more obvious duplicates including trailing slash
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@890 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
|
2014-01-03 17:24:53 +00:00 |
|
nemobis
|
31ea06ff86
|
Remove also wiki/[A-Z].+$
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@889 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
|
2014-01-03 17:17:42 +00:00 |
|
nemobis
|
664fa18ea3
|
Remove sourceforge wikis and URLs with parameters to index.php
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@888 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
|
2014-01-03 17:13:46 +00:00 |
|
nemobis
|
3624c02852
|
Remove Wikimedia Foundation wikis, other 'wikimedia' URLs cleanup
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@887 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
|
2014-01-03 17:04:29 +00:00 |
|
nemobis
|
ccfc95e9f4
|
Issue 59: Add first dirty list of possible MediaWiki sites. First passes of the script, now going on with all TLDs. Just sorted and cleaned of biggest noises like mailing lists, github and stackoverflow.
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@886 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
|
2014-01-03 16:57:37 +00:00 |
|
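The cleanup commits from 2014-01-03 (r887 to r890) describe a series of filters applied to the raw URL list. The following is only a rough sketch of what such a pass could look like, not the script actually used in the repository. The function names `keep` and `clean`, the host list in `BLOCKED_HOSTS`, and the reading of "Remove also wiki/[A-Z].+$" as "drop URLs whose path matches that pattern" (rather than stripping the suffix) are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical host list: the commit messages only mention "Wikimedia
# Foundation wikis" and "sourceforge wikis", not the exact domains.
BLOCKED_HOSTS = ("wikipedia.org", "wikimedia.org", "sourceforge.net")

# Pattern quoted in the r889 commit message; interpreted here as
# "an article path such as wiki/Main_Page" (an assumption).
ARTICLE_PATH = re.compile(r"wiki/[A-Z].+$")


def keep(url):
    """Return True if the URL survives the filters sketched here."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # r887 / r888: drop Wikimedia Foundation and SourceForge hosts.
    if any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS):
        return False
    # r888: drop URLs that pass parameters to index.php.
    if "index.php" in parts.path and parts.query:
        return False
    # r889: drop URLs that point at an individual article.
    if ARTICLE_PATH.search(parts.path):
        return False
    return True


def clean(urls):
    """Filter the URLs, then merge duplicates that differ only by a trailing slash (r890)."""
    seen = set()
    for url in urls:
        url = url.strip()
        if url and keep(url):
            seen.add(url.rstrip("/"))
    return sorted(seen)


if __name__ == "__main__":
    sample = [
        "http://example.org/wiki/",
        "http://example.org/wiki",
        "http://en.wikipedia.org/wiki/Main_Page",
        "http://example.com/index.php?title=Foo",
        "http://example.net/w/index.php",
    ]
    for url in clean(sample):
        print(url)
```

Normalizing by stripping the trailing slash before deduplicating is one simple choice; a real pass might also want to normalize scheme and host case, which the commit messages do not mention.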