Federico Leva
8cf4d4e6ea
Add 30k domains from another crawler
11011 were found alive by checkalive.py (though there could be more
if one checks more subdomains and subdirectories), and several thousand
more by checklive.pl (though mostly or entirely false positives).
Of the alive ones, about 6245 were new to WikiApiary!
https://wikiapiary.com/wiki/Category:Oct_2014_Import
10 years ago
Federico Leva
7e0071ae7f
Add some UseModWiki-looking domains
10 years ago
nemobis
6b11cef9dc
A few thousand more doku.php URLs from own scraping
10 years ago
nemobis
0624d0303b
Merge pull request #198 from Southparkfan/patch-1
Update list of Orain wikis
10 years ago
Southparkfan
8ca9eb8757
Update date of Orain wikilist
10 years ago
Southparkfan
2e2fe9b818
Update list of Orain wikis
10 years ago
Marek Šuppa
8c44cff165
readme: Small wording fixes
* Small fixes in the `Download Wikimedia dumps` section.
10 years ago
nemobis
6f74781e78
Merge pull request #197 from mrshu/mrshu/autopep8fied-wikiadownloader
wikiadownloader: Autopep8fied
10 years ago
mr.Shu
f022b02e47
wikiadownloader: Autopep8fied
* Made the source look a bit better, though this script might not be
used anymore.
Signed-off-by: mr.Shu <mr@shu.io>
10 years ago
nemobis
b3ef165529
Merge pull request #194 from mrshu/mrshu/dumpgenerator-pep8fied
dumpgenerator: AutoPEP8-fied
10 years ago
mr.Shu
04446a40a5
dumpgenerator: AutoPEP8-fied
* Used autopep8 to make sure the code looks nice and is actually PEP8
compliant.
Signed-off-by: mr.Shu <mr@shu.io>
10 years ago
nemobis
23a60fa850
MediaWiki CamelCase
10 years ago
nemobis
31112b3a80
checkalive.py: more checks before accessing stuff
10 years ago
nemobis
225c3eb478
A thousand more doku.php URLs from search
10 years ago
nemobis
e0f8e36bf4
Merge pull request #190 from PiRSquared17/api-allpages-disabled
Fallback to getPageTitlesScraper() if API allpages disabled
10 years ago
nemobis
a7e1b13304
Merge pull request #193 from mrshu/mrshu/readme-fix-wording
readme: Fix wording
10 years ago
nemobis
3fc7dcb5de
Add some more doku.php URLs
10 years ago
mr.Shu
54c373e9a0
readme: Fix wording
* Made a few wording changes to make the README.md more clear.
Signed-off-by: mr.Shu <mr@shu.io>
10 years ago
Marek Šuppa
40d863fb99
README: update wording
* Updated wording to make the README more clear.
10 years ago
Emilio J. Rodríguez-Posada
87ce2d4540
Merge pull request #192 from mrshu/mrshu/add-travis-image
update: Add TravisCI image to README
10 years ago
mr.Shu
7b0b54b6e5
update: Add TravisCI image to README
* Added a TravisCI image, which shows whether the tests are passing,
to the Developers section.
Signed-off-by: mr.Shu <mr@shu.io>
10 years ago
Emilio J. Rodríguez-Posada
5c8e316e67
Merge pull request #189 from PiRSquared17/get-wiki-engine
Improve getWikiEngine()
10 years ago
Emilio J. Rodríguez-Posada
086415bc00
Merge pull request #191 from mrshu/mrshu/setup-travis
tests: Add .travis.yml and Travis CI
10 years ago
mr.Shu
14c62c6587
tests: Add .travis.yml and Travis CI
* Added .travis.yml to enable Travis CI
Signed-off-by: mr.Shu <mr@shu.io>
10 years ago
PiRSquared17
757019521a
Fallback to scraper if API allpages disabled
10 years ago
PiRSquared17
4b3c862a58
Comment debugging print, fix test
10 years ago
PiRSquared17
7a1db0525b
Add more wiki engines to getWikiEngine
10 years ago
nemobis
40c406cd00
Merge pull request #188 from PiRSquared17/wikiengine-lists
Add subdirectories to listsofwikis for different wiki engines
10 years ago
PiRSquared17
56c2177106
Add (incomplete) list of dokuwikis
10 years ago
PiRSquared17
03ddde3702
Move wiki lists to mediawiki subdirectory
10 years ago
Emilio J. Rodríguez-Posada
43a105335b
Merge pull request #185 from PiRSquared17/fix-tests
Relax delay() test by 10 ms, add test for allpages
10 years ago
PiRSquared17
d7e43f92c7
Relax delay() test by 10 ms, add test for allpages
10 years ago
nemobis
f52051f8ae
Merge pull request #184 from PiRSquared17/fix-tests
Fix tox.ini and clean up/update tests, avoid a loop to make tests pass
10 years ago
PiRSquared17
b4818d2985
Avoid infinite loop in getImageNamesScraper
10 years ago
PiRSquared17
f2b7716e72
Fix tox.ini and clean up/update tests
10 years ago
nemobis
8a9b50b51d
Merge pull request #183 from PiRSquared17/patch-7
Retry on ConnectionError in getXMLPageCore
10 years ago
nemobis
9828cbec3c
Add PiRSquared17 to credits
10 years ago
nemobis
19c48d3dd0
Merge pull request #180 from PiRSquared17/patch-2
Get as much information from siteinfo as possible
10 years ago
nemobis
d8360393da
Merge pull request #182 from PiRSquared17/patch-6
AllPages API fix for old MediaWiki versions
10 years ago
Pi R. Squared
f7187b7048
Retry on ConnectionError in getXMLPageCore
Previously it just gave a fatal error.
10 years ago
Pi R. Squared
f31e4e6451
Dict not hashable, also not needed
Quick fix.
10 years ago
Pi R. Squared
399f609d70
AllPages API hack for old versions of MediaWiki
New API format: http://www.mediawiki.org/w/api.php?action=query&list=allpages&apnamespace=0&apfrom=!&format=json&aplimit=500
Old API format: http://wiki.damirsystems.com/api.php?action=query&list=allpages&apnamespace=0&apfrom=!&format=json
10 years ago
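The two request shapes quoted in commit 399f609d70 can be compared with a small sketch. It goes only by the two URLs in the commit message, where the visible difference is the `aplimit=500` parameter; the function name `allpages_url` and its `with_limit` flag are hypothetical and are not taken from dumpgenerator.py.

```python
from urllib.parse import urlencode


def allpages_url(api, apfrom="!", namespace=0, with_limit=True):
    """Build an allpages query URL for the given api.php endpoint.

    with_limit=True mirrors the "new" URL in the commit message
    (aplimit=500); with_limit=False mirrors the "old" URL, which
    omits aplimit. This is an illustration, not the project's code.
    """
    params = [
        ("action", "query"),
        ("list", "allpages"),
        ("apnamespace", namespace),
        ("apfrom", apfrom),
        ("format", "json"),
    ]
    if with_limit:
        params.append(("aplimit", 500))
    # urlencode percent-encodes values, so "!" is sent as "%21".
    return api + "?" + urlencode(params)
```

A caller could request the new form first and rebuild the URL with `with_limit=False` if the wiki rejects it; whether the actual commit works that way is not stated in the log.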
nemobis
b3e77fe006
Merge pull request #181 from PiRSquared17/patch-4
Try getting index.php from siteinfo API
10 years ago
nemobis
9beda42385
Merge pull request #137 from hashar/tests-with-tox-and-nose
Easily run tests in a virtualenv with tox and nose
10 years ago
Pi R. Squared
498b64da3f
Try getting index.php from siteinfo API
Fixes #49
10 years ago
Pi R. Squared
ff0d230d08
Get as much information from siteinfo as possible
Properly fixes #74.
Algorithm:
1. Try all siteinfo props. If this gives an error, continue. Otherwise, stop.
2. Try MediaWiki 1.11-1.12 siteinfo props. If this gives an error, continue. Otherwise, stop.
3. Try minimal siteinfo props. Stop.
Not using sishowalldb=1 to avoid possible error (by default), since this data is of little use anyway.
10 years ago
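The three-step fallback described in commit ff0d230d08 can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `get_siteinfo` and the `query` callable are hypothetical names, and the `siprop` lists in `SIPROP_TIERS` are placeholders, since the commit message does not spell out which props belong to each step.

```python
from typing import Callable, Dict, Sequence

# Placeholder tiers (assumption): the real prop lists are not given in the log.
SIPROP_TIERS = (
    "general|namespaces|statistics|dbrepllag|interwikimap|namespacealiases|extensions",
    "general|namespaces|statistics|dbrepllag|interwikimap",
    "general|namespaces",
)


def get_siteinfo(query: Callable[[str], Dict],
                 tiers: Sequence[str] = SIPROP_TIERS) -> Dict:
    """Try each siprop tier in order and return the first error-free reply.

    `query` is a hypothetical callable that sends
    action=query&meta=siteinfo with the given siprop value and returns
    the decoded JSON response as a dict.
    """
    result: Dict = {}
    for siprop in tiers:
        result = query(siprop)
        if "error" not in result:
            break  # this tier worked, so stop (steps 1 and 2)
    # After the last tier the loop ends regardless, as in step 3.
    return result
```

Passing the request function in as a parameter keeps the sketch independent of any HTTP library and makes the fallback order easy to exercise with a stub.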
nemobis
ac1a7defae
Merge pull request #178 from PiRSquared17/patch-1
Encode title using UTF-8 before printing
10 years ago
Pi R. Squared
322604cc23
Encode title using UTF-8 before printing
This fixes #170 and closes #174.
10 years ago
nemobis
11368310ee
Merge pull request #173 from nemobis/issue/131
Fix #131: ValueError: No JSON object could be decoded
10 years ago
nemobis
2c027adba0
Merge pull request #171 from seanyeh/fix-delay
Fix argument parsing to accept delay as a number
10 years ago