nemobis
b3ef165529
Merge pull request #194 from mrshu/mrshu/dumpgenerator-pep8fied
dumpgenerator: AutoPEP8-fied
10 years ago
mr.Shu
04446a40a5
dumpgenerator: AutoPEP8-fied
* Used autopep8 to make sure the code looks nice and is actually PEP8
compliant.
Signed-off-by: mr.Shu <mr@shu.io>
10 years ago
nemobis
e0f8e36bf4
Merge pull request #190 from PiRSquared17/api-allpages-disabled
Fallback to getPageTitlesScraper() if API allpages disabled
10 years ago
PiRSquared17
757019521a
Fallback to scraper if API allpages disabled
10 years ago
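The fallback described in the commit above can be sketched as follows. This is a minimal illustration, not WikiTeam's code: `AllPagesDisabled`, `get_page_titles` and the two lister callables are hypothetical stand-ins for however the disabled-API condition is detected and for `getPageTitlesScraper()`.

```python
class AllPagesDisabled(Exception):
    """Hypothetical signal that the API refused a list=allpages query."""


def get_page_titles(api_lister, scraper_lister):
    """Try the API lister first; fall back to the HTML scraper if allpages is disabled."""
    try:
        # list() forces generator-based listers to run inside the try block.
        return list(api_lister())
    except AllPagesDisabled:
        # The API path is unavailable, so use the scraper instead.
        return list(scraper_lister())
```

Keeping both listers as plain callables means the fallback decision lives in one place, whatever the real detection logic turns out to be.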
PiRSquared17
4b3c862a58
Comment out debugging print, fix test
10 years ago
PiRSquared17
7a1db0525b
Add more wiki engines to getWikiEngine
10 years ago
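One way to make an engine detector easy to extend is a table of markers searched in the fetched HTML. This is a sketch under assumptions: `ENGINE_MARKERS` and `detect_wiki_engine` are hypothetical names, and the marker strings are illustrative guesses, not the list actually used by `getWikiEngine`.

```python
# Hypothetical marker table: (engine name, substring expected in the page HTML).
# The markers are illustrative assumptions, not getWikiEngine's actual list.
ENGINE_MARKERS = [
    ("MediaWiki", '<meta name="generator" content="MediaWiki'),
    ("DokuWiki", '<meta name="generator" content="DokuWiki'),
]


def detect_wiki_engine(html):
    """Return the first engine whose marker occurs in the HTML, else "Unknown"."""
    lowered = html.lower()
    for engine, marker in ENGINE_MARKERS:
        # Case-insensitive substring match against the page source.
        if marker.lower() in lowered:
            return engine
    return "Unknown"
```

Adding another engine then means appending one row to the table rather than another branch in the function.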
PiRSquared17
b4818d2985
Avoid infinite loop in getImageNamesScraper
10 years ago
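A common guard against an infinite pagination loop is to stop as soon as an offset comes back that has already been requested. The sketch below assumes a hypothetical `fetch_page(offset)` returning `(names, next_offset)`, with `next_offset` set to `None` on the last page; it is not the actual `getImageNamesScraper` code.

```python
def get_image_names_paged(fetch_page, start=""):
    """Collect names from a paginated listing, stopping if an offset repeats.

    fetch_page(offset) is a hypothetical callable returning (names, next_offset),
    where next_offset is None on the last page.
    """
    names = []
    seen = set()
    offset = start
    # Stop on the last page, or when the server hands back an offset we already used.
    while offset is not None and offset not in seen:
        seen.add(offset)
        page_names, offset = fetch_page(offset)
        names.extend(page_names)
    return names
```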
nemobis
8a9b50b51d
Merge pull request #183 from PiRSquared17/patch-7
Retry on ConnectionError in getXMLPageCore
10 years ago
nemobis
19c48d3dd0
Merge pull request #180 from PiRSquared17/patch-2
Get as much information from siteinfo as possible
10 years ago
Pi R. Squared
f7187b7048
Retry on ConnectionError in getXMLPageCore
Previously it just gave a fatal error.
10 years ago
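Retrying instead of failing on a connection error can be sketched as a small wrapper. This is a hypothetical helper, not the code in `getXMLPageCore`; the `retries` and `delay` defaults are assumptions. Note that with Requests you would pass `retry_on=(requests.exceptions.ConnectionError,)`, because that class derives from `RequestException`, not from Python's built-in `ConnectionError`.

```python
import time


def with_retries(request, retries=3, delay=1.0, retry_on=(ConnectionError,)):
    """Call request(); on a connection error, wait and try again instead of dying."""
    for attempt in range(retries + 1):
        try:
            return request()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            time.sleep(delay)
```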
Pi R. Squared
f31e4e6451
Dict not hashable, also not needed
Quick fix.
10 years ago
Pi R. Squared
399f609d70
AllPages API hack for old versions of MediaWiki
New API format: http://www.mediawiki.org/w/api.php?action=query&list=allpages&apnamespace=0&apfrom=!&format=json&aplimit=500
Old API format: http://wiki.damirsystems.com/api.php?action=query&list=allpages&apnamespace=0&apfrom=!&format=json
10 years ago
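The two URL formats above differ only in the `aplimit` parameter, which the old-API form omits. The sketch below rebuilds both; `allpages_url` and `titles_from_allpages` are hypothetical names. The second helper assumes, without confirmation from this log, that old APIs may return the `allpages` result as a dict keyed by page id rather than a list, and accepts either shape.

```python
from urllib.parse import urlencode


def allpages_url(api, namespace=0, apfrom="!", limit=500):
    """Build a list=allpages query URL; limit=None omits aplimit (old-API style)."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
        "apfrom": apfrom,
        "format": "json",
    }
    if limit is not None:
        params["aplimit"] = limit
    # safe="!" keeps the "!" start marker unescaped, matching the URLs above.
    return api + "?" + urlencode(params, safe="!")


def titles_from_allpages(allpages):
    """Accept a list of pages or (assumed old-API shape) a dict keyed by page id."""
    items = allpages.values() if isinstance(allpages, dict) else allpages
    return [item["title"] for item in items]
```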
Pi R. Squared
498b64da3f
Try getting index.php from siteinfo API
Fixes #49
10 years ago
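Deriving index.php from siteinfo could look like the sketch below. It assumes the `general` block carries MediaWiki's `server` field (possibly protocol-relative, such as `//www.mediawiki.org`) and `script` field (such as `/w/index.php`), falling back to `scriptpath` plus `/index.php` when `script` is missing; `index_from_siteinfo` is a hypothetical name, not WikiTeam's function.

```python
from urllib.parse import urlsplit


def index_from_siteinfo(general, api_url):
    """Derive the index.php URL from siteinfo's 'general' block (assumed fields)."""
    server = general.get("server", "")
    if server.startswith("//"):
        # Protocol-relative server: reuse the scheme of the API URL.
        server = urlsplit(api_url).scheme + ":" + server
    # Prefer 'script'; otherwise assume index.php sits under 'scriptpath'.
    script = general.get("script") or (general.get("scriptpath", "") + "/index.php")
    return server + script
```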
Pi R. Squared
ff0d230d08
Get as much information from siteinfo as possible
Properly fixes #74.
Algorithm:
1. Try all siteinfo props. If this gives an error, continue. Otherwise, stop.
2. Try MediaWiki 1.11-1.12 siteinfo props. If this gives an error, continue. Otherwise, stop.
3. Try minimal siteinfo props. Stop.
Not using sishowalldb=1 to avoid possible error (by default), since this data is of little use anyway.
10 years ago
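The three-step siteinfo algorithm above can be sketched as a loop over progressively smaller `siprop` lists. The exact prop sets here are assumptions for illustration, and `query(siprop)` is a hypothetical caller-supplied function returning the decoded JSON reply; an error is assumed to show up as an `"error"` key, per the MediaWiki API's usual reply shape.

```python
# Hypothetical prop lists, newest first; the exact sets are assumptions.
SITEINFO_PROPS = [
    "general|namespaces|statistics|dbrepllag|interwikimap|namespacealiases|"
    "specialpagealiases|usergroups|extensions|skins|magicwords|fileextensions|rightsinfo",
    "general|namespaces|statistics|dbrepllag|interwikimap",  # MediaWiki 1.11-1.12
    "general|namespaces",  # minimal
]


def fetch_siteinfo(query):
    """Try each prop list in turn; stop at the first reply without an error."""
    result = {}
    for siprop in SITEINFO_PROPS:
        result = query(siprop)
        if "error" not in result:
            break
    # After the minimal attempt we stop regardless, returning whatever came back.
    return result
```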
Pi R. Squared
322604cc23
Encode title using UTF-8 before printing
This fixes #170 and closes #174.
10 years ago
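In Python 3 terms, printing a title as explicit UTF-8 means writing bytes to the binary layer of stdout, so a console with a narrower codec cannot raise an encoding error. A minimal sketch; `title_bytes` and `print_title` are hypothetical helpers, not the dumpgenerator code.

```python
import sys


def title_bytes(title):
    """Encode a page title as UTF-8 bytes, newline-terminated."""
    return title.encode("utf-8") + b"\n"


def print_title(title):
    # Flush pending text output first so ordering is preserved,
    # then write raw bytes to the buffer beneath sys.stdout.
    sys.stdout.flush()
    sys.stdout.buffer.write(title_bytes(title))
    sys.stdout.buffer.flush()
```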
nemobis
11368310ee
Merge pull request #173 from nemobis/issue/131
Fix #131 : ValueError: No JSON object could be decoded
10 years ago
Nemo bis
026c2a9a25
Issue 131: ValueError: No JSON object could be decoded
10 years ago
Sean Yeh
38e73c1cf7
Fix argument parsing to accept delay as a number
10 years ago
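With argparse, accepting a delay as a number comes down to `type=float`, which converts the string and rejects non-numeric input. A minimal sketch; the `--delay` name and default are assumptions here, and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="dumpgenerator-style options (sketch)")
    # type=float turns "--delay 1.5" into 1.5 and makes argparse reject "--delay abc".
    parser.add_argument("--delay", type=float, default=0,
                        help="seconds to wait between requests")
    return parser
```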
Emilio J. Rodríguez-Posada
a2efca27b8
improving API/Index calculation
10 years ago
Emilio J. Rodríguez-Posada
4bc43a1c0f
improved help messages
10 years ago
Emilio J. Rodríguez-Posada
51806f5a3d
fixed #160; improved args parsing and --help; improved API/Index estimate from URL
10 years ago
Emilio J. Rodríguez-Posada
dd7df0cc01
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
10 years ago
Emilio J. Rodríguez-Posada
f3b388fc79
a first approach to auto-detect API/Index.php using URL to the Main_Page
10 years ago
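A first approximation of auto-detection is a URL heuristic that turns a Main_Page link into candidate endpoints, which would then still have to be checked by requesting them. This is a hypothetical sketch, not WikiTeam's logic: it assumes the common `/wiki/Page` plus `/w/` layout and the plain `/index.php?title=...` layout.

```python
from urllib.parse import urlsplit, urlunsplit


def guess_endpoints(url):
    """Return (api.php, index.php) candidate pairs guessed from a wiki page URL."""
    parts = urlsplit(url)
    path = parts.path
    if "/index.php" in path:
        # e.g. /index.php?title=Main_Page -> scripts live next to index.php
        candidates = [path.split("/index.php")[0]]
    elif "/wiki/" in path:
        # e.g. /wiki/Main_Page -> try the conventional /w/ script path, then the root
        prefix = path.split("/wiki/")[0]
        candidates = [prefix + "/w", prefix]
    else:
        # Otherwise assume the scripts sit in the page's parent directory.
        candidates = [path.rsplit("/", 1)[0]]
    root = urlunsplit((parts.scheme, parts.netloc, "", "", ""))
    return [(root + c + "/api.php", root + c + "/index.php") for c in candidates]
```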
Erkan Yilmaz
44b80ceb88
fix link for tutorial
10 years ago
balr0g
8485a5004d
Pass session
10 years ago
balr0g
fd6ea19b4b
config['api'] is set but empty; properly handle this
10 years ago
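The difference between "set" and "set but empty" is a key-presence test versus a truthiness test. A minimal sketch with a hypothetical `has_api` helper:

```python
def has_api(config):
    """True only if an API URL is present *and* non-empty."""
    # 'api' in config would be True for {'api': ''}; truthiness rejects that case.
    return bool(config.get("api"))
```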
nemobis
1ff96238eb
Denote as alpha until revamp is tested
Per emijrp who asked not to run dumps with this, at https://github.com/WikiTeam/wikiteam/issues/104#issuecomment-48039143
Currently proposed things to fix or check: https://github.com/WikiTeam/wikiteam/issues?milestone=1&state=open
10 years ago
Emilio J. Rodríguez-Posada
89e3c3e462
standardize getImage* function names
10 years ago
Emilio J. Rodríguez-Posada
aaa1822759
improving image list downloader
10 years ago
Emilio J. Rodríguez-Posada
88c9468c0e
improving image list downloader
10 years ago
balr0g
3929e4eb9c
Cleanups and error fixes suggested by flake8 (pep8 + pyflakes)
10 years ago
Emilio J. Rodríguez-Posada
c07b527e5d
adding session to getWikiEngine()
10 years ago
Emilio J. Rodríguez-Posada
30c153ce1f
chg: using 'with open' for files
10 years ago
balr0g
9aa3c4a0e1
Removed all traces of urllib except for encode/decode; more bugs fixed.
10 years ago
balr0g
c8e11a949b
Initial port to Requests
10 years ago
Emilio J. Rodríguez-Posada
9553e3550c
adding wiki engine detector
10 years ago
Emilio J. Rodríguez-Posada
eb97cf1adf
version 0.2.2 and tiny bits in --help
10 years ago
balr0g
50b011f90d
Initial port to argparse
10 years ago
Emilio J. Rodríguez-Posada
568deef081
adding comments for clarification
10 years ago
Emilio J. Rodríguez-Posada
d4eed1f738
fixing #127 and #134: now works with APIs that return a 'name' field for images and with those that don't (in that case we unquote over ASCII); also fixing a bug that re-downloaded the image list when it had already been completed
10 years ago
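Handling both kinds of API reply can be sketched as: use `name` when present, otherwise unquote the last path segment of the image URL. This is a hypothetical Python 3 helper; unlike the ASCII unquoting the commit mentions, `urllib.parse.unquote` decodes percent-escapes as UTF-8 by default.

```python
from urllib.parse import unquote


def image_filename(image):
    """Prefer the API's 'name' field; otherwise unquote the last segment of 'url'."""
    if image.get("name"):
        return image["name"]
    # e.g. .../images/a/ab/Caf%C3%A9.jpg -> Café.jpg
    return unquote(image["url"].split("/")[-1])
```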
Emilio J. Rodríguez-Posada
005de23c1d
adding gzip to siteinfo downloader
10 years ago
Emilio J. Rodríguez-Posada
d79ea64d41
fixing issue #97: pretty siteinfo JSON saving, indenting 4 chars
10 years ago
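Pretty-printing the saved siteinfo is a matter of passing `indent=4` to the json module. A minimal sketch; `siteinfo_json` and `save_siteinfo` are hypothetical names.

```python
import json


def siteinfo_json(siteinfo):
    """Serialize siteinfo as human-readable JSON, indented by 4 spaces."""
    return json.dumps(siteinfo, indent=4)


def save_siteinfo(siteinfo, path):
    with open(path, "w", encoding="utf-8") as f:
        f.write(siteinfo_json(siteinfo))
```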
Emilio J. Rodríguez-Posada
3854a344fe
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
10 years ago
Emilio J. Rodríguez-Posada
1c1f0dbb86
replacing XML with JSON in image downloading
10 years ago
balr0g
481323c7f7
Don't try to download sites with disabled API
10 years ago
nemobis
1933db8a94
Merge pull request #124 from balr0g/scraper-unicode-title-fix
Fix scraper for sites with Unicode titles
10 years ago
balr0g
62be069026
Fix scraper for sites with Unicode titles
10 years ago
nemobis
62d961fa97
Fix typo, unused variable spotted by balrog
10 years ago
nemobis
95bc2dec38
Link GitHub issue tracker
10 years ago
balr0g
d60e560571
Add Content-Encoding: gzip support
10 years ago
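Supporting `Content-Encoding: gzip` means requesting compression and then gunzipping the body when the server says it did so. A minimal sketch for code that handles raw responses itself; `decode_body` is a hypothetical helper. Requests decodes gzip bodies automatically, so after the port to Requests further up the log the library would handle this step.

```python
import gzip


def decode_body(body, headers):
    """Return the response body, gunzipping it if Content-Encoding says gzip."""
    # Header names are case-insensitive, so normalize them before lookup.
    encoding = {k.lower(): v for k, v in headers.items()}.get("content-encoding", "")
    if encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body
```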