Commit Graph

203 Commits (fc2ed3ed0d8bfbc6d939a24189e09e52946aee9e)

Author SHA1 Message Date
balr0g 50b011f90d Initial port to argparse 10 years ago
Emilio J. Rodríguez-Posada 568deef081 adding comments for clarification 10 years ago
Emilio J. Rodríguez-Posada d4eed1f738 fixing #127 and #134 , now works with APIs that returns 'name' field for images and those that don't do it (in this case we unquote over ascii); also fixing bug that re-download image list when it was completed previously 10 years ago
Emilio J. Rodríguez-Posada 005de23c1d adding gzip to siteinfo downloader 10 years ago
Emilio J. Rodríguez-Posada d79ea64d41 fixing issue #97 pretty siteinfo json saving, indenting 4 chars 10 years ago
Emilio J. Rodríguez-Posada 3854a344fe Merge branch 'master' of https://github.com/WikiTeam/wikiteam 10 years ago
Emilio J. Rodríguez-Posada 1c1f0dbb86 replacing XML with JSON in image downloading 10 years ago
balr0g 481323c7f7 Don't try to download sites with disabled API 10 years ago
nemobis 1933db8a94 Merge pull request #124 from balr0g/scraper-unicode-title-fix
Fix scraper for sites with Unicode titles
10 years ago
balr0g 62be069026 Fix scraper for sites with Unicode titles 10 years ago
nemobis 62d961fa97 Fix typo, unused variable spotted by balrog 10 years ago
nemobis 95bc2dec38 Link GitHub issue tracker 10 years ago
balr0g d60e560571 Add Content-Encoding: gzip support 10 years ago
Emilio J. Rodríguez-Posada 5261811fa4 only if api exists 10 years ago
Emilio J. Rodríguez-Posada 610764619a add saveSiteInfo() to download meta=siteinfo data from API to a file 10 years ago
Emilio J. Rodríguez-Posada d395433513 comments and newlines 10 years ago
Emilio J. Rodríguez-Posada 5eff4bd072 comments and tabs 10 years ago
Emilio J. Rodríguez-Posada 0b0c40f5da adding more user-agents, but keeps the first as default by now 10 years ago
Emilio J. Rodríguez-Posada 81468c4a7c using JSON to retrieve namespaces via API 10 years ago
Emilio J. Rodríguez-Posada 703eb9011b improving checkAPI() using JSON properly loaded 10 years ago
Emilio J. Rodríguez-Posada 44d3fe1e36 Merge pull request #117 from nemobis/bug/48
Issue 48: Check that API actually works
10 years ago
Emilio J. Rodríguez-Posada fc80556d8a merging... 10 years ago
Emilio J. Rodríguez-Posada f474deb71f now we use JSON properly in getPageTitlesAPI(), instead of XML; fixing some wrong prints, now support utf-8 10 years ago
Federico Leva 997276110c Issue 46: dumpgenerator should follow redirects
Patch by @balr0g from libsonic (GPLv3+).
10 years ago
Federico Leva a8e1575879 Issue 48: Check that API actually works 10 years ago
Emilio J. Rodríguez-Posada c9aa165504 fixing header with the new year, info and documentation link 10 years ago
nemobis ac4c93c12a Issue 85: more cross-platform shebang on all scripts
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@962 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
10 years ago
nemobis 403dc213ef Issue 71: English-only match for an older case
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@942 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 0ede45b7cf Special:BadTitle works only in English wikis
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@917 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 034866a32e Handle permissions-errors for wikis requiring login or whatever
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@916 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 6c69d9800f Followup, delay needs config; should be BC
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@914 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 55185467e1 Add delay to all checking and listing functions, crappy hosts die on them
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@902 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 6efe406ea5 Followup r877, first check most common conditions for shortcut performance
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@882 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
Hydriz 611d13f8c5 Follow up r877, check the number of revision tags
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@878 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
Hydriz 64bd837cab (Issue 34) XML integry check inside the code
This *really* fixes the issue and asks the user whether or not to regenerate a dump.


git-svn-id: https://wikiteam.googlecode.com/svn/trunk@877 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
Hydriz 79047a3ded (Issue 71) Use a better check for private wikis
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@873 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 26873ad495 Fix typo, make domain2prefix quiet again
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@869 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 626118cfab Let's call it 0.2 then, a bump to 1 would require announcements etc. We're not there yet (API support etc.).
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@867 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
Hydriz df1e7efafd Change version of dumpgenerator.py to 1.1. Using 0.1 is rather confusing.
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@866 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis e1d4de3179 Uncomment appended index.php for guess in most configurations
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@864 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 5d34d9512a Needs to be non-matching group
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@863 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 82ba173739 Issue 22: allimages now uses aicontinue, not aifrom
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@862 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis c6546ff935 Issue 71: Don't try to download private wikis, first workaround
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@861 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 776038666f Issue 72: revert r857, just define everything in launcher.py
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@860 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 6113fa3340 Add delay to getPageTitlesScraper
We must be nice here too or naughty hosts fail badly, for instance wikkii.com gave

urllib2.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Moved Temporarily



git-svn-id: https://wikiteam.googlecode.com/svn/trunk@859 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 9e1b13e173 Correct --help: format is --delay=5, not --delay:5
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@858 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 6430ac5f47 Check for the existence of the array in domain2prefix instead; uploader.py failed on python 2.6
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@857 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis ef7d527e86 Add some advice about editthis.info for usage via launcher.py
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@855 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 4820339d10 Fix r842, patch by balrog; Schbirid reported python error in CleanHTML
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@854 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago
nemobis 7c94815e2c Issue 68: Use GET, not POST, to download images; some harm and no? good
git-svn-id: https://wikiteam.googlecode.com/svn/trunk@851 31edc4fc-5e31-b4c4-d58b-c8bc928bcb95
11 years ago