Hydriz Scholz
30b67c44f4
Remove Banana Dance
9 years ago
Emilio J. Rodríguez-Posada
0f456208f1
Update README.md
9 years ago
emijrp
d0f7f83ee5
reading from IA
9 years ago
nemobis
344d74dc9d
Add warning for those who missed the issue
9 years ago
nemobis
cc5f9ea0a6
Update 2014 done list URL; spotted by emijrp
9 years ago
nemobis
ba903682ce
Merge pull request #240 from PiRSquared17/logo-uploader
...
Save and upload logos in uploader.py
9 years ago
PiRSquared17
fadd7134f7
What I meant to do, ugh
10 years ago
PiRSquared17
1b2e83aa8c
Fix minor error with normpath call
10 years ago
nemobis
7ad89fc226
Merge pull request #245 from PiRSquared17/path-parse-norm
...
Normalize path/foo/ to path/foo, so -2, etc. work (fixes #244)
10 years ago
PiRSquared17
5db9a1c7f3
Normalize path/foo/ to path/foo, so -2, etc. work (fixes #244)
10 years ago
PiRSquared17
6b2c5a6915
Fix license URL, misc.
10 years ago
PiRSquared17
0fa6085a2a
Merge branch 'master' of https://github.com/WikiTeam/wikiteam into logo-uploader
10 years ago
Federico Leva
918fe060d0
Merge branch 'nemobis-2015/iterators'
10 years ago
Federico Leva
2b78bfb795
Merge branch '2015/iterators' of git://github.com/nemobis/wikiteam into nemobis-2015/iterators
...
Conflicts:
requirements.txt
10 years ago
Federico Leva
d4fd745498
Actually allow resuming huge or broken XML dumps
...
* Log "XML export on this wiki is broken, quitting." to the error
  file so that grepping reveals which dumps were interrupted so.
* Automatically reduce export size for a page when downloading the
  entire history at once results in a MemoryError.
* Truncate the file with a pythonic method (.seek and .truncate)
  while reading from the end, by making reverse_readline() a weird
  hybrid to avoid an actual coroutine.
10 years ago
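
The `.seek` and `.truncate` approach named in the commit above can be illustrated with a minimal sketch. This is not the code from the commit: the function name `truncate_after_last`, the marker `</page>\n`, and the chunk size are hypothetical choices made for illustration, and the sketch is written for Python 3. It scans the file backwards in fixed-size chunks and cuts the file right after the last complete marker, so the file is never read from the beginning.

```python
import os


def truncate_after_last(path, marker, chunk_size=4096):
    """Cut the file right after the last occurrence of `marker`.

    Returns the new file size, or None if the marker was not found
    (in which case the file is left untouched).
    Hypothetical helper for illustration only.
    """
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            # Append a short overlap from the previously read (later) chunk
            # so that a marker split across two chunks is still found.
            buf = f.read(step) + tail
            idx = buf.rfind(marker)
            if idx != -1:
                end = pos + idx + len(marker)
                f.truncate(end)
                return end
            tail = buf[:len(marker) - 1]
        return None
```

Called as `truncate_after_last("dump.xml", b"</page>\n")`, this would drop a trailing half-written page from an interrupted dump; the file name here is only an example.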
Federico Leva
9168a66a54
logerror() wants unicode, but readTitles etc. give bytes
...
Fixes #239.
10 years ago
Federico Leva
632b99ea53
Merge branch '2015/iterators' of https://github.com/nemobis/wikiteam into nemobis-2015/iterators
10 years ago
PiRSquared17
109528384b
Save and upload logos in uploader.py
10 years ago
nemobis
4e57430605
Merge pull request #238 from WikiTeam/PiRSquared17-patch-1
...
Set verbose=True for upload
10 years ago
PiRSquared17
cb005516b2
Set verbose=True for upload
...
This makes it show progress.
10 years ago
nemobis
4fce244d4a
Merge pull request #237 from WikiTeam/uploader-ia-wrapper
...
Port uploader.py to use internetarchive package
10 years ago
PiRSquared17
29ee59c925
Add internetarchive requirement
...
Add internetarchive
10 years ago
PiRSquared17
905511f996
Port uploader.py to use internetarchive package
...
Remove curl stuff and replace with internetarchive pip package (or https://github.com/jjjake/ia-wrapper) API
10 years ago
nemobis
ff2cdfa1cd
Merge pull request #236 from PiRSquared17/fix-server-check-api
...
Catch KeyError to fix server check
10 years ago
nemobis
0b25951ab1
Merge pull request #224 from nemobis/2015/issue26
...
Issue #26: Local "Special" namespace, actually limit replies
10 years ago
PiRSquared17
03db166718
Catch KeyError to fix server check
10 years ago
nemobis
213687011e
Merge pull request #235 from PiRSquared17/truncate-file-utf8
...
Make filename truncation work with UTF-8
10 years ago
PiRSquared17
f80ad39df0
Make filename truncation work with UTF-8
10 years ago
PiRSquared17
90bfd1400e
Merge pull request #229 from PiRSquared17/fix-zwnbsp-bom
...
Strip ZWNBSP (U+FEFF) Byte-Order Mark from JSON/XML
10 years ago
Marek Šuppa
5fbeda982f
Merge pull request #233 from PiRSquared17/allow-single-test
...
Allow a single test to be run (see PR)
10 years ago
PiRSquared17
b80159e257
Allow a single test to be run (see PR)
10 years ago
PiRSquared17
7c80d37e04
Add test for BOM encoding
10 years ago
nemobis
d31709338d
Merge pull request #231 from PiRSquared17/ignore-leading-spaces
...
Allow spaces before <mediawiki> tag.
10 years ago
PiRSquared17
ba48c43d34
Merge pull request #232 from PiRSquared17/remove-test-kwiki
...
Comment out broken test case wiki
10 years ago
PiRSquared17
d89b99bd7c
Comment out broken test case wiki
10 years ago
PiRSquared17
fc276d525f
Allow spaces before <mediawiki> tag.
10 years ago
PiRSquared17
1c820dafb7
Strip ZWNBSP (U+FEFF) Byte-Order Mark from JSON/XML
10 years ago
nemobis
711a88df59
Merge pull request #226 from nemobis/master
...
Make dumpgenerator.py 774: required by launcher.py
10 years ago
Nemo bis
55e5888a00
Fix UnicodeDecodeError in resume: use kitchen
10 years ago
Federico Leva
14ce5f2c1b
Resume and list titles without keeping everything in memory
...
Approach suggested by @makoshark, finally found the time to start
implementing it.
* Do not produce and save the titles list all at once. Instead, use
  the scraper and API as generators and save titles on the go. Also,
  try to start the generator from the appropriate title.
  For now the title sorting is not implemented. Pages will be in the
  order given by namespace ID, then page name.
* When resuming, read both the title list and the XML file from the
  end rather than the beginning. If the correct terminator is
  present, only one line needs to be read.
* In both cases, use a generator instead of a huge list in memory.
* Also truncate the resumed XML without writing it from scratch.
  For now using GNU ed: very compact, though shelling out is ugly.
  I gave up on using file.seek and file.truncate to avoid reading the
  whole file from the beginning or complicating reverse_readline()
  with more offset calculations.
This should avoid MemoryError in most cases.
Tested by running a dump over a 1.24 wiki with 11 pages: a complete
dump and a resumed dump from a dump interrupted with ctrl-c.
10 years ago
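
Reading a file from the end with a generator, as the commit above describes for resuming, might look like the following sketch. It is not the project's `reverse_readline()`: the names `reverse_lines` and `ends_with_terminator`, the chunk size, and the example terminator are assumptions made for illustration, and the code targets Python 3. Only as many chunks as the caller consumes are read from disk.

```python
import os


def reverse_lines(path, chunk_size=4096):
    """Yield the lines of a file (as bytes, without newlines) from last to first.

    Hypothetical helper for illustration; the file is read backwards in
    chunks, so it is never held in memory as a whole.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        pending = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            pieces = (f.read(step) + pending).split(b"\n")
            # The first piece may be the tail of a line that starts in an
            # earlier chunk, so hold it back until that chunk is read.
            pending = pieces.pop(0)
            for line in reversed(pieces):
                yield line
        yield pending


def ends_with_terminator(path, terminator):
    """Return True if the last non-empty line of the file equals `terminator`."""
    for line in reverse_lines(path):
        if line.strip():
            return line.strip() == terminator
    return False
```

If the terminator is present, `ends_with_terminator` stops after the first non-empty line from the end, which matches the "only one line needs to be read" case. A call such as `ends_with_terminator("titles.txt", b"--END--")` is one way to use it; both the file name and the terminator string are placeholders here.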
Federico Leva
2537e9852e
Make dumpgenerator.py 774: required by launcher.py
10 years ago
nemobis
4b81fa00f1
Merge pull request #225 from nemobis/master
...
Fix API check if only index is passed
10 years ago
Federico Leva
79e2c5951f
Fix API check if only index is passed
...
I forgot that the preceding point only extracts the api.php URL if
the "wiki" argument is passed to say it's a MediaWiki wiki (!).
10 years ago
Federico Leva
bdc7c9bf06
Issue 26: Local "Special" namespace, actually limit replies
...
* For some reason, in a previous commit I had noticed that maxretries
was not respected in getXMLPageCore, but I didn't fix it. Done now.
* If the "Special" namespace alias doesn't work, fetch the local one.
10 years ago
Federico Leva
c1a5e3e0ca
Merge branch 'PiRSquared17-follow-redirects-api'
10 years ago
Federico Leva
2f25e6b787
Make checkAPI() more readable and verbose
...
Also return the api URL we found.
10 years ago
Federico Leva
48ad3775fd
Merge branch 'follow-redirects-api' of git://github.com/PiRSquared17/wikiteam into PiRSquared17-follow-redirects-api
10 years ago
nemobis
2284e3d55e
Merge pull request #186 from PiRSquared17/update-headers
...
Preserve default headers, fixing openwrt test
10 years ago
PiRSquared17
5d23cb62f4
Merge pull request #219 from vadp/dir-fnames-unicode
...
convert images directory content to unicode when resuming download
10 years ago
PiRSquared17
d361477a46
Merge pull request #222 from vadp/img-desc-load-err
...
dumpgenerator: catch errors for missing image descriptions
10 years ago