Hydriz Scholz
30b67c44f4
Remove Banana Dance
2015-08-06 17:41:45 +08:00
Emilio J. Rodríguez-Posada
0f456208f1
Update README.md
2015-07-18 22:07:53 +02:00
emijrp
d0f7f83ee5
reading from IA
2015-07-10 09:03:16 +02:00
nemobis
344d74dc9d
Add warning for those who missed the issue
2015-07-06 21:35:28 +02:00
nemobis
cc5f9ea0a6
Update 2014 done list URL; spotted by emijrp
2015-07-06 21:34:12 +02:00
nemobis
ba903682ce
Merge pull request #240 from PiRSquared17/logo-uploader
Save and upload logos in uploader.py
2015-07-06 08:50:38 +02:00
PiRSquared17
fadd7134f7
What I meant to do, ugh
2015-04-18 21:28:57 +00:00
PiRSquared17
1b2e83aa8c
Fix minor error with normpath call
2015-04-18 21:27:23 +00:00
nemobis
7ad89fc226
Merge pull request #245 from PiRSquared17/path-parse-norm
Normalize path/foo/ to path/foo, so -2, etc. work (fixes #244)
2015-04-10 09:12:25 +02:00
PiRSquared17
5db9a1c7f3
Normalize path/foo/ to path/foo, so -2, etc. work (fixes #244)
2015-04-10 00:21:07 +00:00
PiRSquared17
6b2c5a6915
Fix license URL, misc.
2015-04-07 01:06:00 +00:00
PiRSquared17
0fa6085a2a
Merge branch 'master' of https://github.com/WikiTeam/wikiteam into logo-uploader
2015-03-31 00:10:50 +00:00
Federico Leva
918fe060d0
Merge branch 'nemobis-2015/iterators'
2015-03-30 09:27:27 +02:00
Federico Leva
2b78bfb795
Merge branch '2015/iterators' of git://github.com/nemobis/wikiteam into nemobis-2015/iterators
Conflicts:
requirements.txt
2015-03-30 09:27:00 +02:00
Federico Leva
d4fd745498
Actually allow resuming huge or broken XML dumps
* Log "XML export on this wiki is broken, quitting." to the error
  file so that grepping reveals which dumps were interrupted so.
* Automatically reduce export size for a page when downloading the
  entire history at once results in a MemoryError.
* Truncate the file with a pythonic method (.seek and .truncate)
  while reading from the end, by making reverse_readline() a weird
  hybrid to avoid an actual coroutine.
2015-03-30 02:35:55 +02:00
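The .seek/.truncate approach named in the commit above can be sketched as follows. This is a minimal illustration in Python 3, not the code from the commit: the function name `truncate_after_last`, the default terminator `</page>\n`, and the block size are all assumptions made for the example.

```python
import os


def truncate_after_last(path, terminator=b"</page>\n", block_size=4096):
    """Cut the file right after the last occurrence of `terminator`.

    The file is scanned backwards in fixed-size blocks, so memory use
    stays bounded. Returns the new size in bytes, or None if the
    terminator never occurs (the file is then left untouched).
    `terminator` and `block_size` are illustrative defaults.
    """
    overlap = len(terminator) - 1
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        carry = b""  # first bytes of the block read previously (later in the file)
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            # Append the carry so a terminator split across blocks is still found.
            chunk = f.read(step) + carry
            idx = chunk.rfind(terminator)
            if idx != -1:
                new_size = pos + idx + len(terminator)
                f.truncate(new_size)
                return new_size
            carry = chunk[:overlap]
        return None
```

Only the tail of the file is read before the cut point is found, which is the property the commit message is after for huge dumps.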
Federico Leva
9168a66a54
logerror() wants unicode, but readTitles etc. give bytes
Fixes #239.
2015-03-30 00:51:53 +02:00
Federico Leva
632b99ea53
Merge branch '2015/iterators' of https://github.com/nemobis/wikiteam into nemobis-2015/iterators
2015-03-30 00:48:32 +02:00
PiRSquared17
109528384b
Save and upload logos in uploader.py
2015-03-29 21:30:15 +00:00
nemobis
4e57430605
Merge pull request #238 from WikiTeam/PiRSquared17-patch-1
Set verbose=True for upload
2015-03-29 21:01:22 +02:00
PiRSquared17
cb005516b2
Set verbose=True for upload
This makes it show progress.
2015-03-29 18:50:50 +00:00
nemobis
4fce244d4a
Merge pull request #237 from WikiTeam/uploader-ia-wrapper
Port uploader.py to use internetarchive package
2015-03-29 20:42:13 +02:00
PiRSquared17
29ee59c925
Add internetarchive requirement
Add internetarchive
2015-03-29 18:33:56 +00:00
PiRSquared17
905511f996
Port uploader.py to use internetarchive package
Remove curl stuff and replace with internetarchive pip package (or https://github.com/jjjake/ia-wrapper) API
2015-03-29 18:30:42 +00:00
nemobis
ff2cdfa1cd
Merge pull request #236 from PiRSquared17/fix-server-check-api
Catch KeyError to fix server check
2015-03-29 13:53:26 +02:00
nemobis
0b25951ab1
Merge pull request #224 from nemobis/2015/issue26
Issue #26: Local "Special" namespace, actually limit replies
2015-03-29 13:53:20 +02:00
PiRSquared17
03db166718
Catch KeyError to fix server check
2015-03-29 04:14:43 +01:00
nemobis
213687011e
Merge pull request #235 from PiRSquared17/truncate-file-utf8
Make filename truncation work with UTF-8
2015-03-28 21:55:56 +01:00
PiRSquared17
f80ad39df0
Make filename truncation work with UTF-8
2015-03-28 15:17:06 +00:00
PiRSquared17
90bfd1400e
Merge pull request #229 from PiRSquared17/fix-zwnbsp-bom
Strip ZWNBSP (U+FEFF) Byte-Order Mark from JSON/XML
2015-03-24 21:46:33 +00:00
Marek Šuppa
5fbeda982f
Merge pull request #233 from PiRSquared17/allow-single-test
Allow a single test to be run (see PR)
2015-03-24 22:39:23 +01:00
PiRSquared17
b80159e257
Allow a single test to be run (see PR)
2015-03-24 21:08:43 +00:00
PiRSquared17
7c80d37e04
Add test for BOM encoding
2015-03-24 21:04:29 +00:00
nemobis
d31709338d
Merge pull request #231 from PiRSquared17/ignore-leading-spaces
Allow spaces before <mediawiki> tag.
2015-03-24 21:07:24 +01:00
PiRSquared17
ba48c43d34
Merge pull request #232 from PiRSquared17/remove-test-kwiki
Comment out broken test case wiki
2015-03-24 03:50:15 +00:00
PiRSquared17
d89b99bd7c
Comment out broken test case wiki
2015-03-24 03:47:45 +00:00
PiRSquared17
fc276d525f
Allow spaces before <mediawiki> tag.
2015-03-24 03:44:03 +00:00
PiRSquared17
1c820dafb7
Strip ZWNBSP (U+FEFF) Byte-Order Mark from JSON/XML
2015-03-24 01:58:01 +00:00
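For context on the BOM fix above: a leading U+FEFF makes `json.loads` reject otherwise valid text. A minimal Python 3 sketch of two ways to drop it; the helper name `strip_bom` and the sample payload are assumptions for illustration, not taken from the commit.

```python
import json


def strip_bom(text):
    """Remove a single leading U+FEFF (ZWNBSP / byte-order mark), if present."""
    return text[1:] if text.startswith("\ufeff") else text


# Hypothetical API reply: the UTF-8 encoded BOM followed by a JSON body.
raw = b"\xef\xbb\xbf" + b'{"query": {}}'

# Option 1: decode normally, then strip the mark from the resulting string.
data = json.loads(strip_bom(raw.decode("utf-8")))

# Option 2: let the "utf-8-sig" codec drop the mark while decoding.
same_data = json.loads(raw.decode("utf-8-sig"))
```

Both options leave input without a BOM unchanged, so they are safe to apply to every response.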
nemobis
711a88df59
Merge pull request #226 from nemobis/master
Make dumpgenerator.py 774: required by launcher.py
2015-03-14 10:44:41 +01:00
Nemo bis
55e5888a00
Fix UnicodeDecodeError in resume: use kitchen
2015-03-10 22:26:23 +02:00
Federico Leva
14ce5f2c1b
Resume and list titles without keeping everything in memory
Approach suggested by @makoshark, finally found the time to start
implementing it.
* Do not produce and save the titles list all at once. Instead, use
  the scraper and API as generators and save titles on the go. Also,
  try to start the generator from the appropriate title.
  For now the title sorting is not implemented. Pages will be in the
  order given by namespace ID, then page name.
* When resuming, read both the title list and the XML file from the
  end rather than the beginning. If the correct terminator is
  present, only one line needs to be read.
* In both cases, use a generator instead of a huge list in memory.
* Also truncate the resumed XML without writing it from scratch.
  For now using GNU ed: very compact, though shelling out is ugly.
  I gave up on using file.seek and file.truncate to avoid reading the
  whole file from the beginning or complicating reverse_readline()
  with more offset calculations.
This should avoid MemoryError in most cases.
Tested by running a dump over a 1.24 wiki with 11 pages: a complete
dump and a resumed dump from a dump interrupted with ctrl-c.
2015-03-10 11:21:44 +01:00
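Reading a file from the end with a generator, as the commit above describes, might look roughly like this. It is a Python 3 sketch under assumptions: the signature of `reverse_readline` here, the buffer size, and the `--END--` marker in the usage note are illustrative and are not taken from the commit.

```python
import os


def reverse_readline(path, buf_size=8192):
    """Yield the lines of a UTF-8 text file from last to first.

    Only `buf_size` bytes plus one partial line are held in memory.
    A single trailing newline at the end of the file is ignored.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        if pos == 0:
            return  # empty file: nothing to yield
        leftover = b""  # start of a line whose beginning is not read yet
        at_end = True
        while pos > 0:
            step = min(buf_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            if at_end and chunk.endswith(b"\n"):
                chunk = chunk[:-1]  # drop the final newline of the file
            at_end = False
            lines = (chunk + leftover).split(b"\n")
            leftover = lines.pop(0)  # possibly incomplete, finish it next round
            for line in reversed(lines):
                yield line.decode("utf-8")
        yield leftover.decode("utf-8")
```

When resuming, `next(reverse_readline(titles_path), None)` reads just the last line, so a completed list can be recognised by comparing that line against whatever terminator the file format uses (for example a hypothetical `--END--` marker) without loading the rest.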
Federico Leva
2537e9852e
Make dumpgenerator.py 774: required by launcher.py
2015-03-08 21:33:39 +01:00
nemobis
4b81fa00f1
Merge pull request #225 from nemobis/master
Fix API check if only index is passed
2015-03-08 20:53:51 +01:00
Federico Leva
79e2c5951f
Fix API check if only index is passed
I forgot that the preceding point only extracts the api.php URL if
the "wiki" argument is passed to say it's a MediaWiki wiki (!).
2015-03-08 20:52:24 +01:00
Federico Leva
bdc7c9bf06
Issue 26: Local "Special" namespace, actually limit replies
* For some reason, in a previous commit I had noticed that maxretries
  was not respected in getXMLPageCore, but I didn't fix it. Done now.
* If the "Special" namespace alias doesn't work, fetch the local one.
2015-03-08 19:30:09 +01:00
Federico Leva
c1a5e3e0ca
Merge branch 'PiRSquared17-follow-redirects-api'
2015-03-08 16:31:32 +01:00
Federico Leva
2f25e6b787
Make checkAPI() more readable and verbose
Also return the api URL we found.
2015-03-08 16:01:46 +01:00
Federico Leva
48ad3775fd
Merge branch 'follow-redirects-api' of git://github.com/PiRSquared17/wikiteam into PiRSquared17-follow-redirects-api
2015-03-08 14:35:30 +01:00
nemobis
2284e3d55e
Merge pull request #186 from PiRSquared17/update-headers
Preserve default headers, fixing openwrt test
2015-03-08 13:56:00 +01:00
PiRSquared17
5d23cb62f4
Merge pull request #219 from vadp/dir-fnames-unicode
convert images directory content to unicode when resuming download
2015-03-04 23:36:59 +00:00
PiRSquared17
d361477a46
Merge pull request #222 from vadp/img-desc-load-err
dumpgenerator: catch errors for missing image descriptions
2015-03-03 01:55:59 +00:00