wikiteam

mirror of https://github.com/WikiTeam/wikiteam synced 2024-11-16 21:27:46 +00:00

Author	SHA1	Message	Date
Tim Sheerman-Chase	5cb2ecb6b5	Attempting to fix missing config in tests	2015-08-07 21:33:39 +01:00
Tim Sheerman-Chase	8380af5f24	Improve retry logic	2015-08-05 21:24:59 +01:00
Emilio J. Rodríguez-Posada	0f456208f1	Update README.md	2015-07-18 22:07:53 +02:00
emijrp	d0f7f83ee5	reading from IA	2015-07-10 09:03:16 +02:00
nemobis	344d74dc9d	Add warning for those who missed the issue	2015-07-06 21:35:28 +02:00
nemobis	cc5f9ea0a6	Update 2014 done list URL; spotted by emijrp	2015-07-06 21:34:12 +02:00
nemobis	ba903682ce	Merge pull request #240 from PiRSquared17/logo-uploader Save and upload logos in uploader.py	2015-07-06 08:50:38 +02:00
PiRSquared17	fadd7134f7	What I meant to do, ugh	2015-04-18 21:28:57 +00:00
PiRSquared17	1b2e83aa8c	Fix minor error with normpath call	2015-04-18 21:27:23 +00:00
nemobis	7ad89fc226	Merge pull request #245 from PiRSquared17/path-parse-norm Normalize path/foo/ to path/foo, so -2, etc. work (fixes #244)	2015-04-10 09:12:25 +02:00
PiRSquared17	5db9a1c7f3	Normalize path/foo/ to path/foo, so -2, etc. work (fixes #244)	2015-04-10 00:21:07 +00:00
PiRSquared17	6b2c5a6915	Fix license URL, misc.	2015-04-07 01:06:00 +00:00
PiRSquared17	0fa6085a2a	Merge branch 'master' of https://github.com/WikiTeam/wikiteam into logo-uploader	2015-03-31 00:10:50 +00:00
Federico Leva	918fe060d0	Merge branch 'nemobis-2015/iterators'	2015-03-30 09:27:27 +02:00
Federico Leva	2b78bfb795	Merge branch '2015/iterators' of git://github.com/nemobis/wikiteam into nemobis-2015/iterators Conflicts: requirements.txt	2015-03-30 09:27:00 +02:00
Federico Leva	d4fd745498	Actually allow resuming huge or broken XML dumps * Log "XML export on this wiki is broken, quitting." to the error file so that grepping reveals which dumps were interrupted so. * Automatically reduce export size for a page when downloading the entire history at once results in a MemoryError. * Truncate the file with a pythonic method (.seek and .truncate) while reading from the end, by making reverse_readline() a weird hybrid to avoid an actual coroutine.	2015-03-30 02:35:55 +02:00
Federico Leva	9168a66a54	logerror() wants unicode, but readTitles etc. give bytes Fixes #239.	2015-03-30 00:51:53 +02:00
Federico Leva	632b99ea53	Merge branch '2015/iterators' of https://github.com/nemobis/wikiteam into nemobis-2015/iterators	2015-03-30 00:48:32 +02:00
PiRSquared17	109528384b	Save and upload logos in uploader.py	2015-03-29 21:30:15 +00:00
nemobis	4e57430605	Merge pull request #238 from WikiTeam/PiRSquared17-patch-1 Set verbose=True for upload	2015-03-29 21:01:22 +02:00
PiRSquared17	cb005516b2	Set verbose=True for upload This makes it show progress.	2015-03-29 18:50:50 +00:00
nemobis	4fce244d4a	Merge pull request #237 from WikiTeam/uploader-ia-wrapper Port uploader.py to use internetarchive package	2015-03-29 20:42:13 +02:00
PiRSquared17	29ee59c925	Add internetarchive requirement Add internetarchive	2015-03-29 18:33:56 +00:00
PiRSquared17	905511f996	Port uploader.py to use internetarchive package Remove curl stuff and replace with internetarchive pip package (or https://github.com/jjjake/ia-wrapper) API	2015-03-29 18:30:42 +00:00
nemobis	ff2cdfa1cd	Merge pull request #236 from PiRSquared17/fix-server-check-api Catch KeyError to fix server check	2015-03-29 13:53:26 +02:00
nemobis	0b25951ab1	Merge pull request #224 from nemobis/2015/issue26 Issue #26: Local "Special" namespace, actually limit replies	2015-03-29 13:53:20 +02:00
PiRSquared17	03db166718	Catch KeyError to fix server check	2015-03-29 04:14:43 +01:00
nemobis	213687011e	Merge pull request #235 from PiRSquared17/truncate-file-utf8 Make filename truncation work with UTF-8	2015-03-28 21:55:56 +01:00
PiRSquared17	f80ad39df0	Make filename truncation work with UTF-8	2015-03-28 15:17:06 +00:00
PiRSquared17	90bfd1400e	Merge pull request #229 from PiRSquared17/fix-zwnbsp-bom Strip ZWNBSP (U+FEFF) Byte-Order Mark from JSON/XML	2015-03-24 21:46:33 +00:00
Marek Šuppa	5fbeda982f	Merge pull request #233 from PiRSquared17/allow-single-test Allow a single test to be run (see PR)	2015-03-24 22:39:23 +01:00
PiRSquared17	b80159e257	Allow a single test to be run (see PR)	2015-03-24 21:08:43 +00:00
PiRSquared17	7c80d37e04	Add test for BOM encoding	2015-03-24 21:04:29 +00:00
nemobis	d31709338d	Merge pull request #231 from PiRSquared17/ignore-leading-spaces Allow spaces before <mediawiki> tag.	2015-03-24 21:07:24 +01:00
PiRSquared17	ba48c43d34	Merge pull request #232 from PiRSquared17/remove-test-kwiki Comment out broken test case wiki	2015-03-24 03:50:15 +00:00
PiRSquared17	d89b99bd7c	Comment out broken test case wiki	2015-03-24 03:47:45 +00:00
PiRSquared17	fc276d525f	Allow spaces before <mediawiki> tag.	2015-03-24 03:44:03 +00:00
PiRSquared17	1c820dafb7	Strip ZWNBSP (U+FEFF) Byte-Order Mark from JSON/XML	2015-03-24 01:58:01 +00:00
nemobis	711a88df59	Merge pull request #226 from nemobis/master Make dumpgenerator.py 774: required by launcher.py	2015-03-14 10:44:41 +01:00
Nemo bis	55e5888a00	Fix UnicodeDecodeError in resume: use kitchen	2015-03-10 22:26:23 +02:00
Federico Leva	14ce5f2c1b	Resume and list titles without keeping everything in memory Approach suggested by @makoshark, finally found the time to start implementing it. * Do not produce and save the titles list all at once. Instead, use the scraper and API as generators and save titles on the go. Also, try to start the generator from the appropriate title. For now the title sorting is not implemented. Pages will be in the order given by namespace ID, then page name. * When resuming, read both the title list and the XML file from the end rather than the beginning. If the correct terminator is present, only one line needs to be read. * In both cases, use a generator instead of a huge list in memory. * Also truncate the resumed XML without writing it from scratch. For now using GNU ed: very compact, though shelling out is ugly. I gave up on using file.seek and file.truncate to avoid reading the whole file from the beginning or complicating reverse_readline() with more offset calculations. This should avoid MemoryError in most cases. Tested by running a dump over a 1.24 wiki with 11 pages: a complete dump and a resumed dump from a dump interrupted with ctrl-c.	2015-03-10 11:21:44 +01:00
Federico Leva	2537e9852e	Make dumpgenerator.py 774: required by launcher.py	2015-03-08 21:33:39 +01:00
nemobis	4b81fa00f1	Merge pull request #225 from nemobis/master Fix API check if only index is passed	2015-03-08 20:53:51 +01:00
Federico Leva	79e2c5951f	Fix API check if only index is passed I forgot that the preceding point only extracts the api.php URL if the "wiki" argument is passed to say it's a MediaWiki wiki (!).	2015-03-08 20:52:24 +01:00
Federico Leva	bdc7c9bf06	Issue 26: Local "Special" namespace, actually limit replies * For some reason, in a previous commit I had noticed that maxretries was not respected in getXMLPageCore, but I didn't fix it. Done now. * If the "Special" namespace alias doesn't work, fetch the local one.	2015-03-08 19:30:09 +01:00
Federico Leva	c1a5e3e0ca	Merge branch 'PiRSquared17-follow-redirects-api'	2015-03-08 16:31:32 +01:00
Federico Leva	2f25e6b787	Make checkAPI() more readable and verbose Also return the api URL we found.	2015-03-08 16:01:46 +01:00
Federico Leva	48ad3775fd	Merge branch 'follow-redirects-api' of git://github.com/PiRSquared17/wikiteam into PiRSquared17-follow-redirects-api	2015-03-08 14:35:30 +01:00
nemobis	2284e3d55e	Merge pull request #186 from PiRSquared17/update-headers Preserve default headers, fixing openwrt test	2015-03-08 13:56:00 +01:00
PiRSquared17	5d23cb62f4	Merge pull request #219 from vadp/dir-fnames-unicode convert images directory content to unicode when resuming download	2015-03-04 23:36:59 +00:00


763 Commits
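
Commits 14ce5f2c1b and d4fd745498 above describe resuming a dump by reading the XML file from the end and truncating it with .seek and .truncate instead of rewriting it. The sketch below illustrates that general idea only; it is not the code from dumpgenerator.py. The function names `reverse_lines` and `truncate_after_last`, the `</page>` terminator, and the buffer size are all assumptions made for the example.

```python
import os


def reverse_lines(path, buf_size=8192):
    """Yield (start_offset, line_bytes) pairs, last line first.

    Hypothetical helper: reads fixed-size blocks backwards so the whole
    file is never held in memory.
    """
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        tail = b""  # start of a line whose beginning lies in an earlier block
        while pos > 0:
            read_size = min(buf_size, pos)
            pos -= read_size
            fh.seek(pos)
            block = fh.read(read_size) + tail
            lines = block.split(b"\n")
            # The first piece may be incomplete, so carry it to the next round.
            tail = lines[0]
            offset = pos + len(block)
            for line in reversed(lines[1:]):
                offset -= len(line)
                yield offset, line
                offset -= 1  # skip the "\n" that precedes this line
        yield 0, tail


def truncate_after_last(path, terminator=b"</page>"):
    """Cut the file right after the last line equal to `terminator`.

    Returns the new file size, or None if the terminator was not found.
    The terminator value is an assumption for this example.
    """
    end = None
    for offset, line in reverse_lines(path):
        if line.strip() == terminator:
            # Keep the trailing newline when there is one.
            end = min(offset + len(line) + 1, os.path.getsize(path))
            break
    if end is None:
        return None
    with open(path, "r+b") as fh:
        fh.truncate(end)
    return end
```

If the file already ends with the terminator line, only the last block is read before the loop stops, which matches the commit's remark that a single line may be enough when the correct terminator is present.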