mirror of https://github.com/WikiTeam/wikiteam synced 2024-11-16 21:27:46 +00:00
Commit Graph

763 Commits

Author SHA1 Message Date
Tim Sheerman-Chase
5cb2ecb6b5 Attempting to fix missing config in tests 2015-08-07 21:33:39 +01:00
Tim Sheerman-Chase
8380af5f24 Improve retry logic 2015-08-05 21:24:59 +01:00
Emilio J. Rodríguez-Posada
0f456208f1 Update README.md 2015-07-18 22:07:53 +02:00
emijrp
d0f7f83ee5 reading from IA 2015-07-10 09:03:16 +02:00
nemobis
344d74dc9d Add warning for those who missed the issue 2015-07-06 21:35:28 +02:00
nemobis
cc5f9ea0a6 Update 2014 done list URL; spotted by emijrp 2015-07-06 21:34:12 +02:00
nemobis
ba903682ce Merge pull request #240 from PiRSquared17/logo-uploader
Save and upload logos in uploader.py
2015-07-06 08:50:38 +02:00
PiRSquared17
fadd7134f7 What I meant to do, ugh 2015-04-18 21:28:57 +00:00
PiRSquared17
1b2e83aa8c Fix minor error with normpath call 2015-04-18 21:27:23 +00:00
nemobis
7ad89fc226 Merge pull request #245 from PiRSquared17/path-parse-norm
Normalize path/foo/ to path/foo, so -2, etc. work (fixes #244)
2015-04-10 09:12:25 +02:00
PiRSquared17
5db9a1c7f3 Normalize path/foo/ to path/foo, so -2, etc. work (fixes #244) 2015-04-10 00:21:07 +00:00
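The normalization named in this commit can be illustrated with a minimal Python 3 sketch. It assumes the problem was a trailing slash leaving an empty final path component. The helper name `dump_basename` is hypothetical and is not taken from the repository, and the commit message does not spell out what "-2" refers to.

```python
import os


def dump_basename(path):
    """Return the last component of a path, ignoring any trailing slash.

    Hypothetical helper for illustration only.
    """
    # normpath collapses "path/foo/" to "path/foo", so basename no longer
    # returns an empty string for a directory given with a trailing slash.
    return os.path.basename(os.path.normpath(path))


if __name__ == "__main__":
    print(dump_basename("path/foo/"))
    print(dump_basename("path/foo"))
```

Without the `normpath` step, `os.path.basename("path/foo/")` yields an empty string, which would break any later logic that parses the directory name.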
PiRSquared17
6b2c5a6915 Fix license URL, misc. 2015-04-07 01:06:00 +00:00
PiRSquared17
0fa6085a2a Merge branch 'master' of https://github.com/WikiTeam/wikiteam into logo-uploader 2015-03-31 00:10:50 +00:00
Federico Leva
918fe060d0 Merge branch 'nemobis-2015/iterators' 2015-03-30 09:27:27 +02:00
Federico Leva
2b78bfb795 Merge branch '2015/iterators' of git://github.com/nemobis/wikiteam into nemobis-2015/iterators
Conflicts:
	requirements.txt
2015-03-30 09:27:00 +02:00
Federico Leva
d4fd745498 Actually allow resuming huge or broken XML dumps
* Log "XML export on this wiki is broken, quitting." to the error
  file so that grepping reveals which dumps were interrupted so.
* Automatically reduce export size for a page when downloading the
  entire history at once results in a MemoryError.
* Truncate the file with a pythonic method (.seek and .truncate)
  while reading from the end, by making reverse_readline() a weird
  hybrid to avoid an actual coroutine.
2015-03-30 02:35:55 +02:00
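The third bullet of this commit (truncating with `.seek` and `.truncate` while reading from the end) can be sketched roughly as follows. This is a Python 3 illustration under stated assumptions, not the code from the commit: the function name `truncate_after_last`, the `marker` argument and the block size are all hypothetical.

```python
import os


def truncate_after_last(path, marker, block_size=4096):
    """Cut the file right after the last occurrence of `marker` (bytes).

    Returns the new file size, or None if the marker is absent, in which
    case the file is left unchanged. Hypothetical helper for illustration.
    """
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""  # start of the block read in the previous iteration
        while pos > 0:
            start = max(0, pos - block_size)
            f.seek(start)
            # Append the carried-over bytes so a marker that straddles two
            # blocks is still found.
            window = f.read(pos - start) + tail
            idx = window.rfind(marker)
            if idx != -1:
                new_size = start + idx + len(marker)
                f.truncate(new_size)
                return new_size
            tail = window[:len(marker) - 1]
            pos = start
    return None
```

Only the end of the file is read, so the cost depends on how far back the last complete marker sits rather than on the total dump size.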
Federico Leva
9168a66a54 logerror() wants unicode, but readTitles etc. give bytes
Fixes #239.
2015-03-30 00:51:53 +02:00
Federico Leva
632b99ea53 Merge branch '2015/iterators' of https://github.com/nemobis/wikiteam into nemobis-2015/iterators 2015-03-30 00:48:32 +02:00
PiRSquared17
109528384b Save and upload logos in uploader.py 2015-03-29 21:30:15 +00:00
nemobis
4e57430605 Merge pull request #238 from WikiTeam/PiRSquared17-patch-1
Set verbose=True for upload
2015-03-29 21:01:22 +02:00
PiRSquared17
cb005516b2 Set verbose=True for upload
This makes it show progress.
2015-03-29 18:50:50 +00:00
nemobis
4fce244d4a Merge pull request #237 from WikiTeam/uploader-ia-wrapper
Port uploader.py to use internetarchive package
2015-03-29 20:42:13 +02:00
PiRSquared17
29ee59c925 Add internetarchive requirement
Add internetarchive
2015-03-29 18:33:56 +00:00
PiRSquared17
905511f996 Port uploader.py to use internetarchive package
Remove curl stuff and replace with internetarchive pip package (or https://github.com/jjjake/ia-wrapper) API
2015-03-29 18:30:42 +00:00
nemobis
ff2cdfa1cd Merge pull request #236 from PiRSquared17/fix-server-check-api
Catch KeyError to fix server check
2015-03-29 13:53:26 +02:00
nemobis
0b25951ab1 Merge pull request #224 from nemobis/2015/issue26
Issue #26: Local "Special" namespace, actually limit replies
2015-03-29 13:53:20 +02:00
PiRSquared17
03db166718 Catch KeyError to fix server check 2015-03-29 04:14:43 +01:00
nemobis
213687011e Merge pull request #235 from PiRSquared17/truncate-file-utf8
Make filename truncation work with UTF-8
2015-03-28 21:55:56 +01:00
PiRSquared17
f80ad39df0 Make filename truncation work with UTF-8 2015-03-28 15:17:06 +00:00
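One common way to truncate a filename to a byte limit without splitting a multi-byte UTF-8 character is sketched below in Python 3. It is an assumption about the general technique, not the change made in this commit, and the name `truncate_utf8` and the `max_bytes` parameter are hypothetical.

```python
def truncate_utf8(name, max_bytes):
    """Shorten `name` so its UTF-8 encoding fits in `max_bytes` bytes.

    Hypothetical helper for illustration only.
    """
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name
    # Slicing bytes may cut a character in half; errors="ignore" drops the
    # incomplete trailing sequence instead of raising UnicodeDecodeError.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

The result can be shorter than `max_bytes` by a few bytes, since a partially cut character is dropped whole.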
PiRSquared17
90bfd1400e Merge pull request #229 from PiRSquared17/fix-zwnbsp-bom
Strip ZWNBSP (U+FEFF) Byte-Order Mark from JSON/XML
2015-03-24 21:46:33 +00:00
Marek Šuppa
5fbeda982f Merge pull request #233 from PiRSquared17/allow-single-test
Allow a single test to be run (see PR)
2015-03-24 22:39:23 +01:00
PiRSquared17
b80159e257 Allow a single test to be run (see PR) 2015-03-24 21:08:43 +00:00
PiRSquared17
7c80d37e04 Add test for BOM encoding 2015-03-24 21:04:29 +00:00
nemobis
d31709338d Merge pull request #231 from PiRSquared17/ignore-leading-spaces
Allow spaces before <mediawiki> tag.
2015-03-24 21:07:24 +01:00
PiRSquared17
ba48c43d34 Merge pull request #232 from PiRSquared17/remove-test-kwiki
Comment out broken test case wiki
2015-03-24 03:50:15 +00:00
PiRSquared17
d89b99bd7c Comment out broken test case wiki 2015-03-24 03:47:45 +00:00
PiRSquared17
fc276d525f Allow spaces before <mediawiki> tag. 2015-03-24 03:44:03 +00:00
PiRSquared17
1c820dafb7 Strip ZWNBSP (U+FEFF) Byte-Order Mark from JSON/XML 2015-03-24 01:58:01 +00:00
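A minimal Python 3 sketch of stripping a leading U+FEFF before parsing, as this commit title describes. The function names `strip_bom` and `load_json` are hypothetical, and the actual change in the repository may differ.

```python
import json

BOM = "\ufeff"  # ZWNBSP, used as a byte-order mark at the start of a text


def strip_bom(text):
    """Remove a single leading U+FEFF from decoded text, if present."""
    return text[len(BOM):] if text.startswith(BOM) else text


def load_json(text):
    """Parse JSON text that may start with a byte-order mark."""
    return json.loads(strip_bom(text))
```

In Python 3, `json.loads` rejects a `str` that starts with U+FEFF, so removing it first lets a response with a stray BOM parse normally.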
nemobis
711a88df59 Merge pull request #226 from nemobis/master
Make dumpgenerator.py 774: required by launcher.py
2015-03-14 10:44:41 +01:00
Nemo bis
55e5888a00 Fix UnicodeDecodeError in resume: use kitchen 2015-03-10 22:26:23 +02:00
Federico Leva
14ce5f2c1b Resume and list titles without keeping everything in memory
Approach suggested by @makoshark, finally found the time to start
implementing it.
* Do not produce and save the titles list all at once. Instead, use
  the scraper and API as generators and save titles on the go. Also,
  try to start the generator from the appropriate title.
  For now the title sorting is not implemented. Pages will be in the
  order given by namespace ID, then page name.
* When resuming, read both the title list and the XML file from the
  end rather than the beginning. If the correct terminator is
  present, only one line needs to be read.
* In both cases, use a generator instead of a huge list in memory.
* Also truncate the resumed XML without writing it from scratch.
  For now using GNU ed: very compact, though shelling out is ugly.
  I gave up on using file.seek and file.truncate to avoid reading the
  whole file from the beginning or complicating reverse_readline()
  with more offset calculations.

This should avoid MemoryError in most cases.

Tested by running a dump over a 1.24 wiki with 11 pages: a complete
dump and a resumed dump from a dump interrupted with ctrl-c.
2015-03-10 11:21:44 +01:00
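Reading a file from the end with a generator, as the second and third bullets of this commit describe, might look roughly like this in Python 3. It is a sketch, not the repository's `reverse_readline()`: the signature and block size are assumptions.

```python
import os


def reverse_readline(path, block_size=4096):
    """Yield the lines of a file from last to first, as bytes without "\n".

    Illustrative sketch; only one block is held in memory at a time.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        if pos == 0:
            return  # empty file: no lines
        pending = b""  # partial line carried over from the later block
        first = True
        while pos > 0:
            start = max(0, pos - block_size)
            f.seek(start)
            chunk = f.read(pos - start) + pending
            if first:
                first = False
                # Ignore the newline that terminates the final line.
                if chunk.endswith(b"\n"):
                    chunk = chunk[:-1]
            lines = chunk.split(b"\n")
            pending = lines[0]  # may be incomplete until we read further back
            for line in reversed(lines[1:]):
                yield line
            pos = start
        yield pending
```

A caller that only needs to check for a terminator can take the first yielded line with `next(reverse_readline(path))` and stop, which matches the commit's remark that only one line needs to be read when the terminator is present.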
Federico Leva
2537e9852e Make dumpgenerator.py 774: required by launcher.py 2015-03-08 21:33:39 +01:00
nemobis
4b81fa00f1 Merge pull request #225 from nemobis/master
Fix API check if only index is passed
2015-03-08 20:53:51 +01:00
Federico Leva
79e2c5951f Fix API check if only index is passed
I forgot that the preceding point only extracts the api.php URL if
the "wiki" argument is passed to say it's a MediaWiki wiki (!).
2015-03-08 20:52:24 +01:00
Federico Leva
bdc7c9bf06 Issue 26: Local "Special" namespace, actually limit replies
* For some reason, in a previous commit I had noticed that maxretries
  was not respected in getXMLPageCore, but I didn't fix it. Done now.
* If the "Special" namespace alias doesn't work, fetch the local one.
2015-03-08 19:30:09 +01:00
Federico Leva
c1a5e3e0ca Merge branch 'PiRSquared17-follow-redirects-api' 2015-03-08 16:31:32 +01:00
Federico Leva
2f25e6b787 Make checkAPI() more readable and verbose
Also return the api URL we found.
2015-03-08 16:01:46 +01:00
Federico Leva
48ad3775fd Merge branch 'follow-redirects-api' of git://github.com/PiRSquared17/wikiteam into PiRSquared17-follow-redirects-api 2015-03-08 14:35:30 +01:00
nemobis
2284e3d55e Merge pull request #186 from PiRSquared17/update-headers
Preserve default headers, fixing openwrt test
2015-03-08 13:56:00 +01:00
PiRSquared17
5d23cb62f4 Merge pull request #219 from vadp/dir-fnames-unicode
convert images directory content to unicode when resuming download
2015-03-04 23:36:59 +00:00