Commit Graph

1106 Commits (d7153f4c60209c2fabb776fcb750e3534bc179d3)
 

Author SHA1 Message Date
yzqzss d7153f4c60
Update uploader.py 1 year ago
nemobis 0c4c54dc9e
Merge pull request #449 from yzqzss/patch-2
make `requests.session` to use `--retries` value
1 year ago
yzqzss 90a64c6a22
make `requests.session` to use `--retries` value
(default=5)
1 year ago
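As a rough illustration of the change above, a `requests` session can be given a retry count through a mounted adapter. This is a minimal sketch and not the project's actual code: the helper name `make_session` is hypothetical, and only the default of 5 comes from the commit message.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries=5):
    """Build a session whose HTTP(S) requests retry up to `retries` times.

    Hypothetical helper; the default of 5 mirrors the commit message.
    """
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=Retry(total=retries))
    # Mount the adapter for both schemes so every request uses the retry policy.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = make_session(retries=5)
```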
nemobis 9d614cf8ad
Merge pull request #444 from Pokechu22/wiki-engine-session
Use the same requests session for getting the wiki engine and checking API/index
2 years ago
Pokechu22 97146c6f01 Use the same requests session for getting the wiki engine and checking API/index 2 years ago
Pokechu22 6668999658 Update User-Agent to latest Firefox 2 years ago
nemobis ea5e130517
Merge pull request #442 from Pokechu22/missing-image-description
Fix crash when the image description is missing for an image containing non-ascii characters
2 years ago
Pokechu22 cad7260d7c Fix crash when the image description is missing for an image containing non-ascii characters
`title` is already unicode, so we shouldn't need to decode it (and don't in `generateXMLDump`).
2 years ago
nemobis 25329be008
Merge pull request #441 from Pokechu22/mwclient-session
Pass requests session to mwclient
2 years ago
Pokechu22 5b3fc4ac7b Pass requests session to mwclient
This means it uses our configured user-agent, as well as any cookies.
2 years ago
nemobis 52fe2d89a6
Merge pull request #440 from Pokechu22/xmlrevisions-skip-empty-revision
Skip empty revisions when using --xmlrevisions
2 years ago
Pokechu22 1af69ca147 Skip empty revisions when using --xmlrevisions
Before, the download would die, and need to be resumed from the start.
2 years ago
nemobis 5d83703d50
Merge pull request #438 from Pokechu22/getXMLHeader-session
Use `session.get` instead of `requests.get` in `getXMLHeader`
2 years ago
Pokechu22 4a2cbd4843 Use `session.get` instead of `requests.get` in `getXMLHeader`
`session.get` uses our configured User-Agent, while `requests.get` uses the default one.
2 years ago
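The difference described above can be seen without sending any traffic. This is a minimal sketch, assuming a placeholder User-Agent and URL (neither is taken from the project): a request prepared through a session inherits the session's headers, while module-level calls such as `requests.get` fall back to the library default.

```python
import requests

# Placeholder value; the project's real User-Agent string is not shown here.
USER_AGENT = "ExampleBot/0.1"

session = requests.Session()
session.headers.update({"User-Agent": USER_AGENT})

# Requests prepared through the session inherit its headers.
prepared = session.prepare_request(
    requests.Request("GET", "https://example.org/w/api.php")
)
session_ua = prepared.headers["User-Agent"]

# Module-level calls such as requests.get use the library default instead.
default_ua = requests.utils.default_user_agent()
```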
nemobis 9808279a6a
Merge pull request #436 from Pokechu22/unicode-resume
Work around unicode titles not working with resuming and fix truncation when resuming
2 years ago
Pokechu22 9b2c6e40ae Fix truncation when resuming
There already was code that looks like it was supposed to truncate files, but it calculated the index wrong and didn't properly check all lines. It worked out, though, because it didn't actually call the truncate function.

Now, truncation occurs to the last `</page>` tag. If the XML file ends with a `</page>` tag, then nothing gets truncated. The page is added after that; if nothing was truncated, this will result in the same page being listed twice (which already happened with the missing truncation), but if truncation did happen then the file should no longer be invalid.
2 years ago
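The truncation rule described in the commit above can be sketched as follows. This is an illustration only, not the code from the commit: the function name `truncate_after_last_page` is hypothetical, it works on an in-memory string rather than a file, and leaving the text unchanged when no `</page>` tag is found is an assumption.

```python
def truncate_after_last_page(xml_text):
    """Cut xml_text just after its last </page> closing tag.

    Hypothetical helper illustrating the rule from the commit message.
    """
    marker = "</page>"
    idx = xml_text.rfind(marker)
    if idx == -1:
        # Assumption: with no complete page, leave the text untouched.
        return xml_text
    # Keep everything up to and including the last closing tag.
    return xml_text[: idx + len(marker)] + "\n"


partial = "<page>a</page>\n<page>b</page>\n<page>c"
complete = "<page>a</page>\n<page>b</page>\n"
```

With `partial`, the dangling `<page>c` fragment is dropped; with `complete`, which already ends with a `</page>` tag, nothing is removed.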
Pokechu22 43945c467f Work around unicode titles not working with resuming
Before, you would get `UnicodeWarning: Unicode unequal comparison failed to convert both arguments to Unicode - interpreting them as being unequal`. The `%s` versus `{}` change was needed because otherwise you would get `UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)`. There is probably a better way of solving that, but this one does work.
2 years ago
Federico Leva e33f14fce6 Support GiveUpGitHub 2 years ago
nemobis 269841c909
Merge pull request #431 from simonliu99/updatelists
Update mediawiki wikifarm lists
2 years ago
Liu d9885e0845 Update shoutwiki-spider to remove duplicates 2 years ago
Liu fcc4080b23 Update neoseeker.com.info instructions 2 years ago
Liu e7f7266550 Update fandom.com spider and remove duplicates 2 years ago
Liu 9c5c55342d Update miraheze.org spider and remove duplicates 2 years ago
Liu a96e482fd3 Add .DS_Store to .gitignore 2 years ago
Liu 4c970e358d Remove duplicates from wiki-site.com 2 years ago
Liu 74a8e9609f Update wiki-site.com spider and list 2 years ago
Liu ba7fab2e96 Add fandom-spider and update metadata and lists 2 years ago
Liu 49e41ee75d Update neoseeker.com spider and list 2 years ago
Liu 6346fd6553 Update shoutwiki.com spider and list 2 years ago
Liu f93988e9c6 Update fandom.com to HTTPS 2 years ago
Liu 91faa34529 Update shoutwiki.com list 2 years ago
Liu d6fe1d9ff8 Update battlestarwiki.org list 2 years ago
Federico Leva cd6d40d5ac Merge branch 'simonliu99-updatelists2' 2 years ago
Liu 6f8f160d75 Update fandom.com list 2 years ago
Liu 6b39402ebf Update miraheze.org list 2 years ago
Liu f755153de9 Update neoseeker.com list 2 years ago
Federico Leva 10ee80ca3b Rename wikia list to fandom 2 years ago
nemobis 054397aecb
Merge pull request #425 from simonliu99/append_date
Display appended-date IA URL if appended
2 years ago
Liu 3638e6992f Simplify tracking item identifier 2 years ago
Liu 94eb932b3f Display appended-date IA URL if appended 2 years ago
nemobis 1f911c0142
Merge pull request #424 from simonliu99/master
Add dump date to item identifier
2 years ago
Liu d947d7571a Allow date append only if not admin 2 years ago
Liu 9f9df1e0aa Update logic to only append date if identifier without date exists 2 years ago
Liu 6c137764cb Move identifier-date option behind flag 2 years ago
Liu 990d5dfb4f Add dump date to item identifier 2 years ago
nemobis d7b6924845
Merge pull request #408 from shreyasminocha/fix-resume-images
Fix image resuming
2 years ago
nemobis 7f1f9985f6
Merge pull request #419 from timgates42/bugfix_typos
docs: Fix a few typos
2 years ago
Tim Gates ecbcc6118e
docs: Fix a few typos
There are small typos in:
- dumpgenerator.py
- wikiteam/mediawiki.py

Fixes:
- Should read `inconsistencies` rather than `inconsistences`.
- Should read `partially` rather than `partialy`.
2 years ago
Shreyas Minocha e55de36cb7
Fix image resuming 3 years ago
nemobis 0cfde9e9d1
Merge pull request #394 from nsapa/nico_fix_1
Nico's fixes (dumping wiki.dystify.com/CI fixes)
4 years ago