mirror of https://github.com/WikiTeam/wikiteam synced 2024-11-15 00:15:00 +00:00
1126 Commits

Author SHA1 Message Date
nemobis
1e363f450f
Merge pull request #464 from saveweb/xml-format-py2
Adjust XML format
2023-05-29 23:48:50 +03:00
nemobis
c7150784c1
Merge pull request #452 from yzqzss/patch-4
Update dumpgenerator.py
2023-05-29 09:33:17 +03:00
nemobis
a977dc1a8b
Merge pull request #439 from Pokechu22/page-title-scraper-fix
Fix infinite loop on page title scraper
2023-05-29 09:31:08 +03:00
nemobis
c56cbf1c12
Merge pull request #453 from yzqzss/patch-5
Speed up file scanning in `images/` dir
2023-05-29 09:29:01 +03:00
nemobis
674381c27c
Merge pull request #448 from yzqzss/patch-1
Match single quotes too when scraping namespaces
2023-05-29 09:22:51 +03:00
nemobis
8167987052
Merge pull request #451 from yzqzss/patch-2
Quote `title` to get correct file description
2023-05-29 09:22:32 +03:00
yzqzss
e979adfbeb remove empty <comment> if no comment provided 2023-05-29 10:49:56 +08:00
yzqzss
522807d25d fix: incorrect xml space attr in <text> 2023-05-29 10:48:05 +08:00
nemobis
0621adf0a3
Merge pull request #463 from Pokechu22/broken-http_method-fallback
Fix broken http_method fallback
2023-05-07 21:51:23 +03:00
Pokechu22
aac816e315 Fix broken http_method fallback
This was probably a copy/paste typo. I don't remember if I ever ran into this in practice but it is something I noticed in the past and never submitted a fix for.
2023-05-07 11:29:37 -07:00
nemobis
e339927cc3
Merge pull request #462 from Pokechu22/fix-prop-revisions
Fix exporting via prop=revisions
2023-05-07 20:25:05 +03:00
Pokechu22
df230a96c9 Fix exporting via prop=revisions 2023-05-07 08:39:44 -07:00
emijrp
dd0f4a4593
350,000 2023-02-26 19:31:18 +01:00
nemobis
ef03cff447
Merge pull request #454 from yzqzss/patch-6
Fix a small syntax error in uploader.py
2023-02-19 16:17:44 +02:00
yzqzss
d7153f4c60
Update uploader.py 2023-02-19 01:33:25 +08:00
yzqzss
392fbce083
speed up file scanning
use `set` instead of `list` to speed up the scanning of large numbers of files (>10000) in `images/`.
2023-01-19 21:18:59 +08:00
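
Note on 392fbce083: the speed-up described above comes from replacing repeated `list` membership tests with `set` lookups. The following is a minimal sketch of that idea only, not the code from `dumpgenerator.py`; the function name `find_missing` and its arguments are hypothetical, and it is written for Python 3.

```python
def find_missing(expected_names, existing_names):
    """Return the expected file names that are not yet present on disk.

    `existing_names` stands in for a directory listing of `images/`
    (hypothetical input; in practice it might come from os.listdir).
    """
    # Building the set once costs O(n); each later lookup is O(1) on average,
    # whereas `name in some_list` scans the list and costs O(n) per lookup.
    existing = set(existing_names)
    return [name for name in expected_names if name not in existing]
```

With tens of thousands of files, the list version does on the order of n × m comparisons, while the set version does roughly n + m operations.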
yzqzss
940d50bbac
Update dumpgenerator.py
fix typo
2023-01-16 02:55:22 +08:00
yzqzss
ebac66f557
Update dumpgenerator.py 2023-01-14 23:50:36 +08:00
yzqzss
0be46c7427
quote title 2023-01-14 22:42:11 +08:00
nemobis
0c4c54dc9e
Merge pull request #449 from yzqzss/patch-2
make `requests.session` to use `--retries` value
2023-01-13 17:52:14 +02:00
yzqzss
90a64c6a22
make requests.session to use --retries value
(default=5)
2023-01-13 23:36:32 +08:00
yzqzss
331f8e122b
update regex to match ' and " in <option> tag
the new versions of MediaWiki use `'`, older use `"`.
2023-01-06 00:24:52 +08:00
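
Note on 331f8e122b: a pattern that accepts either quote character around the `value` attribute could look like the sketch below. This is an illustration under assumptions, not the regex used in the project: the names `OPTION_RE` and `parse_namespaces` are hypothetical, the sample markup is simplified, and the code targets Python 3.

```python
import re

# Accept value="0" (older MediaWiki) as well as value='0' (newer MediaWiki).
# The character class ["'] matches either quote; this simple form does not
# check that the opening and closing quotes are the same character.
OPTION_RE = re.compile(
    r"""<option [^>]*?value=["'](?P<id>\d+)["'][^>]*?>(?P<name>[^<]+)</option>"""
)


def parse_namespaces(html):
    """Return (namespace id, namespace name) pairs found in <option> tags."""
    return [(int(m.group("id")), m.group("name")) for m in OPTION_RE.finditer(html)]
```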
nemobis
9d614cf8ad
Merge pull request #444 from Pokechu22/wiki-engine-session
Use the same requests session for getting the wiki engine and checking API/index
2022-11-27 14:48:01 +02:00
Pokechu22
97146c6f01 Use the same requests session for getting the wiki engine and checking API/index 2022-11-26 21:32:08 -08:00
Pokechu22
6668999658 Update User-Agent to latest Firefox 2022-11-26 21:32:08 -08:00
nemobis
ea5e130517
Merge pull request #442 from Pokechu22/missing-image-description
Fix crash when the image description is missing for an image containing non-ascii characters
2022-11-07 23:36:51 +02:00
Pokechu22
cad7260d7c Fix crash when the image description is missing for an image containing non-ascii characters
title is already unicode, so we shouldn't need to decode it (and don't in generateXMLDump).
2022-10-23 16:29:30 -07:00
nemobis
25329be008
Merge pull request #441 from Pokechu22/mwclient-session
Pass requests session to mwclient
2022-10-23 09:11:35 +03:00
Pokechu22
5b3fc4ac7b Pass requests session to mwclient
This means it uses our configured user-agent, as well as any cookies.
2022-10-22 21:06:10 -07:00
nemobis
52fe2d89a6
Merge pull request #440 from Pokechu22/xmlrevisions-skip-empty-revision
Skip empty revisions when using --xmlrevisions
2022-10-22 10:51:37 +03:00
Pokechu22
1af69ca147 Skip empty revisions when using --xmlrevisions
Before, the download would die, and need to be resumed from the start.
2022-10-21 19:57:31 -07:00
Pokechu22
a1bd3b0851 Fix infinite loop on page title scraper 2022-10-13 11:09:00 -07:00
nemobis
5d83703d50
Merge pull request #438 from Pokechu22/getXMLHeader-session
Use `session.get` instead of `requests.get` in `getXMLHeader`
2022-09-18 16:55:02 +03:00
Pokechu22
4a2cbd4843 Use session.get instead of requests.get in getXMLHeader
`session.get` uses our configured User-Agent, while `requests.get` uses the default one.
2022-09-17 14:30:51 -07:00
nemobis
9808279a6a
Merge pull request #436 from Pokechu22/unicode-resume
Work around unicode titles not working with resuming and fix truncation when resuming
2022-09-17 08:39:22 +03:00
Pokechu22
9b2c6e40ae Fix truncation when resuming
There already was code that looks like it was supposed to truncate files, but it calculated the index wrong and didn't properly check all lines. It worked out, though, because it didn't actually call the truncate function.

Now, truncation occurs to the last `</page>` tag. If the XML file ends with a `</page>` tag, then nothing gets truncated. The page is added after that; if nothing was truncated, this will result in the same page being listed twice (which already happened with the missing truncation), but if truncation did happen then the file should no longer be invalid.
2022-09-16 22:20:27 -07:00
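
Note on 9b2c6e40ae: the truncation rule described above (cut everything after the last `</page>` tag, and leave the file alone if it already ends with one) can be sketched on an in-memory string as follows. This is not the project's implementation; `truncate_after_last_page` is a hypothetical name, the behaviour when no `</page>` tag exists is an assumption, and the code targets Python 3.

```python
def truncate_after_last_page(xml_text):
    """Drop any partial content that follows the last closing </page> tag."""
    marker = "</page>"
    idx = xml_text.rfind(marker)
    if idx == -1:
        # Assumption: with no complete page there is nothing safe to cut to,
        # so the text is returned unchanged.
        return xml_text
    # Keep everything up to and including the last </page>, then a newline.
    return xml_text[: idx + len(marker)] + "\n"
```

A dump interrupted in the middle of a page is cut back to the last complete page, so content appended on resume does not follow a half-written `<page>` element.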
Pokechu22
43945c467f Work around unicode titles not working with resuming
Before, you would get UnicodeWarning: Unicode unequal comparison failed to convert both arguments to Unicode - interpreting them as being unequal. The %s versus {} change was needed because otherwise you would get UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128). There is probably a better way of solving that, but this one does work.
2022-09-16 22:15:55 -07:00
Federico Leva
e33f14fce6 Support GiveUpGitHub 2022-07-01 02:35:17 +03:00
nemobis
269841c909
Merge pull request #431 from simonliu99/updatelists
Update mediawiki wikifarm lists
2022-04-14 08:08:52 +03:00
Liu
d9885e0845 Update shoutwiki-spider to remove duplicates 2022-04-12 21:18:40 -04:00
Liu
fcc4080b23 Update neoseeker.com.info instructions 2022-04-12 21:16:33 -04:00
Liu
e7f7266550 Update fandom.com spider and remove duplicates 2022-04-12 21:12:39 -04:00
Liu
9c5c55342d Update miraheze.org spider and remove duplicates 2022-04-12 20:18:03 -04:00
Liu
a96e482fd3 Add .DS_Store to .gitignore 2022-04-12 19:19:32 -04:00
Liu
4c970e358d Remove duplicates from wiki-site.com 2022-04-12 17:43:07 -04:00
Liu
74a8e9609f Update wiki-site.com spider and list 2022-04-12 17:40:44 -04:00
Liu
ba7fab2e96 Add fandom-spider and update metadata and lists 2022-04-12 17:05:30 -04:00
Liu
49e41ee75d Update neoseeker.com spider and list 2022-04-12 14:15:01 -04:00
Liu
6346fd6553 Update shoutwiki.com spider and list 2022-04-12 14:02:29 -04:00
Liu
f93988e9c6 Update fandom.com to HTTPS 2022-04-12 13:22:37 -04:00