Federico Leva
3d04dcbf5c
Use GET rather than POST for API requests
...
* It was just an old trick to get past some barriers which were waived with GET.
* It's not conformant and doesn't play well with some redirects.
* Some recent wikis seem to not like it at all, see also issue #311.
2020-02-08 12:18:03 +02:00
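A minimal sketch of what commit 3d04dcbf5c describes, assuming a plain MediaWiki `api.php` endpoint: the parameters travel in the query string of a GET request instead of a POST body. The helper name `build_api_request` and the example URL are hypothetical, and the sketch uses the Python 3 standard library rather than the project's own HTTP code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_api_request(api_url, params):
    """Return a GET request with the parameters encoded in the query string."""
    # Sorting only makes the resulting URL deterministic for this example.
    query = urlencode(sorted(params.items()))
    return Request(api_url + "?" + query, method="GET")


req = build_api_request(
    "https://wiki.example.org/api.php",  # hypothetical endpoint
    {"action": "query", "meta": "siteinfo", "format": "json"},
)
print(req.get_method(), req.full_url)
```

Because nothing is sent in the request body, a redirect can be followed without having to resend any payload, which is presumably part of why GET behaves better here.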
nemobis
0eeb6bfcb0
Upload all relevant wikidump.7z and history.xml.7z
...
Don't stop at the first 7z file found in the directory listing.
Should be fast enough for most users.
Fixes #326
2020-02-07 14:17:14 +02:00
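The selection rule from commit 0eeb6bfcb0 can be sketched as a filter over the whole directory listing rather than a search that returns at the first match. The function name `select_dump_files`, the exact suffixes, and the file names are assumptions for illustration, not the uploader's actual code.

```python
def select_dump_files(listing):
    """Return every wikidump.7z and history.xml.7z archive in the listing."""
    suffixes = ("-wikidump.7z", "-history.xml.7z")
    # Scan the full listing; do not stop at the first 7z file found.
    return sorted(name for name in listing if name.endswith(suffixes))


listing = [
    "examplewiki-20200207-wikidump.7z",      # hypothetical file names
    "examplewiki-20200207-history.xml.7z",
    "examplewiki-20200207-wikidump",
    "notes.txt",
]
selected = select_dump_files(listing)
print(selected)
```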
emijrp
527401560c
2020
2020-01-27 10:42:46 +01:00
emijrp
7b03096ace
update wikidot list
2019-12-04 19:23:26 +01:00
emijrp
714c9ea1f7
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
2019-12-04 10:18:45 +01:00
emijrp
6aac36ce57
wikidot wiki list
2019-12-04 10:17:18 +01:00
emijrp
61b0b1b80b
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
2019-12-03 14:09:12 +01:00
emijrp
0cd4efb51c
better spider for wikidot
2019-12-03 14:08:57 +01:00
emijrp
f6c57d59e7
.
2019-12-03 13:55:09 +01:00
emijrp
5fd980c6b7
delay 1 second
2019-12-03 13:52:52 +01:00
emijrp
aecee2dc53
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
2019-12-03 13:33:14 +01:00
emijrp
33a93fd76a
delay 1 second
2019-12-03 13:33:06 +01:00
emijrp
966df37c54
new url https://www.archiveteam.org/
2019-11-30 00:55:36 +01:00
emijrp
d43d017075
Update README.md
2019-11-29 17:21:30 +01:00
Emilio
080b723334
Update wikiapiary-update-ia-params.py
2019-06-11 10:25:49 +02:00
nemobis
be0dcd8e55
Merge pull request #337 from zerote000/master
...
Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL.
2019-04-21 09:59:31 +03:00
Christoffer Popp Nørskov
83f72db6cd
Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL.
2019-04-20 22:42:09 +02:00
Emilio
287b8b88a3
250,000 wikis
2019-03-03 17:06:31 +01:00
emijrp
ffb39afd1e
800 wikidot sites
2018-07-21 09:57:07 +02:00
emijrp
28158f9b04
wikis
2018-07-20 21:22:54 +02:00
emijrp
7c72c27f2a
wikidot
2018-07-20 16:33:00 +02:00
emijrp
4e8c92b6d2
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
2018-07-13 14:28:57 +02:00
emijrp
0ebf86caf6
update, 1.8M users, 400K wikis
2018-07-13 14:28:44 +02:00
nemobis
bee34f4b1b
Merge pull request #319 from TyIsI/patch-1
...
Updated with vancouver.hackspace.ca -> vanhack.ca domain change
2018-06-21 07:15:58 +03:00
TyIsI
09fac2aeeb
Updated with vancouver.hackspace.ca domain change
2018-06-20 18:23:58 -07:00
emijrp
5aac17ea03
update
2018-06-20 13:03:30 +02:00
emijrp
72b67c74f1
randomize saving
2018-06-20 13:01:01 +02:00
emijrp
ca672426bb
quotes issues in titles
2018-05-31 20:44:02 +02:00
emijrp
a69f44caab
ignore expired wikis
2018-05-28 22:12:15 +02:00
emijrp
a359984932
++
2018-05-26 11:25:53 +02:00
emijrp
5525a3cc4a
++
2018-05-26 10:03:53 +02:00
emijrp
3361e4d09f
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
2018-05-25 23:04:50 +02:00
emijrp
94ebe5e1a3
skipping deactivated wikispaces
2018-05-25 23:04:38 +02:00
Federico Leva
83af47d6c0
Catch and raise PageMissingError when query() returns no pages
2018-05-25 11:00:32 +03:00
Federico Leva
73902d39c0
For old MediaWiki releases, use rawcontinue and wikitools query()
...
Otherwise the query continuation may fail and only the top revisions
will be exported. Tested with Wikia:
http://clubpenguin.wikia.com/api.php?action=query&prop=revisions&titles=Club_Penguin_Wiki
Also add parentid since it's available after all.
https://github.com/WikiTeam/wikiteam/issues/311#issuecomment-391957783
2018-05-25 10:55:44 +03:00
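The continuation problem in commit 73902d39c0 can be illustrated with a small loop. With raw continuation, the MediaWiki API returns a `query-continue` block whose values have to be merged into the next request; if that is skipped, only the first batch of revisions is exported. This sketch does not use wikitools: `iter_revisions` and `fake_fetch` are hypothetical stand-ins, and the canned responses only mimic the shape of real API output.

```python
def iter_revisions(fetch, title):
    """Yield all revisions of a page, following raw query continuation."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp",
        "rawcontinue": "",  # ask for the old-style "query-continue" block
    }
    while True:
        result = fetch(dict(params))
        for page in result["query"]["pages"].values():
            for rev in page.get("revisions", []):
                yield rev
        cont = result.get("query-continue", {}).get("revisions")
        if not cont:
            break
        params.update(cont)  # e.g. {"rvcontinue": ...}


def fake_fetch(params):
    """Hypothetical stand-in for an HTTP call; serves two canned batches."""
    if "rvcontinue" not in params:
        return {
            "query": {"pages": {"1": {"revisions": [{"revid": 3, "parentid": 2}]}}},
            "query-continue": {"revisions": {"rvcontinue": 2}},
        }
    return {"query": {"pages": {"1": {"revisions": [{"revid": 2, "parentid": 0}]}}}}


revids = [rev["revid"] for rev in iter_revisions(fake_fetch, "Main Page")]
print(revids)
```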
emijrp
d11df60516
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
2018-05-24 13:28:22 +02:00
emijrp
de7822cd37
duckduckgo parser; remove .zip after upload
2018-05-24 13:28:12 +02:00
Federico Leva
bf4781eeea
Merge branch 'master' of github.com:WikiTeam/wikiteam
2018-05-23 18:33:34 +03:00
Federico Leva
da64349a5d
Avoid UnboundLocalError: local variable 'reply' referenced before assignment
2018-05-23 18:32:38 +03:00
emijrp
273f1b33cb
Merge branch 'master' of https://github.com/WikiTeam/wikiteam
2018-05-23 14:26:07 +02:00
emijrp
70eefcc945
skipping deleted wikis
2018-05-23 14:25:51 +02:00
Federico Leva
3b74173e0f
launcher.py style and minor changes
2018-05-22 21:44:18 +03:00
Federico Leva
6fbde766c4
Further reduce os.walk() in launcher.py to speed up
2018-05-22 12:41:02 +03:00
Federico Leva
b7789751fc
UnboundLocalError: local variable 'reply' referenced before assignment
...
Warning!: "./tdicampswikiacom-20180522-wikidump" path exists
Traceback (most recent call last):
  File "./dumpgenerator.py", line 2321, in <module>
    main()
  File "./dumpgenerator.py", line 2283, in main
    while reply.lower() not in ['yes', 'y', 'no', 'n']:
UnboundLocalError: local variable 'reply' referenced before assignment
2018-05-22 10:30:11 +03:00
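The traceback in commit b7789751fc shows a `while` condition reading `reply` before any assignment. A minimal sketch of the usual remedy, assuming the variable simply needs a default before the loop; `ask_resume`, its prompt text, and the injectable `read_answer` parameter are hypothetical and not the code in dumpgenerator.py.

```python
def ask_resume(read_answer=input):
    """Ask until the user answers yes or no; return True for yes."""
    reply = ""  # defined before the loop condition reads it
    while reply.lower() not in ["yes", "y", "no", "n"]:
        reply = read_answer("Resume the existing dump? ([yes, y], [no, n]) ")
    return reply.lower() in ["yes", "y"]


# Canned answers stand in for interactive input in this example.
answers = iter(["maybe", "Y"])
resume = ask_resume(lambda prompt: next(answers))
print(resume)
```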
Federico Leva
d76b4b4e01
Raise and catch PageMissingError when revisions API result is incomplete
...
https://github.com/WikiTeam/wikiteam/issues/317
2018-05-22 10:16:52 +03:00
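One way to picture the raise-and-catch pattern named in commit d76b4b4e01: the function that reads an API result raises when no revisions are present, and the caller records the title and moves on. Only the name `PageMissingError` comes from the log; its constructor, `revisions_for`, and `export_all` are assumptions made for this sketch.

```python
class PageMissingError(Exception):
    """Raised when the API result holds no revisions for a title."""

    def __init__(self, title):
        super().__init__("page missing in API result: %s" % title)
        self.title = title


def revisions_for(title, result):
    """Return the revisions in an API result or raise PageMissingError."""
    pages = result.get("query", {}).get("pages", {})
    revisions = [rev for page in pages.values()
                 for rev in page.get("revisions", [])]
    if not revisions:
        raise PageMissingError(title)
    return revisions


def export_all(results):
    """Collect revisions per title; note incomplete results instead of crashing."""
    exported, missing = {}, []
    for title, result in results.items():
        try:
            exported[title] = revisions_for(title, result)
        except PageMissingError as error:
            missing.append(error.title)
    return exported, missing


results = {
    "Main Page": {"query": {"pages": {"1": {"revisions": [{"revid": 5}]}}}},
    "Ghost": {"query": {"pages": {"-1": {"missing": ""}}}},
    "Empty": {},
}
exported, missing = export_all(results)
print(sorted(exported), missing)
```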
Federico Leva
7a655f0074
Check for sha1 presence in makeXmlFromPage()
2018-05-22 09:33:53 +03:00
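The check mentioned in commit 7a655f0074 amounts to emitting the `sha1` element only when the API actually returned one. A sketch with `xml.etree.ElementTree`; `revision_element` is a hypothetical helper and is not the body of `makeXmlFromPage()`.

```python
import xml.etree.ElementTree as ET


def revision_element(rev):
    """Build a <revision> element, adding <sha1> only when it is present."""
    node = ET.Element("revision")
    ET.SubElement(node, "id").text = str(rev["revid"])
    if "sha1" in rev:  # some wikis do not return this field
        ET.SubElement(node, "sha1").text = rev["sha1"]
    return node


without_sha1 = ET.tostring(revision_element({"revid": 7}), encoding="unicode")
with_sha1 = ET.tostring(
    revision_element({"revid": 7, "sha1": "abc"}), encoding="unicode"
)
print(without_sha1)
print(with_sha1)
```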
Federico Leva
baae839a38
Complete update of the Wikia lists
...
* Reduce the offset to 100, the new limit for non-bots.
* Continue listing even when we get an empty request because all
  the wikis in a batch have become inactive and are filtered out.
* Print less from curl's requests.
* Automatically write the domain names to the files here.
2018-05-21 23:26:40 +03:00
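The second bullet of commit baae839a38 describes a loop that must not treat an empty batch as the end of the list. A rough sketch under stated assumptions: batches are requested in steps of 100, an upper bound `last_id` is known in advance, and `fetch_batch` is a hypothetical callable returning the active domains of one batch. None of this reflects the real Wikia API or the actual script.

```python
def list_active_domains(fetch_batch, last_id, step=100):
    """Walk all batches, skipping (not stopping at) the empty ones."""
    domains = []
    for offset in range(0, last_id, step):
        batch = fetch_batch(offset, step)
        if not batch:
            # Every wiki in this batch was inactive and filtered out; keep going.
            continue
        domains.extend(batch)
    return domains


# Canned data standing in for real responses (hypothetical domains).
canned = {0: ["a.wikia.com"], 100: [], 200: ["b.wikia.com"]}
domains = list_active_domains(lambda offset, limit: canned[offset], last_id=300)
print(domains)
```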
Federico Leva
4bc41c3aa2
Actually keep track of listed titles and stop when duplicates are returned
...
https://github.com/WikiTeam/wikiteam/issues/309
2018-05-21 16:41:10 +03:00
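The stopping rule from commit 4bc41c3aa2 can be sketched as a set of already listed titles that is consulted on every batch. `collect_titles` and `fetch_batch` are hypothetical; the sketch assumes each call returns a list of titles plus a continuation token, and that a server looping back to earlier pages is the failure being guarded against.

```python
def collect_titles(fetch_batch):
    """Gather page titles until the listing ends or starts repeating itself."""
    seen = set()
    ordered = []
    start = None
    while True:
        batch, start = fetch_batch(start)
        duplicates = False
        for title in batch:
            if title in seen:
                duplicates = True  # the listing has wrapped around
            else:
                seen.add(title)
                ordered.append(title)
        if duplicates or start is None:
            return ordered


# A canned listing that loops back to the beginning after two batches.
pages = {
    None: (["A", "B"], "C"),
    "C": (["C", "D"], "A2"),
    "A2": (["A", "B"], "C"),
}
titles = collect_titles(lambda start: pages[start])
print(titles)
```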
Federico Leva
80288cf49e
Catch allpages and namespaces API without query results
2018-05-21 16:41:00 +03:00
Federico Leva
e47f638a24
Define "check" before running checkAPI()
...
Traceback (most recent call last):
  File "./dumpgenerator.py", line 2294, in <module>
    main()
  File "./dumpgenerator.py", line 2239, in main
    config, other = getParameters(params=params)
  File "./dumpgenerator.py", line 1587, in getParameters
    if api and check:
UnboundLocalError: local variable 'check' referenced before assignment
2018-05-21 15:53:51 +03:00
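The traceback in commit e47f638a24 points at `if api and check:` running on a path where `check` was never assigned. A minimal sketch of the fix named in the subject line, assuming the assignment originally happened only inside a conditional retry loop; `check_api`, `get_parameters`, and the retry logic are hypothetical stand-ins for the project's `checkAPI()` and `getParameters()`.

```python
def check_api(api):
    """Hypothetical stand-in for checkAPI(); no network access here."""
    return api.endswith("api.php")


def get_parameters(api, retries):
    """Pick the API when it checks out, otherwise fall back to index.php."""
    check = None  # defined before any branch that might skip the assignment
    attempt = 0
    while api and attempt < retries and not check:
        check = check_api(api)
        attempt += 1
    if api and check:
        return "api"
    return "index"


mode_no_retries = get_parameters("https://wiki.example.org/api.php", retries=0)
mode_checked = get_parameters("https://wiki.example.org/api.php", retries=1)
print(mode_no_retries, mode_checked)
```

With `retries=0` the loop body never runs, which is the kind of path that would raise `UnboundLocalError` without the initial assignment.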