mirror of https://github.com/WikiTeam/wikiteam synced 2024-11-16 21:27:46 +00:00
Commit Graph

985 Commits

Author SHA1 Message Date
Federico Leva
3d04dcbf5c Use GET rather than POST for API requests
* It was just an old trick to get past some barriers which were waived with GET.
* It's not conformant and doesn't play well with some redirects.
* Some recent wikis seem to not like it at all, see also issue #311.
2020-02-08 12:18:03 +02:00
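
To illustrate the commit above: with GET, the API parameters travel in the URL query string instead of a POST body. The following is a minimal sketch, not the change made in the repository. It assumes Python 3 and the standard library only; the helper name `build_get_url` and the `example.org` address are hypothetical.

```python
from urllib.parse import urlencode


def build_get_url(api_url, params):
    """Return the URL a GET request would use for the given API parameters.

    Hypothetical helper for illustration; it only builds the URL and
    performs no network request.
    """
    # urlencode percent-escapes the values and joins the pairs with '&'.
    return api_url + "?" + urlencode(params)


# Placeholder endpoint and parameters, not taken from the commit.
url = build_get_url("http://example.org/api.php",
                    {"action": "query", "format": "json"})
print(url)
```

A GET request built this way can be followed through redirects by ordinary HTTP clients without re-sending a body, which is one reason it tends to behave better than POST for read-only queries.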
nemobis
0eeb6bfcb0
Upload all relevant wikidump.7z and history.xml.7z
Don't stop at the first 7z file found in the directory listing.
Should be fast enough for most users.

Fixes #326
2020-02-07 14:17:14 +02:00
emijrp
527401560c
2020 2020-01-27 10:42:46 +01:00
emijrp
7b03096ace update wikidot list 2019-12-04 19:23:26 +01:00
emijrp
714c9ea1f7 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 2019-12-04 10:18:45 +01:00
emijrp
6aac36ce57 wikidot wiki list 2019-12-04 10:17:18 +01:00
emijrp
61b0b1b80b Merge branch 'master' of https://github.com/WikiTeam/wikiteam 2019-12-03 14:09:12 +01:00
emijrp
0cd4efb51c better spider for wikidot 2019-12-03 14:08:57 +01:00
emijrp
f6c57d59e7 . 2019-12-03 13:55:09 +01:00
emijrp
5fd980c6b7 delay 1 second 2019-12-03 13:52:52 +01:00
emijrp
aecee2dc53 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 2019-12-03 13:33:14 +01:00
emijrp
33a93fd76a delay 1 second 2019-12-03 13:33:06 +01:00
emijrp
966df37c54
new url https://www.archiveteam.org/ 2019-11-30 00:55:36 +01:00
emijrp
d43d017075
Update README.md 2019-11-29 17:21:30 +01:00
Emilio
080b723334
Update wikiapiary-update-ia-params.py 2019-06-11 10:25:49 +02:00
nemobis
be0dcd8e55
Merge pull request #337 from zerote000/master
Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL.
2019-04-21 09:59:31 +03:00
Christoffer Popp Nørskov
83f72db6cd Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL. 2019-04-20 22:42:09 +02:00
Emilio
287b8b88a3
250,000 wikis 2019-03-03 17:06:31 +01:00
emijrp
ffb39afd1e 800 wikidot sites 2018-07-21 09:57:07 +02:00
emijrp
28158f9b04 wikis 2018-07-20 21:22:54 +02:00
emijrp
7c72c27f2a wikidot 2018-07-20 16:33:00 +02:00
emijrp
4e8c92b6d2 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 2018-07-13 14:28:57 +02:00
emijrp
0ebf86caf6 update, 1.8M users, 400K wikis 2018-07-13 14:28:44 +02:00
nemobis
bee34f4b1b
Merge pull request #319 from TyIsI/patch-1
Updated with vancouver.hackspace.ca -> vanhack.ca domain change
2018-06-21 07:15:58 +03:00
TyIsI
09fac2aeeb Updated with vancouver.hackspace.ca domain change 2018-06-20 18:23:58 -07:00
emijrp
5aac17ea03 update 2018-06-20 13:03:30 +02:00
emijrp
72b67c74f1 randomize saving 2018-06-20 13:01:01 +02:00
emijrp
ca672426bb quotes issues in titles 2018-05-31 20:44:02 +02:00
emijrp
a69f44caab ignore expired wikis 2018-05-28 22:12:15 +02:00
emijrp
a359984932 ++ 2018-05-26 11:25:53 +02:00
emijrp
5525a3cc4a ++ 2018-05-26 10:03:53 +02:00
emijrp
3361e4d09f Merge branch 'master' of https://github.com/WikiTeam/wikiteam 2018-05-25 23:04:50 +02:00
emijrp
94ebe5e1a3 skiping deactivated wikispaces 2018-05-25 23:04:38 +02:00
Federico Leva
83af47d6c0 Catch and raise PageMissingError when query() returns no pages 2018-05-25 11:00:32 +03:00
Federico Leva
73902d39c0 For old MediaWiki releases, use rawcontinue and wikitools query()
Otherwise the query continuation may fail and only the top revisions
will be exported. Tested with Wikia:
http://clubpenguin.wikia.com/api.php?action=query&prop=revisions&titles=Club_Penguin_Wiki

Also add parentid since it's available after all.

https://github.com/WikiTeam/wikiteam/issues/311#issuecomment-391957783
2018-05-25 10:55:44 +03:00
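
The continuation handling described in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in dumpgenerator.py: it assumes the old-style MediaWiki response in which continuation values arrive under a `query-continue` key, grouped per module, and it takes a caller-supplied `fetch` function (hypothetical) instead of calling wikitools or the network.

```python
def iter_raw_continue(fetch, params):
    """Yield successive API responses, following 'query-continue'.

    `fetch` is a hypothetical callable that takes a dict of request
    parameters and returns the decoded JSON response as a dict.
    """
    # Ask for the old-style continuation format explicitly.
    request = dict(params, rawcontinue="")
    while True:
        response = fetch(request)
        yield response
        cont = response.get("query-continue")
        if not cont:
            break  # no continuation data: the result set is complete
        # Each module (e.g. "revisions") contributes its own parameters,
        # which are merged into the next request.
        for module_values in cont.values():
            request.update(module_values)
```

If the loop stopped after the first response, only the first batch of revisions would be seen, which matches the failure mode the commit message describes.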
emijrp
d11df60516 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 2018-05-24 13:28:22 +02:00
emijrp
de7822cd37 duckduckgo parser; remove .zip after upload 2018-05-24 13:28:12 +02:00
Federico Leva
bf4781eeea Merge branch 'master' of github.com:WikiTeam/wikiteam 2018-05-23 18:33:34 +03:00
Federico Leva
da64349a5d Avoid UnboundLocalError: local variable 'reply' referenced before assignment 2018-05-23 18:32:38 +03:00
emijrp
273f1b33cb Merge branch 'master' of https://github.com/WikiTeam/wikiteam 2018-05-23 14:26:07 +02:00
emijrp
70eefcc945 skiping deleted wikis 2018-05-23 14:25:51 +02:00
Federico Leva
3b74173e0f launcher.py style and minor changes 2018-05-22 21:44:18 +03:00
Federico Leva
6fbde766c4 Further reduce os.walk() in launcher.py to speed up 2018-05-22 12:41:02 +03:00
Federico Leva
b7789751fc UnboundLocalError: local variable 'reply' referenced before assignment
Warning!: "./tdicampswikiacom-20180522-wikidump" path exists
Traceback (most recent call last):
  File "./dumpgenerator.py", line 2321, in <module>
    main()
  File "./dumpgenerator.py", line 2283, in main
    while reply.lower() not in ['yes', 'y', 'no', 'n']:
UnboundLocalError: local variable 'reply' referenced before assignment
2018-05-22 10:30:11 +03:00
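
The traceback above is the usual symptom of a loop condition reading a local name that no earlier code path has assigned. One common way to avoid it is to bind the name before the loop. The sketch below is illustrative only and is not the actual fix applied in dumpgenerator.py; the function name `ask_yes_no` and its `read` parameter are hypothetical.

```python
def ask_yes_no(prompt, read=input):
    """Keep asking until the answer is yes/y/no/n; return True for yes.

    `read` defaults to input() and can be replaced for testing.
    """
    # Bind reply before the loop so the condition can always be evaluated.
    reply = ""
    while reply.lower() not in ["yes", "y", "no", "n"]:
        reply = read(prompt)
    return reply.lower() in ["yes", "y"]
```

Passing a stub as `read` makes the prompt testable without a terminal.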
Federico Leva
d76b4b4e01 Raise and catch PageMissingError when revisions API result is incomplete
https://github.com/WikiTeam/wikiteam/issues/317
2018-05-22 10:16:52 +03:00
Federico Leva
7a655f0074 Check for sha1 presence in makeXmlFromPage() 2018-05-22 09:33:53 +03:00
Federico Leva
baae839a38 Complete update of the Wikia lists
* Reduce the offset to 100, the new limit for non-bots.
* Continue listing even when we get an empty request because all
  the wikis in a batch have become inactive and are filtered out.
* Print less from curl's requests.
* Automatically write the domain names to the files here.
2018-05-21 23:26:40 +03:00
Federico Leva
4bc41c3aa2 Actually keep track of listed titles and stop when duplicates are returned
https://github.com/WikiTeam/wikiteam/issues/309
2018-05-21 16:41:10 +03:00
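
The idea in the commit above, remembering which titles were already listed and stopping once the API only repeats them, can be sketched as follows. This is a simplified illustration, not the repository's implementation: the function name `collect_titles` is hypothetical, and the stopping rule (stop when a batch adds nothing new) is an assumption about what "stop when duplicates are returned" means.

```python
def collect_titles(batches):
    """Collect unique titles from an iterable of batches (lists of titles).

    Stops consuming batches as soon as one contributes no unseen title,
    which guards against an API that keeps returning the same page.
    """
    seen = set()
    ordered = []
    for batch in batches:
        added = 0
        for title in batch:
            if title not in seen:
                seen.add(title)
                ordered.append(title)
                added += 1
        if added == 0:
            break  # only duplicates came back; assume the listing looped
    return ordered
```

For example, given the batches `["A", "B"]`, `["B", "C"]`, `["A", "C"]`, the third batch adds nothing new, so listing stops there.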
Federico Leva
80288cf49e Catch allpages and namespaces API without query results 2018-05-21 16:41:00 +03:00
Federico Leva
e47f638a24 Define "check" before running checkAPI()
Traceback (most recent call last):
  File "./dumpgenerator.py", line 2294, in <module>
    main()
  File "./dumpgenerator.py", line 2239, in main
    config, other = getParameters(params=params)
  File "./dumpgenerator.py", line 1587, in getParameters
    if api and check:
UnboundLocalError: local variable 'check' referenced before assignment
2018-05-21 15:53:51 +03:00