1133 Commits (master)
 

Author SHA1 Message Date
emijrp 527401560c 2020 4 years ago
emijrp 7b03096ace update wikidot list 4 years ago
emijrp 714c9ea1f7 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 4 years ago
emijrp 6aac36ce57 wikidot wiki list 4 years ago
emijrp 61b0b1b80b Merge branch 'master' of https://github.com/WikiTeam/wikiteam 5 years ago
emijrp 0cd4efb51c better spider for wikidot 5 years ago
emijrp f6c57d59e7 . 5 years ago
emijrp 5fd980c6b7 delay 1 second 5 years ago
emijrp aecee2dc53 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 5 years ago
emijrp 33a93fd76a delay 1 second 5 years ago
emijrp 966df37c54 new url https://www.archiveteam.org/ 5 years ago
emijrp d43d017075 Update README.md 5 years ago
Emilio 080b723334 Update wikiapiary-update-ia-params.py 5 years ago
nemobis be0dcd8e55 Merge pull request #337 from zerote000/master
Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL.
5 years ago
Christoffer Popp Nørskov 83f72db6cd Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL. 5 years ago
Emilio 287b8b88a3 250,000 wikis 5 years ago
emijrp ffb39afd1e 800 wikidot sites 6 years ago
emijrp 28158f9b04 wikis 6 years ago
emijrp 7c72c27f2a wikidot 6 years ago
emijrp 4e8c92b6d2 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 6 years ago
emijrp 0ebf86caf6 update, 1.8M users, 400K wikis 6 years ago
nemobis bee34f4b1b Merge pull request #319 from TyIsI/patch-1
Updated with vancouver.hackspace.ca -> vanhack.ca domain change
6 years ago
TyIsI 09fac2aeeb Updated with vancouver.hackspace.ca domain change 6 years ago
emijrp 5aac17ea03 update 6 years ago
emijrp 72b67c74f1 randomize saving 6 years ago
emijrp ca672426bb quotes issues in titles 6 years ago
emijrp a69f44caab ignore expired wikis 6 years ago
emijrp a359984932 ++ 6 years ago
emijrp 5525a3cc4a ++ 6 years ago
emijrp 3361e4d09f Merge branch 'master' of https://github.com/WikiTeam/wikiteam 6 years ago
emijrp 94ebe5e1a3 skiping deactivated wikispaces 6 years ago
Federico Leva 83af47d6c0 Catch and raise PageMissingError when query() returns no pages 6 years ago
Federico Leva 73902d39c0 For old MediaWiki releases, use rawcontinue and wikitools query()
Otherwise the query continuation may fail and only the top revisions
will be exported. Tested with Wikia:
http://clubpenguin.wikia.com/api.php?action=query&prop=revisions&titles=Club_Penguin_Wiki

Also add parentid since it's available after all.

https://github.com/WikiTeam/wikiteam/issues/311#issuecomment-391957783
6 years ago
emijrp d11df60516 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 6 years ago
emijrp de7822cd37 duckduckgo parser; remove .zip after upload 6 years ago
Federico Leva bf4781eeea Merge branch 'master' of github.com:WikiTeam/wikiteam 6 years ago
Federico Leva da64349a5d Avoid UnboundLocalError: local variable 'reply' referenced before assignment 6 years ago
emijrp 273f1b33cb Merge branch 'master' of https://github.com/WikiTeam/wikiteam 6 years ago
emijrp 70eefcc945 skiping deleted wikis 6 years ago
Federico Leva 3b74173e0f launcher.py style and minor changes 6 years ago
Federico Leva 6fbde766c4 Further reduce os.walk() in launcher.py to speed up 6 years ago
Federico Leva b7789751fc UnboundLocalError: local variable 'reply' referenced before assignment
Warning!: "./tdicampswikiacom-20180522-wikidump" path exists
Traceback (most recent call last):
  File "./dumpgenerator.py", line 2321, in <module>
    main()
  File "./dumpgenerator.py", line 2283, in main
    while reply.lower() not in ['yes', 'y', 'no', 'n']:
UnboundLocalError: local variable 'reply' referenced before assignment
6 years ago
Federico Leva d76b4b4e01 Raise and catch PageMissingError when revisions API result is incomplete
https://github.com/WikiTeam/wikiteam/issues/317
6 years ago
Federico Leva 7a655f0074 Check for sha1 presence in makeXmlFromPage() 6 years ago
Federico Leva baae839a38 Complete update of the Wikia lists
* Reduce the offset to 100, the new limit for non-bots.
* Continue listing even when we get an empty request because all
  the wikis in a batch have become inactive and are filtered out.
* Print less from curl's requests.
* Automatically write the domain names to the files here.
6 years ago
Federico Leva 4bc41c3aa2 Actually keep track of listed titles and stop when duplicates are returned
https://github.com/WikiTeam/wikiteam/issues/309
6 years ago
Federico Leva 80288cf49e Catch allpages and namespaces API without query results 6 years ago
Federico Leva e47f638a24 Define "check" before running checkAPI()
Traceback (most recent call last):
  File "./dumpgenerator.py", line 2294, in <module>
    main()
  File "./dumpgenerator.py", line 2239, in main
    config, other = getParameters(params=params)
  File "./dumpgenerator.py", line 1587, in getParameters
    if api and check:
UnboundLocalError: local variable 'check' referenced before assignment
6 years ago
Federico Leva dd32202a55 Merge branch 'master' of github.com:WikiTeam/wikiteam 6 years ago
Federico Leva fcdc1b5cf2 Use os.listdir('.') 6 years ago