142 Commits

Author SHA1 Message Date
arkiver
0e7392acd3 Version 20231019.01. Use --secure-protocol=TLSv1_2. 2023-10-19 03:23:51 +02:00
arkiver
4bcc04734f Version 20231017.02. Use --secure-protocol=TLSv1_3. 2023-10-17 22:59:48 +02:00
arkiver
b1bf682030 Version 20231017.01. Use --secure-protocol=auto. Use new minimum Wget version checker. 2023-10-17 00:18:22 +02:00
arkiver
a0e35bb72d Version 20230910.05. Install Lua utf8 library through warrior-install.sh. 2023-09-10 22:54:30 +02:00
arkiver
3add4f891c Version 20230910.04. Install lua utf8 library. Fix converting unicode codepoint to utf8 character support. 2023-09-10 22:49:35 +02:00
arkiver
12abd58d4d Version 20230910.03. Increase hardcoded multi item size to 100, for soft limiting on tracker side. 2023-09-10 05:37:31 +02:00
arkiver
8a46824231 Version 20230910.02. Remove old Lua files. 2023-09-10 05:36:35 +02:00
arkiver
a2ffd1f671 Version 20230910.01. Use cjson instead of JSON.lua. 2023-09-10 05:28:45 +02:00
arkiver
e6b1602e31 Version 20230827.01. Use --secure-protocol=TLSv1_3. 2023-08-27 22:25:47 +02:00
arkiver
d210e65967
Merge pull request #18 from imerr/master-1
Extra docker container params
2023-08-04 00:04:52 +02:00
Robin Rolf
b7feddc147
Extra docker container params
watchtower: `--include-restarting` also update if the container is in a crash loop due to a bad build or the like
grab container: `--log-driver json-file --log-opt max-size=50m` to limit logs, docker defaults to json-file with no limit
2023-08-03 23:56:37 +02:00
arkiver
29a6952edb Version 20230727.03. In the Warrior, do not use GnuTLS compiled Wget-AT. 2023-07-27 18:03:48 +02:00
arkiver
6e73452ec5 Version 20230727.02. Only allow GNU Wget 1.21.3-at.20230623.01. Use Wget-AT option --reject-reserved-subnets. Remove old Wget files. Update README to latest. 2023-07-27 17:39:42 +02:00
arkiver
288c9b731c Version 20230727.01. Use openssl instead of gnutls. 2023-07-27 16:23:21 +02:00
arkiver
bb6198cc1a Version 20230627.01. Queue outlinks directly to the urls project. 2023-06-27 12:59:40 +02:00
arkiver
f1ef7d1697 Version 20230619.02. Accept 404 on mediaembed URL. 2023-06-19 18:28:52 +02:00
arkiver
d2571cde06 Version 20230619.01. Primitive fix to user post verification problems. 2023-06-19 02:54:59 +02:00
arkiver
2b19cdcd43 Version 20230617.01. Use --secure-protocol=auto for Wget-AT. 2023-06-17 15:16:03 +02:00
arkiver
5a0dcd6dd9
Merge pull request #17 from masterX244/master
Ignore fix for certain 404-ing garbage
2023-06-15 15:06:42 +02:00
masterX244
488aaa2181
Update pipeline.py 2023-06-15 13:09:42 +02:00
masterX244
520e8b95d6
Ignore for some garbage URLs that 404
wget guesses too much and generates bad URLs, ignore needed
2023-06-15 13:08:24 +02:00
arkiver
bea971f375 Version 20230614.03. Better check for level error page on svc URL. 2023-06-15 01:45:13 +02:00
arkiver
be6e32cba5 Version 20230614.02. Extra validity checks. 2023-06-14 22:12:15 +02:00
arkiver
e84e804fc5 Version 20230614.01. Fix check for valid data. 2023-06-14 18:49:41 +02:00
arkiver
4936505b0f Version 20230612.02. Add Reddit problem check for /comments/.../comment/ URL. 2023-06-14 03:07:27 +02:00
arkiver
57adbb381c Version 20230612.01. Kill grab when reddit seems to have problems. 2023-06-12 19:50:28 +02:00
arkiver
0ef6368945 Version 20230611.02. Multi item size 40. 2023-06-11 00:12:40 +02:00
arkiver
a974b81618 Version 20230611.01. Extra very simple check on validity of old.reddit.com returned body. 2023-06-11 00:12:10 +02:00
arkiver
15a0a1a6f5 Version 20230607.06. Ignore discovered /r/FIFA URL if coming from a /r/EASportFC parent URL. 2023-06-07 23:13:42 +02:00
arkiver
fe17191306 Version 20230607.05. Better checking for video. Abort item if no post is found (during blackout for example). 2023-06-07 23:05:44 +02:00
arkiver
7bb5c39419 Version 20230607.04. Abort on video for now. 2023-06-07 22:53:41 +02:00
arkiver
f63c8ab696 Version 20230607.03. Prevent getting URL ending with /". Ignore /message/compose URLs. 2023-06-07 22:39:57 +02:00
arkiver
393407520b Version 20230607.02. Very simple content checks to check if response is complete. Properly prevent writing to WARC in cases and do not abort all items when finding a problematic URL. 2023-06-07 22:35:47 +02:00
arkiver
37ba172c61 Version 20230607.01. Use GNU Wget 1.21.3-at.20230605.01 and arguments around DNS. 2023-06-07 15:46:23 +02:00
arkiver
da85457aae Version 20230531.01. Use --secure-protocol PFS. 2023-05-31 10:16:48 +02:00
arkiver
48b24323c6 Version 20230530.01. Queue discovered outlinks to urls-stash-reddit. 2023-05-30 19:42:55 +02:00
arkiver
a3b5bcecc1 Version 20230529.01. Correctly extract more comment pages from comment pages in the new design. Print debug information for comment pages on old design. 2023-05-29 17:56:36 +02:00
arkiver
1a14af2095 Version 20230509.02. Support new Wget-AT. 2023-05-09 05:48:05 +02:00
arkiver
b2654e9317 Version 20230509.01. Support for new design. 2023-05-09 05:43:21 +02:00
arkiver
7f4db17348 Version 20221021.01. Ignore /tailwind-build.css URL from comment in HTML. 2022-10-21 01:11:46 +02:00
arkiver
8a27002fd3 Version 20221005.01. Max tries for backfeed to 10. 2022-10-05 16:20:17 +02:00
arkiver
35e31af37f Queue redditstatic.com URLs as outlinks. 2022-10-05 16:19:53 +02:00
arkiver
bab4b4dcd2 Version 20220729.05. Fix aborting item on bad status code on url: item. Keep old retry code otherwise. 2022-07-29 04:52:08 +02:00
arkiver
8c45a263aa Version 20220729.04. Queue extra found URLs on media URLs to backfeed. 2022-07-28 18:31:23 +02:00
arkiver
e8fe03fbd0 Version 20220729.03. Add url: prefix to url item. 2022-07-28 18:20:59 +02:00
arkiver
2d8fa4034b Version 20220729.02. Support older Wget versions. 2022-07-28 18:15:54 +02:00
arkiver
f81b2ce97e Version 20220729.01. Queue media URLs back to reddit project and download individually. 2022-07-28 18:09:04 +02:00
arkiver
edacb2065a Fix README. 2022-05-07 04:49:30 +02:00
arkiver
cc83009a94 Version 20220605.01. Support GNU Wget 1.21.3-at.20220503.02. Fix killing crawl when items cannot be queued. 2022-05-06 18:31:38 +02:00
arkiver
7c4cf4548e Version 20220415.02. 2022-04-15 21:39:33 +02:00