arkiver | 12abd58d4d | Version 20230910.03. Increase hardcoded multi item size to 100, for soft limiting on tracker side. | 2023-09-10 05:37:31 +02:00
arkiver | 8a46824231 | Version 20230910.02. Remove old Lua files. | 2023-09-10 05:36:35 +02:00
arkiver | a2ffd1f671 | Version 20230910.01. Use cjson instead of JSON.lua. | 2023-09-10 05:28:45 +02:00
arkiver | e6b1602e31 | Version 20230827.01. Use --secure-protocol=TLSv1_3. | 2023-08-27 22:25:47 +02:00
arkiver | 29a6952edb | Version 20230727.03. In the Warrior, do not use GnuTLS compiled Wget-AT. | 2023-07-27 18:03:48 +02:00
arkiver | 6e73452ec5 | Version 20230727.02. Only allow GNU Wget 1.21.3-at.20230623.01. Use Wget-AT option --reject-reserved-subnets. Remove old Wget files. Update README to latest. | 2023-07-27 17:39:42 +02:00
arkiver | 288c9b731c | Version 20230727.01. Use openssl instead of gnutls. | 2023-07-27 16:23:21 +02:00
arkiver | bb6198cc1a | Version 20230627.01. Queue outlinks directly to the urls project. | 2023-06-27 12:59:40 +02:00
arkiver | f1ef7d1697 | Version 20230619.02. Accept 404 on mediaembed URL. | 2023-06-19 18:28:52 +02:00
arkiver | d2571cde06 | Version 20230619.01. Primitive fix to user post verification problems. | 2023-06-19 02:54:59 +02:00
arkiver | 2b19cdcd43 | Version 20230617.01. Use --secure-protocol=auto for Wget-AT. | 2023-06-17 15:16:03 +02:00
masterX244 | 488aaa2181 | Update pipeline.py | 2023-06-15 13:09:42 +02:00
arkiver | bea971f375 | Version 20230614.03. Better check for level error page on svc URL. | 2023-06-15 01:45:13 +02:00
arkiver | be6e32cba5 | Version 20230614.02. Extra validity checks. | 2023-06-14 22:12:15 +02:00
arkiver | e84e804fc5 | Version 20230614.01. Fix check for valid data. | 2023-06-14 18:49:41 +02:00
arkiver | 4936505b0f | Version 20230612.02. Add Reddit problem check for /comments/.../comment/ URL. | 2023-06-14 03:07:27 +02:00
arkiver | 57adbb381c | Version 20230612.01. Kill grab when reddit seems to have problems. | 2023-06-12 19:50:28 +02:00
arkiver | 0ef6368945 | Version 20230611.02. Multi item size 40. | 2023-06-11 00:12:40 +02:00
arkiver | a974b81618 | Version 20230611.01. Extra very simple check on validity of old.reddit.com returned body. | 2023-06-11 00:12:10 +02:00
arkiver | 15a0a1a6f5 | Version 20230607.06. Ignore discovered /r/FIFA URL if coming from a /r/EASportFC parent URL. | 2023-06-07 23:13:42 +02:00
arkiver | fe17191306 | Version 20230607.05. Better checking for video. Abort item if no post is found (during blackout for example). | 2023-06-07 23:05:44 +02:00
arkiver | 7bb5c39419 | Version 20230607.04. Abort on video for now. | 2023-06-07 22:53:41 +02:00
arkiver | f63c8ab696 | Version 20230607.03. Prevent getting URL ending with /". Ignore /message/compose URLs. | 2023-06-07 22:39:57 +02:00
arkiver | 393407520b | Version 20230607.02. Very simple content checks to check if response is complete. Properly prevent writing to WARC in cases and do not abort all items when finding a problematic URL. | 2023-06-07 22:35:47 +02:00
arkiver | 37ba172c61 | Version 20230607.01. Use GNU Wget 1.21.3-at.20230605.01 and arguments around DNS. | 2023-06-07 15:46:23 +02:00
arkiver | da85457aae | Version 20230531.01. Use --secure-protocol PFS. | 2023-05-31 10:16:48 +02:00
arkiver | 48b24323c6 | Version 20230530.01. Queue discovered outlinks to urls-stash-reddit. | 2023-05-30 19:42:55 +02:00
arkiver | a3b5bcecc1 | Version 20230529.01. Correctly extract more comment pages from comment pages in the new design. Print debug information for comment pages on old design. | 2023-05-29 17:56:36 +02:00
arkiver | 1a14af2095 | Version 20230509.02. Support new Wget-AT. | 2023-05-09 05:48:05 +02:00
arkiver | b2654e9317 | Version 20230509.01. Support for new design. | 2023-05-09 05:43:21 +02:00
arkiver | 7f4db17348 | Version 20221021.01. Ignore /tailwind-build.css URL from comment in HTML. | 2022-10-21 01:11:46 +02:00
arkiver | 8a27002fd3 | Version 20221005.01. Max tries for backfeed to 10. | 2022-10-05 16:20:17 +02:00
arkiver | bab4b4dcd2 | Version 20220729.05. Fix aborting item on bad status code on url: item. Keep old retry code otherwise. | 2022-07-29 04:52:08 +02:00
arkiver | 8c45a263aa | Version 20220729.04. Queue extra found URLs on media URLs to backfeed. | 2022-07-28 18:31:23 +02:00
arkiver | e8fe03fbd0 | Version 20220729.03. Add url: prefix to url item. | 2022-07-28 18:20:59 +02:00
arkiver | 2d8fa4034b | Version 20220729.02. Support older Wget versions. | 2022-07-28 18:15:54 +02:00
arkiver | f81b2ce97e | Version 20220729.01. Queue media URLs back to reddit project and download individually. | 2022-07-28 18:09:04 +02:00
arkiver | cc83009a94 | Version 20220605.01. Support GNU Wget 1.21.3-at.20220503.02. Fix killing crawl when items cannot be queued. | 2022-05-06 18:31:38 +02:00
arkiver | 7c4cf4548e | Version 20220415.02. | 2022-04-15 21:39:33 +02:00
arkiver | 0ce1c59ca4 | Version 20220415.01. Do not queue /r/undefined/ URLs. | 2022-04-15 20:38:36 +02:00
arkiver | da28d3c902 | Version 20220323.03. Fix items to maxtries variable name. Fix backfeed key name. | 2022-03-23 21:59:52 +01:00
arkiver | 8944cf1fc6 | Version 20220323.02. Fix items to maxtries variable name. | 2022-03-23 16:36:23 +01:00
arkiver | 10eaa7c50c | Version 20220323.01. Fix backfeed. Fix maxtries use. | 2022-03-23 16:16:58 +01:00
arkiver | 28f132a052 | Version 20220312.01. Fix backfeed. | 2022-03-12 23:53:48 +01:00
arkiver | 4f50a0d699 | Version 20220311.01. Use new backfeed endpoint for queuing. | 2022-03-11 03:52:49 +01:00
arkiver | 383c101aef | Version 20220109.02. Cut off URL at space when found between brackets without href= in front. | 2022-01-09 17:19:29 +01:00
arkiver | df35317e0c | Version 20220109.01. Add codepoint to utf8 support. Percent encode outlinks correctly. | 2022-01-09 17:15:10 +01:00
arkiver | 8a3f8cd1de | Version 20211004.02. Fix incomplete facebook.com fix. | 2021-10-04 21:09:21 +02:00
arkiver | d0070db67a | Version 20211004.01. Do not check facebook.com while down at the moment. | 2021-10-04 21:04:03 +02:00
arkiver | 0c5e8cd3bd | Version 20211001.01. Use GNU Wget 1.20.3-at.20211001.01. | 2021-10-01 02:44:01 +02:00