Author | Commit | Message | Date
arkiver | 0be16f775a | Version 20210114.04. Support cookies. | 2021-01-14 16:54:55 +01:00
arkiver | 0d0e824421 | Version 20210114.03. Do not accept 403. | 2021-01-14 01:53:08 +01:00
arkiver | 94d8b551f8 | Version 20210114.02. Actually add the user-agents file. | 2021-01-14 01:38:31 +01:00
arkiver | bc94cf036f | Version 20210114.01. Use a random user-agent. | 2021-01-14 01:34:07 +01:00
arkiver | df1f60079d | Version 20210109.01. Use browser user-agent. | 2021-01-09 14:02:03 +01:00
arkiver | 2911934fd4 | Version 20210108.09. Ignore over18 URLs on old.reddit.com (cookie fix coming up, not a problem on www.reddit.com). | 2021-01-09 01:01:44 +01:00
arkiver | f3d41ea2e1 | Version 20210108.08. Do not archive URLs with utm_source for old.reddit.com. | 2021-01-09 00:55:58 +01:00
arkiver | 992fb6b953 | Version 20210108.07. Use tracker reddit. | 2021-01-08 23:41:15 +01:00
arkiver | 6a8d5a62ac | Version 20210108.06. | 2021-01-08 23:40:31 +01:00
arkiver | 1b220e014b | Fix for archiving videos. | 2021-01-08 23:40:14 +01:00
arkiver | 4a371be167 | Version 20210108.05. | 2021-01-08 23:21:35 +01:00
arkiver | 3d20ca90af | Handle NULL byte separated multi items. Support unicode chars in JSON permalink. | 2021-01-08 23:18:48 +01:00
arkiver | 7c5ea717a8 | Version 20210108.03. | 2021-01-08 22:42:48 +01:00
arkiver | 5f3958c282 | Merge branch 'master' of https://github.com/ArchiveTeam/reddit-grab | 2021-01-08 22:42:24 +01:00
arkiver | 1924d5217e | Version 20210108.02. | 2021-01-08 22:42:03 +01:00
arkiver | 16836ba201 | Support single comment and post items. Queue outlinks to URLs project. | 2021-01-08 22:40:49 +01:00
arkiver | ae57a81baf | Use multi items. | 2021-01-08 22:40:09 +01:00
km09 | eb945d2470 | Use updated grab-base | 2020-10-31 10:49:34 +00:00
arkiver | 4284d24b47 | Version 20201031.01. Support Wget-AT version 1.20.3-at.20201030.01. | 2020-10-31 01:20:27 +01:00
arkiver | 9ecf9a3a30 | Version 20200902.01. Support Wget-AT version 1.20.3-at.20200902.01. | 2020-09-02 09:46:17 -04:00
arkiver | 99875895b6 | Version 20200821.02. Set tracker host to trackerproxy.archiveteam.org. | 2020-08-21 16:22:24 -04:00
arkiver | 2087174a5c | Version 20200821.01. Ignore comment URL with utm_source param. | 2020-08-21 16:12:06 -04:00
arkiver | ace1a4f037 | Version 20200805.01. Support Wget-AT version 1.20.3-at.20200804.01. | 2020-08-05 10:10:57 -04:00
arkiver | 8b40429e95 | Use new README template. | 2020-07-29 19:16:48 -04:00
arkiver | 23bfe8b12c | Version 20200730.01. Support /user/ post better (like /r/). | 2020-07-29 18:40:05 -04:00
arkiver | 450d4e0413 | Version 20200728.01. Ignore non-reddit URLs. Fix extraction of tokens for morecomments. | 2020-07-27 19:53:46 -04:00
arkiver | 9a6417ecbc | Version 20200727.03. Fix handling video URLs without extension. | 2020-07-27 10:37:39 -04:00
arkiver | 869cdc4e6e | Remove unused cookies.txt file. Update README. | 2020-07-26 20:40:10 -04:00
arkiver | 911c675e74 | Version 20200727.02. Set TRACKER_ID to reddittest. | 2020-07-26 20:11:56 -04:00
arkiver | 147c6416ed | Version 20200727.01. Use trackerproxy for dictionaries. Ignore irc: URLs. | 2020-07-26 20:04:24 -04:00
arkiver | 910687b053 | Version 20200726.06. Fix project name for ZSTD dictionary request. | 2020-07-26 17:44:05 -04:00
arkiver | 496c018eef | Version 20200726.05. Add cookies to access some quarantined subreddits. | 2020-07-26 17:20:17 -04:00
arkiver | 3d5e7e17f9 | Version 20200726.04. Use reddittest tracker for size estimate. | 2020-07-26 16:59:20 -04:00
arkiver | 23fec56409 | Version 20200726.03. Support galleries and comments. | 2020-07-26 16:43:41 -04:00
arkiver | 2f6a602313 | Version 20200726.01. Fully support new and old design for posts. | 2020-07-26 16:05:38 -04:00
arkiver | 56571306dd | Use default upload concurrent of 2. | 2020-06-30 19:12:32 -04:00
arkiver | 40063adcaf | Use wget-at with ZSTD. | 2020-06-30 19:11:06 -04:00
Arkiver2 | 831f79f0d9 | Do not import warcio. Update version to 20200102.03. | 2020-01-02 17:43:53 +01:00
Arkiver2 | cf3f6c7af9 | Skip URL on status code 204. Update version to 20200102.02. | 2020-01-02 17:41:38 +01:00
Arkiver2 | ac65b0a818 | Update version to 20200102.01. | 2020-01-02 17:39:20 +01:00
Arkiver2 | 0eb4b6205a | Fix string joining. | 2020-01-02 17:37:06 +01:00
Arkiver2 | ad2cf89404 | Split off checking if URL was processed. Do not add URL without trailing / already added with trailing /. | 2019-07-30 01:49:46 +02:00
Arkiver2 | d4d5c9a93f | Skip amp.reddit.com post pages. | 2019-07-30 00:18:23 +02:00
Arkiver2 | 4cf7bd18f0 | Version 20190729.01; do not get page requisites from outlinks; do not pip install warcio. | 2019-07-29 22:58:09 +02:00
Arkiver2 | 8902255c76 | Version 20190405.01; support www.reddit.com; support videos; support outlinks | 2019-04-05 04:52:06 +02:00
Arkiver2 | 9d1ea0c688 | rewrite | 2019-02-22 01:15:18 +01:00
Arkiver2 | c08fd59a29 | reddit.lua: ignore urls, fixes | 2015-07-19 22:58:23 +02:00
Arkiver2 | 11aef69a32 | pipeline.py: cookies! | 2015-07-06 18:46:00 +02:00
Arkiver2 | 38074381c4 | cookies | 2015-07-06 18:45:42 +02:00
Arkiver2 | e87a2e4a51 | README.md | 2015-07-05 21:50:11 +02:00