arkiver daab40aa6e Version 20240216.01. Use fixed minimum Wget version 1.21.3-at.20231213.03. Use TLSv1.2. Fix check on svc comment content check. 2 months ago
.gitignore Use wget-at with ZSTD. 4 years ago
Dockerfile Version 20231118.01. Switch to gnutls. 5 months ago
LICENSE first files 9 years ago
README.md Extra docker container params 9 months ago
cookies.txt Add support for latest change in _options 2 years ago
ignore-list Version 20200726.01. Fully support new and old design for posts. 4 years ago
pipeline.py Version 20240216.01. Use fixed minimum Wget version 1.21.3-at.20231213.03. Use TLSv1.2. Fix check on svc comment content check. 2 months ago
reddit.lua Version 20240216.01. Use fixed minimum Wget version 1.21.3-at.20231213.03. Use TLSv1.2. Fix check on svc comment content check. 2 months ago
user-agents Version 20210306.01. Remove some AppleWebKir user-agents for getting 403s. 3 years ago

README.md

reddit-grab

More information about the archiving project can be found on the ArchiveTeam wiki: Reddit

Setup instructions

General instructions

Data integrity is very important in Archive Team projects. Please note the following important rules:

We strongly encourage you to join the IRC channel associated with this project in order to be informed about project updates and other important announcements, as well as to be reachable in the event of an issue. The Archive Team Wiki has more information about IRC. We can be found at hackint IRC #shreddit.

If you have any questions or issues during setup, please review the wiki pages or contact us on IRC for troubleshooting information.

Running the project

This and other archiving projects can easily be run using the Archive Team Warrior virtual machine. Follow the instructions on the Archive Team wiki for installing the Warrior, and from the web interface running at http://localhost:8001/, enter the nickname that you want to be shown as on the tracker. There is no registration, just pick a nickname you like. Then, select the Reddit project in the Warrior interface.

Project-specific Docker container (for more advanced users)

Alternatively, more advanced users can run projects using Docker. While users of the Warrior can switch between projects using a web interface, Docker containers are specific to each project. However, while the Warrior supports a maximum of 6 concurrent items, a Docker container supports a maximum of 20 concurrent items. The instructions below are a short overview. For more information and detailed explanations of the commands, follow the Docker instructions on the Archive Team wiki.

It is advised to use Watchtower to automatically update the project container:

docker run -d --name watchtower --restart=unless-stopped -v /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --label-enable --cleanup --interval 3600 --include-restarting

after which the project container can be run:

docker run -d --name archiveteam --label=com.centurylinklabs.watchtower.enable=true --log-driver json-file --log-opt max-size=50m --restart=unless-stopped atdr.meo.ws/archiveteam/reddit-grab --concurrent 1 YOURNICKHERE

Be sure to replace YOURNICKHERE with the nickname that you want to be shown as on the tracker. There is no registration, just pick a nickname you like.
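
The two values that change between setups are the nickname and the `--concurrent` count. As a minimal sketch (not part of the project), the helper below only prints the `docker run` command from this README with those two values filled in, so it can be reviewed before being run. The function name `build_reddit_grab_cmd` is hypothetical, the 1 to 20 range check is taken from the Docker limit stated above, and the rejection of nicknames containing whitespace is an assumption made so that the printed command stays a single valid argument list.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) the project container command.
# Usage: build_reddit_grab_cmd NICKNAME [CONCURRENT]
build_reddit_grab_cmd() {
  nick="$1"
  concurrent="${2:-1}"   # default matches the README example (--concurrent 1)

  # Refuse an empty nickname, the unreplaced placeholder, or embedded whitespace
  # (the whitespace rule is an assumption of this sketch, not a tracker rule).
  case "$nick" in
    ''|YOURNICKHERE|*[[:space:]]*)
      echo "error: pass a real nickname without spaces" >&2
      return 1
      ;;
  esac

  # Concurrency must be a plain integer.
  case "$concurrent" in
    ''|*[!0-9]*)
      echo "error: concurrency must be a whole number" >&2
      return 1
      ;;
  esac

  # The README states a maximum of 20 concurrent items per Docker container.
  if [ "$concurrent" -lt 1 ] || [ "$concurrent" -gt 20 ]; then
    echo "error: concurrency must be between 1 and 20" >&2
    return 1
  fi

  printf '%s\n' "docker run -d --name archiveteam --label=com.centurylinklabs.watchtower.enable=true --log-driver json-file --log-opt max-size=50m --restart=unless-stopped atdr.meo.ws/archiveteam/reddit-grab --concurrent $concurrent $nick"
}

# Example (prints the command for review):
#   build_reddit_grab_cmd mynick 4
```

Printing instead of executing keeps the sketch safe to try on a machine without Docker; once the output looks right, it can be copied into a terminal as-is.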

Supporting Archive Team

Behind the scenes, Archive Team maintains infrastructure to run the projects and process the data. If you would like to help out with the costs of our infrastructure, a donation on our Open Collective would be very welcome.

Issues in the code

If you notice a bug and want to file a bug report, please use the GitHub issues tracker.

Are you a developer? Help write code for us! Look at our developer documentation for details.

Other problems

Have an issue not listed here? Join us on IRC and ask! We can be found at hackint IRC #shreddit.