Version 20230727.02. Only allow GNU Wget 1.21.3-at.20230623.01. Use Wget-AT option --reject-reserved-subnets. Remove old Wget files. Update README to latest.
More information about the archiving project can be found on the ArchiveTeam wiki: [Reddit](https://wiki.archiveteam.org/index.php?title=Reddit)
## Setup instructions
In the commands below, be sure to replace `YOURNICKHERE` with the nickname that you want to be shown as on the tracker. You don't need to register it; just pick a nickname you like.
### General instructions
In most of the cases below, a web interface will be running at http://localhost:8001/. If you don't know or care what this is, you can just ignore it; otherwise, it gives you a detailed view of what's going on.
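To check from a shell whether that web interface is answering, a small helper along these lines can be used. It is a sketch that assumes `curl` is installed; the function name `check_web_ui` is hypothetical, and the default URL is the one mentioned above.

```shell
# Hypothetical helper: report whether the local web interface responds.
# Assumes curl is installed; pass a different URL as the first argument if needed.
check_web_ui() {
  url="${1:-http://localhost:8001/}"
  if curl --silent --fail --output /dev/null "$url"; then
    echo "web interface is up at $url"
  else
    echo "no web interface at $url" >&2
    return 1
  fi
}
```

Run `check_web_ui` after starting the Warrior or the pipeline; it returns a non-zero status if nothing answers.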
**If anything goes wrong while running the commands below, please scroll down to the Troubleshooting section near the bottom of this page.**

Data integrity is very important in Archive Team projects. Please note the following important rules:
* [Do not use proxies or VPNs](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#Can_I_use_whatever_internet_access_for_the_Warrior?).
* Run the project using either the Warrior or the project-specific Docker container as listed below. [Do not modify project code](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#I'd_like_to_help_write_code_or_I_want_to_tweak_the_scripts_to_run_to_my_liking._Where_can_I_find_more_info?_Where_is_the_source_code_and_repository?). Compiling the project dependencies yourself is no longer supported.
* You can share your tracker nickname(s) across machine(s) you personally operate, but not with machines operated by other users. Nickname sharing makes it harder to inspect data if a problem arises.
* [Use clean internet connections](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#Can_I_use_whatever_internet_access_for_the_Warrior?).
* Only x64-based machines are supported. [ARM (used on Raspberry Pi and Apple Silicon Macs) is not currently supported](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#Can_I_run_the_Warrior_on_ARM_or_some_other_unusual_architecture?).
* See the [Archive Team Wiki](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#Warrior_FAQ) for additional information.
We strongly encourage you to join the IRC channel associated with this project in order to be informed about project updates and other important announcements, as well as to be reachable in the event of an issue. The Archive Team Wiki has [more information about IRC](https://wiki.archiveteam.org/index.php/Archiveteam:IRC). We can be found at hackint IRC [#shreddit](https://webirc.hackint.org/#irc://irc.hackint.org/#shreddit).

**If you have any questions or issues during setup, please review the wiki pages or contact us on IRC for troubleshooting information.**

### Running the project

#### Archive Team Warrior (recommended for most users)

This and other archiving projects can easily be run using the [Archive Team Warrior](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior) virtual machine. Follow the [instructions on the Archive Team wiki](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior) for installing the Warrior, and from the web interface running at `http://localhost:8001/`, enter the nickname that you want to be shown as on the tracker. There is no registration; just pick a nickname you like. Then, select the `Reddit` project in the Warrior interface.

#### Project-specific Docker container (for more advanced users)

Alternatively, more advanced users can also run projects using Docker. While users of the Warrior can switch between projects using a web interface, Docker containers are specific to each project. However, while the Warrior supports a maximum of 6 concurrent items, a Docker container supports a maximum of 20 concurrent items. The instructions below are a short overview. For more information and detailed explanations of the commands, follow the [Docker instructions on the Archive Team wiki](https://wiki.archiveteam.org/index.php/Running_Archive_Team_Projects_with_Docker).

The project container can be run with:

    docker run --name archiveteam --label=com.centurylinklabs.watchtower.enable=true --restart=unless-stopped atdr.meo.ws/archiveteam/reddit-grab --concurrent 1 YOURNICKHERE

It is advised to use [Watchtower](https://github.com/containrrr/watchtower) to automatically update the project container.
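Watchtower runs as its own container. The following is a minimal sketch, not necessarily the exact command the project recommends (the Docker instructions on the wiki have that). It assumes the `containrrr/watchtower` image; the helper name `start_watchtower` and the one-hour `--interval` are illustrative choices. With `--label-enable`, Watchtower only updates containers that carry the `com.centurylinklabs.watchtower.enable=true` label, which the project container command above sets.

```shell
# Sketch: start Watchtower so it keeps the project container up to date.
# --label-enable: only update containers labelled
#   com.centurylinklabs.watchtower.enable=true (the project container sets this).
# --cleanup: remove old images after updating.
# --interval 3600: check for new images once per hour (illustrative value).
start_watchtower() {
  docker run -d --name watchtower --restart=unless-stopped \
    -v /var/run/docker.sock:/var/run/docker.sock \
    containrrr/watchtower --label-enable --cleanup --interval 3600
}
```

Run `start_watchtower` once. Because of `--restart=unless-stopped`, Docker starts it again whenever the Docker daemon restarts, unless you stopped it yourself.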
Running without a warrior or Docker
-------------------------
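Note that compiling the project dependencies yourself is no longer supported (see the rules above). To run the project outside the Warrior, clone this repository, `cd` into its directory and run: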
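A minimal sketch of those steps, assuming `git`, `seesaw` (which provides `run-pipeline3`) and the Wget-AT build the project requires are already installed, and assuming the pipeline script is named `pipeline.py` as in other Archive Team grab repositories. The repository URL and the helper name `run_reddit_grab` are assumptions.

```shell
# Hypothetical helper: clone the repository and start the pipeline.
# Assumes git, seesaw (run-pipeline3) and the required Wget-AT are installed,
# and that the repository's pipeline script is pipeline.py.
run_reddit_grab() {
  if [ -z "$1" ]; then
    echo "usage: run_reddit_grab NICKNAME" >&2
    return 1
  fi
  if [ ! -d reddit-grab ]; then
    git clone https://github.com/ArchiveTeam/reddit-grab.git || return 1
  fi
  cd reddit-grab || return 1
  run-pipeline3 pipeline.py --concurrent 1 "$1"
}
```

Usage: `run_reddit_grab YOURNICKHERE`. Note that the helper leaves your shell inside the `reddit-grab` directory.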
For more options, run:

    run-pipeline3 --help
If you don't have root access and/or your version of pip is very old, you can replace `pip install --upgrade seesaw` with:
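One way to do this is a per-user install with `python3 -m pip`, upgrading pip itself first. This is a sketch that assumes `python3` with the `pip` module is available; the helper name `install_seesaw_user` is hypothetical.

```shell
# Hypothetical helper: install pip and seesaw into the current user's home
# directory, so no root access is needed. Upgrading pip first helps when the
# system pip is very old.
install_seesaw_user() {
  python3 -m pip install --user --upgrade pip &&
    python3 -m pip install --user --upgrade seesaw
}
```

With a per-user install on Linux, scripts such as `run-pipeline3` are placed in `~/.local/bin`, which may need to be added to your `PATH`.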
### For Debian/Ubuntu:

In __Debian Jessie, Ubuntu 18.04 Bionic and above__, the `libgnutls-dev` package was renamed to `libgnutls28-dev`, so you need to do the following instead:
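Which of the two package names applies depends on the release. As a sketch, assuming `apt-cache` is available, a helper (hypothetical name `gnutls_dev_package`) can print whichever name the configured package sources know about, so that it can be substituted into an `apt-get install` command.

```shell
# Hypothetical helper: print the GnuTLS development package name this
# Debian/Ubuntu release provides. apt-cache show exits non-zero when the
# package is unknown to the configured sources.
gnutls_dev_package() {
  if apt-cache show libgnutls28-dev >/dev/null 2>&1; then
    echo libgnutls28-dev
  else
    echo libgnutls-dev
  fi
}
```

For example: `sudo apt-get install "$(gnutls_dev_package)"`.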
### For OS X:

You need Homebrew. Ensure that you have the OS X equivalent of bzip2 installed as well.

    brew install python lua gnutls
    pip install --upgrade seesaw
    [... pretty much the same as above ...]
**There is a known issue with some packaged versions of rsync. If you get errors during the upload stage, reddit-grab will not work with your rsync version.**
This supposedly fixes it:

    alias rsync=/usr/local/bin/rsync
### For Arch Linux:
Ensure that you have the Arch equivalent of bzip2 installed as well.
1. Make sure you have `python2-pip` installed.
2. Install [the wget-lua package from the AUR](https://aur.archlinux.org/packages/wget-lua/).
3. Run `pip2 install --upgrade seesaw`.
4. Modify the run-pipeline script in seesaw to point at `#!/usr/bin/python2` instead of `#!/usr/bin/python`.
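Step 4 can be done with `sed`. This is a sketch that assumes GNU `sed` (as shipped on Arch) and that the script is installed as `run-pipeline` somewhere on your `PATH`; the helper name `fix_run_pipeline_shebang` is hypothetical. If the script was installed system-wide and is not writable by your user, run the `sed` command from the helper with `sudo`.

```shell
# Hypothetical helper: point the run-pipeline script's shebang at python2.
# Assumes GNU sed; pass the script path, or let it be looked up on PATH.
fix_run_pipeline_shebang() {
  script="${1:-$(command -v run-pipeline)}"
  if [ -z "$script" ] || [ ! -f "$script" ]; then
    echo "run-pipeline script not found" >&2
    return 1
  fi
  # Only rewrite line 1, and only if it is exactly #!/usr/bin/python.
  sed -i '1s|^#!/usr/bin/python$|#!/usr/bin/python2|' "$script"
}
```

Because the pattern only matches `#!/usr/bin/python` exactly, running the helper a second time leaves an already fixed script unchanged.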
### For FreeBSD:

Honestly, I have no idea. `./get-wget-lua.sh` supposedly doesn't work due to differences in the `tar` that ships with FreeBSD. Another problem is the apparent absence of Lua 5.1 development headers. If you figure this out, please do let us know on IRC (irc.hackint.org #archiveteam).
Troubleshooting
=========================
Broken? These are some of the possible solutions:
### wget-lua was not successfully built
If you get errors about `wget.pod` or something similar, the documentation failed to compile, but wget-lua itself compiled fine. Try this:

    cd get-wget-lua.tmp
    mv src/wget ../wget-lua
    cd ..
The `get-wget-lua.tmp` name may be inaccurate. If you have a folder with a similar but different name, use that instead and please let us know on IRC what folder name you had!
Optionally, if you know what you're doing, you may want to use wgetpod.patch.
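Because the build folder name can vary, the manual steps above can also be sketched as a helper (hypothetical name `recover_wget_lua`) that looks for any `get-wget-lua*` folder containing a built `src/wget` and moves that binary to `./wget-lua`. Run it from the repository directory; it assumes the folder layout described above.

```shell
# Hypothetical helper: find a get-wget-lua* build folder with a compiled
# src/wget and move that binary to ./wget-lua (same effect as the manual steps).
recover_wget_lua() {
  for dir in get-wget-lua*/; do
    # If nothing matches, the pattern stays literal and this test fails.
    if [ -x "${dir}src/wget" ]; then
      mv "${dir}src/wget" ./wget-lua || return 1
      echo "moved ${dir}src/wget to ./wget-lua"
      return 0
    fi
  done
  echo "no get-wget-lua* folder with a built src/wget found" >&2
  return 1
}
```

The message it prints tells you which folder name was actually used, which is useful to report on IRC.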
### Problem with gnutls or openssl during get-wget-lua
Please ensure that gnutls-dev(el) and openssl-dev(el) are installed.
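One way to check for these is to ask `pkg-config` whether the `gnutls` and `openssl` modules are present. This is a sketch that assumes `pkg-config` is installed and that the -dev/-devel packages ship the usual `.pc` files; the helper name `check_tls_dev` is hypothetical.

```shell
# Hypothetical helper: report whether the GnuTLS and OpenSSL development
# files are visible to pkg-config. Returns non-zero if either is missing.
check_tls_dev() {
  missing=0
  for module in gnutls openssl; do
    if pkg-config --exists "$module"; then
      echo "$module: found"
    else
      echo "$module: missing" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```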
### ImportError: No module named seesaw
If you're sure that you followed the steps to install `seesaw`, permissions on your module directory may be set incorrectly. Try the following:
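A sketch of one such fix, assuming `seesaw` was installed system-wide with root's `python3 -m pip` and that the problem is missing read/execute permission for other users: look up the install location with `pip show` and make the `seesaw` package directory readable. The helper name `fix_seesaw_permissions` is hypothetical; review the path it finds before relying on it.

```shell
# Hypothetical helper: make root's seesaw install readable by other users.
# Assumes seesaw was installed with root's python3 -m pip.
fix_seesaw_permissions() {
  location="$(sudo python3 -m pip show seesaw | sed -n 's/^Location: //p')"
  if [ -z "$location" ] || [ ! -d "$location/seesaw" ]; then
    echo "could not find root's seesaw install" >&2
    return 1
  fi
  # o+rX: read for everyone; execute only on directories and on files
  # that are already executable for some user.
  sudo chmod -R o+rX "$location/seesaw"
}
```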
### Supporting Archive Team

Behind the scenes Archive Team has infrastructure to run the projects and process the data with. If you would like to help out with the costs of our infrastructure, a donation on our [Open Collective](https://opencollective.com/archiveteam) would be very welcome.
### Issues in the code
If you notice a bug and want to file a bug report, please use the GitHub issues tracker.
Are you a developer? Help write code for us! Look at our [developer documentation](https://wiki.archiveteam.org/index.php?title=Dev) for details.