mirror of
https://github.com/ArchiveTeam/reddit-grab
synced 2024-11-04 12:00:12 +00:00
README.md
reddit-grab
=============

More information about the archiving project can be found on the ArchiveTeam wiki: [Reddit](http://archiveteam.org/index.php?title=Reddit)

Setup instructions
=========================

Be sure to replace `YOURNICKHERE` with the nickname you want shown on the tracker. You don't need to register it; just pick a nickname you like.

In most of the cases below, a web interface will be running at http://localhost:8001/. If you don't know or care what this is, you can ignore it; otherwise, it gives you a fancy view of what's going on.

**If anything goes wrong while running the commands below, scroll down to the bottom of this page for troubleshooting information.**

Running with a warrior
-------------------------

Follow the [instructions on the ArchiveTeam wiki](http://archiveteam.org/index.php?title=Warrior) for installing the Warrior, and select the "Reddit" project in the Warrior interface.

Running without a warrior
-------------------------
To run this outside the warrior, clone this repository, cd into its directory and run:

    pip install seesaw
    ./get-wget-lua.sh

then start downloading with:

    run-pipeline pipeline.py --concurrent 2 YOURNICKHERE

For more options, run:

    run-pipeline --help

If you don't have root access and/or your version of pip is very old, you can replace "pip install seesaw" with:

    wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py ; python get-pip.py --user ; ~/.local/bin/pip install --user seesaw

so that pip and seesaw are installed in your home directory, then run:

    ~/.local/bin/run-pipeline pipeline.py --concurrent 2 YOURNICKHERE

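
Before starting a long run, it may help to confirm that the setup steps above actually left everything in place. The following is a minimal sketch, not part of this repository: `check_prereqs` is a hypothetical helper name, and it assumes `get-wget-lua.sh` leaves a `wget-lua` binary in the repository directory (which is where the troubleshooting steps in this README move it).

```shell
#!/bin/sh
# Hypothetical preflight helper (not shipped with reddit-grab).
# Usage: check_prereqs DIR
# Prints one line per missing item; returns non-zero if anything is missing.
check_prereqs() {
    prereq_dir=$1
    prereq_missing=0

    # run-pipeline comes from "pip install seesaw"; a --user install
    # puts it in ~/.local/bin instead of on the PATH.
    if ! command -v run-pipeline >/dev/null 2>&1 \
        && [ ! -x "$HOME/.local/bin/run-pipeline" ]; then
        echo "missing: run-pipeline (install seesaw)"
        prereq_missing=1
    fi

    # Assumption: get-wget-lua.sh leaves the binary next to pipeline.py.
    if [ ! -x "$prereq_dir/wget-lua" ]; then
        echo "missing: $prereq_dir/wget-lua (run ./get-wget-lua.sh)"
        prereq_missing=1
    fi

    if [ ! -f "$prereq_dir/pipeline.py" ]; then
        echo "missing: $prereq_dir/pipeline.py (is this the reddit-grab checkout?)"
        prereq_missing=1
    fi

    return $prereq_missing
}
```

From inside the checkout, something like `check_prereqs . && run-pipeline pipeline.py --concurrent 2 YOURNICKHERE` would then only start the pipeline when nothing is reported missing.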

Running multiple instances on different IPs
-------------------------------------------

This feature requires seesaw version 0.0.16 or greater. Use `pip install --upgrade seesaw` to upgrade.

Use the `--context-value` argument to pass in `bind_address=123.4.5.6` (replace the IP address with your own).

Example of running 2 threads, no web interface, and Wget binding of IP address:

    run-pipeline pipeline.py --concurrent 2 YOURNICKHERE --disable-web-server --context-value bind_address=123.4.5.6

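
With several addresses, the same command is repeated once per IP. The sketch below only prints the commands rather than running them, so they can be reviewed first. `pipeline_cmd` is a hypothetical helper name, and the `192.0.2.x` addresses are documentation placeholders to replace with your own; the flags are the ones shown in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the run-pipeline command
# for one nickname and one bind address.
pipeline_cmd() {
    printf 'run-pipeline pipeline.py --concurrent 2 %s --disable-web-server --context-value bind_address=%s\n' "$1" "$2"
}

# Placeholder addresses; substitute the IPs configured on your machine.
for ip in 192.0.2.10 192.0.2.11; do
    pipeline_cmd YOURNICKHERE "$ip"
done
```

Each printed line could then be started in its own `screen` session, in the same way as the distribution-specific instructions in this README do for a single instance.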

Distribution-specific setup
-------------------------
### For Debian/Ubuntu:

    adduser --system --group --shell /bin/bash archiveteam
    apt-get install -y git-core libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen python-dev python-pip bzip2 zlib1g-dev
    pip install seesaw
    su -c "cd /home/archiveteam; git clone https://github.com/ArchiveTeam/reddit-grab.git; cd reddit-grab; ./get-wget-lua.sh" archiveteam
    screen su -c "cd /home/archiveteam/reddit-grab/; run-pipeline pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE" archiveteam
    [... ctrl+A D to detach ...]

Wget-lua is also available on [ArchiveTeam's PPA](https://launchpad.net/~archiveteam/+archive/wget-lua) for Ubuntu.

### For CentOS:

Ensure that you have the CentOS equivalent of bzip2 installed as well. You might need the EPEL repository to be enabled.

    yum -y install gnutls-devel lua-devel python-pip zlib-devel
    pip install seesaw
    [... pretty much the same as above ...]

### For openSUSE:

    zypper install liblua5_1 lua51 lua51-devel screen python-pip libgnutls-devel bzip2 python-devel gcc make
    pip install seesaw
    [... pretty much the same as above ...]

### For OS X:

You need Homebrew. Ensure that you have the OS X equivalent of bzip2 installed as well.

    brew install python lua gnutls
    pip install seesaw
    [... pretty much the same as above ...]

**There is a known issue with some packaged versions of rsync. If you get errors during the upload stage, reddit-grab will not work with your rsync version.**

This supposedly fixes it:

    alias rsync=/usr/local/bin/rsync

### For Arch Linux:

Ensure that you have the Arch equivalent of bzip2 installed as well.

1. Make sure you have `python2-pip` installed.
2. Install [the wget-lua package from the AUR](https://aur.archlinux.org/packages/wget-lua/).
3. Run `pip2 install seesaw`.
4. Modify the run-pipeline script in seesaw to point at `#!/usr/bin/python2` instead of `#!/usr/bin/python`.
5. `useradd --system --group users --shell /bin/bash --create-home archiveteam`
6. `screen su -c "cd /home/archiveteam/reddit-grab/; run-pipeline pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE" archiveteam`

### For FreeBSD:

Honestly, I have no idea. `./get-wget-lua.sh` supposedly doesn't work due to differences in the `tar` that ships with FreeBSD. Another problem is the apparent absence of Lua 5.1 development headers. If you figure this out, please do let us know on IRC (irc.efnet.org #archiveteam).

Troubleshooting
=========================

Broken? These are some of the possible solutions:

### wget-lua was not successfully built

If you get errors about `wget.pod` or something similar, the documentation failed to compile, but wget-lua itself compiled fine. Try this:

    cd get-wget-lua.tmp
    mv src/wget ../wget-lua
    cd ..

The `get-wget-lua.tmp` name may be inaccurate. If you have a folder with a similar but different name, use that instead and please let us know on IRC what folder name you had!

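
Because the build folder name can vary, the manual steps above can be sketched as a small search instead of a hard-coded `cd`. This is only an illustration: `recover_wget_lua` is a hypothetical helper, and it assumes the folder name at least starts with `get-wget-lua`, which may not hold on your system.

```shell
#!/bin/sh
# Hypothetical helper: run from the reddit-grab directory.
# Looks in every directory matching get-wget-lua*/ for a built src/wget
# and moves the first one found to ./wget-lua.
recover_wget_lua() {
    # The trailing slash restricts the glob to directories, so the
    # get-wget-lua.sh script itself is never matched.
    for build_dir in get-wget-lua*/; do
        if [ -f "${build_dir}src/wget" ]; then
            mv "${build_dir}src/wget" ./wget-lua
            echo "recovered wget-lua from ${build_dir%/}"
            return 0
        fi
    done
    echo "no built wget binary found under get-wget-lua*/" >&2
    return 1
}
```

Running `recover_wget_lua` prints the folder it used, which is also the name worth reporting on IRC if it differs from `get-wget-lua.tmp`.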

Optionally, if you know what you're doing, you may want to use wgetpod.patch.

### Problem with gnutls or openssl during get-wget-lua

Please ensure that gnutls-dev(el) and openssl-dev(el) are installed.

### ImportError: No module named seesaw

If you're sure that you followed the steps to install `seesaw`, permissions on your module directory may be set incorrectly. Try the following:

    chmod o+rX -R /usr/local/lib/python2.7/dist-packages

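
To see whether the permissions change helped, one option is to try the import directly with the interpreter that `run-pipeline` uses. The helper below is a hypothetical sketch: `seesaw_importable` is not part of seesaw, and the default interpreter name `python` is an assumption (on Arch, for example, it would be `python2`).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given interpreter can import seesaw.
# Usage: seesaw_importable [INTERPRETER]   (INTERPRETER defaults to "python")
seesaw_importable() {
    "${1:-python}" -c 'import seesaw' >/dev/null 2>&1
}
```

For example, `seesaw_importable python2 && echo ok`. Since the problem is about permissions, the check is most meaningful when run as the account that runs the pipeline, e.g. `su -c "python -c 'import seesaw'" archiveteam`.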

### Issues in the code

If you notice a bug and want to file a bug report, please use the GitHub issues tracker.

Are you a developer? Help write code for us! Look at our [developer documentation](http://archiveteam.org/index.php?title=Dev) for details.

### Other problems

Have an issue not listed here? Join us on IRC and ask! We can be found at irc.efnet.org #deaddit.