dhtcrawler is a DHT crawler written in Erlang. It can join a DHT network and crawl many P2P torrents. The program saves all torrent info into a database and provides an HTTP interface to search for a torrent by keyword.
Kevin Lynx 7c777f64b6 add torrent downloader 11 years ago
deps fix http cache bug 11 years ago
ebin add torrent downloader 11 years ago
tools/db-replset add database repl-set scripts 11 years ago
www change http interface 11 years ago
HISTORY.md add torrent downloader 11 years ago
README.md readme 11 years ago
create_bin.bat add database repl-set scripts 11 years ago
win_start_crawler.bat first commit 11 years ago
win_start_hash.bat change http interface 11 years ago
win_start_http.bat change http to read data from mongodb slave 11 years ago
win_start_torcache.bat add torrent downloader 11 years ago

README.md

dhtcrawler2

dhtcrawler is a DHT crawler written in Erlang. It can join a DHT network and crawl many P2P torrents. The program saves all torrent info into a database and provides an HTTP interface to search for a torrent by keyword.

screenshot

dhtcrawler2 is an extended version of dhtcrawler. It crawls much faster and is much more stable.

This git branch holds pre-compiled Erlang files, so you can start dhtcrawler2 directly. You don't need to compile it yourself; just download it and run it to collect torrents and search for a torrent by keyword.

Enjoy it!

Usage

  • install Erlang R16B or newer

  • download MongoDB and start it first

      mongod --dbpath your-database-path --setParameter textSearchEnabled=true
    
  • start the crawler; on Windows, just double-click win_start_crawler.bat

  • start hash_reader; on Windows, just double-click win_start_hash.bat

  • start httpd; on Windows, just double-click win_start_http.bat

  • wait several minutes and check out localhost:8000
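
The last step above (waiting until localhost:8000 answers) can be automated with a small polling helper. This is only a minimal sketch in Python and not part of dhtcrawler: the function name `wait_for_http`, its parameters, and the retry counts are assumptions chosen for illustration. The only thing taken from this README is the address localhost:8000.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, attempts=30, delay=10.0, fetch=None, sleep=time.sleep):
    """Poll `url` until it answers with HTTP 200 or the attempts run out.

    `fetch` and `sleep` are injectable (hypothetical hooks for this sketch)
    so the loop can be exercised without a running server.
    """
    if fetch is None:
        def fetch(u):
            # Default probe: a plain GET with a short timeout.
            with urllib.request.urlopen(u, timeout=5) as resp:
                return resp.status

    for _ in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: httpd is probably not up yet.
            pass
        sleep(delay)
    return False
```

Calling `wait_for_http("http://localhost:8000")` after starting the three components returns `True` once the web interface responds, or `False` after roughly five minutes with the default values. A positive answer only shows that httpd is reachable; it may still take longer before crawled torrents show up in search results.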

You can also compile the source code and run it manually. The source code is in the src branch of this repo.

You can also find more technical information on my blog (in Chinese): codemacro.com

Config

Most config values are in priv/dhtcrawler.config; this file is generated automatically the first time you run dhtcrawler. The other config values are passed as arguments to Erlang functions. In most cases you don't need to change these config values, except for the network addresses.

Mongodb Replica set

This is not specific to dhtcrawler, only to MongoDB, so try to figure it out yourself.

Another http front-end

Yes, of course you can write another HTTP front-end UI based on the torrent database. If you're interested, I can help you with the database format.