Trandoshan dark web crawler

This repository is a complete rewrite of the Trandoshan dark web crawler. Everything has been written inside a single Git repository to ease maintenance.

Why a rewrite?

The first version of Trandoshan (available here) works great, but it was not really professional: the code started to become a mess, and it was hard to manage since it was split across multiple repositories.

I have therefore decided to create and maintain the project in this single repository, where the code of all components is available (as a Go module).

How to start the crawler

To start the crawler, one just needs to execute the following command:

$ ./scripts/start.sh

and wait for all containers to start.

Notes

  • You can start the crawler in detached mode by passing --detach to start.sh (see the example below).
  • Ensure you have at least 3 GB of free memory, as the Elasticsearch container alone will require 2 GB.
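
For example, to start the whole stack in the background:

$ ./scripts/start.sh --detach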

How to initiate crawling

One can use the RabbitMQ dashboard available at localhost:15003 and publish a new JSON object to the crawlingQueue.

The object should look like this:

{
  "url": "https://facebookcorewwwi.onion"
}
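
If you prefer the command line, the same message can be published through the RabbitMQ management HTTP API that backs the dashboard. This is only a sketch: it assumes the broker still uses the default guest/guest credentials and the default vhost, and it publishes through the default exchange, which routes a message to the queue named by its routing key:

$ curl -u guest:guest \
    -H 'Content-Type: application/json' \
    -X POST http://localhost:15003/api/exchanges/%2F/amq.default/publish \
    -d '{"properties": {}, "routing_key": "crawlingQueue", "payload": "{\"url\": \"https://facebookcorewwwi.onion\"}", "payload_encoding": "string"}'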

How to speed up crawling

If one wants to speed up the crawling, they can scale the crawler component by running more instances of it. This can be done by issuing the following command after the crawler has started:

$ ./scripts/scale.sh crawler=5

this will set the number of crawler instances to 5.
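
Under the hood, scale.sh presumably wraps Docker Compose's scaling mechanism. Assuming the compose file under deployments/docker is the one driving the stack (a guess based on the repository layout), the equivalent raw command would look like:

$ docker-compose -f deployments/docker/docker-compose.yml up -d --scale crawler=5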

How to view results

You can use the Kibana dashboard available at http://localhost:15004. You will need to create an index pattern named 'resources' and, when asked for the time field, choose 'time'.
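
If you want to inspect the raw documents without going through Kibana, you can also query the resources index in Elasticsearch directly, provided its HTTP port is exposed on the host. A sketch, assuming the default Elasticsearch port 9200 (check the compose file for the actual mapping):

$ curl 'http://localhost:9200/resources/_search?pretty'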

How to hack the crawler

If you've made a change to one of the crawler components and wish to use the updated version when running start.sh, you just need to issue the following command:

$ ./scripts/build.sh

this will rebuild all crawler images using your local changes. After that, just run start.sh again to have the updated version running.
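
If you only changed a single component, a full rebuild may be unnecessary. Assuming each component has its own Dockerfile under build/docker and the image tags match what the compose file expects (both are guesses about the repository layout), you could rebuild just one image by hand:

$ docker build -f build/docker/Dockerfile.crawler -t trandoshan.io/crawler .

Here Dockerfile.crawler and the trandoshan.io/crawler tag are hypothetical names used for illustration.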

Architecture

The architecture details are available here.