Commit Graph

312 Commits

Author SHA1 Message Date
Aloïs Micard
bbdba2546f
Merge pull request #132 from creekorful/120-blacklister-final-tweaks
Blacklister final tweaks
2021-01-08 21:01:22 +01:00
Aloïs Micard
9b83830d95
Merge remote-tracking branch 'origin/develop' into 120-blacklister-final-tweaks 2021-01-08 20:59:18 +01:00
Aloïs Micard
6c678478a1
Implement better blacklist config 2021-01-08 20:50:32 +01:00
Aloïs Micard
871daf1bcc
scheduler: hash url before caching it 2021-01-08 19:53:48 +01:00
Aloïs Micard
69352f7237
indexer: sort headers to have deterministic output 2021-01-08 19:53:48 +01:00
Aloïs Micard
b84e6d28ac
Merge pull request #131 from creekorful/130-optimize-redis-memory
scheduler: hash url before caching it
2021-01-08 13:03:22 +01:00
Aloïs Micard
e07ed8156e
scheduler: hash url before caching it 2021-01-08 13:00:39 +01:00
Aloïs Micard
cae3bb514f
Merge pull request #128 from creekorful/114-fix-tests
indexer: sort headers to have deterministic output
2021-01-07 21:03:12 +01:00
Aloïs Micard
faee8b48c1
indexer: sort headers to have deterministic output 2021-01-07 20:58:33 +01:00
Aloïs Micard
8297dc7616
Merge pull request #127 from creekorful/124-improve-scheduler-speed
scheduler: increase performance
2021-01-07 19:02:01 +01:00
Aloïs Micard
84a28c5be0
scheduler: increase event prefetch 2021-01-07 18:59:45 +01:00
Aloïs Micard
afed403e6a
Remove useless regex 2021-01-07 18:56:43 +01:00
Aloïs Micard
4e33813b21
Merge remote-tracking branch 'origin/develop' into 124-improve-scheduler-speed 2021-01-07 17:51:52 +01:00
Aloïs Micard
7820820fa9
scheduler: add batch support for dialing with cache 2021-01-07 17:39:14 +01:00
Aloïs Micard
de50ed02e3
Merge pull request #126 from creekorful/125-indexer-bulk-indexation
Indexer: implement bulk indexation
2021-01-07 13:09:20 +01:00
Aloïs Micard
9b46dc205e
Indexer: support buffered indexing 2021-01-07 13:06:15 +01:00
Aloïs Micard
71f82d4aad
process: Rework whole flags system
- Turn the flag into Feature system to allow easier configuration.
- Add prefetch flag to event feature
2021-01-07 10:33:25 +01:00
Aloïs Micard
829afcbb6a
Release 0.10.0 2021-01-06 18:05:08 +01:00
Aloïs Micard
ec3357be5d
Big improvements
- Reduce debug noise
- Create scripts to blacklist 'famous' legit hostnames
- Make blacklister more resilient
- Merge archiver & indexer together
- Better prefix for cache key
- Rework scheduling process
- Update architecture.png
- Remove trandoshanctl
- Improve testing
2021-01-06 17:54:39 +01:00
Aloïs Micard
2d7499f7e2
Merge pull request #118 from creekorful/106-improve-blacklister
Implement new blacklister
2021-01-04 08:55:26 +01:00
Aloïs Micard
8da1f29a43
little fixes 2021-01-04 08:39:41 +01:00
Aloïs Micard
d0dffb9928
Implement new blacklister 2021-01-04 08:21:17 +01:00
Aloïs Micard
2133a1aeb5
bump app versions 2021-01-03 16:17:58 +01:00
Aloïs Micard
46a7a05e4a
Merge pull request #116 from creekorful/110-archiver-new-format
Implement new storage format
2021-01-03 16:14:09 +01:00
Aloïs Micard
a27092fd13
Use new storage format 2021-01-03 16:09:55 +01:00
Aloïs Micard
1ac5c1e036
Merge remote-tracking branch 'origin/develop' into 110-archiver-new-format 2021-01-03 15:53:16 +01:00
Aloïs Micard
571b1e2628
Merge pull request #115 from creekorful/111-prevent-duplicates-urls
Prevent duplicate URLs in crawlingQueue
2021-01-03 15:45:56 +01:00
Aloïs Micard
cc3c0d62d6
remove hacky check 2021-01-03 15:42:52 +01:00
Aloïs Micard
c8352d3299
Use url cache to determine if crawling should be done 2021-01-03 15:24:51 +01:00
Aloïs Micard
e245e5d79a
last fixes 2021-01-03 14:54:21 +01:00
Aloïs Micard
60a23f7182
Fix ttl 2021-01-03 14:27:34 +01:00
Aloïs Micard
12362e0100
Fix test cases 2021-01-03 14:26:01 +01:00
Aloïs Micard
4a0fbd0b9b
add configapi key prefix 2021-01-03 14:20:45 +01:00
Aloïs Micard
0aba4fa4f9
Finalize redis cache impl 2021-01-03 14:19:17 +01:00
Aloïs Micard
387a93b7b9
Create new flags for cache 2021-01-03 14:09:37 +01:00
Aloïs Micard
d826fe73b6
Refactor configapi to use new cache 2021-01-03 14:04:28 +01:00
Aloïs Micard
477092316b
Implement cache logic 2021-01-03 13:59:30 +01:00
Aloïs Micard
55ae36f3b9
s/database/index 2021-01-03 13:33:04 +01:00
Aloïs Micard
87a2fb246f
Add new hostname to blacklist 2021-01-03 13:01:46 +01:00
Aloïs Micard
38a0a36de0
Merge pull request #113 from creekorful/109-pre-declared-mapping
elastic: pre-declare index mapping
2021-01-03 11:09:45 +01:00
Aloïs Micard
2d6beb26ce
elastic: pre-declare index mapping 2021-01-03 10:56:09 +01:00
Aloïs Micard
33ba6b4e7d
Merge pull request #112 from creekorful/101-scheduler-whitelisting
make scheduler use whitelisting instead of blacklisting
2021-01-03 10:04:46 +01:00
Aloïs Micard
15bae2143d
improve test cases 2021-01-03 10:02:37 +01:00
Aloïs Micard
4bddf39335
make scheduler use whitelisting instead of blacklisting 2021-01-03 09:54:59 +01:00
Aloïs Micard
d5eb551d82
Merge pull request #104 from creekorful/103-turn-api-into-indexer
Turn api into indexer
2021-01-02 18:14:57 +01:00
Aloïs Micard
188df77541
improve logging 2021-01-02 18:09:32 +01:00
Aloïs Micard
039f8cb76c
update architecture.png 2021-01-02 18:00:20 +01:00
Aloïs Micard
2eb416845e
improve logging 2021-01-02 17:59:22 +01:00
Aloïs Micard
c5bd0b3b87
remove old CD stuff 2021-01-02 17:37:10 +01:00
Aloïs Micard
ad808e6b31
indexer: do not publish duplicate URLs 2021-01-02 17:35:29 +01:00