Aloïs Micard
|
bbdba2546f
|
Merge pull request #132 from creekorful/120-blacklister-final-tweaks
Blacklister final tweaks
|
2021-01-08 21:01:22 +01:00 |
|
Aloïs Micard
|
9b83830d95
|
Merge remote-tracking branch 'origin/develop' into 120-blacklister-final-tweaks
|
2021-01-08 20:59:18 +01:00 |
|
Aloïs Micard
|
6c678478a1
|
Implement better blacklist config
|
2021-01-08 20:50:32 +01:00 |
|
Aloïs Micard
|
871daf1bcc
|
scheduler: hash url before caching it
|
2021-01-08 19:53:48 +01:00 |
|
Aloïs Micard
|
69352f7237
|
indexer: sort headers to have deterministic output
|
2021-01-08 19:53:48 +01:00 |
|
Aloïs Micard
|
b84e6d28ac
|
Merge pull request #131 from creekorful/130-optimize-redis-memory
scheduler: hash url before caching it
|
2021-01-08 13:03:22 +01:00 |
|
Aloïs Micard
|
e07ed8156e
|
scheduler: hash url before caching it
|
2021-01-08 13:00:39 +01:00 |
|
Aloïs Micard
|
cae3bb514f
|
Merge pull request #128 from creekorful/114-fix-tests
indexer: sort headers to have deterministic output
|
2021-01-07 21:03:12 +01:00 |
|
Aloïs Micard
|
faee8b48c1
|
indexer: sort headers to have deterministic output
|
2021-01-07 20:58:33 +01:00 |
|
Aloïs Micard
|
8297dc7616
|
Merge pull request #127 from creekorful/124-improve-scheduler-speed
scheduler: increase performances
|
2021-01-07 19:02:01 +01:00 |
|
Aloïs Micard
|
84a28c5be0
|
scheduler: increase event prefetch
|
2021-01-07 18:59:45 +01:00 |
|
Aloïs Micard
|
afed403e6a
|
Remove useless regex
|
2021-01-07 18:56:43 +01:00 |
|
Aloïs Micard
|
4e33813b21
|
Merge remote-tracking branch 'origin/develop' into 124-improve-scheduler-speed
|
2021-01-07 17:51:52 +01:00 |
|
Aloïs Micard
|
7820820fa9
|
scheduler: add batch support for dialing with cache
|
2021-01-07 17:39:14 +01:00 |
|
Aloïs Micard
|
de50ed02e3
|
Merge pull request #126 from creekorful/125-indexer-bulk-indexation
Indexer: implement bulk indexation
|
2021-01-07 13:09:20 +01:00 |
|
Aloïs Micard
|
9b46dc205e
|
Indexer: support buffered indexing
|
2021-01-07 13:06:15 +01:00 |
|
Aloïs Micard
|
71f82d4aad
|
process: Rework whole flags system
- Turn the flag into Feature system to allow easier configuration.
- Add prefetch flag to event feature
|
2021-01-07 10:33:25 +01:00 |
|
Aloïs Micard
|
829afcbb6a
|
Release 0.10.0
|
2021-01-06 18:05:08 +01:00 |
|
Aloïs Micard
|
ec3357be5d
|
Big improvements
- Reduce debug noise
- Create scripts to blacklist 'famous' legit hostnames
- Make blacklister more resilient
- Merge archiver & indexer together
- Better prefix for cache key
- Rework scheduling process
- Update architecture.png
- Remove trandoshanctl
- Improve testing
|
2021-01-06 17:54:39 +01:00 |
|
Aloïs Micard
|
2d7499f7e2
|
Merge pull request #118 from creekorful/106-improve-blacklister
Implement new blacklister
|
2021-01-04 08:55:26 +01:00 |
|
Aloïs Micard
|
8da1f29a43
|
little fixes
|
2021-01-04 08:39:41 +01:00 |
|
Aloïs Micard
|
d0dffb9928
|
Implement new blacklister
|
2021-01-04 08:21:17 +01:00 |
|
Aloïs Micard
|
2133a1aeb5
|
bump app versions
|
2021-01-03 16:17:58 +01:00 |
|
Aloïs Micard
|
46a7a05e4a
|
Merge pull request #116 from creekorful/110-archiver-new-format
Implement new storage format
|
2021-01-03 16:14:09 +01:00 |
|
Aloïs Micard
|
a27092fd13
|
Use new storage format
|
2021-01-03 16:09:55 +01:00 |
|
Aloïs Micard
|
1ac5c1e036
|
Merge remote-tracking branch 'origin/develop' into 110-archiver-new-format
|
2021-01-03 15:53:16 +01:00 |
|
Aloïs Micard
|
571b1e2628
|
Merge pull request #115 from creekorful/111-prevent-duplicates-urls
Prevent duplicates urls in crawlingQueue
|
2021-01-03 15:45:56 +01:00 |
|
Aloïs Micard
|
cc3c0d62d6
|
remove hacky check
|
2021-01-03 15:42:52 +01:00 |
|
Aloïs Micard
|
c8352d3299
|
Use url cache to determine if crawling should be done
|
2021-01-03 15:24:51 +01:00 |
|
Aloïs Micard
|
e245e5d79a
|
last fixes
|
2021-01-03 14:54:21 +01:00 |
|
Aloïs Micard
|
60a23f7182
|
Fix ttl
|
2021-01-03 14:27:34 +01:00 |
|
Aloïs Micard
|
12362e0100
|
Fix tests case
|
2021-01-03 14:26:01 +01:00 |
|
Aloïs Micard
|
4a0fbd0b9b
|
add configapi key prefix
|
2021-01-03 14:20:45 +01:00 |
|
Aloïs Micard
|
0aba4fa4f9
|
Finalize redis cache impl
|
2021-01-03 14:19:17 +01:00 |
|
Aloïs Micard
|
387a93b7b9
|
Create new flags for cache
|
2021-01-03 14:09:37 +01:00 |
|
Aloïs Micard
|
d826fe73b6
|
Refactor configapi to use new cache
|
2021-01-03 14:04:28 +01:00 |
|
Aloïs Micard
|
477092316b
|
Implement cache logic
|
2021-01-03 13:59:30 +01:00 |
|
Aloïs Micard
|
55ae36f3b9
|
s/database/index
|
2021-01-03 13:33:04 +01:00 |
|
Aloïs Micard
|
87a2fb246f
|
Add new hostname to blacklist
|
2021-01-03 13:01:46 +01:00 |
|
Aloïs Micard
|
38a0a36de0
|
Merge pull request #113 from creekorful/109-pre-declared-mapping
elastic: pre-declare index mapping
|
2021-01-03 11:09:45 +01:00 |
|
Aloïs Micard
|
2d6beb26ce
|
elastic: pre-declare index mapping
|
2021-01-03 10:56:09 +01:00 |
|
Aloïs Micard
|
33ba6b4e7d
|
Merge pull request #112 from creekorful/101-scheduler-whitelisting
make scheduler use whitelisting instead of blacklisting
|
2021-01-03 10:04:46 +01:00 |
|
Aloïs Micard
|
15bae2143d
|
improve test cases
|
2021-01-03 10:02:37 +01:00 |
|
Aloïs Micard
|
4bddf39335
|
make scheduler use whitelisting instead of blacklisting
|
2021-01-03 09:54:59 +01:00 |
|
Aloïs Micard
|
d5eb551d82
|
Merge pull request #104 from creekorful/103-turn-api-into-indexer
Turn api into indexer
|
2021-01-02 18:14:57 +01:00 |
|
Aloïs Micard
|
188df77541
|
improve logging
|
2021-01-02 18:09:32 +01:00 |
|
Aloïs Micard
|
039f8cb76c
|
update architecture.png
|
2021-01-02 18:00:20 +01:00 |
|
Aloïs Micard
|
2eb416845e
|
improve logging
|
2021-01-02 17:59:22 +01:00 |
|
Aloïs Micard
|
c5bd0b3b87
|
remove old CD stuff
|
2021-01-02 17:37:10 +01:00 |
|
Aloïs Micard
|
ad808e6b31
|
indexer: do not publish duplicate URLs
|
2021-01-02 17:35:29 +01:00 |
|