Commit Graph

33 Commits

Author SHA1 Message Date
Siddhant Rai
e01071426f feat: field to pass number of posts as a parameter 2024-03-27 19:20:55 +05:30
Siddhant Rai
eed1bfbe50 feat: fields to handle reddit loader + minor changes 2024-03-26 16:07:44 +05:30
Siddhant Rai
60cfea1126 feat: added reddit loader 2024-03-16 20:22:05 +05:30
Alex
4a701cb993 Merge branch 'main' into feature/remote-loads 2024-03-01 14:38:27 +00:00
Pavel
54d187a0ad Fixing ingestion metadata grouping 2024-02-28 19:52:58 +03:00
Pavel
c8d8a8d0b5 Fixing ingestion metadata grouping 2024-02-25 16:03:18 +03:00
Alex
0cb3d12d94 Refactor loader classes to accept inputs directly 2024-02-14 15:17:56 +00:00
Alex
2e14dec12d Merge pull request #849 from arc53/main
Sync
2024-02-09 14:05:39 +00:00
Anton Larin
9e04b7796a application folder related changes:
* optimize content of requirements.txt
* upgrade libs
* fix imports
2024-01-27 16:25:19 +01:00
Anton Larin
e8099c4db5 script folder related changes:
* optmize content of requirements.txt
* upgrade libs
* fix imports
2024-01-27 14:58:08 +01:00
Exterminator11
f3540aac0f Changed import 2023-10-25 17:07:47 +05:30
Exterminator11
889ce984a9 Made changes 2023-10-25 16:50:01 +05:30
Pavel
381a2740ee change input 2023-10-13 21:52:56 +04:00
Pavel
024674eef3 List check 2023-10-13 11:42:42 +04:00
Pavel
b7d88b4c0f fix wrong link 2023-10-12 19:45:36 +04:00
Pavel
719ca63ec1 fixes 2023-10-12 19:40:23 +04:00
Pavel
2cfb416fd0 Desc loader 2023-10-12 13:44:32 +04:00
Pavel
50f07f9ef5 limit crawler 2023-10-12 12:53:33 +04:00
Pavel
c517bdd2e1 Crawler + sitemap 2023-10-12 12:35:26 +04:00
Pavel
658867cb46 No crawler, no sitemap 2023-10-12 01:03:40 +04:00
Alex
8f2ad38503 tests 2023-10-11 10:13:51 +01:00
John Bampton
32ea0213f7 Remove unneeded duplicate words 2023-10-07 00:11:03 +10:00
John Bampton
2c6ab18e41 Fix spelling 2023-10-02 01:25:23 +10:00
Alex
347cfe253f elastic2 2023-09-29 17:17:48 +01:00
Alex
783e7f6939 working es 2023-09-29 00:32:19 +01:00
Anton Larin
98a97f34f5 fix packaging and imports and introduce tests with pytest.
still issues with celery worker.
2023-08-14 18:20:25 +02:00
Anton Larin
bed25b317c Fix min_tokens logic for grouping documents: documents with (lengh >= min_tokens) should not be grouped into one document for indexing 2023-08-05 13:18:52 +02:00
Alex
a64a30c088 fix 2023-07-24 16:23:49 +01:00
Alex
dac76a867f fix tokens for header 2023-07-24 16:14:08 +01:00
Anton Larin
962becb9a5 Linting
* validate python formatting on every build with Ruff
* fix lint warnings
2023-05-13 10:36:17 +02:00
Anton Larin
168648e789 Proper PEP8 formatting 2023-05-12 12:02:25 +02:00
Alex
8e477c9d16 update worker 2023-03-15 00:23:51 +00:00
Alex
1d2162705d uploads backend first 2023-03-13 14:20:03 +00:00