Commit Graph

56 Commits (10023344a5a65808be03531099bdd9e2b1c2a49c)

Author SHA1 Message Date
Markus Heiser e92d40c854 [enh] implement an OnlineUrlSearchProcessor
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2 years ago
Martin Fischer 640c404844 [pyright:strict] searx.search.checker.background 2 years ago
Alexandre Flament 5439dd5fb1 [fix] checker: fix image fetch
Since https://github.com/searxng/searxng/pull/354
the searx.network.stream(...) returns a tuple

This commit updates the checker code to match
this function signature change.
2 years ago
Martin Fischer def62c3a47 [typing] add type hints for dictionaries 2 years ago
Alexandre Flament 2134703b4b [enh] settings.yml: implement general.enable_metrics
* allow disabling the recording of metrics (response time, etc.)
* this commit doesn't change the UI. If metrics are disabled,
  /stats and /stats/errors return an empty response, and
  in /preferences the response time and reliability columns are empty.
3 years ago
Markus Heiser 3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Markus Heiser fcdc2c2cd2 [format.python] disable py code formatting for some hunks of code
Disable the python-black code formatting where it hurts the readability of
the code.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Alexandre Flament f9c6393502 [enh] verify that Tor proxy works every time searx starts
based on @MarcAbonce commit on searx
3 years ago
Alexandre Flament 29893cf816 [fix] searx.network.stream: fix memory leak 3 years ago
Alexandre Flament 2eab89b4ca [fix] checker: fix memory usage
* download images using the "image_proxy" network (HTTP/1 instead of HTTP/2)
* don't cache data: URL (reduce memory usage)
* after each test: purge image URL cache then call garbage collector
* download only the first 64kb of images
3 years ago
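
The "download only the first 64kb" point above can be sketched as follows. This is a minimal illustration, not the checker's actual code: `read_limited` and `MAX_BYTES` are hypothetical names, and the input is assumed to be any iterable of byte chunks rather than the real searx.network stream.

```python
MAX_BYTES = 64 * 1024  # hypothetical constant: 64 KiB cap


def read_limited(chunks, limit=MAX_BYTES):
    """Collect at most `limit` bytes from an iterable of byte chunks."""
    buffer = bytearray()
    for chunk in chunks:
        remaining = limit - len(buffer)
        # Keep only the part of the chunk that still fits under the cap.
        buffer += chunk[:remaining]
        if len(buffer) >= limit:
            break  # stop consuming the stream once the cap is reached
    return bytes(buffer)
```

Because the loop breaks as soon as the cap is reached, a lazy chunk source is not read to the end, which is what keeps memory usage bounded.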
Markus Heiser 443bf35e09 [pylint] fix global-variable-not-assigned issues
If there is no write access, there is no need for global.  Remove global
statement if there is no assignment.

global-variable-not-assigned:
  Using global for names but no assignment is done Used when a variable is
  defined through the "global" statement but no assignment to this variable is
  done.

In Pylint 2.11 the global-variable-not-assigned checker now catches global
variables that are never reassigned in a local scope and catches (reassigned)
functions [1][2]

[1] https://pylint.pycqa.org/en/latest/whatsnew/2.11.html
[2] https://github.com/PyCQA/pylint/issues/1375

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
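
The rule behind this commit can be shown with a small sketch. The names `_counter`, `read_counter`, and `bump_counter` are hypothetical and not taken from the searx code base: reading a module-level name needs no `global` statement, only rebinding it does.

```python
# Hypothetical module-level state, for illustration only.
_counter = 0


def read_counter():
    # Read access only: no `global` statement is needed, and adding one
    # is what pylint reports as global-variable-not-assigned.
    return _counter


def bump_counter():
    # Assignment rebinds the module-level name, so `global` is required.
    global _counter
    _counter += 1
    return _counter
```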
Alexandre Flament b513917ef9 [mod] searx.metrics & searx.search: use the engine loggers
metrics & processors use the engine logger
3 years ago
Alexandre Flament 0b27c8698f [doc] update docs/dev/plugins.rst 3 years ago
Alexandre Flament 660c180170 [mod] plugin: call on_result after each engine from the ResultContainer
Currently, searx.search.Search calls on_result once the engine results have been merged
(ResultContainer.order_results).

on_result plugins can rewrite the results: once the URL(s) are modified, results that could
now be merged are not, since ResultContainer.order_results has already been called.

This commit calls on_result for each result of each engine.
In addition, the on_result function can return False to remove the result.

Note: the on_result function now runs on the engine thread instead of the Flask thread.
3 years ago
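
A rough sketch of the behaviour described in this commit, under stated assumptions: the hook below takes only the result (a simplification, not the real plugin signature), and `BLOCKED_HOSTS` and `collect` are hypothetical stand-ins for plugin logic and the ResultContainer. It shows why running the hook before merging matters: rewritten URLs can still be deduplicated, and a False return drops the result.

```python
from urllib.parse import urlparse

BLOCKED_HOSTS = {"spam.example"}  # hypothetical block list


def on_result(result):
    """Hypothetical hook: rewrite the result in place, return False to drop it."""
    parsed = urlparse(result["url"])
    if parsed.hostname in BLOCKED_HOSTS:
        return False
    # Rewrite the URL (here: strip the fragment) before results are merged.
    result["url"] = parsed._replace(fragment="").geturl()
    return True


def collect(engine_results, hooks):
    """Run every hook on each result, then merge results by URL."""
    merged = {}
    for result in engine_results:
        if not all(hook(result) for hook in hooks):
            continue  # a hook returned False: the result is removed
        merged.setdefault(result["url"], result)
    return list(merged.values())
```

With this ordering, two results that differ only in a URL fragment end up as one merged entry, which would not happen if the hook ran after merging.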
Markus Heiser 2a3b9a2e26 [pylint] searx: drop no longer needed 'missing-function-docstring'
Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914168470
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Alexandre Flament 2f363858b8 [fix] searx.search.checker.get_result() always return a dict
So checker_results['status'] == 'ok' is enough to check the checker result.
See searx/webapp.py, /preferences endpoint
3 years ago
Markus Heiser 24f2376c11 [pylint] prepare for pylint v2.9.3 / fix some (new) pylint issues
Upgrading from pylint v2.8.3 to 2.9.3 raises some new issues::

  searx/search/checker/__main__.py:37:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
  searx/search/checker/__main__.py:38:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
  searx/search/processors/__init__.py:20:0: R0402: Use 'from searx import engines' instead (consider-using-from-import)
  searx/preferences.py:182:19: C0207: Use data.split('-', maxsplit=1)[0] instead (use-maxsplit-arg)
  searx/preferences.py:506:15: R1733: Unnecessary dictionary index lookup, use 'user_setting' instead (unnecessary-dict-index-lookup)
  searx/webapp.py:436:0: C0206: Consider iterating with .items() (consider-using-dict-items)
  searx/webapp.py:950:4: C0206: Consider iterating with .items() (consider-using-dict-items)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
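
For readers unfamiliar with the checks listed in this commit, here is a small sketch of the patterns pylint suggests for three of them. The variable names and data are made up for illustration and are not the searx code.

```python
import os
import tempfile

# use-maxsplit-arg (C0207): only the first field is needed,
# so stop splitting after the first separator.
locale = "zh-Hant-TW"
language = locale.split("-", maxsplit=1)[0]

# consider-using-dict-items (C0206): iterate over key/value pairs
# instead of looking each value up by key.
timeouts = {"engine_a": 2.0, "engine_b": 3.5}
slow = [name for name, timeout in timeouts.items() if timeout > 3.0]

# consider-using-with (R1732): a context manager releases the resource
# on every code path, including exceptions.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("ok")
    with open(path, encoding="utf-8") as handle:
        content = handle.read()
```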
Markus Heiser f122cb0e27 [fix] typo: online_dictionnary --> online_dictionary
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Alexandre Flament 6fa114c9ba [mod] settings_default: remove searx.search.max_request_timeout global variable 3 years ago
Markus Heiser 6f1446d55f [pylint] searx/search/__init__.py & replace lic-text by SPDX tag
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Alexandre Flament 426fadccb3 [mod] remove gc.collect() after each user request 3 years ago
Markus Heiser fa0d05c313 [pylint] checker/__main__.py & checker/background.py
Lint files that have been touched by [PR #58]

[PR #58] https://github.com/searxng/searxng/pull/58

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Alexandre Flament 8c1a65d32f [mod] multithreading only in searx.search.* packages
This prepares the new architecture change:
everything about multithreading is moved into the searx.search.* packages.

Previously, the call to the "init" function of the engines was done in searx.engines:
* the network was not set (requests were not sent using the defined proxy)
* it required monkey patching the code to avoid HTTP requests during the tests
3 years ago
Markus Heiser 924f9afea3 [lint] pylint searx/search/processors files / BTW add some doc-strings
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Alexandre Flament b1557b5443 [mod] processors: show identical error messages on /search and /stats 3 years ago
Alexandre Flament 7cfd8d900a [mod] oscar: /preferences , engines tab: report engine times
* display the median time instead of the average.
* add a "Reliability" column (sum up the metrics and the checker results).
* the "selected language", "SafeSearch", "Time range" values are displayed as "broken" when the checker tests fail.
3 years ago
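
One reason to prefer the median over the average for engine times is that a single slow request distorts the mean far more than the median. A minimal sketch with made-up response times (not measured data):

```python
from statistics import mean, median

# Hypothetical response times in seconds; one request hit a timeout.
response_times = [0.4, 0.5, 0.5, 0.6, 8.0]

average_time = mean(response_times)   # pulled up by the single outlier
median_time = median(response_times)  # stays at the typical value
```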
Alexandre Flament c27fef1cde [mod] metrics: add secondary parameter
Some errors won't stop the engine:
* additional HTTP redirects for example
* some invalid results

secondary=True allows to flag these errors as not important.
3 years ago
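
The idea of a secondary flag can be sketched as below. `ErrorLog`, `record`, and `is_failing` are hypothetical names chosen for this illustration; they are not the searx.metrics API, which this log does not show.

```python
from collections import Counter


class ErrorLog:
    """Hypothetical error recorder that separates important and minor errors."""

    def __init__(self):
        self.primary = Counter()
        self.secondary = Counter()

    def record(self, message, secondary=False):
        # secondary=True marks the error as not important.
        target = self.secondary if secondary else self.primary
        target[message] += 1

    def is_failing(self):
        # Only primary errors count against the engine.
        return sum(self.primary.values()) > 0
```

In this sketch an extra HTTP redirect would be recorded with `secondary=True` and would not mark the engine as failing.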
Alexandre Flament 7acd7ffc02 [enh] rewrite and enhance metrics 3 years ago
Alexandre Flament aae7830d14 [mod] refactoring: processors
Report to the user suspended engines.

searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
3 years ago
Alexandre Flament d14994dc73 [httpx] replace searx.poolrequests by searx.network
settings.yml:

* outgoing.networks:
   * can contain network definitions
   * properties: enable_http, verify, http2, max_connections, max_keepalive_connections,
     keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
   * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
   * local_addresses can be "192.168.0.1/24" (it supports IPv6)
   * support_ipv4 & support_ipv6: both True by default
     see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
   * either a full network description
   * or a reference to an existing network

* all HTTP requests of an engine use the same HTTP configuration (this was not the case before, see proxy configuration in master)
3 years ago
Alexandre Flament eaa694fb7d [enh] replace requests by httpx 3 years ago
Alexandre Flament 0b45afd4d7 [fix] checker: various bug fixes
* initialize engine_data (youtube engine)
* don't crash if an engine doesn't set result['url']
3 years ago
Rolf 80025c3244 Windows does not support SIGUSR1, so don't use it unconditionally. 3 years ago
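
A common way to avoid using SIGUSR1 unconditionally is to check whether the `signal` module exposes it on the current platform. The sketch below assumes a hypothetical helper `install_trigger`; the `register` parameter exists only to make the example easy to exercise and is not part of the original change.

```python
import signal


def install_trigger(handler, register=signal.signal):
    """Register `handler` for SIGUSR1 if the platform provides that signal.

    Returns True when the handler was registered, False otherwise
    (for example on Windows, where signal.SIGUSR1 does not exist).
    """
    sigusr1 = getattr(signal, "SIGUSR1", None)
    if sigusr1 is None:
        return False
    register(sigusr1, handler)
    return True
```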
Alexandre Flament 99e0651cea [mod] by default allow only HTTPS, not HTTP
Related to https://github.com/searx/searx/pull/2373
3 years ago
Adam Tauber 44f4a9d49a [enh] add ability to send engine data to subsequent requests 3 years ago
Alexandre Flament 46ca32c3cc [mod] update currencies.json and fetch_currencies.py
use a sparql request on wikidata to get the list of currencies.

currencies.json contains the translation for all supported searx languages.

Supersede #993
3 years ago
Alexandre Flament c22d4c764c [fix] duckduckgo engine: "!ddg !g" do not redirect to google
* searx understands "!ddg !g time" as: send "!g time" to DDG
* !g is a DDG bang for Google: DDG returns an HTTP redirect to Google

This commit adds the allows_redirect param so that HTTP redirects are not followed.

The DDG engine returns an empty result as before, without following the HTTP redirect.
3 years ago
Alexandre Flament aedf03c0f7 Fix: activate raise_for_error by default
Fix commit d703119d3a :
Some engines need to parse the HTTP error but
raise_for_error is always set to False in the "request" function.
3 years ago
Alexandre Flament 3b7b852aa8 [fix] checker: minor fix about language detection 4 years ago
Alexandre Flament aa887eb375 [mod] checker : replace pycld3 by langdetect
pycld3 requires the native library cld3
langdetect is a pure python package
4 years ago
Alexandre Flament 67a1aab0d5 [fix] /stats/checker : remove the timestamp field when the checker is disabled 4 years ago
Alexandre Flament d473407ec9 [fix] checker: fix engine statistics
Without this commit, the URL /stats/errors shows percentages above 100% after the checker has run.
4 years ago
Alexandre Flament 912c7e975c [fix] checker: don't run the checker when uwsgi is not properly configured
Before this commit, even with the scheduler disabled, the checker was running
at least once for each uwsgi worker.
4 years ago
Alexandre Flament 7f0c508598 [fix] checker: fix typo unknown instead of unknow 4 years ago
Alexandre Flament 87bafbc32b [mod] checker: add status and timestamp to the result
for each engine: replace status by success
4 years ago
Alexandre Flament f3e1bd308f [mod] checker: minor adjustments on the default tests
the query "time" is convenient because most search engines will return some results,
but some engines in the general category will return documentation about the HTML tags <time> or <input type="time">
4 years ago
Alexandre Flament 45bfab77d0 [mod] checker: improve searx-checker command line
* output is unbuffered
* verbose mode describes the errors more precisely
4 years ago
Alexandre Flament 3a9f513521 [enh] checker: background check
See settings.yml for the options
SIGUSR1 signal starts the checker.
The result is available at /stats/checker
4 years ago
Markus Heiser 9c581466e1 [fix] do not colorize output on dumb terminals
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
4 years ago
Alexandre Flament 8cbc9f2d58 [enh] add checker 4 years ago