Commit Graph

2663 Commits

Author SHA1 Message Date
Markus Heiser
d2faea423a [fix] rewrite Yahoo-News engine
Many things have been changed since last review of this engine.  This patch fix
xpath selectors, implements suggestion and is a complete review / rewrite of the
engine.

Signed-off-by: Markus Heiser <markus@darmarit.de>
2021-03-08 11:43:34 +01:00
Adam Tauber
44f4a9d49a [enh] add ability to send engine data to subsequent requests 2021-03-06 12:12:35 +01:00
Alexandre Flament
87f4cc4a9e
Merge pull request #2631 from searx/update_data_update_languages.py
Update searx.data - update_languages.py
2021-03-06 10:03:00 +01:00
Markus Heiser
4845183128 [mod] don't dump traceback of SearxEngineResponseException on init
When initing engines a "SearxEngineResponseException" is logged very verbose,
including full traceback information:

    ERROR:searx.engines:yggtorrent engine: Fail to initialize
    Traceback (most recent call last):
      File "share/searx/searx/engines/__init__.py", line 293, in engine_init
        init_fn(get_engine_from_settings(engine_name))
      File "share/searx/searx/engines/yggtorrent.py", line 42, in init
        resp = http_get(url, allow_redirects=False)
      File "share/searx/searx/poolrequests.py", line 197, in get
        return request('get', url, **kwargs)
      File "share/searx/searx/poolrequests.py", line 190, in request
        raise_for_httperror(response)
      File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
        raise_for_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
        raise_for_cloudflare_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
        raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
    searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000

For SearxEngineResponseException this is not needed.  Those types of exceptions
can be a normal use case.  E.g. for CAPTCHA errors like shown in the example
above. It should be enough to log a warning for such issues:

    WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000

closes: #2612

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-05 17:26:22 +01:00
Alexandre Flament
0165e14a7f
Merge pull request #2632 from searx/update_data_update_wikidata_units.py
Update searx.data - update_wikidata_units.py
2021-03-05 11:59:44 +01:00
Alexandre Flament
152f6fc1da
Merge pull request #2630 from searx/update_data_update_ahmia_blacklist.py
Update searx.data - update_ahmia_blacklist.py
2021-03-05 11:59:20 +01:00
dalf
1e8b846954 Update searx.data - update_currencies.py 2021-03-05 10:56:57 +00:00
dalf
2f8a708481 Update searx.data - update_wikidata_units.py 2021-03-05 10:56:49 +00:00
dalf
d9dc3376d0 Update searx.data - update_languages.py 2021-03-05 10:56:46 +00:00
dalf
2857473553 Update searx.data - update_ahmia_blacklist.py 2021-03-05 10:56:33 +00:00
Alexandre Flament
aac37f288f
Merge pull request #2593 from dalf/update-autocomplete
Update autocomplete
2021-03-04 10:51:09 +01:00
Alexandre Flament
63f17d2e4c [enh] autocomplete refactoring, autocomplete on external bangs 2021-03-01 19:12:32 +01:00
Markus Heiser
d48e2e7b0b [enh] google scholar - python implementation of the engine
The old xpath configuration for google scholar did not work and is replaced by a
python implementation.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-03-01 15:16:37 +01:00
Alexandre Flament
4fa1290c11 [fix] answers: don't crash when the query is an empty string 2021-03-01 10:52:39 +01:00
Alexandre Flament
e2fb500892
Merge pull request #2608 from return42/unittest2
[py2to3] use unittest from py3, remove unittest2 from py2
2021-03-01 10:05:38 +01:00
Alexandre Flament
0c663e25fc
Merge pull request #2604 from searx/update_data_firefox_version
Update searx.data - firefox_version
2021-03-01 10:03:39 +01:00
Alexandre Flament
f77983e174
Merge pull request #2602 from MarcAbonce/fix-bing-fetch-languages
Fix fetch_languages for Bing
2021-03-01 09:06:37 +01:00
GazoilKerozen
5f6ac3afa2
Add Freesound engine (#2596)
Add freesound engine with player.

Co-authored-by: Gazoil <maildeguzel@gmail.com>
2021-03-01 08:52:36 +01:00
Markus Heiser
3bae35940a [py2to3] use unittest from py3, remove unittest2 from py2
- unittest2 is a backport of the new features added to the unittest testing
  framework in Python 2.7

- unittest2 was only needed in py2 and can be dropped now

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-28 11:37:06 +01:00
Alexandre Flament
b05f4d0664
Merge pull request #2605 from searx/update_data_currencies
Update searx.data - currencies
2021-02-26 12:56:32 +01:00
Alexandre Flament
aec5188b51
Merge pull request #2606 from searx/update_data_wikidata_units
Update searx.data - wikidata_units
2021-02-26 12:55:51 +01:00
dalf
893b6e4901 Update searx.data - ahmia_blacklist 2021-02-26 08:31:15 +00:00
dalf
7b9005df31 Update searx.data - wikidata_units 2021-02-26 08:31:01 +00:00
dalf
4c8ae5b7ed Update searx.data - firefox_version 2021-02-26 08:30:45 +00:00
dalf
d2778b5efe Update searx.data - currencies 2021-02-26 08:30:45 +00:00
Marc Abonce Seguin
d6681fd33b remove articles number from engines_languages.json 2021-02-25 23:54:21 -07:00
Marc Abonce Seguin
9b6ffed061 fix fetch_languages for bing
Bing has a list of regions that it supports and some of these regions
may have more than one possible language.

In some cases, like Switzerland, these languages are always shown as
options, so there is no issue. But in other cases, like Andorra, Bing
will only show one language at the time, either the region's default or
the request's language if the latter is supported by that region.

For example, if the HTTP request is in French, Andorra will appear as
fr-AD but if the same page is requested in any other language Andorra
will appear as ca-AD.

This is specially a problem when Bing assumes that the request is in
English because it overrides enough language codes to make several major
languages like Arabic dissappear from the languages.py file.

To avoid that issue, I set the Accept-Language header to a language
that's only supported in one region to hopefully avoid these overrides.
2021-02-25 23:51:49 -07:00
Alexandre Flament
7c1847d5f2 [mod] add utils/fetch_external_bangs.py
Based on duckduckgo bangs
Store bangs on a trie to allow autocomplete (not in this commit)
2021-02-24 18:48:36 +01:00
Alexandre Flament
5f4a085fc4
Merge pull request #2595 from dalf/update-wikidata-units
[mod] update wikidata_units.json and fetch_wikidata_units.py
2021-02-23 17:22:37 +01:00
Alexandre Flament
46ca32c3cc [mod] update currencies.json and fetch_currencies.py
use a sparql request on wikidata to get the list of currencies.

currencies.json contains the translation for all supported searx languages.

Supersede #993
2021-02-23 16:42:28 +01:00
Alexandre Flament
93d1da4906 [mod] update wikidata_units.json and fetch_wikidata_units.py
The fetch_wikidata_units.py result won't change randomly.
See comments in the script.
2021-02-23 13:10:38 +01:00
Noémi Ványi
1be6ab2a91 Fix paging of Bing Images 2021-02-22 21:19:34 +01:00
datagram1
1d0a32a2c5 Added rumble.com video search engine. TODO video embedding.
Update rumble.py

some lines too long.

Disable Rumble engine

disabled : True

PEP8 fix

change line spacing
2021-02-20 12:48:56 +00:00
Alexandre Flament
44a6593c13
Merge pull request #2573 from unixfox/yggtorrent
update yggtorrent url + add it back
2021-02-16 08:22:07 +01:00
Emilien Devos
4b37e10dd9 fix yggtorrent url + add it back 2021-02-15 13:38:34 +01:00
Thorben Günther
fbbd4cc21f
Improve peertube searching
At the moment videos without a description are not shown - setting
default content to "" fixes this.
Another current bug is that thumbnails are not displayed. This is caused
by a double slash in the url. For this every trailing slash is now
stripped (for backwards compatibility) and the API response is correctly
parsed.
2021-02-13 19:47:33 +01:00
Alexandre Flament
45027765e3
Merge pull request #2566 from dalf/remove-yandex
[remove] yandex engine
2021-02-12 17:12:07 +01:00
Alexandre Flament
c22d4c764c [fix] duckduckgo engine: "!ddg !g" do not redirect to google
* searx understand "!ddg !g time" as : send "!g time" to DDG
* !g a DDG bang for Google: DDG return a HTTP redirect to Google

This commit adds a the allows_redirect param not to follow HTTP redirect.

The DDG engine returns a empty result as before without HTTP redirect.
2021-02-12 11:10:08 +01:00
Alexandre Flament
d76660463b
Merge pull request #2562 from dalf/mod-json-engine
[mod] json_engine: add content_html_to_text and title_html_to_text
2021-02-12 10:58:28 +01:00
Alexandre Flament
7dcf67a47a
Merge pull request #2565 from dalf/upd-wikipedia
[upd] wikipedia engine: return an empty result on query with illegal characters
2021-02-12 10:57:05 +01:00
Alexandre Flament
2b60d0d243
Merge pull request #2564 from dalf/fix-seznam
[fix] fix seznam engine
2021-02-12 10:56:53 +01:00
Alexandre Flament
7e83818879
Merge pull request #2560 from dalf/fix-duckduckgo
Fix duckduckgo
2021-02-12 10:56:40 +01:00
Alexandre Flament
63d6ccfbc2
Merge pull request #2557 from dalf/fix-raise_for_httperror
Fix: activate raise_for_error by default
2021-02-12 10:56:25 +01:00
Alexandre Flament
74c8b5606f
Merge pull request #2541 from return42/mediathekviewweb
[enh] add engine MediathekViewWeb (API)
2021-02-11 15:11:26 +01:00
Alexandre Flament
5d9db6c2f7 [remove] yandex engine 2021-02-11 14:28:06 +01:00
Alexandre Flament
35dd069402 [fix] fix seznam engine
no paging support
2021-02-11 12:53:19 +01:00
Alexandre Flament
7d6e69e2f9 [upd] wikipedia engine: return an empty result on query with illegal characters
on some queries (like an IT error message), wikipedia returns an HTTP error 400.
this commit returns an empty result instead of showing an error to the user.
2021-02-11 12:29:21 +01:00
Alexandre Flament
ff84a1af35 [mod] json_engine: add content_html_to_text and title_html_to_text
Some JSON API returns HTML in either in the HTML or the content.
This commit adds two new parameters to the json_engine:
content_html_to_text and title_html_to_text, False by default.

If True, then the searx.utils.html_to_text removes the HTML tags.

Update crossref, openairedatasets and openairepublications engines
2021-02-10 16:42:11 +01:00
Alexandre Flament
436d366448
Merge pull request #2544 from mrwormo/congresslibrary
[Engine] Add Library of Congress engine
2021-02-10 10:13:46 +01:00
Alexandre Flament
eafd27f42a
Merge pull request #2556 from dalf/fix-apk-mirror
[fix] fix apk_mirror engine
2021-02-10 10:12:37 +01:00