Implements a fetch_traits function for the Google engines.
.. note::
Does not include migration of the request methode from 'supported_languages'
to 'traits' (EngineTraits) object!
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The google news are in a rework, the content area of a news item has been
removed.
Closes: https://github.com/searxng/searxng/issues/1790
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Most engines that support languages (and regions) use the Accept-Language from
the WEB browser to build a response that fits to the language (and region).
- add new engine option: send_accept_language_header
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking. To get the consent
from the user, google-news requests are redirected to confirm and get a CONSENT
Cookie from https://consent.google.de/s?continue=...
This patch adds a CONSENT Cookie to the google-news request to avoid
redirection.
The behavior of the CONTENTS cookies over all google engines seems similar but
the pattern is not yet fully clear to me, here are some random samples from my
analysis ..
Using common google search from different domains::
google.com: CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
google.fr: CONSENT=YES+srp.gws-{{date}}-0-RC2.fr+FX+826
When searching about videos (google-videos)::
google.es: CONSENT=YES+srp.gws-{{date}}-0-RC2.es+FX+076
google.de: CONSENT=YES+srp.gws-{{date}}-0-RC2.de+FX+171
Google news has only one domain for all languages::
news.google.com: CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
Using google-scholar search from different domains::
scholar.google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
scholar.google.fr: does not use such a cookie / did not ask the user
scholar.google.es: does not use such a cookie / did not ask the user
Interim summary:
Pattern is unclear and I won't apply the CONSENT cookie to all google engines.
More experience is need before we generalize the CONSENT cookies over all
google engines.
Related:
- e9a6ab401 [fix] youtube - send CONSENT Cookie to not be redirected
- https://github.com/benbusby/whoogle-search/issues/311
- https://github.com/benbusby/whoogle-search/issues/243
[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Since we added
- 1c67b6aec [enh] google engine: supports "default language"
there is a KeyError: 'hl in request,error pattern::
ERROR:searx.searx.search.processor.online:engine google news : exception : 'hl'
Traceback (most recent call last):
File "searx/search/processors/online.py", line 144, in search
search_results = self._search_basic(query, params)
File "searx/search/processors/online.py", line 118, in _search_basic
self.engine.request(query, params)
File "searx/engines/google_news.py", line 97, in request
if lang_info['hl'] == 'en':
KeyError: 'hl'
Closes: https://github.com/searxng/searxng/issues/154
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Same behaviour behaviour than Whoogle [1]. Only the google engine with the
"Default language" choice "(all)"" is changed by this patch.
When searching for a locate place, the result are in the expect language,
without missing results [2]:
> When a language is not specified, the language interpretation is left up to
> Google to decide how the search results should be delivered.
The query parameters are copied from Whoogle. With the ``all`` language:
- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add a ``Accept-Language`` HTTP header.
The new signature of function ``get_lang_info()`` is:
lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)
Argument ``supported_any_language`` is True for google.py and False for the other
google engines. With this patch the function now returns:
- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
- ``lang_info['subdomain']``
- ``lang_info['country']``
- ``lang_info['language']``
[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
The language_support variable is set to True by default,
and set to False in only 5 engines.
Except the documentation and the /config URL, this variable is not used.
This commit remove the variable definition in the engines, and
set value according to supported_languages length: False when the length is 0,
True otherwise.
Close#2485
BTW: make the engines ready for search.checker:
- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url # NOQA
so it is possible to easily remove all unused import using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
Add match_language function in utils to match any user given
language code with a list of engine's supported languages.
Also add language_aliases dict on each engine to translate
standard language codes into the custom codes used by the engine.