66 Commits

Author SHA1 Message Date
Markus Heiser
e8706fb738 [fix] engine & network issues / documentation and type annotations
This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:

- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia

Since the files have been touched anyway, the type annotations of the engine
modules have also been completed so that the type checker no longer reports
error messages.

Related and (partially) fixed issues:

- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-25 13:58:26 +02:00
Markus Heiser
e9afc4f8ce [mod] Startpage: reverse engineered & upgrade to data_type: traits_v1
One reason for the CAPTCHAs often seen on Startpage requests is the incomplete
requests SearXNG sends to startpage.com. This patch is a completely new
implementation of the ``request()`` function, reverse engineered from
Startpage's search form.  The new implementation:

- uses traits of data_type: traits_v1 and drops deprecated data_type: supported_languages
- adds time-range support
- adds safe-search support
- fixes searxng/searxng/issues 1884
- fixes searxng/searxng/issues 1081 --> improvements to avoid CAPTCHA

In preparation for more categories (News, Images, Videos ..) from Startpage, the
variable ``startpage_categ`` was set up.  The default value is ``web`` and other
categories from Startpage are not yet implemented.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Markus Heiser
61383edb27 [mod] Startpage: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Startpage engine.

.. note::

   Does not include migration of the request method from 'supported_languages'
   to 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 10:37:42 +01:00
Alexandre Flament
37addec69e search.suspended_time settings: bug fixes
* fix typo in settings.yml: replace suspend_times with suspended_times
* always use the delay defined in settings.yml:
  * HTTP status 402 and 403: read the value from settings.yml instead of using the hardcoded value of 1 day.
  * startpage engine: a CAPTCHA suspends the engine for one day instead of one week
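
The idea of reading the delay from the settings instead of hardcoding it can be sketched as a lookup with a fallback. The setting keys, the fallback value, and the function name below are assumptions for illustration, not the actual SearXNG code:

```python
# Hypothetical excerpt of a loaded settings.yml; the key names under
# "suspended_times" are assumptions for illustration only.
settings = {
    "search": {
        "suspended_times": {
            "SearxEngineAccessDenied": 86400,  # seconds (1 day)
            "SearxEngineCaptcha": 86400,
        }
    }
}

FALLBACK_SUSPENDED_TIME = 86400  # used only when the setting is missing


def get_suspended_time(error_name, cfg=settings):
    """Return the suspension delay in seconds configured for an error type."""
    times = cfg.get("search", {}).get("suspended_times", {})
    return times.get(error_name, FALLBACK_SUSPENDED_TIME)
```

With this shape, changing the delay only requires editing the settings, not the engine code.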
2023-01-28 10:24:14 +00:00
Alexandre FLAMENT
035bc507ec [fix] startpage engine 2022-10-14 18:27:53 +00:00
Markus Heiser
ba8959ad7c [fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 18:32:14 +02:00
Alexandre Flament
378b29be2f fix startpage: update XPath in _fetch_supported_languages 2022-03-19 14:16:37 +01:00
Alexandre Flament
f9271d595f [fix] startpage: workaround to use the startpage network
workaround for the issue #762
2022-01-15 22:56:34 +01:00
Markus Heiser
df238e944c [mod] startpage engine: add comment about Startpage's FFox add-on
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser
21e884f369 [fix] startpage engine: fetch CAPTCHA & issues related to PR-695
In case of a CAPTCHA, raise a SearxEngineCaptchaException and suspend for 7 days.
When get_sc_code() fails, raise a SearxEngineResponseException and suspend for 7
days.
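
A minimal sketch of the CAPTCHA case, assuming the CAPTCHA shows up as a redirect to a URL whose path contains "captcha". The exception class is a simplified stand-in for SearxEngineCaptchaException, and the function name and detection rule are hypothetical:

```python
from urllib.parse import urlparse

SUSPEND_SECONDS = 7 * 24 * 3600  # 7 days, as stated in the commit message


class CaptchaError(Exception):
    """Simplified stand-in for SearxEngineCaptchaException (hypothetical)."""

    def __init__(self, suspended_time=SUSPEND_SECONDS):
        super().__init__("CAPTCHA")
        self.suspended_time = suspended_time


def raise_on_captcha(response_url):
    """Raise CaptchaError when the response URL looks like a CAPTCHA page."""
    # Assumption: a CAPTCHA is served from a path containing "captcha".
    if "captcha" in urlparse(response_url).path:
        raise CaptchaError()
```

The caller can read ``suspended_time`` from the exception to decide how long to skip the engine.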

[1] https://github.com/searxng/searxng/pull/695

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser
2f4e567e90 [fix] Get an actual sc argument from startpage's home page.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser
1cbcddb3f7 [pylint] Startpage engine
Fix remarks from pylint

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:38 +01:00
Markus Heiser
f1f5e69c42 [fix] startpage engine - avoid captcha
Startpage has introduced new anti-scraping measures that make SearXNG instances
run into captchas:

1. some arguments have been removed and a new `sc` argument has been added.
2. search path changed from `do/search` to `sp/search`
3. POST request is no longer needed
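
The three changes above can be sketched as a plain GET URL builder. The base URL, the ``query`` parameter name, and the function name are assumptions for illustration; only the ``sp/search`` path and the ``sc`` argument come from the commit message:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.startpage.com/"  # assumption
SEARCH_PATH = "sp/search"  # new search path named in the commit message


def build_search_url(query, sc_code):
    """Build a GET URL for a search; no POST body is involved."""
    # "query" is an assumed parameter name; "sc" is the new argument.
    args = {"query": query, "sc": sc_code}
    return BASE_URL + SEARCH_PATH + "?" + urlencode(args)
```

For example, ``build_search_url("hello world", "abc123")`` yields a URL ending in ``sp/search?query=hello+world&sc=abc123``.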

Closes: https://github.com/searxng/searxng/issues/692
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10 11:22:12 +01:00
Martin Fischer
b02f762687 [enh] add more categories 2022-01-05 11:00:11 +01:00
Markus Heiser
3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-12-27 09:26:22 +01:00
Alexandre Flament
ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except for the documentation and the /config URL, this variable is not used.

This commit removes the variable definition in the engines and
sets the value according to the supported_languages length: False when the
length is 0, True otherwise.
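
The rule described above amounts to a one-line check. This is a minimal sketch with a hypothetical function name, not the code from the commit:

```python
def has_language_support(supported_languages):
    """Return False when no languages are listed, True otherwise."""
    return len(supported_languages) > 0
```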

Close #2485
2021-02-01 17:10:37 +01:00
Alexandre Flament
a4dcfa025c [enh] engines: add about variable
move meta information from comments to the about variable
so that the preferences and the documentation can show this information
2021-01-14 20:57:17 +01:00
lucky13820
fea8958e99
Fix StartPage result title showing the URL
Fix issue 2395, where the StartPage result title shows the URL. https://github.com/searx/searx/issues/2395
2020-12-16 13:54:14 -08:00
joshu9h
8260435c8b
[Fix] Startpage 2020-12-13 15:43:50 +01:00
Alexandre Flament
3038052c79 [mod] remove unused import
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url  # NOQA
so it is possible to easily remove all unused import using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
2020-11-14 14:11:02 +01:00
Alexandre Flament
2006eb4680 [mod] move extract_text, extract_url to searx.utils 2020-10-02 18:13:56 +02:00
Marc Abonce Seguin
41800835f9 fetch supported languages for startpage engine 2020-09-22 11:37:44 +02:00
Spühler Stefan
4f90fb6a92 [Fix] Startpage ValueError on Spanish date format
dateutil.parser.parse() does not know the Spanish date format, which
leads to a ValueError. Fixes #1870

Traceback (most recent call last):
  File "/usr/local/searx/searx/search.py", line 160, in search_one_http_request_safe
    search_results = search_one_http_request(engine, query, request_params)
  File "/usr/local/searx/searx/search.py", line 97, in search_one_http_request
    return engine.response(response)
  File "/usr/local/searx/searx/engines/startpage.py", line 102, in response
    published_date = parser.parse(date_string, dayfirst=True)
  File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 1358, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 649, in parse
    raise ValueError("Unknown string format:", timestr)
ValueError: ('Unknown string format:', '24 Ene 2013')
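
One way to sidestep the error above is to translate the Spanish month abbreviation before building the date. This is a minimal sketch using only the standard library; the function name and the month table are hypothetical, and it is not necessarily how the commit fixed the issue:

```python
import datetime

# Spanish month abbreviations mapped to month numbers (illustrative table).
SPANISH_MONTHS = {
    "ene": 1, "feb": 2, "mar": 3, "abr": 4, "may": 5, "jun": 6,
    "jul": 7, "ago": 8, "sep": 9, "oct": 10, "nov": 11, "dic": 12,
}


def parse_spanish_date(date_string):
    """Parse a 'day month-abbreviation year' string such as '24 Ene 2013'."""
    day, month, year = date_string.split()
    # Normalize case, drop a trailing dot, and keep the first three letters.
    month_number = SPANISH_MONTHS[month.lower().rstrip(".")[:3]]
    return datetime.date(int(year), month_number, int(day))
```

An unknown abbreviation raises a ``KeyError`` here, so a caller would still need to decide how to handle dates it cannot parse.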
2020-03-09 09:31:20 +01:00
Dalf
85b3723345 [mod] speed optimization
compile XPath only once
avoid redundant call to urlparse
get_locale(webapp.py): avoid useless call to request.accept_languages.best_match
2019-11-15 09:33:15 +01:00
Adam Tauber
ed1c1bdb04 [fix] pep8 2019-10-14 15:09:39 +02:00
Adam Tauber
77a70fe541 [fix] update startpage engine - closes #1601 2019-10-14 14:18:41 +02:00
Noémi Ványi
b63d645a52 Revert "remove 'all' option from search languages"
This reverts commit 4d1770398a.
2019-01-07 21:19:00 +01:00
Noémi Ványi
aeb6dab187
Merge branch 'master' into master 2019-01-04 22:14:40 +01:00
Michael Pfitzner
44ce51f0c5 restore startpage search results 2018-12-14 21:38:48 +01:00
dimqua
0d86ed9c7e update startpage.py 2018-12-11 21:45:47 +03:00
marc
4d1770398a remove 'all' option from search languages 2017-12-06 01:20:15 -06:00
Adam Tauber
52e615dede [enh] py3 compatibility 2017-05-15 12:02:30 +02:00
marc
f62ce21f50 [mod] fetch supported languages for several engines
utils/fetch_languages.py gets languages supported by each engine and
generates engines_languages.json with each engine's supported language.
2016-12-13 19:58:10 -06:00
marc
a11948c71b Add language support for more engines. 2016-12-13 19:32:43 -06:00
marc
149802c569 [enh] add supported_languages on engines and auto-generate languages.py 2016-12-13 19:32:00 -06:00
Adam Tauber
16bdc0baf4 [mod] do not escape html content in engines 2016-12-09 18:59:19 +01:00
stepshal
b3ab221b98 Fix anomalous backslash in string 2016-07-11 23:53:13 +07:00
Adam Tauber
bd22e9a336 [fix] pep8 compatibility 2016-01-18 12:47:31 +01:00
Thomas Pointhuber
4508c96667 [enh] fix content fetching, parse published date from description 2015-10-24 16:19:47 +02:00
Thomas Pointhuber
996c96ffff [fix] block ixquick search URLs 2015-08-24 11:31:30 +02:00
Thomas Pointhuber
23b9095cbf [fix] improve result handling of startpage engine 2015-08-24 11:28:55 +02:00
Cqoicebordel
f1c10f4fe4 Startpage's unit test 2015-02-06 17:31:10 +01:00
Cqoicebordel
b4b666e703 Flake8 2015-01-15 20:27:30 +01:00
Cqoicebordel
fa0330f0ff Fix startpage
Fix issue with unicode characters in startpage: we shouldn't urlencode them if we are using POST.
Should fix #169. @dimqua can you confirm?
2015-01-15 20:18:40 +01:00
Adam Tauber
c8be128e97 [mod] ignore startpage unicode errors 2015-01-09 11:21:46 +01:00
Adam Tauber
b1234ee889 [fix] startpage engine compatibility 2014-11-17 10:19:23 +01:00
Thomas Pointhuber
678a80f043 fix startpage engine and add comments
* add language support
* remove not required code
* improve google-ad detection (no false detection anymore, I hope)
* other improvements
2014-09-02 19:57:01 +02:00
Adam Tauber
111a86d355 [fix] html escape 2014-08-06 14:43:44 +02:00
asciimoo
7db4558de7 [mod][fix] startpage engine updates 2014-02-18 16:14:31 +01:00
asciimoo
c1d7d30b8e [mod] len() removed from conditions 2014-02-11 13:13:51 +01:00