Commit Graph

69 Commits (72be98e12f5b8454f03cf3cb44a920fce75d4f7b)

Author SHA1 Message Date
Markus Heiser dbed8da284 [fix] startpage engine: XPath expressions adapted for new HTML layout
Startpage has changed its HTML layout: classes like ``w-gl__result__main`` no
longer exist, and the structure of the result items has changed slightly.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
5 months ago
Markus Heiser 8205f170ff [mod] pylint all engines without PYLINT_SEARXNG_DISABLE_OPTION
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
6 months ago
Markus Heiser 3829c253ff [mod] add option max_page to bing, brave, qwant, startpage & mojeek
[1] https://github.com/searxng/searxng/issues/2982#issuecomment-1808975780

Reported-by: @Damaj301damaj-lol [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
10 months ago
Markus Heiser e8706fb738 [fix] engine & network issues / documentation and type annotations
This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:

- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia

Since the files were being touched anyway, the type annotations of the engine
modules have also been completed, so that the type checker no longer reports
errors.

Related and (partially) fixed issues:

- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
1 year ago
Markus Heiser e9afc4f8ce [mod] Startpage: reverse engineered & upgraded to data_type: traits_v1
One reason for the frequent CAPTCHAs on Startpage requests is the incomplete
requests SearXNG sends to startpage.com. This patch is a completely new
implementation of the ``request()`` function, reverse engineered from
Startpage's search form.  The new implementation:

- uses traits of data_type: traits_v1 and drops the deprecated data_type: supported_languages
- adds time-range support
- adds safe-search support
- fixes searxng/searxng/issues 1884
- fixes searxng/searxng/issues 1081 --> improvements to avoid CAPTCHA

In preparation for more categories (News, Images, Videos ..) from Startpage, the
variable ``startpage_categ`` was set up.  The default value is ``web`` and other
categories from Startpage are not yet implemented.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2 years ago
Markus Heiser 61383edb27 [mod] Startpage: fetch engine traits (data_type: supported_languages)
Implements a fetch_traits function for the Startpage engine.

.. note::

   Does not include migration of the request method from 'supported_languages'
   to the 'traits' (EngineTraits) object!

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2 years ago
Alexandre Flament 37addec69e search.suspended_time settings: bug fixes
* fix typo in settings.yml: replace suspend_times by suspended_times
* always use delay defined in settings.yml:
  * HTTP status 402 and 403: read the value from settings.yml instead of using the hardcoded value of 1 day.
  * startpage engine: a CAPTCHA suspends the engine for one day instead of one week
2 years ago
Alexandre FLAMENT 035bc507ec [fix] startpage engine 2 years ago
Markus Heiser ba8959ad7c [fix] typos / reported by @kianmeng in searx PR-3366
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2 years ago
Alexandre Flament 378b29be2f fix startpage: update XPath in _fetch_supported_languages 3 years ago
Alexandre Flament f9271d595f [fix] startpage: workaround to use the startpage network
workaround for the issue #762
3 years ago
Markus Heiser df238e944c [mod] startpage engine: add comment about Startpage's FFox add-on
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Markus Heiser 21e884f369 [fix] startpage engine: fetch CAPTCHA & issues related to PR-695
In case of a CAPTCHA, raise a SearxEngineCaptchaException and suspend for 7 days.
When get_sc_code() fails, raise a SearxEngineResponseException and suspend for 7
days.

[1] https://github.com/searxng/searxng/pull/695

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
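
The suspend-on-CAPTCHA behaviour described in commit 21e884f369 could be sketched roughly as follows. This is only an illustration under stated assumptions: the two exception classes are local stand-ins named after the ones in the commit message, and ``looks_like_captcha`` with its path check is hypothetical, not the engine's actual detection logic.

```python
SEVEN_DAYS = 7 * 24 * 3600  # suspension time in seconds


class SearxEngineResponseException(Exception):
    """Local stand-in: the engine returned an unusable response."""

    def __init__(self, message="response error", suspended_time=SEVEN_DAYS):
        super().__init__(message)
        self.suspended_time = suspended_time


class SearxEngineCaptchaException(SearxEngineResponseException):
    """Local stand-in: the engine answered with a CAPTCHA."""

    def __init__(self, message="CAPTCHA", suspended_time=SEVEN_DAYS):
        super().__init__(message, suspended_time)


def looks_like_captcha(url_path):
    # Hypothetical check; the real engine may detect CAPTCHAs differently.
    return "captcha" in url_path


def check_response(url_path, sc_code):
    """Raise an exception that tells the caller how long to suspend the engine."""
    if looks_like_captcha(url_path):
        raise SearxEngineCaptchaException()
    if not sc_code:
        # Mirrors the "get_sc_code() fails" case from the commit message.
        raise SearxEngineResponseException("could not obtain sc code")
```

A caller that catches either exception can read ``suspended_time`` to decide how long the engine stays disabled.
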
Markus Heiser 2f4e567e90 [fix] Get an actual `sc` argument from startpage's home page.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Markus Heiser 1cbcddb3f7 [pylint] Startpage engine
Fix remarks from pylint

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Markus Heiser f1f5e69c42 [fix] startpage engine - avoid captcha
Startpage has introduced new anti-scraping measures that make SearXNG instances
run into captchas:

1. some arguments have been removed and a new `sc` argument has been added.
2. search path changed from `do/search` to `sp/search`
3. POST request is no longer needed

Closes: https://github.com/searxng/searxng/issues/692
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
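
The three changes listed in commit f1f5e69c42 amount to sending a plain GET request to the new path with the `sc` argument attached. A minimal sketch, assuming a base URL of ``https://www.startpage.com`` and a query parameter named ``query`` (both are assumptions made for illustration; only the `sp/search` path and the `sc` argument come from the commit message):

```python
from urllib.parse import urlencode

# Assumed base URL; only the `sp/search` path is taken from the commit message.
SEARCH_URL = "https://www.startpage.com/sp/search"


def build_search_url(query, sc_code):
    """Build a GET URL carrying the search terms and the `sc` argument."""
    # The parameter name `query` is an assumption made for this sketch.
    args = {"query": query, "sc": sc_code}
    return SEARCH_URL + "?" + urlencode(args)
```

Because the arguments travel in the URL, no POST body is needed.
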
Martin Fischer b02f762687 [enh] add more categories 3 years ago
Markus Heiser 3d96a9839a [format.python] initial formatting of the python code
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Alexandre Flament ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Apart from the documentation and the /config URL, this variable is not used.

This commit removes the variable definition from the engines and sets the
value according to the length of supported_languages: False when the length
is 0, True otherwise.

Close #2485
4 years ago
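
The rule stated in commit ca93a01844 is small enough to write out. The helper name below is hypothetical; the commit only describes the rule, not where or how it is implemented.

```python
def has_language_support(supported_languages):
    """Return False when no supported languages are known, True otherwise."""
    return len(supported_languages) > 0
```
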
Alexandre Flament a4dcfa025c [enh] engines: add about variable
move meta information from the comment to the about variable
so that the preferences and the documentation can show this information
4 years ago
lucky13820 fea8958e99
Fix the StartPage result title showing the URL
Fixes issue 2395, where the StartPage result title shows the URL: https://github.com/searx/searx/issues/2395
4 years ago
joshu9h 8260435c8b
[Fix] Startpage 4 years ago
Alexandre Flament 3038052c79 [mod] remove unused import
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url  # NOQA
so that all unused imports can easily be removed with autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
4 years ago
Alexandre Flament 2006eb4680 [mod] move extract_text, extract_url to searx.utils 4 years ago
Marc Abonce Seguin 41800835f9 fetch supported languages for startpage engine 4 years ago
Spühler Stefan 4f90fb6a92 [Fix] Startpage ValueError on Spanish date format
dateutil.parser.parse() does not know the Spanish date format, which
leads to a ValueError. Fixes #1870

Traceback (most recent call last):
  File "/usr/local/searx/searx/search.py", line 160, in search_one_http_request_safe
    search_results = search_one_http_request(engine, query, request_params)
  File "/usr/local/searx/searx/search.py", line 97, in search_one_http_request
    return engine.response(response)
  File "/usr/local/searx/searx/engines/startpage.py", line 102, in response
    published_date = parser.parse(date_string, dayfirst=True)
  File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 1358, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 649, in parse
    raise ValueError("Unknown string format:", timestr)
ValueError: ('Unknown string format:', '24 Ene 2013')
5 years ago
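
One defensive pattern for the failure shown in commit 4f90fb6a92 is to treat an unparsable date as missing instead of letting the ValueError abort the whole response. The sketch below is not necessarily the fix that commit applied. To stay self-contained it uses the standard library's ``datetime.strptime`` as a stand-in for ``dateutil``; with Python's default C locale it likewise rejects the Spanish abbreviation ``Ene``. The helper name and the date format are assumptions.

```python
from datetime import datetime


def parse_published_date(date_string, date_format="%d %b %Y"):
    """Return a datetime, or None when the string cannot be parsed."""
    try:
        return datetime.strptime(date_string, date_format)
    except ValueError:
        # Unknown format (for example a localized month name): skip the date.
        return None
```

A result whose date cannot be parsed can then be emitted without a published date.
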
Dalf 85b3723345 [mod] speed optimization
compile XPath only once
avoid redundant call to urlparse
get_locale(webapp.py): avoid useless call to request.accept_languages.best_match
5 years ago
Adam Tauber ed1c1bdb04 [fix] pep8 5 years ago
Adam Tauber 77a70fe541 [fix] update startpage engine - closes #1601 5 years ago
Noémi Ványi b63d645a52 Revert "remove 'all' option from search languages"
This reverts commit 4d1770398a.
6 years ago
Noémi Ványi aeb6dab187
Merge branch 'master' into master 6 years ago
Michael Pfitzner 44ce51f0c5 restore startpage search results 6 years ago
dimqua 0d86ed9c7e update startpage.py 6 years ago
marc 4d1770398a remove 'all' option from search languages 7 years ago
Adam Tauber 52e615dede [enh] py3 compatibility 7 years ago
marc f62ce21f50 [mod] fetch supported languages for several engines
utils/fetch_languages.py gets languages supported by each engine and
generates engines_languages.json with each engine's supported language.
8 years ago
marc a11948c71b Add language support for more engines. 8 years ago
marc 149802c569 [enh] add supported_languages on engines and auto-generate languages.py 8 years ago
Adam Tauber 16bdc0baf4 [mod] do not escape html content in engines 8 years ago
stepshal b3ab221b98 Fix anomalous backslash in string 8 years ago
Adam Tauber bd22e9a336 [fix] pep8 compatibility 9 years ago
Thomas Pointhuber 4508c96667 [enh] fix content fetching, parse published date from description 9 years ago
Thomas Pointhuber 996c96ffff [fix] block ixquick search URLs 9 years ago
Thomas Pointhuber 23b9095cbf [fix] improve result handling of startpage engine 9 years ago
Cqoicebordel f1c10f4fe4 Startpage's unit test 10 years ago
Cqoicebordel b4b666e703 Flake8 10 years ago
Cqoicebordel fa0330f0ff Fix startpage
Fix issue with unicode characters in startpage: we shouldn't urlencode them if we are using POST.
Should fix #169. @dimqua can you confirm?
10 years ago
Adam Tauber c8be128e97 [mod] ignore startpage unicode errors 10 years ago
Adam Tauber b1234ee889 [fix] startpage engine compatibility 10 years ago
Thomas Pointhuber 678a80f043 fix startpage engine and add comments
* add language support
* remove code that is not required
* improve google-ad detection (no false detection anymore, I hope)
* other improvements
10 years ago