Commit Graph

1113 Commits (c6a5cc019ae27b1892b38afd33090653d07f33be)

Author SHA1 Message Date
Markus Heiser 8efabd3ab7 [mod] core.ac.uk engine
- add to list of pylint scripts
- add debug log messages
- move API key int `settings.yml`
- improved readability
- add some metadata to results

Signed-off-by: Markus Heiser <markus@darmarit.de>
3 years ago
spongebob33 7528e38c8a add core.ac.uk engine 3 years ago
Alexandre Flament d01741c9a2
Merge pull request #15 from return42/add-springer
Add a search engine for Springer Nature
3 years ago
Pierre Chevalier a80bf1ba97 [enh] Add Springer Nature engine
Springer Nature is a global publisher dedicated to providing service to research
community [1] with official API [2].

To test this PR, first get your API key following this page:

   https://dev.springernature.com/signup

In searx/engines/springer.py at line 24, add this API key.  I left my own key,
commented out in the line aboce.  Feel free to use it, if needed.

[1] https://www.springernature.com/
[2] https://dev.springernature.com/
3 years ago
habsinn 41a2e3785e [enh] add engine using API from "The Art Institute of Chicago" 3 years ago
Markus Heiser e9a6ab4015 [fix] youtube - send CONSENT Cookie to not be redirected
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking.  To get the consent
from the user, youtube requests are redirected to confirm and get a CONSENT
Cookie from https://consent.youtube.com

This patch adds a CONSENT Cookie to the youtube request to avoid redirection.

[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Reported-by: https://github.com/searx/searx/issues/2774
3 years ago
Alexandre Flament c6d5605d27
Merge pull request #7 from searxng/metrics
Metrics
3 years ago
Alexandre Flament b7848e3422 [fix] searxng fix: sjp engine 3 years ago
Alexandre Flament 7acd7ffc02 [enh] rewrite and enhance metrics 3 years ago
Alexandre Flament aae7830d14 [mod] refactoring: processors
Report to the user suspended engines.

searx.search.processor.abstract:
* manages suspend time (per network).
* reports suspended time to the ResultContainer (method extend_container_if_suspended)
* adds the results to the ResultContainer (method extend_container)
* handles exceptions (method handle_exception)
3 years ago
Alexandre Flament 48720e20a8 Merge remote-tracking branch 'searx/master' 3 years ago
Noémi Ványi 8362257b9a
Merge pull request #2736 from plague-doctor/sjp
Add new engine: SJP - Słownik języka polskiego
3 years ago
Noémi Ványi e56323d3c8
Merge pull request #2759 from ypid/fix/typo
Fix grammar mistake in debug log output
3 years ago
Plague Doctor d275d7a35e Code refactoring. 3 years ago
Markus Heiser 062d589f86 [fix] xpath expressions to grap all items from bandcamp's response
I also found some items missing a thumbnail and I used text_extract for content
and title, to remove unneeded whitespaces.

BTW: added bandcamp's favicon

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Kyle Anthony Williams 4d3c399ee9 [feat] add bandcamp engine 3 years ago
Alexandre Flament d14994dc73 [httpx] replace searx.poolrequests by searx.network
settings.yml:

* outgoing.networks:
   * can contains network definition
   * propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
     keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
   * retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
   * local_addresses can be "192.168.0.1/24" (it supports IPv6)
   * support_ipv4 & support_ipv6: both True by default
     see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
   * either a full network description
   * either reference an existing network

* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
3 years ago
Robin Schneider dfc66ff0f0
Fix grammar mistake in debug log output 3 years ago
Alexandre Flament eaa694fb7d [enh] replace requests by httpx 3 years ago
Plague Doctor 599ff39ddf Fix conflicts 3 years ago
Plague Doctor 6631f11305 Add new engine: SJP 3 years ago
Plague Doctor 7035bed4ee Add new engine: Wordnik.com 3 years ago
Noémi Ványi 07f5edce3d Add Meilisearch engine
Website: https://www.meilisearch.com/
3 years ago
Alexandre Flament 725a69616b
Merge pull request #2681 from dalf/fix-wikipedia-title
[fix] wikipedia: remove HTML from the title
3 years ago
Noémi Ványi 9bb312c505 Remove duplicated key from dict in Semantic Scholar 3 years ago
Noémi Ványi f596f5767b fix Semantic Scholar engine 3 years ago
Adam Tauber 28286cf3f2 [fix] update seznam engine to be compatible with the new website 3 years ago
Alexandre Flament fcfcf662ff [fix] wikipedia: remove HTML from the title
fr.wikipedia.org (and it seems not other wikipedia websites),
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example)

The commit uses api_result['title']
3 years ago
Adam Tauber 0ba71c3644 [fix] make ina engine compatible with the new response json 3 years ago
Adam Tauber 5f450fda74 [enh] add year filter to duckduckgo 3 years ago
Adam Tauber fd737dc9d8 [fix] remove debug code 3 years ago
Alexandre Flament 38c210d746
[mod] soundcloud: faster initialization
The get_cliend_id() function:
* fetches https://soundcloud.com
* then fetches each referenced javascript URL to get the client id.

This commit fetches the javascript URLs in the reverse order: the client id is in the last javascript URL.
3 years ago
Adam Tauber 4c631ac6d0 [fix] remove debug code 3 years ago
Noémi Ványi 8158d8654a fix Microsoft Academic engine 3 years ago
Adam Tauber f97b4ff7b6 [fix] update youtube_noapi paging 3 years ago
Adam Tauber dd34ac396c
Merge pull request #2652 from kvch/solr-engine
Add Apache Solr engine
3 years ago
Alexandre Flament 1664258061
Merge pull request #2655 from return42/fix-imports
[fix] remove unused import from yahoo-news engine
3 years ago
Markus Heiser 6e1f1085ef [fix] remove unused import from yahoo-news engine
Signed-off-by: Markus Heiser <markus@darmarit.de>
3 years ago
Markus Heiser 3703ebb22a [drop] Acgsou engine - www.acgsou.com no longer exists
- https://www.acgsou.com/ acgsou.com is redirected to 36dm.club
- @rinpatch do not plan on maintaining the engine [1]

[1] https://github.com/searx/searx/pull/1283#issuecomment-798783585

Signed-off-by: Markus Heiser <markus@darmarit.de>
3 years ago
Noémi Ványi ff527e2681 Add Solr engine 3 years ago
Alexandre Flament 92dd5e245e
Merge pull request #2626 from mikeri/solidtorrents
Add Solid Torrents engine
3 years ago
Alexandre Flament a1a492baed
Merge pull request #2641 from dalf/disable_http_by_default
[mod] by default allow only HTTPS, not HTTP
3 years ago
Markus Heiser 96422e5c9f [fix] APKMirror engine - update xpath selectors and fix img_src
BTW: make the code slightly more readable

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Markus Heiser d2faea423a [fix] rewrite Yahoo-News engine
Many things have been changed since last review of this engine.  This patch fix
xpath selectors, implements suggestion and is a complete review / rewrite of the
engine.

Signed-off-by: Markus Heiser <markus@darmarit.de>
3 years ago
Alexandre Flament 99e0651cea [mod] by default allow only HTTPS, not HTTP
Related to https://github.com/searx/searx/pull/2373
3 years ago
Michael Ilsaas 5549d58de3 Add Solid Torrents engine 3 years ago
Adam Tauber 44f4a9d49a [enh] add ability to send engine data to subsequent requests 3 years ago
Markus Heiser 4845183128 [mod] don't dump traceback of SearxEngineResponseException on init
When initing engines a "SearxEngineResponseException" is logged very verbose,
including full traceback information:

    ERROR:searx.engines:yggtorrent engine: Fail to initialize
    Traceback (most recent call last):
      File "share/searx/searx/engines/__init__.py", line 293, in engine_init
        init_fn(get_engine_from_settings(engine_name))
      File "share/searx/searx/engines/yggtorrent.py", line 42, in init
        resp = http_get(url, allow_redirects=False)
      File "share/searx/searx/poolrequests.py", line 197, in get
        return request('get', url, **kwargs)
      File "share/searx/searx/poolrequests.py", line 190, in request
        raise_for_httperror(response)
      File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
        raise_for_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
        raise_for_cloudflare_captcha(resp)
      File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
        raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
    searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000

For SearxEngineResponseException this is not needed.  Those types of exceptions
can be a normal use case.  E.g. for CAPTCHA errors like shown in the example
above. It should be enough to log a warning for such issues:

    WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000

closes: #2612

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Markus Heiser d48e2e7b0b [enh] google scholar - python implementation of the engine
The old xpath configuration for google scholar did not work and is replaced by a
python implementation.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
3 years ago
Alexandre Flament f77983e174
Merge pull request #2602 from MarcAbonce/fix-bing-fetch-languages
Fix fetch_languages for Bing
3 years ago