whoogle-search

Commit Graph

Author	SHA1	Message	Date
fiestasiesta	7041b43db9	Add time constraint to search options (#888) Introduces the ability to refine searches by time period: - Past hour - Past 24 hours - Past week - Past month - Past year Co-authored-by: Ben Busby <contact@benbusby.com>	1 year ago
Ben Busby	7a852aa876	Allow HTTP-exclusive proxies for all requests Proxies that only support HTTP were causing request timeouts due to an invalid upgrade to HTTPS when creating the request. This update restores the ability to have an HTTP-only proxy for all requests. Fixes #906	1 year ago
João	e99db8db26	Add country and interface lang to autocomplete (#866)	2 years ago
Kian-Meng Ang	2a8519be30	Fix typos [skip ci] (#813)	2 years ago
MadcowOG	03eeb3fad1	Strip newlines when parsing tor password (#801) When parsing control.conf or password file, a newline character could cause Authentication Errors.	2 years ago
jan Anja	194b2eae74	Fix a crash with protected Tor control port (#785)	2 years ago
MadcowOG	c9ee9dcc8b	Tor password authentication (#746) Added password authentication for tor control port. For user configuration of access to tor control port. This file should be heavily restricted in file system. Co-authored-by: MadcowOG <madcowog@Arch-Main.localdomain>	2 years ago
Ben Busby	2a0ad8796c	Switch to defusedxml for xml parsing xml.etree.ElementTree.fromstring is considered insecure, see: https://docs.python.org/3/library/xml.etree.elementtree.html The defusedxml package contains several Python-only workarounds and fixes for denial of service and other vulnerabilities in Python's XML libraries: https://github.com/tiran/defusedxml Fixes #670	2 years ago
Ben Busby	f4b65be876	Catch invalid XML in suggestion response As reported in #593, the XML response body returned for search suggestions can apparently contain invalid XML elements. This catches the error and returns an empty suggestion list instead of erroring. Fixes #593	2 years ago
Ben Busby	8c92b381a2	Remove default country param The country URL param ('gl') is no longer set to 'US' by default, and is omitted from the search entirely unless explicitly set by the user. This change was made in an attempt to cut back on the number of captchas experienced by certain users self-hosting who experienced a decreased amount of captchas when this configuration setting was removed. Fixes #558	2 years ago
Ben Busby	634d179568	Use farside.link for frontend alternatives in results (#560) * Integrate Farside into Whoogle When instances are ratelimited (when a captcha is returned instead of the user's search results) the user can now hop to a new instance via Farside, a new backend service that redirects users to working instances of a particular frontend. In this case, it presents a user with a Farside link to a new Whoogle (or Searx) instance instead, so that the user can resume their search. For the generated Farside->Whoogle link, the generated link includes the user's current Whoogle configuration settings as URL params, to ensure a more seamless transition between instances. This doesn't translate to the Farside->Searx link, but potentially could with some changes. * Expand conversion of config<->url params Config settings can now be translated to and from URL params using a predetermined set of "safe" keys (i.e. config settings that easily translate to URL params). * Allow jumping instances via Farside when ratelimited When instances are ratelimited (when a captcha is returned instead of the user's search results) the user can now hop to a new instance via Farside, a new backend service that redirects users to working instances of a particular frontend. In this case, it presents a user with a Farside link to a new Whoogle (or Searx) instance instead, so that the user can resume their search. For the generated Farside->Whoogle link, the generated link includes the user's current Whoogle configuration settings as URL params, to ensure a more seamless transition between instances. This doesn't translate to the Farside->Searx link, but potentially could with some changes. Closes #554 Closes #559	2 years ago
Ben Busby	10a15e06e1	Fix incorrect request type for image searches Previously had hardcoded POST requests for all requests that didn't use the header template (which currently is only the image tab). Also refactored how the Filter class works. It now requires a valid Config model to be provided, which is then set up as a class var that the filtering functions can use as needed, rather than setting specific values from the config as individual values (which was confusing and sloppy). Fixes #561	2 years ago
Ben Busby	3c06519130	Use 'gl' search param to set country This switches the param used for the "country" config setting from "cr" (which only filters results by the country the result is hosted in) to "gl" (which overrides server/hosting location and produces results that are more accurate for the user's current country). Before this change, the country config setting was (imo) pretty useless. Allowing a user to override an instance's hosting location with their preferred country though is way more useful, especially for public instances that are hosted in a different country than the user. Closes #544	3 years ago
Joao A. Candido Ramos	1f18e505ab	Include "chips" param in image search (#534) "chips" is used in image tabs to pass the optional "filter" to add to the given search term Fixes #299	3 years ago
Ben Busby	e93507f148	Catch connection error during Tor validation step Validation of the Tor connection occasionally fails with a ConnectionError from requests, which was previously uncaught. This is now handled appropriately (error message shown and connection dropped). Fixes #532	3 years ago
Ben Busby	b96e3a0acb	Make base search url a member of the request class Since the request class is loaded prior to values being read from the user's dotenv, the WHOOGLE_RESULTS_PER_PAGE var wasn't being used for searches. This moves the definition of the base search url to be initialized in the request class to address this issue. Fixes #497	3 years ago
DUO Labs	5a05bfb6de	Allow setting number of results per page (#486) Add `WHOOGLE_RESULTS_PER_PAGE` var, allowing users to specify the number of results per page. The default is 10.	3 years ago
Vansh Comar	5118ddb8b8	Allow setting "Accept-Language" header (#483) Closes #445	3 years ago
Ben Busby	b21b4f4f57	Skip parsing user agent if absent from request	3 years ago
Ben Busby	44b0fe519c	Revert changes to default language config A recent issue brought up a good point about how the latest changes to setting default language to english break functionality for bilingual users. The change was likely not the best solution for users who were being affected by IP geolocation on their instances -- the right solution for that would be to configure the interface/search language to their preference instead.	3 years ago
Ben Busby	e7a604d428	Fix handling of http (vs https) proxy creation The requests library requires both 'http' and 'https' values in any included proxy dict, and whoogle was previously copying the http proxy to https for simplicity. The assumption was that if the underlying request wasn't able to connect via https, it would default to http (otherwise why have the requirement to specify both?) This led to connectivity issues for users with http only proxies as of the latest urllib and requests package versions, which are a lot more strict with connections over https. With the latest versions, if an https connection cannot be made, the library returns an error. As a result, the new proxy dict must look something like this for plain http proxies: {'http': 'http://domain.tld:port', 'https': 'http://domain.tld:port'} where both http and https are identical, but both are still required.	3 years ago
Ben Busby	614dceeb70	Add fallback interface/search lang + cleanup Since the interface language defaults to IP geolocation by google, the default language is now set to english. Still not sure if this is the best solution, but at least temporarily should clear up some confusion for users with instances deployed in countries outside of their own. Also performed some minor cleanup: - Updated name of strip_blocked_sites to clean_query - Added clean_query to list of jinja template functions - Ensured site block list doesn't contain duplicate filters	3 years ago
Ben Busby	43faaee77f	Hotfix: remove site filter for maps links The new site filter breaks links to Maps results, so filter.py needed to be updated to handle these links as a unique case. A new method was introduced to easily remove any "-site:..." filters from the query, which is now also used to format queries in the header template rather than manually removing the blocked site list within the template itself. Bumps version to 0.5.1 for releasing the bugfix Fixes #329	3 years ago
Joao A. Candido Ramos	448efb8f2a	Add "view image" functionality (#268) * add view image option * prevent whoogle links from opening in a new tab. * remove view image template on mobile requests * change loop values to be more robust to the number of images * Update app/templates/imageresults.html * fix "Basically the .cvifge class needs width: 100%; in order to expand the search input to fit the form width." * Update app/templates/imageresults.html * remove hardcoded string from template * Add view image config var to app.json * Add view image config var to whoogle.env Co-authored-by: jacr13 <ramos.joao@protonmail.com> Co-authored-by: Ben Busby <benbusby@protonmail.com>	3 years ago
bruvv	27b6d05b6a	Fix EU consent bug (#320) * Update request.py * Use current date to format EU consent cookie Co-authored-by: Ben Busby <benbusby@protonmail.com>	3 years ago
Ben Busby	c8da53d4b0	Block websites from search results via user config (#304) * Block websites in search results via user config Adds a new config field "Block" to specify a comma separated list of websites to block in search results. This is applied for all searches. * Add test for blocking sites from search results * Document WHOOGLE_CONFIG_BLOCK usage * Strip '-site:' filters from query in header template The 'behind the scenes' site filter applied for blocked sites was appearing in the query field when navigating between search categories (all -> images -> news, etc). This prevents the filter from appearing in all except "images", since the image category uses a separate header. This should eventually be addressed when the image page can begin using the standard whoogle header, but until then, the filter will still appear for image searches.	3 years ago
Ben Busby	a321d55f13	Hotfix: Send generic "Mozilla" in user agent Randomizing the "Mozilla" portion of the user agent changed the character encoding to GB2312. Setting it to plain "Mozilla" enforces UTF-8 encoding. Bump to version 0.4.1 for release of bug fix Fixes #267	3 years ago
Ben Busby	df0b7afa50	Switch to single Fernet key per session This moves away from the previous (messy) approach of using two separate keys for decrypting text and element URLs separately and regenerating them for new searches. The current implementation of sessions is not very reliable, which led to keys being regenerated too soon, which would break page navigation. Until that can be addressed, the single key per session approach should work a lot better. Fixes #250 Fixes #90	3 years ago
Ben Busby	8ad8e66d37	Improve static typing throughout repo Eventually this should be part of a separate mypy ci build, but right now it's just a general guideline. Future commits and PRs should be validated for static typing wherever possible. For reference, the testing commands used for this commit were: mypy --ignore-missing-imports --pretty --disallow-untyped-calls app/ mypy --ignore-missing-imports --pretty --disallow-untyped-calls test/	3 years ago
Ben Busby	f8dfc78539	Improve naming of _utils files, update fn/class doc The app/utils/_utils weren't named very well, and all have been updated to have more accurate names. Function and class documentation for the utils have been updated as well, as part of the effort to improve overall documentation for the project.	3 years ago
Ben Busby	fdd4ee590f	Hotfix: Set EU consent cookie to pending for all requests See discussion on #243	3 years ago
Ben Busby	440c4e9c50	Remove lxml dependency The lxml dependency in the project was fairly unnecessary, and made the initial build time for the project considerably slower. This replaces all instances of lxml with either the default html parser (for bs4 constructors) or the built in xml.etree package (for search suggestion parsing).	3 years ago
Ben Busby	375f4ee9fd	PEP-8: Fix formatting issues, add CI workflow (#161) Enforces PEP-8 formatting for all python code Adds a github action build for checking pep8 formatting using pycodestyle	3 years ago
Ben Busby	0ef098069e	Add tor and http/socks proxy support (#137) * Add tor and http/socks proxy support Allows users to enable/disable tor from the config menu, which will forward all requests through Tor. Also adds support for setting environment variables for alternative proxy support. Setting the following variables will forward requests through the proxy: - WHOOGLE_PROXY_USER (optional) - WHOOGLE_PROXY_PASS (optional) - WHOOGLE_PROXY_TYPE (required) - Can be "http", "socks4", or "socks5" - WHOOGLE_PROXY_LOC (required) - Format: "<ip address>:<port>" See #30 * Refactor acquire_tor_conn -> acquire_tor_identity Also updated travis CI to set up tor * Add check for Tor socket on init, improve Tor error handling Initializing the app sends a heartbeat request to Tor to check for availability, and updates the home page config options accordingly. This heartbeat is sent on every request, to ensure Tor support can be reconfigured without restarting the entire app. If Tor support is enabled, and a subsequent request fails, then a new TorError exception is raised, and the Tor feature is disabled until a valid connection is restored. The max attempts has been updated to 10, since 5 seemed a bit too low for how quickly the attempts go by. * Change send_tor_signal arg type, update function doc send_tor_signal now accepts a stem.Signal arg (a bit cleaner tbh). Also added the doc string for the "disable" attribute in TorError. * Fix tor identity logic in Request.send * Update proxy init, change proxyloc var name Proxy is now only initialized if both type and location are specified, as neither have a default fallback and both are required. I suppose the type could fall back to http, but seems safer this way. Also refactored proxyurl -> proxyloc for the runtime args in order to match the Dockerfile args. * Add tor/proxy support for Docker builds, fix opensearch/init The Dockerfile is now updated to include support for Tor configuration, with a working torrc file included in the repo. An issue with opensearch was fixed as well, which was uncovered during testing and was simple enough to fix here. Likewise, DDG bang gen was updated to only ever happen if the file didn't exist previously, as testing with the file being regenerated every time was tedious. * Add missing "@" for socks proxy requests	4 years ago
Ben Busby	975ece8cd0	Privacy respecting alternatives in results view (#106) Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration. Verbatim search and option to ignore search autocorrect are now supported as well. Also cleaned up the javascript side of whoogle config so that it now uses arrays of available fields for parsing config values instead of manually assigning each one to a variable. This doesn't include support for Google Maps -> Open Street Maps, that seems a bit more involved than the social media redirects were, so it should likely be a separate effort.	4 years ago
Joao A. Candido Ramos	bf4bf1ff2c	Split interface and results language config (#89) Adding support to choose separately the language of search and the one for the interface (allowing a default given by google). Co-authored-by: Joao <ramos.joao@protonmail.com>	4 years ago
Ben Busby	f86a44b637	Removed no-cache enforcement, minor styling/formatting improvements	4 years ago
Ben Busby	4324fcd8f8	Added better multilingual support, updated filter Results page now includes method for switching to "All Languages" from whichever language is specified as the primary in the config (see #74). Also removes the non-Whoogle links from the page footer, leaving only the page navigation controls Added support for the date range filter on the results page, though I'd still recommend using the ":past <unit>" query instead.	4 years ago
Ben Busby	b6fb4723f9	Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune ''). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style	4 years ago
Ben Busby	21012f5265	Feature: autocomplete/search suggestions (#72) Basic autocomplete/search suggestion functionality added * Adds new GET and POST routes for '/autocomplete' that accept a string query and returns an array of suggestions * Adds new autoscript.js file for handling queries on the main page and results view * Updated requests class to include autocomplete method * Updated opensearch template to handle search suggestions * Added header template to allow for autocomplete on results view * Updated readme to mention autocomplete feature	4 years ago
Ben Busby	09c53b52af	Feature: country and safe search config options (#71) * Added country and safe search config options * Updated handling of parser error in results test * Improved handling of default country * Added 1px empty gif fallback as a replacement for images that fail to load	4 years ago
Ben Busby	a11ceb0a57	Feature: language config (#27) * Added language configuration support Main page now has a dropdown for selecting preferred language of results. Refactored config to be its own model with language constants. * Added more language support Interface language is now updated using the "hl" arg Fixed chinese traditional and simplified values Updated decoding of characters to gb2312 * Updated to use conditional decoding dependent on language * Updated filter to not rely on valid config to work properly	4 years ago
Ben Busby	445019d204	Fixed RAM usage bug Pushing straight to master since this is an extremely simple fix, with a pretty large performance benefit. The Phyme library used for generating a User Agent rhyme was consuming an absolute unit of memory. Now that it's removed, it's using about 10x less memory, at the cost of User Agents being not as funny anymore.	4 years ago
Ben Busby	0300eab6df	Updated formatting and setup instructions Switched encoding from utf-8 to unicode-escape in an effort to support multiple languages besides English. Updated image results page formatting to fix bad image links (added TODO for adding full res image link for each image result). Updated README to include libcurl and libssl install instructions for manual setup.	4 years ago
Ben Busby	3e404cb524	Restructured valid params checking, added empty query redirect	4 years ago
Ben Busby	1cbe394e6f	Updated tests, fixed a few bugs Added opensearch routes test and individual tests for searching via GET and POST separately. Fixed incorrect assignment in gen_query.	4 years ago
Ben Busby	0c0ebb8917	Added POST search, encrypted query strings, refactoring The implementation of POST search support comes with a few benefits. The most apparent is the avoidance of search queries appearing in web server logs -- instead of the prior GET approach (i.e. /search?q=my+search+query), using POST requests with the query stored in the request body creates logs that simply appear as "/search". Since a lot of relative links are generated in the results page, I came up with a way to generate a unique key at run time that is used to encrypt any query strings before sending to the user. This benefits both regular text queries as well as fetching of image links and means that web logs will only show an encrypted string where a link or query string might slip through. Unfortunately, GET search requests still need to be supported, as it doesn't seem that Firefox (on iOS) supports loading search engines by their opensearch.xml file, but instead relies on manual entry of a search query string. Once this is updated, I'll probably remove GET request search support.	4 years ago
Ben Busby	4180aedd87	Added image proxying, refactored filter class Images were previously directly fetched from google search results, which was a potential privacy hazard. All image sources are now modified to be passed through shoogle's routing first, which will then fetch raw image data and pass it through to the user. Filter class was refactored to split the primary clean method into smaller, more manageable submethods.	4 years ago
Ben Busby	a7005c012e	Refactoring of user requests and routing Curl requests and user agent related functionality was moved to its own request class. Routes was refactored to only include strictly routing related functionality. Filter class was cleaned up (had routing/request related logic in here, which didn't make sense)	4 years ago

49 Commits (7041b43db96142015df8d2b0170f0f2318c57c31)