whoogle-search

mirror of https://github.com/benbusby/whoogle-search synced 2024-11-12 07:10:29 +00:00

Author	SHA1	Message	Date
Ben Busby	44b0fe519c	Revert changes to default language config A recent issue brought up a good point about how the latest changes to setting default language to english break functionality for bilingual users. The change was likely not the best solution for users who were being affected by IP geolocation on their instances -- the right solution for that would be to configure the interface/search language to their preference instead.	2021-06-06 13:39:06 -04:00
Ben Busby	e7a604d428	Fix handling of http (vs https) proxy creation The requests library requires both 'http' and 'https' values in any included proxy dict, and whoogle was previously copying the http proxy to https for simplicity. The assumption was that if the underlying request wasn't able to connect via https, it would default to http (otherwise why have the requirement to specify both?) This led to connectivity issues for users with http only proxies as of the latest urllib and requests package versions, which are a lot more strict with connections over https. With the latest versions, if an https connection cannot be made, the library returns an error. As a result, the new proxy dict must look something like this for plain http proxies: {'http': 'http://domain.tld:port', 'https': 'http://domain.tld:port'} where both http and https are identical, but both are still required.	2021-06-04 15:30:21 -04:00
Ben Busby	614dceeb70	Add fallback interface/search lang + cleanup Since the interface language defaults to IP geolocation by google, the default language is now set to english. Still not sure if this is the best solution, but at least temporarily should clear up some confusion for users with instances deployed in countries outside of their own. Also performed some minor cleanup: - Updated name of strip_blocked_sites to clean_query - Added clean_query to list of jinja template functions - Ensured site block list doesn't contain duplicate filters	2021-06-04 11:09:30 -04:00
Ben Busby	43faaee77f	Hotfix: remove site filter for maps links The new site filter breaks links to Maps results, so filter.py needed to be updated to handle these links as a unique case. A new method was introduced to easily remove any "-site:..." filters from the query, which is now also used to format queries in the header template rather than manually removing the blocked site list within the template itself. Bumps version to 0.5.1 for releasing the bugfix Fixes #329	2021-05-27 12:01:57 -04:00
Joao A. Candido Ramos	448efb8f2a	Add "view image" functionality (#268) * add view image option * prevent whoogle links from opening in a new tab. * remove view image template on mobile requests * change loop values to be more robust to the number of images * Update app/templates/imageresults.html * fix "Basically the .cvifge class needs width: 100%; in order to expand the search input to fit the form width." * Update app/templates/imageresults.html * remove hardcoded string from template * Add view image config var to app.json * Add view image config var to whoogle.env Co-authored-by: jacr13 <ramos.joao@protonmail.com> Co-authored-by: Ben Busby <benbusby@protonmail.com>	2021-05-21 11:19:45 -04:00
bruvv	27b6d05b6a	Fix EU consent bug (#320) * Update request.py * Use current date to format EU consent cookie Co-authored-by: Ben Busby <benbusby@protonmail.com>	2021-05-18 10:52:24 -04:00
Ben Busby	c8da53d4b0	Block websites from search results via user config (#304) * Block websites in search results via user config Adds a new config field "Block" to specify a comma separated list of websites to block in search results. This is applied for all searches. * Add test for blocking sites from search results * Document WHOOGLE_CONFIG_BLOCK usage * Strip '-site:' filters from query in header template The 'behind the scenes' site filter applied for blocked sites was appearing in the query field when navigating between search categories (all -> images -> news, etc). This prevents the filter from appearing in all except "images", since the image category uses a separate header. This should eventually be addressed when the image page can begin using the standard whoogle header, but until then, the filter will still appear for image searches.	2021-05-07 11:45:53 -04:00
Ben Busby	a321d55f13	Hotfix: Send generic "Mozilla" in user agent Randomizing the "Mozilla" portion of the user agent changed the character encoding to GB2312. Setting it to plain "Mozilla" enforces UTF-8 encoding. Bump to version 0.4.1 for release of bug fix Fixes #267	2021-04-08 09:43:41 -04:00
Ben Busby	df0b7afa50	Switch to single Fernet key per session This moves away from the previous (messy) approach of using two separate keys for decrypting text and element URLs separately and regenerating them for new searches. The current implementation of sessions is not very reliable, which led to keys being regenerated too soon, which would break page navigation. Until that can be addressed, the single key per session approach should work a lot better. Fixes #250 Fixes #90	2021-04-05 11:00:56 -04:00
Ben Busby	8ad8e66d37	Improve static typing throughout repo Eventually this should be part of a separate mypy ci build, but right now it's just a general guideline. Future commits and PRs should be validated for static typing wherever possible. For reference, the testing commands used for this commit were: mypy --ignore-missing-imports --pretty --disallow-untyped-calls app/ mypy --ignore-missing-imports --pretty --disallow-untyped-calls test/	2021-04-05 11:00:56 -04:00
Ben Busby	f8dfc78539	Improve naming of _utils files, update fn/class doc The app/utils/_utils weren't named very well, and all have been updated to have more accurate names. Function and class documentation for the utils have been updated as well, as part of the effort to improve overall documentation for the project.	2021-04-05 11:00:56 -04:00
Ben Busby	fdd4ee590f	Hotfix: Set EU consent cookie to pending for all requests See discussion on #243	2021-04-02 12:32:59 -04:00
Ben Busby	440c4e9c50	Remove lxml dependency The lxml dependency in the project was fairly unnecessary, and made the initial build time for the project considerably slower. This replaces all instances of lxml with either the default html parser (for bs4 constructors) or the built in xml.etree package (for search suggestion parsing).	2020-12-29 18:43:42 -05:00
Ben Busby	375f4ee9fd	PEP-8: Fix formatting issues, add CI workflow (#161) Enforces PEP-8 formatting for all python code Adds a github action build for checking pep8 formatting using pycodestyle	2020-12-17 16:06:47 -05:00
Ben Busby	0ef098069e	Add tor and http/socks proxy support (#137) * Add tor and http/socks proxy support Allows users to enable/disable tor from the config menu, which will forward all requests through Tor. Also adds support for setting environment variables for alternative proxy support. Setting the following variables will forward requests through the proxy: - WHOOGLE_PROXY_USER (optional) - WHOOGLE_PROXY_PASS (optional) - WHOOGLE_PROXY_TYPE (required) - Can be "http", "socks4", or "socks5" - WHOOGLE_PROXY_LOC (required) - Format: "<ip address>:<port>" See #30 * Refactor acquire_tor_conn -> acquire_tor_identity Also updated travis CI to set up tor * Add check for Tor socket on init, improve Tor error handling Initializing the app sends a heartbeat request to Tor to check for availability, and updates the home page config options accordingly. This heartbeat is sent on every request, to ensure Tor support can be reconfigured without restarting the entire app. If Tor support is enabled, and a subsequent request fails, then a new TorError exception is raised, and the Tor feature is disabled until a valid connection is restored. The max attempts has been updated to 10, since 5 seemed a bit too low for how quickly the attempts go by. * Change send_tor_signal arg type, update function doc send_tor_signal now accepts a stem.Signal arg (a bit cleaner tbh). Also added the doc string for the "disable" attribute in TorError. * Fix tor identity logic in Request.send * Update proxy init, change proxyloc var name Proxy is now only initialized if both type and location are specified, as neither have a default fallback and both are required. I suppose the type could fall back to http, but seems safer this way. Also refactored proxyurl -> proxyloc for the runtime args in order to match the Dockerfile args. * Add tor/proxy support for Docker builds, fix opensearch/init The Dockerfile is now updated to include support for Tor configuration, with a working torrc file included in the repo. An issue with opensearch was fixed as well, which was uncovered during testing and was simple enough to fix here. Likewise, DDG bang gen was updated to only ever happen if the file didn't exist previously, as testing with the file being regenerated every time was tedious. * Add missing "@" for socks proxy requests	2020-10-28 20:47:42 -04:00
Ben Busby	975ece8cd0	Privacy respecting alternatives in results view (#106) Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration. Verbatim search and option to ignore search autocorrect are now supported as well. Also cleaned up the javascript side of whoogle config so that it now uses arrays of available fields for parsing config values instead of manually assigning each one to a variable. This doesn't include support for Google Maps -> Open Street Maps, that seems a bit more involved than the social media redirects were, so it should likely be a separate effort.	2020-07-26 11:53:59 -06:00
Joao A. Candido Ramos	bf4bf1ff2c	Split interface and results language config (#89) Adding support to choose separately the language of search and the one for the interface (allowing a default given by google). Co-authored-by: Joao <ramos.joao@protonmail.com>	2020-06-27 14:23:17 -06:00
Ben Busby	f86a44b637	Removed no-cache enforcement, minor styling/formatting improvements	2020-06-11 12:14:57 -06:00
Ben Busby	4324fcd8f8	Added better multilingual support, updated filter Results page now includes method for switching to "All Languages" from whichever language is specified as the primary in the config (see #74). Also removes the non-Whoogle links from the page footer, leaving only the page navigation controls Added support for the date range filter on the results page, though I'd still recommend using the ":past <unit>" query instead.	2020-06-07 14:06:49 -06:00
Ben Busby	b6fb4723f9	Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune ''). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style	2020-06-02 12:54:47 -06:00
Ben Busby	21012f5265	Feature: autocomplete/search suggestions (#72) Basic autocomplete/search suggestion functionality added * Adds new GET and POST routes for '/autocomplete' that accept a string query and returns an array of suggestions * Adds new autoscript.js file for handling queries on the main page and results view * Updated requests class to include autocomplete method * Updated opensearch template to handle search suggestions * Added header template to allow for autocomplete on results view * Updated readme to mention autocomplete feature	2020-05-24 14:03:11 -06:00
Ben Busby	09c53b52af	Feature: country and safe search config options (#71) * Added country and safe search config options * Updated handling of parser error in results test * Improved handling of default country * Added 1px empty gif fallback as a replacement for images that fail to load	2020-05-23 14:27:23 -06:00
Ben Busby	a11ceb0a57	Feature: language config (#27) * Added language configuration support Main page now has a dropdown for selecting preferred language of results. Refactored config to be its own model with language constants. * Added more language support Interface language is now updated using the "hl" arg Fixed chinese traditional and simplified values Updated decoding of characters to gb2312 * Updated to use conditional decoding dependent on language * Updated filter to not rely on valid config to work properly	2020-05-12 17:15:53 -06:00
Ben Busby	445019d204	Fixed RAM usage bug Pushing straight to master since this is an extremely simple fix, with a pretty large performance benefit. The Phyme library used for generating a User Agent rhyme was consuming an absolute unit of memory. Now that it's removed, it's using about 10x less memory, at the cost of User Agents being not as funny anymore.	2020-05-12 00:45:56 -06:00
Ben Busby	0300eab6df	Updated formatting and setup instructions Switched encoding from utf-8 to unicode-escape in an effort to support multiple languages besides English. Updated image results page formatting to fix bad image links (added TODO for adding full res image link for each image result). Updated README to include libcurl and libssl install instructions for manual setup.	2020-05-03 19:32:47 -06:00
Ben Busby	3e404cb524	Restructured valid params checking, added empty query redirect	2020-04-29 18:53:58 -06:00
Ben Busby	1cbe394e6f	Updated tests, fixed a few bugs Added opensearch routes test and individual tests for searching via GET and POST separately. Fixed incorrect assignment in gen_query.	2020-04-28 18:59:33 -06:00
Ben Busby	0c0ebb8917	Added POST search, encrypted query strings, refactoring The implementation of POST search support comes with a few benefits. The most apparent is the avoidance of search queries appearing in web server logs -- instead of the prior GET approach (i.e. /search?q=my+search+query), using POST requests with the query stored in the request body creates logs that simply appear as "/search". Since a lot of relative links are generated in the results page, I came up with a way to generate a unique key at run time that is used to encrypt any query strings before sending to the user. This benefits both regular text queries as well as fetching of image links and means that web logs will only show an encrypted string where a link or query string might slip through. Unfortunately, GET search requests still need to be supported, as it doesn't seem that Firefox (on iOS) supports loading search engines by their opensearch.xml file, but instead relies on manual entry of a search query string. Once this is updated, I'll probably remove GET request search support.	2020-04-28 18:19:34 -06:00
Ben Busby	4180aedd87	Added image proxying, refactored filter class Images were previously directly fetched from google search results, which was a potential privacy hazard. All image sources are now modified to be passed through shoogle's routing first, which will then fetch raw image data and pass it through to the user. Filter class was refactored to split the primary clean method into smaller, more manageable submethods.	2020-04-27 20:21:36 -06:00
Ben Busby	a7005c012e	Refactoring of user requests and routing Curl requests and user agent related functionality was moved to its own request class. Routes was refactored to only include strictly routing related functionality. Filter class was cleaned up (had routing/request related logic in here, which didn't make sense)	2020-04-23 20:59:43 -06:00

30 Commits
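
The proxy handling described in commits e7a604d428 and 0ef098069e can be sketched roughly as follows. This is only an illustration under stated assumptions, not whoogle's actual code: the helper names build_proxy_dict and proxy_from_env are hypothetical, and only the WHOOGLE_PROXY_* variable names and the shape of the resulting dict are taken from the commit messages above.

```python
import os
from typing import Dict, Optional


def build_proxy_dict(proxy_type: str, proxy_loc: str,
                     user: Optional[str] = None,
                     password: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build a proxy dict with identical 'http' and
    'https' entries, as described in commit e7a604d428.

    proxy_type is "http", "socks4", or "socks5"; proxy_loc has the form
    "<ip address>:<port>" (both per commit 0ef098069e).
    """
    # Credentials are optional; when present they are joined to the
    # location with "@" (see "Add missing '@' for socks proxy requests").
    auth = f"{user}:{password}@" if user and password else ""
    proxy_url = f"{proxy_type}://{auth}{proxy_loc}"
    # Both keys point at the same URL, even for a plain http proxy.
    return {"http": proxy_url, "https": proxy_url}


def proxy_from_env() -> Optional[Dict[str, str]]:
    """Hypothetical helper: only build a proxy when both the type and the
    location are set, since neither has a default fallback."""
    proxy_type = os.environ.get("WHOOGLE_PROXY_TYPE")
    proxy_loc = os.environ.get("WHOOGLE_PROXY_LOC")
    if not proxy_type or not proxy_loc:
        return None
    return build_proxy_dict(proxy_type, proxy_loc,
                            os.environ.get("WHOOGLE_PROXY_USER"),
                            os.environ.get("WHOOGLE_PROXY_PASS"))
```

A dict built this way could be passed to the requests library through its proxies argument, for example requests.get(url, proxies=proxy_from_env()).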