whoogle-search

mirror of https://github.com/benbusby/whoogle-search synced 2024-11-04 18:00:25 +00:00

Author	SHA1	Message	Date
Vansh Comar	f04c7c5557	Support DDG style bangs with bang at the end (#503 ) DDG style bang searches can now have the bang (!) at the end of the search (i.e. "bologna w!" will now redirect to wikipedia just like "bologna !w" would)	2021-10-28 12:39:33 -06:00
DUO Labs	d8dcdc7455	Skip bolding search terms that are not alphanumeric (#496 ) Fixes #494	2021-10-27 10:50:21 -06:00
Ben Busby	591ed4a6d6	Use f-string in bold query regex by @DUOLabs333	2021-10-26 16:21:30 -06:00
Ben Busby	f154b5f2e2	PEP-8 formatting fix	2021-10-26 16:17:38 -06:00
Ben Busby	6decab5a51	Improve regex for bolding search terms Co-authored by @DUOLabs333	2021-10-26 16:15:24 -06:00
DUO Labs	2c9cf3ecc6	Bold search query in results (#487 ) This modifies the search result page by bold-ing all appearances of any word in the original query. If portions of the query are in quotes (i.e. "ice cream"), only exact matches of the sequence of words will be made bold. Co-authored-by: Ben Busby <noreply+git@benbusby.com>	2021-10-26 14:59:23 -06:00
Ben Busby	8f70236403	Update domains used for scribe.rip replacements The levelup.gitconnected.com site is a Medium site that can also be replaced with scribe.rip whenever privacy respecting site alternatives are enabled in the config. Also modified how link descriptions are updated when that config is enabled (before it was missing replacements on quite a few descriptions).	2021-10-23 23:23:37 -06:00
Vansh Comar	771bf34ce9	Show client IP for "my ip" searches (#469 ) This introduces a new UI element for displaying the client IP address when a search for "my ip" is used. Note that this does not show the IP address seen by Google if Whoogle is deployed remotely. It uses `request.remote_addr` to display the client IP address in the UI, not the actual address of the server (which is what Google sees in requests sent from remote Whoogle instances).	2021-10-21 10:42:31 -06:00
Vansh Comar	79fb7531be	Implement scribe.rip replacement for medium.com results (#463 ) scribe.rip is a privacy respecting front end for medium.com. This feature allows medium.com results to be replaced with scribe.rip links, and works for both regular medium.com domains as well as user specific subdomains (i.e. user.medium.com). [scribe.rip website](https://scribe.rip) [scribe.rip source code](https://git.sr.ht/~edwardloveall/scribe) Co-authored-by: Ben Busby <noreply+git@benbusby.com>	2021-10-16 12:22:00 -06:00
Ben Busby	ff885e4fde	Disable autocomplete via WHOOGLE_AUTOCOMPLETE var Setting WHOOGLE_AUTOCOMPLETE to 0 now disables the autocomplete/search suggestion feature. Closes #462	2021-10-14 18:59:10 -06:00
rn83	f18400b1f1	Strip SKIP_PREFIX for SITE_ALTS only (#452 ) Domain prefixes (www, mobile, m) are now stripped for site alternatives only.	2021-10-11 14:25:21 -06:00
BlissOWL	f12b0e62c5	Make bang searches case insensitive (#438 ) Bang searches now ignore the capitalization of the operator Co-authored-by: Ben Busby <noreply+git@benbusby.com>	2021-09-27 19:39:58 -06:00
Ben Busby	68fdd55482	Use cache busting for css/js files On app init, short hashes are generated from file checksums to use for cache busting. These hashes are added into the full file name and used to symlink to the actual file contents. These symlinks are loaded in the jinja templates for each page, and can tell the browser to load a new file if the hash changes. This is only in place for css and js files, but can be extended in the future for other file types if needed.	2021-06-30 19:00:01 -04:00
Ben Busby	43faaee77f	Hotfix: remove site filter for maps links The new site filter breaks links to Maps results, so filter.py needed to be updated to handle these links as a unique case. A new method was introduced to easily remove any "-site:..." filters from the query, which is now also used to format queries in the header template rather than manually removing the blocked site list within the template itself. Bumps version to 0.5.1 for releasing the bugfix Fixes #329	2021-05-27 12:01:57 -04:00
Joao A. Candido Ramos	448efb8f2a	Add "view image" functionality (#268 ) * add view image option * prevent whoogle links from opening in a new tab. * remove view image template on mobile requests * change loop values to be more robust to the number of images * Update app/templates/imageresults.html * fix "Basically the .cvifge class needs width: 100%; in order to expand the search input to fit the form width." * Update app/templates/imageresults.html * remove hardcoded string from template * Add view image config var to app.json * Add view image config var to whoogle.env Co-authored-by: jacr13 <ramos.joao@protonmail.com> Co-authored-by: Ben Busby <benbusby@protonmail.com>	2021-05-21 11:19:45 -04:00
Ben Busby	c8da53d4b0	Block websites from search results via user config (#304 ) * Block websites in search results via user config Adds a new config field "Block" to specify a comma separated list of websites to block in search results. This is applied for all searches. * Add test for blocking sites from search results * Document WHOOGLE_CONFIG_BLOCK usage * Strip '-site:' filters from query in header template The 'behind the scenes' site filter applied for blocked sites was appearing in the query field when navigating between search categories (all -> images -> news, etc). This prevents the filter from appearing in all except "images", since the image category uses a separate header. This should eventually be addressed when the image page can begin using the standard whoogle header, but until then, the filter will still appear for image searches.	2021-05-07 11:45:53 -04:00
Ben Busby	50c888f9a7	Revert heroku app https upgrade fix	2021-04-05 11:00:56 -04:00
Ben Busby	df0b7afa50	Switch to single Fernet key per session This moves away from the previous (messy) approach of using two separate keys for decrypting text and element URLs separately and regenerating them for new searches. The current implementation of sessions is not very reliable, which led to keys being regenerated too soon, which would break page navigation. Until that can be addressed, the single key per session approach should work a lot better. Fixes #250 Fixes #90	2021-04-05 11:00:56 -04:00
Ben Busby	ed4432f3f8	Hotfix: Upgrade heroku apps to https for all endpoints The previous implementation of the is_heroku check in search.needs_https() was implemented to only match URLs ending in '.herokuapp.com', and skipped upgrading to HTTPS for other endpoints.	2021-04-05 11:00:56 -04:00
Shimul	8a10efaa01	Allow setting environment variables in whoogle.env (#237 ) This allows the user to enable their preferred settings in a variety of ways, depending on their deployment preference. Values added to whoogle.env can be enabled using WHOOGLE_DOTENV=1, in which case all values in the env var file will overwrite defaults or user provided settings. Co-authored-by: Ben Busby <benbusby@protonmail.com>	2021-04-05 11:00:56 -04:00
Ben Busby	8ad8e66d37	Improve static typing throughout repo Eventually this should be part of a separate mypy ci build, but right now it's just a general guideline. Future commits and PRs should be validated for static typing wherever possible. For reference, the testing commands used for this commit were: mypy --ignore-missing-imports --pretty --disallow-untyped-calls app/ mypy --ignore-missing-imports --pretty --disallow-untyped-calls test/	2021-04-05 11:00:56 -04:00
Ben Busby	083c3758a1	Return 503 if response is blocked by captcha Also added in a slight modification to the dark theme style, which should only apply the border radius in the header. Closes #226	2021-04-05 11:00:56 -04:00
Ben Busby	e5d1f6a292	Add healthcheck to Dockerfile See #184	2021-04-05 11:00:56 -04:00
Ben Busby	f8dfc78539	Improve naming of _utils files, update fn/class doc The app/utils/_utils weren't named very well, and all have been updated to have more accurate names. Function and class documentation for the utils have been updated as well, as part of the effort to improve overall documentation for the project.	2021-04-05 11:00:56 -04:00
Ben Busby	ecb7885a56	Allow bang operator anywhere in query Bang operator can now be placed anywhere in the query, to allow for peak efficiency in stream of consciousness querying (i.e. `big !reddit chungus` will search reddit for `big chungus`). Fixes #196	2021-04-05 11:00:56 -04:00
Ben Busby	64567a63ea	Ensure G logo doesn't appear in mobile img results Adds a separate check to remove all images sourced from www.gstatic.com, which is where the mobile logo in particular is coming from.	2021-04-05 11:00:56 -04:00
Ben Busby	6600d8580c	Add ability to redirect reddit.com to libredd.it (#180 ) * Adds the ability to redirect reddit.com to libredd.it using the existing "site alts" config setting. This adds the WHOOGLE_ALT_RD environment variable for optionally redirecting reddit links to libreddit (https://github.com/spikecodes/libreddit). * Include libreddit in home page site alt note	2021-04-05 11:00:56 -04:00
Ben Busby	329c38efb0	Hotfix: Enforce https in heroku opensearch template Heroku instances were using the base http url when formatting the opensearch.xml template. This adds a new routing utility, "needs_https", which can be used for determining if the url in question needs upgrading.	2021-01-23 14:50:30 -05:00
Ben Busby	440c4e9c50	Remove lxml dependency The lxml dependency in the project was fairly unnecessary, and made the initial build time for the project considerably slower. This replaces all instances of lxml with either the default html parser (for bs4 constructors) or the built in xml.etree package (for search suggestion parsing).	2020-12-29 18:43:42 -05:00
Ben Busby	375f4ee9fd	PEP-8: Fix formatting issues, add CI workflow (#161 ) Enforces PEP-8 formatting for all python code Adds a github action build for checking pep8 formatting using pycodestyle	2020-12-17 16:06:47 -05:00
Ben Busby	5b5c2588ed	Fix nojs lxml constructor The BeautifulSoup constructor in gen_nojs needed to explicitly set features='lxml' to silence a warning from the library. Also temporarily disabled the site alts test since the results are too unreliable. This should be moved to a unit test instead.	2020-12-11 19:21:32 -05:00
Ben Busby	6c429e6dd1	Allow setting site alts using environment vars (#155 ) * Add ability to configure site alts w/ env vars Site alternatives (i.e. twitter.com -> nitter.net) can now be configured using environment variables: WHOOGLE_ALT_TW='nitter.net' # twitter alt WHOOGLE_ALT_YT='invidio.us' # youtube alt WHOOGLE_ALT_IG='bibliogram.art/u' # instagram alt Updated testing to confirm results have been modified. * Add site alt vars to docker settings and readme	2020-12-05 17:01:21 -05:00
Ben Busby	2d0823b012	Hotfix: Remove mobile subdomain for invidious redirect See #151	2020-11-28 21:30:58 -05:00
Ben Busby	0afd59056f	Hotfix: update invidious url, remove www from link The invidious instance has been updated to invidious.snopyta.org, since this instance is more reliable and has more users according to instances.invidio.us All site alternative redirects now redirect without the 'www' subdomain, since most of the alternative sites don't have this subdomain set up.	2020-11-28 12:15:04 -05:00
Ben Busby	0d0f32d108	Hotfix: update ad filter for Portuguese config	2020-11-24 13:14:40 -05:00
Ben Busby	72cbc342af	Add ability to set temp config in search query Dark mode, country, interface language, and search language configs can now be set in the search query by appending each option as a url parameter. Supported args are: 'dark', 'lang_search', 'lang_interface', and 'ctry' Ex: /search?q=%s&dark=1&lang_search=lang_en... These config settings persist across page navigation and switching result type, but will be reset if the main search bar is used. See #144	2020-11-11 00:40:49 -05:00
Ben Busby	0ef098069e	Add tor and http/socks proxy support (#137 ) * Add tor and http/socks proxy support Allows users to enable/disable tor from the config menu, which will forward all requests through Tor. Also adds support for setting environment variables for alternative proxy support. Setting the following variables will forward requests through the proxy: - WHOOGLE_PROXY_USER (optional) - WHOOGLE_PROXY_PASS (optional) - WHOOGLE_PROXY_TYPE (required) - Can be "http", "socks4", or "socks5" - WHOOGLE_PROXY_LOC (required) - Format: "<ip address>:<port>" See #30 * Refactor acquire_tor_conn -> acquire_tor_identity Also updated travis CI to set up tor * Add check for Tor socket on init, improve Tor error handling Initializing the app sends a heartbeat request to Tor to check for availability, and updates the home page config options accordingly. This heartbeat is sent on every request, to ensure Tor support can be reconfigured without restarting the entire app. If Tor support is enabled, and a subsequent request fails, then a new TorError exception is raised, and the Tor feature is disabled until a valid connection is restored. The max attempts has been updated to 10, since 5 seemed a bit too low for how quickly the attempts go by. * Change send_tor_signal arg type, update function doc send_tor_signal now accepts a stem.Signal arg (a bit cleaner tbh). Also added the doc string for the "disable" attribute in TorError. * Fix tor identity logic in Request.send * Update proxy init, change proxyloc var name Proxy is now only initialized if both type and location are specified, as neither have a default fallback and both are required. I suppose the type could fall back to http, but seems safer this way. Also refactored proxyurl -> proxyloc for the runtime args in order to match the Dockerfile args. 
* Add tor/proxy support for Docker builds, fix opensearch/init The Dockerfile is now updated to include support for Tor configuration, with a working torrc file included in the repo. An issue with opensearch was fixed as well, which was uncovered during testing and was simple enough to fix here. Likewise, DDG bang gen was updated to only ever happen if the file didn't exist previously, as testing with the file being regenerated every time was tedious. * Add missing "@" for socks proxy requests	2020-10-28 20:47:42 -04:00
Ben Busby	ae05e8ff8b	Finished basic implementation of DDG bang feature Initialization of the app now includes generation of a ddg-bang json file, which is used for all bang style searches afterwards. Also added search suggestion handling for bang json lookup. Queries beginning with "!" now reference the bang json file to pull all keys that match. Updated test suite to include basic tests for bang functionality. Updated gitignore to exclude bang subdir.	2020-10-10 15:55:14 -04:00
Ben Busby	2126742b76	Merge branch 'develop' into develop	2020-10-07 18:38:36 -04:00
Ben Busby	9a03b4111d	Clarified country filter, updated invidious result URL (closes #123 ) Improves clarity of the meaning behind the "Country" filter -- Google seemingly uses this value to only return results that are hosted in a particular country, as evidenced in the search differences highlighted in #123. It now mentions that the results are filtered by website hosting location. Also, now that invidio.us is shut down, the fallback URL (invidiou.site) is now used instead.	2020-09-17 18:59:37 -04:00
Ben Busby	975ece8cd0	Privacy respecting alternatives in results view (#106 ) Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration. Verbatim search and option to ignore search autocorrect are now supported as well. Also cleaned up the javascript side of whoogle config so that it now uses arrays of available fields for parsing config values instead of manually assigning each one to a variable. This doesn't include support for Google Maps -> Open Street Maps, that seems a bit more involved than the social media redirects were, so it should likely be a separate effort.	2020-07-26 11:53:59 -06:00
Marvin Borner	dd9d87d25b	Added ddg-style !bang-operators This is a proof of concept! The code works, but uses hardcoded operators and may be placed in the wrong file/class. The best-case scenario would be the possibility to use the 13.000+ ddg operators, but I don't know if that's possible without having to redirect to duckduckgo first.	2020-06-26 00:26:02 +02:00
Ben Busby	f7380ae15d	Improving ad filtering for non-English languages	2020-06-11 13:21:40 -06:00
Ben Busby	f86a44b637	Removed no-cache enforcement, minor styling/formatting improvements	2020-06-11 12:14:57 -06:00
Ben Busby	32e837a5e0	Refactored whoogle session mgmt Now allows a fallback "default" session to be used if a user's browser is blocking cookies	2020-06-05 15:24:44 -06:00
Ben Busby	b6fb4723f9	Project refactor (#85 ) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune ''). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style	2020-06-02 12:54:47 -06:00
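
Several commits in this log change where a DDG-style bang may appear in a query (at the start, anywhere in the middle, or with the "!" at the end) and make the match case-insensitive. The behaviour they describe can be sketched roughly as below. This is only an illustration under stated assumptions, not the project's code: the function name `resolve_bang`, the `SAMPLE_BANGS` table, and its URL templates are hypothetical stand-ins for the generated bang JSON file mentioned in the log.

```python
from urllib.parse import quote_plus

# Hypothetical two-entry bang table, for illustration only. The log says the
# app generates a much larger bang JSON file on initialization.
SAMPLE_BANGS = {
    "!w": "https://en.wikipedia.org/wiki/Special:Search?search={}",
    "!reddit": "https://www.reddit.com/search?q={}",
}


def resolve_bang(query: str, bangs: dict) -> str:
    """Return a redirect URL if the query contains a known bang, else ''."""
    words = query.split()
    for i, word in enumerate(words):
        # Case-insensitive match: "!W" and "!w" are treated the same.
        candidate = word.lower()
        # Normalise a trailing bang ("w!") to the leading form ("!w").
        if candidate.endswith("!") and not candidate.startswith("!"):
            candidate = "!" + candidate[:-1]
        if candidate in bangs:
            # The rest of the query, minus the bang, becomes the search term.
            remaining = " ".join(words[:i] + words[i + 1:])
            return bangs[candidate].format(quote_plus(remaining))
    return ""
```

With this sketch, "bologna w!" and "bologna !w" resolve to the same URL, and "big !reddit chungus" searches the second site for "big chungus".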

46 Commits
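
The October 2021 commits on bolding search terms describe three rules: every word of the query is bolded wherever it appears, a quoted portion such as "ice cream" is bolded only as an exact phrase, and terms that are not alphanumeric are skipped. A minimal sketch of those rules follows. It assumes the result text is plain (already escaped) text, and the name `bold_terms` is hypothetical; the actual regex used by the project is not shown in this log.

```python
import re


def bold_terms(text: str, query: str) -> str:
    """Wrap query terms found in text with <b> tags (illustrative only)."""
    # Quoted portions are kept as whole phrases.
    phrases = re.findall(r'"([^"]+)"', query)
    # Everything outside the quotes is split into individual words.
    words = re.sub(r'"[^"]*"', " ", query).split()
    # Skip terms that are not alphanumeric (for example "!!!").
    terms = [
        t for t in phrases + words
        if t.split() and all(part.isalnum() for part in t.split())
    ]
    if not terms:
        return text
    # Longest terms first so a phrase wins over one of its own words.
    terms.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    # A single pass avoids matching inside <b> tags that were just inserted.
    pattern = re.compile(rf"\b({alternation})\b", re.IGNORECASE)
    return pattern.sub(r"<b>\1</b>", text)
```

For example, with the query `"ice cream"` only the exact phrase is bolded, so a later standalone "cream" in the same text is left alone.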