whoogle-search

mirror of https://github.com/benbusby/whoogle-search synced 2024-11-18 09:25:33 +00:00

Author	SHA1	Message	Date
Ben Busby	23402e27e1	Check for updates using 24 hour time delta Rather than only checking for an available update on app init, the check for updates now performs the check once every 24 hours on the first request sent after that period. This also now catches the requests.exceptions.ConnectionError that is thrown if the app is initialized without an active internet connection. Fixes #649	2022-02-14 12:19:02 -07:00
Ben Busby	d33e8241dc	Fix "my ip" search regression Removes dependency on class names for creating the "my ip" info card in the results list for searches pertaining to the user's public IP. Adds test to prevent this from happening again. Note to anyone reading this and looking to contribute: please avoid using hardcoded class names at all costs. This approach of creating/removing content just results in issues if/when Google decides to introduce/remove class names from the result page. Fixes #657	2022-02-14 11:40:11 -07:00
Joao A. Candido Ramos	11099f7b1d	Use consistent header for all result types (#535 ) Introduces a header for switching between result types (i.e. "All", "News", etc) that is consistent between the different result types. Previously, image results had a tab header that was formatted in a drastically different manner, which was jarring when switching from a different result page to the Images page. Created a G class enum to reference class names returned in search results. As noted in the class doc, this should only be used/updated as a last resort, as class names change frequently. For some instances, such as replacing the tbm tab, it's a lot easier to just replace by header name than attempting to replace it based on how the element is structured. Also updated a few styles to revert the latest styling changes being applied by Google. Co-authored-by: jacr13 <ramos.joao@protonmail.com> Co-authored-by: Ben Busby <contact@benbusby.com>	2022-02-07 10:47:25 -07:00
Ben Busby	72e5a227c8	Move bangs init to bg thread Initializing the DDG bangs when running whoogle for the first time creates an indeterminate amount of delay before the app becomes usable, which makes usability tests (particularly w/ Docker) unreliable. This moves the bang json init to a background thread and writes a temporary empty dict to the bangs json file until the full bangs json can be used.	2022-01-25 12:28:06 -07:00
DUO Labs	74cb48086c	Introduce site alts for imgur and wikipedia (#609 ) * Add `WHOOGLE_ALT_IMG` for a replacement for imgur. * Add `WHOOGLE_ALT_WIKI` for Wikipedia	2022-01-14 09:59:03 -07:00
Ben Busby	634d179568	Use farside.link for frontend alternatives in results (#560 ) * Integrate Farside into Whoogle When instances are ratelimited (when a captcha is returned instead of the user's search results) the user can now hop to a new instance via Farside, a new backend service that redirects users to working instances of a particular frontend. In this case, it presents a user with a Farside link to a new Whoogle (or Searx) instance instead, so that the user can resume their search. For the generated Farside->Whoogle link, the generated link includes the user's current Whoogle configuration settings as URL params, to ensure a more seamless transition between instances. This doesn't translate to the Farside->Searx link, but potentially could with some changes. * Expand conversion of config<->url params Config settings can now be translated to and from URL params using a predetermined set of "safe" keys (i.e. config settings that easily translate to URL params). * Allow jumping instances via Farside when ratelimited When instances are ratelimited (when a captcha is returned instead of the user's search results) the user can now hop to a new instance via Farside, a new backend service that redirects users to working instances of a particular frontend. In this case, it presents a user with a Farside link to a new Whoogle (or Searx) instance instead, so that the user can resume their search. For the generated Farside->Whoogle link, the generated link includes the user's current Whoogle configuration settings as URL params, to ensure a more seamless transition between instances. This doesn't translate to the Farside->Searx link, but potentially could with some changes. Closes #554 Closes #559	2021-12-08 17:27:33 -07:00
Vansh Comar	7bea6349a0	Add tools for currency conversion in search results (#536 ) This implements a method for converting between various currencies. When a user searches "<currency A> to <currency B>" (including when prefixed by a specific amount), they are now presented with a table for quickly converting between the two. This makes use of the currency ratio returned as the first "card" in currency related searches, and the table is inserted into this same card.	2021-12-06 22:56:13 -07:00
Ben Busby	10a15e06e1	Fix incorrect request type for image searches Previously had hardcoded POST requests for all requests that didn't use the header template (which currently is only the image tab). Also refactored how the Filter class works. It now requires a valid Config model to be provided, which is then set up as a class var that the filtering functions can use as needed, rather than setting specific values from the config as individual values (which was confusing and sloppy). Fixes #561	2021-12-06 21:39:50 -07:00
Ben Busby	6f5f3d8ca7	Fix incorrect redirect protocol used by Flask Flask's `request.url` uses `http` as the protocol, which breaks instances that enforce `https`, since the session redirect relies on `request.url` for the follow-through URL. This introduces a new method for determining the correct URL to use for these redirects by automatically replacing the protocol with `https` if the `HTTPS_ONLY` env var is set for that instance. Fixes #538 Fixes #545	2021-11-21 23:21:04 -07:00
Ben Busby	e06ff85579	Improve public instance session management (#480 ) This introduces a new approach to handling user sessions, which should allow for users to set more reliable config settings on public instances. Previously, when a user with cookies disabled would update their config, this would modify the app's default config file, which would in turn cause new users to inherit these settings when visiting the app for the first time and cause users to inherit these settings when their current session cookie expired (which was after 30 days by default I believe). There was also some half-baked logic for determining on the backend whether or not a user had cookies disabled, which led to some issues with out-of-control session file creation by Flask. Now, when a user visits the site, their initial request is forwarded to a session/<session id> endpoint, and during that subsequent request their current session id is matched against the one found in the url. If the ids match, the user has cookies enabled. If not, their original request is modified with a 'cookies_disabled' query param that tells Flask not to bother trying to set up a new session for that user, and instead just use the app's fallback Fernet key for encryption and the default config. Since attempting to create a session for a user with cookies disabled creates a new session file, there is now also a clean-up routine included in the new session decorator, which will remove all sessions that don't include a valid key in the dict. NOTE!!! This means that current user sessions on public instances will be cleared once this update is merged in. In the long run that's a good thing though, since this will allow session mgmt to be a lot more reliable overall for users regardless of their cookie preference. Individual user sessions still use a unique Fernet key for encrypting queries, but users with cookies disabled will use the default app key for encryption and decryption. 
Sessions are also now (semi)permanent and have a lifetime of 1 year.	2021-11-17 19:35:30 -07:00
Ben Busby	e93507f148	Catch connection error during Tor validation step Validation of the Tor connection occasionally fails with a ConnectionError from requests, which was previously uncaught. This is now handled appropriately (error message shown and connection dropped). Fixes #532	2021-11-12 17:19:45 -07:00
Ben Busby	c766554eea	Bang refactor PEP-8 fix Addresses PEP-8 formatting issue in previous commit	2021-11-01 16:53:19 -06:00
Ben Busby	ddf951de35	Use `replace` in bang query formatting Using `format` for formatting bang queries caused a KeyError for some searches, such as !hd (HUDOC). In that example, the URL returned in the bangs json was `http://...#{%22fulltext%22:[%22{}%22]...`, where standard formatting would not work due to the misidentification of "fulltext" as a formatting key. The logic has been updated to just replace the first occurrence of "{}" in the URL returned by the bangs dict. Fixes #513	2021-11-01 16:47:48 -06:00
gripped	d1c9b7f803	Remove styling from NoJS links (#511 ) Fixes #510	2021-11-01 16:03:47 -06:00
Ben Busby	7fe066b4ea	Escape result html after bolding search terms Fixes #518	2021-11-01 15:35:57 -06:00
gripped	c2ced23073	Improve formatting with NoJS enabled (#509 ) Removes line breaks, divider, and link location from all NoJS links in results when NoJS mode is enabled	2021-10-29 09:28:05 -06:00
Ben Busby	0a78c524fa	Expand 'my ip' to work for proxied requests Adds a check for the HTTP_X_FORWARDED_FOR header, and uses the value from the request if found.	2021-10-28 21:31:24 -06:00
DUO Labs	5189cdb072	Update "skip bolding" regex to fix some edge cases (#500 ) Should address errors caused by the "bold query" feature replacing tags and style elements, resulting in unformatted response pages.	2021-10-28 12:54:27 -06:00
Vansh Comar	f04c7c5557	Support DDG style bangs with bang at the end (#503 ) DDG style bang searches can now have the bang (!) at the end of the search (i.e. "bologna w!" will now redirect to wikipedia just like "bologna !w" would)	2021-10-28 12:39:33 -06:00
DUO Labs	d8dcdc7455	Skip bolding search terms that are not alphanumeric (#496 ) Fixes #494	2021-10-27 10:50:21 -06:00
Ben Busby	591ed4a6d6	Use f-string in bold query regex by @DUOLabs333	2021-10-26 16:21:30 -06:00
Ben Busby	f154b5f2e2	PEP-8 formatting fix	2021-10-26 16:17:38 -06:00
Ben Busby	6decab5a51	Improve regex for bolding search terms Co-authored by @DUOLabs333	2021-10-26 16:15:24 -06:00
DUO Labs	2c9cf3ecc6	Bold search query in results (#487 ) This modifies the search result page by bold-ing all appearances of any word in the original query. If portions of the query are in quotes (i.e. "ice cream"), only exact matches of the sequence of words will be made bold. Co-authored-by: Ben Busby <noreply+git@benbusby.com>	2021-10-26 14:59:23 -06:00
Ben Busby	8f70236403	Update domains used for scribe.rip replacements The levelup.gitconnected.com site is a Medium site that can also be replaced with scribe.rip whenever privacy respecting site alternatives are enabled in the config. Also modified how link descriptions are updated when that config is enabled (before it was missing replacements on quite a few descriptions).	2021-10-23 23:23:37 -06:00
Vansh Comar	771bf34ce9	Show client IP for "my ip" searches (#469 ) This introduces a new UI element for displaying the client IP address when a search for "my ip" is used. Note that this does not show the IP address seen by Google if Whoogle is deployed remotely. It uses `request.remote_addr` to display the client IP address in the UI, not the actual address of the server (which is what Google sees in requests sent from remote Whoogle instances).	2021-10-21 10:42:31 -06:00
Vansh Comar	79fb7531be	Implement scribe.rip replacement for medium.com results (#463 ) scribe.rip is a privacy respecting front end for medium.com. This feature allows medium.com results to be replaced with scribe.rip links, and works for both regular medium.com domains as well as user specific subdomains (i.e. user.medium.com). [scribe.rip website](https://scribe.rip) [scribe.rip source code](https://git.sr.ht/~edwardloveall/scribe) Co-authored-by: Ben Busby <noreply+git@benbusby.com>	2021-10-16 12:22:00 -06:00
Ben Busby	ff885e4fde	Disable autocomplete via WHOOGLE_AUTOCOMPLETE var Setting WHOOGLE_AUTOCOMPLETE to 0 now disables the autocomplete/search suggestion feature. Closes #462	2021-10-14 18:59:10 -06:00
rn83	f18400b1f1	Strip SKIP_PREFIX for SITE_ALTS only (#452 ) Domain prefixes (www, mobile, m) are now stripped for site alternatives only.	2021-10-11 14:25:21 -06:00
BlissOWL	f12b0e62c5	Make bang searches case insensitive (#438 ) Bang searches now ignore the capitalization of the operator Co-authored-by: Ben Busby <noreply+git@benbusby.com>	2021-09-27 19:39:58 -06:00
Ben Busby	68fdd55482	Use cache busting for css/js files On app init, short hashes are generated from file checksums to use for cache busting. These hashes are added into the full file name and used to symlink to the actual file contents. These symlinks are loaded in the jinja templates for each page, and can tell the browser to load a new file if the hash changes. This is only in place for css and js files, but can be extended in the future for other file types if needed.	2021-06-30 19:00:01 -04:00
Ben Busby	43faaee77f	Hotfix: remove site filter for maps links The new site filter breaks links to Maps results, so filter.py needed to be updated to handle these links as a unique case. A new method was introduced to easily remove any "-site:..." filters from the query, which is now also used to format queries in the header template rather than manually removing the blocked site list within the template itself. Bumps version to 0.5.1 for releasing the bugfix Fixes #329	2021-05-27 12:01:57 -04:00
Joao A. Candido Ramos	448efb8f2a	Add "view image" functionality (#268 ) * add view image option * prevent whoogle links from opening in a new tab. * remove view image template on mobile requests * change loop values to be more robust to the number of images * Update app/templates/imageresults.html * fix "Basically the .cvifge class needs width: 100%; in order to expand the search input to fit the form width." * Update app/templates/imageresults.html * remove hardcoded string from template * Add view image config var to app.json * Add view image config var to whoogle.env Co-authored-by: jacr13 <ramos.joao@protonmail.com> Co-authored-by: Ben Busby <benbusby@protonmail.com>	2021-05-21 11:19:45 -04:00
Ben Busby	c8da53d4b0	Block websites from search results via user config (#304 ) * Block websites in search results via user config Adds a new config field "Block" to specify a comma separated list of websites to block in search results. This is applied for all searches. * Add test for blocking sites from search results * Document WHOOGLE_CONFIG_BLOCK usage * Strip '-site:' filters from query in header template The 'behind the scenes' site filter applied for blocked sites was appearing in the query field when navigating between search categories (all -> images -> news, etc). This prevents the filter from appearing in all except "images", since the image category uses a separate header. This should eventually be addressed when the image page can begin using the standard whoogle header, but until then, the filter will still appear for image searches.	2021-05-07 11:45:53 -04:00
Ben Busby	50c888f9a7	Revert heroku app https upgrade fix	2021-04-05 11:00:56 -04:00
Ben Busby	df0b7afa50	Switch to single Fernet key per session This moves away from the previous (messy) approach of using two separate keys for decrypting text and element URLs separately and regenerating them for new searches. The current implementation of sessions is not very reliable, which led to keys being regenerated too soon, which would break page navigation. Until that can be addressed, the single key per session approach should work a lot better. Fixes #250 Fixes #90	2021-04-05 11:00:56 -04:00
Ben Busby	ed4432f3f8	Hotfix: Upgrade heroku apps to https for all endpoints The previous implementation of the is_heroku check in search.needs_https() was implemented to only match URLs ending in '.herokuapp.com', and skipped upgrading to HTTPS for other endpoints.	2021-04-05 11:00:56 -04:00
Shimul	8a10efaa01	Allow setting environment variables in whoogle.env (#237 ) This allows the user to enable their preferred settings in a variety of ways, depending on their deployment preference. Values added to whoogle.env can be enabled using WHOOGLE_DOTENV=1, in which case all values in the env var file will overwrite defaults or user provided settings. Co-authored-by: Ben Busby <benbusby@protonmail.com>	2021-04-05 11:00:56 -04:00
Ben Busby	8ad8e66d37	Improve static typing throughout repo Eventually this should be part of a separate mypy ci build, but right now it's just a general guideline. Future commits and PRs should be validated for static typing wherever possible. For reference, the testing commands used for this commit were: mypy --ignore-missing-imports --pretty --disallow-untyped-calls app/ mypy --ignore-missing-imports --pretty --disallow-untyped-calls test/	2021-04-05 11:00:56 -04:00
Ben Busby	083c3758a1	Return 503 if response is blocked by captcha Also added in a slight modification to the dark theme style, which should only apply the border radius in the header. Closes #226	2021-04-05 11:00:56 -04:00
Ben Busby	e5d1f6a292	Add healthcheck to Dockerfile See #184	2021-04-05 11:00:56 -04:00
Ben Busby	f8dfc78539	Improve naming of _utils files, update fn/class doc The app/utils/_utils weren't named very well, and all have been updated to have more accurate names. Function and class documentation for the utils have been updated as well, as part of the effort to improve overall documentation for the project.	2021-04-05 11:00:56 -04:00
Ben Busby	ecb7885a56	Allow bang operator anywhere in query Bang operator can now be placed anywhere in the query, to allow for peak efficiency in stream of consciousness querying (i.e. `big !reddit chungus` will search reddit for `big chungus`). Fixes #196	2021-04-05 11:00:56 -04:00
Ben Busby	64567a63ea	Ensure G logo doesn't appear in mobile img results Adds a separate check to remove all images sourced from www.gstatic.com, which is where the mobile logo in particular is coming from.	2021-04-05 11:00:56 -04:00
Ben Busby	6600d8580c	Add ability to redirect reddit.com to libredd.it (#180 ) * Adds the ability to redirect reddit.com to libredd.it using the existing "site alts" config setting. This adds the WHOOGLE_ALT_RD environment variable for optionally redirecting reddit links to libreddit (https://github.com/spikecodes/libreddit). * Include libreddit in home page site alt note	2021-04-05 11:00:56 -04:00
Ben Busby	329c38efb0	Hotfix: Enforce https in heroku opensearch template Heroku instances were using the base http url when formatting the opensearch.xml template. This adds a new routing utility, "needs_https", which can be used for determining if the url in question needs upgrading.	2021-01-23 14:50:30 -05:00
Ben Busby	440c4e9c50	Remove lxml dependency The lxml dependency in the project was fairly unnecessary, and made the initial build time for the project considerably slower. This replaces all instances of lxml with either the default html parser (for bs4 constructors) or the built in xml.etree package (for search suggestion parsing).	2020-12-29 18:43:42 -05:00
Ben Busby	375f4ee9fd	PEP-8: Fix formatting issues, add CI workflow (#161 ) Enforces PEP-8 formatting for all python code Adds a github action build for checking pep8 formatting using pycodestyle	2020-12-17 16:06:47 -05:00
Ben Busby	5b5c2588ed	Fix nojs lxml constructor The BeautifulSoup constructor in gen_nojs needed to explicitly set features='lxml' to silence a warning from the library. Also temporarily disabled the site alts test since the results are too unreliable. This should be moved to a unit test instead.	2020-12-11 19:21:32 -05:00
Ben Busby	6c429e6dd1	Allow setting site alts using environment vars (#155 ) * Add ability to configure site alts w/ env vars Site alternatives (i.e. twitter.com -> nitter.net) can now be configured using environment variables: WHOOGLE_ALT_TW='nitter.net' # twitter alt WHOOGLE_ALT_YT='invidio.us' # youtube alt WHOOGLE_ALT_IG='bibliogram.art/u' # instagram alt Updated testing to confirm results have been modified. * Add site alt vars to docker settings and readme	2020-12-05 17:01:21 -05:00
Ben Busby	2d0823b012	Hotfix: Remove mobile subdomain for invidious redirect See #151	2020-11-28 21:30:58 -05:00
Ben Busby	0afd59056f	Hotfix: update invidious url, remove www from link The invidious instance has been updated to invidious.snopyta.org, since this instance is more reliable and has more users according to instances.invidio.us All site alternative redirects now redirect without the 'www' subdomain, since most of the alternative sites don't have this subdomain set up.	2020-11-28 12:15:04 -05:00
Ben Busby	0d0f32d108	Hotfix: update ad filter for Portuguese config	2020-11-24 13:14:40 -05:00
Ben Busby	72cbc342af	Add ability to set temp config in search query Dark mode, country, interface language, and search language configs can now be set in the search query by appending each option as a url parameter. Supported args are: 'dark', 'lang_search', 'lang_interface', and 'ctry' Ex: /search?q=%s&dark=1&lang_search=lang_en... These config settings persist across page navigation and switching result type, but will be reset if the main search bar is used. See #144	2020-11-11 00:40:49 -05:00
Ben Busby	0ef098069e	Add tor and http/socks proxy support (#137 ) * Add tor and http/socks proxy support Allows users to enable/disable tor from the config menu, which will forward all requests through Tor. Also adds support for setting environment variables for alternative proxy support. Setting the following variables will forward requests through the proxy: - WHOOGLE_PROXY_USER (optional) - WHOOGLE_PROXY_PASS (optional) - WHOOGLE_PROXY_TYPE (required) - Can be "http", "socks4", or "socks5" - WHOOGLE_PROXY_LOC (required) - Format: "<ip address>:<port>" See #30 * Refactor acquire_tor_conn -> acquire_tor_identity Also updated travis CI to set up tor * Add check for Tor socket on init, improve Tor error handling Initializing the app sends a heartbeat request to Tor to check for availability, and updates the home page config options accordingly. This heartbeat is sent on every request, to ensure Tor support can be reconfigured without restarting the entire app. If Tor support is enabled, and a subsequent request fails, then a new TorError exception is raised, and the Tor feature is disabled until a valid connection is restored. The max attempts has been updated to 10, since 5 seemed a bit too low for how quickly the attempts go by. * Change send_tor_signal arg type, update function doc send_tor_signal now accepts a stem.Signal arg (a bit cleaner tbh). Also added the doc string for the "disable" attribute in TorError. * Fix tor identity logic in Request.send * Update proxy init, change proxyloc var name Proxy is now only initialized if both type and location are specified, as neither have a default fallback and both are required. I suppose the type could fall back to http, but seems safer this way. Also refactored proxyurl -> proxyloc for the runtime args in order to match the Dockerfile args. 
* Add tor/proxy support for Docker builds, fix opensearch/init The Dockerfile is now updated to include support for Tor configuration, with a working torrc file included in the repo. An issue with opensearch was fixed as well, which was uncovered during testing and was simple enough to fix here. Likewise, DDG bang gen was updated to only ever happen if the file didn't exist previously, as testing with the file being regenerated every time was tedious. * Add missing "@" for socks proxy requests	2020-10-28 20:47:42 -04:00
Ben Busby	ae05e8ff8b	Finished basic implementation of DDG bang feature Initialization of the app now includes generation of a ddg-bang json file, which is used for all bang style searches afterwards. Also added search suggestion handling for bang json lookup. Queries beginning with "!" now reference the bang json file to pull all keys that match. Updated test suite to include basic tests for bang functionality. Updated gitignore to exclude bang subdir.	2020-10-10 15:55:14 -04:00
Ben Busby	2126742b76	Merge branch 'develop' into develop	2020-10-07 18:38:36 -04:00
Ben Busby	9a03b4111d	Clarified country filter, updated invidious result URL (closes #123 ) Improves clarity of the meaning behind the "Country" filter -- Google seemingly uses this value to only return results that are hosted in a particular country, as evidenced in the search differences highlighted in #123. It now mentions that the results are filtered by website hosting location. Also, now that invidio.us is shut down, the fallback URL (invidiou.site) is now used instead.	2020-09-17 18:59:37 -04:00
Ben Busby	975ece8cd0	Privacy respecting alternatives in results view (#106 ) Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration. Verbatim search and option to ignore search autocorrect are now supported as well. Also cleaned up the javascript side of whoogle config so that it now uses arrays of available fields for parsing config values instead of manually assigning each one to a variable. This doesn't include support for Google Maps -> Open Street Maps, that seems a bit more involved than the social media redirects were, so it should likely be a separate effort.	2020-07-26 11:53:59 -06:00
Marvin Borner	dd9d87d25b	Added ddg-style !bang-operators This is a proof of concept! The code works, but uses hardcoded operators and may be placed in the wrong file/class. The best-case scenario would be the possibility to use the 13.000+ ddg operators, but I don't know if that's possible without having to redirect to duckduckgo first.	2020-06-26 00:26:02 +02:00
Ben Busby	f7380ae15d	Improving ad filtering for non-English languages	2020-06-11 13:21:40 -06:00
Ben Busby	f86a44b637	Removed no-cache enforcement, minor styling/formatting improvements	2020-06-11 12:14:57 -06:00
Ben Busby	32e837a5e0	Refactored whoogle session mgmt Now allows a fallback "default" session to be used if a user's browser is blocking cookies	2020-06-05 15:24:44 -06:00
Ben Busby	b6fb4723f9	Project refactor (#85 ) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune ''). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style	2020-06-02 12:54:47 -06:00
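
The `replace`-based bang formatting described in commit ddf951de35 can be illustrated with a minimal Python sketch. The bang URL below is a shortened, hypothetical stand-in for the HUDOC entry (the real URL is elided in the commit message), and `format_bang_url` / `format_bang_url_naive` are illustrative names, not functions taken from the whoogle codebase.

```python
from urllib.parse import quote

# Hypothetical bang URL containing literal braces, loosely modelled on the
# HUDOC example from the commit message (the real URL is not reproduced here).
BANG_URL = 'https://example.com/search#{"fulltext":["{}"]}'


def format_bang_url(bang_url: str, query: str) -> str:
    """Insert the query by replacing only the first "{}" placeholder."""
    return bang_url.replace('{}', quote(query), 1)


def format_bang_url_naive(bang_url: str, query: str) -> str:
    """Naive variant: str.format parses {"fulltext":...} as a replacement
    field and looks up "fulltext" as a keyword argument, raising KeyError."""
    return bang_url.format(quote(query))
```

Replacing only the first `{}` leaves every other brace in the URL untouched, so the approach does not depend on how a particular bang URL happens to use braces.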

114 Commits
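
The 24-hour update check from commit 23402e27e1 can be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: `UpdateChecker`, `on_request`, and the injected `fetch_has_update` callable are hypothetical names, and the sketch also runs a check on the very first request, which the commit message does not specify.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

CHECK_INTERVAL = timedelta(hours=24)


class UpdateChecker:
    """Runs an update check at most once per CHECK_INTERVAL (hypothetical helper)."""

    def __init__(self, fetch_has_update: Callable[[], bool]) -> None:
        # fetch_has_update is an assumed callable that contacts the release
        # source and returns True when a newer version is available.
        self.fetch_has_update = fetch_has_update
        self.last_check: Optional[datetime] = None
        self.has_update = False

    def on_request(self, now: datetime) -> bool:
        """Call once per incoming request; returns the cached update flag."""
        due = self.last_check is None or now - self.last_check >= CHECK_INTERVAL
        if due:
            self.last_check = now
            try:
                self.has_update = self.fetch_has_update()
            except OSError:
                # requests.exceptions.ConnectionError derives from OSError, so
                # this also covers running without an internet connection.
                self.has_update = False
        return self.has_update
```

Passing `now` in explicitly keeps the sketch easy to test; a real handler would more likely call `datetime.now()` itself.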