This can be updated later to allow users with cookies enabled to use a
key that is unique to their session (if they want, not mandatory), but
for now it makes more sense to just use a single key for all queries
from all users. This should eliminate a lot of issues that users have
reported where they are unable to decrypt queries or page elements due
to an expired/renewed session key.
Adds support for encoding (and optionally encrypting) user config values as
a single string that can be passed to any endpoint with the "preferences" url
param.
Co-authored-by: Ben Busby <contact@benbusby.com>
Sessions are no longer validated using the "/session/..." route. This
created a lot of problems due to buggy/unexpected behavior coming from
the Flask-Session dependency, which is (more or less) no longer
maintained.
Sessions are also no longer strictly server-side-only. The majority of
information that was being stored in user sessions was aesthetic only,
aside from the session specific key used to encrypt URLs. This key is
still unique per user, but is not (or shouldn't be) in anyone's threat
model to keep absolutely 100% private from everyone. Especially paranoid
users of Whoogle can easily modify the code to use a randomly generated
encryption key that is reset on session invalidation (and set
invalidation time to a short enough period for their liking).
Ultimately, this should result in much more stable sessions per client.
There shouldn't be decryption issues with element URLs or queries
during result page navigation.
This adds a new "temporary" config section of the results view, where a
user can now change the country that their results come from without
changing their default config settings.
Closes#322
If Whoogle is accessed on a non-standard port _and_ proxied,
this port is lost to the application and `element['src']`s are
incorrectly formed (omitting port).
HTTP x-Forwarded-Host will contain this front port number in
a typical Nginx reverse proxy configuration.
Due to how instances installed with pip seem to have issues storing
unrelated files in the same directory as sessions, exception handling
during session validation has been expanded to blindly ignore all
exceptions. This portion of the code is more for maintainers of large
public instances with a bunch of users who block cookies anyways, so
having basic app functionality break down as a result shouldn't be the
default.
Country config value should be checked against the valid value when
updating the home page config, not the other way around. This can lead
to a state where a user sets up an invalid country value, but can still
be matched against a correct value that is part of the invalid value
(i.e. "countryUK" is invalid, but would match against the correct value,
"UK")
Also minor refactor of where the session file size validation occurs.
For pip installed instances of Whoogle, there seems to be an issue where
files other than sessions are being stored in the same directory as the
sessions. From a brief investigation, this does not seem to be caused by
Whoogle, since Flask-Session objects are the only files stored in that
directory. It could be an issue with the library that is being used for
sessions, however.
Regardless, the app shouldn't crash when trying to validate and remove
invalid sessions, so a file size limit of 4KB was imposed during
validation. Any file found in the session directory that exceeds this
size limit will be ignored.
Fixes#777Fixes#793
Added password authentication for tor control port.
For user configuration of access to tor control port. This file should be
heavily restricted in file system.
Co-authored-by: MadcowOG <madcowog@Arch-Main.localdomain>
This seems to be caused by an odd behavior related to Flask sessions and
instances of Whoogle installed via pip. I didn't investigate it too
much, since catching and ignoring the result doesn't impact Whoogle
functionality at all (configuration and session values persist as
normal). Since this doesn't affect non-pip instances, I don't believe it
to be a fault within Whoogle itself.
Fixes#765
The `/url` endpoint was previously used as a way of mirroring the
`/url?q=<result domain>` formatting of locations in search results from
Google. Rather than have this unnecessary intermediary step, the result
path was extracted and used as the immediate path for each result item
instead.
This endpoint hasn't been in use for many versions and has been in need
of removal for quite some time.
Previously, empty bang searches would redirect to the Whoogle instance
home page. This now redirects to the specific site for the bang search
instead (i.e. "!yt" without a query redirects to "youtube.com", "!gh" to
"github.com", etc)
Fixes#719
The "anon-view" translation key is the correct one to use for accessing
anonymous view within the search results. "config-anon-view" is only for
the configuration menu on the home page.
* Relativization of search results
* Fix JavaScript error when opening images
* Replace single-letter logo and remove sign-in link
* Add `WHOOGLE_URL_PREFIX` env var to support relative path redirection
The `WHOOGLE_URL_PREFIX` var can now be set to fix internal app
redirects, such as the `/session` redirect performed on the first visit
to the Whoogle home page.
Co-authored-by: Ben Busby <contact@benbusby.com>
In some rare instances (a race condition perhaps?) a
`cryptography.fernet.InvalidToken` exception is thrown resulting in
a broken connection.
This change gracefully returns a 401 error instead.
* Expand `/window` endpoint to behave like a proxy
The `/window` endpoint was previously used as a type of proxy, but only
for removing Javascript from the result page. This expands the existing
functionality to allow users to proxy search result pages (with or without
Javascript) through their Whoogle instance.
* Implement filtering of remote content from css
* Condense NoJS feature into Anonymous View
Enabling NoJS now removes Javascript from the Anonymous View, rather
than creating a separate option.
* Exclude 'data:' urls from filter, add translations
The 'data:' url must be allowed in results to view certain elements on
the page, such as stars for review based results.
Add translations for the remaining languages.
* Add cssutils to requirements
Bang searches without an actual query (i.e. just searching "!gh") will
now redirect to the home page. I guess people do this for some reason
and don't like that it redirects to the correct bang result URL, but
without an actual search term.
Fixes#595
Rather than only checking for an available update on app init, the check
for updates now performs the check once every 24 hours on the first
request sent after that period.
This also now catches the requests.exceptions.ConnectionError that is
thrown if the app is initialized without an active internet connection.
Fixes#649
Introduces a header for switching between result types (i.e. "All", "News",
etc) that is consistent between the different result types. Previously, image
results had a tab header that was formatted in a drastically different manner,
which was jarring when switching from a different result page to the Images
page.
Created a G class enum to reference class names returned in search
results. As noted in the class doc, this should only be used/updated as
a last resort, as class names change frequently. For some instances,
such as replacing the tbm tab, it's a lot easier to just replace by
header name than attempting to replace it based on how the element is
structured.
Also updated a few styles to revert the latest styling changes being
applied by Google.
Co-authored-by: jacr13 <ramos.joao@protonmail.com>
Co-authored-by: Ben Busby <contact@benbusby.com>
Initializing the DDG bangs when running whoogle for the first time
creates an indeterminate amount of delay before the app becomes usable,
which makes usability tests (particularly w/ Docker) unreliable. This
moves the bang json init to a background thread and writes a temporary
empty dict to the bangs json file until the full bangs json can be used.
* Integrate Farside into Whoogle
When instances are ratelimited (when a captcha is returned instead of
the user's search results) the user can now hop to a new instance via
Farside, a new backend service that redirects users to working instances
of a particular frontend. In this case, it presents a user with a
Farside link to a new Whoogle (or Searx) instance instead, so that the
user can resume their search.
For the generated Farside->Whoogle link, the generated link includes the
user's current Whoogle configuration settings as URL params, to ensure a
more seamless transition between instances. This doesn't translate to
the Farside->Searx link, but potentially could with some changes.
* Expand conversion of config<->url params
Config settings can now be translated to and from URL params using a
predetermined set of "safe" keys (i.e. config settings that easily
translate to URL params).
* Allow jumping instances via Farside when ratelimited
When instances are ratelimited (when a captcha is returned instead of
the user's search results) the user can now hop to a new instance via
Farside, a new backend service that redirects users to working instances
of a particular frontend. In this case, it presents a user with a
Farside link to a new Whoogle (or Searx) instance instead, so that the
user can resume their search.
For the generated Farside->Whoogle link, the generated link includes the
user's current Whoogle configuration settings as URL params, to ensure a
more seamless transition between instances. This doesn't translate to
the Farside->Searx link, but potentially could with some changes.
Closes#554Closes#559
This implements a method for converting between various currencies. When a user
searches "<currency A> to <currency B>" (including when prefixed by a specific
amount), they are now presented with a table for quickly converting between the
two. This makes use of the currency ratio returned as the first "card" in
currency related searches, and the table is inserted into this same card.
The default CSP is only helpful for some, and can break instances for
others. Since these aren't always necessary and are occasionally set by
the user's preferred reverse proxy, it is being disabled unless
explicitly enabled by setting `WHOOGLE_CSP`.
Fixes#493
This expands on the current testing suite a bit by introducing a new
workflow for testing functionality within the docker container. It runs
the same test suite as the regular "test" workflow, but also performs a
health check after running the app for 10 seconds to ensure
functionality.
The buildx workflow now waits for the docker test script to finish
successfully, rather than the regular test workflow. This will hopefully
avoid situations where new images are pushed with issues that aren't
detected in regular testing of the app.
Flask's `request.url` uses `http` as the protocol, which breaks
instances that enforce `https`, since the session redirect relies on
`request.url` for the follow-through URL.
This introduces a new method for determining the correct URL to use for
these redirects by automatically replacing the protocol with `https` if
the `HTTPS_ONLY` env var is set for that instance.
Fixes#538Fixes#545
HTTPS upgrades should be handled outside of Whoogle, since Flask often
doesn't detect the right protocol when being used behind a reverse proxy
such as Nginx.
This introduces a new approach to handling user sessions, which should
allow for users to set more reliable config settings on public instances.
Previously, when a user with cookies disabled would update their config,
this would modify the app's default config file, which would in turn
cause new users to inherit these settings when visiting the app for the
first time and cause users to inherit these settings when their current
session cookie expired (which was after 30 days by default I believe).
There was also some half-baked logic for determining on the backend
whether or not a user had cookies disabled, which lead to some issues
with out of control session file creation by Flask.
Now, when a user visits the site, their initial request is forwarded to
a session/<session id> endpoint, and during that subsequent request
their current session id is matched against the one found in the url. If
the ids match, the user has cookies enabled. If not, their original
request is modified with a 'cookies_disabled' query param that tells
Flask not to bother trying to set up a new session for that user, and
instead just use the app's fallback Fernet key for encryption and the
default config.
Since attempting to create a session for a user with cookies disabled
creates a new session file, there is now also a clean-up routine included
in the new session decorator, which will remove all sessions that don't
include a valid key in the dict. NOTE!!! This means that current user
sessions on public instances will be cleared once this update is merged
in. In the long run that's a good thing though, since this will allow session
mgmt to be a lot more reliable overall for users regardless of their cookie
preference.
Individual user sessions still use a unique Fernet key for encrypting queries,
but users with cookies disabled will use the default app key for encryption
and decryption.
Sessions are also now (semi)permanent and have a lifetime of 1 year.
This checks the latest released version of Whoogle against
the current app version, and shows an "update available"
message if the current version num < latest release num.
Closes#305
Due to how the response is now reformed into a new bsoup object when
bolding search query terms, creating an ip card for "my ip" searches
threw an error due to how the new bsoup object was initialized for the
"my ip" card. This passes the response in as a string instead.
Fixes#504
This modifies the search result page by bold-ing all appearances
of any word in the original query. If portions of the query are in
quotes (i.e. "ice cream"), only exact matches of the sequence of
words will be made bold.
Co-authored-by: Ben Busby <noreply+git@benbusby.com>
This introduces a new UI element for displaying the client IP
address when a search for "my ip" is used.
Note that this does not show the IP address seen by Google
if Whoogle is deployed remotely. It uses `request.remote_addr`
to display the client IP address in the UI, not the actual address
of the server (which is what Google sees in requests sent from
remote Whoogle instances).
Used in header templates for navigating back to the home page when
behind a reverse proxy config where the app is running from a subpath of
a domain (i.e. "https://something/whoogle/")
Fixes#403