More MarkupResemblesLocatorWarning warnings have been appearing. This
seems to be caused by parsing HTML content that contains a URL.
This new change suppresses the warning at the root level of the app
before any content has been parsed, so this error shouldn't appear
again.
Fixes#968
The whoogle.env file previously needed to be created and enabled using
the WHOOGLE_DOTENV var. This removes the second step and loads the env
file if it's found during app init.
The Dockerfile has also been updated to copy in whoogle.env if it
exists.
Fixes#909
Introduces the ability to refine searches by time period:
- Past hour
- Past 24 hours
- Past week
- Past month
- Past year
Co-authored-by: Ben Busby <contact@benbusby.com>
This can be updated later to allow users with cookies enabled to use a
key that is unique to their session (if they want, not mandatory), but
for now it makes more sense to just use a single key for all queries
from all users. This should eliminate a lot of issues that users have
reported where they are unable to decrypt queries or page elements due
to an expired/renewed session key.
* Sync setup.cfg with requirements.txt
* Include tests in PyPI tarballs
And exclude them from setuptools
* Set version number only once
Switch to PEP517 standard (pyproject.toml) for builds
Sessions are no longer validated using the "/session/..." route. This
created a lot of problems due to buggy/unexpected behavior coming from
the Flask-Session dependency, which is (more or less) no longer
maintained.
Sessions are also no longer strictly server-side-only. The majority of
information that was being stored in user sessions was aesthetic only,
aside from the session specific key used to encrypt URLs. This key is
still unique per user, but is not (or shouldn't be) in anyone's threat
model to keep absolutely 100% private from everyone. Especially paranoid
users of Whoogle can easily modify the code to use a randomly generated
encryption key that is reset on session invalidation (and set
invalidation time to a short enough period for their liking).
Ultimately, this should result in much more stable sessions per client.
There shouldn't be decryption issues with element URLs or queries
during result page navigation.
For pip installed instances of Whoogle, there seems to be an issue where
files other than sessions are being stored in the same directory as the
sessions. From a brief investigation, this does not seem to be caused by
Whoogle, since Flask-Session objects are the only files stored in that
directory. It could be an issue with the library that is being used for
sessions, however.
Regardless, the app shouldn't crash when trying to validate and remove
invalid sessions, so a file size limit of 4KB was imposed during
validation. Any file found in the session directory that exceeds this
size limit will be ignored.
Fixes#777Fixes#793
SESSION_COOKIE_SAMESITE must be set to 'lax' to allow the user's
previous session to persist when accessing the instance from an external
link. Setting this value to 'strict' causes Whoogle to revalidate a new
session, and fail, resulting in cookies being disabled.
This could be re-evaluated if Whoogle ever switches to client side
configuration instead.
Fixes#749
Wikipedia, imgur, and translate alternatives were all still using
hardcoded URLs when replaced with their respective alternative frontend.
This updates them to use farside instead.
Rather than only checking for an available update on app init, the check
for updates now performs the check once every 24 hours on the first
request sent after that period.
This also now catches the requests.exceptions.ConnectionError that is
thrown if the app is initialized without an active internet connection.
Fixes#649
Introduces a header for switching between result types (i.e. "All", "News",
etc) that is consistent between the different result types. Previously, image
results had a tab header that was formatted in a drastically different manner,
which was jarring when switching from a different result page to the Images
page.
Created a G class enum to reference class names returned in search
results. As noted in the class doc, this should only be used/updated as
a last resort, as class names change frequently. For some instances,
such as replacing the tbm tab, it's a lot easier to just replace by
header name than attempting to replace it based on how the element is
structured.
Also updated a few styles to revert the latest styling changes being
applied by Google.
Co-authored-by: jacr13 <ramos.joao@protonmail.com>
Co-authored-by: Ben Busby <contact@benbusby.com>
Initializing the DDG bangs when running whoogle for the first time
creates an indeterminate amount of delay before the app becomes usable,
which makes usability tests (particularly w/ Docker) unreliable. This
moves the bang json init to a background thread and writes a temporary
empty dict to the bangs json file until the full bangs json can be used.
This introduces a new approach to handling user sessions, which should
allow for users to set more reliable config settings on public instances.
Previously, when a user with cookies disabled would update their config,
this would modify the app's default config file, which would in turn
cause new users to inherit these settings when visiting the app for the
first time and cause users to inherit these settings when their current
session cookie expired (which was after 30 days by default I believe).
There was also some half-baked logic for determining on the backend
whether or not a user had cookies disabled, which lead to some issues
with out of control session file creation by Flask.
Now, when a user visits the site, their initial request is forwarded to
a session/<session id> endpoint, and during that subsequent request
their current session id is matched against the one found in the url. If
the ids match, the user has cookies enabled. If not, their original
request is modified with a 'cookies_disabled' query param that tells
Flask not to bother trying to set up a new session for that user, and
instead just use the app's fallback Fernet key for encryption and the
default config.
Since attempting to create a session for a user with cookies disabled
creates a new session file, there is now also a clean-up routine included
in the new session decorator, which will remove all sessions that don't
include a valid key in the dict. NOTE!!! This means that current user
sessions on public instances will be cleared once this update is merged
in. In the long run that's a good thing though, since this will allow session
mgmt to be a lot more reliable overall for users regardless of their cookie
preference.
Individual user sessions still use a unique Fernet key for encrypting queries,
but users with cookies disabled will use the default app key for encryption
and decryption.
Sessions are also now (semi)permanent and have a lifetime of 1 year.
This checks the latest released version of Whoogle against
the current app version, and shows an "update available"
message if the current version num < latest release num.
Closes#305
Restricting form-action to 'self' in the content security policy
prevented Chrome (and likely other browsers) from using !bangs on the
home page.
Fixes#408
On app init, short hashes are generated from file checksums to use for
cache busting. These hashes are added into the full file name and used
to symlink to the actual file contents. These symlinks are loaded in the
jinja templates for each page, and can tell the browser to load a new
file if the hash changes.
This is only in place for css and js files, but can be extended in the
future for other file types if needed.
Introduces a new config element and environment variable
(WHOOGLE_CONFIG_THEME) for setting the theme of the app. Rather than
just having either light or dark, this allows a user to have their
instance use their current system light/dark preference to determine the
theme to use.
As a result, the dark mode setting (and WHOOGLE_CONFIG_DARK) have been
deprecated, but will still work as expected until a system theme has
been chosen.
* Add support for Lingva translations in results
Searches that contain the word "translate" and are normal search queries
(i.e. not news/images/video/etc) now create an iframe to a Lingva url to
translate the user's search using their configured search language.
The Lingva url can be configured using the WHOOGLE_ALT_TL env var, or
will fall back to the official Lingva instance url (lingva.ml).
For more info, visit https://github.com/TheDavidDelta/lingva-translate
* Add basic test for lingva results
* Allow user specified lingva instances through csp frame-src
* Fix pep8 issue
Since the interface language defaults to IP geolocation by google, the
default language is now set to english. Still not sure if this is the
best solution, but at least temporarily should clear up some confusion
for users with instances deployed in countries outside of their own.
Also performed some minor cleanup:
- Updated name of strip_blocked_sites to clean_query
- Added clean_query to list of jinja template functions
- Ensured site block list doesn't contain duplicate filters
Occasionally the search results will contain links with arguments such
as 'dq', which was being erroneously used in attempts to extract the 'q'
element from query strings. This enforces that only links with '?q=' or
'&q=' (elements with a standalone 'q' arg) will have the element
extracted.
I also refactored the naming of this element once extracted to be just
'q'. Although this seems counterintuitive, it makes a little more sense
since this element is the one we're extracting. It's a vague url arg
name, but it is what it is.
Bump version to 0.5.2 for hotfix release
The new site filter breaks links to Maps results, so filter.py needed
to be updated to handle these links as a unique case. A new method was
introduced to easily remove any "-site:..." filters from the query,
which is now also used to format queries in the header template rather
than manually removing the blocked site list within the template itself.
Bumps version to 0.5.1 for releasing the bugfix
Fixes#329
* Replace hardcoded strings using translation json file
This introduces a new "translations.json" file under app/static/settings
that is loaded on app init and uses the user config value for interface
language to determine the appropriate strings to use in Whoogle-specific
elements of the UI (primarily only on the home page).
* Verify interface lang can be used for localization
Check the configured interface language against the available
localization dict before attempting to use, otherwise fall back to
english.
Also expanded language names in the languages json file.
* Add test for validating translation language keys
Also adds Spanish translation to json (the only non-English language I
can add and reasonably validate on my own).
* Validate all translations against original keyset, update readme
Readme has been updated to include basic contributing guidelines for
both code and translations.
* Add option to disable changing of configuration
Introduces a test to ensure the correct response code is found when
attempting to update the config when disabled, and ensure default config
is unchanged when posting a new config dict.
Attempting to update the config using the API when disabled now returns
a 403 code + redirect.
Co-authored-by: Ben Busby <benbusby@protonmail.com>
The logging from imported modules (stem, in particular) has caused quite
a few users to assume there are errors where there aren't any. The logs
from stem also aren't helpful, as everything in the library works as
expected despite the implication from the logs that it is not working.
Randomizing the "Mozilla" portion of the user agent changed the
character encoding to GB2312. Setting it to plain "Mozilla" enforces
UTF-8 encoding.
Bump to version 0.4.1 for release of bug fix
Fixes#267
This moves away from the previous (messy) approach of using two separate
keys for decrypting text and element URLs separately and regenerating
them for new searches. The current implementation of sessions is not very
reliable, which lead to keys being regenerated too soon, which would
break page navigation. Until that can be addressed, the single
key per session approach should work a lot better.
Fixes#250Fixes#90
This allows the user to enable their preferred settings in a variety of
ways, depending on their deployment preference. Values added to
whoogle.env can be enabled using WHOOGLE_DOTENV=1, in which case all
values in the env var file will overwrite defaults or user provided
settings.
Co-authored-by: Ben Busby <benbusby@protonmail.com>
* Add custom CSS field to config
This allows users to set/customize an instance's theme and appearance to
their liking. The config CSS field is prepopulated with all default CSS
variable values to allow quick editing.
Note that this can be somewhat of a "footgun" if someone updates the
CSS to hide all fields/search/etc. Should probably add some sort of
bandaid "admin" feature for public instances to employ until the whole
cookie/session issue is investigated further.
* Symlink all app static files to test dir
* Refactor app/misc/*.json -> app/static/settings/*.json
The country/language json files are used for user config settings, so
the "misc" name didn't really make sense. Also moved these to the static
folder to make testing easier.
* Fix light theme variables in dark theme css
* Minor style tweaking
The app/utils/*_utils weren't named very well, and all have been updated
to have more accurate names.
Function and class documention for the utils have been updated as well,
as part of the effort to improve overall documentation for the project.
Introduces a new content security policy header for responses to all
requests to reduce the possibility of ip leaks to outside connections.
By default blocks all inline scripts, and only allows content loaded
from Whoogle.
Refactors a few small inline scripting cases in the project to their own
individual scripts.