Restricting form-action to 'self' in the content security policy
prevented Chrome (and likely other browsers) from using !bangs on the
home page.
Fixes#408
On app init, short hashes are generated from file checksums to use for
cache busting. These hashes are added into the full file name and used
to symlink to the actual file contents. These symlinks are loaded in the
jinja templates for each page, and can tell the browser to load a new
file if the hash changes.
This is only in place for css and js files, but can be extended in the
future for other file types if needed.
Introduces a new config element and environment variable
(WHOOGLE_CONFIG_THEME) for setting the theme of the app. Rather than
just having either light or dark, this allows a user to have their
instance use their current system light/dark preference to determine the
theme to use.
As a result, the dark mode setting (and WHOOGLE_CONFIG_DARK) have been
deprecated, but will still work as expected until a system theme has
been chosen.
* Add support for Lingva translations in results
Searches that contain the word "translate" and are normal search queries
(i.e. not news/images/video/etc) now create an iframe to a Lingva url to
translate the user's search using their configured search language.
The Lingva url can be configured using the WHOOGLE_ALT_TL env var, or
will fall back to the official Lingva instance url (lingva.ml).
For more info, visit https://github.com/TheDavidDelta/lingva-translate
* Add basic test for lingva results
* Allow user specified lingva instances through csp frame-src
* Fix pep8 issue
Since the interface language defaults to IP geolocation by google, the
default language is now set to english. Still not sure if this is the
best solution, but at least temporarily should clear up some confusion
for users with instances deployed in countries outside of their own.
Also performed some minor cleanup:
- Updated name of strip_blocked_sites to clean_query
- Added clean_query to list of jinja template functions
- Ensured site block list doesn't contain duplicate filters
Occasionally the search results will contain links with arguments such
as 'dq', which was being erroneously used in attempts to extract the 'q'
element from query strings. This enforces that only links with '?q=' or
'&q=' (elements with a standalone 'q' arg) will have the element
extracted.
I also refactored the naming of this element once extracted to be just
'q'. Although this seems counterintuitive, it makes a little more sense
since this element is the one we're extracting. It's a vague url arg
name, but it is what it is.
Bump version to 0.5.2 for hotfix release
The new site filter breaks links to Maps results, so filter.py needed
to be updated to handle these links as a unique case. A new method was
introduced to easily remove any "-site:..." filters from the query,
which is now also used to format queries in the header template rather
than manually removing the blocked site list within the template itself.
Bumps version to 0.5.1 for releasing the bugfix
Fixes#329
* Replace hardcoded strings using translation json file
This introduces a new "translations.json" file under app/static/settings
that is loaded on app init and uses the user config value for interface
language to determine the appropriate strings to use in Whoogle-specific
elements of the UI (primarily only on the home page).
* Verify interface lang can be used for localization
Check the configured interface language against the available
localization dict before attempting to use, otherwise fall back to
english.
Also expanded language names in the languages json file.
* Add test for validating translation language keys
Also adds Spanish translation to json (the only non-English language I
can add and reasonably validate on my own).
* Validate all translations against original keyset, update readme
Readme has been updated to include basic contributing guidelines for
both code and translations.
* Add option to disable changing of configuration
Introduces a test to ensure the correct response code is found when
attempting to update the config when disabled, and ensure default config
is unchanged when posting a new config dict.
Attempting to update the config using the API when disabled now returns
a 403 code + redirect.
Co-authored-by: Ben Busby <benbusby@protonmail.com>
The logging from imported modules (stem, in particular) has caused quite
a few users to assume there are errors where there aren't any. The logs
from stem also aren't helpful, as everything in the library works as
expected despite the implication from the logs that it is not working.
Randomizing the "Mozilla" portion of the user agent changed the
character encoding to GB2312. Setting it to plain "Mozilla" enforces
UTF-8 encoding.
Bump to version 0.4.1 for release of bug fix
Fixes#267
This moves away from the previous (messy) approach of using two separate
keys for decrypting text and element URLs separately and regenerating
them for new searches. The current implementation of sessions is not very
reliable, which lead to keys being regenerated too soon, which would
break page navigation. Until that can be addressed, the single
key per session approach should work a lot better.
Fixes#250Fixes#90
This allows the user to enable their preferred settings in a variety of
ways, depending on their deployment preference. Values added to
whoogle.env can be enabled using WHOOGLE_DOTENV=1, in which case all
values in the env var file will overwrite defaults or user provided
settings.
Co-authored-by: Ben Busby <benbusby@protonmail.com>
* Add custom CSS field to config
This allows users to set/customize an instance's theme and appearance to
their liking. The config CSS field is prepopulated with all default CSS
variable values to allow quick editing.
Note that this can be somewhat of a "footgun" if someone updates the
CSS to hide all fields/search/etc. Should probably add some sort of
bandaid "admin" feature for public instances to employ until the whole
cookie/session issue is investigated further.
* Symlink all app static files to test dir
* Refactor app/misc/*.json -> app/static/settings/*.json
The country/language json files are used for user config settings, so
the "misc" name didn't really make sense. Also moved these to the static
folder to make testing easier.
* Fix light theme variables in dark theme css
* Minor style tweaking
The app/utils/*_utils weren't named very well, and all have been updated
to have more accurate names.
Function and class documention for the utils have been updated as well,
as part of the effort to improve overall documentation for the project.
Introduces a new content security policy header for responses to all
requests to reduce the possibility of ip leaks to outside connections.
By default blocks all inline scripts, and only allows content loaded
from Whoogle.
Refactors a few small inline scripting cases in the project to their own
individual scripts.
Pip installs of whoogle search were missing access to the misc/ folder,
which previously contained the language and country json files. These
have been moved to app/misc, and the previous root level misc/ was
renamed to config/ (since it now only contains the tor config files).
Bump to 0.3.1.
Moves the language and country dicts from the config model to json files
that are loaded during app init and stored in the app config dict. This
substantially improves the readability of the config model and allows
for much more sensible loading of the language/country options.
* Add tor and http/socks proxy support
Allows users to enable/disable tor from the config menu, which will
forward all requests through Tor.
Also adds support for setting environment variables for alternative
proxy support. Setting the following variables will forward requests
through the proxy:
- WHOOGLE_PROXY_USER (optional)
- WHOOGLE_PROXY_PASS (optional)
- WHOOGLE_PROXY_TYPE (required)
- Can be "http", "socks4", or "socks5"
- WHOOGLE_PROXY_LOC (required)
- Format: "<ip address>:<port>"
See #30
* Refactor acquire_tor_conn -> acquire_tor_identity
Also updated travis CI to set up tor
* Add check for Tor socket on init, improve Tor error handling
Initializing the app sends a heartbeat request to Tor to check for
availability, and updates the home page config options accordingly. This
heartbeat is sent on every request, to ensure Tor support can be
reconfigured without restarting the entire app.
If Tor support is enabled, and a subsequent request fails, then a new
TorError exception is raised, and the Tor feature is disabled until a
valid connection is restored.
The max attempts has been updated to 10, since 5 seemed a bit too low
for how quickly the attempts go by.
* Change send_tor_signal arg type, update function doc
send_tor_signal now accepts a stem.Signal arg (a bit cleaner tbh). Also
added the doc string for the "disable" attribute in TorError.
* Fix tor identity logic in Request.send
* Update proxy init, change proxyloc var name
Proxy is now only initialized if both type and location are specified,
as neither have a default fallback and both are required. I suppose the
type could fall back to http, but seems safer this way.
Also refactored proxyurl -> proxyloc for the runtime args in order to
match the Dockerfile args.
* Add tor/proxy support for Docker builds, fix opensearch/init
The Dockerfile is now updated to include support for Tor configuration,
with a working torrc file included in the repo.
An issue with opensearch was fixed as well, which was uncovered during
testing and was simple enough to fix here. Likewise, DDG bang gen was
updated to only ever happen if the file didn't exist previously, as
testing with the file being regenerated every time was tedious.
* Add missing "@" for socks proxy requests
Initialization of the app now includes generation of a ddg-bang json
file, which is used for all bang style searches afterwards.
Also added search suggestion handling for bang json lookup. Queries
beginning with "!" now reference the bang json file to pull all keys
that match.
Updated test suite to include basic tests for bang functionality.
Updated gitignore to exclude bang subdir.
Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration.
Verbatim search and option to ignore search autocorrect are now supported as well.
Also cleaned up the javascript side of whoogle config so that it now
uses arrays of available fields for parsing config values instead of manually assigning each
one to a variable.
This doesn't include support for Google Maps -> Open Street Maps, that
seems a bit more involved than the social media redirects were, so it
should likely be a separate effort.
* Major refactor of requests and session management
- Switches from pycurl to requests library
- Allows for less janky decoding, especially with non-latin character
sets
- Adds session level management of user configs
- Allows for each session to set its own config (people are probably
going to complain about this, though not sure if it'll be the same
number of people who are upset that their friends/family have to share
their config)
- Updates key gen/regen to more aggressively swap out keys after each
request
* Added ability to save/load configs by name
- New PUT method for config allows changing config with specified name
- New methods in js controller to handle loading/saving of configs
* Result formatting and removal of unused elements
- Fixed question section formatting from results page (added appropriate
padding and made questions styled as italic)
- Removed user agent display from main config settings
* Minor change to button label
* Fixed issue with "de-pickling" of flask session
Having a gitignore-everything ("*") file within a flask session folder seems to cause a
weird bug where the state of the app becomes unusable from continuously
trying to prune files listed in the gitignore (and it can't prune '*').
* Switched to pickling saved configs
* Updated ad/sponsored content filter and conf naming
Configs are now named with a .conf extension to allow for easier manual
cleanup/modification of named config files
Sponsored content now removed by basic string matching of span content
* Version bump to 0.2.0
* Fixed request.send return style
The implementation of POST search support comes with a few benefits. The
most apparent is the avoidance of search queries appearing in web server
logs -- instead of the prior GET approach (i.e.
/search?q=my+search+query), using POST requests with the query stored in
the request body creates logs that simply appear as "/search".
Since a lot of relative links are generated in the results page, I came
up with a way to generate a unique key at run time that is used to
encrypt any query strings before sending to the user. This benefits both
regular text queries as well as fetching of image links and means that
web logs will only show an encrypted string where a link or query
string might slip through.
Unfortunately, GET search requests still need to be supported, as it
doesn't seem that Firefox (on iOS) supports loading search engines by
their opensearch.xml file, but instead relies on manual entry of a
search query string. Once this is updated, I'll probably remove GET
request search support.