A recent issue brought up a good point about how the latest changes to
setting default language to english break functionality for bilingual
users. The change was likely not the best solution for users who were
being affected by IP geolocation on their instances -- the right
solution for that would be to configure the interface/search language to
their preference instead.
The requests library requires both 'http' and 'https' values in any
included proxy dict, and whoogle was previously copying the http proxy
to https for simplicity. The assumption was that if the underlying
request wasn't able to connect via https, it would default to http
(otherwise why have the requirement to specify both?)
This led to connectivity issues for users with http only proxies as of
the latest urllib and requests package versions, which are a lot more
strict with connections over https. With the latest versions, if an
https connection cannot be made, the library returns an error.
As a result, the new proxy dict must look something like this for plain
http proxies:
{'http': 'http://domain.tld:port', 'https': 'http://domain.tld:port'}
where both http and https are identical, but both are still required.
Since the interface language defaults to IP geolocation by google, the
default language is now set to english. Still not sure if this is the
best solution, but at least temporarily should clear up some confusion
for users with instances deployed in countries outside of their own.
Also performed some minor cleanup:
- Updated name of strip_blocked_sites to clean_query
- Added clean_query to list of jinja template functions
- Ensured site block list doesn't contain duplicate filters
The new site filter breaks links to Maps results, so filter.py needed
to be updated to handle these links as a unique case. A new method was
introduced to easily remove any "-site:..." filters from the query,
which is now also used to format queries in the header template rather
than manually removing the blocked site list within the template itself.
Bumps version to 0.5.1 for releasing the bugfix
Fixes#329
* add view image option
* prevent whoogle links from opening in a new tab.
* remove view image template on mobile requests
* change loop values to be more robust to the number of images
* Update app/templates/imageresults.html
* fix "Basically the .cvifge class needs width: 100%; in order to expand the search input to fit the form width."
* Update app/templates/imageresults.html
* remove hardcoded string from template
* Add view image config var to app.json
* Add view image config var to whoogle.env
Co-authored-by: jacr13 <ramos.joao@protonmail.com>
Co-authored-by: Ben Busby <benbusby@protonmail.com>
* Block websites in search results via user config
Adds a new config field "Block" to specify a comma separated list of
websites to block in search results. This is applied for all searches.
* Add test for blocking sites from search results
* Document WHOOGLE_CONFIG_BLOCK usage
* Strip '-site:' filters from query in header template
The 'behind the scenes' site filter applied for blocked sites was
appearing in the query field when navigating between search categories
(all -> images -> news, etc). This prevents the filter from appearing in
all except "images", since the image category uses a separate header.
This should eventually be addressed when the image page can begin using
the standard whoogle header, but until then, the filter will still
appear for image searches.
Randomizing the "Mozilla" portion of the user agent changed the
character encoding to GB2312. Setting it to plain "Mozilla" enforces
UTF-8 encoding.
Bump to version 0.4.1 for release of bug fix
Fixes#267
This moves away from the previous (messy) approach of using two separate
keys for decrypting text and element URLs separately and regenerating
them for new searches. The current implementation of sessions is not very
reliable, which lead to keys being regenerated too soon, which would
break page navigation. Until that can be addressed, the single
key per session approach should work a lot better.
Fixes#250Fixes#90
Eventually this should be part of a separate mypy ci build, but right
now it's just a general guideline. Future commits and PRs should be
validated for static typing wherever possible.
For reference, the testing commands used for this commit were:
mypy --ignore-missing-imports --pretty --disallow-untyped-calls app/
mypy --ignore-missing-imports --pretty --disallow-untyped-calls test/
The app/utils/*_utils weren't named very well, and all have been updated
to have more accurate names.
Function and class documention for the utils have been updated as well,
as part of the effort to improve overall documentation for the project.
The lxml dependency in the project was fairly unnecessary, and made the
initial build time for the project considerably slower. This replaces
all instances of lxml with either the default html parser (for bs4
constructors) or the built in xml.etree package (for search suggestion
parsing).
* Add tor and http/socks proxy support
Allows users to enable/disable tor from the config menu, which will
forward all requests through Tor.
Also adds support for setting environment variables for alternative
proxy support. Setting the following variables will forward requests
through the proxy:
- WHOOGLE_PROXY_USER (optional)
- WHOOGLE_PROXY_PASS (optional)
- WHOOGLE_PROXY_TYPE (required)
- Can be "http", "socks4", or "socks5"
- WHOOGLE_PROXY_LOC (required)
- Format: "<ip address>:<port>"
See #30
* Refactor acquire_tor_conn -> acquire_tor_identity
Also updated travis CI to set up tor
* Add check for Tor socket on init, improve Tor error handling
Initializing the app sends a heartbeat request to Tor to check for
availability, and updates the home page config options accordingly. This
heartbeat is sent on every request, to ensure Tor support can be
reconfigured without restarting the entire app.
If Tor support is enabled, and a subsequent request fails, then a new
TorError exception is raised, and the Tor feature is disabled until a
valid connection is restored.
The max attempts has been updated to 10, since 5 seemed a bit too low
for how quickly the attempts go by.
* Change send_tor_signal arg type, update function doc
send_tor_signal now accepts a stem.Signal arg (a bit cleaner tbh). Also
added the doc string for the "disable" attribute in TorError.
* Fix tor identity logic in Request.send
* Update proxy init, change proxyloc var name
Proxy is now only initialized if both type and location are specified,
as neither have a default fallback and both are required. I suppose the
type could fall back to http, but seems safer this way.
Also refactored proxyurl -> proxyloc for the runtime args in order to
match the Dockerfile args.
* Add tor/proxy support for Docker builds, fix opensearch/init
The Dockerfile is now updated to include support for Tor configuration,
with a working torrc file included in the repo.
An issue with opensearch was fixed as well, which was uncovered during
testing and was simple enough to fix here. Likewise, DDG bang gen was
updated to only ever happen if the file didn't exist previously, as
testing with the file being regenerated every time was tedious.
* Add missing "@" for socks proxy requests
Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration.
Verbatim search and option to ignore search autocorrect are now supported as well.
Also cleaned up the javascript side of whoogle config so that it now
uses arrays of available fields for parsing config values instead of manually assigning each
one to a variable.
This doesn't include support for Google Maps -> Open Street Maps, that
seems a bit more involved than the social media redirects were, so it
should likely be a separate effort.
Adding support to choose separately the language of search and the one for the interface (allowing a default givent by google).
Co-authored-by: Joao <ramos.joao@protonmail.com>
Results page now includes method for switching to "All Languages" from
whichever language is specified as the primary in the config (see #74).
Also removes the non-Whoogle links from the page footer, leaving only
the page navigation controls
Added support for the date range filter on the results page, though I'd
still recommend using the ":past <unit>" query instead.
* Major refactor of requests and session management
- Switches from pycurl to requests library
- Allows for less janky decoding, especially with non-latin character
sets
- Adds session level management of user configs
- Allows for each session to set its own config (people are probably
going to complain about this, though not sure if it'll be the same
number of people who are upset that their friends/family have to share
their config)
- Updates key gen/regen to more aggressively swap out keys after each
request
* Added ability to save/load configs by name
- New PUT method for config allows changing config with specified name
- New methods in js controller to handle loading/saving of configs
* Result formatting and removal of unused elements
- Fixed question section formatting from results page (added appropriate
padding and made questions styled as italic)
- Removed user agent display from main config settings
* Minor change to button label
* Fixed issue with "de-pickling" of flask session
Having a gitignore-everything ("*") file within a flask session folder seems to cause a
weird bug where the state of the app becomes unusable from continuously
trying to prune files listed in the gitignore (and it can't prune '*').
* Switched to pickling saved configs
* Updated ad/sponsored content filter and conf naming
Configs are now named with a .conf extension to allow for easier manual
cleanup/modification of named config files
Sponsored content now removed by basic string matching of span content
* Version bump to 0.2.0
* Fixed request.send return style
Basic autocomplete/search suggestion functionality added
* Adds new GET and POST routes for '/autocomplete' that accept a string query and returns an array of suggestions
* Adds new autoscript.js file for handling queries on the main page and results view
* Updated requests class to include autocomplete method
* Updated opensearch template to handle search suggestions
* Added header template to allow for autocomplete on results view
* Updated readme to mention autocomplete feature
* Added country and safe search config options
* Updated handling of parser error in results test
* Improved handling of default country
* Added 1px empty gif fallback as a replacement for images that fail to load
* Added language configuration support
Main page now has a dropdown for selecting preferred language of
results.
Refactored config to be its own model with language constants.
* Added more language support
Interface language is now updated using the "hl" arg
Fixed chinese traditional and simplified values
Updated decoding of characters to gb2312
* Updated to use conditional decoding dependent on language
* Updated filter to not rely on valid config to work properly
Pushing straight to master since this is an extremely simple fix, with
a pretty large performance benefit.
The Phyme library used for generating a User Agent rhyme was consuming
an absolute unit of memory. Now that it's removed, it's using about 10x
less memory, at the cost of User Agents being not as funny anymore.
Switched encoding from utf-8 to unicode-escape in an effort to support multiple
languages besides English.
Updated image results page formatting to fix bad image links (added TODO
for adding full res image link for each image result).
Updated README to include libcurl and libssl install instructions for
manual setup.
The implementation of POST search support comes with a few benefits. The
most apparent is the avoidance of search queries appearing in web server
logs -- instead of the prior GET approach (i.e.
/search?q=my+search+query), using POST requests with the query stored in
the request body creates logs that simply appear as "/search".
Since a lot of relative links are generated in the results page, I came
up with a way to generate a unique key at run time that is used to
encrypt any query strings before sending to the user. This benefits both
regular text queries as well as fetching of image links and means that
web logs will only show an encrypted string where a link or query
string might slip through.
Unfortunately, GET search requests still need to be supported, as it
doesn't seem that Firefox (on iOS) supports loading search engines by
their opensearch.xml file, but instead relies on manual entry of a
search query string. Once this is updated, I'll probably remove GET
request search support.
Images were previously directly fetched from google search results,
which was a potential privacy hazard. All image sources are now modified
to be passed through shoogle's routing first, which will then fetch raw
image data and pass it through to the user.
Filter class was refactored to split the primary clean method into
smaller, more manageable submethods.
Curl requests and user agent related functionality was moved to its own
request class.
Routes was refactored to only include strictly routing related
functionality.
Filter class was cleaned up (had routing/request related logic in here,
which didn't make sense)