46 Commits

Author SHA1 Message Date
Ben Busby
a0110fda8d
Switch to single Fernet key per session
This moves away from the previous (messy) approach of using two separate
keys for decrypting text and element URLs separately and regenerating
them for new searches. The current implementation of sessions is not very
reliable, which led to keys being regenerated too soon, which would
break page navigation. Until that can be addressed, the single
key per session approach should work a lot better.

Fixes #250

Fixes #90
2021-04-01 00:23:30 -04:00
Ben Busby
f4a087303d
Improve static typing throughout repo
Eventually this should be part of a separate mypy ci build, but right
now it's just a general guideline. Future commits and PRs should be
validated for static typing wherever possible.

For reference, the testing commands used for this commit were:

mypy --ignore-missing-imports --pretty --disallow-untyped-calls app/
mypy --ignore-missing-imports --pretty --disallow-untyped-calls test/
2021-03-24 15:13:52 -04:00
Ben Busby
d447e5009f
Improve naming of *_utils files, update fn/class doc
The app/utils/*_utils weren't named very well, and all have been updated
to have more accurate names.

Function and class documentation for the utils have been updated as well,
as part of the effort to improve overall documentation for the project.
2021-03-08 12:22:04 -05:00
Ben Busby
e066a19411
Ensure G logo doesn't appear in mobile img results
Adds a separate check to remove all images sourced from www.gstatic.com,
which is where the mobile logo in particular is coming from.
2021-02-20 15:04:32 -05:00
Ben Busby
440c4e9c50
Remove lxml dependency
The lxml dependency in the project was fairly unnecessary, and made the
initial build time for the project considerably slower. This replaces
all instances of lxml with either the default html parser (for bs4
constructors) or the built-in xml.etree package (for search suggestion
parsing).
2020-12-29 18:43:42 -05:00
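
As a rough illustration of the xml.etree approach mentioned in 440c4e9c50, the sketch below parses a suggestion payload with only the standard library. The payload shape, the element and attribute names, and the function name `parse_suggestions` are assumptions made for this example and are not taken from the repository.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical sample payload; the real response format is an assumption.
SAMPLE_XML = """<toplevel>
  <CompleteSuggestion><suggestion data="whoogle search"/></CompleteSuggestion>
  <CompleteSuggestion><suggestion data="whoogle docker"/></CompleteSuggestion>
</toplevel>"""


def parse_suggestions(xml_text: str) -> List[str]:
    """Return the 'data' attribute of every <suggestion> element."""
    root = ET.fromstring(xml_text)
    return [
        el.attrib["data"]
        for el in root.iter("suggestion")
        if "data" in el.attrib
    ]


suggestions = parse_suggestions(SAMPLE_XML)
print(suggestions)
```

Because `xml.etree.ElementTree` ships with Python, a parser along these lines adds no build-time dependency, which is the benefit the commit message describes.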
Ben Busby
6e7ec9918a
Move language/country settings to app config
Moves the language and country dicts from the config model to json files
that are loaded during app init and stored in the app config dict. This
substantially improves the readability of the config model and allows
for much more sensible loading of the language/country options.
2020-12-17 16:42:05 -05:00
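
A minimal sketch of the idea in 6e7ec9918a, assuming the options are stored as JSON arrays of name/value pairs. The JSON schema, the config keys `LANGUAGES` and `COUNTRIES`, and the helper names are hypothetical; the log does not show the real file layout. Inline strings stand in for the json files so the example is self-contained.

```python
import json

# Hypothetical file contents; the real schema is not shown in the log.
LANGUAGES_JSON = (
    '[{"name": "English", "value": "lang_en"},'
    ' {"name": "French", "value": "lang_fr"}]'
)
COUNTRIES_JSON = '[{"name": "United States", "value": "US"}]'


def load_options(raw: str) -> list:
    """Decode one JSON document holding a list of option dicts."""
    return json.loads(raw)


def init_app_config() -> dict:
    """Build the app config dict once, at startup."""
    return {
        "LANGUAGES": load_options(LANGUAGES_JSON),
        "COUNTRIES": load_options(COUNTRIES_JSON),
    }


app_config = init_app_config()
print([lang["value"] for lang in app_config["LANGUAGES"]])
```

Loading the options once at init keeps the config model free of large literal dicts, which matches the readability goal stated in the commit.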
Ben Busby
375f4ee9fd
PEP-8: Fix formatting issues, add CI workflow (#161)
Enforces PEP-8 formatting for all Python code

Adds a GitHub Actions build for checking PEP-8 formatting using pycodestyle
2020-12-17 16:06:47 -05:00
Ben Busby
b695179c79
Add ability to collapse "people also ask"
This adds a step in the filter process to wrap the "people also ask"
section in a <details> element, which automatically collapses the
contents of the section. Clicking/tapping the details element expands
the view as normal.

See #113
2020-12-15 11:09:48 -05:00
Ben Busby
e6db3112f7
Fix pagination bug for pages > 3
The pagination footer on the results page after page 2 has three actions
(beginning, next, previous). The footer filter was updated to remove
items with more than three actions to fix this.

See #131
2020-12-07 20:38:57 -05:00
Ben Busby
72cbc342af Add ability to set temp config in search query
Dark mode, country, interface language, and search language configs
can now be set in the search query by appending each option as a
url parameter.

Supported args are: 'dark', 'lang_search', 'lang_interface', and 'ctry'

Ex: /search?q=%s&dark=1&lang_search=lang_en...

These config settings persist across page navigation and switching
result type, but will be reset if the main search bar is used.

See #144
2020-11-11 00:40:49 -05:00
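
To illustrate the url parameters described in 72cbc342af, here is a hedged sketch of how the four supported args could be pulled out of a search URL. Only the arg names come from the commit message; the function name `temp_config_from_url` and the sample values (such as `ctry=US`) are assumptions.

```python
from urllib.parse import parse_qs, urlsplit

# Arg names as listed in the commit message.
SUPPORTED_ARGS = ("dark", "lang_search", "lang_interface", "ctry")


def temp_config_from_url(url: str) -> dict:
    """Collect the supported temporary config args present in the URL."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value only.
    return {key: params[key][0] for key in SUPPORTED_ARGS if key in params}


overrides = temp_config_from_url(
    "/search?q=test&dark=1&lang_search=lang_en&ctry=US"
)
print(overrides)
```

Unsupported parameters such as `q` are ignored here, so the search term never leaks into the config overrides.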
bugbounce
1148a7fb8d
Use relative links instead of absolute (#139)
* Use relative links instead of absolute

This allows for hosting under a subpath. For example, if you want to host
whoogle at example.com/whoogle, it should work better with a reverse proxy.

* Use relative link for opensearch.xml
2020-10-29 11:09:31 -04:00
Ben Busby
f3bb1e22b4 Fix improper header styling, remove shopping tab links
The header template was using Google's classes for the "Whoogle" logo,
which meant keeping up with their list of colors used in the logo. The
template was updated to only ever use the Whoogle logo color.
Accordingly, the logo specific styling in filter.py was removed, since
it is no longer needed.

Also removes all links to the shopping tab, as it seems that the
majority of the links to items are Google specific links (usually
google.com/aclk links without any discernible param for determining the
true location for the link). The shopping page should be addressed
separately with unique filtering/formatting. Further tracking of this
task will be followed in #136.
2020-10-25 13:52:30 -04:00
Ben Busby
9afe5f81bd
Updated dark theme (#121)
* Implemented new dark theme

Now uses a dedicated css file for all dark theme color changes, rather
than replacing color codes directly.

Color theme is from discussion in #60.

* Minor link color update
2020-09-14 15:29:58 -04:00
Ben Busby
975ece8cd0
Privacy respecting alternatives in results view (#106)
Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration.

Verbatim search and option to ignore search autocorrect are now supported as well.

Also cleaned up the javascript side of whoogle config so that it now
uses arrays of available fields for parsing config values instead of manually assigning each
one to a variable.

This doesn't include support for Google Maps -> OpenStreetMap, which
seems a bit more involved than the social media redirects were, so it
should likely be a separate effort.
2020-07-26 11:53:59 -06:00
Ben Busby
f7380ae15d Improving ad filtering for non-English languages 2020-06-11 13:21:40 -06:00
Ben Busby
4324fcd8f8 Added better multilingual support, updated filter
Results page now includes method for switching to "All Languages" from
whichever language is specified as the primary in the config (see #74).

Also removes the non-Whoogle links from the page footer, leaving only
the page navigation controls

Added support for the date range filter on the results page, though I'd
still recommend using the ":past <unit>" query instead.
2020-06-07 14:06:49 -06:00
Ben Busby
b6fb4723f9
Project refactor (#85)
* Major refactor of requests and session management

- Switches from pycurl to requests library
  - Allows for less janky decoding, especially with non-latin character
  sets
- Adds session level management of user configs
  - Allows for each session to set its own config (people are probably
  going to complain about this, though not sure if it'll be the same
  number of people who are upset that their friends/family have to share
  their config)
- Updates key gen/regen to more aggressively swap out keys after each
request

* Added ability to save/load configs by name

- New PUT method for config allows changing config with specified name
- New methods in js controller to handle loading/saving of configs

* Result formatting and removal of unused elements

- Fixed question section formatting from results page (added appropriate
padding and made questions styled as italic)
- Removed user agent display from main config settings

* Minor change to button label

* Fixed issue with "de-pickling" of flask session

Having a gitignore-everything ("*") file within a flask session folder seems to cause a
weird bug where the state of the app becomes unusable from continuously
trying to prune files listed in the gitignore (and it can't prune '*').

* Switched to pickling saved configs

* Updated ad/sponsored content filter and conf naming

Configs are now named with a .conf extension to allow for easier manual
cleanup/modification of named config files

Sponsored content now removed by basic string matching of span content

* Version bump to 0.2.0

* Fixed request.send return style
2020-06-02 12:54:47 -06:00
Ben Busby
71ba00785f Quick improvement to ad removal 2020-05-29 13:21:53 -06:00
Ben Busby
78939e7fb4 Reworked google url routing 2020-05-26 10:47:40 -06:00
Ben Busby
98d639883c Fixing styling/url/safe mode inconsistencies 2020-05-26 10:39:19 -06:00
Ben Busby
21012f5265
Feature: autocomplete/search suggestions (#72)
Basic autocomplete/search suggestion functionality added

* Adds new GET and POST routes for '/autocomplete' that accept a string query and return an array of suggestions

* Adds new autoscript.js file for handling queries on the main page and results view

* Updated requests class to include autocomplete method

* Updated opensearch template to handle search suggestions

* Added header template to allow for autocomplete on results view

* Updated readme to mention autocomplete feature
2020-05-24 14:03:11 -06:00
Ben Busby
3dbe51e9e7 Removing google's filter card from results 2020-05-24 12:53:21 -06:00
Ben Busby
c51f186419 Added version footer, minor PEP 8 refactoring 2020-05-20 11:02:30 -06:00
Paul Rothrock
0e39b8f97b
Added "I'm feeling lucky" function (#46)
* Putting '! ' at the beginning of the query now redirects to the first search result

Signed-off-by: Paul Rothrock <paul@movetoiceland.com>

* Moved get_first_url outside of filter class

Signed-off-by: Paul Rothrock <paul@movetoiceland.com>
2020-05-18 10:28:23 -06:00
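
The '! ' prefix check from 0e39b8f97b can be sketched as below. This only shows how such a query might be detected and stripped; fetching the first result and issuing the redirect are left out, and the name `split_lucky` is hypothetical.

```python
from typing import Tuple

LUCKY_PREFIX = "! "  # prefix described in the commit message


def split_lucky(query: str) -> Tuple[bool, str]:
    """Return (is_lucky, query_without_prefix)."""
    if query.startswith(LUCKY_PREFIX):
        return True, query[len(LUCKY_PREFIX):]
    return False, query


print(split_lucky("! python docs"))
print(split_lucky("python docs"))
```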
Ben Busby
3123789584
Added config option for opening links in new tab (#49) 2020-05-15 16:10:31 -06:00
Ben Busby
afd5b9aa83 Minor fix to dark mode on img results 2020-05-15 14:17:16 -06:00
Ben Busby
a11ceb0a57
Feature: language config (#27)
* Added language configuration support

Main page now has a dropdown for selecting preferred language of
results.

Refactored config to be its own model with language constants.

* Added more language support

Interface language is now updated using the "hl" arg

Fixed Chinese traditional and simplified values

Updated decoding of characters to gb2312

* Updated to use conditional decoding dependent on language

* Updated filter to not rely on valid config to work properly
2020-05-12 17:15:53 -06:00
Ben Busby
708769f682 Minor styling refactor, updated app name 2020-05-04 18:00:43 -06:00
Ben Busby
0300eab6df Updated formatting and setup instructions
Switched encoding from utf-8 to unicode-escape in an effort to support multiple
languages besides English.

Updated image results page formatting to fix bad image links (added TODO
for adding full res image link for each image result).

Updated README to include libcurl and libssl install instructions for
manual setup.
2020-05-03 19:32:47 -06:00
Ben Busby
39c475af21 Using urlencode "doseq" option for url args 2020-04-29 20:31:03 -06:00
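
For context on 39c475af21, the standard library's `urlencode` treats a list value as a single string unless `doseq=True` is passed. The parameter names and values below are made up for the example.

```python
from urllib.parse import urlencode

# Hypothetical args; one key carries several values.
params = {"q": "privacy", "tbs": ["qdr:d", "li:1"]}

# Default: the list is converted with str() and then percent-encoded.
without_doseq = urlencode(params)

# doseq=True: each list element becomes its own key=value pair.
with_doseq = urlencode(params, doseq=True)

print(without_doseq)
print(with_doseq)
```

With `doseq=True` the result is `q=privacy&tbs=qdr%3Ad&tbs=li%3A1`, while the default output contains the encoded brackets and quotes of the list's repr.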
Ben Busby
c30f21f950 Minor conditional fix in filter 2020-04-29 14:46:00 -06:00
Ben Busby
b83f14be26 Fixed image href filter
Needed to be checking against img attrs, not just the img object itself
2020-04-29 11:18:07 -06:00
Ben Busby
dcd93d4869 Fixed filter params, updated search button text 2020-04-29 10:03:34 -06:00
Ben Busby
5fe308956b Cleaned up filter class, updated js config tool 2020-04-29 09:46:18 -06:00
Ben Busby
1cbe394e6f Updated tests, fixed a few bugs
Added opensearch routes test and individual tests for searching via GET
and POST separately.

Fixed incorrect assignment in gen_query.
2020-04-28 18:59:33 -06:00
Ben Busby
0c0ebb8917 Added POST search, encrypted query strings, refactoring
The implementation of POST search support comes with a few benefits. The
most apparent is the avoidance of search queries appearing in web server
logs -- instead of the prior GET approach (i.e.
/search?q=my+search+query), using POST requests with the query stored in
the request body creates logs that simply appear as "/search".

Since a lot of relative links are generated in the results page, I came
up with a way to generate a unique key at run time that is used to
encrypt any query strings before sending to the user. This benefits both
regular text queries as well as fetching of image links and means that
web logs will only show an encrypted string where a link or query
string might slip through.

Unfortunately, GET search requests still need to be supported, as it
doesn't seem that Firefox (on iOS) supports loading search engines by
their opensearch.xml file, but instead relies on manual entry of a
search query string. Once this is updated, I'll probably remove GET
request search support.
2020-04-28 18:19:34 -06:00
Ben Busby
4180aedd87 Added image proxying, refactored filter class
Images were previously directly fetched from google search results,
which was a potential privacy hazard. All image sources are now modified
to be passed through shoogle's routing first, which will then fetch raw
image data and pass it through to the user.

Filter class was refactored to split the primary clean method into
smaller, more manageable submethods.
2020-04-27 20:21:36 -06:00
Ben Busby
b0e6167733 Improved bad url arg filtering 2020-04-26 18:48:40 -06:00
Ben Busby
3bc58b64be Small update to filter class
The image results page seems to have different formatting from non-image
results pages. Should probably revisit this at some point and try to
style the image results page to be more in line with other result types.
2020-04-25 11:32:43 -06:00
Ben Busby
1f6bfa092e Complete refactoring of opensearch
Refactored opensearch.xml to only exist as a template that is
served by a flask route, which is then populated with the
necessary url root.
2020-04-24 18:45:57 -06:00
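
The idea in 1f6bfa092e, rendering opensearch.xml from a template with the url root filled in, can be approximated as follows. This sketch swaps in the standard library's `string.Template` for the Flask template mentioned in the commit, and the template contents, `ShortName`, and function name are assumptions.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical template; only ${url_root} is substituted.
# {searchTerms} is an OpenSearch placeholder and is left untouched.
OPENSEARCH_TEMPLATE = Template(
    """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Url type="text/html" template="${url_root}search?q={searchTerms}"/>
</OpenSearchDescription>"""
)


def render_opensearch(url_root: str) -> str:
    """Fill the url root into the OpenSearch description."""
    return OPENSEARCH_TEMPLATE.substitute(url_root=url_root)


rendered = render_opensearch("https://example.com/")
ET.fromstring(rendered)  # raises ParseError if the XML is malformed
print(rendered)
```

A real route would also need to XML-escape the url root and serve the result with an appropriate content type; both are omitted here.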
Ben Busby
a7005c012e Refactoring of user requests and routing
Curl requests and user-agent-related functionality were moved to their own
request class.

Routes was refactored to only include strictly routing related
functionality.

Filter class was cleaned up (had routing/request related logic in here,
which didn't make sense)
2020-04-23 20:59:43 -06:00
Ben Busby
6a150092a2 Fixed config bug in filter, updated run script to work on macOS 2020-04-16 18:50:31 -06:00
Ben Busby
e72ccc4988 Small change to mobile styling 2020-04-16 10:10:18 -06:00
Ben Busby
024552f2df Minor refactor of filter class, updated tests, fixed html/css, added ua to config 2020-04-16 10:01:02 -06:00
Ben Busby
b5b6e64177 Added testing and ci build, refactored filter class, refactored project structure 2020-04-15 17:41:53 -06:00
Ben Busby
850a46aea1 Refactored routes, added filter class for returned results, added dockerignore 2020-04-10 14:52:27 -06:00