Commit Graph

78 Commits (8a0b872337247e1fa5514b67768122a1121a159c)

Author SHA1 Message Date
Ben Busby 62d7491936
Only create ip card if main result div is found
The ip address card that is created for searches like "my ip" only needs
to be created/inserted if a main result div id is found.

Fixes #735
2 years ago
Ben Busby 5c8be4428b
Fall back to netloc for bang search if query is empty
Previously, empty bang searches would redirect to the Whoogle instance
home page. This now redirects to the specific site for the bang search
instead (i.e. "!yt" without a query redirects to "youtube.com", "!gh" to
"github.com", etc)

Fixes #719
2 years ago
Ben Busby 7688c1a233
Revert anon-view key change from #724
The "anon-view" translation key is the correct one to use for accessing
anonymous view within the search results. "config-anon-view" is only for
the configuration menu on the home page.
2 years ago
gdm85 6d362ca5c7
Add support for relative search results (#715)
* Relativization of search results

* Fix JavaScript error when opening images

* Replace single-letter logo and remove sign-in link

* Add `WHOOGLE_URL_PREFIX` env var to support relative path redirection

The `WHOOGLE_URL_PREFIX` var can now be set to fix internal app
redirects, such as the `/session` redirect performed on the first visit
to the Whoogle home page.

Co-authored-by: Ben Busby <contact@benbusby.com>
2 years ago
glitsj16 ca80bb0caa
Fix 'anon-view' KeyError (#724) 2 years ago
Ben Busby 9317d9217f
Support proxying results through Whoogle (aka "anonymous view") (#682)
* Expand `/window` endpoint to behave like a proxy

The `/window` endpoint was previously used as a type of proxy, but only
for removing Javascript from the result page. This expands the existing
functionality to allow users to proxy search result pages (with or without
Javascript) through their Whoogle instance.

* Implement filtering of remote content from css

* Condense NoJS feature into Anonymous View

Enabling NoJS now removes Javascript from the Anonymous View, rather
than creating a separate option.

* Exclude 'data:' urls from filter, add translations

The 'data:' url must be allowed in results to view certain elements on
the page, such as stars for review based results.

Add translations for the remaining languages.

* Add cssutils to requirements
2 years ago
Ben Busby 797372ecaa
Ignore blank alts if site alt config is enabled
If the alt for a particular service is blank, the original source is
used instead.

Example:
1. Site alts enabled in config
2. User wants wikipedia links, not wikiless
3. WHOOGLE_ALT_WIKI set to ""
4. All available alt links redirected to farside, except wikipedia

Fixes #704
2 years ago
138138138 5ecd4fe931
Add "nofollow noopener noreferrer" to all links (#698)
Old iOS 12 devices will pass the Referer HTTP header to the site user clicks.
Websites will know those traffic come from Whoogle search.
Adding "nofollow noopener noreferrer" solves the issue.
2 years ago
Ben Busby 0048c2f9aa
Update remaining alternative frontends to use Farside
Wikipedia, imgur, and translate alternatives were all still using
hardcoded URLs when replaced with their respective alternative frontend.
This updates them to use farside instead.
2 years ago
Ben Busby a58f70ca7e
Fix wikipedia->wikiless domain replacement
Was previously using wikipedia.com not wikipedia.org, causing wikiless
replacements to not occur.

Fixes #686
2 years ago
Ben Busby 69f845a047
Add test for empty bang behavior
Also fix pep8 issue
2 years ago
Ben Busby 809520ec70
Fallback to home page for empty bang searches
Bang searches without an actual query (i.e. just searching "!gh") will
now redirect to the home page. I guess people do this for some reason
and don't like that it redirects to the correct bang result URL, but
without an actual search term.

Fixes #595
2 years ago
Ben Busby b28fa86e33
Update ad filter
Recent changes to ads in search results caused Whoogle to display ads
for certain searches. In particular, ads recently started appearing
grouped into one div, as opposed to a singular ad per div. This was
accompanied by the div label "ads" (instead of just "ad"), which threw
off the existing ad filter. The ad keyword blacklist has been updated
accordingly, and has been enhanced to only check against alpha chars for
each label.

This only seems to have affected English language searches, and only for
very specific searches.
2 years ago
Ben Busby 9984158ec1
Ensure valid str->float conv in currency calc
Currency amounts returned by google seem to randomly include unicode
chars ('\xa0' noted in #642) which broke the currency calculator
included in the project. This ensures that only strings that can be
converted to float are ever used in the conversion.

Fixes #642
2 years ago
Ben Busby 23402e27e1
Check for updates using 24 hour time delta
Rather than only checking for an available update on app init, the check
for updates now performs the check once every 24 hours on the first
request sent after that period.

This also now catches the requests.exceptions.ConnectionError that is
thrown if the app is initialized without an active internet connection.

Fixes #649
2 years ago
Ben Busby d33e8241dc
Fix "my ip" search regression
Removes dependency on class names for creating the "my ip" info card in
the results list for searches pertaining to the user's public IP.

Adds test to prevent this from happening again.

Note to anyone reading this and looking to contribute: please avoid
using hardcoded class names at all costs. This approach of
creating/removing content just results in issues if/when Google decides
to introduce/remove class names from the result page.

Fixes #657
2 years ago
Joao A. Candido Ramos 11099f7b1d
Use consistent header for all result types (#535)
Introduces a header for switching between result types (i.e. "All", "News",
etc) that is consistent between the different result types. Previously, image
results had a tab header that was formatted in a drastically different manner,
which was jarring when switching from a different result page to the Images
page.

Created a G class enum to reference class names returned in search
results. As noted in the class doc, this should only be used/updated as
a last resort, as class names change frequently. For some instances,
such as replacing the tbm tab, it's a lot easier to just replace by
header name than attempting to replace it based on how the element is
structured.

Also updated a few styles to revert the latest styling changes being
applied by Google.

Co-authored-by: jacr13 <ramos.joao@protonmail.com>
Co-authored-by: Ben Busby <contact@benbusby.com>
2 years ago
Ben Busby 72e5a227c8
Move bangs init to bg thread
Initializing the DDG bangs when running whoogle for the first time
creates an indeterminate amount of delay before the app becomes usable,
which makes usability tests (particularly w/ Docker) unreliable. This
moves the bang json init to a background thread and writes a temporary
empty dict to the bangs json file until the full bangs json can be used.
2 years ago
DUO Labs 74cb48086c
Introduce site alts for imgur and wikipedia (#609)
* Add `WHOOGLE_ALT_IMG` for a replacement for imgur.

* Add `WHOOGLE_ALT_WIKI` for Wikipedia
2 years ago
Ben Busby 634d179568
Use farside.link for frontend alternatives in results (#560)
* Integrate Farside into Whoogle

When instances are ratelimited (when a captcha is returned instead of
the user's search results) the user can now hop to a new instance via
Farside, a new backend service that redirects users to working instances
of a particular frontend. In this case, it presents a user with a
Farside link to a new Whoogle (or Searx) instance instead, so that the
user can resume their search.

For the generated Farside->Whoogle link, the generated link includes the
user's current Whoogle configuration settings as URL params, to ensure a
more seamless transition between instances. This doesn't translate to
the Farside->Searx link, but potentially could with some changes.

* Expand conversion of config<->url params

Config settings can now be translated to and from URL params using a
predetermined set of "safe" keys (i.e. config settings that easily
translate to URL params).

* Allow jumping instances via Farside when ratelimited

When instances are ratelimited (when a captcha is returned instead of
the user's search results) the user can now hop to a new instance via
Farside, a new backend service that redirects users to working instances
of a particular frontend. In this case, it presents a user with a
Farside link to a new Whoogle (or Searx) instance instead, so that the
user can resume their search.

For the generated Farside->Whoogle link, the generated link includes the
user's current Whoogle configuration settings as URL params, to ensure a
more seamless transition between instances. This doesn't translate to
the Farside->Searx link, but potentially could with some changes.

Closes #554

Closes #559
3 years ago
Vansh Comar 7bea6349a0
Add tools for currency conversion in search results (#536)
This implements a method for converting between various currencies. When a user
searches "<currency A> to <currency B>" (including when prefixed by a specific
amount), they are now presented with a table for quickly converting between the
two. This makes use of the currency ratio returned as the first "card" in
currency related searches, and the table is inserted into this same card.
3 years ago
Ben Busby 10a15e06e1
Fix incorrect request type for image searches
Previously had hardcoded POST requests for all requests that didn't use
the header template (which currently is only the image tab).

Also refactored how the Filter class works. It now requires a valid
Config model to be provided, which is then set up as a class var that
the filtering functions can use as needed, rather than setting specific
values from the config as individual values (which was confusing and
sloppy).

Fixes #561
3 years ago
Ben Busby 6f5f3d8ca7
Fix incorrect redirect protocol used by Flask
Flask's `request.url` uses `http` as the protocol, which breaks
instances that enforce `https`, since the session redirect relies on
`request.url` for the follow-through URL.

This introduces a new method for determining the correct URL to use for
these redirects by automatically replacing the protocol with `https` if
the `HTTPS_ONLY` env var is set for that instance.

Fixes #538

Fixes #545
3 years ago
Ben Busby e06ff85579
Improve public instance session management (#480)
This introduces a new approach to handling user sessions, which should
allow for users to set more reliable config settings on public instances.

Previously, when a user with cookies disabled would update their config,
this would modify the app's default config file, which would in turn
cause new users to inherit these settings when visiting the app for the
first time and cause users to inherit these settings when their current
session cookie expired (which was after 30 days by default I believe).
There was also some half-baked logic for determining on the backend
whether or not a user had cookies disabled, which lead to some issues
with out of control session file creation by Flask.

Now, when a user visits the site, their initial request is forwarded to
a session/<session id> endpoint, and during that subsequent request
their current session id is matched against the one found in the url. If
the ids match, the user has cookies enabled. If not, their original
request is modified with a 'cookies_disabled' query param that tells
Flask not to bother trying to set up a new session for that user, and
instead just use the app's fallback Fernet key for encryption and the
default config.

Since attempting to create a session for a user with cookies disabled
creates a new session file, there is now also a clean-up routine included
in the new session decorator, which will remove all sessions that don't
include a valid key in the dict. NOTE!!! This means that current user
sessions on public instances will be cleared once this update is merged
in. In the long run that's a good thing though, since this will allow session
mgmt to be a lot more reliable overall for users regardless of their cookie
preference.

Individual user sessions still use a unique Fernet key for encrypting queries,
but users with cookies disabled will use the default app key for encryption
and decryption.

Sessions are also now (semi)permanent and have a lifetime of 1 year.
3 years ago
Ben Busby e93507f148
Catch connection error during Tor validation step
Validation of the Tor connection occasionally fails with a
ConnectionError from requests, which was previously uncaught. This is
now handled appropriately (error message shown and connection dropped).

Fixes #532
3 years ago
Ben Busby c766554eea
Bang refactor PEP-8 fix
Addresses PEP-8 formatting issue in previous commit
3 years ago
Ben Busby ddf951de35
Use `replace` in bang query formatting
Using `format` for formatting bang queries caused a KeyError for some
searches, such as !hd (HUDOC). In that example, the URL returned in the
bangs json was `http://...#{%22fulltext%22:[%22{}%22]...`, where
standard formatting would not work due to the misidentification of
"fulltext" as a formatting key.

The logic has been updated to just replace the first occurence of "{}"
in the URL returned by the bangs dict.

Fixes #513
3 years ago
gripped d1c9b7f803
Remove styling from NoJS liks (#511)
Fixes #510
3 years ago
Ben Busby 7fe066b4ea
Escape result html after bolding search terms
Fixes #518
3 years ago
gripped c2ced23073
Improve formatting with NoJS enabled (#509)
Removes line breaks, divider, and link location from all NoJS
links in results when NoJS mode is enabled
3 years ago
Ben Busby 0a78c524fa
Expand 'my ip' to work for proxied requests
Adds a check for the HTTP_X_FORWARDED_FOR header, and uses the value
from the request if found.
3 years ago
DUO Labs 5189cdb072
Update "skip bolding" regex to fix some edge cases (#500)
Should address errors caused by the "bold query" feature replacing
tags and style elements, resulting in unformatted response pages.
3 years ago
Vansh Comar f04c7c5557
Support DDG style bangs with bang at the end (#503)
DDG style bang searches can now have the bang (!) at the end of
the search (i.e. "bologna w!" will now redirect to wikipedia just like
"bologna !w" would)
3 years ago
DUO Labs d8dcdc7455
Skip bolding search terms that are not alphanumeric (#496)
Fixes #494
3 years ago
Ben Busby 591ed4a6d6
Use f-string in bold query regex
by @DUOLabs333
3 years ago
Ben Busby f154b5f2e2
PEP-8 formatting fix 3 years ago
Ben Busby 6decab5a51
Improve regex for bolding search terms
Co-authored by @DUOLabs333
3 years ago
DUO Labs 2c9cf3ecc6
Bold search query in results (#487)
This modifies the search result page by bold-ing all appearances
of any word in the original query. If portions of the query are in
quotes (i.e. "ice cream"), only exact matches of the sequence of
words will be made bold.

Co-authored-by: Ben Busby <noreply+git@benbusby.com>
3 years ago
Ben Busby 8f70236403
Update domains used for scribe.rip replacements
The levelup.gitconnected.com site is a Medium site that can also be
replaced with scribe.rip whenever privacy respecting site alternatives
are enabled in the config.

Also modified how link descriptions are updated when that config is
enabled (before it was missing replacements on quite a few
descriptions).
3 years ago
Vansh Comar 771bf34ce9
Show client IP for "my ip" searches (#469)
This introduces a new UI element for displaying the client IP
address when a search for "my ip" is used.

Note that this does not show the IP address seen by Google
if Whoogle is deployed remotely. It uses `request.remote_addr`
to display the client IP address in the UI, not the actual address
of the server (which is what Google sees in requests sent from
remote Whoogle instances).
3 years ago
Vansh Comar 79fb7531be
Implement scribe.rip replacement for medium.com results (#463)
scribe.rip is a privacy respecting front end for medium.com. This
feature allows medium.com results to be replaced with scribe.rip links,
and works for both regular medium.com domains as well as user specific
subdomains (i.e. user.medium.com).

[scribe.rip website](https://scribe.rip)
[scribe.rip source code](https://git.sr.ht/~edwardloveall/scribe)

Co-authored-by: Ben Busby <noreply+git@benbusby.com>
3 years ago
Ben Busby ff885e4fde
Disable autocomplete via WHOOGLE_AUTOCOMPLETE var
Setting WHOOGLE_AUTOCOMPLETE to 0 now disables the autocomplete/search
suggestion feature.

Closes #462
3 years ago
rn83 f18400b1f1
Strip SKIP_PREFIX for SITE_ALTS only (#452)
Domain prefixes (www, mobile, m) are now striped for site alternatives only.
3 years ago
BlissOWL f12b0e62c5
Make bang searches case insensitive (#438)
Bang searches now ignore the capitalization of the operator

Co-authored-by: Ben Busby <noreply+git@benbusby.com>
3 years ago
Ben Busby 68fdd55482
Use cache busting for css/js files
On app init, short hashes are generated from file checksums to use for
cache busting. These hashes are added into the full file name and used
to symlink to the actual file contents. These symlinks are loaded in the
jinja templates for each page, and can tell the browser to load a new
file if the hash changes.

This is only in place for css and js files, but can be extended in the
future for other file types if needed.
3 years ago
Ben Busby 43faaee77f
Hotfix: remove site filter for maps links
The new site filter breaks links to Maps results, so filter.py needed
to be updated to handle these links as a unique case. A new method was
introduced to easily remove any "-site:..." filters from the query,
which is now also used to format queries in the header template rather
than manually removing the blocked site list within the template itself.

Bumps version to 0.5.1 for releasing the bugfix

Fixes #329
3 years ago
Joao A. Candido Ramos 448efb8f2a
Add "view image" functionality (#268)
* add view image option

* prevent whoogle links from opening in a new tab.

* remove view image template on mobile requests

* change loop values to be more robust to the number of images

* Update app/templates/imageresults.html

* fix "Basically the .cvifge class needs width: 100%; in order to expand the search input to fit the form width."

* Update app/templates/imageresults.html

* remove hardcoded string from template

* Add view image config var to app.json

* Add view image config var to whoogle.env

Co-authored-by: jacr13 <ramos.joao@protonmail.com>
Co-authored-by: Ben Busby <benbusby@protonmail.com>
3 years ago
Ben Busby c8da53d4b0
Block websites from search results via user config (#304)
* Block websites in search results via user config

Adds a new config field "Block" to specify a comma separated list of
websites to block in search results. This is applied for all searches.

* Add test for blocking sites from search results

* Document WHOOGLE_CONFIG_BLOCK usage

* Strip '-site:' filters from query in header template

The 'behind the scenes' site filter applied for blocked sites was
appearing in the query field when navigating between search categories
(all -> images -> news, etc). This prevents the filter from appearing in
all except "images", since the image category uses a separate header.
This should eventually be addressed when the image page can begin using
the standard whoogle header, but until then, the filter will still
appear for image searches.
3 years ago
Ben Busby 50c888f9a7 Revert heroku app https upgrade fix 3 years ago
Ben Busby df0b7afa50 Switch to single Fernet key per session
This moves away from the previous (messy) approach of using two separate
keys for decrypting text and element URLs separately and regenerating
them for new searches. The current implementation of sessions is not very
reliable, which lead to keys being regenerated too soon, which would
break page navigation. Until that can be addressed, the single
key per session approach should work a lot better.

Fixes #250

Fixes #90
3 years ago