Commit Graph

114 Commits

Author SHA1 Message Date
Ben Busby
b39ba0533a
Suppress spurious warnings from bs4
More MarkupResemblesLocatorWarning warnings have been appearing. This
seems to be caused by parsing HTML content that contains a URL.

This new change suppresses the warning at the root level of the app
before any content has been parsed, so this error shouldn't appear
again.

Fixes #968
2023-03-22 12:29:05 -06:00
Ben Busby
8c426ab180
Suppress invalid warning from bs4, add 404 handler
An invalid parsing warning was being thrown by the latest version of the
bs4 library. This suppresses that warning from being shown in the
console.

A 404 handler was added to move logging from the console to the error
template, since a lot of users assumed that 404 errors from the result
page were problems with Whoogle itself.

Fixes #967
2023-03-07 11:28:55 -07:00
João
baa8bd0eb4
Add auth to cookie (#964)
When authenticated, the cookie set will allow the user to stay connected even
if the browser is restarted.

Fixes #951
2023-03-01 09:58:59 -07:00
Ben Busby
6b56dab4c1
Remove ig->bibliogram redirects
Bibliogram has been discontinued, and the remaining instances aren't
very reliable. As a result, all instagram redirects have been removed.

Fixes #955
2023-02-21 09:42:42 -07:00
elliot
7ca69e752d
Add calculator widget (#956)
This adds a simple calculator widget, somewhat similar to the one presented
when searching calculator on Google.

Also, it adds somewhat of a template for making the addition of new widgets
easier via the app/utils/widgets.py file. My eventual plan is to use this to
create more widgets that appear in Google, such as a color picker, timer, etc.

---------

Co-authored-by: Ben Busby <contact@benbusby.com>
2023-02-21 09:36:38 -07:00
Ben Busby
991fe6d910
Exclude subdomain in Medium->Scribe redirects
Medium redirects needed further cleanup to account for instances where a
link contains a subdomain that would not make sense in a Farside
redirect link.

Fixes #947
2023-02-04 16:36:16 -07:00
Ben Busby
12ce174b9a
Include url prefix for reverse proxied instances
The url prefix was not included when reconstructing the root url using
X-Forwarded-* headers, causing some elements to fail to load properly.

Fixes #937
2023-01-30 12:13:46 -07:00
Ahmad Alkadri
e5a5aad997
Always bold CN/JA/KO search terms (#928)
Add a function to check if target_word contains CJK characters

If a search term contains Chinese, Japanese, or Korean characters,
the term is bolded in search results regardless of whitespace.

CJK characters: Chinese, Japanese (hiragana, katakana, kanji), 
and Korean (hangul syllables, hangul jamo)

Co-authored-by: Ben Busby <contact@benbusby.com>
2023-01-09 12:54:41 -07:00
Charles Zawacki
cec10e81d3
Don't prepend to services that have schemes with '//' (#925) 2023-01-04 10:10:32 -07:00
Charles Zawacki
a760476d1b
Omit 'mobile.' and 'm.' in site alt replacements (#922)
Resolves #921
2023-01-03 10:19:39 -07:00
Ben Busby
c24caceb03
Omit "www." in site alt replacements
Fixes #913
2022-12-29 16:16:29 -07:00
Ahmad Alkadri
3dda8b25ef
Escape html text in result body (#912)
Moved the cleaner functions to app/utils/escaper.py

Removed unused import 're'

Moved the cleaner functionalities to the "search.py" and "routes.py"

Making sure escaped chars stay escaped during process

Replaced "&lt;" and "&gt;" with "andlt;" and "andgt;", respectively. This way,
when the 'response' object get loaded to bsoup (which happens several times
throughout the process between search.py and routes.py), bsoup will not
unescape them.
2022-12-29 15:19:28 -07:00
Ben Busby
3dc6d14377
Only extract domain+ext when using site alts
Parent sites using a 'www' subdomain or something similar were not
redirecting properly. This updates the hostname check to only validate
against the primary domain, except for Wikipedia since the subdomain is
used for interface translation in that case.

Fixes #901
2022-12-08 10:54:21 -07:00
Ben Busby
fd85f1573a
Refactor site alt link replacement
Replacing result links and text when site alts are enabled is now part
of its own function, and handles replacement of link location and link
description separately.

Fixes #880
2022-12-05 13:28:29 -07:00
Ben Busby
0310f0f542
Use app init enc key by default for all queries
This can be updated later to allow users with cookies enabled to use a
key that is unique to their session (if they want, not mandatory), but
for now it makes more sense to just use a single key for all queries
from all users. This should eliminate a lot of issues that users have
reported where they are unable to decrypt queries or page elements due
to an expired/renewed session key.
2022-12-05 12:14:14 -07:00
Ben Busby
3bd785b9b7
Update sponsored result filter for german results
Adds 'gesponsert' to ad keyword blacklist

Fixes #892
2022-11-28 10:18:10 -07:00
Ben Busby
09a90ec46a
Match only "//medium" and ".medium.com" for scribe links
Closes #885
2022-11-22 17:34:25 -07:00
Xabi
6bd48e40a7
Include new ad filter keyword (#879)
Adds "sponsored" result keyword for Spanish language
2022-11-07 20:50:27 -07:00
Ben Busby
06fd29f663
Update ad filter keywords
New changes to google search now include ads prefixed with the keyword
"sponsored". This update should remove these from appearing in search
results.

Fixes #871
2022-10-31 13:02:20 -06:00
Ben Busby
6696f2b12b
Escape word in term-bolding regex
Fixes #869
2022-10-31 12:45:44 -06:00
Abir10101
75682de892
Fix regex for bolding search terms (#865)
Updated regex to not remove chinese letters in bolding regex
2022-10-26 10:23:39 -06:00
watchakorn-18k
4b2b0bf3c9
Include thai keyword in ads blacklist (#857) 2022-10-03 11:21:17 -06:00
João
219fc58401
Fix handling of bangs (#851)
Changed the implementation to work if the bang is at anyplace in the query.

Added a check to not spend time looking for an operator if a "!" is not present
in the query.

No longer allowed to have the bang at the "!" char at the end, since this may
cause some conflicts like the issue cited before, where the ! is after a word
in the query, which is natural in most languages.
2022-09-30 14:39:13 -06:00
João
74503d542e
Encode config params in URL (#842)
Adds support for encoding (and optionally encrypting) user config values as
a single string that can be passed to any endpoint with the "preferences" url
param.

Co-authored-by: Ben Busby <contact@benbusby.com>
2022-09-22 14:14:56 -06:00
Biên
11275a7796
Add filter for ads in Vietnamese (#847) 2022-09-20 11:11:58 -06:00
João
8f59b7c340
Allow different true values for config vars (#841)
* Fixes read_config_bool to allow several true params

* add upper case comment
2022-09-07 12:54:43 -06:00
Ben Busby
2eee0b87d5
Include full path when determining proxy host url
Session validation includes a method for determining the proxy host url,
but previously did not include the path for the initial request. This
caused a situation where users with a new session would not be able to
complete their first search, since the session validation follow-through
url did not include the actual path for their search query.

The method now includes a flag for only extracting the root url, which
is needed for creating full urls in the content filter.

Fixes #708
2022-08-02 10:57:59 -06:00
Ben Busby
bf92944b95
Support quora and imdb alts through Farside
Farside can now redirect quora links to querte instances and imdb links
to libremdb instances. This updates Whoogle to perform link replacements
for both services when site alts are configured.
2022-08-01 11:49:09 -06:00
Ben Busby
a1adf60b30
PEP-8 fix 2022-07-13 10:31:55 -06:00
Ben Busby
5db72a9552
Use scheme in alt replacement if defined
For users running local instances of service alternatives such as
invidious, the alt replacement procedure broke if the scheme of the
original service (almost always https) didn't match the scheme of their
defined local service (likely http).

This adds a small check to see if the alt has a defined scheme, and if
so, removes the original scheme for that result.

Fixes #806
2022-07-13 10:25:51 -06:00
Ben Busby
f688b88bd8
Preserve wikipedia language setting for wikiless redirects
Wikipedia -> Wikiless redirects always result in an english language
result, even if the Wikipedia result would've been in a non-english
language. This is due to Wikipedia using language specific subdomains
(i.e. de.wikipedia.org, en.wikipedia.org, etc) whereas Wikiless uses a
"lang" url param.

This has been fixed by inspecting the subdomain of the wikipedia link
and passing that value to Wikiless as the lang param if it's determined
to be a language specific value (currently just looking for a 2-char
subdomain).

See #805
2022-07-06 09:49:43 -06:00
Marcell Fülöp
ee2d3726af
Use X-Forwarded-Host as url_root when present (#799)
If Whoogle is accessed on a non-standard port _and_ proxied,
this port is lost to the application and `element['src']`s are
incorrectly formed (omitting port).

HTTP x-Forwarded-Host will contain this front port number in
a typical Nginx reverse proxy configuration.
2022-07-05 10:01:47 -06:00
Joao A. Candido Ramos
d05ec08abf
Remove wildcard imports (#791) 2022-06-24 10:51:15 -06:00
Ben Busby
b03fe74f10
Ensure currency link parent exists before parsing
Fixes #782
2022-06-16 10:28:06 -06:00
Ben Busby
ef98d85dc5
Ensure searches with a leading slash are treated as queries
A user reported a bug where searches with a leading slash (in this case:
"/e/OS apps" were interpreted as a Google specific link when clicking
the next page of results.

This was due to the behavior that Google's search results exhibit, where
internal links for pages like support.google.com are delivered with
params like "?q=/support" rather than a direct link. This fixes that
scenario by checking the "q" param value against the user's original
query to ensure they don't match before assuming that the result is
intended as a redirect.

Fixes #776
2022-06-03 14:03:57 -06:00
invis-z
b4d9f1f5e5
Remove "/" before endpoints & tags (#734)
Removes the leading slash before imgres and other endpoints

Fix #733
2022-04-27 14:25:14 -06:00
Ben Busby
62d7491936
Only create ip card if main result div is found
The ip address card that is created for searches like "my ip" only needs
to be created/inserted if a main result div id is found.

Fixes #735
2022-04-26 15:18:29 -06:00
Ben Busby
5c8be4428b
Fall back to netloc for bang search if query is empty
Previously, empty bang searches would redirect to the Whoogle instance
home page. This now redirects to the specific site for the bang search
instead (i.e. "!yt" without a query redirects to "youtube.com", "!gh" to
"github.com", etc)

Fixes #719
2022-04-20 14:50:32 -06:00
Ben Busby
7688c1a233
Revert anon-view key change from #724
The "anon-view" translation key is the correct one to use for accessing
anonymous view within the search results. "config-anon-view" is only for
the configuration menu on the home page.
2022-04-20 14:11:29 -06:00
gdm85
6d362ca5c7
Add support for relative search results (#715)
* Relativization of search results

* Fix JavaScript error when opening images

* Replace single-letter logo and remove sign-in link

* Add `WHOOGLE_URL_PREFIX` env var to support relative path redirection

The `WHOOGLE_URL_PREFIX` var can now be set to fix internal app
redirects, such as the `/session` redirect performed on the first visit
to the Whoogle home page.

Co-authored-by: Ben Busby <contact@benbusby.com>
2022-04-18 15:27:45 -06:00
glitsj16
ca80bb0caa
Fix 'anon-view' KeyError (#724) 2022-04-18 12:45:20 -06:00
Ben Busby
9317d9217f
Support proxying results through Whoogle (aka "anonymous view") (#682)
* Expand `/window` endpoint to behave like a proxy

The `/window` endpoint was previously used as a type of proxy, but only
for removing Javascript from the result page. This expands the existing
functionality to allow users to proxy search result pages (with or without
Javascript) through their Whoogle instance.

* Implement filtering of remote content from css

* Condense NoJS feature into Anonymous View

Enabling NoJS now removes Javascript from the Anonymous View, rather
than creating a separate option.

* Exclude 'data:' urls from filter, add translations

The 'data:' url must be allowed in results to view certain elements on
the page, such as stars for review based results.

Add translations for the remaining languages.

* Add cssutils to requirements
2022-04-13 11:29:07 -06:00
Ben Busby
797372ecaa
Ignore blank alts if site alt config is enabled
If the alt for a particular service is blank, the original source is
used instead.

Example:
1. Site alts enabled in config
2. User wants wikipedia links, not wikiless
3. WHOOGLE_ALT_WIKI set to ""
4. All available alt links redirected to farside, except wikipedia

Fixes #704
2022-03-30 14:46:33 -06:00
138138138
5ecd4fe931
Add "nofollow noopener noreferrer" to all links (#698)
Old iOS 12 devices will pass the Referer HTTP header to the site user clicks.
Websites will know those traffic come from Whoogle search.
Adding "nofollow noopener noreferrer" solves the issue.
2022-03-28 10:11:09 -06:00
Ben Busby
0048c2f9aa
Update remaining alternative frontends to use Farside
Wikipedia, imgur, and translate alternatives were all still using
hardcoded URLs when replaced with their respective alternative frontend.
This updates them to use farside instead.
2022-03-21 10:08:52 -06:00
Ben Busby
a58f70ca7e
Fix wikipedia->wikiless domain replacement
Was previously using wikipedia.com not wikipedia.org, causing wikiless
replacements to not occur.

Fixes #686
2022-03-21 10:01:21 -06:00
Ben Busby
69f845a047
Add test for empty bang behavior
Also fix pep8 issue
2022-03-01 12:13:40 -07:00
Ben Busby
809520ec70
Fallback to home page for empty bang searches
Bang searches without an actual query (i.e. just searching "!gh") will
now redirect to the home page. I guess people do this for some reason
and don't like that it redirects to the correct bang result URL, but
without an actual search term.

Fixes #595
2022-03-01 12:06:59 -07:00
Ben Busby
b28fa86e33
Update ad filter
Recent changes to ads in search results caused Whoogle to display ads
for certain searches. In particular, ads recently started appearing
grouped into one div, as opposed to a singular ad per div. This was
accompanied by the div label "ads" (instead of just "ad"), which threw
off the existing ad filter. The ad keyword blacklist has been updated
accordingly, and has been enhanced to only check against alpha chars for
each label.

This only seems to have affected English language searches, and only for
very specific searches.
2022-02-25 23:02:58 -07:00
Ben Busby
9984158ec1
Ensure valid str->float conv in currency calc
Currency amounts returned by google seem to randomly include unicode
chars ('\xa0' noted in #642) which broke the currency calculator
included in the project. This ensures that only strings that can be
converted to float are ever used in the conversion.

Fixes #642
2022-02-17 16:33:44 -07:00