This improves the search result icon feature by "hiding" the site's icon
if one was not found. This happens in scenarios where a site doesn't
have a /favicon.ico due to having a unique path or using javascript to
load the icon.
This appends an icon element to each search result, using the result
domain's "/favicon.ico" path.
Note that some sites do not have a standard /favicon.ico, but have a
unique path to a specifically sized favicon instead. Worse still, some
sites use javascript to load their favicon, which would make it even
more difficult for Whoogle to figure out.
For now this approach is fine, but can be expanded upon in the future
if desired.
Domains were previously not validated before being handled, leading to a
potential scenario where someone could pass something like
"element_url=127.0.0.1:<port>/<resource>" to access other resources on a
machine running Whoogle. This change ensures that the resource used in
both endpoints is a valid domain.
This also includes validation of config names to prevent names from
including path values such as "../../(etc)".
When starting whoogle from another directory, the path to the calculator
widget was previously invalid. It now specifies the path relative to the widget
loader file.
The calculator was previously triggered for partial matches with words
like "calc", which meant searches containing the word "calcium" would
cause the calculator widget to appear.
Redirects to alternative frontends can now be defined using the
WHOOGLE_REDIRECTS environment variable. Usage is documented in the
readme, but is basically defined as <parent>:<new>.
Closes#988
This relates to an issue with an unknown cause (unable to reproduce on
my end) where the preferences string does not contain the correct amount
of padding on a base64 encoded value. This is mediated by appending
padding to the end of the encoded value, since any extra padding is
removed anyways.
Fixes#987
Defines separate environment variables for setting mobile vs desktop user
agents
Defines an environment variable for using the client's User-Agent
Co-authored-by: Ben Busby <contact@benbusby.com>
More MarkupResemblesLocatorWarning warnings have been appearing. This
seems to be caused by parsing HTML content that contains a URL.
This new change suppresses the warning at the root level of the app
before any content has been parsed, so this error shouldn't appear
again.
Fixes#968
* Add translation for new strings from 7041b43db9
Use same terms as Google's zh-tw interface.
* Fix missing period
* Sync string order with en (easier for future updates)
Fix the exception `AttributeError: 'Filter' object has no attribute 'block_url'`
introduced in this commit [1].
`self.block_title` and `self.block_url` were members of the Filter
object[2], but not anymore after commit [1].
This bug can be reproduced with setting WHOOGLE_CONFIG_BLOCK_URL to a
non-empty string.
[1] 10a15e06e1
[2] 284a8102c8
An invalid parsing warning was being thrown by the latest version of the
bs4 library. This suppresses that warning from being shown in the
console.
A 404 handler was added to move logging from the console to the error
template, since a lot of users assumed that 404 errors from the result
page were problems with Whoogle itself.
Fixes#967
When a browser adds a search engine using the opensearch template, it
does not have the correct context necessary to autofill the
`preferences` arg with the user's session prefs. As a result, queries
made using the browser bar will have the instance's default preferences
filled into the template.
Removing this shouldn't have any side effects, since queries made on the
same machine will have the correct session associated with the user.
Fixes#929
Some distributions require manually installing Python 3.10, which makes
it less convenient than just using whatever version of Python3.X the
package manager supports. Since the only 3.10 feature being used was
"match", and it was a very small change, it's been replaced with an
if/else statement to ensure compatibility with older versions of Python
3.
Navigating between pages of results now includes the user's preferences
string, which allows them to retain their config for a particular
instance between result pages.
Fixes#960
This adds a simple calculator widget, somewhat similar to the one presented
when searching calculator on Google.
Also, it adds somewhat of a template for making the addition of new widgets
easier via the app/utils/widgets.py file. My eventual plan is to use this to
create more widgets that appear in Google, such as a color picker, timer, etc.
---------
Co-authored-by: Ben Busby <contact@benbusby.com>
Medium redirects needed further cleanup to account for instances where a
link contains a subdomain that would not make sense in a Farside
redirect link.
Fixes#947
The url prefix was not included when reconstructing the root url using
X-Forwarded-* headers, causing some elements to fail to load properly.
Fixes#937
Add a function to check if target_word contains CJK characters
If a search term contains Chinese, Japanese, or Korean characters,
the term is bolded in search results regardless of whitespace.
CJK characters: Chinese, Japanese (hiragana, katakana, kanji),
and Korean (hangul syllables, hangul jamo)
Co-authored-by: Ben Busby <contact@benbusby.com>
The whoogle.env file previously needed to be created and enabled using
the WHOOGLE_DOTENV var. This removes the second step and loads the env
file if it's found during app init.
The Dockerfile has also been updated to copy in whoogle.env if it
exists.
Fixes#909
Moved the cleaner functions to app/utils/escaper.py
Removed unused import 're'
Moved the cleaner functionalities to the "search.py" and "routes.py"
Making sure escaped chars stay escaped during process
Replaced "<" and ">" with "andlt;" and "andgt;", respectively. This way,
when the 'response' object get loaded to bsoup (which happens several times
throughout the process between search.py and routes.py), bsoup will not
unescape them.
Introduces the ability to refine searches by time period:
- Past hour
- Past 24 hours
- Past week
- Past month
- Past year
Co-authored-by: Ben Busby <contact@benbusby.com>
Proxies that only support HTTP were causing request timeouts due to an
invalid upgrade to HTTPS when creating the request. This update restores
the ability to have an HTTP-only proxy for all requests.
Fixes#906
Parent sites using a 'www' subdomain or something similar were not
redirecting properly. This updates the hostname check to only validate
against the primary domain, except for Wikipedia since the subdomain is
used for interface translation in that case.
Fixes#901
Replacing result links and text when site alts are enabled is now part
of its own function, and handles replacement of link location and link
description separately.
Fixes#880
This can be updated later to allow users with cookies enabled to use a
key that is unique to their session (if they want, not mandatory), but
for now it makes more sense to just use a single key for all queries
from all users. This should eliminate a lot of issues that users have
reported where they are unable to decrypt queries or page elements due
to an expired/renewed session key.