Commit Graph

462 Commits (main)

Author SHA1 Message Date
test2user-aqil 09db4ff730
Add Azerbaijani translation (#944)
* Add Azerbaijani translation

Co-authored-by: Ben Busby <contact@benbusby.com>
1 year ago
Ahmad Alkadri 16794df68d
Add Indonesian translation (#940) 1 year ago
George T 94b208dd3f
Update greek translations (#943)
Add Hellenic Language

Co-authored-by: Ben Busby <contact@benbusby.com>
1 year ago
Ben Busby 12ce174b9a
Include url prefix for reverse proxied instances
The url prefix was not included when reconstructing the root url using
X-Forwarded-* headers, causing some elements to fail to load properly.

Fixes #937
1 year ago
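The root URL reconstruction described in 12ce174b9a can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `root_url`, its parameters, and the plain `dict` of headers are all assumptions made for the example.

```python
def root_url(headers: dict, fallback: str, prefix: str = "") -> str:
    """Rebuild the public root URL from X-Forwarded-* headers.

    `prefix` stands in for a configured URL prefix (hypothetical parameter).
    """
    host = headers.get("X-Forwarded-Host")
    if not host:
        # Not proxied: use the URL the app itself sees
        return fallback.rstrip("/")
    proto = headers.get("X-Forwarded-Proto", "http")
    root = f"{proto}://{host}"
    if prefix:
        # Without this step, elements under the prefix would fail to load
        root += "/" + prefix.strip("/")
    return root
```

The result carries no trailing slash, so paths can be appended with a single "/".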
Ahmad Alkadri e5a5aad997
Always bold `CN`/`JA`/`KO` search terms (#928)
Add a function to check if target_word contains CJK characters

If a search term contains Chinese, Japanese, or Korean characters,
the term is bolded in search results regardless of whitespace.

CJK characters: Chinese, Japanese (hiragana, katakana, kanji), 
and Korean (hangul syllables, hangul jamo)

Co-authored-by: Ben Busby <contact@benbusby.com>
1 year ago
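A CJK check like the one described in e5a5aad997 might look like the sketch below. The function name `contains_cjk` and the exact Unicode ranges are assumptions chosen for illustration; only the `target_word` name comes from the commit message.

```python
import re

# Illustrative Unicode ranges (assumed, not taken from the project)
CJK_PATTERN = re.compile(
    "["
    "\u3040-\u309f"  # Hiragana
    "\u30a0-\u30ff"  # Katakana
    "\u4e00-\u9fff"  # CJK Unified Ideographs (Chinese characters, kanji)
    "\u1100-\u11ff"  # Hangul Jamo
    "\uac00-\ud7af"  # Hangul Syllables
    "]"
)


def contains_cjk(target_word: str) -> bool:
    """Return True if target_word has at least one CJK character."""
    return CJK_PATTERN.search(target_word) is not None
```

A caller could use this to skip whitespace-based word boundaries when bolding, since CJK text is usually written without spaces between words.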
Ben Busby fdc63b862e
Autoload `whoogle.env` if it exists
The whoogle.env file previously needed to be created and enabled using
the WHOOGLE_DOTENV var. This removes the second step and loads the env
file if it's found during app init.

The Dockerfile has also been updated to copy in whoogle.env if it
exists.

Fixes #909
1 year ago
Ben Busby aa54491ae0
Log rate-limiting errors
Rate limiting is now reported to the console as an error message.

Fixes #914
1 year ago
Charles Zawacki cec10e81d3
Don't prepend to services that have schemes with '//' (#925) 1 year ago
Charles Zawacki a760476d1b
Omit 'mobile.' and 'm.' in site alt replacements (#922)
Resolves #921
1 year ago
fiestasiesta 253ea62f8f
[Mobile] Add line break between header options (#918)
Fixes an issue where "Time Period" shows up on a separate
line from its dropdown
1 year ago
Ben Busby c24caceb03
Omit "www." in site alt replacements
Fixes #913
1 year ago
Ahmad Alkadri 3dda8b25ef
Escape html text in result body (#912)
Moved the cleaner functions to app/utils/escaper.py

Removed unused import 're'

Moved the cleaner functionalities to the "search.py" and "routes.py"

Making sure escaped chars stay escaped during process

Replaced "&lt;" and "&gt;" with "andlt;" and "andgt;", respectively. This way,
when the 'response' object gets loaded into bsoup (which happens several times
throughout the process between search.py and routes.py), bsoup will not
unescape them.
1 year ago
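The placeholder idea from 3dda8b25ef can be illustrated with the standard library alone. In this sketch, `html.unescape` stands in for the unescaping an HTML parser performs, and the helper names `protect` and `restore` are hypothetical.

```python
import html


def protect(text: str) -> str:
    """Swap escaped angle brackets for placeholders a parser won't unescape."""
    return text.replace("&lt;", "andlt;").replace("&gt;", "andgt;")


def restore(text: str) -> str:
    """Turn the placeholders back into escaped angle brackets."""
    return text.replace("andlt;", "&lt;").replace("andgt;", "&gt;")


escaped = "&lt;script&gt;"
# Simulated parser pass: the placeholders survive because they contain no "&"
after_parse = html.unescape(protect(escaped))
final = restore(after_parse)
```

Here `final` equals the original escaped string, whereas unescaping `escaped` directly would produce a live `<script>` tag.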
MoistCat 08aa1ab8f1
Handle missing result div in filter (#911)
Changed "find_all()[0]" to "find()", which yields only one result.

Added check to ensure result_div exists before searching
for results.
1 year ago
fiestasiesta 7041b43db9
Add time constraint to search options (#888)
Introduces the ability to refine searches by time period:
- Past hour
- Past 24 hours
- Past week
- Past month
- Past year

Co-authored-by: Ben Busby <contact@benbusby.com>
1 year ago
Ben Busby c9c197bb5f
Bump version to 0.8.1 1 year ago
Ben Busby 7a852aa876
Allow HTTP-exclusive proxies for all requests
Proxies that only support HTTP were causing request timeouts due to an
invalid upgrade to HTTPS when creating the request. This update restores
the ability to have an HTTP-only proxy for all requests.

Fixes #906
1 year ago
Cx 8fbbdf2cec
Update Kurdish translation (#903) 1 year ago
Ben Busby 3dc6d14377
Only extract domain+ext when using site alts
Parent sites using a 'www' subdomain or something similar were not
redirecting properly. This updates the hostname check to only validate
against the primary domain, except for Wikipedia since the subdomain is
used for interface translation in that case.

Fixes #901
1 year ago
Ben Busby fd85f1573a
Refactor site alt link replacement
Replacing result links and text when site alts are enabled is now part
of its own function, and handles replacement of link location and link
description separately.

Fixes #880
1 year ago
Ben Busby 0310f0f542
Use app init enc key by default for all queries
This can be updated later to allow users with cookies enabled to use a
key that is unique to their session (if they want, not mandatory), but
for now it makes more sense to just use a single key for all queries
from all users. This should eliminate a lot of issues that users have
reported where they are unable to decrypt queries or page elements due
to an expired/renewed session key.
1 year ago
Ben Busby 3bd785b9b7
Update sponsored result filter for german results
Adds 'gesponsert' to ad keyword blacklist

Fixes #892
1 year ago
Ben Busby 33742ce247
Revert change to light theme contrast text color
The change made to whoogle-contrast-text in #873 wasn't the right
decision, since whoogle-contrast-text is meant to contrast with darker
UI elements. whoogle-text already contrasts with the default white
background.
2 years ago
Anna 08b16f5a0c
Switch to PEP517 standard for builds (#887)
* Sync setup.cfg with requirements.txt

* Include tests in PyPI tarballs

And exclude them from setuptools

* Set version number only once

Switch to PEP517 standard (pyproject.toml) for builds
2 years ago
Ben Busby d099b46336
Bump version to 0.8.0 2 years ago
Ben Busby 09a90ec46a
Match only "//medium" and ".medium.com" for scribe links
Closes #885
2 years ago
Xabi 6bd48e40a7
Include new ad filter keyword (#879)
Adds "sponsored" result keyword for Spanish language
2 years ago
curlpipe 2d23e0e952
Add Welsh translation (#876) 2 years ago
xatier 1a66b195d4
Update zh-tw translation (#875) 2 years ago
Ben Busby 06fd29f663
Update ad filter keywords
Recent changes to Google Search now include ads prefixed with the keyword
"sponsored". This update should prevent them from appearing in search
results.

Fixes #871
2 years ago
Ben Busby 6696f2b12b
Escape word in term-bolding regex
Fixes #869
2 years ago
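Escaping a search term before placing it in a regex, as in 6696f2b12b, can be sketched as below. The function name `bold_term` and the word-boundary pattern are assumptions for illustration; `re.escape` is the standard library call that makes characters such as "+" or "(" match literally.

```python
import re


def bold_term(text: str, word: str) -> str:
    """Wrap case-insensitive, whole-word matches of word in <b> tags."""
    # Without re.escape, a term such as "f(x" would be an invalid pattern
    pattern = re.compile(
        r"(?<!\w)(" + re.escape(word) + r")(?!\w)", re.IGNORECASE
    )
    return pattern.sub(r"<b>\1</b>", text)
```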
João 77884d05f2
Fix color for light contrast text (#873)
The color for the variable whoogle-contrast-text should be black or gray;
otherwise it will not be shown with white background.
2 years ago
João 3e39e0e041
Fix missing args in docstring [skip ci] (#872)
Update docstring with new arg
2 years ago
João 2a37619028
Replace error query params w/ preferences param (#867) 2 years ago
Abir10101 75682de892
Fix regex for bolding search terms (#865)
Updated the bolding regex so that it no longer removes Chinese characters
2 years ago
João e99db8db26
Add country and interface lang to autocomplete (#866) 2 years ago
watchakorn-18k 4b2b0bf3c9
Include thai keyword in ads blacklist (#857) 2 years ago
watchakorn-18k 3943b2bc2c
Add thai translations (#856) 2 years ago
João 219fc58401
Fix handling of bangs (#851)
Changed the implementation to work if the bang is anywhere in the query.

Added a check to not spend time looking for an operator if a "!" is not present
in the query.

Bangs with the "!" char at the end are no longer allowed, since this may
cause conflicts like the issue cited before, where the "!" comes after a word
in the query, which is natural in most languages.
2 years ago
João 74503d542e
Encode config params in URL (#842)
Adds support for encoding (and optionally encrypting) user config values as
a single string that can be passed to any endpoint with the "preferences" url
param.

Co-authored-by: Ben Busby <contact@benbusby.com>
2 years ago
Biên 11275a7796
Add filter for ads in Vietnamese (#847) 2 years ago
João c42640e21c
Use `read_config_bool` for vars in app init (#848) 2 years ago
João 1aad47f2af
Fix bad internal redirection for google links (#850) 2 years ago
Cx 6bb9c8448b
Add Kurdish translation (#837) 2 years ago
João 8f59b7c340
Allow different `true` values for config vars (#841)
* Fixes read_config_bool to allow several true params

* add upper case comment
2 years ago
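A boolean config reader that accepts several "true" spellings, as described in 8f59b7c340, could look like this sketch. The name `read_config_bool` appears in the log, but the signature, the default handling, and the accepted values below are assumptions.

```python
import os

# Assumed set of accepted spellings; the project's actual list may differ
TRUE_VALUES = {"1", "true", "t", "yes", "y", "on"}


def read_config_bool(var: str, default: bool = False) -> bool:
    """Read an environment variable and interpret it as a boolean."""
    value = os.getenv(var)
    if value is None:
        return default
    # Normalize case so "TRUE", "True" and "true" are treated the same
    return value.strip().lower() in TRUE_VALUES
```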
Ben Busby 32ad39d0e1
Refactor session behavior, remove `Flask-Session` dep
Sessions are no longer validated using the "/session/..." route. This
created a lot of problems due to buggy/unexpected behavior coming from
the Flask-Session dependency, which is (more or less) no longer
maintained.

Sessions are also no longer strictly server-side-only. The majority of
information that was being stored in user sessions was aesthetic only,
aside from the session specific key used to encrypt URLs. This key is
still unique per user, but is not (or shouldn't be) in anyone's threat
model to keep absolutely 100% private from everyone. Especially paranoid
users of Whoogle can easily modify the code to use a randomly generated
encryption key that is reset on session invalidation (and set
invalidation time to a short enough period for their liking).

Ultimately, this should result in much more stable sessions per client.
There shouldn't be decryption issues with element URLs or queries
during result page navigation.
2 years ago
Ben Busby a6a97aa9c7
Catch failure to restore adv search state
Shouldn't throw any errors if this fails to be restored from local
storage for any reason. It's purely a nice-to-have feature.
2 years ago
Ben Busby cab1105169
Add an "advanced search" toggle in result tabs
Adds a new advanced search icon alongside the result tabs for switching
to a different country from the result page.

This will obviously get populated with other methods of filtering
results, but for now it's just the country selector.
2 years ago
Ben Busby 2eee0b87d5
Include full path when determining proxy host url
Session validation includes a method for determining the proxy host url,
but previously did not include the path for the initial request. This
caused a situation where users with a new session would not be able to
complete their first search, since the session validation follow-through
url did not include the actual path for their search query.

The method now includes a flag for only extracting the root url, which
is needed for creating full urls in the content filter.

Fixes #708
2 years ago
Ben Busby aa198ed562
Include leading slash in path replacement for result config changes 2 years ago
Ben Busby 3f363b0175
Allow temp region selection from result view
This adds a new "temporary" config section of the results view, where a
user can now change the country that their results come from without
changing their default config settings.

Closes #322
2 years ago
Ben Busby 73dd5b80b5
Remove google prefs link for mismatched language queries
Queries performed in a different language than what is configured
contain a result div that prompts the user to configure their language
preferences using google's preferences page.

Since we want all language configuration to occur on Whoogle only, we
can safely remove this result div.

Fixes #444
Fixes #386
2 years ago
Ben Busby 839683b4e1
Allow result navigation w/ Tab and Shift+Tab
Closes #457
2 years ago
Ben Busby 78614877f2
Fix redirect for misspelled queries starting with `/`
Fixes #818
2 years ago
Ben Busby bf92944b95
Support quora and imdb alts through Farside
Farside can now redirect quora links to quetre instances and imdb links
to libremdb instances. This updates Whoogle to perform link replacements
for both services when site alts are configured.
2 years ago
Ben Busby fde2c4db1e
Only select default country in config if none are selected 2 years ago
Ben Busby a1adf60b30
PEP-8 fix 2 years ago
Ben Busby 5db72a9552
Use scheme in alt replacement if defined
For users running local instances of service alternatives such as
invidious, the alt replacement procedure broke if the scheme of the
original service (almost always https) didn't match the scheme of their
defined local service (likely http).

This adds a small check to see if the alt has a defined scheme, and if
so, removes the original scheme for that result.

Fixes #806
2 years ago
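The scheme check from 5db72a9552 might be sketched as follows. The function name `replace_with_alt` and its parameters are hypothetical, and the host swap is a simple string replacement chosen for brevity.

```python
from urllib.parse import urlparse


def replace_with_alt(link: str, original_host: str, alt: str) -> str:
    """Swap original_host for alt, honoring a scheme defined on alt."""
    if "://" in alt:
        # The alt carries its own scheme, so drop the original one first
        scheme = urlparse(link).scheme
        if scheme:
            link = link[len(scheme) + len("://"):]
    return link.replace(original_host, alt, 1)
```

With a local alt such as `http://localhost:3000` (an example value), the resulting link uses `http` rather than inheriting `https` from the original result.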
Kian-Meng Ang 2a8519be30
Fix typos [skip ci] (#813) 2 years ago
MadcowOG 03eeb3fad1
Strip newlines when parsing tor password (#801)
When parsing control.conf or password file, a newline character could cause
Authentication Errors.
2 years ago
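The newline handling in 03eeb3fad1 amounts to something like the sketch below; the helper name `clean_password` is hypothetical.

```python
def clean_password(raw: str) -> str:
    """Remove leading and trailing newline characters from a password."""
    # A trailing "\n" from the file would otherwise be sent as part of the
    # password and cause authentication to fail
    return raw.strip("\r\n")
```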
Ben Busby f688b88bd8
Preserve wikipedia language setting for wikiless redirects
Wikipedia -> Wikiless redirects always result in an english language
result, even if the Wikipedia result would've been in a non-english
language. This is due to Wikipedia using language specific subdomains
(i.e. de.wikipedia.org, en.wikipedia.org, etc) whereas Wikiless uses a
"lang" url param.

This has been fixed by inspecting the subdomain of the wikipedia link
and passing that value to Wikiless as the lang param if it's determined
to be a language specific value (currently just looking for a 2-char
subdomain).

See #805
2 years ago
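The subdomain inspection described in f688b88bd8 can be sketched as below. The function name `wikiless_url` and the host `wikiless.example` are placeholders; only the 2-char subdomain rule and the "lang" param come from the commit message.

```python
from urllib.parse import urlencode, urlparse


def wikiless_url(wikipedia_url: str, wikiless_host: str = "wikiless.example") -> str:
    """Build a Wikiless link that keeps the Wikipedia language."""
    parsed = urlparse(wikipedia_url)
    subdomain = parsed.netloc.split(".")[0]
    url = f"https://{wikiless_host}{parsed.path}"
    # Treat a 2-char subdomain (de, en, ...) as a language code
    if len(subdomain) == 2:
        url += "?" + urlencode({"lang": subdomain})
    return url
```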
Marcell Fülöp ee2d3726af
Use X-Forwarded-Host as url_root when present (#799)
If Whoogle is accessed on a non-standard port _and_ proxied,
this port is lost to the application and `element['src']`s are
incorrectly formed (omitting port).

The X-Forwarded-Host HTTP header will contain this front-end port number in
a typical Nginx reverse proxy configuration.
2 years ago
Ben Busby cada4efe1d
Fix missing `os` import in routes 2 years ago
Joao A. Candido Ramos 0d2d5fff5d
Fixes handling of maps (#792)
* fixes map url, e.g. when no q parameter is given

* move maps_args from results to filter where it is used
2 years ago
jan Anja 90e160094d
Add more OpenSearch definitions (for images etc.) (#786) 2 years ago
CAB233 877785c3ca
Update Simplified Chinese translation (#794) 2 years ago
Joao A. Candido Ramos d05ec08abf
Remove wildcard imports (#791) 2 years ago
Joao A. Candido Ramos ddb8931e68
Fix image links not being opened in new tab (#790)
The majority of image links and links that are not handled by Whoogle were not
opening in new tabs. This allows links that are not related to the application
to open in new tabs.
2 years ago
jan Anja 194b2eae74
Fix a crash with protected Tor control port (#785) 2 years ago
Ben Busby 966644baa0
Broaden session validation exception handling
Due to how instances installed with pip seem to have issues storing
unrelated files in the same directory as sessions, exception handling
during session validation has been expanded to blindly ignore all
exceptions. This portion of the code is more for maintainers of large
public instances with a bunch of users who block cookies anyways, so
having basic app functionality break down as a result shouldn't be the
default.
2 years ago
Ben Busby ddc73a53fe
Flip country config check in template
Country config value should be checked against the valid value when
updating the home page config, not the other way around. This can lead
to a state where a user sets up an invalid country value, but can still
be matched against a correct value that is part of the invalid value
(i.e. "countryUK" is invalid, but would match against the correct value,
"UK")

Also minor refactor of where the session file size validation occurs.
2 years ago
Ben Busby cb5557cc2e
Check file sizes in session dir before validation
For pip installed instances of Whoogle, there seems to be an issue where
files other than sessions are being stored in the same directory as the
sessions. From a brief investigation, this does not seem to be caused by
Whoogle, since Flask-Session objects are the only files stored in that
directory. It could be an issue with the library that is being used for
sessions, however.

Regardless, the app shouldn't crash when trying to validate and remove
invalid sessions, so a file size limit of 4KB was imposed during
validation. Any file found in the session directory that exceeds this
size limit will be ignored.

Fixes #777
Fixes #793
2 years ago
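The size limit from cb5557cc2e could be applied along these lines. This is a sketch under assumptions: the function name `session_candidates` is hypothetical, and "4KB" is interpreted here as 4 * 1024 bytes.

```python
import os

MAX_SESSION_SIZE = 4 * 1024  # bytes; assumed interpretation of "4KB"


def session_candidates(session_dir: str) -> list[str]:
    """List files in session_dir that are small enough to be sessions."""
    paths = []
    for name in sorted(os.listdir(session_dir)):
        path = os.path.join(session_dir, name)
        if not os.path.isfile(path):
            continue
        # Oversized files are ignored rather than validated
        if os.path.getsize(path) > MAX_SESSION_SIZE:
            continue
        paths.append(path)
    return paths
```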
MadcowOG c9ee9dcc8b
Tor password authentication (#746)
Added password authentication for tor control port.

For user configuration of access to tor control port. This file should be
heavily restricted in file system.

Co-authored-by: MadcowOG <madcowog@Arch-Main.localdomain>
2 years ago
Ben Busby b03fe74f10
Ensure currency link parent exists before parsing
Fixes #782
2 years ago
Ben Busby d512745767
Bump version to 0.7.4 2 years ago
Ben Busby d51be4f529
Fix missing box shadow for light theme results
Related to 65796fd1a5

Fixes an issue where box shadows were missing for light theme results.
2 years ago
Ben Busby 35ac5ac82f
Fix autocomplete behavior on result page
Similar issue to #629, but the result page uses a different script for
handling user input, so the fix was not applied appropriately.

It has been fixed for this view now.
2 years ago
Ben Busby 65796fd1a5
Counter latest result page style changes
Google updated their styling of the result page, which broke some
components of Whoogle's result page styling (namely the result div
backgrounds for dark mode).

The GClasses class has been updated to keep track of what class names
have been updated to, and roll them back to a value that works for
Whoogle. A function was added that loops through new class names and
replaces them with their older counterparts.
2 years ago
Ben Busby a9e1f0d1bc
Refactor autocomplete/suggestion behavior (front-end only)
The previous implementation of autocomplete/suggestions on the front end
resulted in a situation where input and keydown events were constantly
being added to the search input bar. This was refactored to set up the
events only once and process suggestion navigation and appending
suggestions separately with different functions.

This has been tested on both an Android simulator, as well as an Android
tablet and seems to work as expected.

Fixes #370
Fixes #629
2 years ago
Ben Busby 47df4da4b5
Bump version to 0.7.3 2 years ago
Ben Busby f22e5ac171
Catch and ignore unpickling errors in pip installs
This seems to be caused by an odd behavior related to Flask sessions and
instances of Whoogle installed via pip. I didn't investigate it too
much, since catching and ignoring the result doesn't impact Whoogle
functionality at all (configuration and session values persist as
normal). Since this doesn't affect non-pip instances, I don't believe it
to be a fault within Whoogle itself.

Fixes #765
2 years ago
Ben Busby ef98d85dc5
Ensure searches with a leading slash are treated as queries
A user reported a bug where searches with a leading slash (in this case:
"/e/OS apps") were interpreted as a Google specific link when clicking
the next page of results.

This was due to the behavior that Google's search results exhibit, where
internal links for pages like support.google.com are delivered with
params like "?q=/support" rather than a direct link. This fixes that
scenario by checking the "q" param value against the user's original
query to ensure they don't match before assuming that the result is
intended as a redirect.

Fixes #776
2 years ago
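The comparison described in ef98d85dc5 can be sketched as follows. The function name `is_internal_redirect` and the href formats in the usage lines are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlparse


def is_internal_redirect(href: str, user_query: str) -> bool:
    """Guess whether an href with ?q=/... points at a Google-internal page."""
    q_values = parse_qs(urlparse(href).query).get("q", [])
    if not q_values:
        return False
    q = q_values[0]
    # A "q" equal to the user's own query is a search, not a redirect
    return q.startswith("/") and q != user_query


internal = is_internal_redirect("/search?q=/support", "whoogle")
next_page = is_internal_redirect("/search?q=/e/OS+apps&start=10", "/e/OS apps")
```

In this sketch `internal` is `True`, while `next_page` is `False` because the "q" value matches the user's original query.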
Joao A. Candido Ramos fb6627a9cc
Remove duplicated handling of /url result links (#769)
It appears that result links beginning with '/url' were mistakenly
committed with an inefficient filtering process in its place. With the
way the code is structured, this less effective '/url' link filter took
precedence over the previous link filter, and also caused users with the
"open link in new tab" config enabled to no longer have access to that
feature.

Fixes #769
2 years ago
invis-z 9bcd9931f7
Replace leading slash for image links (#762)
The leading slash was previously removed without noticing it was part of a
string replacement in #734. This caused the href of "View Image" to contain a
leading "/", which is wrong.
2 years ago
Ben Busby fb600d6fc8
Improve G page distinction between footer and results
Pages in the Whoogle footer that by default route to Google pages were
previously being removed, but caused results that also routed to similar
pages to no longer be accessible. This was due to the removal of the
'/url' endpoint that Google uses for each result.

To fix this, the result link is now parsed so that the domain of the
result can be checked against the disallowed G page list. Since results
are delivered in a "/url?q=<domain>" format -- even for pages to
Google's own products -- and the footer links are formatted as
"<product>.google.com", footer links are removed and result links are
parsed correctly.

Fixes #747
2 years ago
Ben Busby f5d599e7d2
Use `lax` for session `SameSite` value (not `strict`)
SESSION_COOKIE_SAMESITE must be set to 'lax' to allow the user's
previous session to persist when accessing the instance from an external
link. Setting this value to 'strict' causes Whoogle to revalidate a new
session, and fail, resulting in cookies being disabled.

This could be re-evaluated if Whoogle ever switches to client side
configuration instead.

Fixes #749
2 years ago
invis-z 0f6226ce51
Use `window` from Endpoint enum for anon view (#748)
Removes previously hardcoded "/window" from anon view links
2 years ago
hoschi1337 b809c88fa5
Fix german translation error (#742)
"Nachrichten" is the correct translation of "News"
2 years ago
xatier 7486697d41
Update zh-tw translation (#736) 2 years ago
invis-z b4d9f1f5e5
Remove "/" before endpoints & tags (#734)
Removes the leading slash before imgres and other endpoints

Fix #733
2 years ago
Ben Busby 8a0b872337
Bump version to 0.7.2 2 years ago
Ben Busby 2490089645
Remove unused `/url` endpoint
The `/url` endpoint was previously used as a way of mirroring the
`/url?q=<result domain>` formatting of locations in search results from
Google. Rather than have this unnecessary intermediary step, the result
path was extracted and used as the immediate path for each result item
instead.

This endpoint hasn't been in use for many versions and has been in need
of removal for quite some time.
2 years ago
Ben Busby 62d7491936
Only create ip card if main result div is found
The ip address card that is created for searches like "my ip" only needs
to be created/inserted if a main result div id is found.

Fixes #735
2 years ago
Ben Busby abc30d7da3
Render error message w/o `safe` filter
The error message shown in the error template does not need to be
rendered using the safe filter, and furthermore opens up an XSS
vulnerability.
2 years ago
Warren Spits d62ceb8423
Add proxyfix to honor `X-Forwarded-Proto` header (#731)
Fixes #730
2 years ago
Ben Busby a9b675cd24
Strip trailing slash on root url in filter
If a trailing slash is defined here, it causes the Whoogle instance to
redirect these element requests back to the home page, causing unwanted
behavior.
2 years ago
Ben Busby 5c8be4428b
Fall back to netloc for bang search if query is empty
Previously, empty bang searches would redirect to the Whoogle instance
home page. This now redirects to the specific site for the bang search
instead (i.e. "!yt" without a query redirects to "youtube.com", "!gh" to
"github.com", etc)

Fixes #719
2 years ago
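The empty-query fallback from 5c8be4428b might be sketched like this. The bang table, the URL templates, and the function name `resolve_bang` are hypothetical; the project's real bang data and lookup logic may differ.

```python
from urllib.parse import quote, urlparse

# Hypothetical bang table; "{}" marks where the query is substituted
BANGS = {
    "!yt": "https://www.youtube.com/results?search_query={}",
    "!gh": "https://github.com/search?q={}",
}


def resolve_bang(query: str) -> str:
    """Return a redirect URL for a bang query, or "" if no bang is found."""
    for word in query.split():
        template = BANGS.get(word)
        if template is None:
            continue
        remainder = query.replace(word, "", 1).strip()
        if remainder:
            return template.format(quote(remainder))
        # Empty query: fall back to the site's scheme and netloc
        parsed = urlparse(template)
        return f"{parsed.scheme}://{parsed.netloc}"
    return ""
```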
Ben Busby 7688c1a233
Revert anon-view key change from #724
The "anon-view" translation key is the correct one to use for accessing
anonymous view within the search results. "config-anon-view" is only for
the configuration menu on the home page.
2 years ago
gdm85 6d362ca5c7
Add support for relative search results (#715)
* Relativization of search results

* Fix JavaScript error when opening images

* Replace single-letter logo and remove sign-in link

* Add `WHOOGLE_URL_PREFIX` env var to support relative path redirection

The `WHOOGLE_URL_PREFIX` var can now be set to fix internal app
redirects, such as the `/session` redirect performed on the first visit
to the Whoogle home page.

Co-authored-by: Ben Busby <contact@benbusby.com>
2 years ago
gdm85 94b4eb08a2
Return 401 when token is invalid (#714)
In some rare instances (a race condition perhaps?) a
`cryptography.fernet.InvalidToken` exception is thrown resulting in
a broken connection.

This change gracefully returns a 401 error instead.
2 years ago
Ilya Prokopenko cded1e0272
Fix Russian translation (#726) 2 years ago