Merge pull request #1 from benbusby/main

Update to keep up with source repo
pull/1134/head
Thomas Fournier 11 months ago committed by GitHub
commit 678557abae
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -14,7 +14,7 @@
</tr>
</table>
Get Google search results, but without any ads, javascript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile.
Get Google search results, but without any ads, JavaScript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile.
Contents
1. [Features](#features)
@ -33,10 +33,11 @@ Contents
5. [Usage](#usage)
6. [Extra Steps](#extra-steps)
1. [Set Primary Search Engine](#set-whoogle-as-your-primary-search-engine)
2. [Prevent Downtime (Heroku Only)](#prevent-downtime-heroku-only)
3. [Manual HTTPS Enforcement](#https-enforcement)
4. [Using with Firefox Containers](#using-with-firefox-containers)
5. [Reverse Proxying](#reverse-proxying)
2. [Custom Redirecting](#custom-redirecting)
3. [Prevent Downtime (Heroku Only)](#prevent-downtime-heroku-only)
4. [Manual HTTPS Enforcement](#https-enforcement)
5. [Using with Firefox Containers](#using-with-firefox-containers)
6. [Reverse Proxying](#reverse-proxying)
1. [Nginx](#nginx)
7. [Contributing](#contributing)
8. [FAQ](#faq)
@ -95,7 +96,7 @@ Provides:
- Free deployment of app
- Free HTTPS url (https://\<app name\>.\<username\>\.repl\.co)
- Supports custom domains
- Downtime after periods of inactivity \([solution 1](https://repl.it/talk/ask/use-this-pingmat1replco-just-enter/28821/101298), [solution 2](https://repl.it/talk/learn/How-to-use-and-setup-UptimeRobot/9003)\)
- Downtime after periods of inactivity ([solution](https://repl.it/talk/learn/How-to-use-and-setup-UptimeRobot/9003)\)
___
@ -398,6 +399,10 @@ There are a few optional environment variables available for customizing a Whoog
| WHOOGLE_PROXY_PASS | The password of the proxy server. |
| WHOOGLE_PROXY_TYPE | The type of the proxy server. Can be "socks5", "socks4", or "http". |
| WHOOGLE_PROXY_LOC | The location of the proxy server (host or ip). |
| WHOOGLE_USER_AGENT | The desktop user agent to use. Defaults to a randomly generated one. |
| WHOOGLE_USER_AGENT_MOBILE | The mobile user agent to use. Defaults to a randomly generated one. |
| WHOOGLE_USE_CLIENT_USER_AGENT | Enable to use your own user agent for all requests. Defaults to false. |
| WHOOGLE_REDIRECTS | Specify sites that should be redirected elsewhere. See [custom redirecting](#custom-redirecting). |
| EXPOSE_PORT | The port where Whoogle will be exposed. |
| HTTPS_ONLY | Enforce HTTPS. (See [here](https://github.com/benbusby/whoogle-search#https-enforcement)) |
| WHOOGLE_ALT_TW | The twitter.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
@ -406,7 +411,7 @@ There are a few optional environment variables available for customizing a Whoog
| WHOOGLE_ALT_TL | The Google Translate alternative to use. This is used for all "translate ____" searches. Set to "" to disable. |
| WHOOGLE_ALT_MD | The medium.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_IMG | The imgur.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_WIKI | The wikipedia.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_WIKI | The wikipedia.org alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_IMDB | The imdb.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_QUORA | The quora.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_AUTOCOMPLETE | Controls visibility of autocomplete/search suggestions. Default on -- use '0' to disable. |
@ -448,6 +453,7 @@ Same as most search engines, with the exception of filtering by time range.
To filter by a range of time, append ":past <time>" to the end of your search, where <time> can be `hour`, `day`, `month`, or `year`. Example: `coronavirus updates :past hour`
## Extra Steps
### Set Whoogle as your primary search engine
*Note: If you're using a reverse proxy to run Whoogle Search, make sure the "Root URL" config option on the home page is set to your URL before going through these steps.*
@ -492,6 +498,32 @@ Browser settings:
- Manual
- Under search engines > manage search engines > add, manually enter your Whoogle instance details with a `<whoogle url>/search?q=%s` formatted search URL.
### Custom Redirecting
You can set custom site redirects using the `WHOOGLE_REDIRECTS` environment
variable. A lot of sites, such as Twitter, Reddit, etc, have built-in redirects
to [Farside links](https://sr.ht/~benbusby/farside), but you may want to define
your own.
To do this, you can use the following syntax:
```
WHOOGLE_REDIRECTS="<parent_domain>:<new_domain>"
```
For example, if you want to redirect from "badsite.com" to "goodsite.com":
```
WHOOGLE_REDIRECTS="badsite.com:goodsite.com"
```
This can be used for multiple sites as well, with comma separation:
```
WHOOGLE_REDIRECTS="badA.com:goodA.com,badB.com:goodB.com"
```
NOTE: Do not include "http(s)://" when defining your redirect.
### Prevent Downtime (Heroku only)
Part of the deal with Heroku's free tier is that you're allocated 550 hours/month (meaning it can't stay active 24/7), and the app is temporarily shut down after 30 minutes of inactivity. Once it becomes inactive, any Whoogle searches will still work, but it'll take an extra 10-15 seconds for the app to come back online before displaying the result, which can be frustrating if you're in a hurry.
@ -568,7 +600,7 @@ Under the hood, Whoogle is a basic Flask app with the following structure:
- `opensearch.xml`: A template used for supporting [OpenSearch](https://developer.mozilla.org/en-US/docs/Web/OpenSearch).
- `imageresults.html`: An "experimental" template used for supporting the "Full Size" image feature on desktop.
- `static/<css|js>`
- CSS/Javascript files, should be self-explanatory
- CSS/JavaScript files, should be self-explanatory
- `static/settings`
- Key-value JSON files for establishing valid configuration values
@ -607,7 +639,7 @@ I'm a huge fan of Searx though and encourage anyone to use that instead if they
**Why does the image results page look different?**
A lot of the app currently piggybacks on Google's existing support for fetching results pages with Javascript disabled. To their credit, they've done an excellent job with styling pages, but it seems that the image results page - particularly on mobile - is a little rough. Moving forward, with enough interest, I'd like to transition to fetching the results and parsing them into a unique Whoogle-fied interface that I can style myself.
A lot of the app currently piggybacks on Google's existing support for fetching results pages with JavaScript disabled. To their credit, they've done an excellent job with styling pages, but it seems that the image results page - particularly on mobile - is a little rough. Moving forward, with enough interest, I'd like to transition to fetching the results and parsing them into a unique Whoogle-fied interface that I can style myself.
## Public Instances
@ -621,16 +653,17 @@ A lot of the app currently piggybacks on Google's existing support for fetching
| [https://s.tokhmi.xyz](https://s.tokhmi.xyz) | 🇺🇸 US | Multi-choice | ✅ |
| [https://search.sethforprivacy.com](https://search.sethforprivacy.com) | 🇩🇪 DE | English | |
| [https://whoogle.dcs0.hu](https://whoogle.dcs0.hu) | 🇭🇺 HU | Multi-choice | |
| [https://whoogle.esmailelbob.xyz](https://whoogle.esmailelbob.xyz) | 🇨🇦 CA | Multi-choice | |
| [https://gowogle.voring.me](https://gowogle.voring.me) | 🇺🇸 US | Multi-choice | |
| [https://whoogle.privacydev.net](https://whoogle.privacydev.net) | 🇳🇱 NL | English | |
| [https://whoogle.privacydev.net](https://whoogle.privacydev.net) | 🇩🇪 DE | English | |
| [https://wg.vern.cc](https://wg.vern.cc) | 🇺🇸 US | English | |
| [https://whoogle.hxvy0.gq](https://whoogle.hxvy0.gq) | 🇨🇦 CA | Turkish Only | ✅ |
| [https://whoogle.hostux.net](https://whoogle.hostux.net) | 🇫🇷 FR | Multi-choice | |
| [https://whoogle.lunar.icu](https://whoogle.lunar.icu) | 🇩🇪 DE | Multi-choice | ✅ |
| [https://wgl.frail.duckdns.org](https://wgl.frail.duckdns.org) | 🇧🇷 BR | Multi-choice | |
| [https://whoogle.no-logs.com/)(https://whoogle.no-logs.com/) | 🇸🇪 SE | Multi-choice | |
| [https://whoogle.no-logs.com](https://whoogle.no-logs.com/) | 🇸🇪 SE | Multi-choice | |
| [https://search.rubberverse.xyz](https://search.rubberverse.xyz) | 🇵🇱 PL | English | |
| [https://whoogle.ftw.lol](https://whoogle.ftw.lol) | 🇩🇪 DE | Multi-choice | |
| [https://whoogle-search--replitcomreside.repl.co](https://whoogle-search--replitcomreside.repl.co) | 🇺🇸 US | English | |
* A checkmark in the "Cloudflare" category here refers to the use of the reverse proxy, [Cloudflare](https://cloudflare.com). The checkmark will not be listed for a site which uses Cloudflare DNS but rather the proxying service which grants Cloudflare the ability to monitor traffic to the website.
@ -642,7 +675,7 @@ A lot of the app currently piggybacks on Google's existing support for fetching
| [http://whoglqjdkgt2an4tdepberwqz3hk7tjo4kqgdnuj77rt7nshw2xqhqad.onion](http://whoglqjdkgt2an4tdepberwqz3hk7tjo4kqgdnuj77rt7nshw2xqhqad.onion) | 🇺🇸 US | Multi-choice
| [http://nuifgsnbb2mcyza74o7illtqmuaqbwu4flam3cdmsrnudwcmkqur37qd.onion](http://nuifgsnbb2mcyza74o7illtqmuaqbwu4flam3cdmsrnudwcmkqur37qd.onion) | 🇩🇪 DE | English
| [http://whoogle.vernccvbvyi5qhfzyqengccj7lkove6bjot2xhh5kajhwvidqafczrad.onion](http://whoogle.vernccvbvyi5qhfzyqengccj7lkove6bjot2xhh5kajhwvidqafczrad.onion/) | 🇺🇸 US | English |
| [http://whoogle.g4c3eya4clenolymqbpgwz3q3tawoxw56yhzk4vugqrl6dtu3ejvhjid.onion](http://whoogle.g4c3eya4clenolymqbpgwz3q3tawoxw56yhzk4vugqrl6dtu3ejvhjid.onion/) | 🇳🇱 NL | English |
| [http://whoogle.g4c3eya4clenolymqbpgwz3q3tawoxw56yhzk4vugqrl6dtu3ejvhjid.onion](http://whoogle.g4c3eya4clenolymqbpgwz3q3tawoxw56yhzk4vugqrl6dtu3ejvhjid.onion/) | 🇩🇪 DE | English |
#### I2P Instances

@ -561,18 +561,19 @@ class Filter:
is enabled
"""
for site, alt in SITE_ALTS.items():
for div in self.soup.find_all('div', text=re.compile(site)):
# Use the number of words in the div string to determine if the
# string is a result description (shouldn't replace domains used
# in desc text).
# Also ignore medium.com replacements since these are handled
if site != "medium.com" and alt != "":
# Ignore medium.com replacements since these are handled
# specifically in the link description replacement, and medium
# results are never given their own "card" result where this
# replacement would make sense.
if site == 'medium.com' or len(div.string.split(' ')) > 1:
continue
div.string = div.string.replace(site, alt)
# Also ignore if the alt is empty, since this is used to indicate
# that the alt is not enabled.
for div in self.soup.find_all('div', text=re.compile(site)):
# Use the number of words in the div string to determine if the
# string is a result description (shouldn't replace domains used
# in desc text).
if len(div.string.split(' ')) == 1:
div.string = div.string.replace(site, alt)
for link in self.soup.find_all('a', href=True):
# Search and replace all link descriptions
@ -596,7 +597,7 @@ class Filter:
# replaced (i.e. 'philomedium.com' should stay as it is).
if 'medium.com' in link_str:
if link_str.startswith('medium.com') or '.medium.com' in link_str:
link_str = 'farside.link/scribe' + link_str[
link_str = SITE_ALTS['medium.com'] + link_str[
link_str.find('medium.com') + len('medium.com'):]
new_desc.string = link_str
else:

@ -254,7 +254,8 @@ class Config:
key = self._get_fernet_key(self.preferences_key)
config = Fernet(key).decrypt(
brotli.decompress(urlsafe_b64decode(preferences.encode()))
brotli.decompress(urlsafe_b64decode(
preferences.encode() + b'=='))
)
config = pickle.loads(brotli.decompress(config))
@ -262,7 +263,8 @@ class Config:
config = {}
elif mode == 'u': # preferences are not encrypted
config = pickle.loads(
brotli.decompress(urlsafe_b64decode(preferences.encode()))
brotli.decompress(urlsafe_b64decode(
preferences.encode() + b'=='))
)
else: # preferences are incorrectly formatted
config = {}

@ -73,6 +73,14 @@ def send_tor_signal(signal: Signal) -> bool:
def gen_user_agent(is_mobile) -> str:
user_agent = os.environ.get('WHOOGLE_USER_AGENT', '')
user_agent_mobile = os.environ.get('WHOOGLE_USER_AGENT_MOBILE', '')
if user_agent and not is_mobile:
return user_agent
if user_agent_mobile and is_mobile:
return user_agent_mobile
firefox = random.choice(['Choir', 'Squier', 'Higher', 'Wire']) + 'fox'
linux = random.choice(['Win', 'Sin', 'Gin', 'Fin', 'Kin']) + 'ux'
@ -261,7 +269,7 @@ class Request:
return []
def send(self, base_url='', query='', attempt=0,
force_mobile=False) -> Response:
force_mobile=False, user_agent='') -> Response:
"""Sends an outbound request to a URL. Optionally sends the request
using Tor, if enabled by the user.
@ -277,10 +285,14 @@ class Request:
Response: The Response object returned by the requests call
"""
if force_mobile and not self.mobile:
modified_user_agent = self.modified_user_agent_mobile
use_client_user_agent = int(os.environ.get('WHOOGLE_USE_CLIENT_USER_AGENT', '0'))
if user_agent and use_client_user_agent == 1:
modified_user_agent = user_agent
else:
modified_user_agent = self.modified_user_agent
if force_mobile and not self.mobile:
modified_user_agent = self.modified_user_agent_mobile
else:
modified_user_agent = self.modified_user_agent
headers = {
'User-Agent': modified_user_agent

@ -557,6 +557,15 @@ def window():
)
@app.route(f'/robots.txt')
def robots():
response = make_response(
'''User-Agent: *
Disallow: /''', 200)
response.mimetype = 'text/plain'
return response
@app.errorhandler(404)
def page_not_found(e):
return render_template('error.html', error_message=str(e)), 404

@ -1056,12 +1056,12 @@
"books": "Pirtûk",
"anon-view": "Dîtina Nenas",
"": "--",
"qdr:h": "Saet berê",
"qdr:d": "24 saetên borî",
"qdr:h": "Demjimêra borî",
"qdr:d": "24 Demjimêrên borî",
"qdr:w": "Hefteya borî",
"qdr:m": "Meha borî",
"qdr:y": "Sala borî",
"config-time-period": "Dem Period"
"config-time-period": "Pêşsazkariyên demê"
},
"lang_th": {
"search": "ค้นหา",

@ -79,3 +79,10 @@ def get_abs_url(url, page_url):
elif url.startswith('./'):
return f'{page_url}{url[2:]}'
return url
def list_to_dict(lst: list) -> dict:
if len(lst) < 2:
return {}
return {lst[i].replace(' ', ''): lst[i+1].replace(' ', '')
for i in range(0, len(lst), 2)}

@ -1,5 +1,6 @@
from app.models.config import Config
from app.models.endpoint import Endpoint
from app.utils.misc import list_to_dict
from bs4 import BeautifulSoup, NavigableString
import copy
from flask import current_app
@ -43,6 +44,9 @@ SITE_ALTS = {
'quora.com': os.getenv('WHOOGLE_ALT_QUORA', 'farside.link/quetre')
}
# Include custom site redirects from WHOOGLE_REDIRECTS
SITE_ALTS.update(list_to_dict(re.split(',|:', os.getenv('WHOOGLE_REDIRECTS', ''))))
def contains_cjko(s: str) -> bool:
"""This function check whether or not a string contains Chinese, Japanese,

@ -144,7 +144,8 @@ class Search:
and not g.user_request.mobile)
get_body = g.user_request.send(query=full_query,
force_mobile=view_image)
force_mobile=view_image,
user_agent=self.user_agent)
# Produce cleanable html soup from response
get_body_safed = get_body.text.replace("&lt;","andlt;").replace("&gt;","andgt;")

@ -4,7 +4,6 @@ https://search.dr460nf1r3.org
https://s.tokhmi.xyz
https://search.sethforprivacy.com
https://whoogle.dcs0.hu
https://whoogle.esmailelbob.xyz
https://whoogle.lunar.icu
https://gowogle.voring.me
https://whoogle.privacydev.net
@ -17,3 +16,5 @@ https://whoogle3.ungovernable.men
https://wgl.frail.duckdns.org
https://whoogle.no-logs.com
https://search.rubberverse.xyz
https://whoogle.ftw.lol
https://whoogle-search--replitcomreside.repl.co

@ -7,10 +7,10 @@ cffi==1.15.1
chardet==5.1.0
click==8.1.3
cryptography==3.3.2; platform_machine == 'armv7l'
cryptography==39.0.1; platform_machine != 'armv7l'
cryptography==41.0.0; platform_machine != 'armv7l'
cssutils==2.6.0
defusedxml==0.7.1
Flask==2.2.3
Flask==2.3.2
idna==3.4
itsdangerous==2.1.2
Jinja2==3.1.2
@ -21,16 +21,16 @@ pluggy==1.0.0
pycodestyle==2.10.0
pycparser==2.21
pyOpenSSL==19.1.0; platform_machine == 'armv7l'
pyOpenSSL==23.0.0; platform_machine != 'armv7l'
pyOpenSSL==23.2.0; platform_machine != 'armv7l'
pyparsing==3.0.9
PySocks==1.7.1
pytest==7.2.1
python-dateutil==2.8.2
requests==2.28.2
requests==2.31.0
soupsieve==2.4
stem==1.8.1
urllib3==1.26.14
waitress==2.1.2
wcwidth==0.2.6
Werkzeug==2.2.3
Werkzeug==2.3.3
python-dotenv==0.21.1

Loading…
Cancel
Save