Compare commits


No commits in common. 'main' and 'v0.8.2' have entirely different histories.
main...v0.8.2

@@ -42,10 +42,10 @@ jobs:
docker buildx ls
docker buildx build --push \
--tag benbusby/whoogle-search:latest \
--platform linux/amd64,linux/arm64 .
--platform linux/amd64,linux/arm/v7,linux/arm64 .
docker buildx build --push \
--tag ghcr.io/benbusby/whoogle-search:latest \
--platform linux/amd64,linux/arm64 .
--platform linux/amd64,linux/arm/v7,linux/arm64 .
- name: build and push tag
if: startsWith(github.ref, 'refs/tags')
run: |

.gitignore

@@ -1,5 +1,4 @@
venv/
.venv/
.idea/
__pycache__/
*.pyc
@@ -11,8 +10,7 @@ test/static
flask_session/
app/static/config
app/static/custom_config
app/static/bangs/*
!app/static/bangs/00-whoogle.json
app/static/bangs
# pip stuff
/build/

@@ -15,17 +15,10 @@ RUN pip install --prefix /install --no-warn-script-location --no-cache-dir -r re
FROM python:3.11.0a5-alpine
RUN apk add --update --no-cache tor curl openrc libstdc++
# git go //for obfs4proxy
# libcurl4-openssl-dev
RUN apk -U upgrade
# uncomment to build obfs4proxy
# RUN git clone https://gitlab.com/yawning/obfs4.git
# WORKDIR /obfs4
# RUN go build -o obfs4proxy/obfs4proxy ./obfs4proxy
# RUN cp ./obfs4proxy/obfs4proxy /usr/bin/obfs4proxy
ARG DOCKER_USER=whoogle
ARG DOCKER_USERID=927
ARG config_dir=/config

@@ -14,13 +14,12 @@
</tr>
</table>
Get Google search results, but without any ads, JavaScript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile.
Get Google search results, but without any ads, javascript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile.
Contents
1. [Features](#features)
3. [Install/Deploy Options](#install)
1. [Heroku Quick Deploy](#heroku-quick-deploy)
1. [Render.com](#render)
1. [Repl.it](#replit)
1. [Fly.io](#flyio)
1. [Koyeb](#koyeb)
@@ -34,12 +33,10 @@ Contents
5. [Usage](#usage)
6. [Extra Steps](#extra-steps)
1. [Set Primary Search Engine](#set-whoogle-as-your-primary-search-engine)
2. [Custom Redirecting](#custom-redirecting)
2. [Custom Bangs](#custom-bangs)
3. [Prevent Downtime (Heroku Only)](#prevent-downtime-heroku-only)
4. [Manual HTTPS Enforcement](#https-enforcement)
5. [Using with Firefox Containers](#using-with-firefox-containers)
6. [Reverse Proxying](#reverse-proxying)
2. [Prevent Downtime (Heroku Only)](#prevent-downtime-heroku-only)
3. [Manual HTTPS Enforcement](#https-enforcement)
4. [Using with Firefox Containers](#using-with-firefox-containers)
5. [Reverse Proxying](#reverse-proxying)
1. [Nginx](#nginx)
7. [Contributing](#contributing)
8. [FAQ](#faq)
@@ -62,7 +59,6 @@ Contents
- Randomly generated User Agent
- Easy to install/deploy
- DDG-style bang (i.e. `!<tag> <query>`) searches
- User-defined [custom bangs](#custom-bangs)
- Optional location-based searching (i.e. results near \<city\>)
- Optional NoJS mode to view search results in a separate window with JavaScript blocked
@@ -90,16 +86,6 @@ Notes:
___
### [Render](https://render.com)
Create an account on [render.com](https://render.com) and import the Whoogle repo with the following settings:
- Runtime: `Python 3`
- Build Command: `pip install -r requirements.txt`
- Run Command: `./run`
___
### [Repl.it](https://repl.it)
[![Run on Repl.it](https://repl.it/badge/github/benbusby/whoogle-search)](https://repl.it/github/benbusby/whoogle-search)
@@ -109,13 +95,13 @@ Provides:
- Free deployment of app
- Free HTTPS url (https://\<app name\>.\<username\>\.repl\.co)
- Supports custom domains
- Downtime after periods of inactivity ([solution](https://repl.it/talk/learn/How-to-use-and-setup-UptimeRobot/9003)\)
- Downtime after periods of inactivity \([solution 1](https://repl.it/talk/ask/use-this-pingmat1replco-just-enter/28821/101298), [solution 2](https://repl.it/talk/learn/How-to-use-and-setup-UptimeRobot/9003)\)
___
### [Fly.io](https://fly.io)
You will need a [Fly.io](https://fly.io) account to deploy Whoogle. The [free allowances](https://fly.io/docs/about/pricing/#free-allowances) are enough for personal use.
You will need a **PAID** [Fly.io](https://fly.io) account to deploy Whoogle.
#### Install the CLI: https://fly.io/docs/hands-on/installing/
@@ -247,7 +233,6 @@ ExecStart=<python_install_dir>/python3 <whoogle_install_dir>/whoogle-search --ho
ExecStart=<whoogle_repo_dir>/run
# For example:
# ExecStart=/var/www/whoogle-search/run
WorkingDirectory=<whoogle_repo_dir>
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3
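Pieced together, a minimal complete unit file following these directives might look like the sketch below (the unit name, paths, and target are placeholders, not taken from the repo):

```
[Unit]
Description=Whoogle Search
After=network.target

[Service]
# Placeholder checkout path; substitute your own <whoogle_repo_dir>
ExecStart=/var/www/whoogle-search/run
WorkingDirectory=/var/www/whoogle-search
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
```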
@@ -413,10 +398,6 @@ There are a few optional environment variables available for customizing a Whoog
| WHOOGLE_PROXY_PASS | The password of the proxy server. |
| WHOOGLE_PROXY_TYPE | The type of the proxy server. Can be "socks5", "socks4", or "http". |
| WHOOGLE_PROXY_LOC | The location of the proxy server (host or ip). |
| WHOOGLE_USER_AGENT | The desktop user agent to use. Defaults to a randomly generated one. |
| WHOOGLE_USER_AGENT_MOBILE | The mobile user agent to use. Defaults to a randomly generated one. |
| WHOOGLE_USE_CLIENT_USER_AGENT | Enable to use your own user agent for all requests. Defaults to false. |
| WHOOGLE_REDIRECTS | Specify sites that should be redirected elsewhere. See [custom redirecting](#custom-redirecting). |
| EXPOSE_PORT | The port where Whoogle will be exposed. |
| HTTPS_ONLY | Enforce HTTPS. (See [here](https://github.com/benbusby/whoogle-search#https-enforcement)) |
| WHOOGLE_ALT_TW | The twitter.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
@@ -425,7 +406,7 @@ There are a few optional environment variables available for customizing a Whoog
| WHOOGLE_ALT_TL | The Google Translate alternative to use. This is used for all "translate ____" searches. Set to "" to disable. |
| WHOOGLE_ALT_MD | The medium.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_IMG | The imgur.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_WIKI | The wikipedia.org alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_WIKI | The wikipedia.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_IMDB | The imdb.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_QUORA | The quora.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_AUTOCOMPLETE | Controls visibility of autocomplete/search suggestions. Default on -- use '0' to disable. |
@@ -435,8 +416,6 @@ There are a few optional environment variables available for customizing a Whoog
| WHOOGLE_TOR_SERVICE | Enable/disable the Tor service on startup. Default on -- use '0' to disable. |
| WHOOGLE_TOR_USE_PASS | Use password authentication for tor control port. |
| WHOOGLE_TOR_CONF | The absolute path to the config file containing the password for the tor control port. Default: ./misc/tor/control.conf WHOOGLE_TOR_PASS must be 1 for this to work.|
| WHOOGLE_SHOW_FAVICONS | Show/hide favicons next to search result URLs. Default on. |
| WHOOGLE_UPDATE_CHECK | Enable/disable the automatic daily check for new versions of Whoogle. Default on. |
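As an illustrative sketch (the chosen values are examples, not recommendations), a few of these variables can be passed straight to a Docker container at launch:

```
docker run -p 5000:5000 \
  -e WHOOGLE_AUTOCOMPLETE=0 \
  -e WHOOGLE_ALT_TW="" \
  benbusby/whoogle-search:latest
```

Here autocomplete is disabled with '0', and the empty value turns off the twitter.com alternative, per the table above.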
### Config Environment Variables
These environment variables allow setting default config values, but can be overwritten manually through the home page config menu. They provide a shortcut for destroying and rebuilding an instance with the same config state every time.
@@ -462,7 +441,6 @@ These environment variables allow setting default config values, but can be over
| WHOOGLE_CONFIG_STYLE | The custom CSS to use for styling (should be single line) |
| WHOOGLE_CONFIG_PREFERENCES_ENCRYPTED | Encrypt preferences token, requires preferences key |
| WHOOGLE_CONFIG_PREFERENCES_KEY | Key to encrypt preferences in URL (REQUIRED to show url) |
| WHOOGLE_CONFIG_ANON_VIEW | Include the "anonymous view" option for each search result |
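For example, an instance could be pinned to the same defaults on every rebuild by exporting something like the following before launch (values are illustrative; `lang_en` assumes the `lang_*` key format used by the bundled translation files):

```
WHOOGLE_CONFIG_LANGUAGE=lang_en
WHOOGLE_CONFIG_BLOCK=pinterest.com
```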
## Usage
Same as most search engines, with the exception of filtering by time range.
@@ -470,7 +448,6 @@ Same as most search engines, with the exception of filtering by time range.
To filter by a range of time, append ":past <time>" to the end of your search, where <time> can be `hour`, `day`, `month`, or `year`. Example: `coronavirus updates :past hour`
## Extra Steps
### Set Whoogle as your primary search engine
*Note: If you're using a reverse proxy to run Whoogle Search, make sure the "Root URL" config option on the home page is set to your URL before going through these steps.*
@@ -515,40 +492,6 @@ Browser settings:
- Manual
- Under search engines > manage search engines > add, manually enter your Whoogle instance details with a `<whoogle url>/search?q=%s` formatted search URL.
### Custom Redirecting
You can set custom site redirects using the `WHOOGLE_REDIRECTS` environment
variable. A lot of sites, such as Twitter, Reddit, etc, have built-in redirects
to [Farside links](https://sr.ht/~benbusby/farside), but you may want to define
your own.
To do this, you can use the following syntax:
```
WHOOGLE_REDIRECTS="<parent_domain>:<new_domain>"
```
For example, if you want to redirect from "badsite.com" to "goodsite.com":
```
WHOOGLE_REDIRECTS="badsite.com:goodsite.com"
```
This can be used for multiple sites as well, with comma separation:
```
WHOOGLE_REDIRECTS="badA.com:goodA.com,badB.com:goodB.com"
```
NOTE: Do not include "http(s)://" when defining your redirect.
### Custom Bangs
You can create your own custom bangs. By default, bangs are stored in
`app/static/bangs`. See [`00-whoogle.json`](https://github.com/benbusby/whoogle-search/blob/main/app/static/bangs/00-whoogle.json)
for an example. These are parsed in alphabetical order with later files
overriding bangs set in earlier files, with the exception that DDG bangs
(downloaded to `app/static/bangs/bangs.json`) are always parsed first. Thus,
any custom bangs will always override the DDG ones.
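As a sketch, a hypothetical `10-custom.json` dropped into `app/static/bangs` could define an extra bang in the same format as `00-whoogle.json`, where `{}` is replaced with the query (assuming absolute URLs resolve the same way the downloaded DDG bangs do):

```
{
  "!yt": {
    "url": "https://www.youtube.com/results?search_query={}",
    "suggestion": "!yt (YouTube)"
  }
}
```

Because `10-custom.json` sorts after `00-whoogle.json`, any overlapping tags it defines take precedence.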
### Prevent Downtime (Heroku only)
Part of the deal with Heroku's free tier is that you're allocated 550 hours/month (meaning it can't stay active 24/7), and the app is temporarily shut down after 30 minutes of inactivity. Once it becomes inactive, any Whoogle searches will still work, but it'll take an extra 10-15 seconds for the app to come back online before displaying the result, which can be frustrating if you're in a hurry.
@@ -625,7 +568,7 @@ Under the hood, Whoogle is a basic Flask app with the following structure:
- `opensearch.xml`: A template used for supporting [OpenSearch](https://developer.mozilla.org/en-US/docs/Web/OpenSearch).
- `imageresults.html`: An "experimental" template used for supporting the "Full Size" image feature on desktop.
- `static/<css|js>`
- CSS/JavaScript files, should be self-explanatory
- CSS/Javascript files, should be self-explanatory
- `static/settings`
- Key-value JSON files for establishing valid configuration values
@@ -664,7 +607,7 @@ I'm a huge fan of Searx though and encourage anyone to use that instead if they
**Why does the image results page look different?**
A lot of the app currently piggybacks on Google's existing support for fetching results pages with JavaScript disabled. To their credit, they've done an excellent job with styling pages, but it seems that the image results page - particularly on mobile - is a little rough. Moving forward, with enough interest, I'd like to transition to fetching the results and parsing them into a unique Whoogle-fied interface that I can style myself.
A lot of the app currently piggybacks on Google's existing support for fetching results pages with Javascript disabled. To their credit, they've done an excellent job with styling pages, but it seems that the image results page - particularly on mobile - is a little rough. Moving forward, with enough interest, I'd like to transition to fetching the results and parsing them into a unique Whoogle-fied interface that I can style myself.
## Public Instances
@@ -678,22 +621,15 @@ A lot of the app currently piggybacks on Google's existing support for fetching
| [https://s.tokhmi.xyz](https://s.tokhmi.xyz) | 🇺🇸 US | Multi-choice | ✅ |
| [https://search.sethforprivacy.com](https://search.sethforprivacy.com) | 🇩🇪 DE | English | |
| [https://whoogle.dcs0.hu](https://whoogle.dcs0.hu) | 🇭🇺 HU | Multi-choice | |
| [https://whoogle.esmailelbob.xyz](https://whoogle.esmailelbob.xyz) | 🇨🇦 CA | Multi-choice | |
| [https://gowogle.voring.me](https://gowogle.voring.me) | 🇺🇸 US | Multi-choice | |
| [https://whoogle.privacydev.net](https://whoogle.privacydev.net) | 🇫🇷 FR | English | |
| [https://whoogle.privacydev.net](https://whoogle.privacydev.net) | 🇳🇱 NL | English | |
| [https://wg.vern.cc](https://wg.vern.cc) | 🇺🇸 US | English | |
| [https://whoogle.hxvy0.gq](https://whoogle.hxvy0.gq) | 🇨🇦 CA | Turkish Only | ✅ |
| [https://whoogle.hostux.net](https://whoogle.hostux.net) | 🇫🇷 FR | Multi-choice | |
| [https://whoogle.lunar.icu](https://whoogle.lunar.icu) | 🇩🇪 DE | Multi-choice | ✅ |
| [https://whoogle.rhyshl.live](https://whoogle.rhyshl.live) | 🇬🇧 GB | Multi-choice | ✅ |
| [https://wgl.frail.duckdns.org](https://wgl.frail.duckdns.org) | 🇧🇷 BR | Multi-choice | |
| [https://whoogle.no-logs.com](https://whoogle.no-logs.com/) | 🇸🇪 SE | Multi-choice | |
| [https://whoogle.ftw.lol](https://whoogle.ftw.lol) | 🇩🇪 DE | Multi-choice | |
| [https://whoogle-search--replitcomreside.repl.co](https://whoogle-search--replitcomreside.repl.co) | 🇺🇸 US | English | |
| [https://search.notrustverify.ch](https://search.notrustverify.ch) | 🇨🇭 CH | Multi-choice | |
| [https://whoogle.datura.network](https://whoogle.datura.network) | 🇩🇪 DE | Multi-choice | |
| [https://whoogle.yepserver.xyz](https://whoogle.yepserver.xyz) | 🇺🇦 UA | Multi-choice | |
| [https://search.nezumi.party](https://search.nezumi.party) | 🇮🇹 IT | Multi-choice | |
| [https://search.snine.nl](https://search.snine.nl) | 🇳🇱 NL | Multi-choice | ✅ |
* A checkmark in the "Cloudflare" category here refers to the use of [Cloudflare](https://cloudflare.com) as a reverse proxy. A checkmark is not listed for sites that merely use Cloudflare DNS, only for those using the proxying service, which grants Cloudflare the ability to monitor traffic to the website.
@@ -704,8 +640,7 @@ A lot of the app currently piggybacks on Google's existing support for fetching
| [http://whoglqjdkgt2an4tdepberwqz3hk7tjo4kqgdnuj77rt7nshw2xqhqad.onion](http://whoglqjdkgt2an4tdepberwqz3hk7tjo4kqgdnuj77rt7nshw2xqhqad.onion) | 🇺🇸 US | Multi-choice
| [http://nuifgsnbb2mcyza74o7illtqmuaqbwu4flam3cdmsrnudwcmkqur37qd.onion](http://nuifgsnbb2mcyza74o7illtqmuaqbwu4flam3cdmsrnudwcmkqur37qd.onion) | 🇩🇪 DE | English
| [http://whoogle.vernccvbvyi5qhfzyqengccj7lkove6bjot2xhh5kajhwvidqafczrad.onion](http://whoogle.vernccvbvyi5qhfzyqengccj7lkove6bjot2xhh5kajhwvidqafczrad.onion/) | 🇺🇸 US | English |
| [http://whoogle.g4c3eya4clenolymqbpgwz3q3tawoxw56yhzk4vugqrl6dtu3ejvhjid.onion](http://whoogle.g4c3eya4clenolymqbpgwz3q3tawoxw56yhzk4vugqrl6dtu3ejvhjid.onion/) | 🇫🇷 FR | English |
| [http://whoogle.daturab6drmkhyeia4ch5gvfc2f3wgo6bhjrv3pz6n7kxmvoznlkq4yd.onion](http://whoogle.daturab6drmkhyeia4ch5gvfc2f3wgo6bhjrv3pz6n7kxmvoznlkq4yd.onion/) | 🇩🇪 DE | Multi-choice | |
| [http://whoogle.g4c3eya4clenolymqbpgwz3q3tawoxw56yhzk4vugqrl6dtu3ejvhjid.onion](http://whoogle.g4c3eya4clenolymqbpgwz3q3tawoxw56yhzk4vugqrl6dtu3ejvhjid.onion/) | 🇳🇱 NL | English |
#### I2P Instances

@@ -1,7 +1,7 @@
from app.filter import clean_query
from app.request import send_tor_signal
from app.utils.session import generate_key
from app.utils.bangs import gen_bangs_json, load_all_bangs
from app.utils.bangs import gen_bangs_json
from app.utils.misc import gen_file_hash, read_config_bool
from base64 import b64encode
from bs4 import MarkupResemblesLocatorWarning
@@ -101,10 +101,7 @@ if not os.path.exists(app.config['BUILD_FOLDER']):
# Session values
app_key_path = os.path.join(app.config['CONFIG_PATH'], 'whoogle.key')
if os.path.exists(app_key_path):
try:
app.config['SECRET_KEY'] = open(app_key_path, 'r').read()
except PermissionError:
app.config['SECRET_KEY'] = str(b64encode(os.urandom(32)))
app.config['SECRET_KEY'] = open(app_key_path, 'r').read()
else:
app.config['SECRET_KEY'] = str(b64encode(os.urandom(32)))
with open(app_key_path, 'w') as key_file:
@@ -142,9 +139,7 @@ app.config['CSP'] = 'default-src \'none\';' \
'connect-src \'self\';'
# Generate DDG bang filter
generating_bangs = False
if not os.path.exists(app.config['BANG_FILE']):
generating_bangs = True
json.dump({}, open(app.config['BANG_FILE'], 'w'))
bangs_thread = threading.Thread(
target=gen_bangs_json,
@@ -186,11 +181,6 @@ warnings.simplefilter('ignore', MarkupResemblesLocatorWarning)
from app import routes # noqa
# The gen_bangs_json function takes care of loading bangs, so skip it here if
# it's already being loaded
if not generating_bangs:
load_all_bangs(app.config['BANG_FILE'])
# Disable logging from imported modules
logging.config.dictConfig({
'version': 1,

@@ -3,7 +3,6 @@ from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag
from cryptography.fernet import Fernet
from flask import render_template
import html
import urllib.parse as urlparse
from urllib.parse import parse_qs
import re
@@ -29,12 +28,9 @@ unsupported_g_pages = [
'google.com/preferences',
'google.com/intl',
'advanced_search',
'tbm=shop',
'ageverification.google.co.kr'
'tbm=shop'
]
unsupported_g_divs = ['google.com/preferences?hl=', 'ageverification.google.co.kr']
def extract_q(q_str: str, href: str) -> str:
"""Extracts the 'q' element from a result link. This is typically
@@ -48,7 +44,7 @@ def extract_q(q_str: str, href: str) -> str:
Returns:
str: The 'q' element of the link, or an empty string
"""
return parse_qs(q_str, keep_blank_values=True)['q'][0] if ('&q=' in href or '?q=' in href) else ''
return parse_qs(q_str)['q'][0] if ('&q=' in href or '?q=' in href) else ''
def build_map_url(href: str) -> str:
@@ -164,22 +160,14 @@ class Filter:
self.update_styling()
self.remove_block_tabs()
# self.main_divs is only populated for the main page of search results
# (i.e. not images/news/etc).
if self.main_divs:
for div in self.main_divs:
self.sanitize_div(div)
for img in [_ for _ in self.soup.find_all('img') if 'src' in _.attrs]:
self.update_element_src(img, 'image/png')
for audio in [_ for _ in self.soup.find_all('audio') if 'src' in _.attrs]:
self.update_element_src(audio, 'audio/mpeg')
audio['controls'] = ''
for link in self.soup.find_all('a', href=True):
self.update_link(link)
self.add_favicon(link)
if self.config.alts:
self.site_alt_swap()
@@ -209,87 +197,6 @@ class Filter:
self.remove_site_blocks(self.soup)
return self.soup
def sanitize_div(self, div) -> None:
"""Removes escaped script and iframe tags from results
Returns:
None (The soup object is modified directly)
"""
if not div:
return
for d in div.find_all('div', recursive=True):
d_text = d.find(text=True, recursive=False)
# Ensure we're working with tags that contain text content
if not d_text or not d.string:
continue
d.string = html.unescape(d_text)
div_soup = BeautifulSoup(d.string, 'html.parser')
# Remove all valid script or iframe tags in the div
for script in div_soup.find_all('script'):
script.decompose()
for iframe in div_soup.find_all('iframe'):
iframe.decompose()
d.string = str(div_soup)
def add_favicon(self, link) -> None:
"""Adds icons for each returned result, using the result site's favicon
Returns:
None (The soup object is modified directly)
"""
# Skip empty, parentless, or internal links
show_favicons = read_config_bool('WHOOGLE_SHOW_FAVICONS', True)
is_valid_link = link and link.parent and link['href'].startswith('http')
if not show_favicons or not is_valid_link:
return
parent = link.parent
is_result_div = False
# Check each parent to make sure that the div doesn't already have a
# favicon attached, and that the div is a result div
while parent:
p_cls = parent.attrs.get('class') or []
if 'has-favicon' in p_cls or GClasses.scroller_class in p_cls:
return
elif GClasses.result_class_a not in p_cls:
parent = parent.parent
else:
is_result_div = True
break
if not is_result_div:
return
# Construct the html for inserting the icon into the parent div
parsed = urlparse.urlparse(link['href'])
favicon = self.encrypt_path(
f'{parsed.scheme}://{parsed.netloc}/favicon.ico',
is_element=True)
src = f'{self.root_url}/{Endpoint.element}?url={favicon}' + \
'&type=image/x-icon'
html = f'<img class="site-favicon" src="{src}">'
favicon = BeautifulSoup(html, 'html.parser')
link.parent.insert(0, favicon)
# Update all parents to indicate that a favicon has been attached
parent = link.parent
while parent:
p_cls = parent.get('class') or []
p_cls.append('has-favicon')
parent['class'] = p_cls
parent = parent.parent
if GClasses.result_class_a in p_cls:
break
def remove_site_blocks(self, soup) -> None:
if not self.config.block or not soup.body:
return
@@ -559,7 +466,7 @@ class Filter:
link['href'] = link_netloc
parent = link.parent
if any(divlink in link_netloc for divlink in unsupported_g_divs):
if 'google.com/preferences?hl=' in link_netloc:
# Handle case where a search is performed in a different
# language than what is configured. This usually returns a
# div with the same classes as normal search results, but with
@@ -579,9 +486,7 @@ class Filter:
if parent.name == 'footer' or f'{GClasses.footer}' in p_cls:
link.decompose()
parent = parent.parent
if link.decomposed:
return
return
# Replace href with only the intended destination (no "utm" type tags)
href = link['href'].replace('https://www.google.com', '')
@@ -650,19 +555,18 @@ class Filter:
is enabled
"""
for site, alt in SITE_ALTS.items():
if site != "medium.com" and alt != "":
# Ignore medium.com replacements since these are handled
for div in self.soup.find_all('div', text=re.compile(site)):
# Use the number of words in the div string to determine if the
# string is a result description (shouldn't replace domains used
# in desc text).
# Also ignore medium.com replacements since these are handled
# specifically in the link description replacement, and medium
# results are never given their own "card" result where this
# replacement would make sense.
# Also ignore if the alt is empty, since this is used to indicate
# that the alt is not enabled.
for div in self.soup.find_all('div', text=re.compile(site)):
# Use the number of words in the div string to determine if the
# string is a result description (shouldn't replace domains used
# in desc text).
if len(div.string.split(' ')) == 1:
div.string = div.string.replace(site, alt)
if site == 'medium.com' or len(div.string.split(' ')) > 1:
continue
div.string = div.string.replace(site, alt)
for link in self.soup.find_all('a', href=True):
# Search and replace all link descriptions
@@ -686,7 +590,7 @@ class Filter:
# replaced (i.e. 'philomedium.com' should stay as it is).
if 'medium.com' in link_str:
if link_str.startswith('medium.com') or '.medium.com' in link_str:
link_str = SITE_ALTS['medium.com'] + link_str[
link_str = 'farside.link/scribe' + link_str[
link_str.find('medium.com') + len('medium.com'):]
new_desc.string = link_str
else:

@@ -1,5 +1,4 @@
from inspect import Attribute
from typing import Optional
from app.utils.misc import read_config_bool
from flask import current_app
import os
@@ -9,31 +8,6 @@ import pickle
from cryptography.fernet import Fernet
import hashlib
import brotli
import logging
import cssutils
from cssutils.css.cssstylesheet import CSSStyleSheet
from cssutils.css.cssstylerule import CSSStyleRule
# removes warnings from cssutils
cssutils.log.setLevel(logging.CRITICAL)
def get_rule_for_selector(stylesheet: CSSStyleSheet,
selector: str) -> Optional[CSSStyleRule]:
"""Search for a rule that matches a given selector in a stylesheet.
Args:
stylesheet (CSSStyleSheet) -- the stylesheet to search
selector (str) -- the selector to search for
Returns:
Optional[CSSStyleRule] -- the rule that matches the selector or None
"""
for rule in stylesheet.cssRules:
if hasattr(rule, "selectorText") and selector == rule.selectorText:
return rule
return None
class Config:
@@ -42,8 +16,10 @@ class Config:
self.url = os.getenv('WHOOGLE_CONFIG_URL', '')
self.lang_search = os.getenv('WHOOGLE_CONFIG_SEARCH_LANGUAGE', '')
self.lang_interface = os.getenv('WHOOGLE_CONFIG_LANGUAGE', '')
self.style_modified = os.getenv(
'WHOOGLE_CONFIG_STYLE', '')
self.style = os.getenv(
'WHOOGLE_CONFIG_STYLE',
open(os.path.join(app_config['STATIC_FOLDER'],
'css/variables.css')).read())
self.block = os.getenv('WHOOGLE_CONFIG_BLOCK', '')
self.block_title = os.getenv('WHOOGLE_CONFIG_BLOCK_TITLE', '')
self.block_url = os.getenv('WHOOGLE_CONFIG_BLOCK_URL', '')
@@ -112,33 +88,6 @@ class Config:
if not name.startswith("__")
and (type(attr) is bool or type(attr) is str)}
@property
def style(self) -> str:
"""Returns the default style updated with specified modifications.
Returns:
str -- the new style
"""
style_sheet = cssutils.parseString(
open(os.path.join(current_app.config['STATIC_FOLDER'],
'css/variables.css')).read()
)
modified_sheet = cssutils.parseString(self.style_modified)
for rule in modified_sheet:
rule_default = get_rule_for_selector(style_sheet,
rule.selectorText)
# if modified rule is in default stylesheet, update it
if rule_default is not None:
# TODO: update this in a smarter way to handle :root better
# for now, if we change a variable in :root, all other default
# variables also need to be present
rule_default.style = rule.style
# else add the new rule to the default stylesheet
else:
style_sheet.add(rule)
return str(style_sheet.cssText, 'utf-8')
@property
def preferences(self) -> str:
# if the encryption key is not set, uncheck preferences encryption
@@ -254,8 +203,7 @@ class Config:
key = self._get_fernet_key(self.preferences_key)
config = Fernet(key).decrypt(
brotli.decompress(urlsafe_b64decode(
preferences.encode() + b'=='))
brotli.decompress(urlsafe_b64decode(preferences.encode()))
)
config = pickle.loads(brotli.decompress(config))
@@ -263,8 +211,7 @@ class Config:
config = {}
elif mode == 'u': # preferences are not encrypted
config = pickle.loads(
brotli.decompress(urlsafe_b64decode(
preferences.encode() + b'=='))
brotli.decompress(urlsafe_b64decode(preferences.encode()))
)
else: # preferences are incorrectly formatted
config = {}

@@ -14,7 +14,6 @@ class GClasses:
footer = 'TuS8Ad'
result_class_a = 'ZINbbc'
result_class_b = 'luh4td'
scroller_class = 'idg8be'
result_classes = {
result_class_a: ['Gx5Zad'],

@@ -73,14 +73,6 @@ def send_tor_signal(signal: Signal) -> bool:
def gen_user_agent(is_mobile) -> str:
user_agent = os.environ.get('WHOOGLE_USER_AGENT', '')
user_agent_mobile = os.environ.get('WHOOGLE_USER_AGENT_MOBILE', '')
if user_agent and not is_mobile:
return user_agent
if user_agent_mobile and is_mobile:
return user_agent_mobile
firefox = random.choice(['Choir', 'Squier', 'Higher', 'Wire']) + 'fox'
linux = random.choice(['Win', 'Sin', 'Gin', 'Fin', 'Kin']) + 'ux'
@@ -269,7 +261,7 @@ class Request:
return []
def send(self, base_url='', query='', attempt=0,
force_mobile=False, user_agent='') -> Response:
force_mobile=False) -> Response:
"""Sends an outbound request to a URL. Optionally sends the request
using Tor, if enabled by the user.
@@ -285,14 +277,10 @@ class Request:
Response: The Response object returned by the requests call
"""
use_client_user_agent = int(os.environ.get('WHOOGLE_USE_CLIENT_USER_AGENT', '0'))
if user_agent and use_client_user_agent == 1:
modified_user_agent = user_agent
if force_mobile and not self.mobile:
modified_user_agent = self.modified_user_agent_mobile
else:
if force_mobile and not self.mobile:
modified_user_agent = self.modified_user_agent_mobile
else:
modified_user_agent = self.modified_user_agent
modified_user_agent = self.modified_user_agent
headers = {
'User-Agent': modified_user_agent
@@ -307,8 +295,9 @@ class Request:
# view is suppressed correctly
now = datetime.now()
cookies = {
'CONSENT': 'PENDING+987',
'SOCS': 'CAESHAgBEhIaAB',
'CONSENT': 'YES+cb.{:d}{:02d}{:02d}-17-p0.de+F+678'.format(
now.year, now.month, now.day
)
}
# Validate Tor conn and request new identity if the last one failed

@@ -4,12 +4,8 @@ import io
import json
import os
import pickle
import re
import urllib.parse as urlparse
import uuid
import validators
import sys
import traceback
from datetime import datetime, timedelta
from functools import wraps
@@ -18,12 +14,11 @@ from app import app
from app.models.config import Config
from app.models.endpoint import Endpoint
from app.request import Request, TorError
from app.utils.bangs import suggest_bang, resolve_bang
from app.utils.misc import empty_gif, placeholder_img, get_proxy_host_url, \
fetch_favicon
from app.utils.bangs import resolve_bang
from app.utils.misc import get_proxy_host_url
from app.filter import Filter
from app.utils.misc import read_config_bool, get_client_ip, get_request_url, \
check_for_update, encrypt_string
check_for_update
from app.utils.widgets import *
from app.utils.results import bold_search_terms,\
add_currency_card, check_currency, get_tabs_content
@@ -36,7 +31,9 @@ from requests import exceptions
from requests.models import PreparedRequest
from cryptography.fernet import Fernet, InvalidToken
from cryptography.exceptions import InvalidSignature
from werkzeug.datastructures import MultiDict
# Load DDG bang json files only on init
bang_json = json.load(open(app.config['BANG_FILE'])) or {}
ac_var = 'WHOOGLE_AUTOCOMPLETE'
autocomplete_enabled = os.getenv(ac_var, '1')
@@ -129,12 +126,12 @@ def session_required(f):
@app.before_request
def before_request_func():
global bang_json
session.permanent = True
# Check for latest version if needed
now = datetime.now()
needs_update_check = now - timedelta(hours=24) > app.config['LAST_UPDATE_CHECK']
if read_config_bool('WHOOGLE_UPDATE_CHECK', True) and needs_update_check:
if now - timedelta(hours=24) > app.config['LAST_UPDATE_CHECK']:
app.config['LAST_UPDATE_CHECK'] = now
app.config['HAS_UPDATE'] = check_for_update(
app.config['RELEASES_URL'],
@@ -170,12 +167,20 @@ def before_request_func():
g.app_location = g.user_config.url
# Attempt to reload bangs json if not generated yet
if not bang_json and os.path.getsize(app.config['BANG_FILE']) > 4:
try:
bang_json = json.load(open(app.config['BANG_FILE']))
except json.decoder.JSONDecodeError:
# Ignore decoding error, can occur if file is still
# being written
pass
@app.after_request
def after_request_func(resp):
resp.headers['X-Content-Type-Options'] = 'nosniff'
resp.headers['X-Frame-Options'] = 'DENY'
resp.headers['Cache-Control'] = 'max-age=86400'
if os.getenv('WHOOGLE_CSP', False):
resp.headers['Content-Security-Policy'] = app.config['CSP']
@@ -271,7 +276,8 @@ def autocomplete():
# Search bangs if the query begins with "!", but not "! " (feeling lucky)
if q.startswith('!') and len(q) > 1 and not q.startswith('! '):
return jsonify([q, suggest_bang(q)])
return jsonify([q, [bang_json[_]['suggestion'] for _ in bang_json if
_.startswith(q)]])
if not q and not request.data:
return jsonify({'?': []})
@@ -292,17 +298,10 @@ def autocomplete():
@session_required
@auth_required
def search():
if request.method == 'POST':
# Redirect as a GET request with an encrypted query
post_data = MultiDict(request.form)
post_data['q'] = encrypt_string(g.session_key, post_data['q'])
get_req_str = urlparse.urlencode(post_data)
return redirect(url_for('.search') + '?' + get_req_str)
search_util = Search(request, g.user_config, g.session_key)
query = search_util.new_search_query()
bang = resolve_bang(query)
bang = resolve_bang(query, bang_json)
if bang:
return redirect(bang)
@@ -421,18 +420,13 @@ def config():
config_disabled = (
app.config['CONFIG_DISABLE'] or
not valid_user_session(session))
name = ''
if 'name' in request.args:
name = os.path.normpath(request.args.get('name'))
if not re.match(r'^[A-Za-z0-9_.+-]+$', name):
return make_response('Invalid config name', 400)
if request.method == 'GET':
return json.dumps(g.user_config.__dict__)
elif request.method == 'PUT' and not config_disabled:
if name:
config_pkl = os.path.join(app.config['CONFIG_PATH'], name)
if 'name' in request.args:
config_pkl = os.path.join(
app.config['CONFIG_PATH'],
request.args.get('name'))
session['config'] = (pickle.load(open(config_pkl, 'rb'))
if os.path.exists(config_pkl)
else session['config'])
@@ -450,7 +444,7 @@ def config():
config_data,
open(os.path.join(
app.config['CONFIG_PATH'],
name), 'wb'))
request.args.get('name')), 'wb'))
session['config'] = config_data
return redirect(config_data['url'])
@@ -481,23 +475,8 @@ def element():
src_type = request.args.get('type')
# Ensure requested element is from a valid domain
domain = urlparse.urlparse(src_url).netloc
if not validators.domain(domain):
return send_file(io.BytesIO(empty_gif), mimetype='image/gif')
try:
response = g.user_request.send(base_url=src_url)
# Display an empty gif if the requested element couldn't be retrieved
if response.status_code != 200 or len(response.content) == 0:
if 'favicon' in src_url:
favicon = fetch_favicon(src_url)
return send_file(io.BytesIO(favicon), mimetype='image/png')
else:
return send_file(io.BytesIO(empty_gif), mimetype='image/gif')
file_data = response.content
file_data = g.user_request.send(base_url=src_url).content
tmp_mem = io.BytesIO()
tmp_mem.write(file_data)
tmp_mem.seek(0)
@@ -506,6 +485,8 @@ def element():
except exceptions.RequestException:
pass
empty_gif = base64.b64decode(
'R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==')
return send_file(io.BytesIO(empty_gif), mimetype='image/gif')
@@ -523,13 +504,6 @@ def window():
root_url=request.url_root,
config=g.user_config)
target = urlparse.urlparse(target_url)
# Ensure requested URL has a valid domain
if not validators.domain(target.netloc):
return render_template(
'error.html',
error_message='Invalid location'), 400
host_url = f'{target.scheme}://{target.netloc}'
get_body = g.user_request.send(base_url=target_url).text
@@ -583,54 +557,11 @@ def window():
)
@app.route('/robots.txt')
def robots():
response = make_response(
'''User-Agent: *
Disallow: /''', 200)
response.mimetype = 'text/plain'
return response
@app.route('/favicon.ico')
def favicon():
return app.send_static_file('img/favicon.ico')
@app.errorhandler(404)
def page_not_found(e):
return render_template('error.html', error_message=str(e)), 404
@app.errorhandler(Exception)
def internal_error(e):
query = ''
if request.method == 'POST':
query = request.form.get('q')
else:
query = request.args.get('q')
# Attempt to parse the query
try:
search_util = Search(request, g.user_config, g.session_key)
query = search_util.new_search_query()
except Exception:
pass
print(traceback.format_exc(), file=sys.stderr)
localization_lang = g.user_config.get_localization_lang()
translation = app.config['TRANSLATIONS'][localization_lang]
return render_template(
'error.html',
error_message='Internal server error (500)',
translation=translation,
farside='https://farside.link',
config=g.user_config,
query=urlparse.unquote(query),
params=g.user_config.to_params(keys=['preferences'])), 500
def run_app() -> None:
parser = argparse.ArgumentParser(
description='Whoogle Search console runner')
@@ -649,11 +580,6 @@ def run_app() -> None:
default='',
metavar='</path/to/unix.sock>',
help='Listen for app on unix socket instead of host:port')
parser.add_argument(
'--unix-socket-perms',
default='600',
metavar='<octal permissions>',
help='Octal permissions to use for the Unix domain socket (default 600)')
parser.add_argument(
'--debug',
default=False,
@@ -705,7 +631,7 @@ def run_app() -> None:
if args.debug:
app.run(host=args.host, port=args.port, debug=args.debug)
elif args.unix_socket:
waitress.serve(app, unix_socket=args.unix_socket, unix_socket_perms=args.unix_socket_perms)
waitress.serve(app, unix_socket=args.unix_socket)
else:
waitress.serve(
app,

@@ -1,14 +0,0 @@
{
"!i": {
"url": "search?q={}&tbm=isch",
"suggestion": "!i (Whoogle Images)"
},
"!v": {
"url": "search?q={}&tbm=vid",
"suggestion": "!v (Whoogle Videos)"
},
"!n": {
"url": "search?q={}&tbm=nws",
"suggestion": "!n (Whoogle News)"
}
}

@@ -58,26 +58,6 @@ details summary span {
text-align: center;
}
.site-favicon {
float: left;
width: 25px;
padding-right: 5px;
}
.has-favicon .sCuL3 {
padding-left: 30px;
}
#flex_text_audio_icon_chunk {
display: none;
}
audio {
display: block;
margin-right: auto;
padding-bottom: 5px;
}
@media (min-width: 801px) {
body {
min-width: 736px !important;

@@ -21,6 +21,16 @@ const handleUserInput = () => {
xhrRequest.send('q=' + searchInput.value);
};
const closeAllLists = el => {
// Close all autocomplete suggestions
let suggestions = document.getElementsByClassName("autocomplete-items");
for (let i = 0; i < suggestions.length; i++) {
if (el !== suggestions[i] && el !== searchInput) {
suggestions[i].parentNode.removeChild(suggestions[i]);
}
}
};
const removeActive = suggestion => {
// Remove "autocomplete-active" class from previously active suggestion
for (let i = 0; i < suggestion.length; i++) {
@@ -61,7 +71,7 @@ const addActive = (suggestion) => {
const autocompleteInput = (e) => {
// Handle navigation between autocomplete suggestions
let suggestion = document.getElementById("autocomplete-list");
let suggestion = document.getElementById(this.id + "-autocomplete-list");
if (suggestion) suggestion = suggestion.getElementsByTagName("div");
if (e.keyCode === 40) { // down
e.preventDefault();
@@ -82,28 +92,29 @@
};
const updateAutocompleteList = () => {
let autocompleteItem, i;
let autocompleteList, autocompleteItem, i;
let val = originalSearch;
let autocompleteList = document.getElementById("autocomplete-list");
autocompleteList.innerHTML = "";
closeAllLists();
if (!val || !autocompleteResults) {
return false;
}
currentFocus = -1;
autocompleteList = document.createElement("div");
autocompleteList.setAttribute("id", this.id + "-autocomplete-list");
autocompleteList.setAttribute("class", "autocomplete-items");
searchInput.parentNode.appendChild(autocompleteList);
for (i = 0; i < autocompleteResults.length; i++) {
if (autocompleteResults[i].substr(0, val.length).toUpperCase() === val.toUpperCase()) {
autocompleteItem = document.createElement("div");
autocompleteItem.setAttribute("class", "autocomplete-item");
autocompleteItem.innerHTML = "<strong>" + autocompleteResults[i].substr(0, val.length) + "</strong>";
autocompleteItem.innerHTML += autocompleteResults[i].substr(val.length);
autocompleteItem.innerHTML += "<input type=\"hidden\" value=\"" + autocompleteResults[i] + "\">";
autocompleteItem.addEventListener("click", function () {
searchInput.value = this.getElementsByTagName("input")[0].value;
autocompleteList.innerHTML = "";
closeAllLists();
document.getElementById("search-form").submit();
});
autocompleteList.appendChild(autocompleteItem);
@@ -112,16 +123,10 @@ const updateAutocompleteList = () => {
};
document.addEventListener("DOMContentLoaded", function() {
let autocompleteList = document.createElement("div");
autocompleteList.setAttribute("id", "autocomplete-list");
autocompleteList.setAttribute("class", "autocomplete-items");
searchInput = document.getElementById("search-bar");
searchInput.parentNode.appendChild(autocompleteList);
searchInput.addEventListener("keydown", (event) => autocompleteInput(event));
document.addEventListener("click", function (e) {
autocompleteList.innerHTML = "";
closeAllLists(e.target);
});
});
});

@@ -13,7 +13,7 @@
},
"maps": {
"tbm": null,
"href": "https://maps.google.com/maps?q={map_query}",
"href": "https://maps.google.com/maps?q={query}",
"name": "Maps",
"selected": false
},

@@ -1056,12 +1056,12 @@
"books": "Pirtûk",
"anon-view": "Dîtina Nenas",
"": "--",
"qdr:h": "Demjimêra borî",
"qdr:d": "24 Demjimêrên borî",
"qdr:h": "Saet berê",
"qdr:d": "24 saetên borî",
"qdr:w": "Hefteya borî",
"qdr:m": "Meha borî",
"qdr:y": "Sala borî",
"config-time-period": "Pêşsazkariyên demê"
"config-time-period": "Dem Period"
},
"lang_th": {
"search": "ค้นหา",

@@ -19,88 +19,22 @@
{{ error_message }}
</p>
<hr>
{% if query and translation %}
<p>
<h4><a class="link" href="https://farside.link">{{ translation['continue-search'] }}</a></h4>
<ul>
<li>
<a href="https://github.com/benbusby/whoogle-search">Whoogle</a>
<ul>
<li>
<a class="link-color" href="{{farside}}/whoogle/search?q={{query}}{{params}}">
{{farside}}/whoogle/search?q={{query}}
</a>
</li>
</ul>
</li>
<li>
<a href="https://github.com/searxng/searxng">SearXNG</a>
<ul>
<li>
<a class="link-color" href="{{farside}}/searxng/search?q={{query}}">
{{farside}}/searxng/search?q={{query}}
</a>
</li>
</ul>
</li>
</ul>
<hr>
<h4>Other options:</h4>
<ul>
<li>
<a href="https://kagi.com">Kagi</a>
<ul>
<li>Requires account</li>
<li>
<a class="link-color" href="https://kagi.com/search?q={{query}}">
kagi.com/search?q={{query}}
</a>
</li>
</ul>
</li>
<li>
<a href="https://duckduckgo.com">DuckDuckGo</a>
<ul>
<li>
<a class="link-color" href="https://duckduckgo.com/search?q={{query}}">
duckduckgo.com/search?q={{query}}
</a>
</li>
</ul>
</li>
<li>
<a href="https://search.brave.com">Brave Search</a>
<ul>
<li>
<a class="link-color" href="https://search.brave.com/search?q={{query}}">
search.brave.com/search?q={{query}}
</a>
</li>
</ul>
</li>
<li>
<a href="https://ecosia.com">Ecosia</a>
<ul>
<li>
<a class="link-color" href="https://ecosia.com/search?q={{query}}">
ecosia.com/search?q={{query}}
</a>
</li>
</ul>
</li>
<li>
<a href="https://google.com">Google</a>
<ul>
<li>
<a class="link-color" href="https://google.com/search?q={{query}}">
google.com/search?q={{query}}
</a>
</li>
</ul>
</li>
</ul>
<hr>
</p>
<p>
{% if blocked is defined %}
<h4><a class="link" href="https://farside.link">{{ translation['continue-search'] }}</a></h4>
Whoogle:
<br>
<a class="link-color" href="{{farside}}/whoogle/search?q={{query}}{{params}}">
{{farside}}/whoogle/search?q={{query}}
</a>
<br><br>
Searx:
<br>
<a class="link-color" href="{{farside}}/searx/search?q={{query}}">
{{farside}}/searx/search?q={{query}}
</a>
<hr>
{% endif %}
</p>
<a class="link" href="home">Return Home</a>
</div>

@@ -243,13 +243,15 @@
{{ translation['config-css'] }}:
</a>
<textarea
name="style_modified"
name="style"
id="config-style"
autocapitalize="off"
autocomplete="off"
spellcheck="false"
autocorrect="off"
value="">{{ config.style_modified.replace('\t', '') }}</textarea>
value="">
{{ config.style.replace('\t', '') }}
</textarea>
</div>
<div class="config-div config-div-pref-url">
<label for="config-pref-encryption">{{ translation['config-pref-encryption'] }}: </label>

@@ -1,56 +1,8 @@
import json
import requests
import urllib.parse as urlparse
import os
import glob
bangs_dict = {}
DDG_BANGS = 'https://duckduckgo.com/bang.js'
def load_all_bangs(ddg_bangs_file: str, ddg_bangs: dict = {}):
"""Loads all the bang files in alphabetical order
Args:
ddg_bangs_file: The str path to the new DDG bangs json file
ddg_bangs: The dict of ddg bangs. If this is empty, it will load the
bangs from the file
Returns:
None
"""
global bangs_dict
ddg_bangs_file = os.path.normpath(ddg_bangs_file)
if (bangs_dict and not ddg_bangs) or os.path.getsize(ddg_bangs_file) <= 4:
return
bangs = {}
bangs_dir = os.path.dirname(ddg_bangs_file)
bang_files = glob.glob(os.path.join(bangs_dir, '*.json'))
# Normalize the paths
bang_files = [os.path.normpath(f) for f in bang_files]
# Move the ddg bangs file to the beginning
bang_files = sorted([f for f in bang_files if f != ddg_bangs_file])
if ddg_bangs:
bangs |= ddg_bangs
else:
bang_files.insert(0, ddg_bangs_file)
for i, bang_file in enumerate(bang_files):
try:
bangs |= json.load(open(bang_file))
except json.decoder.JSONDecodeError:
# Ignore decoding error only for the ddg bangs file, since this can
# occur if file is still being written
if i != 0:
raise
bangs_dict = dict(sorted(bangs.items()))
DDG_BANGS = 'https://duckduckgo.com/bang.v255.js'
def gen_bangs_json(bangs_file: str) -> None:
@@ -85,35 +37,22 @@ def gen_bangs_json(bangs_file: str) -> None:
json.dump(bangs_data, open(bangs_file, 'w'))
print('* Finished creating ddg bangs json')
load_all_bangs(bangs_file, bangs_data)
def suggest_bang(query: str) -> list[str]:
"""Suggests bangs for a user's query
Args:
query: The search query
Returns:
list[str]: A list of bang suggestions
"""
global bangs_dict
return [bangs_dict[_]['suggestion'] for _ in bangs_dict if _.startswith(query)]
def resolve_bang(query: str) -> str:
def resolve_bang(query: str, bangs_dict: dict) -> str:
"""Transform's a user's query to a bang search, if an operator is found
Args:
query: The search query
bangs_dict: The dict of available bang operators, with corresponding
format string search URLs
(i.e. "!w": "https://en.wikipedia.org...?search={}")
Returns:
str: A formatted redirect for a bang search, or an empty str if there
wasn't a match or didn't contain a bang operator
"""
global bangs_dict
# if '!' is not in the query, simply return (speeds up processing)
if '!' not in query:

@@ -1,52 +1,11 @@
import base64
from bs4 import BeautifulSoup as bsoup
from cryptography.fernet import Fernet
from flask import Request
import hashlib
import io
import os
import re
from requests import exceptions, get
from urllib.parse import urlparse
ddg_favicon_site = 'http://icons.duckduckgo.com/ip2'
empty_gif = base64.b64decode(
'R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==')
placeholder_img = base64.b64decode(
'iVBORw0KGgoAAAANSUhEUgAAABkAAAAZCAYAAADE6YVjAAABF0lEQVRIS8XWPw9EMBQA8Eok' \
'JBKrMFqMBt//GzAYLTZ/VomExPDu6uLiaPteqVynBn0/75W2Vp7nEIYhe6p1XcespmmAd7Is' \
'M+4URcGiKPogvMMvmIS2eN9MOMKbKWgf54SYgI4vKkTuQKJKSJErkKzUSkQHUs0lilAg7GMh' \
'ISoIA/hYMiKCKIA2soeowCWEMkfHtUmrXLcyGYYBfN9HF8djiaglWzNZlgVs21YisoAUaEXG' \
'cQTP86QIFgi7vyLzPIPjOEIEC7ANQv/4aZrAdd0TUtc1i+MYnSsMWjPp+x6CIPgJVlUVS5KE' \
'DKig/+wnVzM4pnzaGeHd+ENlWbI0TbVLJBtw2uMfP63wc9d2kDCWxi5Q27bsBerSJ9afJbeL' \
'AAAAAElFTkSuQmCC'
)
def fetch_favicon(url: str) -> bytes:
"""Fetches a favicon using DuckDuckGo's favicon retriever
Args:
url: The url to fetch the favicon from
Returns:
bytes - the favicon bytes, or a placeholder image if one
was not returned
"""
domain = urlparse(url).netloc
response = get(f'{ddg_favicon_site}/{domain}.ico')
if response.status_code == 200 and len(response.content) > 0:
tmp_mem = io.BytesIO()
tmp_mem.write(response.content)
tmp_mem.seek(0)
return tmp_mem.read()
else:
return placeholder_img
def gen_file_hash(path: str, static_file: str) -> str:
file_contents = open(os.path.join(path, static_file), 'rb').read()
@@ -56,8 +15,8 @@ def gen_file_hash(path: str, static_file: str) -> str:
return filename_split[0] + '.' + file_hash + filename_split[-1]
def read_config_bool(var: str, default: bool=False) -> bool:
val = os.getenv(var, '1' if default else '0')
def read_config_bool(var: str) -> bool:
val = os.getenv(var, '0')
# user can specify one of the following values as 'true' inputs (all
# variants with upper case letters will also work):
# ('true', 't', '1', 'yes', 'y')
@@ -120,20 +79,3 @@ def get_abs_url(url, page_url):
elif url.startswith('./'):
return f'{page_url}{url[2:]}'
return url
def list_to_dict(lst: list) -> dict:
if len(lst) < 2:
return {}
return {lst[i].replace(' ', ''): lst[i+1].replace(' ', '')
for i in range(0, len(lst), 2)}
def encrypt_string(key: bytes, string: str) -> str:
cipher_suite = Fernet(key)
return cipher_suite.encrypt(string.encode()).decode()
def decrypt_string(key: bytes, string: str) -> str:
cipher_suite = Fernet(g.session_key)
return cipher_suite.decrypt(string.encode()).decode()

@@ -1,6 +1,5 @@
from app.models.config import Config
from app.models.endpoint import Endpoint
from app.utils.misc import list_to_dict
from bs4 import BeautifulSoup, NavigableString
import copy
from flask import current_app
@@ -12,7 +11,7 @@ import re
import warnings
SKIP_ARGS = ['ref_src', 'utm']
SKIP_PREFIX = ['//www.', '//mobile.', '//m.']
SKIP_PREFIX = ['//www.', '//mobile.', '//m.', 'www.', 'mobile.', 'm.']
GOOG_STATIC = 'www.gstatic.com'
G_M_LOGO_URL = 'https://www.gstatic.com/m/images/icons/googleg.gif'
GOOG_IMG = '/images/branding/searchlogo/1x/googlelogo'
@@ -44,9 +43,6 @@ SITE_ALTS = {
'quora.com': os.getenv('WHOOGLE_ALT_QUORA', 'farside.link/quetre')
}
# Include custom site redirects from WHOOGLE_REDIRECTS
SITE_ALTS.update(list_to_dict(re.split(',|:', os.getenv('WHOOGLE_REDIRECTS', ''))))
def contains_cjko(s: str) -> bool:
"""This function check whether or not a string contains Chinese, Japanese,
@@ -144,34 +140,19 @@ def get_first_link(soup: BeautifulSoup) -> str:
str: A str link to the first result
"""
first_link = ''
orig_details = []
# Temporarily remove details so we don't grab those links
for details in soup.find_all('details'):
temp_details = soup.new_tag('removed_details')
orig_details.append(details.replace_with(temp_details))
# Replace hrefs with only the intended destination (no "utm" type tags)
for a in soup.find_all('a', href=True):
# Return the first search result URL
if a['href'].startswith('http://') or a['href'].startswith('https://'):
first_link = a['href']
break
if 'url?q=' in a['href']:
return filter_link_args(a['href'])
return ''
# Add the details back
for orig_detail, details in zip(orig_details, soup.find_all('removed_details')):
details.replace_with(orig_detail)
return first_link
def get_site_alt(link: str, site_alts: dict = SITE_ALTS) -> str:
def get_site_alt(link: str) -> str:
"""Returns an alternative to a particular site, if one is configured
Args:
link: A string result URL to check against the site_alts map
site_alts: A map of site alternatives to replace with. defaults to SITE_ALTS
link: A string result URL to check against the SITE_ALTS map
Returns:
str: An updated (or ignored) result link
@@ -193,9 +174,9 @@ def get_site_alt(link: str, site_alts: dict = SITE_ALTS) -> str:
# "https://medium.com/..." should match, but "philomedium.com" should not)
hostcomp = f'{parsed_link.scheme}://{hostname}'
for site_key in site_alts.keys():
for site_key in SITE_ALTS.keys():
site_alt = f'{parsed_link.scheme}://{site_key}'
if not hostname or site_alt not in hostcomp or not site_alts[site_key]:
if not hostname or site_alt not in hostcomp or not SITE_ALTS[site_key]:
continue
# Wikipedia -> Wikiless replacements require the subdomain (if it's
@@ -208,8 +189,9 @@ def get_site_alt(link: str, site_alts: dict = SITE_ALTS) -> str:
elif 'medium' in hostname and len(subdomain) > 0:
hostname = f'{subdomain}.{hostname}'
parsed_alt = urlparse.urlparse(site_alts[site_key])
link = link.replace(hostname, site_alts[site_key]) + params
parsed_alt = urlparse.urlparse(SITE_ALTS[site_key])
link = link.replace(hostname, SITE_ALTS[site_key]) + params
# If a scheme is specified in the alternative, this results in a
# replaced link that looks like "https://http://altservice.tld".
# In this case, we can remove the original scheme from the result
@@ -219,12 +201,9 @@ def get_site_alt(link: str, site_alts: dict = SITE_ALTS) -> str:
for prefix in SKIP_PREFIX:
if parsed_alt.scheme:
# If a scheme is specified, remove everything before the
# first occurrence of it
link = f'{parsed_alt.scheme}{link.split(parsed_alt.scheme, 1)[-1]}'
link = link.replace(prefix, '')
else:
# Otherwise, replace the first occurrence of the prefix
link = link.replace(prefix, '//', 1)
link = link.replace(prefix, '//')
break
return link
@@ -435,10 +414,6 @@ def get_tabs_content(tabs: dict,
Returns:
dict: contains the name, the href and if the tab is selected or not
"""
map_query = full_query
if '-site:' in full_query:
block_idx = full_query.index('-site:')
map_query = map_query[:block_idx]
tabs = copy.deepcopy(tabs)
for tab_id, tab_content in tabs.items():
# update name to desired language
@@ -454,9 +429,7 @@ def get_tabs_content(tabs: dict,
if preferences:
query = f"{query}&preferences={preferences}"
tab_content['href'] = tab_content['href'].format(
query=query,
map_query=map_query)
tab_content['href'] = tab_content['href'].format(query=query)
# update if selected tab (default all tab is selected)
if tab_content['tbm'] == search_type:

@@ -102,22 +102,14 @@ class Search:
except InvalidToken:
pass
# Strip '!' for "feeling lucky" queries
if match := re.search("(^|\s)!($|\s)", q):
self.feeling_lucky = True
start, end = match.span()
self.query = " ".join([seg for seg in [q[:start], q[end:]] if seg])
else:
self.feeling_lucky = False
self.query = q
# Strip leading '! ' for "feeling lucky" queries
self.feeling_lucky = q.startswith('! ')
self.query = q[2:] if self.feeling_lucky else q
# Check for possible widgets
self.widget = "ip" if re.search("([^a-z0-9]|^)my *[^a-z0-9] *(ip|internet protocol)" +
"($|( *[^a-z0-9] *(((addres|address|adres|" +
"adress)|a)? *$)))", self.query.lower()) else self.widget
self.widget = 'calculator' if re.search(
r"\bcalculator\b|\bcalc\b|\bcalclator\b|\bmath\b",
self.query.lower()) else self.widget
self.widget = 'calculator' if re.search("calculator|calc|calclator|math", self.query.lower()) else self.widget
return self.query
def generate_response(self) -> str:
@@ -152,8 +144,7 @@ class Search:
and not g.user_request.mobile)
get_body = g.user_request.send(query=full_query,
force_mobile=view_image,
user_agent=self.user_agent)
force_mobile=view_image)
# Produce cleanable html soup from response
get_body_safed = get_body.text.replace("&lt;","andlt;").replace("&gt;","andgt;")
@@ -167,25 +158,22 @@ class Search:
if g.user_request.tor_valid:
html_soup.insert(0, bsoup(TOR_BANNER, 'html.parser'))
formatted_results = content_filter.clean(html_soup)
if self.feeling_lucky:
if lucky_link := get_first_link(formatted_results):
return lucky_link
# Fall through to regular search if unable to find link
self.feeling_lucky = False
# Append user config to all search links, if available
param_str = ''.join('&{}={}'.format(k, v)
for k, v in
self.request_params.to_dict(flat=True).items()
if self.config.is_safe_key(k))
for link in formatted_results.find_all('a', href=True):
link['rel'] = "nofollow noopener noreferrer"
if 'search?' not in link['href'] or link['href'].index(
'search?') > 1:
continue
link['href'] += param_str
return str(formatted_results)
return get_first_link(html_soup)
else:
formatted_results = content_filter.clean(html_soup)
# Append user config to all search links, if available
param_str = ''.join('&{}={}'.format(k, v)
for k, v in
self.request_params.to_dict(flat=True).items()
if self.config.is_safe_key(k))
for link in formatted_results.find_all('a', href=True):
link['rel'] = "nofollow noopener noreferrer"
if 'search?' not in link['href'] or link['href'].index(
'search?') > 1:
continue
link['href'] += param_str
return str(formatted_results)

@@ -1,10 +1,5 @@
from pathlib import Path
from bs4 import BeautifulSoup
# root
BASE_DIR = Path(__file__).parent.parent.parent
def add_ip_card(html_soup: BeautifulSoup, ip: str) -> BeautifulSoup:
"""Adds the client's IP address to the search results
if query contains keywords
@@ -53,8 +48,7 @@ def add_calculator_card(html_soup: BeautifulSoup) -> BeautifulSoup:
"""
main_div = html_soup.select_one('#main')
if main_div:
# absolute path
widget_file = open(BASE_DIR / 'app/static/widgets/calculator.html', encoding="utf8")
widget_file = open('app/static/widgets/calculator.html')
widget_tag = html_soup.new_tag('div')
widget_tag['class'] = 'ZINbbc xpd O9g5cc uUPGi'
widget_tag['id'] = 'calculator-wrapper'
@@ -62,7 +56,7 @@ def add_calculator_card(html_soup: BeautifulSoup) -> BeautifulSoup:
calculator_text['class'] = 'kCrYT ip-address-div'
calculator_text.string = 'Calculator'
calculator_widget = html_soup.new_tag('div')
calculator_widget.append(BeautifulSoup(widget_file, 'html.parser'))
calculator_widget.append(BeautifulSoup(widget_file, 'html.parser'));
calculator_widget['class'] = 'kCrYT ip-text-div'
widget_tag.append(calculator_text)
widget_tag.append(calculator_widget)

@@ -4,4 +4,4 @@ optional_dev_tag = ''
if os.getenv('DEV_BUILD'):
optional_dev_tag = '.dev' + os.getenv('DEV_BUILD')
__version__ = '0.8.4' + optional_dev_tag
__version__ = '0.8.1' + optional_dev_tag

@@ -3,7 +3,7 @@ name: whoogle
description: A self hosted search engine on Kubernetes
type: application
version: 0.1.0
appVersion: 0.8.4
appVersion: 0.8.1
icon: https://github.com/benbusby/whoogle-search/raw/main/app/static/img/favicon/favicon-96x96.png

@@ -52,20 +52,10 @@ spec:
httpGet:
path: /
port: http
{{- if and .Values.conf.WHOOGLE_USER .Values.conf.WHOOGLE_PASS }}
httpHeaders:
- name: Authorization
value: Basic {{ b64enc (printf "%s:%s" .Values.conf.WHOOGLE_USER .Values.conf.WHOOGLE_PASS) }}
{{- end }}
readinessProbe:
httpGet:
path: /
port: http
{{- if and .Values.conf.WHOOGLE_USER .Values.conf.WHOOGLE_PASS }}
httpHeaders:
- name: Authorization
value: Basic {{ b64enc (printf "%s:%s" .Values.conf.WHOOGLE_USER .Values.conf.WHOOGLE_PASS) }}
{{- end }}
resources:
{{- toYaml .Values.resources | nindent 12 }}
{{- with .Values.nodeSelector }}

@@ -1,11 +1,12 @@
https://search.albony.xyz
https://search.garudalinux.org
https://search.dr460nf1r3.org
https://search.nezumi.party
https://s.tokhmi.xyz
https://search.sethforprivacy.com
https://whoogle.dcs0.hu
https://whoogle.esmailelbob.xyz
https://whoogle.lunar.icu
https://whoogle.rhyshl.live
https://gowogle.voring.me
https://whoogle.privacydev.net
https://whoogle.hostux.net
@@ -15,10 +16,3 @@ https://whoogle.ungovernable.men
https://whoogle2.ungovernable.men
https://whoogle3.ungovernable.men
https://wgl.frail.duckdns.org
https://whoogle.no-logs.com
https://whoogle.ftw.lol
https://whoogle-search--replitcomreside.repl.co
https://search.notrustverify.ch
https://whoogle.datura.network
https://whoogle.yepserver.xyz
https://search.snine.nl

@@ -7,6 +7,3 @@ ExtORPortCookieAuthFileGroupReadable 1
CacheDirectoryGroupReadable 1
CookieAuthFile /var/lib/tor/control_auth_cookie
Log debug-notice file /dev/null
# UseBridges 1
# ClientTransportPlugin obfs4 exec /usr/bin/obfs4proxy
# Bridge obfs4 ip and so on

@@ -2,18 +2,18 @@ attrs==22.2.0
beautifulsoup4==4.11.2
brotli==1.0.9
cachelib==0.10.2
certifi==2023.7.22
certifi==2022.12.7
cffi==1.15.1
chardet==5.1.0
click==8.1.3
cryptography==3.3.2; platform_machine == 'armv7l'
cryptography==42.0.4; platform_machine != 'armv7l'
cryptography==39.0.1; platform_machine != 'armv7l'
cssutils==2.6.0
defusedxml==0.7.1
Flask==2.3.2
Flask==2.2.3
idna==3.4
itsdangerous==2.1.2
Jinja2==3.1.3
Jinja2==3.1.2
MarkupSafe==2.1.2
more-itertools==9.0.0
packaging==23.0
@@ -21,17 +21,16 @@ pluggy==1.0.0
pycodestyle==2.10.0
pycparser==2.21
pyOpenSSL==19.1.0; platform_machine == 'armv7l'
pyOpenSSL==24.0.0; platform_machine != 'armv7l'
pyOpenSSL==23.0.0; platform_machine != 'armv7l'
pyparsing==3.0.9
PySocks==1.7.1
pytest==7.2.1
python-dateutil==2.8.2
requests==2.31.0
requests==2.28.2
soupsieve==2.4
stem==1.8.1
urllib3==1.26.18
validators==0.22.0
urllib3==1.26.14
waitress==2.1.2
wcwidth==0.2.6
Werkzeug==3.0.1
Werkzeug==2.2.3
python-dotenv==0.21.1

run

@@ -29,7 +29,6 @@ else
python3 -um app \
--unix-socket "$UNIX_SOCKET"
else
echo "Running on http://${ADDRESS:-0.0.0.0}:${PORT:-"${EXPOSE_PORT:-5000}"}"
python3 -um app \
--host "${ADDRESS:-0.0.0.0}" \
--port "${PORT:-"${EXPOSE_PORT:-5000}"}"

@@ -27,7 +27,6 @@ install_requires=
python-dotenv
requests
stem
validators
waitress
[options.extras_require]

@@ -2,7 +2,6 @@ from bs4 import BeautifulSoup
from app.filter import Filter
from app.models.config import Config
from app.models.endpoint import Endpoint
from app.utils import results
from app.utils.session import generate_key
from datetime import datetime
from dateutil.parser import ParserError, parse
@@ -45,11 +44,17 @@ def test_get_results(client):
def test_post_results(client):
rv = client.post(f'/{Endpoint.search}', data=dict(q='test'))
assert rv._status_code == 302
assert rv._status_code == 200
# Depending on the search, there can be more
# than 10 result divs
results = get_search_results(rv.data)
assert len(results) >= 10
assert len(results) <= 15
def test_translate_search(client):
rv = client.get(f'/{Endpoint.search}?q=translate hola')
rv = client.post(f'/{Endpoint.search}', data=dict(q='translate hola'))
assert rv._status_code == 200
# Pretty weak test, but better than nothing
@@ -59,7 +64,7 @@ def test_translate_search(client):
def test_block_results(client):
rv = client.get(f'/{Endpoint.search}?q=pinterest')
rv = client.post(f'/{Endpoint.search}', data=dict(q='pinterest'))
assert rv._status_code == 200
has_pinterest = False
@@ -74,7 +79,7 @@ def test_block_results(client):
rv = client.post(f'/{Endpoint.config}', data=demo_config)
assert rv._status_code == 302
rv = client.get(f'/{Endpoint.search}?q=pinterest')
rv = client.post(f'/{Endpoint.search}', data=dict(q='pinterest'))
assert rv._status_code == 200
for link in BeautifulSoup(rv.data, 'html.parser').find_all('a', href=True):
@@ -85,7 +90,7 @@ def test_block_results(client):
def test_view_my_ip(client):
rv = client.get(f'/{Endpoint.search}?q=my ip address')
rv = client.post(f'/{Endpoint.search}', data=dict(q='my ip address'))
assert rv._status_code == 200
# Pretty weak test, but better than nothing
@@ -96,13 +101,13 @@ def test_view_my_ip(client):
def test_recent_results(client):
times = {
'tbs=qdr:y': 365,
'tbs=qdr:m': 31,
'tbs=qdr:w': 7
'past year': 365,
'past month': 31,
'past week': 7
}
for time, num_days in times.items():
rv = client.get(f'/{Endpoint.search}?q=test&' + time)
rv = client.post(f'/{Endpoint.search}', data=dict(q='test :' + time))
result_divs = get_search_results(rv.data)
current_date = datetime.now()
@@ -137,22 +142,3 @@ def test_leading_slash_search(client):
continue
assert link['href'].startswith(f'{Endpoint.search}')
def test_site_alt_prefix_skip():
# Ensure prefixes are skipped correctly for site alts
# default site_alts (farside.link)
assert results.get_site_alt(link = 'https://www.reddit.com') == 'https://farside.link/libreddit'
assert results.get_site_alt(link = 'https://www.twitter.com') == 'https://farside.link/nitter'
assert results.get_site_alt(link = 'https://www.youtube.com') == 'https://farside.link/invidious'
test_site_alts = {
'reddit.com': 'reddit.endswithmobile.domain',
'twitter.com': 'https://twitter.endswithm.domain',
'youtube.com': 'http://yt.endswithwww.domain',
}
# Domains with part of SKIP_PREFIX in them
assert results.get_site_alt(link = 'https://www.reddit.com', site_alts = test_site_alts) == 'https://reddit.endswithmobile.domain'
assert results.get_site_alt(link = 'https://www.twitter.com', site_alts = test_site_alts) == 'https://twitter.endswithm.domain'
assert results.get_site_alt(link = 'https://www.youtube.com', site_alts = test_site_alts) == 'http://yt.endswithwww.domain'

@@ -17,15 +17,8 @@ def test_search(client):
def test_feeling_lucky(client):
# Bang at beginning of query
rv = client.get(f'/{Endpoint.search}?q=!%20wikipedia')
rv = client.get(f'/{Endpoint.search}?q=!%20test')
assert rv._status_code == 303
assert rv.headers.get('Location').startswith('https://www.wikipedia.org')
# Move bang to end of query
rv = client.get(f'/{Endpoint.search}?q=github%20!')
assert rv._status_code == 303
assert rv.headers.get('Location').startswith('https://github.com')
def test_ddg_bang(client):
@@ -55,13 +48,6 @@ def test_ddg_bang(client):
assert rv.headers.get('Location').startswith('https://github.com')
def test_custom_bang(client):
# Bang at beginning of query
rv = client.get(f'/{Endpoint.search}?q=!i%20whoogle')
assert rv._status_code == 302
assert rv.headers.get('Location').startswith('search?q=')
def test_config(client):
rv = client.post(f'/{Endpoint.config}', data=demo_config)
assert rv._status_code == 302
