You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
whoogle-search/app/utils/search.py

192 lines
6.6 KiB
Python

import os
import re
from typing import Any
from app.filter import Filter
from app.request import gen_query
from app.utils.misc import get_proxy_host_url
from app.utils.results import get_first_link
from bs4 import BeautifulSoup as bsoup
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
from cryptography.fernet import Fernet, InvalidToken
from flask import g
Add tor and http/socks proxy support (#137) * Add tor and http/socks proxy support Allows users to enable/disable tor from the config menu, which will forward all requests through Tor. Also adds support for setting environment variables for alternative proxy support. Setting the following variables will forward requests through the proxy: - WHOOGLE_PROXY_USER (optional) - WHOOGLE_PROXY_PASS (optional) - WHOOGLE_PROXY_TYPE (required) - Can be "http", "socks4", or "socks5" - WHOOGLE_PROXY_LOC (required) - Format: "<ip address>:<port>" See #30 * Refactor acquire_tor_conn -> acquire_tor_identity Also updated travis CI to set up tor * Add check for Tor socket on init, improve Tor error handling Initializing the app sends a heartbeat request to Tor to check for availability, and updates the home page config options accordingly. This heartbeat is sent on every request, to ensure Tor support can be reconfigured without restarting the entire app. If Tor support is enabled, and a subsequent request fails, then a new TorError exception is raised, and the Tor feature is disabled until a valid connection is restored. The max attempts has been updated to 10, since 5 seemed a bit too low for how quickly the attempts go by. * Change send_tor_signal arg type, update function doc send_tor_signal now accepts a stem.Signal arg (a bit cleaner tbh). Also added the doc string for the "disable" attribute in TorError. * Fix tor identity logic in Request.send * Update proxy init, change proxyloc var name Proxy is now only initialized if both type and location are specified, as neither have a default fallback and both are required. I suppose the type could fall back to http, but seems safer this way. Also refactored proxyurl -> proxyloc for the runtime args in order to match the Dockerfile args. * Add tor/proxy support for Docker builds, fix opensearch/init The Dockerfile is now updated to include support for Tor configuration, with a working torrc file included in the repo. An issue with opensearch was fixed as well, which was uncovered during testing and was simple enough to fix here. Likewise, DDG bang gen was updated to only ever happen if the file didn't exist previously, as testing with the file being regenerated every time was tedious. * Add missing "@" for socks proxy requests
4 years ago
TOR_BANNER = '<hr><h1 style="text-align: center">You are using Tor</h1><hr>'
CAPTCHA = 'div class="g-recaptcha"'
Add tor and http/socks proxy support (#137) * Add tor and http/socks proxy support Allows users to enable/disable tor from the config menu, which will forward all requests through Tor. Also adds support for setting environment variables for alternative proxy support. Setting the following variables will forward requests through the proxy: - WHOOGLE_PROXY_USER (optional) - WHOOGLE_PROXY_PASS (optional) - WHOOGLE_PROXY_TYPE (required) - Can be "http", "socks4", or "socks5" - WHOOGLE_PROXY_LOC (required) - Format: "<ip address>:<port>" See #30 * Refactor acquire_tor_conn -> acquire_tor_identity Also updated travis CI to set up tor * Add check for Tor socket on init, improve Tor error handling Initializing the app sends a heartbeat request to Tor to check for availability, and updates the home page config options accordingly. This heartbeat is sent on every request, to ensure Tor support can be reconfigured without restarting the entire app. If Tor support is enabled, and a subsequent request fails, then a new TorError exception is raised, and the Tor feature is disabled until a valid connection is restored. The max attempts has been updated to 10, since 5 seemed a bit too low for how quickly the attempts go by. * Change send_tor_signal arg type, update function doc send_tor_signal now accepts a stem.Signal arg (a bit cleaner tbh). Also added the doc string for the "disable" attribute in TorError. * Fix tor identity logic in Request.send * Update proxy init, change proxyloc var name Proxy is now only initialized if both type and location are specified, as neither have a default fallback and both are required. I suppose the type could fall back to http, but seems safer this way. Also refactored proxyurl -> proxyloc for the runtime args in order to match the Dockerfile args. * Add tor/proxy support for Docker builds, fix opensearch/init The Dockerfile is now updated to include support for Tor configuration, with a working torrc file included in the repo. An issue with opensearch was fixed as well, which was uncovered during testing and was simple enough to fix here. Likewise, DDG bang gen was updated to only ever happen if the file didn't exist previously, as testing with the file being regenerated every time was tedious. * Add missing "@" for socks proxy requests
4 years ago
def needs_https(url: str) -> bool:
"""Checks if the current instance needs to be upgraded to HTTPS
Note that all Heroku instances are available by default over HTTPS, but
do not automatically set up a redirect when visited over HTTP.
Args:
url: The instance url
Returns:
bool: True/False representing the need to upgrade
"""
https_only = bool(os.getenv('HTTPS_ONLY', 0))
is_heroku = url.endswith('.herokuapp.com')
is_http = url.startswith('http://')
return (is_heroku and is_http) or (https_only and is_http)
def has_captcha(results: str) -> bool:
"""Checks to see if the search results are blocked by a captcha
Args:
results: The search page html as a string
Returns:
bool: True/False indicating if a captcha element was found
"""
return CAPTCHA in results
class Search:
"""Search query preprocessor - used before submitting the query or
redirecting to another site
Attributes:
request: the incoming flask request
config: the current user config settings
Improve public instance session management (#480) This introduces a new approach to handling user sessions, which should allow for users to set more reliable config settings on public instances. Previously, when a user with cookies disabled would update their config, this would modify the app's default config file, which would in turn cause new users to inherit these settings when visiting the app for the first time and cause users to inherit these settings when their current session cookie expired (which was after 30 days by default I believe). There was also some half-baked logic for determining on the backend whether or not a user had cookies disabled, which lead to some issues with out of control session file creation by Flask. Now, when a user visits the site, their initial request is forwarded to a session/<session id> endpoint, and during that subsequent request their current session id is matched against the one found in the url. If the ids match, the user has cookies enabled. If not, their original request is modified with a 'cookies_disabled' query param that tells Flask not to bother trying to set up a new session for that user, and instead just use the app's fallback Fernet key for encryption and the default config. Since attempting to create a session for a user with cookies disabled creates a new session file, there is now also a clean-up routine included in the new session decorator, which will remove all sessions that don't include a valid key in the dict. NOTE!!! This means that current user sessions on public instances will be cleared once this update is merged in. In the long run that's a good thing though, since this will allow session mgmt to be a lot more reliable overall for users regardless of their cookie preference. Individual user sessions still use a unique Fernet key for encrypting queries, but users with cookies disabled will use the default app key for encryption and decryption. Sessions are also now (semi)permanent and have a lifetime of 1 year.
3 years ago
session_key: the flask user fernet key
"""
Improve public instance session management (#480) This introduces a new approach to handling user sessions, which should allow for users to set more reliable config settings on public instances. Previously, when a user with cookies disabled would update their config, this would modify the app's default config file, which would in turn cause new users to inherit these settings when visiting the app for the first time and cause users to inherit these settings when their current session cookie expired (which was after 30 days by default I believe). There was also some half-baked logic for determining on the backend whether or not a user had cookies disabled, which lead to some issues with out of control session file creation by Flask. Now, when a user visits the site, their initial request is forwarded to a session/<session id> endpoint, and during that subsequent request their current session id is matched against the one found in the url. If the ids match, the user has cookies enabled. If not, their original request is modified with a 'cookies_disabled' query param that tells Flask not to bother trying to set up a new session for that user, and instead just use the app's fallback Fernet key for encryption and the default config. Since attempting to create a session for a user with cookies disabled creates a new session file, there is now also a clean-up routine included in the new session decorator, which will remove all sessions that don't include a valid key in the dict. NOTE!!! This means that current user sessions on public instances will be cleared once this update is merged in. In the long run that's a good thing though, since this will allow session mgmt to be a lot more reliable overall for users regardless of their cookie preference. Individual user sessions still use a unique Fernet key for encrypting queries, but users with cookies disabled will use the default app key for encryption and decryption. Sessions are also now (semi)permanent and have a lifetime of 1 year.
3 years ago
def __init__(self, request, config, session_key, cookies_disabled=False):
method = request.method
self.request = request
self.request_params = request.args if method == 'GET' else request.form
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
self.user_agent = request.headers.get('User-Agent')
self.feeling_lucky = False
self.config = config
Improve public instance session management (#480) This introduces a new approach to handling user sessions, which should allow for users to set more reliable config settings on public instances. Previously, when a user with cookies disabled would update their config, this would modify the app's default config file, which would in turn cause new users to inherit these settings when visiting the app for the first time and cause users to inherit these settings when their current session cookie expired (which was after 30 days by default I believe). There was also some half-baked logic for determining on the backend whether or not a user had cookies disabled, which lead to some issues with out of control session file creation by Flask. Now, when a user visits the site, their initial request is forwarded to a session/<session id> endpoint, and during that subsequent request their current session id is matched against the one found in the url. If the ids match, the user has cookies enabled. If not, their original request is modified with a 'cookies_disabled' query param that tells Flask not to bother trying to set up a new session for that user, and instead just use the app's fallback Fernet key for encryption and the default config. Since attempting to create a session for a user with cookies disabled creates a new session file, there is now also a clean-up routine included in the new session decorator, which will remove all sessions that don't include a valid key in the dict. NOTE!!! This means that current user sessions on public instances will be cleared once this update is merged in. In the long run that's a good thing though, since this will allow session mgmt to be a lot more reliable overall for users regardless of their cookie preference. Individual user sessions still use a unique Fernet key for encrypting queries, but users with cookies disabled will use the default app key for encryption and decryption. Sessions are also now (semi)permanent and have a lifetime of 1 year.
3 years ago
self.session_key = session_key
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
self.query = ''
self.widget = ''
self.cookies_disabled = cookies_disabled
self.search_type = self.request_params.get(
'tbm') if 'tbm' in self.request_params else ''
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
def __getitem__(self, name) -> Any:
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
return getattr(self, name)
def __setitem__(self, name, value) -> None:
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
return setattr(self, name, value)
def __delitem__(self, name) -> None:
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
return delattr(self, name)
def __contains__(self, name) -> bool:
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
return hasattr(self, name)
def new_search_query(self) -> str:
"""Parses a plaintext query into a valid string for submission
Also decrypts the query string, if encrypted (in the case of
paginated results).
Returns:
str: A valid query string
"""
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
q = self.request_params.get('q')
if q is None or len(q) == 0:
return ''
else:
# Attempt to decrypt if this is an internal link
try:
Improve public instance session management (#480) This introduces a new approach to handling user sessions, which should allow for users to set more reliable config settings on public instances. Previously, when a user with cookies disabled would update their config, this would modify the app's default config file, which would in turn cause new users to inherit these settings when visiting the app for the first time and cause users to inherit these settings when their current session cookie expired (which was after 30 days by default I believe). There was also some half-baked logic for determining on the backend whether or not a user had cookies disabled, which lead to some issues with out of control session file creation by Flask. Now, when a user visits the site, their initial request is forwarded to a session/<session id> endpoint, and during that subsequent request their current session id is matched against the one found in the url. If the ids match, the user has cookies enabled. If not, their original request is modified with a 'cookies_disabled' query param that tells Flask not to bother trying to set up a new session for that user, and instead just use the app's fallback Fernet key for encryption and the default config. Since attempting to create a session for a user with cookies disabled creates a new session file, there is now also a clean-up routine included in the new session decorator, which will remove all sessions that don't include a valid key in the dict. NOTE!!! This means that current user sessions on public instances will be cleared once this update is merged in. In the long run that's a good thing though, since this will allow session mgmt to be a lot more reliable overall for users regardless of their cookie preference. Individual user sessions still use a unique Fernet key for encrypting queries, but users with cookies disabled will use the default app key for encryption and decryption. Sessions are also now (semi)permanent and have a lifetime of 1 year.
3 years ago
q = Fernet(self.session_key).decrypt(q.encode()).decode()
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
except InvalidToken:
pass
# Strip '!' for "feeling lucky" queries
if match := re.search("(^|\s)!($|\s)", q):
self.feeling_lucky = True
start, end = match.span()
self.query = " ".join([seg for seg in [q[:start], q[end:]] if seg])
else:
self.feeling_lucky = False
self.query = q
# Check for possible widgets
self.widget = "ip" if re.search("([^a-z0-9]|^)my *[^a-z0-9] *(ip|internet protocol)" +
"($|( *[^a-z0-9] *(((addres|address|adres|" +
"adress)|a)? *$)))", self.query.lower()) else self.widget
self.widget = 'calculator' if re.search(
r"\bcalculator\b|\bcalc\b|\bcalclator\b|\bmath\b",
self.query.lower()) else self.widget
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
return self.query
def generate_response(self) -> str:
"""Generates a response for the user's query
Returns:
str: A string response to the search query, in the form of a URL
or string representation of HTML content.
"""
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
mobile = 'Android' in self.user_agent or 'iPhone' in self.user_agent
# reconstruct url if X-Forwarded-Host header present
root_url = get_proxy_host_url(
self.request,
self.request.url_root,
root=True)
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
Improve public instance session management (#480) This introduces a new approach to handling user sessions, which should allow for users to set more reliable config settings on public instances. Previously, when a user with cookies disabled would update their config, this would modify the app's default config file, which would in turn cause new users to inherit these settings when visiting the app for the first time and cause users to inherit these settings when their current session cookie expired (which was after 30 days by default I believe). There was also some half-baked logic for determining on the backend whether or not a user had cookies disabled, which lead to some issues with out of control session file creation by Flask. Now, when a user visits the site, their initial request is forwarded to a session/<session id> endpoint, and during that subsequent request their current session id is matched against the one found in the url. If the ids match, the user has cookies enabled. If not, their original request is modified with a 'cookies_disabled' query param that tells Flask not to bother trying to set up a new session for that user, and instead just use the app's fallback Fernet key for encryption and the default config. Since attempting to create a session for a user with cookies disabled creates a new session file, there is now also a clean-up routine included in the new session decorator, which will remove all sessions that don't include a valid key in the dict. NOTE!!! This means that current user sessions on public instances will be cleared once this update is merged in. In the long run that's a good thing though, since this will allow session mgmt to be a lot more reliable overall for users regardless of their cookie preference. Individual user sessions still use a unique Fernet key for encrypting queries, but users with cookies disabled will use the default app key for encryption and decryption. Sessions are also now (semi)permanent and have a lifetime of 1 year.
3 years ago
content_filter = Filter(self.session_key,
root_url=root_url,
mobile=mobile,
config=self.config,
query=self.query)
full_query = gen_query(self.query,
self.request_params,
self.config)
self.full_query = full_query
# force mobile search when view image is true and
# the request is not already made by a mobile
view_image = ('tbm=isch' in full_query
and self.config.view_image
and not g.user_request.mobile)
get_body = g.user_request.send(query=full_query,
force_mobile=view_image,
user_agent=self.user_agent)
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
# Produce cleanable html soup from response
get_body_safed = get_body.text.replace("&lt;","andlt;").replace("&gt;","andgt;")
html_soup = bsoup(get_body_safed, 'html.parser')
# Replace current soup if view_image is active
if view_image:
html_soup = content_filter.view_image(html_soup)
# Indicate whether or not a Tor connection is active
if g.user_request.tor_valid:
html_soup.insert(0, bsoup(TOR_BANNER, 'html.parser'))
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
formatted_results = content_filter.clean(html_soup)
Project refactor (#85) * Major refactor of requests and session management - Switches from pycurl to requests library - Allows for less janky decoding, especially with non-latin character sets - Adds session level management of user configs - Allows for each session to set its own config (people are probably going to complain about this, though not sure if it'll be the same number of people who are upset that their friends/family have to share their config) - Updates key gen/regen to more aggressively swap out keys after each request * Added ability to save/load configs by name - New PUT method for config allows changing config with specified name - New methods in js controller to handle loading/saving of configs * Result formatting and removal of unused elements - Fixed question section formatting from results page (added appropriate padding and made questions styled as italic) - Removed user agent display from main config settings * Minor change to button label * Fixed issue with "de-pickling" of flask session Having a gitignore-everything ("*") file within a flask session folder seems to cause a weird bug where the state of the app becomes unusable from continuously trying to prune files listed in the gitignore (and it can't prune '*'). * Switched to pickling saved configs * Updated ad/sponsored content filter and conf naming Configs are now named with a .conf extension to allow for easier manual cleanup/modification of named config files Sponsored content now removed by basic string matching of span content * Version bump to 0.2.0 * Fixed request.send return style
4 years ago
if self.feeling_lucky:
if lucky_link := get_first_link(formatted_results):
return lucky_link
# Fall through to regular search if unable to find link
self.feeling_lucky = False
# Append user config to all search links, if available
param_str = ''.join('&{}={}'.format(k, v)
for k, v in
self.request_params.to_dict(flat=True).items()
if self.config.is_safe_key(k))
for link in formatted_results.find_all('a', href=True):
link['rel'] = "nofollow noopener noreferrer"
if 'search?' not in link['href'] or link['href'].index(
'search?') > 1:
continue
link['href'] += param_str
return str(formatted_results)