[fix] google news - send CONSENT Cookie to not be redirected

In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking.  To get the consent
from the user, google-news requests are redirected to confirm and get a CONSENT
Cookie from https://consent.google.de/s?continue=...

This patch adds a CONSENT Cookie to the google-news request to avoid
redirection.

The behavior of the CONTENTS cookies over all google engines seems similar but
the pattern is not yet fully clear to me, here are some random samples from my
analysis ..

Using common google search from different domains::

    google.com:        CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
    google.de:         CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
    google.fr:         CONSENT=YES+srp.gws-{{date}}-0-RC2.fr+FX+826

When searching about videos (google-videos)::

    google.es:         CONSENT=YES+srp.gws-{{date}}-0-RC2.es+FX+076
    google.de:         CONSENT=YES+srp.gws-{{date}}-0-RC2.de+FX+171

Google news has only one domain for all languages::

    news.google.com:   CONSENT=YES+cb.{{date}}-14-p0.de+FX+816

Using google-scholar search from different domains::

    scholar.google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
    scholar.google.fr: does not use such a cookie / did not ask the user
    scholar.google.es: does not use such a cookie / did not ask the user

Interim summary:

  Pattern is unclear and I won't apply the CONSENT cookie to all google engines.
  More experience is need before we generalize the CONSENT cookies over all
  google engines.

Related:

- e9a6ab401 [fix] youtube - send CONSENT Cookie to not be redirected
- https://github.com/benbusby/whoogle-search/issues/311
- https://github.com/benbusby/whoogle-search/issues/243

[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This commit is contained in:
Markus Heiser 2021-06-18 11:49:20 +02:00
parent dd7b53d369
commit 9328c66e93

View File

@ -19,6 +19,7 @@ Definitions`_. Not all parameters can be appied:
# pylint: disable=invalid-name, missing-function-docstring
import binascii
from datetime import datetime
import re
from urllib.parse import urlencode
from base64 import b64decode
@ -115,6 +116,7 @@ def request(query, params):
params['headers']['Accept'] = (
'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
)
params['headers']['Cookie'] = "CONSENT=YES+cb.%s-14-p0.en+F+941;" % datetime.now().strftime("%Y%m%d")
return params