searxng/searx/engines/kickass.py

## Kickass Torrent (Videos, Music, Files)
#
# @website     https://kickass.to
# @provide-api no (nothing found)
#
# @using-api   no
# @results     HTML (using search portal)
# @stable      yes (HTML can change)
# @parse       url, title, content, seed, leech, filesize, files, magnetlink, torrentfile

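try:  # Python 3
    from urllib.parse import quote, urljoin
    from html import escape
except ImportError:  # Python 2, which this engine originally targeted
    from urlparse import urljoin
    from cgi import escape
    from urllib import quote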
from lxml import html
from operator import itemgetter
from searx.engines.xpath import extract_text

# engine dependent config
categories = ['videos', 'music', 'files']
paging = True

# search-url
url = 'https://kickass.to/'
search_url = url + 'search/{search_term}/{pageno}/'

# specific xpath variables
magnet_xpath = './/a[@title="Torrent magnet link"]'
torrent_xpath = './/a[@title="Download torrent file"]'
content_xpath = './/span[@class="font11px lightgrey block"]'


# do search-request
def request(query, params):
    params['url'] = search_url.format(search_term=quote(query),
                                      pageno=params['pageno'])

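    # Workaround for "SSLError: hostname 'kickass.so' doesn't match either of
    # '*.kickass.to', 'kickass.to'": skip TLS certificate verification.
    # Note that this disables certificate checks for the request.
    params['verify'] = False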

    return params


# get response from search-request
def response(resp):
    results = []

    # Skip redirect responses. Compare against the True singleton because
    # resp can be a Mock object, for which any attribute name returns something.
    if resp.is_redirect is True:
        return results

    dom = html.fromstring(resp.text)

    search_res = dom.xpath('//table[@class="data"]//tr')

    # return an empty list if nothing is found
    if not search_res:
        return results

    # parse results, skipping the first row of the table
    for result in search_res[1:]:
        link = result.xpath('.//a[@class="cellMainLink"]')[0]
        href = urljoin(url, link.attrib['href'])
        title = extract_text(link)
        content = escape(extract_text(result.xpath(content_xpath)))
        seed = result.xpath('.//td[contains(@class, "green")]/text()')[0]
        leech = result.xpath('.//td[contains(@class, "red")]/text()')[0]
        filesize = result.xpath('.//td[contains(@class, "nobr")]/text()')[0]
        filesize_multiplier = result.xpath('.//td[contains(@class, "nobr")]//span/text()')[0]
        files = result.xpath('.//td[contains(@class, "center")][2]/text()')[0]

        # convert seed to int if possible
        if seed.isdigit():
            seed = int(seed)
        else:
            seed = 0

        # convert leech to int if possible
        if leech.isdigit():
            leech = int(leech)
        else:
            leech = 0

        # convert filesize to bytes if possible
        try:
            filesize = float(filesize)

            # apply the unit multiplier (binary units: 1 KB = 1024 bytes)
            if filesize_multiplier == 'TB':
                filesize = int(filesize * 1024 * 1024 * 1024 * 1024)
            elif filesize_multiplier == 'GB':
                filesize = int(filesize * 1024 * 1024 * 1024)
            elif filesize_multiplier == 'MB':
                filesize = int(filesize * 1024 * 1024)
            elif filesize_multiplier == 'KB':
                filesize = int(filesize * 1024)
        except (ValueError, OverflowError):
            # the size text was not a usable number
            filesize = None

        # convert files to int if possible
        if files.isdigit():
            files = int(files)
        else:
            files = None

        magnetlink = result.xpath(magnet_xpath)[0].attrib['href']

        torrentfile = result.xpath(torrent_xpath)[0].attrib['href']
        torrentfileurl = quote(torrentfile, safe="%/:=&?~#+!$,;'@()*")

        # append result
        results.append({'url': href,
                        'title': title,
                        'content': content,
                        'seed': seed,
                        'leech': leech,
                        'filesize': filesize,
                        'files': files,
                        'magnetlink': magnetlink,
                        'torrentfile': torrentfileurl,
                        'template': 'torrent.html'})

    # return results sorted by seeder count, highest first
    return sorted(results, key=itemgetter('seed'), reverse=True)
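
The unit handling in `response()` can also be written as a table lookup. The following is a minimal Python 3 sketch, not the engine's code: `UNIT_FACTORS` and `to_bytes` are hypothetical names, and unlike the chain of `elif` branches above it returns `None` for an unrecognised unit instead of leaving the value as a float.

```python
# Hypothetical helper, not part of the engine: binary unit factors.
UNIT_FACTORS = {
    'KB': 1024,
    'MB': 1024 ** 2,
    'GB': 1024 ** 3,
    'TB': 1024 ** 4,
}


def to_bytes(size, unit):
    """Convert a size string such as '1.5' plus a unit such as 'GB' to bytes.

    Returns None when the number cannot be parsed or the unit is unknown
    (an assumption of this sketch; the engine keeps the float in that case).
    """
    try:
        return int(float(size) * UNIT_FACTORS[unit])
    except (ValueError, KeyError, OverflowError):
        return None


print(to_bytes('1.5', 'GB'))
print(to_bytes('n/a', 'MB'))
```

A lookup table keeps the factors in one place, so supporting another unit would only mean adding one dictionary entry.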
# Commit history (from the blame view, oldest first)
#
# 2014-12-09 18:19:39 +00:00
#     First pass at Kickass Engine. Parse and return results correctly.
#     Pages numbers taken care of. Not done, and maybe to do:
#     - 'content': I don't know what it could be. Maybe votes?
#     - 'categories': the results are not filtered by categories, because I
#       don't see how to do it properly: there are too much categories on
#       Kickass. Is 'video' only movies, or also tv show or porn? So for now,
#       the category is 'all'.
#     - Favicon/icon: may be a good idea.
# 2014-12-14 22:27:27 +00:00
#     Add icons and badge for the themes. Add kickass in engine list.
#     Add content for the result from kickass.
# 2014-12-15 18:37:58 +00:00
#     [fix] fix kickass engine thanks @Cqoicebordel in #144:
#     https://github.com/asciimoo/searx/pull/144#issuecomment-67036903
# 2014-12-16 16:26:16 +00:00
#     [fix] pep8
# 2014-12-29 20:31:04 +00:00
#     Flake8 and Twitter corrections. Lots of Flake8 corrections. Maybe we
#     should change the rule to allow lines of 120 chars. It seems more usable.
#     Big twitter correction: now it outputs the words in right order...
# 2015-01-10 18:40:27 +00:00
#     [enh] improve torrent results
# 2015-01-10 19:01:36 +00:00
#     [fix] pep8
# 2015-01-11 18:34:11 +00:00
#     Fix torrent W3C+UX. Puts links to torrents and magnets in tool bar.
#     Fixes a lot of W3C errors.
# 2015-01-30 20:02:17 +00:00
#     Kickass' unit test
# 2015-02-12 11:30:03 +00:00
#     [fix] kickass engine: change the hostname to kickass.to (since
#     kickass.so doesn't respond). Close #197 perhaps not in clean way.
#     Explanation: In fact 301 responses are followed, except the hook is
#     called for each HTTP response, the first time for the HTTP 301 response
#     then for HTTP 200 response. Since the kickass engine excepts a real
#     result, the engine crashes, AND the requests lib stops here. Add a simple
#     test at the beginning of the result function allows pass the first
#     response and handle correctly the second response (the real one).
#     May be a proper way is to add this test in search.py?
#     Code inside requests:
#     https://github.com/kennethreitz/requests/blob/53d02381e22436b6d0757eb305eb1a960f82d361/requests/sessions.py#L579
#     and line 591
# 2015-02-12 13:50:41 +00:00
#     [fix] kickass tests
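
The redirect guard in `response()` compares `resp.is_redirect` to `True` by identity rather than relying on truthiness, because the unit tests pass a Mock object as the response. The following Python 3 sketch illustrates the difference under that assumption; `is_real_redirect` is a hypothetical helper written for this example, and `unittest.mock` stands in for whatever mock library the tests actually use.

```python
from unittest.mock import Mock


def is_real_redirect(resp):
    """Return True only when is_redirect is the boolean True itself."""
    return resp.is_redirect is True


# A bare Mock invents attributes on access, and those attributes are truthy,
# so a plain "if resp.is_redirect:" would wrongly skip the mocked response.
mock_resp = Mock()
truthy_check = bool(mock_resp.is_redirect)
identity_check = is_real_redirect(mock_resp)

# A response whose is_redirect really is True still passes the identity check.
redirect_resp = Mock(is_redirect=True)
real_check = is_real_redirect(redirect_resp)

print(truthy_check, identity_check, real_check)
```

With the identity check, a mocked search response falls through to the parsing code, while a genuine redirect response returns the empty result list early.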