{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sentiment Classification & How To \"Frame Problems\" for a Neural Network\n", "\n", "by Andrew Trask\n", "\n", "- **Twitter**: @iamtrask\n", "- **Blog**: http://iamtrask.github.io" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What You Should Already Know\n", "\n", "- neural networks, forward and back-propagation\n", "- stochastic gradient descent\n", "- mean squared error\n", "- train/test splits\n", "\n", "### Where to Get Help if You Need It\n", "- Re-watch previous Udacity lectures\n", "- Consult the recommended course reading material, [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (40% Off: **traskud17**)\n", "- Shoot me a tweet @iamtrask\n", "\n", "\n", "### Tutorial Outline:\n", "\n", "- Intro: The Importance of \"Framing a Problem\"\n", "\n", "\n", "- Curate a Dataset\n", "- Develop a \"Predictive Theory\"\n", "- **PROJECT 1**: Quick Theory Validation\n", "\n", "\n", "- Transforming Text to Numbers\n", "- **PROJECT 2**: Creating the Input/Output Data\n", "\n", "\n", "- Putting it all together in a Neural Network\n", "- **PROJECT 3**: Building our Neural Network\n", "\n", "\n", "- Understanding Neural Noise\n", "- **PROJECT 4**: Making Learning Faster by Reducing Noise\n", "\n", "\n", "- Analyzing Inefficiencies in our Network\n", "- **PROJECT 5**: Making our Network Train and Run Faster\n", "\n", "\n", "- Further Noise Reduction\n", "- **PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary\n", "\n", "\n", "- Analysis: What's going on in the weights?" 
] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "56bb3cba-260c-4ebe-9ed6-b995b4c72aa3" } }, "source": [ "# Lesson: Curate a Dataset" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "nbpresent": { "id": "eba2b193-0419-431e-8db9-60f34dd3fe83" } }, "outputs": [], "source": [ "def pretty_print_review_and_label(i):\n", "    # Show the label next to the first 80 characters of review i\n", "    print(labels[i] + \"\\t:\\t\" + reviews[i][:80] + \"...\")\n", "\n", "# What we know: one review per line\n", "with open('reviews.txt', 'r') as g:\n", "    reviews = [line.rstrip('\\n') for line in g]\n", "\n", "# What we WANT to know: one label per line, upper-cased\n", "with open('labels.txt', 'r') as g:\n", "    labels = [line.rstrip('\\n').upper() for line in g]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "25000" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(reviews)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "nbpresent": { "id": "bb95574b-21a0-4213-ae50-34363cf4f87f" } }, "outputs": [ { "data": { "text/plain": [ "'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life such as teachers . my years in the teaching profession lead me to believe that bromwell high s satire is much closer to reality than is teachers . the scramble to survive financially the insightful students who can see right through their pathetic teachers pomp the pettiness of the whole situation all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . 
what a pity that it isn t '" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews[0]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "nbpresent": { "id": "e0408810-c424-4ed4-afb9-1735e9ddbd0a" } }, "outputs": [ { "data": { "text/plain": [ "'POSITIVE'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "labels[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson: Develop a Predictive Theory" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "nbpresent": { "id": "e67a709f-234f-4493-bae6-4fb192141ee0" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "labels.txt \t : \t reviews.txt\n", "\n", "NEGATIVE\t:\tthis movie is terrible but it has some good effects . ...\n", "POSITIVE\t:\tadrian pasdar is excellent is this film . he makes a fascinating woman . ...\n", "NEGATIVE\t:\tcomment this movie is impossible . is terrible very improbable bad interpretat...\n", "POSITIVE\t:\texcellent episode movie ala pulp fiction . days suicides . it doesnt get more...\n", "NEGATIVE\t:\tif you haven t seen this it s terrible . it is pure trash . 
i saw this about ...\n", "POSITIVE\t:\tthis schiffer guy is a real genius the movie is of excellent quality and both e...\n" ] } ], "source": [ "print(\"labels.txt \\t : \\t reviews.txt\\n\")\n", "pretty_print_review_and_label(2137)\n", "pretty_print_review_and_label(12816)\n", "pretty_print_review_and_label(6267)\n", "pretty_print_review_and_label(21934)\n", "pretty_print_review_and_label(5297)\n", "pretty_print_review_and_label(4998)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Project 1: Quick Theory Validation" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from collections import Counter\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "positive_counts = Counter()\n", "negative_counts = Counter()\n", "total_counts = Counter()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "for i in range(len(reviews)):\n", "    if(labels[i] == 'POSITIVE'):\n", "        for word in reviews[i].split(\" \"):\n", "            positive_counts[word] += 1\n", "            total_counts[word] += 1\n", "    else:\n", "        for word in reviews[i].split(\" \"):\n", "            negative_counts[word] += 1\n", "            total_counts[word] += 1" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[('', 550468),\n", " ('the', 173324),\n", " ('.', 159654),\n", " ('and', 89722),\n", " ('a', 83688),\n", " ('of', 76855),\n", " ('to', 66746),\n", " ('is', 57245),\n", " ('in', 50215),\n", " ('br', 49235),\n", " ('it', 48025),\n", " ('i', 40743),\n", " ('that', 35630),\n", " ('this', 35080),\n", " ('s', 33815),\n", " ('as', 26308),\n", " ('with', 23247),\n", " ('for', 22416),\n", " ('was', 21917),\n", " ('film', 20937),\n", " ('but', 20822),\n", " ('movie', 19074),\n", " ('his', 17227),\n", " ('on', 17008),\n", " ('you', 16681),\n", " ('he', 
16282),\n", " ('are', 14807),\n", " ('not', 14272),\n", " ('t', 13720),\n", " ('one', 13655),\n", " ('have', 12587),\n", " ('be', 12416),\n", " ('by', 11997),\n", " ('all', 11942),\n", " ('who', 11464),\n", " ('an', 11294),\n", " ('at', 11234),\n", " ('from', 10767),\n", " ('her', 10474),\n", " ('they', 9895),\n", " ('has', 9186),\n", " ('so', 9154),\n", " ('like', 9038),\n", " ('about', 8313),\n", " ('very', 8305),\n", " ('out', 8134),\n", " ('there', 8057),\n", " ('she', 7779),\n", " ('what', 7737),\n", " ('or', 7732),\n", " ('good', 7720),\n", " ('more', 7521),\n", " ('when', 7456),\n", " ('some', 7441),\n", " ('if', 7285),\n", " ('just', 7152),\n", " ('can', 7001),\n", " ('story', 6780),\n", " ('time', 6515),\n", " ('my', 6488),\n", " ('great', 6419),\n", " ('well', 6405),\n", " ('up', 6321),\n", " ('which', 6267),\n", " ('their', 6107),\n", " ('see', 6026),\n", " ('also', 5550),\n", " ('we', 5531),\n", " ('really', 5476),\n", " ('would', 5400),\n", " ('will', 5218),\n", " ('me', 5167),\n", " ('had', 5148),\n", " ('only', 5137),\n", " ('him', 5018),\n", " ('even', 4964),\n", " ('most', 4864),\n", " ('other', 4858),\n", " ('were', 4782),\n", " ('first', 4755),\n", " ('than', 4736),\n", " ('much', 4685),\n", " ('its', 4622),\n", " ('no', 4574),\n", " ('into', 4544),\n", " ('people', 4479),\n", " ('best', 4319),\n", " ('love', 4301),\n", " ('get', 4272),\n", " ('how', 4213),\n", " ('life', 4199),\n", " ('been', 4189),\n", " ('because', 4079),\n", " ('way', 4036),\n", " ('do', 3941),\n", " ('made', 3823),\n", " ('films', 3813),\n", " ('them', 3805),\n", " ('after', 3800),\n", " ('many', 3766),\n", " ('two', 3733),\n", " ('too', 3659),\n", " ('think', 3655),\n", " ('movies', 3586),\n", " ('characters', 3560),\n", " ('character', 3514),\n", " ('don', 3468),\n", " ('man', 3460),\n", " ('show', 3432),\n", " ('watch', 3424),\n", " ('seen', 3414),\n", " ('then', 3358),\n", " ('little', 3341),\n", " ('still', 3340),\n", " ('make', 3303),\n", " ('could', 3237),\n", " 
('never', 3226),\n", " ('being', 3217),\n", " ('where', 3173),\n", " ('does', 3069),\n", " ('over', 3017),\n", " ('any', 3002),\n", " ('while', 2899),\n", " ('know', 2833),\n", " ('did', 2790),\n", " ('years', 2758),\n", " ('here', 2740),\n", " ('ever', 2734),\n", " ('end', 2696),\n", " ('these', 2694),\n", " ('such', 2590),\n", " ('real', 2568),\n", " ('scene', 2567),\n", " ('back', 2547),\n", " ('those', 2485),\n", " ('though', 2475),\n", " ('off', 2463),\n", " ('new', 2458),\n", " ('your', 2453),\n", " ('go', 2440),\n", " ('acting', 2437),\n", " ('plot', 2432),\n", " ('world', 2429),\n", " ('scenes', 2427),\n", " ('say', 2414),\n", " ('through', 2409),\n", " ('makes', 2390),\n", " ('better', 2381),\n", " ('now', 2368),\n", " ('work', 2346),\n", " ('young', 2343),\n", " ('old', 2311),\n", " ('ve', 2307),\n", " ('find', 2272),\n", " ('both', 2248),\n", " ('before', 2177),\n", " ('us', 2162),\n", " ('again', 2158),\n", " ('series', 2153),\n", " ('quite', 2143),\n", " ('something', 2135),\n", " ('cast', 2133),\n", " ('should', 2121),\n", " ('part', 2098),\n", " ('always', 2088),\n", " ('lot', 2087),\n", " ('another', 2075),\n", " ('actors', 2047),\n", " ('director', 2040),\n", " ('family', 2032),\n", " ('between', 2016),\n", " ('own', 2016),\n", " ('m', 1998),\n", " ('may', 1997),\n", " ('same', 1972),\n", " ('role', 1967),\n", " ('watching', 1966),\n", " ('every', 1954),\n", " ('funny', 1953),\n", " ('doesn', 1935),\n", " ('performance', 1928),\n", " ('few', 1918),\n", " ('bad', 1907),\n", " ('look', 1900),\n", " ('re', 1884),\n", " ('why', 1855),\n", " ('things', 1849),\n", " ('times', 1832),\n", " ('big', 1815),\n", " ('however', 1795),\n", " ('actually', 1790),\n", " ('action', 1789),\n", " ('going', 1783),\n", " ('bit', 1757),\n", " ('comedy', 1742),\n", " ('down', 1740),\n", " ('music', 1738),\n", " ('must', 1728),\n", " ('take', 1709),\n", " ('saw', 1692),\n", " ('long', 1690),\n", " ('right', 1688),\n", " ('fun', 1686),\n", " ('fact', 1684),\n", " 
('excellent', 1683),\n", " ('around', 1674),\n", " ('didn', 1672),\n", " ('without', 1671),\n", " ('thing', 1662),\n", " ('thought', 1639),\n", " ('got', 1635),\n", " ('each', 1630),\n", " ('day', 1614),\n", " ('feel', 1597),\n", " ('seems', 1596),\n", " ('come', 1594),\n", " ('done', 1586),\n", " ('beautiful', 1580),\n", " ('especially', 1572),\n", " ('played', 1571),\n", " ('almost', 1566),\n", " ('want', 1562),\n", " ('yet', 1556),\n", " ('give', 1553),\n", " ('pretty', 1549),\n", " ('last', 1543),\n", " ('since', 1519),\n", " ('different', 1504),\n", " ('although', 1501),\n", " ('gets', 1490),\n", " ('true', 1487),\n", " ('interesting', 1481),\n", " ('job', 1470),\n", " ('enough', 1455),\n", " ('our', 1454),\n", " ('shows', 1447),\n", " ('horror', 1441),\n", " ('woman', 1439),\n", " ('tv', 1400),\n", " ('probably', 1398),\n", " ('father', 1395),\n", " ('original', 1393),\n", " ('girl', 1390),\n", " ('point', 1379),\n", " ('plays', 1378),\n", " ('wonderful', 1372),\n", " ('far', 1358),\n", " ('course', 1358),\n", " ('john', 1350),\n", " ('rather', 1340),\n", " ('isn', 1328),\n", " ('ll', 1326),\n", " ('later', 1324),\n", " ('dvd', 1324),\n", " ('war', 1310),\n", " ('whole', 1310),\n", " ('d', 1307),\n", " ('away', 1306),\n", " ('found', 1306),\n", " ('screen', 1305),\n", " ('nothing', 1300),\n", " ('year', 1297),\n", " ('once', 1296),\n", " ('hard', 1294),\n", " ('together', 1280),\n", " ('am', 1277),\n", " ('set', 1277),\n", " ('having', 1266),\n", " ('making', 1265),\n", " ('place', 1263),\n", " ('comes', 1260),\n", " ('might', 1260),\n", " ('sure', 1253),\n", " ('american', 1248),\n", " ('play', 1245),\n", " ('kind', 1244),\n", " ('takes', 1242),\n", " ('perfect', 1242),\n", " ('performances', 1237),\n", " ('himself', 1230),\n", " ('worth', 1221),\n", " ('everyone', 1221),\n", " ('anyone', 1214),\n", " ('actor', 1203),\n", " ('three', 1201),\n", " ('wife', 1196),\n", " ('classic', 1192),\n", " ('goes', 1186),\n", " ('ending', 1178),\n", " ('version', 
1168),\n", " ('star', 1149),\n", " ('enjoy', 1146),\n", " ('book', 1142),\n", " ('nice', 1132),\n", " ('everything', 1128),\n", " ('during', 1124),\n", " ('put', 1118),\n", " ('seeing', 1111),\n", " ('least', 1102),\n", " ('house', 1100),\n", " ('high', 1095),\n", " ('watched', 1094),\n", " ('men', 1087),\n", " ('loved', 1087),\n", " ('night', 1082),\n", " ('anything', 1075),\n", " ('guy', 1071),\n", " ('believe', 1071),\n", " ('top', 1063),\n", " ('amazing', 1058),\n", " ('hollywood', 1056),\n", " ('looking', 1053),\n", " ('main', 1044),\n", " ('definitely', 1043),\n", " ('gives', 1031),\n", " ('home', 1029),\n", " ('seem', 1028),\n", " ('episode', 1023),\n", " ('sense', 1020),\n", " ('audience', 1020),\n", " ('truly', 1017),\n", " ('special', 1011),\n", " ('fan', 1009),\n", " ('second', 1009),\n", " ('short', 1009),\n", " ('mind', 1005),\n", " ('human', 1001),\n", " ('recommend', 999),\n", " ('full', 996),\n", " ('black', 995),\n", " ('help', 991),\n", " ('along', 989),\n", " ('trying', 987),\n", " ('small', 986),\n", " ('death', 985),\n", " ('friends', 981),\n", " ('remember', 974),\n", " ('often', 970),\n", " ('said', 966),\n", " ('favorite', 962),\n", " ('heart', 959),\n", " ('early', 957),\n", " ('left', 956),\n", " ('until', 955),\n", " ('let', 954),\n", " ('script', 954),\n", " ('maybe', 937),\n", " ('today', 936),\n", " ('live', 934),\n", " ('less', 934),\n", " ('moments', 933),\n", " ('others', 929),\n", " ('brilliant', 926),\n", " ('shot', 925),\n", " ('liked', 923),\n", " ('become', 916),\n", " ('won', 915),\n", " ('used', 910),\n", " ('style', 907),\n", " ('mother', 895),\n", " ('lives', 894),\n", " ('came', 893),\n", " ('stars', 890),\n", " ('cinema', 889),\n", " ('looks', 885),\n", " ('perhaps', 884),\n", " ('read', 882),\n", " ('enjoyed', 879),\n", " ('boy', 875),\n", " ('drama', 873),\n", " ('highly', 871),\n", " ('given', 870),\n", " ('playing', 867),\n", " ('use', 864),\n", " ('next', 859),\n", " ('women', 858),\n", " ('fine', 857),\n", " 
('effects', 856),\n", " ('kids', 854),\n", " ('entertaining', 853),\n", " ('need', 852),\n", " ('line', 850),\n", " ('works', 848),\n", " ('someone', 847),\n", " ('mr', 836),\n", " ('simply', 835),\n", " ('children', 833),\n", " ('picture', 833),\n", " ('face', 831),\n", " ('friend', 831),\n", " ('keep', 831),\n", " ('dark', 830),\n", " ('overall', 828),\n", " ('certainly', 828),\n", " ('minutes', 827),\n", " ('wasn', 824),\n", " ('history', 822),\n", " ('finally', 820),\n", " ('couple', 816),\n", " ('against', 815),\n", " ('son', 809),\n", " ('understand', 808),\n", " ('lost', 807),\n", " ('michael', 805),\n", " ('else', 801),\n", " ('throughout', 798),\n", " ('fans', 797),\n", " ('city', 792),\n", " ('reason', 789),\n", " ('written', 787),\n", " ('production', 787),\n", " ('several', 784),\n", " ('school', 783),\n", " ('rest', 781),\n", " ('based', 781),\n", " ('try', 780),\n", " ('dead', 776),\n", " ('hope', 775),\n", " ('strong', 768),\n", " ('white', 765),\n", " ('tell', 759),\n", " ('itself', 758),\n", " ('half', 753),\n", " ('person', 749),\n", " ('sometimes', 746),\n", " ('past', 744),\n", " ('start', 744),\n", " ('genre', 743),\n", " ('final', 739),\n", " ('beginning', 739),\n", " ('town', 738),\n", " ('art', 734),\n", " ('game', 732),\n", " ('humor', 732),\n", " ('yes', 731),\n", " ('idea', 731),\n", " ('late', 730),\n", " ('becomes', 729),\n", " ('despite', 729),\n", " ('able', 726),\n", " ('case', 726),\n", " ('money', 723),\n", " ('child', 721),\n", " ('completely', 721),\n", " ('side', 719),\n", " ('camera', 716),\n", " ('getting', 714),\n", " ('instead', 712),\n", " ('soon', 702),\n", " ('under', 700),\n", " ('viewer', 699),\n", " ('age', 697),\n", " ('days', 696),\n", " ('stories', 696),\n", " ('felt', 694),\n", " ('simple', 694),\n", " ('roles', 693),\n", " ('video', 688),\n", " ('name', 683),\n", " ('either', 683),\n", " ('doing', 677),\n", " ('turns', 674),\n", " ('wants', 671),\n", " ('close', 671),\n", " ('title', 669),\n", " ('wrong', 
668),\n", " ('went', 666),\n", " ('james', 665),\n", " ('evil', 659),\n", " ('budget', 657),\n", " ('episodes', 657),\n", " ('relationship', 655),\n", " ('piece', 653),\n", " ('fantastic', 653),\n", " ('david', 651),\n", " ('turn', 648),\n", " ('murder', 646),\n", " ('parts', 645),\n", " ('brother', 644),\n", " ('head', 643),\n", " ('absolutely', 643),\n", " ('experience', 642),\n", " ('eyes', 641),\n", " ('sex', 638),\n", " ('direction', 637),\n", " ('called', 637),\n", " ('directed', 636),\n", " ('lines', 634),\n", " ('behind', 633),\n", " ('sort', 632),\n", " ('actress', 631),\n", " ('lead', 630),\n", " ('oscar', 628),\n", " ('example', 627),\n", " ('including', 627),\n", " ('known', 625),\n", " ('musical', 625),\n", " ('chance', 621),\n", " ('score', 620),\n", " ('feeling', 619),\n", " ('already', 619),\n", " ('hit', 619),\n", " ('voice', 615),\n", " ('moment', 612),\n", " ('living', 612),\n", " ('low', 610),\n", " ('supporting', 610),\n", " ('ago', 609),\n", " ('themselves', 608),\n", " ('hilarious', 605),\n", " ('reality', 605),\n", " ('jack', 604),\n", " ('told', 603),\n", " ('hand', 601),\n", " ('moving', 600),\n", " ('dialogue', 600),\n", " ('quality', 600),\n", " ('song', 599),\n", " ('happy', 599),\n", " ('paul', 598),\n", " ('matter', 598),\n", " ('light', 594),\n", " ('future', 593),\n", " ('entire', 592),\n", " ('finds', 591),\n", " ('gave', 589),\n", " ('laugh', 587),\n", " ('released', 586),\n", " ('expect', 584),\n", " ('fight', 581),\n", " ('particularly', 580),\n", " ('cinematography', 579),\n", " ('police', 579),\n", " ('whose', 578),\n", " ('type', 578),\n", " ('sound', 578),\n", " ('enjoyable', 573),\n", " ('view', 573),\n", " ('husband', 572),\n", " ('romantic', 572),\n", " ('number', 572),\n", " ('daughter', 572),\n", " ('documentary', 571),\n", " ('self', 570),\n", " ('modern', 569),\n", " ('robert', 569),\n", " ('took', 569),\n", " ('superb', 569),\n", " ('mean', 566),\n", " ('shown', 563),\n", " ('coming', 561),\n", " ('important', 
560),\n", " ('king', 559),\n", " ('leave', 559),\n", " ('change', 558),\n", " ('wanted', 555),\n", " ('somewhat', 555),\n", " ('tells', 554),\n", " ('run', 552),\n", " ('events', 552),\n", " ('country', 552),\n", " ('career', 552),\n", " ('heard', 550),\n", " ('season', 550),\n", " ('girls', 549),\n", " ('greatest', 549),\n", " ('etc', 547),\n", " ('care', 546),\n", " ('starts', 545),\n", " ('english', 542),\n", " ('killer', 541),\n", " ('animation', 540),\n", " ('guys', 540),\n", " ('totally', 540),\n", " ('tale', 540),\n", " ('usual', 539),\n", " ('opinion', 535),\n", " ('miss', 535),\n", " ('violence', 531),\n", " ('easy', 531),\n", " ('songs', 530),\n", " ('british', 528),\n", " ('says', 526),\n", " ('realistic', 525),\n", " ('writing', 524),\n", " ('act', 522),\n", " ('writer', 522),\n", " ('comic', 521),\n", " ('thriller', 519),\n", " ('television', 517),\n", " ('power', 516),\n", " ('ones', 515),\n", " ('kid', 514),\n", " ('novel', 513),\n", " ('york', 513),\n", " ('problem', 512),\n", " ('alone', 512),\n", " ('attention', 509),\n", " ('involved', 508),\n", " ('kill', 507),\n", " ('extremely', 507),\n", " ('seemed', 506),\n", " ('hero', 505),\n", " ('french', 505),\n", " ('rock', 504),\n", " ('stuff', 501),\n", " ('wish', 499),\n", " ('begins', 498),\n", " ('taken', 497),\n", " ('sad', 497),\n", " ('ways', 496),\n", " ('richard', 495),\n", " ('knows', 494),\n", " ('atmosphere', 493),\n", " ('surprised', 491),\n", " ('similar', 491),\n", " ('taking', 491),\n", " ('car', 491),\n", " ('george', 490),\n", " ('perfectly', 490),\n", " ('across', 489),\n", " ('sequence', 489),\n", " ('eye', 489),\n", " ('team', 489),\n", " ('serious', 488),\n", " ('powerful', 488),\n", " ('room', 488),\n", " ('due', 488),\n", " ('among', 488),\n", " ('order', 487),\n", " ('b', 487),\n", " ('cannot', 487),\n", " ('strange', 487),\n", " ('beauty', 486),\n", " ('famous', 485),\n", " ('tries', 484),\n", " ('myself', 484),\n", " ('happened', 484),\n", " ('herself', 484),\n", " ('class', 
483),\n", " ('four', 482),\n", " ('cool', 481),\n", " ('release', 479),\n", " ('anyway', 479),\n", " ('theme', 479),\n", " ('opening', 478),\n", " ('entertainment', 477),\n", " ('unique', 475),\n", " ('ends', 475),\n", " ('slow', 475),\n", " ('exactly', 475),\n", " ('red', 474),\n", " ('o', 474),\n", " ('level', 474),\n", " ('easily', 474),\n", " ('interest', 472),\n", " ('happen', 471),\n", " ('crime', 470),\n", " ('viewing', 468),\n", " ('memorable', 467),\n", " ('sets', 467),\n", " ('group', 466),\n", " ('stop', 466),\n", " ('dance', 463),\n", " ('message', 463),\n", " ('sister', 463),\n", " ('working', 463),\n", " ('problems', 463),\n", " ('knew', 462),\n", " ('mystery', 461),\n", " ('nature', 461),\n", " ('bring', 460),\n", " ('believable', 459),\n", " ('thinking', 459),\n", " ('brought', 459),\n", " ('mostly', 458),\n", " ('couldn', 457),\n", " ('disney', 457),\n", " ('society', 456),\n", " ('within', 455),\n", " ('lady', 455),\n", " ('blood', 454),\n", " ('upon', 453),\n", " ('viewers', 453),\n", " ('parents', 453),\n", " ('meets', 452),\n", " ('form', 452),\n", " ('soundtrack', 452),\n", " ('usually', 452),\n", " ('tom', 452),\n", " ('peter', 452),\n", " ('local', 450),\n", " ('certain', 448),\n", " ('follow', 448),\n", " ('whether', 447),\n", " ('possible', 446),\n", " ('emotional', 445),\n", " ('killed', 444),\n", " ('de', 444),\n", " ('above', 444),\n", " ('middle', 443),\n", " ('god', 443),\n", " ('happens', 442),\n", " ('flick', 442),\n", " ('needs', 442),\n", " ('masterpiece', 441),\n", " ('major', 440),\n", " ('period', 440),\n", " ('haven', 439),\n", " ('named', 439),\n", " ('th', 438),\n", " ('particular', 438),\n", " ('earth', 437),\n", " ('feature', 437),\n", " ('stand', 436),\n", " ('words', 435),\n", " ('typical', 435),\n", " ('obviously', 433),\n", " ('elements', 433),\n", " ('romance', 431),\n", " ('jane', 430),\n", " ('yourself', 427),\n", " ('showing', 427),\n", " ('fantasy', 426),\n", " ('brings', 426),\n", " ('america', 423),\n", " 
('guess', 423),\n", " ('huge', 422),\n", " ('unfortunately', 422),\n", " ('indeed', 421),\n", " ('running', 421),\n", " ('talent', 420),\n", " ('stage', 419),\n", " ('started', 418),\n", " ('sweet', 417),\n", " ('leads', 417),\n", " ('japanese', 417),\n", " ('poor', 416),\n", " ('deal', 416),\n", " ('personal', 413),\n", " ('incredible', 413),\n", " ('fast', 412),\n", " ('became', 410),\n", " ('deep', 410),\n", " ('hours', 409),\n", " ('nearly', 408),\n", " ('dream', 408),\n", " ('giving', 408),\n", " ('turned', 407),\n", " ('clearly', 407),\n", " ('near', 406),\n", " ('obvious', 406),\n", " ('cut', 405),\n", " ('surprise', 405),\n", " ('body', 404),\n", " ('era', 404),\n", " ('female', 403),\n", " ('hour', 403),\n", " ('five', 403),\n", " ('note', 399),\n", " ('learn', 398),\n", " ('truth', 398),\n", " ('match', 397),\n", " ('feels', 397),\n", " ('except', 397),\n", " ('tony', 397),\n", " ('filmed', 394),\n", " ('complete', 394),\n", " ('clear', 394),\n", " ('older', 393),\n", " ('street', 393),\n", " ('lots', 393),\n", " ('eventually', 393),\n", " ('keeps', 393),\n", " ('buy', 392),\n", " ('stewart', 391),\n", " ('william', 391),\n", " ('joe', 390),\n", " ('meet', 390),\n", " ('fall', 390),\n", " ('shots', 389),\n", " ('talking', 389),\n", " ('difficult', 389),\n", " ('unlike', 389),\n", " ('rating', 389),\n", " ('means', 388),\n", " ('dramatic', 388),\n", " ('appears', 386),\n", " ('subject', 386),\n", " ('wonder', 386),\n", " ('present', 386),\n", " ('situation', 386),\n", " ('comments', 385),\n", " ('sequences', 383),\n", " ('general', 383),\n", " ('lee', 383),\n", " ('earlier', 382),\n", " ('points', 382),\n", " ('check', 379),\n", " ('gone', 379),\n", " ('ten', 378),\n", " ('suspense', 378),\n", " ('recommended', 378),\n", " ('business', 377),\n", " ('third', 377),\n", " ('talk', 375),\n", " ('leaves', 375),\n", " ('beyond', 375),\n", " ('portrayal', 374),\n", " ('beautifully', 373),\n", " ('single', 372),\n", " ('bill', 372),\n", " ('word', 371),\n", " 
('plenty', 371),\n", " ('falls', 370),\n", " ('whom', 370),\n", " ('figure', 369),\n", " ('battle', 369),\n", " ('scary', 369),\n", " ('non', 369),\n", " ('return', 368),\n", " ('using', 368),\n", " ('doubt', 367),\n", " ('add', 367),\n", " ('hear', 366),\n", " ('solid', 366),\n", " ('success', 366),\n", " ('touching', 365),\n", " ('political', 365),\n", " ('oh', 365),\n", " ('jokes', 365),\n", " ('awesome', 364),\n", " ('hell', 364),\n", " ('boys', 364),\n", " ('dog', 362),\n", " ('recently', 362),\n", " ('sexual', 362),\n", " ('please', 361),\n", " ('wouldn', 361),\n", " ('features', 361),\n", " ('straight', 361),\n", " ('lack', 360),\n", " ('forget', 360),\n", " ('setting', 360),\n", " ('mark', 359),\n", " ('married', 359),\n", " ('social', 357),\n", " ('adventure', 356),\n", " ('interested', 356),\n", " ('brothers', 355),\n", " ('sees', 355),\n", " ('actual', 355),\n", " ('terrific', 355),\n", " ('move', 354),\n", " ('call', 354),\n", " ('various', 353),\n", " ('dr', 353),\n", " ('theater', 353),\n", " ('animated', 352),\n", " ('western', 351),\n", " ('space', 350),\n", " ('baby', 350),\n", " ('leading', 348),\n", " ('disappointed', 348),\n", " ('portrayed', 346),\n", " ('aren', 346),\n", " ('screenplay', 345),\n", " ('smith', 345),\n", " ('hate', 344),\n", " ('towards', 344),\n", " ('noir', 343),\n", " ('outstanding', 342),\n", " ('decent', 342),\n", " ('kelly', 342),\n", " ('directors', 341),\n", " ('journey', 341),\n", " ('none', 340),\n", " ('effective', 340),\n", " ('looked', 340),\n", " ('caught', 339),\n", " ('cold', 339),\n", " ('storyline', 339),\n", " ('fi', 339),\n", " ('sci', 339),\n", " ('mary', 339),\n", " ('rich', 338),\n", " ('charming', 338),\n", " ('harry', 337),\n", " ('popular', 337),\n", " ('manages', 337),\n", " ('rare', 337),\n", " ('spirit', 336),\n", " ('open', 335),\n", " ('appreciate', 335),\n", " ('basically', 334),\n", " ('moves', 334),\n", " ('acted', 334),\n", " ('deserves', 333),\n", " ('subtle', 333),\n", " ('mention', 333),\n", 
" ('inside', 333),\n", " ('pace', 333),\n", " ('century', 333),\n", " ('boring', 333),\n", " ('familiar', 332),\n", " ('background', 332),\n", " ('ben', 331),\n", " ('creepy', 330),\n", " ('supposed', 330),\n", " ('secret', 329),\n", " ('jim', 328),\n", " ('die', 328),\n", " ('question', 327),\n", " ('effect', 327),\n", " ('natural', 327),\n", " ('rate', 326),\n", " ('language', 326),\n", " ('impressive', 326),\n", " ('intelligent', 325),\n", " ('saying', 325),\n", " ('material', 324),\n", " ('realize', 324),\n", " ('telling', 324),\n", " ('scott', 324),\n", " ('singing', 323),\n", " ('dancing', 322),\n", " ('adult', 321),\n", " ('imagine', 321),\n", " ('visual', 321),\n", " ('kept', 320),\n", " ('office', 320),\n", " ('uses', 319),\n", " ('pure', 318),\n", " ('wait', 318),\n", " ('stunning', 318),\n", " ('copy', 317),\n", " ('review', 317),\n", " ('previous', 317),\n", " ('seriously', 317),\n", " ('somehow', 316),\n", " ('created', 316),\n", " ('magic', 316),\n", " ('create', 316),\n", " ('hot', 316),\n", " ('reading', 316),\n", " ('crazy', 315),\n", " ('air', 315),\n", " ('frank', 315),\n", " ('stay', 315),\n", " ('escape', 315),\n", " ('attempt', 315),\n", " ('hands', 314),\n", " ('filled', 313),\n", " ('surprisingly', 312),\n", " ('expected', 312),\n", " ('average', 312),\n", " ('complex', 311),\n", " ('studio', 310),\n", " ('successful', 310),\n", " ('quickly', 310),\n", " ('male', 309),\n", " ('plus', 309),\n", " ('co', 307),\n", " ('minute', 306),\n", " ('images', 306),\n", " ('casting', 306),\n", " ('exciting', 306),\n", " ('following', 306),\n", " ('members', 305),\n", " ('german', 305),\n", " ('e', 305),\n", " ('reasons', 305),\n", " ('follows', 305),\n", " ('themes', 305),\n", " ('touch', 304),\n", " ('genius', 304),\n", " ('free', 304),\n", " ('edge', 304),\n", " ('cute', 304),\n", " ('outside', 303),\n", " ('ok', 302),\n", " ('admit', 302),\n", " ('younger', 302),\n", " ('reviews', 302),\n", " ('odd', 301),\n", " ('fighting', 301),\n", " ('master', 
301),\n", " ('break', 300),\n", " ('thanks', 300),\n", " ('recent', 300),\n", " ('comment', 300),\n", " ('apart', 299),\n", " ('lovely', 298),\n", " ('begin', 298),\n", " ('emotions', 298),\n", " ('doctor', 297),\n", " ('italian', 297),\n", " ('party', 297),\n", " ('la', 296),\n", " ('missed', 296),\n", " ...]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "positive_counts.most_common()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "pos_neg_ratios = Counter()\n", "\n", "# Positive-to-negative usage ratio for words seen more than 100 times\n", "# (the +1 in the denominator avoids division by zero)\n", "for term,cnt in list(total_counts.most_common()):\n", "    if(cnt > 100):\n", "        pos_neg_ratio = positive_counts[term] / float(negative_counts[term]+1)\n", "        pos_neg_ratios[term] = pos_neg_ratio\n", "\n", "# Move to a log scale so neutral words sit near 0, positive words above, negative words below\n", "for word,ratio in pos_neg_ratios.most_common():\n", "    if(ratio > 1):\n", "        pos_neg_ratios[word] = np.log(ratio)\n", "    else:\n", "        pos_neg_ratios[word] = -np.log((1 / (ratio+0.01)))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[('edie', 4.6913478822291435),\n", " ('paulie', 4.0775374439057197),\n", " ('felix', 3.1527360223636558),\n", " ('polanski', 2.8233610476132043),\n", " ('matthau', 2.8067217286092401),\n", " ('victoria', 2.6810215287142909),\n", " ('mildred', 2.6026896854443837),\n", " ('gandhi', 2.5389738710582761),\n", " ('flawless', 2.451005098112319),\n", " ('superbly', 2.2600254785752498),\n", " ('perfection', 2.1594842493533721),\n", " ('astaire', 2.1400661634962708),\n", " ('captures', 2.0386195471595809),\n", " ('voight', 2.0301704926730531),\n", " ('wonderfully', 2.0218960560332353),\n", " ('powell', 1.9783454248084671),\n", " ('brosnan', 1.9547990964725592),\n", " ('lily', 1.9203768470501485),\n", " ('bakshi', 1.9029851043382795),\n", " ('lincoln', 1.9014583864844796),\n", " ('refreshing', 1.8551812956655511),\n", " ('breathtaking', 1.8481124057791867),\n", " ('bourne', 
1.8478489358790986),\n", " ('lemmon', 1.8458266904983307),\n", " ('delightful', 1.8002701588959635),\n", " ('flynn', 1.7996646487351682),\n", " ('andrews', 1.7764919970972666),\n", " ('homer', 1.7692866133759964),\n", " ('beautifully', 1.7626953362841438),\n", " ('soccer', 1.7578579175523736),\n", " ('elvira', 1.7397031072720019),\n", " ('underrated', 1.7197859696029656),\n", " ('gripping', 1.7165360479904674),\n", " ('superb', 1.7091514458966952),\n", " ('delight', 1.6714733033535532),\n", " ('welles', 1.6677068205580761),\n", " ('sadness', 1.663505133704376),\n", " ('sinatra', 1.6389967146756448),\n", " ('touching', 1.637217476541176),\n", " ('timeless', 1.62924053973028),\n", " ('macy', 1.6211339521972916),\n", " ('unforgettable', 1.6177367152487956),\n", " ('favorites', 1.6158688027643908),\n", " ('stewart', 1.6119987332957739),\n", " ('hartley', 1.6094379124341003),\n", " ('sullivan', 1.6094379124341003),\n", " ('extraordinary', 1.6094379124341003),\n", " ('brilliantly', 1.5950491749820008),\n", " ('friendship', 1.5677652160335325),\n", " ('wonderful', 1.5645425925262093),\n", " ('palma', 1.5553706911638245),\n", " ('magnificent', 1.54663701119507),\n", " ('finest', 1.5462590108125689),\n", " ('jackie', 1.5439233053234738),\n", " ('ritter', 1.5404450409471491),\n", " ('tremendous', 1.5184661342283736),\n", " ('freedom', 1.5091151908062312),\n", " ('fantastic', 1.5048433868558566),\n", " ('terrific', 1.5026699370083942),\n", " ('noir', 1.493925025312256),\n", " ('sidney', 1.493925025312256),\n", " ('outstanding', 1.4910053152089213),\n", " ('mann', 1.4894785973551214),\n", " ('pleasantly', 1.4894785973551214),\n", " ('nancy', 1.488077055429833),\n", " ('marie', 1.4825711915553104),\n", " ('marvelous', 1.4739999415389962),\n", " ('excellent', 1.4647538505723599),\n", " ('ruth', 1.4596256342054401),\n", " ('stanwyck', 1.4412101187160054),\n", " ('widmark', 1.4350845252893227),\n", " ('splendid', 1.4271163556401458),\n", " ('chan', 1.423108334242607),\n", " 
('exceptional', 1.4201959127955721),\n", " ('tender', 1.410986973710262),\n", " ('gentle', 1.4078005663408544),\n", " ('poignant', 1.4022947024663317),\n", " ('gem', 1.3932148039644643),\n", " ('amazing', 1.3919815802404802),\n", " ('chilling', 1.3862943611198906),\n", " ('captivating', 1.3862943611198906),\n", " ('fisher', 1.3862943611198906),\n", " ('davies', 1.3862943611198906),\n", " ('darker', 1.3652409519220583),\n", " ('april', 1.3499267169490159),\n", " ('kelly', 1.3461743673304654),\n", " ('blake', 1.3418425985490567),\n", " ('overlooked', 1.329135947279942),\n", " ('ralph', 1.32818673031261),\n", " ('bette', 1.3156767939059373),\n", " ('hoffman', 1.3150668518315229),\n", " ('cole', 1.3121863889661687),\n", " ('shines', 1.3049487216659381),\n", " ('powerful', 1.2999662776313934),\n", " ('notch', 1.2950456896547455),\n", " ('remarkable', 1.2883688239495823),\n", " ('pitt', 1.286210902562908),\n", " ('winters', 1.2833463918674481),\n", " ('vivid', 1.2762934659055623),\n", " ('gritty', 1.2757524867200667),\n", " ('giallo', 1.2745029551317739),\n", " ('portrait', 1.2704625455947689),\n", " ('innocence', 1.2694300209805796),\n", " ('psychiatrist', 1.2685113254635072),\n", " ('favorite', 1.2668956297860055),\n", " ('ensemble', 1.2656663733312759),\n", " ('stunning', 1.2622417124499117),\n", " ('burns', 1.259880436264232),\n", " ('garbo', 1.258954938743289),\n", " ('barbara', 1.2580400255962119),\n", " ('panic', 1.2527629684953681),\n", " ('holly', 1.2527629684953681),\n", " ('philip', 1.2527629684953681),\n", " ('carol', 1.2481440226390734),\n", " ('perfect', 1.246742480713785),\n", " ('appreciated', 1.2462482874741743),\n", " ('favourite', 1.2411123512753928),\n", " ('journey', 1.2367626271489269),\n", " ('rural', 1.235471471385307),\n", " ('bond', 1.2321436812926323),\n", " ('builds', 1.2305398317106577),\n", " ('brilliant', 1.2287554137664785),\n", " ('brooklyn', 1.2286654169163074),\n", " ('von', 1.225175011976539),\n", " ('unfolds', 1.2163953243244932),\n", 
" ('recommended', 1.2163953243244932),\n", " ('daniel', 1.20215296760895),\n", " ('perfectly', 1.1971931173405572),\n", " ('crafted', 1.1962507582320256),\n", " ('prince', 1.1939224684724346),\n", " ('troubled', 1.192138346678933),\n", " ('consequences', 1.1865810616140668),\n", " ('haunting', 1.1814999484738773),\n", " ('cinderella', 1.180052620608284),\n", " ('alexander', 1.1759989522835299),\n", " ('emotions', 1.1753049094563641),\n", " ('boxing', 1.1735135968412274),\n", " ('subtle', 1.1734135017508081),\n", " ('curtis', 1.1649873576129823),\n", " ('rare', 1.1566438362402944),\n", " ('loved', 1.1563661500586044),\n", " ('daughters', 1.1526795099383853),\n", " ('courage', 1.1438688802562305),\n", " ('dentist', 1.1426722784621401),\n", " ('highly', 1.1420208631618658),\n", " ('nominated', 1.1409146683587992),\n", " ('tony', 1.1397491942285991),\n", " ('draws', 1.1325138403437911),\n", " ('everyday', 1.1306150197542835),\n", " ('contrast', 1.1284652518177909),\n", " ('cried', 1.1213405397456659),\n", " ('fabulous', 1.1210851445201684),\n", " ('ned', 1.120591195386885),\n", " ('fay', 1.120591195386885),\n", " ('emma', 1.1184149159642893),\n", " ('sensitive', 1.113318436057805),\n", " ('smooth', 1.1089750757036563),\n", " ('dramas', 1.1080910326226534),\n", " ('today', 1.1050431789984001),\n", " ('helps', 1.1023091505494358),\n", " ('inspiring', 1.0986122886681098),\n", " ('jimmy', 1.0937696641923216),\n", " ('awesome', 1.0931328229034842),\n", " ('unique', 1.0881409888008142),\n", " ('tragic', 1.0871835928444868),\n", " ('intense', 1.0870514662670339),\n", " ('stellar', 1.0857088838322018),\n", " ('rival', 1.0822184788924332),\n", " ('provides', 1.0797081340289569),\n", " ('depression', 1.0782034170369026),\n", " ('shy', 1.0775588794702773),\n", " ('carrie', 1.076139432816051),\n", " ('blend', 1.0753554265038423),\n", " ('hank', 1.0736109864626924),\n", " ('diana', 1.0726368022648489),\n", " ('adorable', 1.0726368022648489),\n", " ('unexpected', 
1.0722255334949147),\n", " ('achievement', 1.0668635903535293),\n", " ('bettie', 1.0663514264498881),\n", " ('happiness', 1.0632729222228008),\n", " ('glorious', 1.0608719606852626),\n", " ('davis', 1.0541605260972757),\n", " ('terrifying', 1.0525211814678428),\n", " ('beauty', 1.050410186850232),\n", " ('ideal', 1.0479685558493548),\n", " ('fears', 1.0467872208035236),\n", " ('hong', 1.0438040521731147),\n", " ('seasons', 1.0433496099930604),\n", " ('fascinating', 1.0414538748281612),\n", " ('carries', 1.0345904299031787),\n", " ('satisfying', 1.0321225473992768),\n", " ('definite', 1.0319209141694374),\n", " ('touched', 1.0296194171811581),\n", " ('greatest', 1.0248947127715422),\n", " ('creates', 1.0241097613701886),\n", " ('aunt', 1.023388867430522),\n", " ('walter', 1.022328983918479),\n", " ('spectacular', 1.0198314108149955),\n", " ('portrayal', 1.0189810189761024),\n", " ('ann', 1.0127808528183286),\n", " ('enterprise', 1.0116009116784799),\n", " ('musicals', 1.0096648026516135),\n", " ('deeply', 1.0094845087721023),\n", " ('incredible', 1.0061677561461084),\n", " ('mature', 1.0060195018402847),\n", " ('triumph', 0.99682959435816731),\n", " ('margaret', 0.99682959435816731),\n", " ('navy', 0.99493385919326827),\n", " ('harry', 0.99176919305006062),\n", " ('lucas', 0.990398704027877),\n", " ('sweet', 0.98966110487955483),\n", " ('joey', 0.98794672078059009),\n", " ('oscar', 0.98721905111049713),\n", " ('balance', 0.98649499054740353),\n", " ('warm', 0.98485340331145166),\n", " ('ages', 0.98449898190068863),\n", " ('glover', 0.98082925301172619),\n", " ('guilt', 0.98082925301172619),\n", " ('carrey', 0.98082925301172619),\n", " ('learns', 0.97881108885548895),\n", " ('unusual', 0.97788374278196932),\n", " ('sons', 0.97777581552483595),\n", " ('complex', 0.97761897738147796),\n", " ('essence', 0.97753435711487369),\n", " ('brazil', 0.9769153536905899),\n", " ('widow', 0.97650959186720987),\n", " ('solid', 0.97537964824416146),\n", " ('beautiful', 
0.97326301262841053),\n", " ('holmes', 0.97246100334120955),\n", " ('awe', 0.97186058302896583),\n", " ('vhs', 0.97116734209998934),\n", " ('eerie', 0.97116734209998934),\n", " ('lonely', 0.96873720724669754),\n", " ('grim', 0.96873720724669754),\n", " ('sport', 0.96825047080486615),\n", " ('debut', 0.96508089604358704),\n", " ('destiny', 0.96343751029985703),\n", " ('thrillers', 0.96281074750904794),\n", " ('tears', 0.95977584381389391),\n", " ('rose', 0.95664202739772253),\n", " ('feelings', 0.95551144502743635),\n", " ('ginger', 0.95551144502743635),\n", " ('winning', 0.95471810900804055),\n", " ('stanley', 0.95387344302319799),\n", " ('cox', 0.95343027882361187),\n", " ('paris', 0.95278479030472663),\n", " ('heart', 0.95238806924516806),\n", " ('hooked', 0.95155887071161305),\n", " ('comfortable', 0.94803943018873538),\n", " ('mgm', 0.94446160884085151),\n", " ('masterpiece', 0.94155039863339296),\n", " ('themes', 0.94118828349588235),\n", " ('danny', 0.93967118051821874),\n", " ('anime', 0.93378388932167222),\n", " ('perry', 0.93328830824272613),\n", " ('joy', 0.93301752567946861),\n", " ('lovable', 0.93081883243706487),\n", " ('hal', 0.92953595862417571),\n", " ('mysteries', 0.92953595862417571),\n", " ('louis', 0.92871325187271225),\n", " ('charming', 0.92520609553210742),\n", " ('urban', 0.92367083917177761),\n", " ('allows', 0.92183091224977043),\n", " ('impact', 0.91815814604895041),\n", " ('gradually', 0.91629073187415511),\n", " ('lifestyle', 0.91629073187415511),\n", " ('italy', 0.91629073187415511),\n", " ('spy', 0.91289514287301687),\n", " ('treat', 0.91193342650519937),\n", " ('subsequent', 0.91056005716517008),\n", " ('kennedy', 0.90981821736853763),\n", " ('loving', 0.90967549275543591),\n", " ('surprising', 0.90937028902958128),\n", " ('quiet', 0.90648673177753425),\n", " ('winter', 0.90624039602065365),\n", " ('reveals', 0.90490540964902977),\n", " ('raw', 0.90445627422715225),\n", " ('funniest', 0.90078654533818991),\n", " ('pleased', 
0.89994159387262562),\n", " ('norman', 0.89994159387262562),\n", " ('thief', 0.89874642222324552),\n", " ('season', 0.89827222637147675),\n", " ('secrets', 0.89794159320595857),\n", " ('colorful', 0.89705936994626756),\n", " ('highest', 0.8967461358011849),\n", " ('compelling', 0.89462923509297576),\n", " ('danes', 0.89248008318043659),\n", " ('castle', 0.88967708335606499),\n", " ('kudos', 0.88889175768604067),\n", " ('great', 0.88810470901464589),\n", " ('baseball', 0.88730319500090271),\n", " ('subtitles', 0.88730319500090271),\n", " ('bleak', 0.88730319500090271),\n", " ('winner', 0.88643776872447388),\n", " ('tragedy', 0.88563699078315261),\n", " ('todd', 0.88551907320740142),\n", " ('nicely', 0.87924946019380601),\n", " ('arthur', 0.87546873735389985),\n", " ('essential', 0.87373111745535925),\n", " ('gorgeous', 0.8731725250935497),\n", " ('fonda', 0.87294029100054127),\n", " ('eastwood', 0.87139541196626402),\n", " ('focuses', 0.87082835779739776),\n", " ('enjoyed', 0.87070195951624607),\n", " ('natural', 0.86997924506912838),\n", " ('intensity', 0.86835126958503595),\n", " ('witty', 0.86824103423244681),\n", " ('rob', 0.8642954367557748),\n", " ('worlds', 0.86377269759070874),\n", " ('health', 0.86113891179907498),\n", " ('magical', 0.85953791528170564),\n", " ('deeper', 0.85802182375017932),\n", " ('lucy', 0.85618680780444956),\n", " ('moving', 0.85566611005772031),\n", " ('lovely', 0.85290640004681306),\n", " ('purple', 0.8513711857748395),\n", " ('memorable', 0.84801189112086062),\n", " ('sings', 0.84729786038720367),\n", " ('craig', 0.84342938360928321),\n", " ('modesty', 0.84342938360928321),\n", " ('relate', 0.84326559685926517),\n", " ('episodes', 0.84223712084137292),\n", " ('strong', 0.84167135777060931),\n", " ('smith', 0.83959811108590054),\n", " ('tear', 0.83704136022001441),\n", " ('apartment', 0.83333115290549531),\n", " ('princess', 0.83290912293510388),\n", " ('disagree', 0.83290912293510388),\n", " ('kung', 0.83173334384609199),\n", " 
('adventure', 0.83150561393278388),\n", " ('columbo', 0.82667857318446791),\n", " ('jake', 0.82667857318446791),\n", " ('adds', 0.82485652591452319),\n", " ('hart', 0.82472353834866463),\n", " ('strength', 0.82417544296634937),\n", " ('realizes', 0.82360006895738058),\n", " ('dave', 0.8232003088081431),\n", " ('childhood', 0.82208086393583857),\n", " ('forbidden', 0.81989888619908913),\n", " ('tight', 0.81883539572344199),\n", " ('surreal', 0.8178506590609026),\n", " ('manager', 0.81770990320170756),\n", " ('dancer', 0.81574950265227764),\n", " ('con', 0.81093021621632877),\n", " ('studios', 0.81093021621632877),\n", " ('miike', 0.80821651034473263),\n", " ('realistic', 0.80807714723392232),\n", " ('explicit', 0.80792269515237358),\n", " ('kurt', 0.8060875917405409),\n", " ('traditional', 0.80535917116687328),\n", " ('deals', 0.80535917116687328),\n", " ('holds', 0.80493858654806194),\n", " ('carl', 0.80437281567016972),\n", " ('touches', 0.80396154690023547),\n", " ('gene', 0.80314807577427383),\n", " ('albert', 0.8027669055771679),\n", " ('abc', 0.80234647252493729),\n", " ('cry', 0.80011930011211307),\n", " ('sides', 0.7995275841185171),\n", " ('develops', 0.79850769621777162),\n", " ('eyre', 0.79850769621777162),\n", " ('dances', 0.79694397424158891),\n", " ('oscars', 0.79633141679517616),\n", " ('legendary', 0.79600456599965308),\n", " ('importance', 0.79492987486988764),\n", " ('hearted', 0.79492987486988764),\n", " ('portraying', 0.79356592830699269),\n", " ('impressed', 0.79258107754813223),\n", " ('waters', 0.79112758892014912),\n", " ('empire', 0.79078565012386137),\n", " ('edge', 0.789774016249017),\n", " ('environment', 0.78845736036427028),\n", " ('jean', 0.78845736036427028),\n", " ('sentimental', 0.7864791203521645),\n", " ('captured', 0.78623760362595729),\n", " ('styles', 0.78592891401091158),\n", " ('daring', 0.78592891401091158),\n", " ('backgrounds', 0.78275933924963248),\n", " ('frank', 0.78275933924963248),\n", " ('matches', 
0.78275933924963248),\n", " ('tense', 0.78275933924963248),\n", " ('gothic', 0.78209466657644144),\n", " ('sharp', 0.7814397877056235),\n", " ('achieved', 0.78015855754957497),\n", " ('court', 0.77947526404844247),\n", " ('steals', 0.7789140023173704),\n", " ('rules', 0.77844476107184035),\n", " ('colors', 0.77684619943659217),\n", " ('reunion', 0.77318988823348167),\n", " ('covers', 0.77139937745969345),\n", " ('tale', 0.77010822169607374),\n", " ('rain', 0.7683706017975328),\n", " ('denzel', 0.76804848873306297),\n", " ('stays', 0.76787072675588186),\n", " ('blob', 0.76725515271366718),\n", " ('conventional', 0.76214005204689672),\n", " ('maria', 0.76214005204689672),\n", " ('fresh', 0.76158434211317383),\n", " ('midnight', 0.76096977689870637),\n", " ('landscape', 0.75852993982279704),\n", " ('animated', 0.75768570169751648),\n", " ('titanic', 0.75666058628227129),\n", " ('sunday', 0.75666058628227129),\n", " ('spring', 0.7537718023763802),\n", " ('cagney', 0.7537718023763802),\n", " ('enjoyable', 0.75246375771636476),\n", " ('immensely', 0.75198768058287868),\n", " ('sir', 0.7507762933965817),\n", " ('nevertheless', 0.75067102469813185),\n", " ('driven', 0.74994477895307854),\n", " ('performances', 0.74883252516063137),\n", " ('memories', 0.74721440183022114),\n", " ('nowadays', 0.74721440183022114),\n", " ('simple', 0.74641420974143258),\n", " ('golden', 0.74533293373051557),\n", " ('leslie', 0.74533293373051557),\n", " ('lovers', 0.74497224842453125),\n", " ('relationship', 0.74484232345601786),\n", " ('supporting', 0.74357803418683721),\n", " ('che', 0.74262723782331497),\n", " ('packed', 0.7410032017375805),\n", " ('trek', 0.74021469141793106),\n", " ('provoking', 0.73840377214806618),\n", " ('strikes', 0.73759894313077912),\n", " ('depiction', 0.73682224406260699),\n", " ('emotional', 0.73678211645681524),\n", " ('secretary', 0.7366322924996842),\n", " ('influenced', 0.73511137965897755),\n", " ('florida', 0.73511137965897755),\n", " ('germany', 
0.73288750920945944),\n", " ('brings', 0.73142936713096229),\n", " ('lewis', 0.73129894652432159),\n", " ('elderly', 0.73088750854279239),\n", " ('owner', 0.72743625403857748),\n", " ('streets', 0.72666987259858895),\n", " ('henry', 0.72642196944481741),\n", " ('portrays', 0.72593700338293632),\n", " ('bears', 0.7252354951114458),\n", " ('china', 0.72489587887452556),\n", " ('anger', 0.72439972406404984),\n", " ('society', 0.72433010799663333),\n", " ('available', 0.72415741730250549),\n", " ('best', 0.72347034060446314),\n", " ('bugs', 0.72270598280148979),\n", " ('magic', 0.71878961117328299),\n", " ('verhoeven', 0.71846498854423513),\n", " ('delivers', 0.71846498854423513),\n", " ('jim', 0.71783979315031676),\n", " ('donald', 0.71667767797013937),\n", " ('endearing', 0.71465338578090898),\n", " ('relationships', 0.71393795022901896),\n", " ('greatly', 0.71256526641704687),\n", " ('charlie', 0.71024161391924534),\n", " ('brad', 0.71024161391924534),\n", " ('simon', 0.70967648251115578),\n", " ('effectively', 0.70914752190638641),\n", " ('march', 0.70774597998109789),\n", " ('atmosphere', 0.70744773070214162),\n", " ('influence', 0.70733181555190172),\n", " ('genius', 0.706392407309966),\n", " ('emotionally', 0.70556970055850243),\n", " ('ken', 0.70526854109229009),\n", " ('identity', 0.70484322032313651),\n", " ('sophisticated', 0.70470800296102132),\n", " ('dan', 0.70457587638356811),\n", " ('andrew', 0.70329955202396321),\n", " ('india', 0.70144598337464037),\n", " ('roy', 0.69970458110610434),\n", " ('surprisingly', 0.6995780708902356),\n", " ('sky', 0.69780919366575667),\n", " ('romantic', 0.69664981111114743),\n", " ('match', 0.69566924999265523),\n", " ('britain', 0.69314718055994529),\n", " ('beatty', 0.69314718055994529),\n", " ('affected', 0.69314718055994529),\n", " ('cowboy', 0.69314718055994529),\n", " ('wave', 0.69314718055994529),\n", " ('stylish', 0.69314718055994529),\n", " ('bitter', 0.69314718055994529),\n", " ('patient', 
0.69314718055994529),\n", " ('meets', 0.69314718055994529),\n", " ('love', 0.69198533541937324),\n", " ('paul', 0.68980827929443067),\n", " ('andy', 0.68846333124751902),\n", " ('performance', 0.68797386327972465),\n", " ('patrick', 0.68645819240914863),\n", " ('unlike', 0.68546468438792907),\n", " ('brooks', 0.68433655087779044),\n", " ('refuses', 0.68348526964820844),\n", " ('award', 0.6824518914431974),\n", " ('complaint', 0.6824518914431974),\n", " ('ride', 0.68229716453587952),\n", " ('dawson', 0.68171848473632257),\n", " ('luke', 0.68158635815886937),\n", " ('wells', 0.68087708796813096),\n", " ('france', 0.6804081547825156),\n", " ('handsome', 0.68007509899259255),\n", " ('sports', 0.68007509899259255),\n", " ('rebel', 0.67875844310784572),\n", " ('directs', 0.67875844310784572),\n", " ('greater', 0.67605274720064523),\n", " ('dreams', 0.67599410133369586),\n", " ('effective', 0.67565402311242806),\n", " ('interpretation', 0.67479804189174875),\n", " ('works', 0.67445504754779284),\n", " ('brando', 0.67445504754779284),\n", " ('noble', 0.6737290947028437),\n", " ('paced', 0.67314651385327573),\n", " ('le', 0.67067432470788668),\n", " ('master', 0.67015766233524654),\n", " ('h', 0.6696166831497512),\n", " ('rings', 0.66904962898088483),\n", " ('easy', 0.66895995494594152),\n", " ('city', 0.66820823221269321),\n", " ('sunshine', 0.66782937257565544),\n", " ('succeeds', 0.66647893347778397),\n", " ('relations', 0.664159643686693),\n", " ('england', 0.66387679825983203),\n", " ('glimpse', 0.66329421741026418),\n", " ('aired', 0.66268797307523675),\n", " ('sees', 0.66263163663399482),\n", " ('both', 0.66248336767382998),\n", " ('definitely', 0.66199789483898808),\n", " ('imaginative', 0.66139848224536502),\n", " ('appreciate', 0.66083893732728749),\n", " ('tricks', 0.66071190480679143),\n", " ('striking', 0.66071190480679143),\n", " ('carefully', 0.65999497324304479),\n", " ('complicated', 0.65981076029235353),\n", " ('perspective', 0.65962448852130173),\n", " 
('trilogy', 0.65877953705573755),\n", " ('future', 0.65834665141052828),\n", " ('lion', 0.65742909795786608),\n", " ('victor', 0.65540685257709819),\n", " ('douglas', 0.65540685257709819),\n", " ('inspired', 0.65459851044271034),\n", " ('marriage', 0.65392646740666405),\n", " ('demands', 0.65392646740666405),\n", " ('father', 0.65172321672194655),\n", " ('page', 0.65123628494430852),\n", " ('instant', 0.65058756614114943),\n", " ('era', 0.6495567444850836),\n", " ('ruthless', 0.64934455790155243),\n", " ('saga', 0.64934455790155243),\n", " ('joan', 0.64891392558311978),\n", " ('joseph', 0.64841128671855386),\n", " ('workers', 0.64829661439459352),\n", " ('fantasy', 0.64726757480925168),\n", " ('accomplished', 0.64551913157069074),\n", " ('distant', 0.64551913157069074),\n", " ('manhattan', 0.64435701639051324),\n", " ('personal', 0.64355023942057321),\n", " ('pushing', 0.64313675998528386),\n", " ('meeting', 0.64313675998528386),\n", " ('individual', 0.64313675998528386),\n", " ('pleasant', 0.64250344774119039),\n", " ('brave', 0.64185388617239469),\n", " ('william', 0.64083139119578469),\n", " ('hudson', 0.64077919504262937),\n", " ('friendly', 0.63949446706762514),\n", " ('eccentric', 0.63907995928966954),\n", " ('awards', 0.63875310849414646),\n", " ('jack', 0.63838309514997038),\n", " ('seeking', 0.63808740337691783),\n", " ('colonel', 0.63757732940513456),\n", " ('divorce', 0.63757732940513456),\n", " ('jane', 0.63443957973316734),\n", " ('keeping', 0.63414883979798953),\n", " ('gives', 0.63383568159497883),\n", " ('ted', 0.63342794585832296),\n", " ('animation', 0.63208692379869902),\n", " ('progress', 0.6317782341836532),\n", " ('concert', 0.63127177684185776),\n", " ('larger', 0.63127177684185776),\n", " ('nation', 0.6296337748376194),\n", " ('albeit', 0.62739580299716491),\n", " ('adapted', 0.62613647027698516),\n", " ('discovers', 0.62542900650499444),\n", " ('classic', 0.62504956428050518),\n", " ('segment', 0.62335141862440335),\n", " ('morgan', 
0.62303761437291871),\n", " ('mouse', 0.62294292188669675),\n", " ('impressive', 0.62211140744319349),\n", " ('artist', 0.62168821657780038),\n", " ('ultimate', 0.62168821657780038),\n", " ('griffith', 0.62117368093485603),\n", " ('emily', 0.62082651898031915),\n", " ('drew', 0.62082651898031915),\n", " ('moved', 0.6197197120051281),\n", " ('profound', 0.61903920840622351),\n", " ('families', 0.61903920840622351),\n", " ('innocent', 0.61851219917136446),\n", " ('versions', 0.61730910416844087),\n", " ('eddie', 0.61691981517206107),\n", " ('criticism', 0.61651395453902935),\n", " ('nature', 0.61594514653194088),\n", " ('recognized', 0.61518563909023349),\n", " ('sexuality', 0.61467556511845012),\n", " ('contract', 0.61400986000122149),\n", " ('brian', 0.61344043794920278),\n", " ('remembered', 0.6131044728864089),\n", " ('determined', 0.6123858239154869),\n", " ('offers', 0.61207935747116349),\n", " ('pleasure', 0.61195702582993206),\n", " ('washington', 0.61180154110599294),\n", " ('images', 0.61159731359583758),\n", " ('games', 0.61067095873570676),\n", " ('academy', 0.60872983874736208),\n", " ('fashioned', 0.60798937221963845),\n", " ('melodrama', 0.60749173598145145),\n", " ('peoples', 0.60613580357031549),\n", " ('charismatic', 0.60613580357031549),\n", " ('rough', 0.60613580357031549),\n", " ('dealing', 0.60517840761398811),\n", " ('fine', 0.60496962268013299),\n", " ('tap', 0.60391604683200273),\n", " ('trio', 0.60157998703445481),\n", " ('russell', 0.60120968523425966),\n", " ('figures', 0.60077386042893011),\n", " ('ward', 0.60005675749393339),\n", " ('shine', 0.59911823091166894),\n", " ('brady', 0.59911823091166894),\n", " ('job', 0.59845562125168661),\n", " ('satisfied', 0.59652034487087369),\n", " ('river', 0.59637962862495086),\n", " ('brown', 0.595773016534769),\n", " ('believable', 0.59566072133302495),\n", " ('bound', 0.59470710774669278),\n", " ('always', 0.59470710774669278),\n", " ('hall', 0.5933967777928858),\n", " ('cook', 
0.5916777203950857),\n", " ('claire', 0.59136448625000293),\n", " ('broadway', 0.59033768669372433),\n", " ('anna', 0.58778666490211906),\n", " ('peace', 0.58628403501758408),\n", " ('visually', 0.58539431926349916),\n", " ('falk', 0.58525821854876026),\n", " ('morality', 0.58525821854876026),\n", " ('growing', 0.58466653756587539),\n", " ('experiences', 0.58314628534561685),\n", " ('stood', 0.58314628534561685),\n", " ('touch', 0.58122926435596001),\n", " ('lives', 0.5810976767513224),\n", " ('kubrick', 0.58066919713325493),\n", " ('timing', 0.58047401805583243),\n", " ('struggles', 0.57981849525294216),\n", " ('expressions', 0.57981849525294216),\n", " ('authentic', 0.57848427223980559),\n", " ('helen', 0.57763429343810091),\n", " ('pre', 0.57700753064729182),\n", " ('quirky', 0.5753641449035618),\n", " ('young', 0.57531672344534313),\n", " ('inner', 0.57454143815209846),\n", " ('mexico', 0.57443087372056334),\n", " ('clint', 0.57380042292737909),\n", " ('sisters', 0.57286101468544337),\n", " ('realism', 0.57226528899949558),\n", " ('personalities', 0.5720692490067093),\n", " ('french', 0.5720692490067093),\n", " ('surprises', 0.57113222999698177),\n", " ('adventures', 0.57113222999698177),\n", " ('overcome', 0.5697681593994407),\n", " ('timothy', 0.56953322459276867),\n", " ('tales', 0.56909453188996639),\n", " ('war', 0.56843317302781682),\n", " ('civil', 0.5679840376059393),\n", " ('countries', 0.56737779327091187),\n", " ('streep', 0.56710645966458029),\n", " ('tradition', 0.56685345523565323),\n", " ('oliver', 0.56673325570428668),\n", " ('australia', 0.56580775818334383),\n", " ('understanding', 0.56531380905006046),\n", " ('players', 0.56509525370004821),\n", " ('knowing', 0.56489284503626647),\n", " ('rogers', 0.56421349718405212),\n", " ('suspenseful', 0.56368911332305849),\n", " ('variety', 0.56368911332305849),\n", " ('true', 0.56281525180810066),\n", " ('jr', 0.56220982311246936),\n", " ('psychological', 0.56108745854687891),\n", " ('branagh', 
0.55961578793542266),\n", " ('wealth', 0.55961578793542266),\n", " ('performing', 0.55961578793542266),\n", " ('odds', 0.55961578793542266),\n", " ('sent', 0.55961578793542266),\n", " ('reminiscent', 0.55961578793542266),\n", " ('grand', 0.55961578793542266),\n", " ('overwhelming', 0.55961578793542266),\n", " ('brothers', 0.55891181043362848),\n", " ('howard', 0.55811089675600245),\n", " ('david', 0.55693122256475369),\n", " ('generation', 0.55628799784274796),\n", " ('grow', 0.55612538299565417),\n", " ('survival', 0.55594605904646033),\n", " ('mainstream', 0.55574731115750231),\n", " ('dick', 0.55431073570572953),\n", " ('charm', 0.55288175575407861),\n", " ('kirk', 0.55278982286502287),\n", " ('twists', 0.55244729845681018),\n", " ('gangster', 0.55206858230003986),\n", " ('jeff', 0.55179306225421365),\n", " ('family', 0.55116244510065526),\n", " ('tend', 0.55053307336110335),\n", " ('thanks', 0.55049088015842218),\n", " ('world', 0.54744234723432639),\n", " ('sutherland', 0.54743536937855164),\n", " ('life', 0.54695514434959924),\n", " ('disc', 0.54654370636806993),\n", " ('bug', 0.54654370636806993),\n", " ('tribute', 0.5455111817538808),\n", " ('europe', 0.54522705048332309),\n", " ('sacrifice', 0.54430155296238014),\n", " ('color', 0.54405127139431109),\n", " ('superior', 0.54333490233128523),\n", " ('york', 0.54318235866536513),\n", " ('pulls', 0.54266622962164945),\n", " ('hearts', 0.54232429082536171),\n", " ('jackson', 0.54232429082536171),\n", " ('enjoy', 0.54124285135906114),\n", " ('redemption', 0.54056759296472823),\n", " ('madness', 0.540384426007535),\n", " ('hamilton', 0.5389965007326869),\n", " ('stands', 0.5389965007326869),\n", " ('trial', 0.5389965007326869),\n", " ('greek', 0.5389965007326869),\n", " ('each', 0.5388212312554177),\n", " ('faithful', 0.53773307668591508),\n", " ('received', 0.5372768098531604),\n", " ('jealous', 0.53714293208336406),\n", " ('documentaries', 0.53714293208336406),\n", " ('different', 0.53709860682460819),\n", " 
('describes', 0.53680111016925136),\n", " ('shorts', 0.53596159703753288),\n", " ('brilliance', 0.53551823635636209),\n", " ('mountains', 0.53492317534505118),\n", " ('share', 0.53408248593025787),\n", " ('dealt', 0.53408248593025787),\n", " ('providing', 0.53329847961804933),\n", " ('explore', 0.53329847961804933),\n", " ('series', 0.5325809226575603),\n", " ('fellow', 0.5323318289869543),\n", " ('loves', 0.53062825106217038),\n", " ('olivier', 0.53062825106217038),\n", " ('revolution', 0.53062825106217038),\n", " ('roman', 0.53062825106217038),\n", " ('century', 0.53002783074992665),\n", " ('musical', 0.52966871156747064),\n", " ('heroic', 0.52925932545482868),\n", " ('ironically', 0.52806743020049673),\n", " ('approach', 0.52806743020049673),\n", " ('temple', 0.52806743020049673),\n", " ('moves', 0.5279372642387119),\n", " ('gift', 0.52702030968597136),\n", " ('julie', 0.52609309589677911),\n", " ('tells', 0.52415107836314001),\n", " ('radio', 0.52394671172868779),\n", " ('uncle', 0.52354439617376536),\n", " ('union', 0.52324814376454787),\n", " ('deep', 0.52309571635780505),\n", " ('reminds', 0.52157841554225237),\n", " ('famous', 0.52118841080153722),\n", " ('jazz', 0.52053443789295151),\n", " ('dennis', 0.51987545928590861),\n", " ('epic', 0.51919387343650736),\n", " ('adult', 0.519167695083386),\n", " ('shows', 0.51915322220375304),\n", " ('performed', 0.5191244265806858),\n", " ('demons', 0.5191244265806858),\n", " ('eric', 0.51879379341516751),\n", " ('discovered', 0.51879379341516751),\n", " ('youth', 0.5185626062681431),\n", " ('human', 0.51851411224987087),\n", " ('tarzan', 0.51813827061227724),\n", " ('ourselves', 0.51794309153485463),\n", " ('wwii', 0.51758240622887042),\n", " ('passion', 0.5162164724008671),\n", " ('desire', 0.51607497965213445),\n", " ('pays', 0.51581316527702981),\n", " ('fox', 0.51557622652458857),\n", " ('dirty', 0.51557622652458857),\n", " ('symbolism', 0.51546600332249293),\n", " ('sympathetic', 0.51546600332249293),\n", " 
('attitude', 0.51530993621331933),\n", " ('appearances', 0.51466440007315639),\n", " ('jeremy', 0.51466440007315639),\n", " ('fun', 0.51439068993048687),\n", " ('south', 0.51420972175023116),\n", " ('arrives', 0.51409894911095988),\n", " ('present', 0.51341965894303732),\n", " ('com', 0.51326167856387173),\n", " ('smile', 0.51265880484765169),\n", " ('fits', 0.51082562376599072),\n", " ('provided', 0.51082562376599072),\n", " ('carter', 0.51082562376599072),\n", " ('ring', 0.51082562376599072),\n", " ('aging', 0.51082562376599072),\n", " ('countryside', 0.51082562376599072),\n", " ('alan', 0.51082562376599072),\n", " ('visit', 0.51082562376599072),\n", " ('begins', 0.51015650363396647),\n", " ('success', 0.50900578704900468),\n", " ('japan', 0.50900578704900468),\n", " ('accurate', 0.50895471583017893),\n", " ('proud', 0.50800474742434931),\n", " ('daily', 0.5075946031845443),\n", " ('atmospheric', 0.50724780241810674),\n", " ('karloff', 0.50724780241810674),\n", " ('recently', 0.50714914903668207),\n", " ('fu', 0.50704490092608467),\n", " ('horrors', 0.50656122497953315),\n", " ('finding', 0.50637127341661037),\n", " ('lust', 0.5059356384717989),\n", " ('hitchcock', 0.50574947073413001),\n", " ('among', 0.50334004951332734),\n", " ('viewing', 0.50302139827440906),\n", " ('shining', 0.50262885656181222),\n", " ('investigation', 0.50262885656181222),\n", " ('duo', 0.5020919437972361),\n", " ('cameron', 0.5020919437972361),\n", " ('finds', 0.50128303100539795),\n", " ('contemporary', 0.50077528791248915),\n", " ('genuine', 0.50046283673044401),\n", " ('frightening', 0.49995595152908684),\n", " ('plays', 0.49975983848890226),\n", " ('age', 0.49941323171424595),\n", " ('position', 0.49899116611898781),\n", " ('continues', 0.49863035067217237),\n", " ('roles', 0.49839716550752178),\n", " ('james', 0.49837216269470402),\n", " ('individuals', 0.49824684155913052),\n", " ('brought', 0.49783842823917956),\n", " ('hilarious', 0.49714551986191058),\n", " ('brutal', 
0.49681488669639234),\n", " ('appropriate', 0.49643688631389105),\n", " ('dance', 0.49581998314812048),\n", " ('league', 0.49578774640145024),\n", " ('helping', 0.49578774640145024),\n", " ('answers', 0.49578774640145024),\n", " ('stunts', 0.49561620510246196),\n", " ('traveling', 0.49532143723002542),\n", " ('thoroughly', 0.49414593456733524),\n", " ('depicted', 0.49317068852726992),\n", " ('honor', 0.49247648509779424),\n", " ('combination', 0.49247648509779424),\n", " ('differences', 0.49247648509779424),\n", " ('fully', 0.49213349075383811),\n", " ('tracy', 0.49159426183810306),\n", " ('battles', 0.49140753790888908),\n", " ('possibility', 0.49112055268665822),\n", " ('romance', 0.4901589869574316),\n", " ('initially', 0.49002249613622745),\n", " ('happy', 0.4898997500608791),\n", " ('crime', 0.48977221456815834),\n", " ('singing', 0.4893852925281213),\n", " ('especially', 0.48901267837860624),\n", " ('shakespeare', 0.48754793889664511),\n", " ('hugh', 0.48729512635579658),\n", " ('detail', 0.48609484250827351),\n", " ('guide', 0.48550781578170082),\n", " ('companion', 0.48550781578170082),\n", " ('julia', 0.48550781578170082),\n", " ('san', 0.48550781578170082),\n", " ('desperation', 0.48550781578170082),\n", " ('strongly', 0.48460242866688824),\n", " ('necessary', 0.48302334245403883),\n", " ('humanity', 0.48265474679929443),\n", " ('drama', 0.48221998493060503),\n", " ('warming', 0.48183808689273838),\n", " ('intrigue', 0.48183808689273838),\n", " ('nonetheless', 0.48183808689273838),\n", " ('cuba', 0.48183808689273838),\n", " ('planned', 0.47957308026188628),\n", " ('pictures', 0.47929937011921681),\n", " ('broadcast', 0.47849024312305422),\n", " ('nine', 0.47803580094299974),\n", " ('settings', 0.47743860773325364),\n", " ('history', 0.47732966933780852),\n", " ('ordinary', 0.47725880012690741),\n", " ('trade', 0.47692407209030935),\n", " ('primary', 0.47608267532211779),\n", " ('official', 0.47608267532211779),\n", " ('episode', 0.47529620261150429),\n", 
" ('role', 0.47520268270188676),\n", " ('spirit', 0.47477690799839323),\n", " ('grey', 0.47409361449726067),\n", " ('ways', 0.47323464982718205),\n", " ('cup', 0.47260441094579297),\n", " ('piano', 0.47260441094579297),\n", " ('familiar', 0.47241617565111949),\n", " ('sinister', 0.47198579044972683),\n", " ('reveal', 0.47171449364936496),\n", " ('max', 0.47150852042515579),\n", " ('dated', 0.47121648567094482),\n", " ('discovery', 0.47000362924573563),\n", " ('vicious', 0.47000362924573563),\n", " ('losing', 0.47000362924573563),\n", " ('genuinely', 0.46871413841586385),\n", " ('hatred', 0.46734051182625186),\n", " ('mistaken', 0.46702300110759781),\n", " ('dream', 0.46608972992459924),\n", " ('challenge', 0.46608972992459924),\n", " ('crisis', 0.46575733836428446),\n", " ('photographed', 0.46488852857896512),\n", " ('machines', 0.46430560813109778),\n", " ('critics', 0.46430560813109778),\n", " ('bird', 0.46430560813109778),\n", " ('born', 0.46411383518967209),\n", " ('detective', 0.4636633473511525),\n", " ('higher', 0.46328467899699055),\n", " ('remains', 0.46262352194811296),\n", " ('inevitable', 0.46262352194811296),\n", " ('soviet', 0.4618180446592961),\n", " ('ryan', 0.46134556650262099),\n", " ('african', 0.46112595521371813),\n", " ('smaller', 0.46081520319132935),\n", " ('techniques', 0.46052488529119184),\n", " ('information', 0.46034171833399862),\n", " ('deserved', 0.45999798712841444),\n", " ('cynical', 0.45953232937844013),\n", " ('lynch', 0.45953232937844013),\n", " ('francisco', 0.45953232937844013),\n", " ('tour', 0.45953232937844013),\n", " ('spielberg', 0.45953232937844013),\n", " ('struggle', 0.45911782160048453),\n", " ('language', 0.45902121257712653),\n", " ('visual', 0.45823514408822852),\n", " ('warner', 0.45724137763188427),\n", " ('social', 0.45720078250735313),\n", " ('reality', 0.45719346885019546),\n", " ('hidden', 0.45675840249571492),\n", " ('breaking', 0.45601738727099561),\n", " ('sometimes', 0.45563021171182794),\n", " ('modern', 
0.45500247579345005),\n", " ('surfing', 0.45425527227759638),\n", " ('popular', 0.45410691533051023),\n", " ('surprised', 0.4534409399850382),\n", " ('follows', 0.45245361754408348),\n", " ('keeps', 0.45234869400701483),\n", " ('john', 0.4520909494482197),\n", " ('defeat', 0.45198512374305722),\n", " ('mixed', 0.45198512374305722),\n", " ('justice', 0.45142724367280018),\n", " ('treasure', 0.45083371313801535),\n", " ('presents', 0.44973793178615257),\n", " ('years', 0.44919197032104968),\n", " ('chief', 0.44895022004790319),\n", " ('shadows', 0.44802472252696035),\n", " ('closely', 0.44701411102103689),\n", " ('segments', 0.44701411102103689),\n", " ('lose', 0.44658335503763702),\n", " ('caine', 0.44628710262841953),\n", " ('caught', 0.44610275383999071),\n", " ('hamlet', 0.44558510189758965),\n", " ('chinese', 0.44507424620321018),\n", " ('welcome', 0.44438052435783792),\n", " ('birth', 0.44368632092836219),\n", " ('represents', 0.44320543609101143),\n", " ('puts', 0.44279106572085081),\n", " ('fame', 0.44183275227903923),\n", " ('closer', 0.44183275227903923),\n", " ('visuals', 0.44183275227903923),\n", " ('web', 0.44183275227903923),\n", " ('criminal', 0.4412745608048752),\n", " ('minor', 0.4409224199448939),\n", " ('jon', 0.44086703515908027),\n", " ('liked', 0.44074991514020723),\n", " ('restaurant', 0.44031183943833246),\n", " ('flaws', 0.43983275161237217),\n", " ('de', 0.43983275161237217),\n", " ('searching', 0.4393666597838457),\n", " ('rap', 0.43891304217570443),\n", " ('light', 0.43884433018199892),\n", " ('elizabeth', 0.43872232986464677),\n", " ('marry', 0.43861731542506488),\n", " ('oz', 0.43825493093115531),\n", " ('controversial', 0.43825493093115531),\n", " ('learned', 0.43825493093115531),\n", " ('slowly', 0.43785660389939979),\n", " ('bridge', 0.43721380642274466),\n", " ('thrilling', 0.43721380642274466),\n", " ('wayne', 0.43721380642274466),\n", " ('comedic', 0.43721380642274466),\n", " ('married', 0.43658501682196887),\n", " ('nazi', 
0.4361020775700542),\n", " ('murder', 0.4353180712578455),\n", " ('physical', 0.4353180712578455),\n", " ('johnny', 0.43483971678806865),\n", " ('michelle', 0.43445264498141672),\n", " ('wallace', 0.43403848055222038),\n", " ('silent', 0.43395706390247063),\n", " ('comedies', 0.43395706390247063),\n", " ('played', 0.43387244114515305),\n", " ('international', 0.43363598507486073),\n", " ('vision', 0.43286408229627887),\n", " ('intelligent', 0.43196704885367099),\n", " ('shop', 0.43078291609245434),\n", " ('also', 0.43036720209769169),\n", " ('levels', 0.4302451371066513),\n", " ('miss', 0.43006426712153217),\n", " ('ocean', 0.4295626596872249),\n", " ...]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# words with the highest pos_neg_ratios scores, i.e. most strongly associated with a \"POSITIVE\" label\n", "pos_neg_ratios.most_common()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "[('boll', -4.0778152602708904),\n", " ('uwe', -3.9218753018711578),\n", " ('seagal', -3.3202501058581921),\n", " ('unwatchable', -3.0269848170580955),\n", " ('stinker', -2.9876839403711624),\n", " ('mst', -2.7753833211707968),\n", " ('incoherent', -2.7641396677532537),\n", " ('unfunny', -2.5545257844967644),\n", " ('waste', -2.4907515123361046),\n", " ('blah', -2.4475792789485005),\n", " ('horrid', -2.3715779644809971),\n", " ('pointless', -2.3451073877136341),\n", " ('atrocious', -2.3187369339642556),\n", " ('redeeming', -2.2667790015910296),\n", " ('prom', -2.2601040980178784),\n", " ('drivel', -2.2476029585766928),\n", " ('lousy', -2.2118080125207054),\n", " ('worst', -2.1930856334332267),\n", " ('laughable', -2.172468615469592),\n", " ('awful', -2.1385076866397488),\n", " ('poorly', -2.1326133844207011),\n", " ('wasting', -2.1178155545614512),\n", " ('remotely', -2.111046881095167),\n", " ('existent', -2.0024805005437076),\n", " ('boredom', -1.9241486572738005),\n", "
('miserably', -1.9216610938019989),\n", " ('sucks', -1.9166645809588516),\n", " ('uninspired', -1.9131499212248517),\n", " ('lame', -1.9117232884159072),\n", " ('insult', -1.9085323769376259)]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# the 30 words with the lowest pos_neg_ratios scores, i.e. most strongly associated with a \"NEGATIVE\" label\n", "list(reversed(pos_neg_ratios.most_common()))[0:30]" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }