{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sentiment Classification & How To \"Frame Problems\" for a Neural Network\n",
"\n",
"by Andrew Trask\n",
"\n",
"- **Twitter**: @iamtrask\n",
"- **Blog**: http://iamtrask.github.io"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What You Should Already Know\n",
"\n",
"- neural networks, forward and back-propagation\n",
"- stochastic gradient descent\n",
"- mean squared error\n",
"- train/test splits\n",
"\n",
"### Where to Get Help if You Need It\n",
"- Re-watch previous Udacity Lectures\n",
"- Leverage the recommended Course Reading Material - [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (40% Off: **traskud17**)\n",
"- Shoot me a tweet @iamtrask\n",
"\n",
"\n",
"### Tutorial Outline:\n",
"\n",
"- Intro: The Importance of \"Framing a Problem\"\n",
"\n",
"\n",
"- Curate a Dataset\n",
"- Developing a \"Predictive Theory\"\n",
"- **PROJECT 1**: Quick Theory Validation\n",
"\n",
"\n",
"- Transforming Text to Numbers\n",
"- **PROJECT 2**: Creating the Input/Output Data\n",
"\n",
"\n",
"- Putting it all together in a Neural Network\n",
"- **PROJECT 3**: Building our Neural Network\n",
"\n",
"\n",
"- Understanding Neural Noise\n",
"- **PROJECT 4**: Making Learning Faster by Reducing Noise\n",
"\n",
"\n",
"- Analyzing Inefficiencies in our Network\n",
"- **PROJECT 5**: Making our Network Train and Run Faster\n",
"\n",
"\n",
"- Further Noise Reduction\n",
"- **PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary\n",
"\n",
"\n",
"- Analysis: What's going on in the weights?"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "56bb3cba-260c-4ebe-9ed6-b995b4c72aa3"
}
},
"source": [
"# Lesson: Curate a Dataset"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "eba2b193-0419-431e-8db9-60f34dd3fe83"
}
},
"outputs": [],
"source": [
"def pretty_print_review_and_label(i):\n",
"    print(labels[i] + \"\\t:\\t\" + reviews[i][:80] + \"...\")\n",
"\n",
"g = open('reviews.txt','r') # What we know!\n",
"reviews = list(map(lambda x:x[:-1],g.readlines()))\n",
"g.close()\n",
"\n",
"g = open('labels.txt','r') # What we WANT to know!\n",
"labels = list(map(lambda x:x[:-1].upper(),g.readlines()))\n",
"g.close()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"25000"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(reviews)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "bb95574b-21a0-4213-ae50-34363cf4f87f"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life such as teachers . my years in the teaching profession lead me to believe that bromwell high s satire is much closer to reality than is teachers . the scramble to survive financially the insightful students who can see right through their pathetic teachers pomp the pettiness of the whole situation all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn t '"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reviews[0]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "e0408810-c424-4ed4-afb9-1735e9ddbd0a"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'POSITIVE'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"labels[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lesson: Develop a Predictive Theory"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "e67a709f-234f-4493-bae6-4fb192141ee0"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"labels.txt \t : \t reviews.txt\n",
"\n",
"NEGATIVE\t:\tthis movie is terrible but it has some good effects . ...\n",
"POSITIVE\t:\tadrian pasdar is excellent is this film . he makes a fascinating woman . ...\n",
"NEGATIVE\t:\tcomment this movie is impossible . is terrible very improbable bad interpretat...\n",
"POSITIVE\t:\texcellent episode movie ala pulp fiction . days suicides . it doesnt get more...\n",
"NEGATIVE\t:\tif you haven t seen this it s terrible . it is pure trash . i saw this about ...\n",
"POSITIVE\t:\tthis schiffer guy is a real genius the movie is of excellent quality and both e...\n"
]
}
],
"source": [
"print(\"labels.txt \\t : \\t reviews.txt\\n\")\n",
"pretty_print_review_and_label(2137)\n",
"pretty_print_review_and_label(12816)\n",
"pretty_print_review_and_label(6267)\n",
"pretty_print_review_and_label(21934)\n",
"pretty_print_review_and_label(5297)\n",
"pretty_print_review_and_label(4998)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Project 1: Quick Theory Validation"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from collections import Counter\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"positive_counts = Counter()\n",
"negative_counts = Counter()\n",
"total_counts = Counter()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"for i in range(len(reviews)):\n",
"    if(labels[i] == 'POSITIVE'):\n",
"        for word in reviews[i].split(\" \"):\n",
"            positive_counts[word] += 1\n",
"            total_counts[word] += 1\n",
"    else:\n",
"        for word in reviews[i].split(\" \"):\n",
"            negative_counts[word] += 1\n",
"            total_counts[word] += 1"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[('', 550468),\n",
" ('the', 173324),\n",
" ('.', 159654),\n",
" ('and', 89722),\n",
" ('a', 83688),\n",
" ('of', 76855),\n",
" ('to', 66746),\n",
" ('is', 57245),\n",
" ('in', 50215),\n",
" ('br', 49235),\n",
" ('it', 48025),\n",
" ('i', 40743),\n",
" ('that', 35630),\n",
" ('this', 35080),\n",
" ('s', 33815),\n",
" ('as', 26308),\n",
" ('with', 23247),\n",
" ('for', 22416),\n",
" ('was', 21917),\n",
" ('film', 20937),\n",
" ('but', 20822),\n",
" ('movie', 19074),\n",
" ('his', 17227),\n",
" ('on', 17008),\n",
" ('you', 16681),\n",
" ('he', 16282),\n",
" ('are', 14807),\n",
" ('not', 14272),\n",
" ('t', 13720),\n",
" ('one', 13655),\n",
" ('have', 12587),\n",
" ('be', 12416),\n",
" ('by', 11997),\n",
" ('all', 11942),\n",
" ('who', 11464),\n",
" ('an', 11294),\n",
" ('at', 11234),\n",
" ('from', 10767),\n",
" ('her', 10474),\n",
" ('they', 9895),\n",
" ('has', 9186),\n",
" ('so', 9154),\n",
" ('like', 9038),\n",
" ('about', 8313),\n",
" ('very', 8305),\n",
" ('out', 8134),\n",
" ('there', 8057),\n",
" ('she', 7779),\n",
" ('what', 7737),\n",
" ('or', 7732),\n",
" ('good', 7720),\n",
" ('more', 7521),\n",
" ('when', 7456),\n",
" ('some', 7441),\n",
" ('if', 7285),\n",
" ('just', 7152),\n",
" ('can', 7001),\n",
" ('story', 6780),\n",
" ('time', 6515),\n",
" ('my', 6488),\n",
" ('great', 6419),\n",
" ('well', 6405),\n",
" ('up', 6321),\n",
" ('which', 6267),\n",
" ('their', 6107),\n",
" ('see', 6026),\n",
" ('also', 5550),\n",
" ('we', 5531),\n",
" ('really', 5476),\n",
" ('would', 5400),\n",
" ('will', 5218),\n",
" ('me', 5167),\n",
" ('had', 5148),\n",
" ('only', 5137),\n",
" ('him', 5018),\n",
" ('even', 4964),\n",
" ('most', 4864),\n",
" ('other', 4858),\n",
" ('were', 4782),\n",
" ('first', 4755),\n",
" ('than', 4736),\n",
" ('much', 4685),\n",
" ('its', 4622),\n",
" ('no', 4574),\n",
" ('into', 4544),\n",
" ('people', 4479),\n",
" ('best', 4319),\n",
" ('love', 4301),\n",
" ('get', 4272),\n",
" ('how', 4213),\n",
" ('life', 4199),\n",
" ('been', 4189),\n",
" ('because', 4079),\n",
" ('way', 4036),\n",
" ('do', 3941),\n",
" ('made', 3823),\n",
" ('films', 3813),\n",
" ('them', 3805),\n",
" ('after', 3800),\n",
" ('many', 3766),\n",
" ('two', 3733),\n",
" ('too', 3659),\n",
" ('think', 3655),\n",
" ('movies', 3586),\n",
" ('characters', 3560),\n",
" ('character', 3514),\n",
" ('don', 3468),\n",
" ('man', 3460),\n",
" ('show', 3432),\n",
" ('watch', 3424),\n",
" ('seen', 3414),\n",
" ('then', 3358),\n",
" ('little', 3341),\n",
" ('still', 3340),\n",
" ('make', 3303),\n",
" ('could', 3237),\n",
" ('never', 3226),\n",
" ('being', 3217),\n",
" ('where', 3173),\n",
" ('does', 3069),\n",
" ('over', 3017),\n",
" ('any', 3002),\n",
" ('while', 2899),\n",
" ('know', 2833),\n",
" ('did', 2790),\n",
" ('years', 2758),\n",
" ('here', 2740),\n",
" ('ever', 2734),\n",
" ('end', 2696),\n",
" ('these', 2694),\n",
" ('such', 2590),\n",
" ('real', 2568),\n",
" ('scene', 2567),\n",
" ('back', 2547),\n",
" ('those', 2485),\n",
" ('though', 2475),\n",
" ('off', 2463),\n",
" ('new', 2458),\n",
" ('your', 2453),\n",
" ('go', 2440),\n",
" ('acting', 2437),\n",
" ('plot', 2432),\n",
" ('world', 2429),\n",
" ('scenes', 2427),\n",
" ('say', 2414),\n",
" ('through', 2409),\n",
" ('makes', 2390),\n",
" ('better', 2381),\n",
" ('now', 2368),\n",
" ('work', 2346),\n",
" ('young', 2343),\n",
" ('old', 2311),\n",
" ('ve', 2307),\n",
" ('find', 2272),\n",
" ('both', 2248),\n",
" ('before', 2177),\n",
" ('us', 2162),\n",
" ('again', 2158),\n",
" ('series', 2153),\n",
" ('quite', 2143),\n",
" ('something', 2135),\n",
" ('cast', 2133),\n",
" ('should', 2121),\n",
" ('part', 2098),\n",
" ('always', 2088),\n",
" ('lot', 2087),\n",
" ('another', 2075),\n",
" ('actors', 2047),\n",
" ('director', 2040),\n",
" ('family', 2032),\n",
" ('between', 2016),\n",
" ('own', 2016),\n",
" ('m', 1998),\n",
" ('may', 1997),\n",
" ('same', 1972),\n",
" ('role', 1967),\n",
" ('watching', 1966),\n",
" ('every', 1954),\n",
" ('funny', 1953),\n",
" ('doesn', 1935),\n",
" ('performance', 1928),\n",
" ('few', 1918),\n",
" ('bad', 1907),\n",
" ('look', 1900),\n",
" ('re', 1884),\n",
" ('why', 1855),\n",
" ('things', 1849),\n",
" ('times', 1832),\n",
" ('big', 1815),\n",
" ('however', 1795),\n",
" ('actually', 1790),\n",
" ('action', 1789),\n",
" ('going', 1783),\n",
" ('bit', 1757),\n",
" ('comedy', 1742),\n",
" ('down', 1740),\n",
" ('music', 1738),\n",
" ('must', 1728),\n",
" ('take', 1709),\n",
" ('saw', 1692),\n",
" ('long', 1690),\n",
" ('right', 1688),\n",
" ('fun', 1686),\n",
" ('fact', 1684),\n",
" ('excellent', 1683),\n",
" ('around', 1674),\n",
" ('didn', 1672),\n",
" ('without', 1671),\n",
" ('thing', 1662),\n",
" ('thought', 1639),\n",
" ('got', 1635),\n",
" ('each', 1630),\n",
" ('day', 1614),\n",
" ('feel', 1597),\n",
" ('seems', 1596),\n",
" ('come', 1594),\n",
" ('done', 1586),\n",
" ('beautiful', 1580),\n",
" ('especially', 1572),\n",
" ('played', 1571),\n",
" ('almost', 1566),\n",
" ('want', 1562),\n",
" ('yet', 1556),\n",
" ('give', 1553),\n",
" ('pretty', 1549),\n",
" ('last', 1543),\n",
" ('since', 1519),\n",
" ('different', 1504),\n",
" ('although', 1501),\n",
" ('gets', 1490),\n",
" ('true', 1487),\n",
" ('interesting', 1481),\n",
" ('job', 1470),\n",
" ('enough', 1455),\n",
" ('our', 1454),\n",
" ('shows', 1447),\n",
" ('horror', 1441),\n",
" ('woman', 1439),\n",
" ('tv', 1400),\n",
" ('probably', 1398),\n",
" ('father', 1395),\n",
" ('original', 1393),\n",
" ('girl', 1390),\n",
" ('point', 1379),\n",
" ('plays', 1378),\n",
" ('wonderful', 1372),\n",
" ('far', 1358),\n",
" ('course', 1358),\n",
" ('john', 1350),\n",
" ('rather', 1340),\n",
" ('isn', 1328),\n",
" ('ll', 1326),\n",
" ('later', 1324),\n",
" ('dvd', 1324),\n",
" ('war', 1310),\n",
" ('whole', 1310),\n",
" ('d', 1307),\n",
" ('away', 1306),\n",
" ('found', 1306),\n",
" ('screen', 1305),\n",
" ('nothing', 1300),\n",
" ('year', 1297),\n",
" ('once', 1296),\n",
" ('hard', 1294),\n",
" ('together', 1280),\n",
" ('am', 1277),\n",
" ('set', 1277),\n",
" ('having', 1266),\n",
" ('making', 1265),\n",
" ('place', 1263),\n",
" ('comes', 1260),\n",
" ('might', 1260),\n",
" ('sure', 1253),\n",
" ('american', 1248),\n",
" ('play', 1245),\n",
" ('kind', 1244),\n",
" ('takes', 1242),\n",
" ('perfect', 1242),\n",
" ('performances', 1237),\n",
" ('himself', 1230),\n",
" ('worth', 1221),\n",
" ('everyone', 1221),\n",
" ('anyone', 1214),\n",
" ('actor', 1203),\n",
" ('three', 1201),\n",
" ('wife', 1196),\n",
" ('classic', 1192),\n",
" ('goes', 1186),\n",
" ('ending', 1178),\n",
" ('version', 1168),\n",
" ('star', 1149),\n",
" ('enjoy', 1146),\n",
" ('book', 1142),\n",
" ('nice', 1132),\n",
" ('everything', 1128),\n",
" ('during', 1124),\n",
" ('put', 1118),\n",
" ('seeing', 1111),\n",
" ('least', 1102),\n",
" ('house', 1100),\n",
" ('high', 1095),\n",
" ('watched', 1094),\n",
" ('men', 1087),\n",
" ('loved', 1087),\n",
" ('night', 1082),\n",
" ('anything', 1075),\n",
" ('guy', 1071),\n",
" ('believe', 1071),\n",
" ('top', 1063),\n",
" ('amazing', 1058),\n",
" ('hollywood', 1056),\n",
" ('looking', 1053),\n",
" ('main', 1044),\n",
" ('definitely', 1043),\n",
" ('gives', 1031),\n",
" ('home', 1029),\n",
" ('seem', 1028),\n",
" ('episode', 1023),\n",
" ('sense', 1020),\n",
" ('audience', 1020),\n",
" ('truly', 1017),\n",
" ('special', 1011),\n",
" ('fan', 1009),\n",
" ('second', 1009),\n",
" ('short', 1009),\n",
" ('mind', 1005),\n",
" ('human', 1001),\n",
" ('recommend', 999),\n",
" ('full', 996),\n",
" ('black', 995),\n",
" ('help', 991),\n",
" ('along', 989),\n",
" ('trying', 987),\n",
" ('small', 986),\n",
" ('death', 985),\n",
" ('friends', 981),\n",
" ('remember', 974),\n",
" ('often', 970),\n",
" ('said', 966),\n",
" ('favorite', 962),\n",
" ('heart', 959),\n",
" ('early', 957),\n",
" ('left', 956),\n",
" ('until', 955),\n",
" ('let', 954),\n",
" ('script', 954),\n",
" ('maybe', 937),\n",
" ('today', 936),\n",
" ('live', 934),\n",
" ('less', 934),\n",
" ('moments', 933),\n",
" ('others', 929),\n",
" ('brilliant', 926),\n",
" ('shot', 925),\n",
" ('liked', 923),\n",
" ('become', 916),\n",
" ('won', 915),\n",
" ('used', 910),\n",
" ('style', 907),\n",
" ('mother', 895),\n",
" ('lives', 894),\n",
" ('came', 893),\n",
" ('stars', 890),\n",
" ('cinema', 889),\n",
" ('looks', 885),\n",
" ('perhaps', 884),\n",
" ('read', 882),\n",
" ('enjoyed', 879),\n",
" ('boy', 875),\n",
" ('drama', 873),\n",
" ('highly', 871),\n",
" ('given', 870),\n",
" ('playing', 867),\n",
" ('use', 864),\n",
" ('next', 859),\n",
" ('women', 858),\n",
" ('fine', 857),\n",
" ('effects', 856),\n",
" ('kids', 854),\n",
" ('entertaining', 853),\n",
" ('need', 852),\n",
" ('line', 850),\n",
" ('works', 848),\n",
" ('someone', 847),\n",
" ('mr', 836),\n",
" ('simply', 835),\n",
" ('children', 833),\n",
" ('picture', 833),\n",
" ('face', 831),\n",
" ('friend', 831),\n",
" ('keep', 831),\n",
" ('dark', 830),\n",
" ('overall', 828),\n",
" ('certainly', 828),\n",
" ('minutes', 827),\n",
" ('wasn', 824),\n",
" ('history', 822),\n",
" ('finally', 820),\n",
" ('couple', 816),\n",
" ('against', 815),\n",
" ('son', 809),\n",
" ('understand', 808),\n",
" ('lost', 807),\n",
" ('michael', 805),\n",
" ('else', 801),\n",
" ('throughout', 798),\n",
" ('fans', 797),\n",
" ('city', 792),\n",
" ('reason', 789),\n",
" ('written', 787),\n",
" ('production', 787),\n",
" ('several', 784),\n",
" ('school', 783),\n",
" ('rest', 781),\n",
" ('based', 781),\n",
" ('try', 780),\n",
" ('dead', 776),\n",
" ('hope', 775),\n",
" ('strong', 768),\n",
" ('white', 765),\n",
" ('tell', 759),\n",
" ('itself', 758),\n",
" ('half', 753),\n",
" ('person', 749),\n",
" ('sometimes', 746),\n",
" ('past', 744),\n",
" ('start', 744),\n",
" ('genre', 743),\n",
" ('final', 739),\n",
" ('beginning', 739),\n",
" ('town', 738),\n",
" ('art', 734),\n",
" ('game', 732),\n",
" ('humor', 732),\n",
" ('yes', 731),\n",
" ('idea', 731),\n",
" ('late', 730),\n",
" ('becomes', 729),\n",
" ('despite', 729),\n",
" ('able', 726),\n",
" ('case', 726),\n",
" ('money', 723),\n",
" ('child', 721),\n",
" ('completely', 721),\n",
" ('side', 719),\n",
" ('camera', 716),\n",
" ('getting', 714),\n",
" ('instead', 712),\n",
" ('soon', 702),\n",
" ('under', 700),\n",
" ('viewer', 699),\n",
" ('age', 697),\n",
" ('days', 696),\n",
" ('stories', 696),\n",
" ('felt', 694),\n",
" ('simple', 694),\n",
" ('roles', 693),\n",
" ('video', 688),\n",
" ('name', 683),\n",
" ('either', 683),\n",
" ('doing', 677),\n",
" ('turns', 674),\n",
" ('wants', 671),\n",
" ('close', 671),\n",
" ('title', 669),\n",
" ('wrong', 668),\n",
" ('went', 666),\n",
" ('james', 665),\n",
" ('evil', 659),\n",
" ('budget', 657),\n",
" ('episodes', 657),\n",
" ('relationship', 655),\n",
" ('piece', 653),\n",
" ('fantastic', 653),\n",
" ('david', 651),\n",
" ('turn', 648),\n",
" ('murder', 646),\n",
" ('parts', 645),\n",
" ('brother', 644),\n",
" ('head', 643),\n",
" ('absolutely', 643),\n",
" ('experience', 642),\n",
" ('eyes', 641),\n",
" ('sex', 638),\n",
" ('direction', 637),\n",
" ('called', 637),\n",
" ('directed', 636),\n",
" ('lines', 634),\n",
" ('behind', 633),\n",
" ('sort', 632),\n",
" ('actress', 631),\n",
" ('lead', 630),\n",
" ('oscar', 628),\n",
" ('example', 627),\n",
" ('including', 627),\n",
" ('known', 625),\n",
" ('musical', 625),\n",
" ('chance', 621),\n",
" ('score', 620),\n",
" ('feeling', 619),\n",
" ('already', 619),\n",
" ('hit', 619),\n",
" ('voice', 615),\n",
" ('moment', 612),\n",
" ('living', 612),\n",
" ('low', 610),\n",
" ('supporting', 610),\n",
" ('ago', 609),\n",
" ('themselves', 608),\n",
" ('hilarious', 605),\n",
" ('reality', 605),\n",
" ('jack', 604),\n",
" ('told', 603),\n",
" ('hand', 601),\n",
" ('moving', 600),\n",
" ('dialogue', 600),\n",
" ('quality', 600),\n",
" ('song', 599),\n",
" ('happy', 599),\n",
" ('paul', 598),\n",
" ('matter', 598),\n",
" ('light', 594),\n",
" ('future', 593),\n",
" ('entire', 592),\n",
" ('finds', 591),\n",
" ('gave', 589),\n",
" ('laugh', 587),\n",
" ('released', 586),\n",
" ('expect', 584),\n",
" ('fight', 581),\n",
" ('particularly', 580),\n",
" ('cinematography', 579),\n",
" ('police', 579),\n",
" ('whose', 578),\n",
" ('type', 578),\n",
" ('sound', 578),\n",
" ('enjoyable', 573),\n",
" ('view', 573),\n",
" ('husband', 572),\n",
" ('romantic', 572),\n",
" ('number', 572),\n",
" ('daughter', 572),\n",
" ('documentary', 571),\n",
" ('self', 570),\n",
" ('modern', 569),\n",
" ('robert', 569),\n",
" ('took', 569),\n",
" ('superb', 569),\n",
" ('mean', 566),\n",
" ('shown', 563),\n",
" ('coming', 561),\n",
" ('important', 560),\n",
" ('king', 559),\n",
" ('leave', 559),\n",
" ('change', 558),\n",
" ('wanted', 555),\n",
" ('somewhat', 555),\n",
" ('tells', 554),\n",
" ('run', 552),\n",
" ('events', 552),\n",
" ('country', 552),\n",
" ('career', 552),\n",
" ('heard', 550),\n",
" ('season', 550),\n",
" ('girls', 549),\n",
" ('greatest', 549),\n",
" ('etc', 547),\n",
" ('care', 546),\n",
" ('starts', 545),\n",
" ('english', 542),\n",
" ('killer', 541),\n",
" ('animation', 540),\n",
" ('guys', 540),\n",
" ('totally', 540),\n",
" ('tale', 540),\n",
" ('usual', 539),\n",
" ('opinion', 535),\n",
" ('miss', 535),\n",
" ('violence', 531),\n",
" ('easy', 531),\n",
" ('songs', 530),\n",
" ('british', 528),\n",
" ('says', 526),\n",
" ('realistic', 525),\n",
" ('writing', 524),\n",
" ('act', 522),\n",
" ('writer', 522),\n",
" ('comic', 521),\n",
" ('thriller', 519),\n",
" ('television', 517),\n",
" ('power', 516),\n",
" ('ones', 515),\n",
" ('kid', 514),\n",
" ('novel', 513),\n",
" ('york', 513),\n",
" ('problem', 512),\n",
" ('alone', 512),\n",
" ('attention', 509),\n",
" ('involved', 508),\n",
" ('kill', 507),\n",
" ('extremely', 507),\n",
" ('seemed', 506),\n",
" ('hero', 505),\n",
" ('french', 505),\n",
" ('rock', 504),\n",
" ('stuff', 501),\n",
" ('wish', 499),\n",
" ('begins', 498),\n",
" ('taken', 497),\n",
" ('sad', 497),\n",
" ('ways', 496),\n",
" ('richard', 495),\n",
" ('knows', 494),\n",
" ('atmosphere', 493),\n",
" ('surprised', 491),\n",
" ('similar', 491),\n",
" ('taking', 491),\n",
" ('car', 491),\n",
" ('george', 490),\n",
" ('perfectly', 490),\n",
" ('across', 489),\n",
" ('sequence', 489),\n",
" ('eye', 489),\n",
" ('team', 489),\n",
" ('serious', 488),\n",
" ('powerful', 488),\n",
" ('room', 488),\n",
" ('due', 488),\n",
" ('among', 488),\n",
" ('order', 487),\n",
" ('b', 487),\n",
" ('cannot', 487),\n",
" ('strange', 487),\n",
" ('beauty', 486),\n",
" ('famous', 485),\n",
" ('tries', 484),\n",
" ('myself', 484),\n",
" ('happened', 484),\n",
" ('herself', 484),\n",
" ('class', 483),\n",
" ('four', 482),\n",
" ('cool', 481),\n",
" ('release', 479),\n",
" ('anyway', 479),\n",
" ('theme', 479),\n",
" ('opening', 478),\n",
" ('entertainment', 477),\n",
" ('unique', 475),\n",
" ('ends', 475),\n",
" ('slow', 475),\n",
" ('exactly', 475),\n",
" ('red', 474),\n",
" ('o', 474),\n",
" ('level', 474),\n",
" ('easily', 474),\n",
" ('interest', 472),\n",
" ('happen', 471),\n",
" ('crime', 470),\n",
" ('viewing', 468),\n",
" ('memorable', 467),\n",
" ('sets', 467),\n",
" ('group', 466),\n",
" ('stop', 466),\n",
" ('dance', 463),\n",
" ('message', 463),\n",
" ('sister', 463),\n",
" ('working', 463),\n",
" ('problems', 463),\n",
" ('knew', 462),\n",
" ('mystery', 461),\n",
" ('nature', 461),\n",
" ('bring', 460),\n",
" ('believable', 459),\n",
" ('thinking', 459),\n",
" ('brought', 459),\n",
" ('mostly', 458),\n",
" ('couldn', 457),\n",
" ('disney', 457),\n",
" ('society', 456),\n",
" ('within', 455),\n",
" ('lady', 455),\n",
" ('blood', 454),\n",
" ('upon', 453),\n",
" ('viewers', 453),\n",
" ('parents', 453),\n",
" ('meets', 452),\n",
" ('form', 452),\n",
" ('soundtrack', 452),\n",
" ('usually', 452),\n",
" ('tom', 452),\n",
" ('peter', 452),\n",
" ('local', 450),\n",
" ('certain', 448),\n",
" ('follow', 448),\n",
" ('whether', 447),\n",
" ('possible', 446),\n",
" ('emotional', 445),\n",
" ('killed', 444),\n",
" ('de', 444),\n",
" ('above', 444),\n",
" ('middle', 443),\n",
" ('god', 443),\n",
" ('happens', 442),\n",
" ('flick', 442),\n",
" ('needs', 442),\n",
" ('masterpiece', 441),\n",
" ('major', 440),\n",
" ('period', 440),\n",
" ('haven', 439),\n",
" ('named', 439),\n",
" ('th', 438),\n",
" ('particular', 438),\n",
" ('earth', 437),\n",
" ('feature', 437),\n",
" ('stand', 436),\n",
" ('words', 435),\n",
" ('typical', 435),\n",
" ('obviously', 433),\n",
" ('elements', 433),\n",
" ('romance', 431),\n",
" ('jane', 430),\n",
" ('yourself', 427),\n",
" ('showing', 427),\n",
" ('fantasy', 426),\n",
" ('brings', 426),\n",
" ('america', 423),\n",
" ('guess', 423),\n",
" ('huge', 422),\n",
" ('unfortunately', 422),\n",
" ('indeed', 421),\n",
" ('running', 421),\n",
" ('talent', 420),\n",
" ('stage', 419),\n",
" ('started', 418),\n",
" ('sweet', 417),\n",
" ('leads', 417),\n",
" ('japanese', 417),\n",
" ('poor', 416),\n",
" ('deal', 416),\n",
" ('personal', 413),\n",
" ('incredible', 413),\n",
" ('fast', 412),\n",
" ('became', 410),\n",
" ('deep', 410),\n",
" ('hours', 409),\n",
" ('nearly', 408),\n",
" ('dream', 408),\n",
" ('giving', 408),\n",
" ('turned', 407),\n",
" ('clearly', 407),\n",
" ('near', 406),\n",
" ('obvious', 406),\n",
" ('cut', 405),\n",
" ('surprise', 405),\n",
" ('body', 404),\n",
" ('era', 404),\n",
" ('female', 403),\n",
" ('hour', 403),\n",
" ('five', 403),\n",
" ('note', 399),\n",
" ('learn', 398),\n",
" ('truth', 398),\n",
" ('match', 397),\n",
" ('feels', 397),\n",
" ('except', 397),\n",
" ('tony', 397),\n",
" ('filmed', 394),\n",
" ('complete', 394),\n",
" ('clear', 394),\n",
" ('older', 393),\n",
" ('street', 393),\n",
" ('lots', 393),\n",
" ('eventually', 393),\n",
" ('keeps', 393),\n",
" ('buy', 392),\n",
" ('stewart', 391),\n",
" ('william', 391),\n",
" ('joe', 390),\n",
" ('meet', 390),\n",
" ('fall', 390),\n",
" ('shots', 389),\n",
" ('talking', 389),\n",
" ('difficult', 389),\n",
" ('unlike', 389),\n",
" ('rating', 389),\n",
" ('means', 388),\n",
" ('dramatic', 388),\n",
" ('appears', 386),\n",
" ('subject', 386),\n",
" ('wonder', 386),\n",
" ('present', 386),\n",
" ('situation', 386),\n",
" ('comments', 385),\n",
" ('sequences', 383),\n",
" ('general', 383),\n",
" ('lee', 383),\n",
" ('earlier', 382),\n",
" ('points', 382),\n",
" ('check', 379),\n",
" ('gone', 379),\n",
" ('ten', 378),\n",
" ('suspense', 378),\n",
" ('recommended', 378),\n",
" ('business', 377),\n",
" ('third', 377),\n",
" ('talk', 375),\n",
" ('leaves', 375),\n",
" ('beyond', 375),\n",
" ('portrayal', 374),\n",
" ('beautifully', 373),\n",
" ('single', 372),\n",
" ('bill', 372),\n",
" ('word', 371),\n",
" ('plenty', 371),\n",
" ('falls', 370),\n",
" ('whom', 370),\n",
" ('figure', 369),\n",
" ('battle', 369),\n",
" ('scary', 369),\n",
" ('non', 369),\n",
" ('return', 368),\n",
" ('using', 368),\n",
" ('doubt', 367),\n",
" ('add', 367),\n",
" ('hear', 366),\n",
" ('solid', 366),\n",
" ('success', 366),\n",
" ('touching', 365),\n",
" ('political', 365),\n",
" ('oh', 365),\n",
" ('jokes', 365),\n",
" ('awesome', 364),\n",
" ('hell', 364),\n",
" ('boys', 364),\n",
" ('dog', 362),\n",
" ('recently', 362),\n",
" ('sexual', 362),\n",
" ('please', 361),\n",
" ('wouldn', 361),\n",
" ('features', 361),\n",
" ('straight', 361),\n",
" ('lack', 360),\n",
" ('forget', 360),\n",
" ('setting', 360),\n",
" ('mark', 359),\n",
" ('married', 359),\n",
" ('social', 357),\n",
" ('adventure', 356),\n",
" ('interested', 356),\n",
" ('brothers', 355),\n",
" ('sees', 355),\n",
" ('actual', 355),\n",
" ('terrific', 355),\n",
" ('move', 354),\n",
" ('call', 354),\n",
" ('various', 353),\n",
" ('dr', 353),\n",
" ('theater', 353),\n",
" ('animated', 352),\n",
" ('western', 351),\n",
" ('space', 350),\n",
" ('baby', 350),\n",
" ('leading', 348),\n",
" ('disappointed', 348),\n",
" ('portrayed', 346),\n",
" ('aren', 346),\n",
" ('screenplay', 345),\n",
" ('smith', 345),\n",
" ('hate', 344),\n",
" ('towards', 344),\n",
" ('noir', 343),\n",
" ('outstanding', 342),\n",
" ('decent', 342),\n",
" ('kelly', 342),\n",
" ('directors', 341),\n",
" ('journey', 341),\n",
" ('none', 340),\n",
" ('effective', 340),\n",
" ('looked', 340),\n",
" ('caught', 339),\n",
" ('cold', 339),\n",
" ('storyline', 339),\n",
" ('fi', 339),\n",
" ('sci', 339),\n",
" ('mary', 339),\n",
" ('rich', 338),\n",
" ('charming', 338),\n",
" ('harry', 337),\n",
" ('popular', 337),\n",
" ('manages', 337),\n",
" ('rare', 337),\n",
" ('spirit', 336),\n",
" ('open', 335),\n",
" ('appreciate', 335),\n",
" ('basically', 334),\n",
" ('moves', 334),\n",
" ('acted', 334),\n",
" ('deserves', 333),\n",
" ('subtle', 333),\n",
" ('mention', 333),\n",
" ('inside', 333),\n",
" ('pace', 333),\n",
" ('century', 333),\n",
" ('boring', 333),\n",
" ('familiar', 332),\n",
" ('background', 332),\n",
" ('ben', 331),\n",
" ('creepy', 330),\n",
" ('supposed', 330),\n",
" ('secret', 329),\n",
" ('jim', 328),\n",
" ('die', 328),\n",
" ('question', 327),\n",
" ('effect', 327),\n",
" ('natural', 327),\n",
" ('rate', 326),\n",
" ('language', 326),\n",
" ('impressive', 326),\n",
" ('intelligent', 325),\n",
" ('saying', 325),\n",
" ('material', 324),\n",
" ('realize', 324),\n",
" ('telling', 324),\n",
" ('scott', 324),\n",
" ('singing', 323),\n",
" ('dancing', 322),\n",
" ('adult', 321),\n",
" ('imagine', 321),\n",
" ('visual', 321),\n",
" ('kept', 320),\n",
" ('office', 320),\n",
" ('uses', 319),\n",
" ('pure', 318),\n",
" ('wait', 318),\n",
" ('stunning', 318),\n",
" ('copy', 317),\n",
" ('review', 317),\n",
" ('previous', 317),\n",
" ('seriously', 317),\n",
" ('somehow', 316),\n",
" ('created', 316),\n",
" ('magic', 316),\n",
" ('create', 316),\n",
" ('hot', 316),\n",
" ('reading', 316),\n",
" ('crazy', 315),\n",
" ('air', 315),\n",
" ('frank', 315),\n",
" ('stay', 315),\n",
" ('escape', 315),\n",
" ('attempt', 315),\n",
" ('hands', 314),\n",
" ('filled', 313),\n",
" ('surprisingly', 312),\n",
" ('expected', 312),\n",
" ('average', 312),\n",
" ('complex', 311),\n",
" ('studio', 310),\n",
" ('successful', 310),\n",
" ('quickly', 310),\n",
" ('male', 309),\n",
" ('plus', 309),\n",
" ('co', 307),\n",
" ('minute', 306),\n",
" ('images', 306),\n",
" ('casting', 306),\n",
" ('exciting', 306),\n",
" ('following', 306),\n",
" ('members', 305),\n",
" ('german', 305),\n",
" ('e', 305),\n",
" ('reasons', 305),\n",
" ('follows', 305),\n",
" ('themes', 305),\n",
" ('touch', 304),\n",
" ('genius', 304),\n",
" ('free', 304),\n",
" ('edge', 304),\n",
" ('cute', 304),\n",
" ('outside', 303),\n",
" ('ok', 302),\n",
" ('admit', 302),\n",
" ('younger', 302),\n",
" ('reviews', 302),\n",
" ('odd', 301),\n",
" ('fighting', 301),\n",
" ('master', 301),\n",
" ('break', 300),\n",
" ('thanks', 300),\n",
" ('recent', 300),\n",
" ('comment', 300),\n",
" ('apart', 299),\n",
" ('lovely', 298),\n",
" ('begin', 298),\n",
" ('emotions', 298),\n",
" ('doctor', 297),\n",
" ('italian', 297),\n",
" ('party', 297),\n",
" ('la', 296),\n",
" ('missed', 296),\n",
" ...]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"positive_counts.most_common()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"pos_neg_ratios = Counter()\n",
"\n",
"for term,cnt in list(total_counts.most_common()):\n",
" if(cnt > 100):\n",
" pos_neg_ratio = positive_counts[term] / float(negative_counts[term]+1)\n",
" pos_neg_ratios[term] = pos_neg_ratio\n",
"\n",
"for word,ratio in pos_neg_ratios.most_common():\n",
" if(ratio > 1):\n",
" pos_neg_ratios[word] = np.log(ratio)\n",
" else:\n",
" pos_neg_ratios[word] = -np.log((1 / (ratio+0.01)))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[('edie', 4.6913478822291435),\n",
" ('paulie', 4.0775374439057197),\n",
" ('felix', 3.1527360223636558),\n",
" ('polanski', 2.8233610476132043),\n",
" ('matthau', 2.8067217286092401),\n",
" ('victoria', 2.6810215287142909),\n",
" ('mildred', 2.6026896854443837),\n",
" ('gandhi', 2.5389738710582761),\n",
" ('flawless', 2.451005098112319),\n",
" ('superbly', 2.2600254785752498),\n",
" ('perfection', 2.1594842493533721),\n",
" ('astaire', 2.1400661634962708),\n",
" ('captures', 2.0386195471595809),\n",
" ('voight', 2.0301704926730531),\n",
" ('wonderfully', 2.0218960560332353),\n",
" ('powell', 1.9783454248084671),\n",
" ('brosnan', 1.9547990964725592),\n",
" ('lily', 1.9203768470501485),\n",
" ('bakshi', 1.9029851043382795),\n",
" ('lincoln', 1.9014583864844796),\n",
" ('refreshing', 1.8551812956655511),\n",
" ('breathtaking', 1.8481124057791867),\n",
" ('bourne', 1.8478489358790986),\n",
" ('lemmon', 1.8458266904983307),\n",
" ('delightful', 1.8002701588959635),\n",
" ('flynn', 1.7996646487351682),\n",
" ('andrews', 1.7764919970972666),\n",
" ('homer', 1.7692866133759964),\n",
" ('beautifully', 1.7626953362841438),\n",
" ('soccer', 1.7578579175523736),\n",
" ('elvira', 1.7397031072720019),\n",
" ('underrated', 1.7197859696029656),\n",
" ('gripping', 1.7165360479904674),\n",
" ('superb', 1.7091514458966952),\n",
" ('delight', 1.6714733033535532),\n",
" ('welles', 1.6677068205580761),\n",
" ('sadness', 1.663505133704376),\n",
" ('sinatra', 1.6389967146756448),\n",
" ('touching', 1.637217476541176),\n",
" ('timeless', 1.62924053973028),\n",
" ('macy', 1.6211339521972916),\n",
" ('unforgettable', 1.6177367152487956),\n",
" ('favorites', 1.6158688027643908),\n",
" ('stewart', 1.6119987332957739),\n",
" ('hartley', 1.6094379124341003),\n",
" ('sullivan', 1.6094379124341003),\n",
" ('extraordinary', 1.6094379124341003),\n",
" ('brilliantly', 1.5950491749820008),\n",
" ('friendship', 1.5677652160335325),\n",
" ('wonderful', 1.5645425925262093),\n",
" ('palma', 1.5553706911638245),\n",
" ('magnificent', 1.54663701119507),\n",
" ('finest', 1.5462590108125689),\n",
" ('jackie', 1.5439233053234738),\n",
" ('ritter', 1.5404450409471491),\n",
" ('tremendous', 1.5184661342283736),\n",
" ('freedom', 1.5091151908062312),\n",
" ('fantastic', 1.5048433868558566),\n",
" ('terrific', 1.5026699370083942),\n",
" ('noir', 1.493925025312256),\n",
" ('sidney', 1.493925025312256),\n",
" ('outstanding', 1.4910053152089213),\n",
" ('mann', 1.4894785973551214),\n",
" ('pleasantly', 1.4894785973551214),\n",
" ('nancy', 1.488077055429833),\n",
" ('marie', 1.4825711915553104),\n",
" ('marvelous', 1.4739999415389962),\n",
" ('excellent', 1.4647538505723599),\n",
" ('ruth', 1.4596256342054401),\n",
" ('stanwyck', 1.4412101187160054),\n",
" ('widmark', 1.4350845252893227),\n",
" ('splendid', 1.4271163556401458),\n",
" ('chan', 1.423108334242607),\n",
" ('exceptional', 1.4201959127955721),\n",
" ('tender', 1.410986973710262),\n",
" ('gentle', 1.4078005663408544),\n",
" ('poignant', 1.4022947024663317),\n",
" ('gem', 1.3932148039644643),\n",
" ('amazing', 1.3919815802404802),\n",
" ('chilling', 1.3862943611198906),\n",
" ('captivating', 1.3862943611198906),\n",
" ('fisher', 1.3862943611198906),\n",
" ('davies', 1.3862943611198906),\n",
" ('darker', 1.3652409519220583),\n",
" ('april', 1.3499267169490159),\n",
" ('kelly', 1.3461743673304654),\n",
" ('blake', 1.3418425985490567),\n",
" ('overlooked', 1.329135947279942),\n",
" ('ralph', 1.32818673031261),\n",
" ('bette', 1.3156767939059373),\n",
" ('hoffman', 1.3150668518315229),\n",
" ('cole', 1.3121863889661687),\n",
" ('shines', 1.3049487216659381),\n",
" ('powerful', 1.2999662776313934),\n",
" ('notch', 1.2950456896547455),\n",
" ('remarkable', 1.2883688239495823),\n",
" ('pitt', 1.286210902562908),\n",
" ('winters', 1.2833463918674481),\n",
" ('vivid', 1.2762934659055623),\n",
" ('gritty', 1.2757524867200667),\n",
" ('giallo', 1.2745029551317739),\n",
" ('portrait', 1.2704625455947689),\n",
" ('innocence', 1.2694300209805796),\n",
" ('psychiatrist', 1.2685113254635072),\n",
" ('favorite', 1.2668956297860055),\n",
" ('ensemble', 1.2656663733312759),\n",
" ('stunning', 1.2622417124499117),\n",
" ('burns', 1.259880436264232),\n",
" ('garbo', 1.258954938743289),\n",
" ('barbara', 1.2580400255962119),\n",
" ('panic', 1.2527629684953681),\n",
" ('holly', 1.2527629684953681),\n",
" ('philip', 1.2527629684953681),\n",
" ('carol', 1.2481440226390734),\n",
" ('perfect', 1.246742480713785),\n",
" ('appreciated', 1.2462482874741743),\n",
" ('favourite', 1.2411123512753928),\n",
" ('journey', 1.2367626271489269),\n",
" ('rural', 1.235471471385307),\n",
" ('bond', 1.2321436812926323),\n",
" ('builds', 1.2305398317106577),\n",
" ('brilliant', 1.2287554137664785),\n",
" ('brooklyn', 1.2286654169163074),\n",
" ('von', 1.225175011976539),\n",
" ('unfolds', 1.2163953243244932),\n",
" ('recommended', 1.2163953243244932),\n",
" ('daniel', 1.20215296760895),\n",
" ('perfectly', 1.1971931173405572),\n",
" ('crafted', 1.1962507582320256),\n",
" ('prince', 1.1939224684724346),\n",
" ('troubled', 1.192138346678933),\n",
" ('consequences', 1.1865810616140668),\n",
" ('haunting', 1.1814999484738773),\n",
" ('cinderella', 1.180052620608284),\n",
" ('alexander', 1.1759989522835299),\n",
" ('emotions', 1.1753049094563641),\n",
" ('boxing', 1.1735135968412274),\n",
" ('subtle', 1.1734135017508081),\n",
" ('curtis', 1.1649873576129823),\n",
" ('rare', 1.1566438362402944),\n",
" ('loved', 1.1563661500586044),\n",
" ('daughters', 1.1526795099383853),\n",
" ('courage', 1.1438688802562305),\n",
" ('dentist', 1.1426722784621401),\n",
" ('highly', 1.1420208631618658),\n",
" ('nominated', 1.1409146683587992),\n",
" ('tony', 1.1397491942285991),\n",
" ('draws', 1.1325138403437911),\n",
" ('everyday', 1.1306150197542835),\n",
" ('contrast', 1.1284652518177909),\n",
" ('cried', 1.1213405397456659),\n",
" ('fabulous', 1.1210851445201684),\n",
" ('ned', 1.120591195386885),\n",
" ('fay', 1.120591195386885),\n",
" ('emma', 1.1184149159642893),\n",
" ('sensitive', 1.113318436057805),\n",
" ('smooth', 1.1089750757036563),\n",
" ('dramas', 1.1080910326226534),\n",
" ('today', 1.1050431789984001),\n",
" ('helps', 1.1023091505494358),\n",
" ('inspiring', 1.0986122886681098),\n",
" ('jimmy', 1.0937696641923216),\n",
" ('awesome', 1.0931328229034842),\n",
" ('unique', 1.0881409888008142),\n",
" ('tragic', 1.0871835928444868),\n",
" ('intense', 1.0870514662670339),\n",
" ('stellar', 1.0857088838322018),\n",
" ('rival', 1.0822184788924332),\n",
" ('provides', 1.0797081340289569),\n",
" ('depression', 1.0782034170369026),\n",
" ('shy', 1.0775588794702773),\n",
" ('carrie', 1.076139432816051),\n",
" ('blend', 1.0753554265038423),\n",
" ('hank', 1.0736109864626924),\n",
" ('diana', 1.0726368022648489),\n",
" ('adorable', 1.0726368022648489),\n",
" ('unexpected', 1.0722255334949147),\n",
" ('achievement', 1.0668635903535293),\n",
" ('bettie', 1.0663514264498881),\n",
" ('happiness', 1.0632729222228008),\n",
" ('glorious', 1.0608719606852626),\n",
" ('davis', 1.0541605260972757),\n",
" ('terrifying', 1.0525211814678428),\n",
" ('beauty', 1.050410186850232),\n",
" ('ideal', 1.0479685558493548),\n",
" ('fears', 1.0467872208035236),\n",
" ('hong', 1.0438040521731147),\n",
" ('seasons', 1.0433496099930604),\n",
" ('fascinating', 1.0414538748281612),\n",
" ('carries', 1.0345904299031787),\n",
" ('satisfying', 1.0321225473992768),\n",
" ('definite', 1.0319209141694374),\n",
" ('touched', 1.0296194171811581),\n",
" ('greatest', 1.0248947127715422),\n",
" ('creates', 1.0241097613701886),\n",
" ('aunt', 1.023388867430522),\n",
" ('walter', 1.022328983918479),\n",
" ('spectacular', 1.0198314108149955),\n",
" ('portrayal', 1.0189810189761024),\n",
" ('ann', 1.0127808528183286),\n",
" ('enterprise', 1.0116009116784799),\n",
" ('musicals', 1.0096648026516135),\n",
" ('deeply', 1.0094845087721023),\n",
" ('incredible', 1.0061677561461084),\n",
" ('mature', 1.0060195018402847),\n",
" ('triumph', 0.99682959435816731),\n",
" ('margaret', 0.99682959435816731),\n",
" ('navy', 0.99493385919326827),\n",
" ('harry', 0.99176919305006062),\n",
" ('lucas', 0.990398704027877),\n",
" ('sweet', 0.98966110487955483),\n",
" ('joey', 0.98794672078059009),\n",
" ('oscar', 0.98721905111049713),\n",
" ('balance', 0.98649499054740353),\n",
" ('warm', 0.98485340331145166),\n",
" ('ages', 0.98449898190068863),\n",
" ('glover', 0.98082925301172619),\n",
" ('guilt', 0.98082925301172619),\n",
" ('carrey', 0.98082925301172619),\n",
" ('learns', 0.97881108885548895),\n",
" ('unusual', 0.97788374278196932),\n",
" ('sons', 0.97777581552483595),\n",
" ('complex', 0.97761897738147796),\n",
" ('essence', 0.97753435711487369),\n",
" ('brazil', 0.9769153536905899),\n",
" ('widow', 0.97650959186720987),\n",
" ('solid', 0.97537964824416146),\n",
" ('beautiful', 0.97326301262841053),\n",
" ('holmes', 0.97246100334120955),\n",
" ('awe', 0.97186058302896583),\n",
" ('vhs', 0.97116734209998934),\n",
" ('eerie', 0.97116734209998934),\n",
" ('lonely', 0.96873720724669754),\n",
" ('grim', 0.96873720724669754),\n",
" ('sport', 0.96825047080486615),\n",
" ('debut', 0.96508089604358704),\n",
" ('destiny', 0.96343751029985703),\n",
" ('thrillers', 0.96281074750904794),\n",
" ('tears', 0.95977584381389391),\n",
" ('rose', 0.95664202739772253),\n",
" ('feelings', 0.95551144502743635),\n",
" ('ginger', 0.95551144502743635),\n",
" ('winning', 0.95471810900804055),\n",
" ('stanley', 0.95387344302319799),\n",
" ('cox', 0.95343027882361187),\n",
" ('paris', 0.95278479030472663),\n",
" ('heart', 0.95238806924516806),\n",
" ('hooked', 0.95155887071161305),\n",
" ('comfortable', 0.94803943018873538),\n",
" ('mgm', 0.94446160884085151),\n",
" ('masterpiece', 0.94155039863339296),\n",
" ('themes', 0.94118828349588235),\n",
" ('danny', 0.93967118051821874),\n",
" ('anime', 0.93378388932167222),\n",
" ('perry', 0.93328830824272613),\n",
" ('joy', 0.93301752567946861),\n",
" ('lovable', 0.93081883243706487),\n",
" ('hal', 0.92953595862417571),\n",
" ('mysteries', 0.92953595862417571),\n",
" ('louis', 0.92871325187271225),\n",
" ('charming', 0.92520609553210742),\n",
" ('urban', 0.92367083917177761),\n",
" ('allows', 0.92183091224977043),\n",
" ('impact', 0.91815814604895041),\n",
" ('gradually', 0.91629073187415511),\n",
" ('lifestyle', 0.91629073187415511),\n",
" ('italy', 0.91629073187415511),\n",
" ('spy', 0.91289514287301687),\n",
" ('treat', 0.91193342650519937),\n",
" ('subsequent', 0.91056005716517008),\n",
" ('kennedy', 0.90981821736853763),\n",
" ('loving', 0.90967549275543591),\n",
" ('surprising', 0.90937028902958128),\n",
" ('quiet', 0.90648673177753425),\n",
" ('winter', 0.90624039602065365),\n",
" ('reveals', 0.90490540964902977),\n",
" ('raw', 0.90445627422715225),\n",
" ('funniest', 0.90078654533818991),\n",
" ('pleased', 0.89994159387262562),\n",
" ('norman', 0.89994159387262562),\n",
" ('thief', 0.89874642222324552),\n",
" ('season', 0.89827222637147675),\n",
" ('secrets', 0.89794159320595857),\n",
" ('colorful', 0.89705936994626756),\n",
" ('highest', 0.8967461358011849),\n",
" ('compelling', 0.89462923509297576),\n",
" ('danes', 0.89248008318043659),\n",
" ('castle', 0.88967708335606499),\n",
" ('kudos', 0.88889175768604067),\n",
" ('great', 0.88810470901464589),\n",
" ('baseball', 0.88730319500090271),\n",
" ('subtitles', 0.88730319500090271),\n",
" ('bleak', 0.88730319500090271),\n",
" ('winner', 0.88643776872447388),\n",
" ('tragedy', 0.88563699078315261),\n",
" ('todd', 0.88551907320740142),\n",
" ('nicely', 0.87924946019380601),\n",
" ('arthur', 0.87546873735389985),\n",
" ('essential', 0.87373111745535925),\n",
" ('gorgeous', 0.8731725250935497),\n",
" ('fonda', 0.87294029100054127),\n",
" ('eastwood', 0.87139541196626402),\n",
" ('focuses', 0.87082835779739776),\n",
" ('enjoyed', 0.87070195951624607),\n",
" ('natural', 0.86997924506912838),\n",
" ('intensity', 0.86835126958503595),\n",
" ('witty', 0.86824103423244681),\n",
" ('rob', 0.8642954367557748),\n",
" ('worlds', 0.86377269759070874),\n",
" ('health', 0.86113891179907498),\n",
" ('magical', 0.85953791528170564),\n",
" ('deeper', 0.85802182375017932),\n",
" ('lucy', 0.85618680780444956),\n",
" ('moving', 0.85566611005772031),\n",
" ('lovely', 0.85290640004681306),\n",
" ('purple', 0.8513711857748395),\n",
" ('memorable', 0.84801189112086062),\n",
" ('sings', 0.84729786038720367),\n",
" ('craig', 0.84342938360928321),\n",
" ('modesty', 0.84342938360928321),\n",
" ('relate', 0.84326559685926517),\n",
" ('episodes', 0.84223712084137292),\n",
" ('strong', 0.84167135777060931),\n",
" ('smith', 0.83959811108590054),\n",
" ('tear', 0.83704136022001441),\n",
" ('apartment', 0.83333115290549531),\n",
" ('princess', 0.83290912293510388),\n",
" ('disagree', 0.83290912293510388),\n",
" ('kung', 0.83173334384609199),\n",
" ('adventure', 0.83150561393278388),\n",
" ('columbo', 0.82667857318446791),\n",
" ('jake', 0.82667857318446791),\n",
" ('adds', 0.82485652591452319),\n",
" ('hart', 0.82472353834866463),\n",
" ('strength', 0.82417544296634937),\n",
" ('realizes', 0.82360006895738058),\n",
" ('dave', 0.8232003088081431),\n",
" ('childhood', 0.82208086393583857),\n",
" ('forbidden', 0.81989888619908913),\n",
" ('tight', 0.81883539572344199),\n",
" ('surreal', 0.8178506590609026),\n",
" ('manager', 0.81770990320170756),\n",
" ('dancer', 0.81574950265227764),\n",
" ('con', 0.81093021621632877),\n",
" ('studios', 0.81093021621632877),\n",
" ('miike', 0.80821651034473263),\n",
" ('realistic', 0.80807714723392232),\n",
" ('explicit', 0.80792269515237358),\n",
" ('kurt', 0.8060875917405409),\n",
" ('traditional', 0.80535917116687328),\n",
" ('deals', 0.80535917116687328),\n",
" ('holds', 0.80493858654806194),\n",
" ('carl', 0.80437281567016972),\n",
" ('touches', 0.80396154690023547),\n",
" ('gene', 0.80314807577427383),\n",
" ('albert', 0.8027669055771679),\n",
" ('abc', 0.80234647252493729),\n",
" ('cry', 0.80011930011211307),\n",
" ('sides', 0.7995275841185171),\n",
" ('develops', 0.79850769621777162),\n",
" ('eyre', 0.79850769621777162),\n",
" ('dances', 0.79694397424158891),\n",
" ('oscars', 0.79633141679517616),\n",
" ('legendary', 0.79600456599965308),\n",
" ('importance', 0.79492987486988764),\n",
" ('hearted', 0.79492987486988764),\n",
" ('portraying', 0.79356592830699269),\n",
" ('impressed', 0.79258107754813223),\n",
" ('waters', 0.79112758892014912),\n",
" ('empire', 0.79078565012386137),\n",
" ('edge', 0.789774016249017),\n",
" ('environment', 0.78845736036427028),\n",
" ('jean', 0.78845736036427028),\n",
" ('sentimental', 0.7864791203521645),\n",
" ('captured', 0.78623760362595729),\n",
" ('styles', 0.78592891401091158),\n",
" ('daring', 0.78592891401091158),\n",
" ('backgrounds', 0.78275933924963248),\n",
" ('frank', 0.78275933924963248),\n",
" ('matches', 0.78275933924963248),\n",
" ('tense', 0.78275933924963248),\n",
" ('gothic', 0.78209466657644144),\n",
" ('sharp', 0.7814397877056235),\n",
" ('achieved', 0.78015855754957497),\n",
" ('court', 0.77947526404844247),\n",
" ('steals', 0.7789140023173704),\n",
" ('rules', 0.77844476107184035),\n",
" ('colors', 0.77684619943659217),\n",
" ('reunion', 0.77318988823348167),\n",
" ('covers', 0.77139937745969345),\n",
" ('tale', 0.77010822169607374),\n",
" ('rain', 0.7683706017975328),\n",
" ('denzel', 0.76804848873306297),\n",
" ('stays', 0.76787072675588186),\n",
" ('blob', 0.76725515271366718),\n",
" ('conventional', 0.76214005204689672),\n",
" ('maria', 0.76214005204689672),\n",
" ('fresh', 0.76158434211317383),\n",
" ('midnight', 0.76096977689870637),\n",
" ('landscape', 0.75852993982279704),\n",
" ('animated', 0.75768570169751648),\n",
" ('titanic', 0.75666058628227129),\n",
" ('sunday', 0.75666058628227129),\n",
" ('spring', 0.7537718023763802),\n",
" ('cagney', 0.7537718023763802),\n",
" ('enjoyable', 0.75246375771636476),\n",
" ('immensely', 0.75198768058287868),\n",
" ('sir', 0.7507762933965817),\n",
" ('nevertheless', 0.75067102469813185),\n",
" ('driven', 0.74994477895307854),\n",
" ('performances', 0.74883252516063137),\n",
" ('memories', 0.74721440183022114),\n",
" ('nowadays', 0.74721440183022114),\n",
" ('simple', 0.74641420974143258),\n",
" ('golden', 0.74533293373051557),\n",
" ('leslie', 0.74533293373051557),\n",
" ('lovers', 0.74497224842453125),\n",
" ('relationship', 0.74484232345601786),\n",
" ('supporting', 0.74357803418683721),\n",
" ('che', 0.74262723782331497),\n",
" ('packed', 0.7410032017375805),\n",
" ('trek', 0.74021469141793106),\n",
" ('provoking', 0.73840377214806618),\n",
" ('strikes', 0.73759894313077912),\n",
" ('depiction', 0.73682224406260699),\n",
" ('emotional', 0.73678211645681524),\n",
" ('secretary', 0.7366322924996842),\n",
" ('influenced', 0.73511137965897755),\n",
" ('florida', 0.73511137965897755),\n",
" ('germany', 0.73288750920945944),\n",
" ('brings', 0.73142936713096229),\n",
" ('lewis', 0.73129894652432159),\n",
" ('elderly', 0.73088750854279239),\n",
" ('owner', 0.72743625403857748),\n",
" ('streets', 0.72666987259858895),\n",
" ('henry', 0.72642196944481741),\n",
" ('portrays', 0.72593700338293632),\n",
" ('bears', 0.7252354951114458),\n",
" ('china', 0.72489587887452556),\n",
" ('anger', 0.72439972406404984),\n",
" ('society', 0.72433010799663333),\n",
" ('available', 0.72415741730250549),\n",
" ('best', 0.72347034060446314),\n",
" ('bugs', 0.72270598280148979),\n",
" ('magic', 0.71878961117328299),\n",
" ('verhoeven', 0.71846498854423513),\n",
" ('delivers', 0.71846498854423513),\n",
" ('jim', 0.71783979315031676),\n",
" ('donald', 0.71667767797013937),\n",
" ('endearing', 0.71465338578090898),\n",
" ('relationships', 0.71393795022901896),\n",
" ('greatly', 0.71256526641704687),\n",
" ('charlie', 0.71024161391924534),\n",
" ('brad', 0.71024161391924534),\n",
" ('simon', 0.70967648251115578),\n",
" ('effectively', 0.70914752190638641),\n",
" ('march', 0.70774597998109789),\n",
" ('atmosphere', 0.70744773070214162),\n",
" ('influence', 0.70733181555190172),\n",
" ('genius', 0.706392407309966),\n",
" ('emotionally', 0.70556970055850243),\n",
" ('ken', 0.70526854109229009),\n",
" ('identity', 0.70484322032313651),\n",
" ('sophisticated', 0.70470800296102132),\n",
" ('dan', 0.70457587638356811),\n",
" ('andrew', 0.70329955202396321),\n",
" ('india', 0.70144598337464037),\n",
" ('roy', 0.69970458110610434),\n",
" ('surprisingly', 0.6995780708902356),\n",
" ('sky', 0.69780919366575667),\n",
" ('romantic', 0.69664981111114743),\n",
" ('match', 0.69566924999265523),\n",
" ('britain', 0.69314718055994529),\n",
" ('beatty', 0.69314718055994529),\n",
" ('affected', 0.69314718055994529),\n",
" ('cowboy', 0.69314718055994529),\n",
" ('wave', 0.69314718055994529),\n",
" ('stylish', 0.69314718055994529),\n",
" ('bitter', 0.69314718055994529),\n",
" ('patient', 0.69314718055994529),\n",
" ('meets', 0.69314718055994529),\n",
" ('love', 0.69198533541937324),\n",
" ('paul', 0.68980827929443067),\n",
" ('andy', 0.68846333124751902),\n",
" ('performance', 0.68797386327972465),\n",
" ('patrick', 0.68645819240914863),\n",
" ('unlike', 0.68546468438792907),\n",
" ('brooks', 0.68433655087779044),\n",
" ('refuses', 0.68348526964820844),\n",
" ('award', 0.6824518914431974),\n",
" ('complaint', 0.6824518914431974),\n",
" ('ride', 0.68229716453587952),\n",
" ('dawson', 0.68171848473632257),\n",
" ('luke', 0.68158635815886937),\n",
" ('wells', 0.68087708796813096),\n",
" ('france', 0.6804081547825156),\n",
" ('handsome', 0.68007509899259255),\n",
" ('sports', 0.68007509899259255),\n",
" ('rebel', 0.67875844310784572),\n",
" ('directs', 0.67875844310784572),\n",
" ('greater', 0.67605274720064523),\n",
" ('dreams', 0.67599410133369586),\n",
" ('effective', 0.67565402311242806),\n",
" ('interpretation', 0.67479804189174875),\n",
" ('works', 0.67445504754779284),\n",
" ('brando', 0.67445504754779284),\n",
" ('noble', 0.6737290947028437),\n",
" ('paced', 0.67314651385327573),\n",
" ('le', 0.67067432470788668),\n",
" ('master', 0.67015766233524654),\n",
" ('h', 0.6696166831497512),\n",
" ('rings', 0.66904962898088483),\n",
" ('easy', 0.66895995494594152),\n",
" ('city', 0.66820823221269321),\n",
" ('sunshine', 0.66782937257565544),\n",
" ('succeeds', 0.66647893347778397),\n",
" ('relations', 0.664159643686693),\n",
" ('england', 0.66387679825983203),\n",
" ('glimpse', 0.66329421741026418),\n",
" ('aired', 0.66268797307523675),\n",
" ('sees', 0.66263163663399482),\n",
" ('both', 0.66248336767382998),\n",
" ('definitely', 0.66199789483898808),\n",
" ('imaginative', 0.66139848224536502),\n",
" ('appreciate', 0.66083893732728749),\n",
" ('tricks', 0.66071190480679143),\n",
" ('striking', 0.66071190480679143),\n",
" ('carefully', 0.65999497324304479),\n",
" ('complicated', 0.65981076029235353),\n",
" ('perspective', 0.65962448852130173),\n",
" ('trilogy', 0.65877953705573755),\n",
" ('future', 0.65834665141052828),\n",
" ('lion', 0.65742909795786608),\n",
" ('victor', 0.65540685257709819),\n",
" ('douglas', 0.65540685257709819),\n",
" ('inspired', 0.65459851044271034),\n",
" ('marriage', 0.65392646740666405),\n",
" ('demands', 0.65392646740666405),\n",
" ('father', 0.65172321672194655),\n",
" ('page', 0.65123628494430852),\n",
" ('instant', 0.65058756614114943),\n",
" ('era', 0.6495567444850836),\n",
" ('ruthless', 0.64934455790155243),\n",
" ('saga', 0.64934455790155243),\n",
" ('joan', 0.64891392558311978),\n",
" ('joseph', 0.64841128671855386),\n",
" ('workers', 0.64829661439459352),\n",
" ('fantasy', 0.64726757480925168),\n",
" ('accomplished', 0.64551913157069074),\n",
" ('distant', 0.64551913157069074),\n",
" ('manhattan', 0.64435701639051324),\n",
" ('personal', 0.64355023942057321),\n",
" ('pushing', 0.64313675998528386),\n",
" ('meeting', 0.64313675998528386),\n",
" ('individual', 0.64313675998528386),\n",
" ('pleasant', 0.64250344774119039),\n",
" ('brave', 0.64185388617239469),\n",
" ('william', 0.64083139119578469),\n",
" ('hudson', 0.64077919504262937),\n",
" ('friendly', 0.63949446706762514),\n",
" ('eccentric', 0.63907995928966954),\n",
" ('awards', 0.63875310849414646),\n",
" ('jack', 0.63838309514997038),\n",
" ('seeking', 0.63808740337691783),\n",
" ('colonel', 0.63757732940513456),\n",
" ('divorce', 0.63757732940513456),\n",
" ('jane', 0.63443957973316734),\n",
" ('keeping', 0.63414883979798953),\n",
" ('gives', 0.63383568159497883),\n",
" ('ted', 0.63342794585832296),\n",
" ('animation', 0.63208692379869902),\n",
" ('progress', 0.6317782341836532),\n",
" ('concert', 0.63127177684185776),\n",
" ('larger', 0.63127177684185776),\n",
" ('nation', 0.6296337748376194),\n",
" ('albeit', 0.62739580299716491),\n",
" ('adapted', 0.62613647027698516),\n",
" ('discovers', 0.62542900650499444),\n",
" ('classic', 0.62504956428050518),\n",
" ('segment', 0.62335141862440335),\n",
" ('morgan', 0.62303761437291871),\n",
" ('mouse', 0.62294292188669675),\n",
" ('impressive', 0.62211140744319349),\n",
" ('artist', 0.62168821657780038),\n",
" ('ultimate', 0.62168821657780038),\n",
" ('griffith', 0.62117368093485603),\n",
" ('emily', 0.62082651898031915),\n",
" ('drew', 0.62082651898031915),\n",
" ('moved', 0.6197197120051281),\n",
" ('profound', 0.61903920840622351),\n",
" ('families', 0.61903920840622351),\n",
" ('innocent', 0.61851219917136446),\n",
" ('versions', 0.61730910416844087),\n",
" ('eddie', 0.61691981517206107),\n",
" ('criticism', 0.61651395453902935),\n",
" ('nature', 0.61594514653194088),\n",
" ('recognized', 0.61518563909023349),\n",
" ('sexuality', 0.61467556511845012),\n",
" ('contract', 0.61400986000122149),\n",
" ('brian', 0.61344043794920278),\n",
" ('remembered', 0.6131044728864089),\n",
" ('determined', 0.6123858239154869),\n",
" ('offers', 0.61207935747116349),\n",
" ('pleasure', 0.61195702582993206),\n",
" ('washington', 0.61180154110599294),\n",
" ('images', 0.61159731359583758),\n",
" ('games', 0.61067095873570676),\n",
" ('academy', 0.60872983874736208),\n",
" ('fashioned', 0.60798937221963845),\n",
" ('melodrama', 0.60749173598145145),\n",
" ('peoples', 0.60613580357031549),\n",
" ('charismatic', 0.60613580357031549),\n",
" ('rough', 0.60613580357031549),\n",
" ('dealing', 0.60517840761398811),\n",
" ('fine', 0.60496962268013299),\n",
" ('tap', 0.60391604683200273),\n",
" ('trio', 0.60157998703445481),\n",
" ('russell', 0.60120968523425966),\n",
" ('figures', 0.60077386042893011),\n",
" ('ward', 0.60005675749393339),\n",
" ('shine', 0.59911823091166894),\n",
" ('brady', 0.59911823091166894),\n",
" ('job', 0.59845562125168661),\n",
" ('satisfied', 0.59652034487087369),\n",
" ('river', 0.59637962862495086),\n",
" ('brown', 0.595773016534769),\n",
" ('believable', 0.59566072133302495),\n",
" ('bound', 0.59470710774669278),\n",
" ('always', 0.59470710774669278),\n",
" ('hall', 0.5933967777928858),\n",
" ('cook', 0.5916777203950857),\n",
" ('claire', 0.59136448625000293),\n",
" ('broadway', 0.59033768669372433),\n",
" ('anna', 0.58778666490211906),\n",
" ('peace', 0.58628403501758408),\n",
" ('visually', 0.58539431926349916),\n",
" ('falk', 0.58525821854876026),\n",
" ('morality', 0.58525821854876026),\n",
" ('growing', 0.58466653756587539),\n",
" ('experiences', 0.58314628534561685),\n",
" ('stood', 0.58314628534561685),\n",
" ('touch', 0.58122926435596001),\n",
" ('lives', 0.5810976767513224),\n",
" ('kubrick', 0.58066919713325493),\n",
" ('timing', 0.58047401805583243),\n",
" ('struggles', 0.57981849525294216),\n",
" ('expressions', 0.57981849525294216),\n",
" ('authentic', 0.57848427223980559),\n",
" ('helen', 0.57763429343810091),\n",
" ('pre', 0.57700753064729182),\n",
" ('quirky', 0.5753641449035618),\n",
" ('young', 0.57531672344534313),\n",
" ('inner', 0.57454143815209846),\n",
" ('mexico', 0.57443087372056334),\n",
" ('clint', 0.57380042292737909),\n",
" ('sisters', 0.57286101468544337),\n",
" ('realism', 0.57226528899949558),\n",
" ('personalities', 0.5720692490067093),\n",
" ('french', 0.5720692490067093),\n",
" ('surprises', 0.57113222999698177),\n",
" ('adventures', 0.57113222999698177),\n",
" ('overcome', 0.5697681593994407),\n",
" ('timothy', 0.56953322459276867),\n",
" ('tales', 0.56909453188996639),\n",
" ('war', 0.56843317302781682),\n",
" ('civil', 0.5679840376059393),\n",
" ('countries', 0.56737779327091187),\n",
" ('streep', 0.56710645966458029),\n",
" ('tradition', 0.56685345523565323),\n",
" ('oliver', 0.56673325570428668),\n",
" ('australia', 0.56580775818334383),\n",
" ('understanding', 0.56531380905006046),\n",
" ('players', 0.56509525370004821),\n",
" ('knowing', 0.56489284503626647),\n",
" ('rogers', 0.56421349718405212),\n",
" ('suspenseful', 0.56368911332305849),\n",
" ('variety', 0.56368911332305849),\n",
" ('true', 0.56281525180810066),\n",
" ('jr', 0.56220982311246936),\n",
" ('psychological', 0.56108745854687891),\n",
" ('branagh', 0.55961578793542266),\n",
" ('wealth', 0.55961578793542266),\n",
" ('performing', 0.55961578793542266),\n",
" ('odds', 0.55961578793542266),\n",
" ('sent', 0.55961578793542266),\n",
" ('reminiscent', 0.55961578793542266),\n",
" ('grand', 0.55961578793542266),\n",
" ('overwhelming', 0.55961578793542266),\n",
" ('brothers', 0.55891181043362848),\n",
" ('howard', 0.55811089675600245),\n",
" ('david', 0.55693122256475369),\n",
" ('generation', 0.55628799784274796),\n",
" ('grow', 0.55612538299565417),\n",
" ('survival', 0.55594605904646033),\n",
" ('mainstream', 0.55574731115750231),\n",
" ('dick', 0.55431073570572953),\n",
" ('charm', 0.55288175575407861),\n",
" ('kirk', 0.55278982286502287),\n",
" ('twists', 0.55244729845681018),\n",
" ('gangster', 0.55206858230003986),\n",
" ('jeff', 0.55179306225421365),\n",
" ('family', 0.55116244510065526),\n",
" ('tend', 0.55053307336110335),\n",
" ('thanks', 0.55049088015842218),\n",
" ('world', 0.54744234723432639),\n",
" ('sutherland', 0.54743536937855164),\n",
" ('life', 0.54695514434959924),\n",
" ('disc', 0.54654370636806993),\n",
" ('bug', 0.54654370636806993),\n",
" ('tribute', 0.5455111817538808),\n",
" ('europe', 0.54522705048332309),\n",
" ('sacrifice', 0.54430155296238014),\n",
" ('color', 0.54405127139431109),\n",
" ('superior', 0.54333490233128523),\n",
" ('york', 0.54318235866536513),\n",
" ('pulls', 0.54266622962164945),\n",
" ('hearts', 0.54232429082536171),\n",
" ('jackson', 0.54232429082536171),\n",
" ('enjoy', 0.54124285135906114),\n",
" ('redemption', 0.54056759296472823),\n",
" ('madness', 0.540384426007535),\n",
" ('hamilton', 0.5389965007326869),\n",
" ('stands', 0.5389965007326869),\n",
" ('trial', 0.5389965007326869),\n",
" ('greek', 0.5389965007326869),\n",
" ('each', 0.5388212312554177),\n",
" ('faithful', 0.53773307668591508),\n",
" ('received', 0.5372768098531604),\n",
" ('jealous', 0.53714293208336406),\n",
" ('documentaries', 0.53714293208336406),\n",
" ('different', 0.53709860682460819),\n",
" ('describes', 0.53680111016925136),\n",
" ('shorts', 0.53596159703753288),\n",
" ('brilliance', 0.53551823635636209),\n",
" ('mountains', 0.53492317534505118),\n",
" ('share', 0.53408248593025787),\n",
" ('dealt', 0.53408248593025787),\n",
" ('providing', 0.53329847961804933),\n",
" ('explore', 0.53329847961804933),\n",
" ('series', 0.5325809226575603),\n",
" ('fellow', 0.5323318289869543),\n",
" ('loves', 0.53062825106217038),\n",
" ('olivier', 0.53062825106217038),\n",
" ('revolution', 0.53062825106217038),\n",
" ('roman', 0.53062825106217038),\n",
" ('century', 0.53002783074992665),\n",
" ('musical', 0.52966871156747064),\n",
" ('heroic', 0.52925932545482868),\n",
" ('ironically', 0.52806743020049673),\n",
" ('approach', 0.52806743020049673),\n",
" ('temple', 0.52806743020049673),\n",
" ('moves', 0.5279372642387119),\n",
" ('gift', 0.52702030968597136),\n",
" ('julie', 0.52609309589677911),\n",
" ('tells', 0.52415107836314001),\n",
" ('radio', 0.52394671172868779),\n",
" ('uncle', 0.52354439617376536),\n",
" ('union', 0.52324814376454787),\n",
" ('deep', 0.52309571635780505),\n",
" ('reminds', 0.52157841554225237),\n",
" ('famous', 0.52118841080153722),\n",
" ('jazz', 0.52053443789295151),\n",
" ('dennis', 0.51987545928590861),\n",
" ('epic', 0.51919387343650736),\n",
" ('adult', 0.519167695083386),\n",
" ('shows', 0.51915322220375304),\n",
" ('performed', 0.5191244265806858),\n",
" ('demons', 0.5191244265806858),\n",
" ('eric', 0.51879379341516751),\n",
" ('discovered', 0.51879379341516751),\n",
" ('youth', 0.5185626062681431),\n",
" ('human', 0.51851411224987087),\n",
" ('tarzan', 0.51813827061227724),\n",
" ('ourselves', 0.51794309153485463),\n",
" ('wwii', 0.51758240622887042),\n",
" ('passion', 0.5162164724008671),\n",
" ('desire', 0.51607497965213445),\n",
" ('pays', 0.51581316527702981),\n",
" ('fox', 0.51557622652458857),\n",
" ('dirty', 0.51557622652458857),\n",
" ('symbolism', 0.51546600332249293),\n",
" ('sympathetic', 0.51546600332249293),\n",
" ('attitude', 0.51530993621331933),\n",
" ('appearances', 0.51466440007315639),\n",
" ('jeremy', 0.51466440007315639),\n",
" ('fun', 0.51439068993048687),\n",
" ('south', 0.51420972175023116),\n",
" ('arrives', 0.51409894911095988),\n",
" ('present', 0.51341965894303732),\n",
" ('com', 0.51326167856387173),\n",
" ('smile', 0.51265880484765169),\n",
" ('fits', 0.51082562376599072),\n",
" ('provided', 0.51082562376599072),\n",
" ('carter', 0.51082562376599072),\n",
" ('ring', 0.51082562376599072),\n",
" ('aging', 0.51082562376599072),\n",
" ('countryside', 0.51082562376599072),\n",
" ('alan', 0.51082562376599072),\n",
" ('visit', 0.51082562376599072),\n",
" ('begins', 0.51015650363396647),\n",
" ('success', 0.50900578704900468),\n",
" ('japan', 0.50900578704900468),\n",
" ('accurate', 0.50895471583017893),\n",
" ('proud', 0.50800474742434931),\n",
" ('daily', 0.5075946031845443),\n",
" ('atmospheric', 0.50724780241810674),\n",
" ('karloff', 0.50724780241810674),\n",
" ('recently', 0.50714914903668207),\n",
" ('fu', 0.50704490092608467),\n",
" ('horrors', 0.50656122497953315),\n",
" ('finding', 0.50637127341661037),\n",
" ('lust', 0.5059356384717989),\n",
" ('hitchcock', 0.50574947073413001),\n",
" ('among', 0.50334004951332734),\n",
" ('viewing', 0.50302139827440906),\n",
" ('shining', 0.50262885656181222),\n",
" ('investigation', 0.50262885656181222),\n",
" ('duo', 0.5020919437972361),\n",
" ('cameron', 0.5020919437972361),\n",
" ('finds', 0.50128303100539795),\n",
" ('contemporary', 0.50077528791248915),\n",
" ('genuine', 0.50046283673044401),\n",
" ('frightening', 0.49995595152908684),\n",
" ('plays', 0.49975983848890226),\n",
" ('age', 0.49941323171424595),\n",
" ('position', 0.49899116611898781),\n",
" ('continues', 0.49863035067217237),\n",
" ('roles', 0.49839716550752178),\n",
" ('james', 0.49837216269470402),\n",
" ('individuals', 0.49824684155913052),\n",
" ('brought', 0.49783842823917956),\n",
" ('hilarious', 0.49714551986191058),\n",
" ('brutal', 0.49681488669639234),\n",
" ('appropriate', 0.49643688631389105),\n",
" ('dance', 0.49581998314812048),\n",
" ('league', 0.49578774640145024),\n",
" ('helping', 0.49578774640145024),\n",
" ('answers', 0.49578774640145024),\n",
" ('stunts', 0.49561620510246196),\n",
" ('traveling', 0.49532143723002542),\n",
" ('thoroughly', 0.49414593456733524),\n",
" ('depicted', 0.49317068852726992),\n",
" ('honor', 0.49247648509779424),\n",
" ('combination', 0.49247648509779424),\n",
" ('differences', 0.49247648509779424),\n",
" ('fully', 0.49213349075383811),\n",
" ('tracy', 0.49159426183810306),\n",
" ('battles', 0.49140753790888908),\n",
" ('possibility', 0.49112055268665822),\n",
" ('romance', 0.4901589869574316),\n",
" ('initially', 0.49002249613622745),\n",
" ('happy', 0.4898997500608791),\n",
" ('crime', 0.48977221456815834),\n",
" ('singing', 0.4893852925281213),\n",
" ('especially', 0.48901267837860624),\n",
" ('shakespeare', 0.48754793889664511),\n",
" ('hugh', 0.48729512635579658),\n",
" ('detail', 0.48609484250827351),\n",
" ('guide', 0.48550781578170082),\n",
" ('companion', 0.48550781578170082),\n",
" ('julia', 0.48550781578170082),\n",
" ('san', 0.48550781578170082),\n",
" ('desperation', 0.48550781578170082),\n",
" ('strongly', 0.48460242866688824),\n",
" ('necessary', 0.48302334245403883),\n",
" ('humanity', 0.48265474679929443),\n",
" ('drama', 0.48221998493060503),\n",
" ('warming', 0.48183808689273838),\n",
" ('intrigue', 0.48183808689273838),\n",
" ('nonetheless', 0.48183808689273838),\n",
" ('cuba', 0.48183808689273838),\n",
" ('planned', 0.47957308026188628),\n",
" ('pictures', 0.47929937011921681),\n",
" ('broadcast', 0.47849024312305422),\n",
" ('nine', 0.47803580094299974),\n",
" ('settings', 0.47743860773325364),\n",
" ('history', 0.47732966933780852),\n",
" ('ordinary', 0.47725880012690741),\n",
" ('trade', 0.47692407209030935),\n",
" ('primary', 0.47608267532211779),\n",
" ('official', 0.47608267532211779),\n",
" ('episode', 0.47529620261150429),\n",
" ('role', 0.47520268270188676),\n",
" ('spirit', 0.47477690799839323),\n",
" ('grey', 0.47409361449726067),\n",
" ('ways', 0.47323464982718205),\n",
" ('cup', 0.47260441094579297),\n",
" ('piano', 0.47260441094579297),\n",
" ('familiar', 0.47241617565111949),\n",
" ('sinister', 0.47198579044972683),\n",
" ('reveal', 0.47171449364936496),\n",
" ('max', 0.47150852042515579),\n",
" ('dated', 0.47121648567094482),\n",
" ('discovery', 0.47000362924573563),\n",
" ('vicious', 0.47000362924573563),\n",
" ('losing', 0.47000362924573563),\n",
" ('genuinely', 0.46871413841586385),\n",
" ('hatred', 0.46734051182625186),\n",
" ('mistaken', 0.46702300110759781),\n",
" ('dream', 0.46608972992459924),\n",
" ('challenge', 0.46608972992459924),\n",
" ('crisis', 0.46575733836428446),\n",
" ('photographed', 0.46488852857896512),\n",
" ('machines', 0.46430560813109778),\n",
" ('critics', 0.46430560813109778),\n",
" ('bird', 0.46430560813109778),\n",
" ('born', 0.46411383518967209),\n",
" ('detective', 0.4636633473511525),\n",
" ('higher', 0.46328467899699055),\n",
" ('remains', 0.46262352194811296),\n",
" ('inevitable', 0.46262352194811296),\n",
" ('soviet', 0.4618180446592961),\n",
" ('ryan', 0.46134556650262099),\n",
" ('african', 0.46112595521371813),\n",
" ('smaller', 0.46081520319132935),\n",
" ('techniques', 0.46052488529119184),\n",
" ('information', 0.46034171833399862),\n",
" ('deserved', 0.45999798712841444),\n",
" ('cynical', 0.45953232937844013),\n",
" ('lynch', 0.45953232937844013),\n",
" ('francisco', 0.45953232937844013),\n",
" ('tour', 0.45953232937844013),\n",
" ('spielberg', 0.45953232937844013),\n",
" ('struggle', 0.45911782160048453),\n",
" ('language', 0.45902121257712653),\n",
" ('visual', 0.45823514408822852),\n",
" ('warner', 0.45724137763188427),\n",
" ('social', 0.45720078250735313),\n",
" ('reality', 0.45719346885019546),\n",
" ('hidden', 0.45675840249571492),\n",
" ('breaking', 0.45601738727099561),\n",
" ('sometimes', 0.45563021171182794),\n",
" ('modern', 0.45500247579345005),\n",
" ('surfing', 0.45425527227759638),\n",
" ('popular', 0.45410691533051023),\n",
" ('surprised', 0.4534409399850382),\n",
" ('follows', 0.45245361754408348),\n",
" ('keeps', 0.45234869400701483),\n",
" ('john', 0.4520909494482197),\n",
" ('defeat', 0.45198512374305722),\n",
" ('mixed', 0.45198512374305722),\n",
" ('justice', 0.45142724367280018),\n",
" ('treasure', 0.45083371313801535),\n",
" ('presents', 0.44973793178615257),\n",
" ('years', 0.44919197032104968),\n",
" ('chief', 0.44895022004790319),\n",
" ('shadows', 0.44802472252696035),\n",
" ('closely', 0.44701411102103689),\n",
" ('segments', 0.44701411102103689),\n",
" ('lose', 0.44658335503763702),\n",
" ('caine', 0.44628710262841953),\n",
" ('caught', 0.44610275383999071),\n",
" ('hamlet', 0.44558510189758965),\n",
" ('chinese', 0.44507424620321018),\n",
" ('welcome', 0.44438052435783792),\n",
" ('birth', 0.44368632092836219),\n",
" ('represents', 0.44320543609101143),\n",
" ('puts', 0.44279106572085081),\n",
" ('fame', 0.44183275227903923),\n",
" ('closer', 0.44183275227903923),\n",
" ('visuals', 0.44183275227903923),\n",
" ('web', 0.44183275227903923),\n",
" ('criminal', 0.4412745608048752),\n",
" ('minor', 0.4409224199448939),\n",
" ('jon', 0.44086703515908027),\n",
" ('liked', 0.44074991514020723),\n",
" ('restaurant', 0.44031183943833246),\n",
" ('flaws', 0.43983275161237217),\n",
" ('de', 0.43983275161237217),\n",
" ('searching', 0.4393666597838457),\n",
" ('rap', 0.43891304217570443),\n",
" ('light', 0.43884433018199892),\n",
" ('elizabeth', 0.43872232986464677),\n",
" ('marry', 0.43861731542506488),\n",
" ('oz', 0.43825493093115531),\n",
" ('controversial', 0.43825493093115531),\n",
" ('learned', 0.43825493093115531),\n",
" ('slowly', 0.43785660389939979),\n",
" ('bridge', 0.43721380642274466),\n",
" ('thrilling', 0.43721380642274466),\n",
" ('wayne', 0.43721380642274466),\n",
" ('comedic', 0.43721380642274466),\n",
" ('married', 0.43658501682196887),\n",
" ('nazi', 0.4361020775700542),\n",
" ('murder', 0.4353180712578455),\n",
" ('physical', 0.4353180712578455),\n",
" ('johnny', 0.43483971678806865),\n",
" ('michelle', 0.43445264498141672),\n",
" ('wallace', 0.43403848055222038),\n",
" ('silent', 0.43395706390247063),\n",
" ('comedies', 0.43395706390247063),\n",
" ('played', 0.43387244114515305),\n",
" ('international', 0.43363598507486073),\n",
" ('vision', 0.43286408229627887),\n",
" ('intelligent', 0.43196704885367099),\n",
" ('shop', 0.43078291609245434),\n",
" ('also', 0.43036720209769169),\n",
" ('levels', 0.4302451371066513),\n",
" ('miss', 0.43006426712153217),\n",
" ('ocean', 0.4295626596872249),\n",
" ...]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# words most frequently seen in a review with a \"POSITIVE\" label\n",
"pos_neg_ratios.most_common()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"[('boll', -4.0778152602708904),\n",
" ('uwe', -3.9218753018711578),\n",
" ('seagal', -3.3202501058581921),\n",
" ('unwatchable', -3.0269848170580955),\n",
" ('stinker', -2.9876839403711624),\n",
" ('mst', -2.7753833211707968),\n",
" ('incoherent', -2.7641396677532537),\n",
" ('unfunny', -2.5545257844967644),\n",
" ('waste', -2.4907515123361046),\n",
" ('blah', -2.4475792789485005),\n",
" ('horrid', -2.3715779644809971),\n",
" ('pointless', -2.3451073877136341),\n",
" ('atrocious', -2.3187369339642556),\n",
" ('redeeming', -2.2667790015910296),\n",
" ('prom', -2.2601040980178784),\n",
" ('drivel', -2.2476029585766928),\n",
" ('lousy', -2.2118080125207054),\n",
" ('worst', -2.1930856334332267),\n",
" ('laughable', -2.172468615469592),\n",
" ('awful', -2.1385076866397488),\n",
" ('poorly', -2.1326133844207011),\n",
" ('wasting', -2.1178155545614512),\n",
" ('remotely', -2.111046881095167),\n",
" ('existent', -2.0024805005437076),\n",
" ('boredom', -1.9241486572738005),\n",
" ('miserably', -1.9216610938019989),\n",
" ('sucks', -1.9166645809588516),\n",
" ('uninspired', -1.9131499212248517),\n",
" ('lame', -1.9117232884159072),\n",
" ('insult', -1.9085323769376259)]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# words most frequently seen in a review with a \"NEGATIVE\" label\n",
"list(reversed(pos_neg_ratios.most_common()))[0:30]"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}