{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sentiment Classification & How To \"Frame Problems\" for a Neural Network\n",
"\n",
"by Andrew Trask\n",
"\n",
"- **Twitter**: @iamtrask\n",
"- **Blog**: http://iamtrask.github.io"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What You Should Already Know\n",
"\n",
"- neural networks, forward and back-propagation\n",
"- stochastic gradient descent\n",
"- mean squared error\n",
"- train/test splits\n",
"\n",
"### Where to Get Help If You Need It\n",
"\n",
"- Re-watch previous Udacity lectures\n",
"- Consult the recommended course reading material - [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (40% Off: **traskud17**)\n",
"- Shoot me a tweet @iamtrask\n",
"\n",
"\n",
"### Tutorial Outline:\n",
"\n",
"- Intro: The Importance of \"Framing a Problem\"\n",
"\n",
"\n",
"- Curate a Dataset\n",
"- Developing a \"Predictive Theory\"\n",
"- **PROJECT 1**: Quick Theory Validation\n",
"\n",
"\n",
"- Transforming Text to Numbers\n",
"- **PROJECT 2**: Creating the Input/Output Data\n",
"\n",
"\n",
"- Putting it all together in a Neural Network\n",
"- **PROJECT 3**: Building our Neural Network\n",
"\n",
"\n",
"- Understanding Neural Noise\n",
"- **PROJECT 4**: Making Learning Faster by Reducing Noise\n",
"\n",
"\n",
"- Analyzing Inefficiencies in our Network\n",
"- **PROJECT 5**: Making our Network Train and Run Faster\n",
"\n",
"\n",
"- Further Noise Reduction\n",
"- **PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary\n",
"\n",
"\n",
"- Analysis: What's going on in the weights?"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "56bb3cba-260c-4ebe-9ed6-b995b4c72aa3"
}
},
"source": [
"# Lesson: Curate a Dataset"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "eba2b193-0419-431e-8db9-60f34dd3fe83"
}
},
"outputs": [],
"source": [
"def pretty_print_review_and_label(i):\n",
"    # Show the label next to the first 80 characters of review i\n",
"    print(labels[i] + \"\\t:\\t\" + reviews[i][:80] + \"...\")\n",
"\n",
"# What we know: one review per line\n",
"with open('reviews.txt', 'r') as g:\n",
"    reviews = [line.rstrip('\\n') for line in g]\n",
"\n",
"# What we WANT to know: one label per line, upper-cased\n",
"with open('labels.txt', 'r') as g:\n",
"    labels = [line.rstrip('\\n').upper() for line in g]"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"25000"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(reviews)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "bb95574b-21a0-4213-ae50-34363cf4f87f"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life such as teachers . my years in the teaching profession lead me to believe that bromwell high s satire is much closer to reality than is teachers . the scramble to survive financially the insightful students who can see right through their pathetic teachers pomp the pettiness of the whole situation all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn t '"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reviews[0]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "e0408810-c424-4ed4-afb9-1735e9ddbd0a"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'POSITIVE'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"labels[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lesson: Develop a Predictive Theory"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "e67a709f-234f-4493-bae6-4fb192141ee0"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"labels.txt \t : \t reviews.txt\n",
"\n",
"NEGATIVE\t:\tthis movie is terrible but it has some good effects . ...\n",
"POSITIVE\t:\tadrian pasdar is excellent is this film . he makes a fascinating woman . ...\n",
"NEGATIVE\t:\tcomment this movie is impossible . is terrible very improbable bad interpretat...\n",
"POSITIVE\t:\texcellent episode movie ala pulp fiction . days suicides . it doesnt get more...\n",
"NEGATIVE\t:\tif you haven t seen this it s terrible . it is pure trash . i saw this about ...\n",
"POSITIVE\t:\tthis schiffer guy is a real genius the movie is of excellent quality and both e...\n"
]
}
],
"source": [
"print(\"labels.txt \\t : \\t reviews.txt\\n\")\n",
"pretty_print_review_and_label(2137)\n",
"pretty_print_review_and_label(12816)\n",
"pretty_print_review_and_label(6267)\n",
"pretty_print_review_and_label(21934)\n",
"pretty_print_review_and_label(5297)\n",
"pretty_print_review_and_label(4998)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Project 1: Quick Theory Validation"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from collections import Counter\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"positive_counts = Counter()\n",
"negative_counts = Counter()\n",
"total_counts = Counter()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Count each word per label and overall\n",
"for i in range(len(reviews)):\n",
"    if(labels[i] == 'POSITIVE'):\n",
"        for word in reviews[i].split(\" \"):\n",
"            positive_counts[word] += 1\n",
"            total_counts[word] += 1\n",
"    else:\n",
"        for word in reviews[i].split(\" \"):\n",
"            negative_counts[word] += 1\n",
"            total_counts[word] += 1"
]
},
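{
"cell_type": "markdown",
"metadata": {},
"source": [
"The counting loop above can be sketched on a tiny corpus. This is a minimal illustration, not part of the original lesson: `toy_reviews`, `toy_labels` and the `toy_*` counters are hypothetical names with made-up data. It also shows why an empty string can appear among the counted words: `split(\" \")` returns an empty token wherever two spaces sit next to each other."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"# Hypothetical two-review corpus, invented for illustration only\n",
"toy_reviews = ['great  film', 'bad film']  # note the double space\n",
"toy_labels = ['POSITIVE', 'NEGATIVE']\n",
"\n",
"toy_positive, toy_negative, toy_total = Counter(), Counter(), Counter()\n",
"for review, label in zip(toy_reviews, toy_labels):\n",
"    # Pick the per-label counter, then count into it and into the total\n",
"    target = toy_positive if label == 'POSITIVE' else toy_negative\n",
"    words = review.split(' ')  # keeps '' between consecutive spaces\n",
"    target.update(words)\n",
"    toy_total.update(words)\n",
"\n",
"print(toy_positive.most_common())\n",
"print(toy_total['film'])"
]
},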
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[('', 550468),\n",
" ('the', 173324),\n",
" ('.', 159654),\n",
" ('and', 89722),\n",
" ('a', 83688),\n",
" ('of', 76855),\n",
" ('to', 66746),\n",
" ('is', 57245),\n",
" ('in', 50215),\n",
" ('br', 49235),\n",
" ('it', 48025),\n",
" ('i', 40743),\n",
" ('that', 35630),\n",
" ('this', 35080),\n",
" ('s', 33815),\n",
" ('as', 26308),\n",
" ('with', 23247),\n",
" ('for', 22416),\n",
" ('was', 21917),\n",
" ('film', 20937),\n",
" ('but', 20822),\n",
" ('movie', 19074),\n",
" ('his', 17227),\n",
" ('on', 17008),\n",
" ('you', 16681),\n",
" ('he', 16282),\n",
" ('are', 14807),\n",
" ('not', 14272),\n",
" ('t', 13720),\n",
" ('one', 13655),\n",
" ('have', 12587),\n",
" ('be', 12416),\n",
" ('by', 11997),\n",
" ('all', 11942),\n",
" ('who', 11464),\n",
" ('an', 11294),\n",
" ('at', 11234),\n",
" ('from', 10767),\n",
" ('her', 10474),\n",
" ('they', 9895),\n",
" ('has', 9186),\n",
" ('so', 9154),\n",
" ('like', 9038),\n",
" ('about', 8313),\n",
" ('very', 8305),\n",
" ('out', 8134),\n",
" ('there', 8057),\n",
" ('she', 7779),\n",
" ('what', 7737),\n",
" ('or', 7732),\n",
" ('good', 7720),\n",
" ('more', 7521),\n",
" ('when', 7456),\n",
" ('some', 7441),\n",
" ('if', 7285),\n",
" ('just', 7152),\n",
" ('can', 7001),\n",
" ('story', 6780),\n",
" ('time', 6515),\n",
" ('my', 6488),\n",
" ('great', 6419),\n",
" ('well', 6405),\n",
" ('up', 6321),\n",
" ('which', 6267),\n",
" ('their', 6107),\n",
" ('see', 6026),\n",
" ('also', 5550),\n",
" ('we', 5531),\n",
" ('really', 5476),\n",
" ('would', 5400),\n",
" ('will', 5218),\n",
" ('me', 5167),\n",
" ('had', 5148),\n",
" ('only', 5137),\n",
" ('him', 5018),\n",
" ('even', 4964),\n",
" ('most', 4864),\n",
" ('other', 4858),\n",
" ('were', 4782),\n",
" ('first', 4755),\n",
" ('than', 4736),\n",
" ('much', 4685),\n",
" ('its', 4622),\n",
" ('no', 4574),\n",
" ('into', 4544),\n",
" ('people', 4479),\n",
" ('best', 4319),\n",
" ('love', 4301),\n",
" ('get', 4272),\n",
" ('how', 4213),\n",
" ('life', 4199),\n",
" ('been', 4189),\n",
" ('because', 4079),\n",
" ('way', 4036),\n",
" ('do', 3941),\n",
" ('made', 3823),\n",
" ('films', 3813),\n",
" ('them', 3805),\n",
" ('after', 3800),\n",
" ('many', 3766),\n",
" ('two', 3733),\n",
" ('too', 3659),\n",
" ('think', 3655),\n",
" ('movies', 3586),\n",
" ('characters', 3560),\n",
" ('character', 3514),\n",
" ('don', 3468),\n",
" ('man', 3460),\n",
" ('show', 3432),\n",
" ('watch', 3424),\n",
" ('seen', 3414),\n",
" ('then', 3358),\n",
" ('little', 3341),\n",
" ('still', 3340),\n",
" ('make', 3303),\n",
" ('could', 3237),\n",
" ('never', 3226),\n",
" ('being', 3217),\n",
" ('where', 3173),\n",
" ('does', 3069),\n",
" ('over', 3017),\n",
" ('any', 3002),\n",
" ('while', 2899),\n",
" ('know', 2833),\n",
" ('did', 2790),\n",
" ('years', 2758),\n",
" ('here', 2740),\n",
" ('ever', 2734),\n",
" ('end', 2696),\n",
" ('these', 2694),\n",
" ('such', 2590),\n",
" ('real', 2568),\n",
" ('scene', 2567),\n",
" ('back', 2547),\n",
" ('those', 2485),\n",
" ('though', 2475),\n",
" ('off', 2463),\n",
" ('new', 2458),\n",
" ('your', 2453),\n",
" ('go', 2440),\n",
" ('acting', 2437),\n",
" ('plot', 2432),\n",
" ('world', 2429),\n",
" ('scenes', 2427),\n",
" ('say', 2414),\n",
" ('through', 2409),\n",
" ('makes', 2390),\n",
" ('better', 2381),\n",
" ('now', 2368),\n",
" ('work', 2346),\n",
" ('young', 2343),\n",
" ('old', 2311),\n",
" ('ve', 2307),\n",
" ('find', 2272),\n",
" ('both', 2248),\n",
" ('before', 2177),\n",
" ('us', 2162),\n",
" ('again', 2158),\n",
" ('series', 2153),\n",
" ('quite', 2143),\n",
" ('something', 2135),\n",
" ('cast', 2133),\n",
" ('should', 2121),\n",
" ('part', 2098),\n",
" ('always', 2088),\n",
" ('lot', 2087),\n",
" ('another', 2075),\n",
" ('actors', 2047),\n",
" ('director', 2040),\n",
" ('family', 2032),\n",
" ('between', 2016),\n",
" ('own', 2016),\n",
" ('m', 1998),\n",
" ('may', 1997),\n",
" ('same', 1972),\n",
" ('role', 1967),\n",
" ('watching', 1966),\n",
" ('every', 1954),\n",
" ('funny', 1953),\n",
" ('doesn', 1935),\n",
" ('performance', 1928),\n",
" ('few', 1918),\n",
" ('bad', 1907),\n",
" ('look', 1900),\n",
" ('re', 1884),\n",
" ('why', 1855),\n",
" ('things', 1849),\n",
" ('times', 1832),\n",
" ('big', 1815),\n",
" ('however', 1795),\n",
" ('actually', 1790),\n",
" ('action', 1789),\n",
" ('going', 1783),\n",
" ('bit', 1757),\n",
" ('comedy', 1742),\n",
" ('down', 1740),\n",
" ('music', 1738),\n",
" ('must', 1728),\n",
" ('take', 1709),\n",
" ('saw', 1692),\n",
" ('long', 1690),\n",
" ('right', 1688),\n",
" ('fun', 1686),\n",
" ('fact', 1684),\n",
" ('excellent', 1683),\n",
" ('around', 1674),\n",
" ('didn', 1672),\n",
" ('without', 1671),\n",
" ('thing', 1662),\n",
" ('thought', 1639),\n",
" ('got', 1635),\n",
" ('each', 1630),\n",
" ('day', 1614),\n",
" ('feel', 1597),\n",
" ('seems', 1596),\n",
" ('come', 1594),\n",
" ('done', 1586),\n",
" ('beautiful', 1580),\n",
" ('especially', 1572),\n",
" ('played', 1571),\n",
" ('almost', 1566),\n",
" ('want', 1562),\n",
" ('yet', 1556),\n",
" ('give', 1553),\n",
" ('pretty', 1549),\n",
" ('last', 1543),\n",
" ('since', 1519),\n",
" ('different', 1504),\n",
" ('although', 1501),\n",
" ('gets', 1490),\n",
" ('true', 1487),\n",
" ('interesting', 1481),\n",
" ('job', 1470),\n",
" ('enough', 1455),\n",
" ('our', 1454),\n",
" ('shows', 1447),\n",
" ('horror', 1441),\n",
" ('woman', 1439),\n",
" ('tv', 1400),\n",
" ('probably', 1398),\n",
" ('father', 1395),\n",
" ('original', 1393),\n",
" ('girl', 1390),\n",
" ('point', 1379),\n",
" ('plays', 1378),\n",
" ('wonderful', 1372),\n",
" ('far', 1358),\n",
" ('course', 1358),\n",
" ('john', 1350),\n",
" ('rather', 1340),\n",
" ('isn', 1328),\n",
" ('ll', 1326),\n",
" ('later', 1324),\n",
" ('dvd', 1324),\n",
" ('war', 1310),\n",
" ('whole', 1310),\n",
" ('d', 1307),\n",
" ('away', 1306),\n",
" ('found', 1306),\n",
" ('screen', 1305),\n",
" ('nothing', 1300),\n",
" ('year', 1297),\n",
" ('once', 1296),\n",
" ('hard', 1294),\n",
" ('together', 1280),\n",
" ('am', 1277),\n",
" ('set', 1277),\n",
" ('having', 1266),\n",
" ('making', 1265),\n",
" ('place', 1263),\n",
" ('comes', 1260),\n",
" ('might', 1260),\n",
" ('sure', 1253),\n",
" ('american', 1248),\n",
" ('play', 1245),\n",
" ('kind', 1244),\n",
" ('takes', 1242),\n",
" ('perfect', 1242),\n",
" ('performances', 1237),\n",
" ('himself', 1230),\n",
" ('worth', 1221),\n",
" ('everyone', 1221),\n",
" ('anyone', 1214),\n",
" ('actor', 1203),\n",
" ('three', 1201),\n",
" ('wife', 1196),\n",
" ('classic', 1192),\n",
" ('goes', 1186),\n",
" ('ending', 1178),\n",
" ('version', 1168),\n",
" ('star', 1149),\n",
" ('enjoy', 1146),\n",
" ('book', 1142),\n",
" ('nice', 1132),\n",
" ('everything', 1128),\n",
" ('during', 1124),\n",
" ('put', 1118),\n",
" ('seeing', 1111),\n",
" ('least', 1102),\n",
" ('house', 1100),\n",
" ('high', 1095),\n",
" ('watched', 1094),\n",
" ('men', 1087),\n",
" ('loved', 1087),\n",
" ('night', 1082),\n",
" ('anything', 1075),\n",
" ('guy', 1071),\n",
" ('believe', 1071),\n",
" ('top', 1063),\n",
" ('amazing', 1058),\n",
" ('hollywood', 1056),\n",
" ('looking', 1053),\n",
" ('main', 1044),\n",
" ('definitely', 1043),\n",
" ('gives', 1031),\n",
" ('home', 1029),\n",
" ('seem', 1028),\n",
" ('episode', 1023),\n",
" ('sense', 1020),\n",
" ('audience', 1020),\n",
" ('truly', 1017),\n",
" ('special', 1011),\n",
" ('fan', 1009),\n",
" ('second', 1009),\n",
" ('short', 1009),\n",
" ('mind', 1005),\n",
" ('human', 1001),\n",
" ('recommend', 999),\n",
" ('full', 996),\n",
" ('black', 995),\n",
" ('help', 991),\n",
" ('along', 989),\n",
" ('trying', 987),\n",
" ('small', 986),\n",
" ('death', 985),\n",
" ('friends', 981),\n",
" ('remember', 974),\n",
" ('often', 970),\n",
" ('said', 966),\n",
" ('favorite', 962),\n",
" ('heart', 959),\n",
" ('early', 957),\n",
" ('left', 956),\n",
" ('until', 955),\n",
" ('let', 954),\n",
" ('script', 954),\n",
" ('maybe', 937),\n",
" ('today', 936),\n",
" ('live', 934),\n",
" ('less', 934),\n",
" ('moments', 933),\n",
" ('others', 929),\n",
" ('brilliant', 926),\n",
" ('shot', 925),\n",
" ('liked', 923),\n",
" ('become', 916),\n",
" ('won', 915),\n",
" ('used', 910),\n",
" ('style', 907),\n",
" ('mother', 895),\n",
" ('lives', 894),\n",
" ('came', 893),\n",
" ('stars', 890),\n",
" ('cinema', 889),\n",
" ('looks', 885),\n",
" ('perhaps', 884),\n",
" ('read', 882),\n",
" ('enjoyed', 879),\n",
" ('boy', 875),\n",
" ('drama', 873),\n",
" ('highly', 871),\n",
" ('given', 870),\n",
" ('playing', 867),\n",
" ('use', 864),\n",
" ('next', 859),\n",
" ('women', 858),\n",
" ('fine', 857),\n",
" ('effects', 856),\n",
" ('kids', 854),\n",
" ('entertaining', 853),\n",
" ('need', 852),\n",
" ('line', 850),\n",
" ('works', 848),\n",
" ('someone', 847),\n",
" ('mr', 836),\n",
" ('simply', 835),\n",
" ('children', 833),\n",
" ('picture', 833),\n",
" ('face', 831),\n",
" ('friend', 831),\n",
" ('keep', 831),\n",
" ('dark', 830),\n",
" ('overall', 828),\n",
" ('certainly', 828),\n",
" ('minutes', 827),\n",
" ('wasn', 824),\n",
" ('history', 822),\n",
" ('finally', 820),\n",
" ('couple', 816),\n",
" ('against', 815),\n",
" ('son', 809),\n",
" ('understand', 808),\n",
" ('lost', 807),\n",
" ('michael', 805),\n",
" ('else', 801),\n",
" ('throughout', 798),\n",
" ('fans', 797),\n",
" ('city', 792),\n",
" ('reason', 789),\n",
" ('written', 787),\n",
" ('production', 787),\n",
" ('several', 784),\n",
" ('school', 783),\n",
" ('rest', 781),\n",
" ('based', 781),\n",
" ('try', 780),\n",
" ('dead', 776),\n",
" ('hope', 775),\n",
" ('strong', 768),\n",
" ('white', 765),\n",
" ('tell', 759),\n",
" ('itself', 758),\n",
" ('half', 753),\n",
" ('person', 749),\n",
" ('sometimes', 746),\n",
" ('past', 744),\n",
" ('start', 744),\n",
" ('genre', 743),\n",
" ('final', 739),\n",
" ('beginning', 739),\n",
" ('town', 738),\n",
" ('art', 734),\n",
" ('game', 732),\n",
" ('humor', 732),\n",
" ('yes', 731),\n",
" ('idea', 731),\n",
" ('late', 730),\n",
" ('becomes', 729),\n",
" ('despite', 729),\n",
" ('able', 726),\n",
" ('case', 726),\n",
" ('money', 723),\n",
" ('child', 721),\n",
" ('completely', 721),\n",
" ('side', 719),\n",
" ('camera', 716),\n",
" ('getting', 714),\n",
" ('instead', 712),\n",
" ('soon', 702),\n",
" ('under', 700),\n",
" ('viewer', 699),\n",
" ('age', 697),\n",
" ('days', 696),\n",
" ('stories', 696),\n",
" ('felt', 694),\n",
" ('simple', 694),\n",
" ('roles', 693),\n",
" ('video', 688),\n",
" ('name', 683),\n",
" ('either', 683),\n",
" ('doing', 677),\n",
" ('turns', 674),\n",
" ('wants', 671),\n",
" ('close', 671),\n",
" ('title', 669),\n",
" ('wrong', 668),\n",
" ('went', 666),\n",
" ('james', 665),\n",
" ('evil', 659),\n",
" ('budget', 657),\n",
" ('episodes', 657),\n",
" ('relationship', 655),\n",
" ('piece', 653),\n",
" ('fantastic', 653),\n",
" ('david', 651),\n",
" ('turn', 648),\n",
" ('murder', 646),\n",
" ('parts', 645),\n",
" ('brother', 644),\n",
" ('head', 643),\n",
" ('absolutely', 643),\n",
" ('experience', 642),\n",
" ('eyes', 641),\n",
" ('sex', 638),\n",
" ('direction', 637),\n",
" ('called', 637),\n",
" ('directed', 636),\n",
" ('lines', 634),\n",
" ('behind', 633),\n",
" ('sort', 632),\n",
" ('actress', 631),\n",
" ('lead', 630),\n",
" ('oscar', 628),\n",
" ('example', 627),\n",
" ('including', 627),\n",
" ('known', 625),\n",
" ('musical', 625),\n",
" ('chance', 621),\n",
" ('score', 620),\n",
" ('feeling', 619),\n",
" ('already', 619),\n",
" ('hit', 619),\n",
" ('voice', 615),\n",
" ('moment', 612),\n",
" ('living', 612),\n",
" ('low', 610),\n",
" ('supporting', 610),\n",
" ('ago', 609),\n",
" ('themselves', 608),\n",
" ('hilarious', 605),\n",
" ('reality', 605),\n",
" ('jack', 604),\n",
" ('told', 603),\n",
" ('hand', 601),\n",
" ('moving', 600),\n",
" ('dialogue', 600),\n",
" ('quality', 600),\n",
" ('song', 599),\n",
" ('happy', 599),\n",
" ('paul', 598),\n",
" ('matter', 598),\n",
" ('light', 594),\n",
" ('future', 593),\n",
" ('entire', 592),\n",
" ('finds', 591),\n",
" ('gave', 589),\n",
" ('laugh', 587),\n",
" ('released', 586),\n",
" ('expect', 584),\n",
" ('fight', 581),\n",
" ('particularly', 580),\n",
" ('cinematography', 579),\n",
" ('police', 579),\n",
" ('whose', 578),\n",
" ('type', 578),\n",
" ('sound', 578),\n",
" ('enjoyable', 573),\n",
" ('view', 573),\n",
" ('husband', 572),\n",
" ('romantic', 572),\n",
" ('number', 572),\n",
" ('daughter', 572),\n",
" ('documentary', 571),\n",
" ('self', 570),\n",
" ('modern', 569),\n",
" ('robert', 569),\n",
" ('took', 569),\n",
" ('superb', 569),\n",
" ('mean', 566),\n",
" ('shown', 563),\n",
" ('coming', 561),\n",
" ('important', 560),\n",
" ('king', 559),\n",
" ('leave', 559),\n",
" ('change', 558),\n",
" ('wanted', 555),\n",
" ('somewhat', 555),\n",
" ('tells', 554),\n",
" ('run', 552),\n",
" ('events', 552),\n",
" ('country', 552),\n",
" ('career', 552),\n",
" ('heard', 550),\n",
" ('season', 550),\n",
" ('girls', 549),\n",
" ('greatest', 549),\n",
" ('etc', 547),\n",
" ('care', 546),\n",
" ('starts', 545),\n",
" ('english', 542),\n",
" ('killer', 541),\n",
" ('animation', 540),\n",
" ('guys', 540),\n",
" ('totally', 540),\n",
" ('tale', 540),\n",
" ('usual', 539),\n",
" ('opinion', 535),\n",
" ('miss', 535),\n",
" ('violence', 531),\n",
" ('easy', 531),\n",
" ('songs', 530),\n",
" ('british', 528),\n",
" ('says', 526),\n",
" ('realistic', 525),\n",
" ('writing', 524),\n",
" ('act', 522),\n",
" ('writer', 522),\n",
" ('comic', 521),\n",
" ('thriller', 519),\n",
" ('television', 517),\n",
" ('power', 516),\n",
" ('ones', 515),\n",
" ('kid', 514),\n",
" ('novel', 513),\n",
" ('york', 513),\n",
" ('problem', 512),\n",
" ('alone', 512),\n",
" ('attention', 509),\n",
" ('involved', 508),\n",
" ('kill', 507),\n",
" ('extremely', 507),\n",
" ('seemed', 506),\n",
" ('hero', 505),\n",
" ('french', 505),\n",
" ('rock', 504),\n",
" ('stuff', 501),\n",
" ('wish', 499),\n",
" ('begins', 498),\n",
" ('taken', 497),\n",
" ('sad', 497),\n",
" ('ways', 496),\n",
" ('richard', 495),\n",
" ('knows', 494),\n",
" ('atmosphere', 493),\n",
" ('surprised', 491),\n",
" ('similar', 491),\n",
" ('taking', 491),\n",
" ('car', 491),\n",
" ('george', 490),\n",
" ('perfectly', 490),\n",
" ('across', 489),\n",
" ('sequence', 489),\n",
" ('eye', 489),\n",
" ('team', 489),\n",
" ('serious', 488),\n",
" ('powerful', 488),\n",
" ('room', 488),\n",
" ('due', 488),\n",
" ('among', 488),\n",
" ('order', 487),\n",
" ('b', 487),\n",
" ('cannot', 487),\n",
" ('strange', 487),\n",
" ('beauty', 486),\n",
" ('famous', 485),\n",
" ('tries', 484),\n",
" ('myself', 484),\n",
" ('happened', 484),\n",
" ('herself', 484),\n",
" ('class', 483),\n",
" ('four', 482),\n",
" ('cool', 481),\n",
" ('release', 479),\n",
" ('anyway', 479),\n",
" ('theme', 479),\n",
" ('opening', 478),\n",
" ('entertainment', 477),\n",
" ('unique', 475),\n",
" ('ends', 475),\n",
" ('slow', 475),\n",
" ('exactly', 475),\n",
" ('red', 474),\n",
" ('o', 474),\n",
" ('level', 474),\n",
" ('easily', 474),\n",
" ('interest', 472),\n",
" ('happen', 471),\n",
" ('crime', 470),\n",
" ('viewing', 468),\n",
" ('memorable', 467),\n",
" ('sets', 467),\n",
" ('group', 466),\n",
" ('stop', 466),\n",
" ('dance', 463),\n",
" ('message', 463),\n",
" ('sister', 463),\n",
" ('working', 463),\n",
" ('problems', 463),\n",
" ('knew', 462),\n",
" ('mystery', 461),\n",
" ('nature', 461),\n",
" ('bring', 460),\n",
" ('believable', 459),\n",
" ('thinking', 459),\n",
" ('brought', 459),\n",
" ('mostly', 458),\n",
" ('couldn', 457),\n",
" ('disney', 457),\n",
" ('society', 456),\n",
" ('within', 455),\n",
" ('lady', 455),\n",
" ('blood', 454),\n",
" ('upon', 453),\n",
" ('viewers', 453),\n",
" ('parents', 453),\n",
" ('meets', 452),\n",
" ('form', 452),\n",
" ('soundtrack', 452),\n",
" ('usually', 452),\n",
" ('tom', 452),\n",
" ('peter', 452),\n",
" ('local', 450),\n",
" ('certain', 448),\n",
" ('follow', 448),\n",
" ('whether', 447),\n",
" ('possible', 446),\n",
" ('emotional', 445),\n",
" ('killed', 444),\n",
" ('de', 444),\n",
" ('above', 444),\n",
" ('middle', 443),\n",
" ('god', 443),\n",
" ('happens', 442),\n",
" ('flick', 442),\n",
" ('needs', 442),\n",
" ('masterpiece', 441),\n",
" ('major', 440),\n",
" ('period', 440),\n",
" ('haven', 439),\n",
" ('named', 439),\n",
" ('th', 438),\n",
" ('particular', 438),\n",
" ('earth', 437),\n",
" ('feature', 437),\n",
" ('stand', 436),\n",
" ('words', 435),\n",
" ('typical', 435),\n",
" ('obviously', 433),\n",
" ('elements', 433),\n",
" ('romance', 431),\n",
" ('jane', 430),\n",
" ('yourself', 427),\n",
" ('showing', 427),\n",
" ('fantasy', 426),\n",
" ('brings', 426),\n",
" ('america', 423),\n",
" ('guess', 423),\n",
" ('huge', 422),\n",
" ('unfortunately', 422),\n",
" ('indeed', 421),\n",
" ('running', 421),\n",
" ('talent', 420),\n",
" ('stage', 419),\n",
" ('started', 418),\n",
" ('sweet', 417),\n",
" ('leads', 417),\n",
" ('japanese', 417),\n",
" ('poor', 416),\n",
" ('deal', 416),\n",
" ('personal', 413),\n",
" ('incredible', 413),\n",
" ('fast', 412),\n",
" ('became', 410),\n",
" ('deep', 410),\n",
" ('hours', 409),\n",
" ('nearly', 408),\n",
" ('dream', 408),\n",
" ('giving', 408),\n",
" ('turned', 407),\n",
" ('clearly', 407),\n",
" ('near', 406),\n",
" ('obvious', 406),\n",
" ('cut', 405),\n",
" ('surprise', 405),\n",
" ('body', 404),\n",
" ('era', 404),\n",
" ('female', 403),\n",
" ('hour', 403),\n",
" ('five', 403),\n",
" ('note', 399),\n",
" ('learn', 398),\n",
" ('truth', 398),\n",
" ('match', 397),\n",
" ('feels', 397),\n",
" ('except', 397),\n",
" ('tony', 397),\n",
" ('filmed', 394),\n",
" ('complete', 394),\n",
" ('clear', 394),\n",
" ('older', 393),\n",
" ('street', 393),\n",
" ('lots', 393),\n",
" ('eventually', 393),\n",
" ('keeps', 393),\n",
" ('buy', 392),\n",
" ('stewart', 391),\n",
" ('william', 391),\n",
" ('joe', 390),\n",
" ('meet', 390),\n",
" ('fall', 390),\n",
" ('shots', 389),\n",
" ('talking', 389),\n",
" ('difficult', 389),\n",
" ('unlike', 389),\n",
" ('rating', 389),\n",
" ('means', 388),\n",
" ('dramatic', 388),\n",
" ('appears', 386),\n",
" ('subject', 386),\n",
" ('wonder', 386),\n",
" ('present', 386),\n",
" ('situation', 386),\n",
" ('comments', 385),\n",
" ('sequences', 383),\n",
" ('general', 383),\n",
" ('lee', 383),\n",
" ('earlier', 382),\n",
" ('points', 382),\n",
" ('check', 379),\n",
" ('gone', 379),\n",
" ('ten', 378),\n",
" ('suspense', 378),\n",
" ('recommended', 378),\n",
" ('business', 377),\n",
" ('third', 377),\n",
" ('talk', 375),\n",
" ('leaves', 375),\n",
" ('beyond', 375),\n",
" ('portrayal', 374),\n",
" ('beautifully', 373),\n",
" ('single', 372),\n",
" ('bill', 372),\n",
" ('word', 371),\n",
" ('plenty', 371),\n",
" ('falls', 370),\n",
" ('whom', 370),\n",
" ('figure', 369),\n",
" ('battle', 369),\n",
" ('scary', 369),\n",
" ('non', 369),\n",
" ('return', 368),\n",
" ('using', 368),\n",
" ('doubt', 367),\n",
" ('add', 367),\n",
" ('hear', 366),\n",
" ('solid', 366),\n",
" ('success', 366),\n",
" ('touching', 365),\n",
" ('political', 365),\n",
" ('oh', 365),\n",
" ('jokes', 365),\n",
" ('awesome', 364),\n",
" ('hell', 364),\n",
" ('boys', 364),\n",
" ('dog', 362),\n",
" ('recently', 362),\n",
" ('sexual', 362),\n",
" ('please', 361),\n",
" ('wouldn', 361),\n",
" ('features', 361),\n",
" ('straight', 361),\n",
" ('lack', 360),\n",
" ('forget', 360),\n",
" ('setting', 360),\n",
" ('mark', 359),\n",
" ('married', 359),\n",
" ('social', 357),\n",
" ('adventure', 356),\n",
" ('interested', 356),\n",
" ('brothers', 355),\n",
" ('sees', 355),\n",
" ('actual', 355),\n",
" ('terrific', 355),\n",
" ('move', 354),\n",
" ('call', 354),\n",
" ('various', 353),\n",
" ('dr', 353),\n",
" ('theater', 353),\n",
" ('animated', 352),\n",
" ('western', 351),\n",
" ('space', 350),\n",
" ('baby', 350),\n",
" ('leading', 348),\n",
" ('disappointed', 348),\n",
" ('portrayed', 346),\n",
" ('aren', 346),\n",
" ('screenplay', 345),\n",
" ('smith', 345),\n",
" ('hate', 344),\n",
" ('towards', 344),\n",
" ('noir', 343),\n",
" ('outstanding', 342),\n",
" ('decent', 342),\n",
" ('kelly', 342),\n",
" ('directors', 341),\n",
" ('journey', 341),\n",
" ('none', 340),\n",
" ('effective', 340),\n",
" ('looked', 340),\n",
" ('caught', 339),\n",
" ('cold', 339),\n",
" ('storyline', 339),\n",
" ('fi', 339),\n",
" ('sci', 339),\n",
" ('mary', 339),\n",
" ('rich', 338),\n",
" ('charming', 338),\n",
" ('harry', 337),\n",
" ('popular', 337),\n",
" ('manages', 337),\n",
" ('rare', 337),\n",
" ('spirit', 336),\n",
" ('open', 335),\n",
" ('appreciate', 335),\n",
" ('basically', 334),\n",
" ('moves', 334),\n",
" ('acted', 334),\n",
" ('deserves', 333),\n",
" ('subtle', 333),\n",
" ('mention', 333),\n",
" ('inside', 333),\n",
" ('pace', 333),\n",
" ('century', 333),\n",
" ('boring', 333),\n",
" ('familiar', 332),\n",
" ('background', 332),\n",
" ('ben', 331),\n",
" ('creepy', 330),\n",
" ('supposed', 330),\n",
" ('secret', 329),\n",
" ('jim', 328),\n",
" ('die', 328),\n",
" ('question', 327),\n",
" ('effect', 327),\n",
" ('natural', 327),\n",
" ('rate', 326),\n",
" ('language', 326),\n",
" ('impressive', 326),\n",
" ('intelligent', 325),\n",
" ('saying', 325),\n",
" ('material', 324),\n",
" ('realize', 324),\n",
" ('telling', 324),\n",
" ('scott', 324),\n",
" ('singing', 323),\n",
" ('dancing', 322),\n",
" ('adult', 321),\n",
" ('imagine', 321),\n",
" ('visual', 321),\n",
" ('kept', 320),\n",
" ('office', 320),\n",
" ('uses', 319),\n",
" ('pure', 318),\n",
" ('wait', 318),\n",
" ('stunning', 318),\n",
" ('copy', 317),\n",
" ('review', 317),\n",
" ('previous', 317),\n",
" ('seriously', 317),\n",
" ('somehow', 316),\n",
" ('created', 316),\n",
" ('magic', 316),\n",
" ('create', 316),\n",
" ('hot', 316),\n",
" ('reading', 316),\n",
" ('crazy', 315),\n",
" ('air', 315),\n",
" ('frank', 315),\n",
" ('stay', 315),\n",
" ('escape', 315),\n",
" ('attempt', 315),\n",
" ('hands', 314),\n",
" ('filled', 313),\n",
" ('surprisingly', 312),\n",
" ('expected', 312),\n",
" ('average', 312),\n",
" ('complex', 311),\n",
" ('studio', 310),\n",
" ('successful', 310),\n",
" ('quickly', 310),\n",
" ('male', 309),\n",
" ('plus', 309),\n",
" ('co', 307),\n",
" ('minute', 306),\n",
" ('images', 306),\n",
" ('casting', 306),\n",
" ('exciting', 306),\n",
" ('following', 306),\n",
" ('members', 305),\n",
" ('german', 305),\n",
" ('e', 305),\n",
" ('reasons', 305),\n",
" ('follows', 305),\n",
" ('themes', 305),\n",
" ('touch', 304),\n",
" ('genius', 304),\n",
" ('free', 304),\n",
" ('edge', 304),\n",
" ('cute', 304),\n",
" ('outside', 303),\n",
" ('ok', 302),\n",
" ('admit', 302),\n",
" ('younger', 302),\n",
" ('reviews', 302),\n",
" ('odd', 301),\n",
" ('fighting', 301),\n",
" ('master', 301),\n",
" ('break', 300),\n",
" ('thanks', 300),\n",
" ('recent', 300),\n",
" ('comment', 300),\n",
" ('apart', 299),\n",
" ('lovely', 298),\n",
" ('begin', 298),\n",
" ('emotions', 298),\n",
" ('doctor', 297),\n",
" ('italian', 297),\n",
" ('party', 297),\n",
" ('la', 296),\n",
" ('missed', 296),\n",
" ...]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"positive_counts.most_common()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
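"pos_neg_ratios = Counter()\n",
"\n",
"# Positive-to-negative ratio for every word seen more than 100 times\n",
"for term,cnt in list(total_counts.most_common()):\n",
"    if(cnt > 100):\n",
"        # +1 in the denominator avoids division by zero\n",
"        pos_neg_ratio = positive_counts[term] / float(negative_counts[term]+1)\n",
"        pos_neg_ratios[term] = pos_neg_ratio\n",
"\n",
"# Take logs so positive-leaning and negative-leaning words sit on a comparable scale around 0\n",
"for word,ratio in pos_neg_ratios.most_common():\n",
"    if(ratio > 1):\n",
"        pos_neg_ratios[word] = np.log(ratio)\n",
"    else:\n",
"        # Equivalent to np.log(ratio + 0.01); the offset keeps log() finite when ratio is 0\n",
"        pos_neg_ratios[word] = -np.log((1 / (ratio+0.01)))"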
]
},
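{
"cell_type": "markdown",
"metadata": {},
"source": [
"The log-ratio step above can be sketched on a toy example. This is a minimal illustration under stated assumptions, not part of the original lesson: the counts in `toy_pos` and `toy_neg` are made up, and `log_ratio` is a hypothetical helper that mirrors the formula in the previous cell. A word that appears mostly in positive reviews should score well above zero, a word that appears mostly in negative reviews well below zero, and a word used about equally in both close to zero."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"from collections import Counter\n",
"\n",
"# Hypothetical counts, invented for illustration only\n",
"toy_pos = Counter({'superb': 40, 'awful': 3, 'the': 100})\n",
"toy_neg = Counter({'superb': 4, 'awful': 39, 'the': 99})\n",
"\n",
"def log_ratio(word):\n",
"    # Same formula as the cell above: +1 avoids division by zero\n",
"    ratio = toy_pos[word] / float(toy_neg[word] + 1)\n",
"    if ratio > 1:\n",
"        return np.log(ratio)\n",
"    # ratio <= 1: the 0.01 offset keeps log() finite when ratio is 0\n",
"    return -np.log(1 / (ratio + 0.01))\n",
"\n",
"for w in ['superb', 'the', 'awful']:\n",
"    print(w, round(log_ratio(w), 2))"
]
},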
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[('edie', 4.6913478822291435),\n",
" ('paulie', 4.0775374439057197),\n",
" ('felix', 3.1527360223636558),\n",
" ('polanski', 2.8233610476132043),\n",
" ('matthau', 2.8067217286092401),\n",
" ('victoria', 2.6810215287142909),\n",
" ('mildred', 2.6026896854443837),\n",
" ('gandhi', 2.5389738710582761),\n",
" ('flawless', 2.451005098112319),\n",
" ('superbly', 2.2600254785752498),\n",
" ('perfection', 2.1594842493533721),\n",
" ('astaire', 2.1400661634962708),\n",
" ('captures', 2.0386195471595809),\n",
" ('voight', 2.0301704926730531),\n",
" ('wonderfully', 2.0218960560332353),\n",
" ('powell', 1.9783454248084671),\n",
" ('brosnan', 1.9547990964725592),\n",
" ('lily', 1.9203768470501485),\n",
" ('bakshi', 1.9029851043382795),\n",
" ('lincoln', 1.9014583864844796),\n",
" ('refreshing', 1.8551812956655511),\n",
" ('breathtaking', 1.8481124057791867),\n",
" ('bourne', 1.8478489358790986),\n",
" ('lemmon', 1.8458266904983307),\n",
" ('delightful', 1.8002701588959635),\n",
" ('flynn', 1.7996646487351682),\n",
" ('andrews', 1.7764919970972666),\n",
" ('homer', 1.7692866133759964),\n",
" ('beautifully', 1.7626953362841438),\n",
" ('soccer', 1.7578579175523736),\n",
" ('elvira', 1.7397031072720019),\n",
" ('underrated', 1.7197859696029656),\n",
" ('gripping', 1.7165360479904674),\n",
" ('superb', 1.7091514458966952),\n",
" ('delight', 1.6714733033535532),\n",
" ('welles', 1.6677068205580761),\n",
" ('sadness', 1.663505133704376),\n",
" ('sinatra', 1.6389967146756448),\n",
" ('touching', 1.637217476541176),\n",
" ('timeless', 1.62924053973028),\n",
" ('macy', 1.6211339521972916),\n",
" ('unforgettable', 1.6177367152487956),\n",
" ('favorites', 1.6158688027643908),\n",
" ('stewart', 1.6119987332957739),\n",
" ('hartley', 1.6094379124341003),\n",
" ('sullivan', 1.6094379124341003),\n",
" ('extraordinary', 1.6094379124341003),\n",
" ('brilliantly', 1.5950491749820008),\n",
" ('friendship', 1.5677652160335325),\n",
" ('wonderful', 1.5645425925262093),\n",
" ('palma', 1.5553706911638245),\n",
" ('magnificent', 1.54663701119507),\n",
" ('finest', 1.5462590108125689),\n",
" ('jackie', 1.5439233053234738),\n",
" ('ritter', 1.5404450409471491),\n",
" ('tremendous', 1.5184661342283736),\n",
" ('freedom', 1.5091151908062312),\n",
" ('fantastic', 1.5048433868558566),\n",
" ('terrific', 1.5026699370083942),\n",
" ('noir', 1.493925025312256),\n",
" ('sidney', 1.493925025312256),\n",
" ('outstanding', 1.4910053152089213),\n",
" ('mann', 1.4894785973551214),\n",
" ('pleasantly', 1.4894785973551214),\n",
" ('nancy', 1.488077055429833),\n",
" ('marie', 1.4825711915553104),\n",
" ('marvelous', 1.4739999415389962),\n",
" ('excellent', 1.4647538505723599),\n",
" ('ruth', 1.4596256342054401),\n",
" ('stanwyck', 1.4412101187160054),\n",
" ('widmark', 1.4350845252893227),\n",
" ('splendid', 1.4271163556401458),\n",
" ('chan', 1.423108334242607),\n",
" ('exceptional', 1.4201959127955721),\n",
" ('tender', 1.410986973710262),\n",
" ('gentle', 1.4078005663408544),\n",
" ('poignant', 1.4022947024663317),\n",
" ('gem', 1.3932148039644643),\n",
" ('amazing', 1.3919815802404802),\n",
" ('chilling', 1.3862943611198906),\n",
" ('captivating', 1.3862943611198906),\n",
" ('fisher', 1.3862943611198906),\n",
" ('davies', 1.3862943611198906),\n",
" ('darker', 1.3652409519220583),\n",
" ('april', 1.3499267169490159),\n",
" ('kelly', 1.3461743673304654),\n",
" ('blake', 1.3418425985490567),\n",
" ('overlooked', 1.329135947279942),\n",
" ('ralph', 1.32818673031261),\n",
" ('bette', 1.3156767939059373),\n",
" ('hoffman', 1.3150668518315229),\n",
" ('cole', 1.3121863889661687),\n",
" ('shines', 1.3049487216659381),\n",
" ('powerful', 1.2999662776313934),\n",
" ('notch', 1.2950456896547455),\n",
" ('remarkable', 1.2883688239495823),\n",
" ('pitt', 1.286210902562908),\n",
" ('winters', 1.2833463918674481),\n",
" ('vivid', 1.2762934659055623),\n",
" ('gritty', 1.2757524867200667),\n",
" ('giallo', 1.2745029551317739),\n",
" ('portrait', 1.2704625455947689),\n",
" ('innocence', 1.2694300209805796),\n",
" ('psychiatrist', 1.2685113254635072),\n",
" ('favorite', 1.2668956297860055),\n",
" ('ensemble', 1.2656663733312759),\n",
" ('stunning', 1.2622417124499117),\n",
" ('burns', 1.259880436264232),\n",
" ('garbo', 1.258954938743289),\n",
" ('barbara', 1.2580400255962119),\n",
" ('panic', 1.2527629684953681),\n",
" ('holly', 1.2527629684953681),\n",
" ('philip', 1.2527629684953681),\n",
" ('carol', 1.2481440226390734),\n",
" ('perfect', 1.246742480713785),\n",
" ('appreciated', 1.2462482874741743),\n",
" ('favourite', 1.2411123512753928),\n",
" ('journey', 1.2367626271489269),\n",
" ('rural', 1.235471471385307),\n",
" ('bond', 1.2321436812926323),\n",
" ('builds', 1.2305398317106577),\n",
" ('brilliant', 1.2287554137664785),\n",
" ('brooklyn', 1.2286654169163074),\n",
" ('von', 1.225175011976539),\n",
" ('unfolds', 1.2163953243244932),\n",
" ('recommended', 1.2163953243244932),\n",
" ('daniel', 1.20215296760895),\n",
" ('perfectly', 1.1971931173405572),\n",
" ('crafted', 1.1962507582320256),\n",
" ('prince', 1.1939224684724346),\n",
" ('troubled', 1.192138346678933),\n",
" ('consequences', 1.1865810616140668),\n",
" ('haunting', 1.1814999484738773),\n",
" ('cinderella', 1.180052620608284),\n",
" ('alexander', 1.1759989522835299),\n",
" ('emotions', 1.1753049094563641),\n",
" ('boxing', 1.1735135968412274),\n",
" ('subtle', 1.1734135017508081),\n",
" ('curtis', 1.1649873576129823),\n",
" ('rare', 1.1566438362402944),\n",
" ('loved', 1.1563661500586044),\n",
" ('daughters', 1.1526795099383853),\n",
" ('courage', 1.1438688802562305),\n",
" ('dentist', 1.1426722784621401),\n",
" ('highly', 1.1420208631618658),\n",
" ('nominated', 1.1409146683587992),\n",
" ('tony', 1.1397491942285991),\n",
" ('draws', 1.1325138403437911),\n",
" ('everyday', 1.1306150197542835),\n",
" ('contrast', 1.1284652518177909),\n",
" ('cried', 1.1213405397456659),\n",
" ('fabulous', 1.1210851445201684),\n",
" ('ned', 1.120591195386885),\n",
" ('fay', 1.120591195386885),\n",
" ('emma', 1.1184149159642893),\n",
" ('sensitive', 1.113318436057805),\n",
" ('smooth', 1.1089750757036563),\n",
" ('dramas', 1.1080910326226534),\n",
" ('today', 1.1050431789984001),\n",
" ('helps', 1.1023091505494358),\n",
" ('inspiring', 1.0986122886681098),\n",
" ('jimmy', 1.0937696641923216),\n",
" ('awesome', 1.0931328229034842),\n",
" ('unique', 1.0881409888008142),\n",
" ('tragic', 1.0871835928444868),\n",
" ('intense', 1.0870514662670339),\n",
" ('stellar', 1.0857088838322018),\n",
" ('rival', 1.0822184788924332),\n",
" ('provides', 1.0797081340289569),\n",
" ('depression', 1.0782034170369026),\n",
" ('shy', 1.0775588794702773),\n",
" ('carrie', 1.076139432816051),\n",
" ('blend', 1.0753554265038423),\n",
" ('hank', 1.0736109864626924),\n",
" ('diana', 1.0726368022648489),\n",
" ('adorable', 1.0726368022648489),\n",
" ('unexpected', 1.0722255334949147),\n",
" ('achievement', 1.0668635903535293),\n",
" ('bettie', 1.0663514264498881),\n",
" ('happiness', 1.0632729222228008),\n",
" ('glorious', 1.0608719606852626),\n",
" ('davis', 1.0541605260972757),\n",
" ('terrifying', 1.0525211814678428),\n",
" ('beauty', 1.050410186850232),\n",
" ('ideal', 1.0479685558493548),\n",
" ('fears', 1.0467872208035236),\n",
" ('hong', 1.0438040521731147),\n",
" ('seasons', 1.0433496099930604),\n",
" ('fascinating', 1.0414538748281612),\n",
" ('carries', 1.0345904299031787),\n",
" ('satisfying', 1.0321225473992768),\n",
" ('definite', 1.0319209141694374),\n",
" ('touched', 1.0296194171811581),\n",
" ('greatest', 1.0248947127715422),\n",
" ('creates', 1.0241097613701886),\n",
" ('aunt', 1.023388867430522),\n",
" ('walter', 1.022328983918479),\n",
" ('spectacular', 1.0198314108149955),\n",
" ('portrayal', 1.0189810189761024),\n",
" ('ann', 1.0127808528183286),\n",
" ('enterprise', 1.0116009116784799),\n",
" ('musicals', 1.0096648026516135),\n",
" ('deeply', 1.0094845087721023),\n",
" ('incredible', 1.0061677561461084),\n",
" ('mature', 1.0060195018402847),\n",
" ('triumph', 0.99682959435816731),\n",
" ('margaret', 0.99682959435816731),\n",
" ('navy', 0.99493385919326827),\n",
" ('harry', 0.99176919305006062),\n",
" ('lucas', 0.990398704027877),\n",
" ('sweet', 0.98966110487955483),\n",
" ('joey', 0.98794672078059009),\n",
" ('oscar', 0.98721905111049713),\n",
" ('balance', 0.98649499054740353),\n",
" ('warm', 0.98485340331145166),\n",
" ('ages', 0.98449898190068863),\n",
" ('glover', 0.98082925301172619),\n",
" ('guilt', 0.98082925301172619),\n",
" ('carrey', 0.98082925301172619),\n",
" ('learns', 0.97881108885548895),\n",
" ('unusual', 0.97788374278196932),\n",
" ('sons', 0.97777581552483595),\n",
" ('complex', 0.97761897738147796),\n",
" ('essence', 0.97753435711487369),\n",
" ('brazil', 0.9769153536905899),\n",
" ('widow', 0.97650959186720987),\n",
" ('solid', 0.97537964824416146),\n",
" ('beautiful', 0.97326301262841053),\n",
" ('holmes', 0.97246100334120955),\n",
" ('awe', 0.97186058302896583),\n",
" ('vhs', 0.97116734209998934),\n",
" ('eerie', 0.97116734209998934),\n",
" ('lonely', 0.96873720724669754),\n",
" ('grim', 0.96873720724669754),\n",
" ('sport', 0.96825047080486615),\n",
" ('debut', 0.96508089604358704),\n",
" ('destiny', 0.96343751029985703),\n",
" ('thrillers', 0.96281074750904794),\n",
" ('tears', 0.95977584381389391),\n",
" ('rose', 0.95664202739772253),\n",
" ('feelings', 0.95551144502743635),\n",
" ('ginger', 0.95551144502743635),\n",
" ('winning', 0.95471810900804055),\n",
" ('stanley', 0.95387344302319799),\n",
" ('cox', 0.95343027882361187),\n",
" ('paris', 0.95278479030472663),\n",
" ('heart', 0.95238806924516806),\n",
" ('hooked', 0.95155887071161305),\n",
" ('comfortable', 0.94803943018873538),\n",
" ('mgm', 0.94446160884085151),\n",
" ('masterpiece', 0.94155039863339296),\n",
" ('themes', 0.94118828349588235),\n",
" ('danny', 0.93967118051821874),\n",
" ('anime', 0.93378388932167222),\n",
" ('perry', 0.93328830824272613),\n",
" ('joy', 0.93301752567946861),\n",
" ('lovable', 0.93081883243706487),\n",
" ('hal', 0.92953595862417571),\n",
" ('mysteries', 0.92953595862417571),\n",
" ('louis', 0.92871325187271225),\n",
" ('charming', 0.92520609553210742),\n",
" ('urban', 0.92367083917177761),\n",
" ('allows', 0.92183091224977043),\n",
" ('impact', 0.91815814604895041),\n",
" ('gradually', 0.91629073187415511),\n",
" ('lifestyle', 0.91629073187415511),\n",
" ('italy', 0.91629073187415511),\n",
" ('spy', 0.91289514287301687),\n",
" ('treat', 0.91193342650519937),\n",
" ('subsequent', 0.91056005716517008),\n",
" ('kennedy', 0.90981821736853763),\n",
" ('loving', 0.90967549275543591),\n",
" ('surprising', 0.90937028902958128),\n",
" ('quiet', 0.90648673177753425),\n",
" ('winter', 0.90624039602065365),\n",
" ('reveals', 0.90490540964902977),\n",
" ('raw', 0.90445627422715225),\n",
" ('funniest', 0.90078654533818991),\n",
" ('pleased', 0.89994159387262562),\n",
" ('norman', 0.89994159387262562),\n",
" ('thief', 0.89874642222324552),\n",
" ('season', 0.89827222637147675),\n",
" ('secrets', 0.89794159320595857),\n",
" ('colorful', 0.89705936994626756),\n",
" ('highest', 0.8967461358011849),\n",
" ('compelling', 0.89462923509297576),\n",
" ('danes', 0.89248008318043659),\n",
" ('castle', 0.88967708335606499),\n",
" ('kudos', 0.88889175768604067),\n",
" ('great', 0.88810470901464589),\n",
" ('baseball', 0.88730319500090271),\n",
" ('subtitles', 0.88730319500090271),\n",
" ('bleak', 0.88730319500090271),\n",
" ('winner', 0.88643776872447388),\n",
" ('tragedy', 0.88563699078315261),\n",
" ('todd', 0.88551907320740142),\n",
" ('nicely', 0.87924946019380601),\n",
" ('arthur', 0.87546873735389985),\n",
" ('essential', 0.87373111745535925),\n",
" ('gorgeous', 0.8731725250935497),\n",
" ('fonda', 0.87294029100054127),\n",
" ('eastwood', 0.87139541196626402),\n",
" ('focuses', 0.87082835779739776),\n",
" ('enjoyed', 0.87070195951624607),\n",
" ('natural', 0.86997924506912838),\n",
" ('intensity', 0.86835126958503595),\n",
" ('witty', 0.86824103423244681),\n",
" ('rob', 0.8642954367557748),\n",
" ('worlds', 0.86377269759070874),\n",
" ('health', 0.86113891179907498),\n",
" ('magical', 0.85953791528170564),\n",
" ('deeper', 0.85802182375017932),\n",
" ('lucy', 0.85618680780444956),\n",
" ('moving', 0.85566611005772031),\n",
" ('lovely', 0.85290640004681306),\n",
" ('purple', 0.8513711857748395),\n",
" ('memorable', 0.84801189112086062),\n",
" ('sings', 0.84729786038720367),\n",
" ('craig', 0.84342938360928321),\n",
" ('modesty', 0.84342938360928321),\n",
" ('relate', 0.84326559685926517),\n",
" ('episodes', 0.84223712084137292),\n",
" ('strong', 0.84167135777060931),\n",
" ('smith', 0.83959811108590054),\n",
" ('tear', 0.83704136022001441),\n",
" ('apartment', 0.83333115290549531),\n",
" ('princess', 0.83290912293510388),\n",
" ('disagree', 0.83290912293510388),\n",
" ('kung', 0.83173334384609199),\n",
" ('adventure', 0.83150561393278388),\n",
" ('columbo', 0.82667857318446791),\n",
" ('jake', 0.82667857318446791),\n",
" ('adds', 0.82485652591452319),\n",
" ('hart', 0.82472353834866463),\n",
" ('strength', 0.82417544296634937),\n",
" ('realizes', 0.82360006895738058),\n",
" ('dave', 0.8232003088081431),\n",
" ('childhood', 0.82208086393583857),\n",
" ('forbidden', 0.81989888619908913),\n",
" ('tight', 0.81883539572344199),\n",
" ('surreal', 0.8178506590609026),\n",
" ('manager', 0.81770990320170756),\n",
" ('dancer', 0.81574950265227764),\n",
" ('con', 0.81093021621632877),\n",
" ('studios', 0.81093021621632877),\n",
" ('miike', 0.80821651034473263),\n",
" ('realistic', 0.80807714723392232),\n",
" ('explicit', 0.80792269515237358),\n",
" ('kurt', 0.8060875917405409),\n",
" ('traditional', 0.80535917116687328),\n",
" ('deals', 0.80535917116687328),\n",
" ('holds', 0.80493858654806194),\n",
" ('carl', 0.80437281567016972),\n",
" ('touches', 0.80396154690023547),\n",
" ('gene', 0.80314807577427383),\n",
" ('albert', 0.8027669055771679),\n",
" ('abc', 0.80234647252493729),\n",
" ('cry', 0.80011930011211307),\n",
" ('sides', 0.7995275841185171),\n",
" ('develops', 0.79850769621777162),\n",
" ('eyre', 0.79850769621777162),\n",
" ('dances', 0.79694397424158891),\n",
" ('oscars', 0.79633141679517616),\n",
" ('legendary', 0.79600456599965308),\n",
" ('importance', 0.79492987486988764),\n",
" ('hearted', 0.79492987486988764),\n",
" ('portraying', 0.79356592830699269),\n",
" ('impressed', 0.79258107754813223),\n",
" ('waters', 0.79112758892014912),\n",
" ('empire', 0.79078565012386137),\n",
" ('edge', 0.789774016249017),\n",
" ('environment', 0.78845736036427028),\n",
" ('jean', 0.78845736036427028),\n",
" ('sentimental', 0.7864791203521645),\n",
" ('captured', 0.78623760362595729),\n",
" ('styles', 0.78592891401091158),\n",
" ('daring', 0.78592891401091158),\n",
" ('backgrounds', 0.78275933924963248),\n",
" ('frank', 0.78275933924963248),\n",
" ('matches', 0.78275933924963248),\n",
" ('tense', 0.78275933924963248),\n",
" ('gothic', 0.78209466657644144),\n",
" ('sharp', 0.7814397877056235),\n",
" ('achieved', 0.78015855754957497),\n",
" ('court', 0.77947526404844247),\n",
" ('steals', 0.7789140023173704),\n",
" ('rules', 0.77844476107184035),\n",
" ('colors', 0.77684619943659217),\n",
" ('reunion', 0.77318988823348167),\n",
" ('covers', 0.77139937745969345),\n",
" ('tale', 0.77010822169607374),\n",
" ('rain', 0.7683706017975328),\n",
" ('denzel', 0.76804848873306297),\n",
" ('stays', 0.76787072675588186),\n",
" ('blob', 0.76725515271366718),\n",
" ('conventional', 0.76214005204689672),\n",
" ('maria', 0.76214005204689672),\n",
" ('fresh', 0.76158434211317383),\n",
" ('midnight', 0.76096977689870637),\n",
" ('landscape', 0.75852993982279704),\n",
" ('animated', 0.75768570169751648),\n",
" ('titanic', 0.75666058628227129),\n",
" ('sunday', 0.75666058628227129),\n",
" ('spring', 0.7537718023763802),\n",
" ('cagney', 0.7537718023763802),\n",
" ('enjoyable', 0.75246375771636476),\n",
" ('immensely', 0.75198768058287868),\n",
" ('sir', 0.7507762933965817),\n",
" ('nevertheless', 0.75067102469813185),\n",
" ('driven', 0.74994477895307854),\n",
" ('performances', 0.74883252516063137),\n",
" ('memories', 0.74721440183022114),\n",
" ('nowadays', 0.74721440183022114),\n",
" ('simple', 0.74641420974143258),\n",
" ('golden', 0.74533293373051557),\n",
" ('leslie', 0.74533293373051557),\n",
" ('lovers', 0.74497224842453125),\n",
" ('relationship', 0.74484232345601786),\n",
" ('supporting', 0.74357803418683721),\n",
" ('che', 0.74262723782331497),\n",
" ('packed', 0.7410032017375805),\n",
" ('trek', 0.74021469141793106),\n",
" ('provoking', 0.73840377214806618),\n",
" ('strikes', 0.73759894313077912),\n",
" ('depiction', 0.73682224406260699),\n",
" ('emotional', 0.73678211645681524),\n",
" ('secretary', 0.7366322924996842),\n",
" ('influenced', 0.73511137965897755),\n",
" ('florida', 0.73511137965897755),\n",
" ('germany', 0.73288750920945944),\n",
" ('brings', 0.73142936713096229),\n",
" ('lewis', 0.73129894652432159),\n",
" ('elderly', 0.73088750854279239),\n",
" ('owner', 0.72743625403857748),\n",
" ('streets', 0.72666987259858895),\n",
" ('henry', 0.72642196944481741),\n",
" ('portrays', 0.72593700338293632),\n",
" ('bears', 0.7252354951114458),\n",
" ('china', 0.72489587887452556),\n",
" ('anger', 0.72439972406404984),\n",
" ('society', 0.72433010799663333),\n",
" ('available', 0.72415741730250549),\n",
" ('best', 0.72347034060446314),\n",
" ('bugs', 0.72270598280148979),\n",
" ('magic', 0.71878961117328299),\n",
" ('verhoeven', 0.71846498854423513),\n",
" ('delivers', 0.71846498854423513),\n",
" ('jim', 0.71783979315031676),\n",
" ('donald', 0.71667767797013937),\n",
" ('endearing', 0.71465338578090898),\n",
" ('relationships', 0.71393795022901896),\n",
" ('greatly', 0.71256526641704687),\n",
" ('charlie', 0.71024161391924534),\n",
" ('brad', 0.71024161391924534),\n",
" ('simon', 0.70967648251115578),\n",
" ('effectively', 0.70914752190638641),\n",
" ('march', 0.70774597998109789),\n",
" ('atmosphere', 0.70744773070214162),\n",
" ('influence', 0.70733181555190172),\n",
" ('genius', 0.706392407309966),\n",
" ('emotionally', 0.70556970055850243),\n",
" ('ken', 0.70526854109229009),\n",
" ('identity', 0.70484322032313651),\n",
" ('sophisticated', 0.70470800296102132),\n",
" ('dan', 0.70457587638356811),\n",
" ('andrew', 0.70329955202396321),\n",
" ('india', 0.70144598337464037),\n",
" ('roy', 0.69970458110610434),\n",
" ('surprisingly', 0.6995780708902356),\n",
" ('sky', 0.69780919366575667),\n",
" ('romantic', 0.69664981111114743),\n",
" ('match', 0.69566924999265523),\n",
" ('britain', 0.69314718055994529),\n",
" ('beatty', 0.69314718055994529),\n",
" ('affected', 0.69314718055994529),\n",
" ('cowboy', 0.69314718055994529),\n",
" ('wave', 0.69314718055994529),\n",
" ('stylish', 0.69314718055994529),\n",
" ('bitter', 0.69314718055994529),\n",
" ('patient', 0.69314718055994529),\n",
" ('meets', 0.69314718055994529),\n",
" ('love', 0.69198533541937324),\n",
" ('paul', 0.68980827929443067),\n",
" ('andy', 0.68846333124751902),\n",
" ('performance', 0.68797386327972465),\n",
" ('patrick', 0.68645819240914863),\n",
" ('unlike', 0.68546468438792907),\n",
" ('brooks', 0.68433655087779044),\n",
" ('refuses', 0.68348526964820844),\n",
" ('award', 0.6824518914431974),\n",
" ('complaint', 0.6824518914431974),\n",
" ('ride', 0.68229716453587952),\n",
" ('dawson', 0.68171848473632257),\n",
" ('luke', 0.68158635815886937),\n",
" ('wells', 0.68087708796813096),\n",
" ('france', 0.6804081547825156),\n",
" ('handsome', 0.68007509899259255),\n",
" ('sports', 0.68007509899259255),\n",
" ('rebel', 0.67875844310784572),\n",
" ('directs', 0.67875844310784572),\n",
" ('greater', 0.67605274720064523),\n",
" ('dreams', 0.67599410133369586),\n",
" ('effective', 0.67565402311242806),\n",
" ('interpretation', 0.67479804189174875),\n",
" ('works', 0.67445504754779284),\n",
" ('brando', 0.67445504754779284),\n",
" ('noble', 0.6737290947028437),\n",
" ('paced', 0.67314651385327573),\n",
" ('le', 0.67067432470788668),\n",
" ('master', 0.67015766233524654),\n",
" ('h', 0.6696166831497512),\n",
" ('rings', 0.66904962898088483),\n",
" ('easy', 0.66895995494594152),\n",
" ('city', 0.66820823221269321),\n",
" ('sunshine', 0.66782937257565544),\n",
" ('succeeds', 0.66647893347778397),\n",
" ('relations', 0.664159643686693),\n",
" ('england', 0.66387679825983203),\n",
" ('glimpse', 0.66329421741026418),\n",
" ('aired', 0.66268797307523675),\n",
" ('sees', 0.66263163663399482),\n",
" ('both', 0.66248336767382998),\n",
" ('definitely', 0.66199789483898808),\n",
" ('imaginative', 0.66139848224536502),\n",
" ('appreciate', 0.66083893732728749),\n",
" ('tricks', 0.66071190480679143),\n",
" ('striking', 0.66071190480679143),\n",
" ('carefully', 0.65999497324304479),\n",
" ('complicated', 0.65981076029235353),\n",
" ('perspective', 0.65962448852130173),\n",
" ('trilogy', 0.65877953705573755),\n",
" ('future', 0.65834665141052828),\n",
" ('lion', 0.65742909795786608),\n",
" ('victor', 0.65540685257709819),\n",
" ('douglas', 0.65540685257709819),\n",
" ('inspired', 0.65459851044271034),\n",
" ('marriage', 0.65392646740666405),\n",
" ('demands', 0.65392646740666405),\n",
" ('father', 0.65172321672194655),\n",
" ('page', 0.65123628494430852),\n",
" ('instant', 0.65058756614114943),\n",
" ('era', 0.6495567444850836),\n",
" ('ruthless', 0.64934455790155243),\n",
" ('saga', 0.64934455790155243),\n",
" ('joan', 0.64891392558311978),\n",
" ('joseph', 0.64841128671855386),\n",
" ('workers', 0.64829661439459352),\n",
" ('fantasy', 0.64726757480925168),\n",
" ('accomplished', 0.64551913157069074),\n",
" ('distant', 0.64551913157069074),\n",
" ('manhattan', 0.64435701639051324),\n",
" ('personal', 0.64355023942057321),\n",
" ('pushing', 0.64313675998528386),\n",
" ('meeting', 0.64313675998528386),\n",
" ('individual', 0.64313675998528386),\n",
" ('pleasant', 0.64250344774119039),\n",
" ('brave', 0.64185388617239469),\n",
" ('william', 0.64083139119578469),\n",
" ('hudson', 0.64077919504262937),\n",
" ('friendly', 0.63949446706762514),\n",
" ('eccentric', 0.63907995928966954),\n",
" ('awards', 0.63875310849414646),\n",
" ('jack', 0.63838309514997038),\n",
" ('seeking', 0.63808740337691783),\n",
" ('colonel', 0.63757732940513456),\n",
" ('divorce', 0.63757732940513456),\n",
" ('jane', 0.63443957973316734),\n",
" ('keeping', 0.63414883979798953),\n",
" ('gives', 0.63383568159497883),\n",
" ('ted', 0.63342794585832296),\n",
" ('animation', 0.63208692379869902),\n",
" ('progress', 0.6317782341836532),\n",
" ('concert', 0.63127177684185776),\n",
" ('larger', 0.63127177684185776),\n",
" ('nation', 0.6296337748376194),\n",
" ('albeit', 0.62739580299716491),\n",
" ('adapted', 0.62613647027698516),\n",
" ('discovers', 0.62542900650499444),\n",
" ('classic', 0.62504956428050518),\n",
" ('segment', 0.62335141862440335),\n",
" ('morgan', 0.62303761437291871),\n",
" ('mouse', 0.62294292188669675),\n",
" ('impressive', 0.62211140744319349),\n",
" ('artist', 0.62168821657780038),\n",
" ('ultimate', 0.62168821657780038),\n",
" ('griffith', 0.62117368093485603),\n",
" ('emily', 0.62082651898031915),\n",
" ('drew', 0.62082651898031915),\n",
" ('moved', 0.6197197120051281),\n",
" ('profound', 0.61903920840622351),\n",
" ('families', 0.61903920840622351),\n",
" ('innocent', 0.61851219917136446),\n",
" ('versions', 0.61730910416844087),\n",
" ('eddie', 0.61691981517206107),\n",
" ('criticism', 0.61651395453902935),\n",
" ('nature', 0.61594514653194088),\n",
" ('recognized', 0.61518563909023349),\n",
" ('sexuality', 0.61467556511845012),\n",
" ('contract', 0.61400986000122149),\n",
" ('brian', 0.61344043794920278),\n",
" ('remembered', 0.6131044728864089),\n",
" ('determined', 0.6123858239154869),\n",
" ('offers', 0.61207935747116349),\n",
" ('pleasure', 0.61195702582993206),\n",
" ('washington', 0.61180154110599294),\n",
" ('images', 0.61159731359583758),\n",
" ('games', 0.61067095873570676),\n",
" ('academy', 0.60872983874736208),\n",
" ('fashioned', 0.60798937221963845),\n",
" ('melodrama', 0.60749173598145145),\n",
" ('peoples', 0.60613580357031549),\n",
" ('charismatic', 0.60613580357031549),\n",
" ('rough', 0.60613580357031549),\n",
" ('dealing', 0.60517840761398811),\n",
" ('fine', 0.60496962268013299),\n",
" ('tap', 0.60391604683200273),\n",
" ('trio', 0.60157998703445481),\n",
" ('russell', 0.60120968523425966),\n",
" ('figures', 0.60077386042893011),\n",
" ('ward', 0.60005675749393339),\n",
" ('shine', 0.59911823091166894),\n",
" ('brady', 0.59911823091166894),\n",
" ('job', 0.59845562125168661),\n",
" ('satisfied', 0.59652034487087369),\n",
" ('river', 0.59637962862495086),\n",
" ('brown', 0.595773016534769),\n",
" ('believable', 0.59566072133302495),\n",
" ('bound', 0.59470710774669278),\n",
" ('always', 0.59470710774669278),\n",
" ('hall', 0.5933967777928858),\n",
" ('cook', 0.5916777203950857),\n",
" ('claire', 0.59136448625000293),\n",
" ('broadway', 0.59033768669372433),\n",
" ('anna', 0.58778666490211906),\n",
" ('peace', 0.58628403501758408),\n",
" ('visually', 0.58539431926349916),\n",
" ('falk', 0.58525821854876026),\n",
" ('morality', 0.58525821854876026),\n",
" ('growing', 0.58466653756587539),\n",
" ('experiences', 0.58314628534561685),\n",
" ('stood', 0.58314628534561685),\n",
" ('touch', 0.58122926435596001),\n",
" ('lives', 0.5810976767513224),\n",
" ('kubrick', 0.58066919713325493),\n",
" ('timing', 0.58047401805583243),\n",
" ('struggles', 0.57981849525294216),\n",
" ('expressions', 0.57981849525294216),\n",
" ('authentic', 0.57848427223980559),\n",
" ('helen', 0.57763429343810091),\n",
" ('pre', 0.57700753064729182),\n",
" ('quirky', 0.5753641449035618),\n",
" ('young', 0.57531672344534313),\n",
" ('inner', 0.57454143815209846),\n",
" ('mexico', 0.57443087372056334),\n",
" ('clint', 0.57380042292737909),\n",
" ('sisters', 0.57286101468544337),\n",
" ('realism', 0.57226528899949558),\n",
" ('personalities', 0.5720692490067093),\n",
" ('french', 0.5720692490067093),\n",
" ('surprises', 0.57113222999698177),\n",
" ('adventures', 0.57113222999698177),\n",
" ('overcome', 0.5697681593994407),\n",
" ('timothy', 0.56953322459276867),\n",
" ('tales', 0.56909453188996639),\n",
" ('war', 0.56843317302781682),\n",
" ('civil', 0.5679840376059393),\n",
" ('countries', 0.56737779327091187),\n",
" ('streep', 0.56710645966458029),\n",
" ('tradition', 0.56685345523565323),\n",
" ('oliver', 0.56673325570428668),\n",
" ('australia', 0.56580775818334383),\n",
" ('understanding', 0.56531380905006046),\n",
" ('players', 0.56509525370004821),\n",
" ('knowing', 0.56489284503626647),\n",
" ('rogers', 0.56421349718405212),\n",
" ('suspenseful', 0.56368911332305849),\n",
" ('variety', 0.56368911332305849),\n",
" ('true', 0.56281525180810066),\n",
" ('jr', 0.56220982311246936),\n",
" ('psychological', 0.56108745854687891),\n",
" ('branagh', 0.55961578793542266),\n",
" ('wealth', 0.55961578793542266),\n",
" ('performing', 0.55961578793542266),\n",
" ('odds', 0.55961578793542266),\n",
" ('sent', 0.55961578793542266),\n",
" ('reminiscent', 0.55961578793542266),\n",
" ('grand', 0.55961578793542266),\n",
" ('overwhelming', 0.55961578793542266),\n",
" ('brothers', 0.55891181043362848),\n",
" ('howard', 0.55811089675600245),\n",
" ('david', 0.55693122256475369),\n",
" ('generation', 0.55628799784274796),\n",
" ('grow', 0.55612538299565417),\n",
" ('survival', 0.55594605904646033),\n",
" ('mainstream', 0.55574731115750231),\n",
" ('dick', 0.55431073570572953),\n",
" ('charm', 0.55288175575407861),\n",
" ('kirk', 0.55278982286502287),\n",
" ('twists', 0.55244729845681018),\n",
" ('gangster', 0.55206858230003986),\n",
" ('jeff', 0.55179306225421365),\n",
" ('family', 0.55116244510065526),\n",
" ('tend', 0.55053307336110335),\n",
" ('thanks', 0.55049088015842218),\n",
" ('world', 0.54744234723432639),\n",
" ('sutherland', 0.54743536937855164),\n",
" ('life', 0.54695514434959924),\n",
" ('disc', 0.54654370636806993),\n",
" ('bug', 0.54654370636806993),\n",
" ('tribute', 0.5455111817538808),\n",
" ('europe', 0.54522705048332309),\n",
" ('sacrifice', 0.54430155296238014),\n",
" ('color', 0.54405127139431109),\n",
" ('superior', 0.54333490233128523),\n",
" ('york', 0.54318235866536513),\n",
" ('pulls', 0.54266622962164945),\n",
" ('hearts', 0.54232429082536171),\n",
" ('jackson', 0.54232429082536171),\n",
" ('enjoy', 0.54124285135906114),\n",
" ('redemption', 0.54056759296472823),\n",
" ('madness', 0.540384426007535),\n",
" ('hamilton', 0.5389965007326869),\n",
" ('stands', 0.5389965007326869),\n",
" ('trial', 0.5389965007326869),\n",
" ('greek', 0.5389965007326869),\n",
" ('each', 0.5388212312554177),\n",
" ('faithful', 0.53773307668591508),\n",
" ('received', 0.5372768098531604),\n",
" ('jealous', 0.53714293208336406),\n",
" ('documentaries', 0.53714293208336406),\n",
" ('different', 0.53709860682460819),\n",
" ('describes', 0.53680111016925136),\n",
" ('shorts', 0.53596159703753288),\n",
" ('brilliance', 0.53551823635636209),\n",
" ('mountains', 0.53492317534505118),\n",
" ('share', 0.53408248593025787),\n",
" ('dealt', 0.53408248593025787),\n",
" ('providing', 0.53329847961804933),\n",
" ('explore', 0.53329847961804933),\n",
" ('series', 0.5325809226575603),\n",
" ('fellow', 0.5323318289869543),\n",
" ('loves', 0.53062825106217038),\n",
" ('olivier', 0.53062825106217038),\n",
" ('revolution', 0.53062825106217038),\n",
" ('roman', 0.53062825106217038),\n",
" ('century', 0.53002783074992665),\n",
" ('musical', 0.52966871156747064),\n",
" ('heroic', 0.52925932545482868),\n",
" ('ironically', 0.52806743020049673),\n",
" ('approach', 0.52806743020049673),\n",
" ('temple', 0.52806743020049673),\n",
" ('moves', 0.5279372642387119),\n",
" ('gift', 0.52702030968597136),\n",
" ('julie', 0.52609309589677911),\n",
" ('tells', 0.52415107836314001),\n",
" ('radio', 0.52394671172868779),\n",
" ('uncle', 0.52354439617376536),\n",
" ('union', 0.52324814376454787),\n",
" ('deep', 0.52309571635780505),\n",
" ('reminds', 0.52157841554225237),\n",
" ('famous', 0.52118841080153722),\n",
" ('jazz', 0.52053443789295151),\n",
" ('dennis', 0.51987545928590861),\n",
" ('epic', 0.51919387343650736),\n",
" ('adult', 0.519167695083386),\n",
" ('shows', 0.51915322220375304),\n",
" ('performed', 0.5191244265806858),\n",
" ('demons', 0.5191244265806858),\n",
" ('eric', 0.51879379341516751),\n",
" ('discovered', 0.51879379341516751),\n",
" ('youth', 0.5185626062681431),\n",
" ('human', 0.51851411224987087),\n",
" ('tarzan', 0.51813827061227724),\n",
" ('ourselves', 0.51794309153485463),\n",
" ('wwii', 0.51758240622887042),\n",
" ('passion', 0.5162164724008671),\n",
" ('desire', 0.51607497965213445),\n",
" ('pays', 0.51581316527702981),\n",
" ('fox', 0.51557622652458857),\n",
" ('dirty', 0.51557622652458857),\n",
" ('symbolism', 0.51546600332249293),\n",
" ('sympathetic', 0.51546600332249293),\n",
" ('attitude', 0.51530993621331933),\n",
" ('appearances', 0.51466440007315639),\n",
" ('jeremy', 0.51466440007315639),\n",
" ('fun', 0.51439068993048687),\n",
" ('south', 0.51420972175023116),\n",
" ('arrives', 0.51409894911095988),\n",
" ('present', 0.51341965894303732),\n",
" ('com', 0.51326167856387173),\n",
" ('smile', 0.51265880484765169),\n",
" ('fits', 0.51082562376599072),\n",
" ('provided', 0.51082562376599072),\n",
" ('carter', 0.51082562376599072),\n",
" ('ring', 0.51082562376599072),\n",
" ('aging', 0.51082562376599072),\n",
" ('countryside', 0.51082562376599072),\n",
" ('alan', 0.51082562376599072),\n",
" ('visit', 0.51082562376599072),\n",
" ('begins', 0.51015650363396647),\n",
" ('success', 0.50900578704900468),\n",
" ('japan', 0.50900578704900468),\n",
" ('accurate', 0.50895471583017893),\n",
" ('proud', 0.50800474742434931),\n",
" ('daily', 0.5075946031845443),\n",
" ('atmospheric', 0.50724780241810674),\n",
" ('karloff', 0.50724780241810674),\n",
" ('recently', 0.50714914903668207),\n",
" ('fu', 0.50704490092608467),\n",
" ('horrors', 0.50656122497953315),\n",
" ('finding', 0.50637127341661037),\n",
" ('lust', 0.5059356384717989),\n",
" ('hitchcock', 0.50574947073413001),\n",
" ('among', 0.50334004951332734),\n",
" ('viewing', 0.50302139827440906),\n",
" ('shining', 0.50262885656181222),\n",
" ('investigation', 0.50262885656181222),\n",
" ('duo', 0.5020919437972361),\n",
" ('cameron', 0.5020919437972361),\n",
" ('finds', 0.50128303100539795),\n",
" ('contemporary', 0.50077528791248915),\n",
" ('genuine', 0.50046283673044401),\n",
" ('frightening', 0.49995595152908684),\n",
" ('plays', 0.49975983848890226),\n",
" ('age', 0.49941323171424595),\n",
" ('position', 0.49899116611898781),\n",
" ('continues', 0.49863035067217237),\n",
" ('roles', 0.49839716550752178),\n",
" ('james', 0.49837216269470402),\n",
" ('individuals', 0.49824684155913052),\n",
" ('brought', 0.49783842823917956),\n",
" ('hilarious', 0.49714551986191058),\n",
" ('brutal', 0.49681488669639234),\n",
" ('appropriate', 0.49643688631389105),\n",
" ('dance', 0.49581998314812048),\n",
" ('league', 0.49578774640145024),\n",
" ('helping', 0.49578774640145024),\n",
" ('answers', 0.49578774640145024),\n",
" ('stunts', 0.49561620510246196),\n",
" ('traveling', 0.49532143723002542),\n",
" ('thoroughly', 0.49414593456733524),\n",
" ('depicted', 0.49317068852726992),\n",
" ('honor', 0.49247648509779424),\n",
" ('combination', 0.49247648509779424),\n",
" ('differences', 0.49247648509779424),\n",
" ('fully', 0.49213349075383811),\n",
" ('tracy', 0.49159426183810306),\n",
" ('battles', 0.49140753790888908),\n",
" ('possibility', 0.49112055268665822),\n",
" ('romance', 0.4901589869574316),\n",
" ('initially', 0.49002249613622745),\n",
" ('happy', 0.4898997500608791),\n",
" ('crime', 0.48977221456815834),\n",
" ('singing', 0.4893852925281213),\n",
" ('especially', 0.48901267837860624),\n",
" ('shakespeare', 0.48754793889664511),\n",
" ('hugh', 0.48729512635579658),\n",
" ('detail', 0.48609484250827351),\n",
" ('guide', 0.48550781578170082),\n",
" ('companion', 0.48550781578170082),\n",
" ('julia', 0.48550781578170082),\n",
" ('san', 0.48550781578170082),\n",
" ('desperation', 0.48550781578170082),\n",
" ('strongly', 0.48460242866688824),\n",
" ('necessary', 0.48302334245403883),\n",
" ('humanity', 0.48265474679929443),\n",
" ('drama', 0.48221998493060503),\n",
" ('warming', 0.48183808689273838),\n",
" ('intrigue', 0.48183808689273838),\n",
" ('nonetheless', 0.48183808689273838),\n",
" ('cuba', 0.48183808689273838),\n",
" ('planned', 0.47957308026188628),\n",
" ('pictures', 0.47929937011921681),\n",
" ('broadcast', 0.47849024312305422),\n",
" ('nine', 0.47803580094299974),\n",
" ('settings', 0.47743860773325364),\n",
" ('history', 0.47732966933780852),\n",
" ('ordinary', 0.47725880012690741),\n",
" ('trade', 0.47692407209030935),\n",
" ('primary', 0.47608267532211779),\n",
" ('official', 0.47608267532211779),\n",
" ('episode', 0.47529620261150429),\n",
" ('role', 0.47520268270188676),\n",
" ('spirit', 0.47477690799839323),\n",
" ('grey', 0.47409361449726067),\n",
" ('ways', 0.47323464982718205),\n",
" ('cup', 0.47260441094579297),\n",
" ('piano', 0.47260441094579297),\n",
" ('familiar', 0.47241617565111949),\n",
" ('sinister', 0.47198579044972683),\n",
" ('reveal', 0.47171449364936496),\n",
" ('max', 0.47150852042515579),\n",
" ('dated', 0.47121648567094482),\n",
" ('discovery', 0.47000362924573563),\n",
" ('vicious', 0.47000362924573563),\n",
" ('losing', 0.47000362924573563),\n",
" ('genuinely', 0.46871413841586385),\n",
" ('hatred', 0.46734051182625186),\n",
" ('mistaken', 0.46702300110759781),\n",
" ('dream', 0.46608972992459924),\n",
" ('challenge', 0.46608972992459924),\n",
" ('crisis', 0.46575733836428446),\n",
" ('photographed', 0.46488852857896512),\n",
" ('machines', 0.46430560813109778),\n",
" ('critics', 0.46430560813109778),\n",
" ('bird', 0.46430560813109778),\n",
" ('born', 0.46411383518967209),\n",
" ('detective', 0.4636633473511525),\n",
" ('higher', 0.46328467899699055),\n",
" ('remains', 0.46262352194811296),\n",
" ('inevitable', 0.46262352194811296),\n",
" ('soviet', 0.4618180446592961),\n",
" ('ryan', 0.46134556650262099),\n",
" ('african', 0.46112595521371813),\n",
" ('smaller', 0.46081520319132935),\n",
" ('techniques', 0.46052488529119184),\n",
" ('information', 0.46034171833399862),\n",
" ('deserved', 0.45999798712841444),\n",
" ('cynical', 0.45953232937844013),\n",
" ('lynch', 0.45953232937844013),\n",
" ('francisco', 0.45953232937844013),\n",
" ('tour', 0.45953232937844013),\n",
" ('spielberg', 0.45953232937844013),\n",
" ('struggle', 0.45911782160048453),\n",
" ('language', 0.45902121257712653),\n",
" ('visual', 0.45823514408822852),\n",
" ('warner', 0.45724137763188427),\n",
" ('social', 0.45720078250735313),\n",
" ('reality', 0.45719346885019546),\n",
" ('hidden', 0.45675840249571492),\n",
" ('breaking', 0.45601738727099561),\n",
" ('sometimes', 0.45563021171182794),\n",
" ('modern', 0.45500247579345005),\n",
" ('surfing', 0.45425527227759638),\n",
" ('popular', 0.45410691533051023),\n",
" ('surprised', 0.4534409399850382),\n",
" ('follows', 0.45245361754408348),\n",
" ('keeps', 0.45234869400701483),\n",
" ('john', 0.4520909494482197),\n",
" ('defeat', 0.45198512374305722),\n",
" ('mixed', 0.45198512374305722),\n",
" ('justice', 0.45142724367280018),\n",
" ('treasure', 0.45083371313801535),\n",
" ('presents', 0.44973793178615257),\n",
" ('years', 0.44919197032104968),\n",
" ('chief', 0.44895022004790319),\n",
" ('shadows', 0.44802472252696035),\n",
" ('closely', 0.44701411102103689),\n",
" ('segments', 0.44701411102103689),\n",
" ('lose', 0.44658335503763702),\n",
" ('caine', 0.44628710262841953),\n",
" ('caught', 0.44610275383999071),\n",
" ('hamlet', 0.44558510189758965),\n",
" ('chinese', 0.44507424620321018),\n",
" ('welcome', 0.44438052435783792),\n",
" ('birth', 0.44368632092836219),\n",
" ('represents', 0.44320543609101143),\n",
" ('puts', 0.44279106572085081),\n",
" ('fame', 0.44183275227903923),\n",
" ('closer', 0.44183275227903923),\n",
" ('visuals', 0.44183275227903923),\n",
" ('web', 0.44183275227903923),\n",
" ('criminal', 0.4412745608048752),\n",
" ('minor', 0.4409224199448939),\n",
" ('jon', 0.44086703515908027),\n",
" ('liked', 0.44074991514020723),\n",
" ('restaurant', 0.44031183943833246),\n",
" ('flaws', 0.43983275161237217),\n",
" ('de', 0.43983275161237217),\n",
" ('searching', 0.4393666597838457),\n",
" ('rap', 0.43891304217570443),\n",
" ('light', 0.43884433018199892),\n",
" ('elizabeth', 0.43872232986464677),\n",
" ('marry', 0.43861731542506488),\n",
" ('oz', 0.43825493093115531),\n",
" ('controversial', 0.43825493093115531),\n",
" ('learned', 0.43825493093115531),\n",
" ('slowly', 0.43785660389939979),\n",
" ('bridge', 0.43721380642274466),\n",
" ('thrilling', 0.43721380642274466),\n",
" ('wayne', 0.43721380642274466),\n",
" ('comedic', 0.43721380642274466),\n",
" ('married', 0.43658501682196887),\n",
" ('nazi', 0.4361020775700542),\n",
" ('murder', 0.4353180712578455),\n",
" ('physical', 0.4353180712578455),\n",
" ('johnny', 0.43483971678806865),\n",
" ('michelle', 0.43445264498141672),\n",
" ('wallace', 0.43403848055222038),\n",
" ('silent', 0.43395706390247063),\n",
" ('comedies', 0.43395706390247063),\n",
" ('played', 0.43387244114515305),\n",
" ('international', 0.43363598507486073),\n",
" ('vision', 0.43286408229627887),\n",
" ('intelligent', 0.43196704885367099),\n",
" ('shop', 0.43078291609245434),\n",
" ('also', 0.43036720209769169),\n",
" ('levels', 0.4302451371066513),\n",
" ('miss', 0.43006426712153217),\n",
" ('ocean', 0.4295626596872249),\n",
" ...]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# words most frequently seen in a review with a \"POSITIVE\" label\n",
"pos_neg_ratios.most_common()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"[('boll', -4.0778152602708904),\n",
" ('uwe', -3.9218753018711578),\n",
" ('seagal', -3.3202501058581921),\n",
" ('unwatchable', -3.0269848170580955),\n",
" ('stinker', -2.9876839403711624),\n",
" ('mst', -2.7753833211707968),\n",
" ('incoherent', -2.7641396677532537),\n",
" ('unfunny', -2.5545257844967644),\n",
" ('waste', -2.4907515123361046),\n",
" ('blah', -2.4475792789485005),\n",
" ('horrid', -2.3715779644809971),\n",
" ('pointless', -2.3451073877136341),\n",
" ('atrocious', -2.3187369339642556),\n",
" ('redeeming', -2.2667790015910296),\n",
" ('prom', -2.2601040980178784),\n",
" ('drivel', -2.2476029585766928),\n",
" ('lousy', -2.2118080125207054),\n",
" ('worst', -2.1930856334332267),\n",
" ('laughable', -2.172468615469592),\n",
" ('awful', -2.1385076866397488),\n",
" ('poorly', -2.1326133844207011),\n",
" ('wasting', -2.1178155545614512),\n",
" ('remotely', -2.111046881095167),\n",
" ('existent', -2.0024805005437076),\n",
" ('boredom', -1.9241486572738005),\n",
" ('miserably', -1.9216610938019989),\n",
" ('sucks', -1.9166645809588516),\n",
" ('uninspired', -1.9131499212248517),\n",
" ('lame', -1.9117232884159072),\n",
" ('insult', -1.9085323769376259)]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# words most frequently seen in a review with a \"NEGATIVE\" label\n",
"list(reversed(pos_neg_ratios.most_common()))[0:30]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Transforming Text into Numbers"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAiIAAAFKCAYAAAAg+zSAAAAABGdBTUEAALGPC/xhBQAAACBjSFJN\nAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAB1WlUWHRYTUw6Y29tLmFkb2Jl\nLnhtcAAAAAAAPHg6eG1wbWV0YSB4bWxuczp4PSJhZG9iZTpuczptZXRhLyIgeDp4bXB0az0iWE1Q\nIENvcmUgNS40LjAiPgogICA8cmRmOlJERiB4bWxuczpyZGY9Imh0dHA6Ly93d3cudzMub3JnLzE5\nOTkvMDIvMjItcmRmLXN5bnRheC1ucyMiPgogICAgICA8cmRmOkRlc2NyaXB0aW9uIHJkZjphYm91\ndD0iIgogICAgICAgICAgICB4bWxuczp0aWZmPSJodHRwOi8vbnMuYWRvYmUuY29tL3RpZmYvMS4w\nLyI+CiAgICAgICAgIDx0aWZmOkNvbXByZXNzaW9uPjE8L3RpZmY6Q29tcHJlc3Npb24+CiAgICAg\nICAgIDx0aWZmOk9yaWVudGF0aW9uPjE8L3RpZmY6T3JpZW50YXRpb24+CiAgICAgICAgIDx0aWZm\nOlBob3RvbWV0cmljSW50ZXJwcmV0YXRpb24+MjwvdGlmZjpQaG90b21ldHJpY0ludGVycHJldGF0\naW9uPgogICAgICA8L3JkZjpEZXNjcmlwdGlvbj4KICAgPC9yZGY6UkRGPgo8L3g6eG1wbWV0YT4K\nAtiABQAAQABJREFUeAHtnXvQXVV5/1daZxy1BUpJp1MhE5BSSSAgqBAV5BIuGaQJBoEUATEJAiXY\ncMsUTfMDK9MAMXKRAEmAgGkASUiGIgQSsEQgKGDCJV6GYkywfzRWibc/OuO8v/1Zuo7r3e/e5+zr\n2ZfzfWbOe/bZe12e9V373eu7n/WsZ40aCsRIhIAQEAJCQAgIASFQAQJ/UkGdqlIICAEhIASEgBAQ\nAhYBERHdCEJACAgBISAEhEBlCIiIVAa9KhYCQkAICAEhIARERHQPCAEhIASEgBAQApUhICJSGfSq\nWAgIASEgBISAEBAR0T0gBISAEBACQkAIVIaAiEhl0KtiISAEhIAQEAJCQERE94AQEAJCQAgIASFQ\nGQIiIpVBr4qFgBAQAkJACAgBERHdA0JACAgBISAEhEBlCIiIVAa9KhYCQkAICAEhIARERHQPCAEh\nIASEgBAQApUhICJSGfSqWAgIASEgBISAEBAR0T0gBISAEBACQkAIVIaAiEhl0KtiISAEhIAQEAJC\nQERE94AQEAJCQAgIASFQGQIiIpVBr4qFgBAQAkJACAgBERHdA0JACAgBISAEhEBlCIiIVAa9KhYC\nQkAICAEhIARERHQPCAEhIASEgBAQApUhICJSGfSqWAgIASEgBISAEBAR0T0gBISAEBACQkAIVIaA\niEhl0KtiISAEhIAQEAJCQERE94AQEAJCQAgIASFQGQIiIpVBr4qFgBAQAkJACAgBERHdA0JACAgB\nISAEhEBlCIiIVAa9KhYCQkAICAEhIARERHQPCAEhIASEgBAQApUhICJSAPQXX3yxGTVqlPnlL39Z\nQGkqQggIASEgBITA4CAgIjI4fR3Z0iVLlpgHH3ww8ppOCgEhIASEgBAoG4FRQ4GUXYnKry8CJ510\nknnf+95nbrvttvoqKc2EgBAQAkKgtQjIItLarlXDhIAQEAJCQAjUHwERkfr3kTQUAkJACAgBIdBa\nBERECuhanFXHjBkzrKR169ZZB9atW7daHwymQHBodZ8bbrhhWHp+kJbr+G1wHM7Db65FiV9f1HXO\noSO6ItRPXU888YRZvHhxRy853Fp49EcICAEhIAT6hICISMlAz5kzxyxbtszMmDHD4I7D5/HHHzfr\n16+3RCOq+u9973tm/Pjx5oMf/GAnD/kmTZpk
LrjgAjN9+vSobKnOXXnllbbsE0880Vx00UWdenbb\nbbdU5SixEBACQkAICIE8CLwjT2bl7Y3Az372M/P0008bf4DHsrHPPvtYsoGFY9asWcMKwkJx5513\njjgPeTjqqKPMxIkTzWGHHWb4LRECQkAICAEh0GQEZBEpufcuvPDCYSTEVTdu3DhLRrZt2+ZOdb4h\nGWFy4i4eeeSR1oJxyy23uFP6FgJCQAgIASHQWAREREruuoMPPrhrDb/4xS9GXD/rrLNGnPNPHHPM\nMWbHjh1m06ZN/mkdCwEhIASEgBBoHAIiIiV3mT8lk7SqPfbYo2vS3Xff3V7ftWtX13S6KASEgBAQ\nAkKg7giIiNS9h7roJyLSBRxdEgJCQAgIgUYgICJSw256++23u2q1fft2ez28ZLhrJl0UAkJACAgB\nIVBDBEREatgp999/f1etnnrqKevoiuNqWOLigOBPgl+JRAgIASEgBIRAnRAQEalTb/xBl5dffjk2\ncBkb1EFU5s2bN0xzlvSyJHjjxo3DzrsfN910kzvUtxAQAkJACAiB2iAgIlKbrvijItdff725/fbb\nbRRU38JBNNQzzzzTLt8NL+/FKXb27NnmqquuGkZi3nrrLRsA7Uc/+pElKn+s5fdHe+65p3nhhRfC\np/VbCAgBISAEhEBfEBAR6QvM6Sph1cxLL71k9t13X3PQQQd1wq8TjZVAZ3E75RLgjOuQGBdKHisJ\nQlC1KMGysnPnzk56QstLhIAQEAJCQAj0C4FRQejwoX5Vpnq6IwAJILR7VFTV7jl1VQgIASEgBIRA\nMxGQRaSZ/SathYAQEAJCQAi0AgERkVZ0oxohBISAEBACQqCZCIiINLPfpLUQEAJCQAgIgVYgICLS\nim5UI4SAEBACQkAINBMBOas2s9+ktRAQAkJACAiBViAgi0grulGNEAJCQAgIASHQTARERJrZb9Ja\nCAgBISAEhEArEBARaUU3qhFCQAgIASEgBJqJgIhIM/tNWgsBISAEhIAQaAUCIiKt6MaRjfjVr35l\nHn744ZEXdEYICAEhIASEQI0Q0KqZGnVG0aq8973vNd/97nfN3/zN3xRdtMoTAkJACAgBIVAIArKI\nFAJjPQuZPn26WbZsWT2Vk1ZCQAgIASEgBAIEZBFp8W3wwx/+0Bx33HHmpz/9aYtbqaYJASEgBIRA\nkxGQRaTJvddD97/7u78zo0ePNhs2bOiRUpeFgBAQAkJACFSDgIhINbj3rdY5c+aYlStX9q0+VSQE\nhIAQEAJCIA0CmppJg1YD0/73f/+3wWn1l7/8pfnzP//zBrZAKgsBISAEhECbEZBFpM29G7SNFTMz\nZswwq1evbnlL1TwhIASEgBBoIgIiIk3stZQ6s3pm0aJFKXMpuRAQAkJACAiB8hEQESkf48prOP74\n483OnTsNq2gkQkAICAEhIATqhICISJ16o0RdLrzwQrNkyZISa1DRQkAICAEhIATSIyBn1fSYNTKH\nYoo0stuktBAQAkKg9QjIItL6Lv59A4kpwkf7zwxIh6uZQkAICIGGICAi0pCOKkLN2bNnmxUrVhRR\nlMoQAkJACAgBIVAIApqaKQTGZhTCjry77babDfmujfCa0WfSUggIASHQdgRkEWl7D3vtI6DZ5Zdf\nbh566CHvrA6FgBAQAkJACFSHgIhIddhXUvPkyZPNXXfdVUndqlQICAEhIASEQBgBEZEwIi3/TUwR\n5Lvf/W7LW6rmCQEhIASEQBMQEBFpQi8VrONnP/tZ88ADDxRcqooTAkJACAgBIZAeATmrpses8Tm0\nEV7ju1ANEAJCQAi0BgERkdZ0ZbqGnH766ebss882p512WrqMSt1KBJiq27p1q3n11VfNtm3bzBtv\nvGG2bNkyoq3Tpk0ze+yxh5kwYYIZP368+fCHP6xdnUegpBNCQAikQUBEJA1aLUpLYLNbbrnFPPXU\nUy1qlZqS
BoENGzaYxx57zKxcudKMHj3aTJo0yRx88MFm3Lhxdpk3AfB8wZL205/+1Lz11lvmtdde\nM08//bT9QE5OPfVU88l
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.display import Image\n",
"\n",
"review = \"This was a horrible, terrible movie.\"\n",
"\n",
"Image(filename='sentiment_network.png')"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAi4AAAECCAYAAADZzFwPAAAABGdBTUEAALGPC/xhBQAAACBjSFJN\nAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAB1WlUWHRYTUw6Y29tLmFkb2Jl\nLnhtcAAAAAAAPHg6eG1wbWV0YSB4bWxuczp4PSJhZG9iZTpuczptZXRhLyIgeDp4bXB0az0iWE1Q\nIENvcmUgNS40LjAiPgogICA8cmRmOlJERiB4bWxuczpyZGY9Imh0dHA6Ly93d3cudzMub3JnLzE5\nOTkvMDIvMjItcmRmLXN5bnRheC1ucyMiPgogICAgICA8cmRmOkRlc2NyaXB0aW9uIHJkZjphYm91\ndD0iIgogICAgICAgICAgICB4bWxuczp0aWZmPSJodHRwOi8vbnMuYWRvYmUuY29tL3RpZmYvMS4w\nLyI+CiAgICAgICAgIDx0aWZmOkNvbXByZXNzaW9uPjE8L3RpZmY6Q29tcHJlc3Npb24+CiAgICAg\nICAgIDx0aWZmOk9yaWVudGF0aW9uPjE8L3RpZmY6T3JpZW50YXRpb24+CiAgICAgICAgIDx0aWZm\nOlBob3RvbWV0cmljSW50ZXJwcmV0YXRpb24+MjwvdGlmZjpQaG90b21ldHJpY0ludGVycHJldGF0\naW9uPgogICAgICA8L3JkZjpEZXNjcmlwdGlvbj4KICAgPC9yZGY6UkRGPgo8L3g6eG1wbWV0YT4K\nAtiABQAAQABJREFUeAHtnXvQVdV5/xdNZjIxjRgrM52qFI01ERQVExWNeMMLQy0YiEiNEgOYaJAO\nitIaGYo2TFGQeElQAREjRa0oDEG8AKagosYYkEuSjjUEbP+orZFc/KMzmfe3Pys+57fOfvfZZ1/P\nWXu/zzNz3rPP3uvyrO/a717f/axnPatfTyBGRRFQBBQBRUARUAQUgQog8CcV0FFVVAQUAUVAEVAE\nFAFFwCKgxEVvBEVAEVAEFAFFQBGoDAJKXCrTVaqoIqAIKAKKgCKgCChx0XtAEVAEFAFFQBFQBCqD\ngBKXynSVKqoIKAKKgCKgCCgCSlz0HlAEFAFFQBFQBBSByiCgxKUyXaWKKgKKgCKgCCgCioASF70H\nFAFFQBFQBBQBRaAyCHy8MpqqooqAItAVBH784x+bPXv2mJ07d5q9e/eat99+2+zYsaOXLuPGjTOH\nHHKIGTp0qBkyZIg59dRTzac//ele6fSEIqAIKAJ5EOinkXPzwKd5FYF6IrBp0yazYcMGs2rVKjNg\nwAAzcuRIc8IJJ5jBgwebgw8+2Hzuc59ravh//dd/mf/8z/807777rtm1a5d58cUX7Qcyc8kll5gv\nf/nLSmKaENMfioAikBUBJS5ZkdN8ikDNEPjtb39rli9fbh566CHbshkzZpgLLrjA/MVf/EWmllLe\nxo0bzfr1682yZcvMjTfeaG644YbM5WVSQjMpAopA7RBQH5fadak2SBFIj8A999xjPv/5z5stW7aY\nJUuWmO3bt5tJkyblIhlME1166aVm6dKl1hqDVocffriZOXOmwUKjoggoAopAFgSUuGRBTfMoAjVB\nAP+Vk046yaxZs8Z+nnzySfPFL36x8NZhtVmwYEGDwFDHihUrCq9HC1QEFIH6I6BTRfXvY22hIhCJ\nAFaW+fPnm3nz5lnrSmSikk5CmKZOnWqOOeYYOz2lTrwlAa3FKgI1REAtLjXsVG2SIhCHAL4nU6ZM\nsRaWzZs3d5y0oBsWl61bt5pBgwbZKapf/OIXcSrrNUVAEVAEGgioxaUBhR4oAvVHANIyZswYc+ih\nh3pj6WDK6JZbbjGQqPBqpfr3iLZQEVAE0iKgcVzSIqbpFYGKIiCkZdiwYdbfxJdm4ATMEuvzzjtP\nyYsvnaJ6KAIeI6DExePOUdUUgaIQ8JW0SPtY
fYQoeRFE9FsRUARaIaDEpRUyel4RqBECc+fOta1h\nZY+vAnn5zW9+YyZMmGD9X9Rh19eeUr0Uge4ioD4u3cVfa1cESkdAfEh+/vOfVyJ6LY7DH3zwgWFp\ntooioAgoAmEEdFVRGBH9rQjUCAECveH4SpyWqlgwFi1aZPdD0jgvNboRtSmKQIEIqMWlQDC1KEXA\nNwTGjx9vTjzxRDN79mzfVIvVhzgvY8eONVWxEsU2Ri8qAopAoQgocSkUTi1MEfAHgaoP/mwNgPjs\nl+NPb6smikDfQUCJS9/pa21pH0MAaws7M7PcuIrCNBd7G7HrdNaNHqvYbtVZEVAE4hFQ4hKPj15V\nBCqJgFhbGPSrLGp1qXLvqe6KQDkIKHEpB1ctVRHoKgKszBk6dKiZPn16V/XIWzlWF7YHUF+XvEhq\nfkWgPgjoqqL69KW2RBGwCBBsbtmyZYapoqoLU0RsA7Bx48aqN0X1VwQUgYIQUOJSEJBajCLgCwIM\n8pMnT66NXwg+OuvXr/cFXtVDEVAEuoyAEpcud4BWrwgUjQCD/FlnnVV0sV0r74ILLrAWpK4poBUr\nAoqAVwgocfGqO1QZRSA/Ahs2bDCnn356/oI8KYHponPPPdfgcKyiCCgCioASF70HFIEaIYAzK4Jf\nSJ2EHa23bdtWpyZpWxQBRSAjAkpcMgKn2RQBHxFg+fPw4cN9VC2XTieccILZt29frjI0syKgCNQD\nASUu9ehHbYUiYBHAKjFo0KDaoTF48GCzd+/e2rVLG6QIKALpEVDikh4zzaEIeI3AwIEDvdZPlVME\nFAFFIA8CSlzyoKd5FQFFoCMIfP7znzerV6/uSF1aiSKgCPiNgBIXv/tHtVMEFIEAgU9/+tOKgyKg\nCCgCFgElLnojKAKKgCKgCCgCikBlEFDiUpmuUkUVgb6LwC9+8YvaRALuu72oLVcEikFAiUsxOGop\nikDXEWCPogMHDnRdjzIU+M1vflPLZd5lYKVlKgJ1R+DjdW+gts8PBIh6umfPHrNz5067rPXtt982\nO3bsaFKOCKnEIDnkkEPszsYcszOwSmsEICvsnIwcfPDB5vjjj6/lvj4QFxVFQBFQBEBAiYveB6Uh\nsGnTJrNq1SpDCPoBAwaYkSNHGgKJTZgwwQ6y4eiuRH0lgNq7775rdu3aZWbNmmVefPFFu2Hg6NGj\nbX510jRGcKLjICsuuWOAf+edd0rr024VvHv3bjNixIhuVa/1KgKKgEcI9OsJxCN9VJWKI4AFYPny\n5Wb+/PmWrMyYMcOwSR7WlCxCeex2vHLlShvyfeLEieaGG27IXF4WHXzI45KVww8/PLb9/fr1MxCY\nOpG88ePHmyuuuMJceumlPnSH6qAIKAJdREB9XLoIfp2qhmDcc889hngbW7ZsMWvWrDHbt283kyZN\nih1k22HA4Mtg9eSTTzY22WPgnjlzpqHOOgsOqUyxyeaCWFb4tCOBbEj4+uuv1woaIgKfdtpptWqT\nNkYRUASyIaDEJRtumstBgIH1rLPOahAWSIY7feEkzXXIgL1gwQI7nfTBBx9YkrRixYpcZfqWWYgK\n37Q3KVlx2wFxeeWVV9xTlT4GC6Ya2xG2SjdSlVcEFIHECOhUUWKoNGEUArfffru5//77zbx586x1\nJSpNWecY0KZOnWq+8IUvmEWLFlV2aoR2iBRB+LDU4EeExasOwj32P//zP+buu++uQ3O0DYqAIpAT\nASUuOQHsq9mZphkzZoxt/qOPPtq1t2H0wI8Gh9TFixebsMOvj/2DzrISCP2KICvhdp500klm4cKF\n5vzzzw9fqtxvpgbXrVtnPvWpT1WifysHsCqsCFQMAZ0qqliH+aCukJZhw4aZtWvXdo20gAU+MEuX\nLjVMj5x33nkGa4OPgnMtlhU+HMsUUBmkhfZ//etftyu6fMQijU5PP/20JSvca0wVudapNOVoWkVA\nEagPAmpx
qU9fdqQlLmnB38QnYZCbNm2a2bx5sxdv5mlWAhWNI/3EUmmWl1fZNwTL0Zw5c5pWE0Fe\nyiJ8RfeDlqcIKALFI6D
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"review = \"The movie was excellent\"\n",
"\n",
"Image(filename='sentiment_network_pos.png')"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Project 2: Creating the Input/Output Data"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"74074\n"
]
}
],
"source": [
"vocab = set(total_counts.keys())\n",
"vocab_size = len(vocab)\n",
"print(vocab_size)"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['',\n",
" 'inhabitants',\n",
" 'goku',\n",
" 'stunts',\n",
" 'catepillar',\n",
" 'kristensen',\n",
" 'senegal',\n",
" 'goddess',\n",
" 'distroy',\n",
" 'unexplainably',\n",
" 'concoctions',\n",
" 'petite',\n",
" 'scribe',\n",
" 'stevson',\n",
" 'sctv',\n",
" 'soundscape',\n",
" 'rana',\n",
" 'metamorphose',\n",
" 'immortalizer',\n",
" 'henstridge',\n",
" 'planning',\n",
" 'akiva',\n",
" 'plod',\n",
" 'eko',\n",
" 'orderly',\n",
" 'zeleznice',\n",
" 'verbose',\n",
" 'amplify',\n",
" 'resonation',\n",
" 'critize',\n",
" 'jefferies',\n",
" 'mountainbillies',\n",
" 'steinbichler',\n",
" 'vowel',\n",
" 'rafe',\n",
" 'bonbons',\n",
" 'tulipe',\n",
" 'clot',\n",
" 'distended',\n",
" 'his',\n",
" 'impatiently',\n",
" 'unfortuntly',\n",
" 'lung',\n",
" 'scapegoats',\n",
" 'muzzle',\n",
" 'pscychosexual',\n",
" 'outbid',\n",
" 'obit',\n",
" 'sideshows',\n",
" 'jugde',\n",
" 'particolare',\n",
" 'kevloun',\n",
" 'masterful',\n",
" 'quartier',\n",
" 'unravelling',\n",
" 'necessarily',\n",
" 'antiques',\n",
" 'strutts',\n",
" 'tilts',\n",
" 'disconcert',\n",
" 'dossiers',\n",
" 'sorriest',\n",
" 'blart',\n",
" 'iberia',\n",
" 'situations',\n",
" 'frmann',\n",
" 'daniell',\n",
" 'rays',\n",
" 'pried',\n",
" 'khoobsurat',\n",
" 'leavitt',\n",
" 'caiano',\n",
" 'sagan',\n",
" 'attractiveness',\n",
" 'kitaparaporn',\n",
" 'hamilton',\n",
" 'massages',\n",
" 'reasonably',\n",
" 'horgan',\n",
" 'chemist',\n",
" 'audrey',\n",
" 'jana',\n",
" 'dutch',\n",
" 'override',\n",
" 'spasms',\n",
" 'resumed',\n",
" 'stinson',\n",
" 'widows',\n",
" 'stonewall',\n",
" 'palatial',\n",
" 'neuman',\n",
" 'abandon',\n",
" 'anglophile',\n",
" 'marathon',\n",
" 'chevette',\n",
" 'unscary',\n",
" 'eponymously',\n",
" 'spoilerific',\n",
" 'fleashens',\n",
" 'brigand',\n",
" 'politeness',\n",
" 'clued',\n",
" 'dermatonecrotic',\n",
" 'grady',\n",
" 'mulligan',\n",
" 'ol',\n",
" 'bertolucci',\n",
" 'incubation',\n",
" 'oldboy',\n",
" 'snden',\n",
" 'plaintiffs',\n",
" 'fk',\n",
" 'deply',\n",
" 'franchot',\n",
" 'cyhper',\n",
" 'glorifying',\n",
" 'mazovia',\n",
" 'elizabeth',\n",
" 'palestine',\n",
" 'robby',\n",
" 'wongo',\n",
" 'moshing',\n",
" 'eeeee',\n",
" 'doltish',\n",
" 'bree',\n",
" 'postponed',\n",
" 'gunslinger',\n",
" 'debacles',\n",
" 'kamm',\n",
" 'herman',\n",
" 'rapture',\n",
" 'rolando',\n",
" 'tetsuothe',\n",
" 'premises',\n",
" 'bruck',\n",
" 'loosely',\n",
" 'boylen',\n",
" 'proportions',\n",
" 'grecianized',\n",
" 'wodehousian',\n",
" 'encapsuling',\n",
" 'partly',\n",
" 'posative',\n",
" 'calms',\n",
" 'stadling',\n",
" 'austrailia',\n",
" 'shortland',\n",
" 'wheeling',\n",
" 'darkie',\n",
" 'mckellar',\n",
" 'cushy',\n",
" 'ooookkkk',\n",
" 'milky',\n",
" 'unfolded',\n",
" 'degrades',\n",
" 'authenticating',\n",
" 'rotheroe',\n",
" 'beart',\n",
" 'neath',\n",
" 'grispin',\n",
" 'intoxicants',\n",
" 'nnette',\n",
" 'slinging',\n",
" 'tsukamoto',\n",
" 'stows',\n",
" 'suddenness',\n",
" 'waqt',\n",
" 'degrading',\n",
" 'camazotz',\n",
" 'blarney',\n",
" 'shakher',\n",
" 'delinquency',\n",
" 'tomreynolds',\n",
" 'insecticide',\n",
" 'charlton',\n",
" 'hare',\n",
" 'wayland',\n",
" 'nakada',\n",
" 'urbane',\n",
" 'sadomasochistic',\n",
" 'larnia',\n",
" 'hyping',\n",
" 'yr',\n",
" 'hebert',\n",
" 'accentuating',\n",
" 'deathrow',\n",
" 'galligan',\n",
" 'unmediated',\n",
" 'treble',\n",
" 'alphabet',\n",
" 'soad',\n",
" 'donen',\n",
" 'lord',\n",
" 'recess',\n",
" 'handsome',\n",
" 'center',\n",
" 'vignettes',\n",
" 'rescuers',\n",
" 'pairings',\n",
" 'uselful',\n",
" 'sanders',\n",
" 'nots',\n",
" 'hatsumomo',\n",
" 'appleby',\n",
" 'tampax',\n",
" 'sprinkling',\n",
" 'defacing',\n",
" 'lofty',\n",
" 'opaque',\n",
" 'tlc',\n",
" 'romagna',\n",
" 'tablespoons',\n",
" 'bernhard',\n",
" 'verger',\n",
" 'acumen',\n",
" 'percentages',\n",
" 'wendingo',\n",
" 'resonating',\n",
" 'vntoarea',\n",
" 'redundancies',\n",
" 'red',\n",
" 'pitied',\n",
" 'belying',\n",
" 'gleefulness',\n",
" 'bibbidi',\n",
" 'heiligt',\n",
" 'gitane',\n",
" 'journalist',\n",
" 'focusing',\n",
" 'plethora',\n",
" 'citizen',\n",
" 'coster',\n",
" 'clunkers',\n",
" 'deplorable',\n",
" 'forgive',\n",
" 'proplems',\n",
" 'magwood',\n",
" 'bankers',\n",
" 'aqua',\n",
" 'donated',\n",
" 'disbelieving',\n",
" 'acomplication',\n",
" 'immediately',\n",
" 'contrasted',\n",
" 'reidelsheimer',\n",
" 'fox',\n",
" 'springs',\n",
" 'toolbox',\n",
" 'contacting',\n",
" 'ace',\n",
" 'washrooms',\n",
" 'raving',\n",
" 'dynamism',\n",
" 'mae',\n",
" 'sky',\n",
" 'disharmony',\n",
" 'untutored',\n",
" 'icarus',\n",
" 'taint',\n",
" 'kargil',\n",
" 'captain',\n",
" 'paucity',\n",
" 'fits',\n",
" 'tumbles',\n",
" 'amer',\n",
" 'bueller',\n",
" 'redubbed',\n",
" 'cleansed',\n",
" 'kollos',\n",
" 'shara',\n",
" 'humma',\n",
" 'felichy',\n",
" 'outa',\n",
" 'piglets',\n",
" 'gombell',\n",
" 'supermen',\n",
" 'superlow',\n",
" 'enhance',\n",
" 'goode',\n",
" 'shalt',\n",
" 'kubanskie',\n",
" 'zenith',\n",
" 'ananda',\n",
" 'ocd',\n",
" 'matlin',\n",
" 'nosed',\n",
" 'presumptuous',\n",
" 'rerun',\n",
" 'toyko',\n",
" 'mazar',\n",
" 'sundry',\n",
" 'bilb',\n",
" 'fugly',\n",
" 'orchestrating',\n",
" 'prosaically',\n",
" 'maricarmen',\n",
" 'moveis',\n",
" 'conelly',\n",
" 'estrange',\n",
" 'lusciously',\n",
" 'seasonings',\n",
" 'sums',\n",
" 'delirious',\n",
" 'quincey',\n",
" 'flesh',\n",
" 'tootsie',\n",
" 'ai',\n",
" 'tenma',\n",
" 'appropriations',\n",
" 'chainsaw',\n",
" 'ides',\n",
" 'surrogacy',\n",
" 'pungent',\n",
" 'gallon',\n",
" 'damaso',\n",
" 'caribou',\n",
" 'perico',\n",
" 'supplying',\n",
" 'ro',\n",
" 'yuy',\n",
" 'valium',\n",
" 'debuted',\n",
" 'robbin',\n",
" 'mounts',\n",
" 'interpolated',\n",
" 'aetv',\n",
" 'plummer',\n",
" 'competence',\n",
" 'toadies',\n",
" 'dubiel',\n",
" 'clavichord',\n",
" 'asunder',\n",
" 'sublety',\n",
" 'airfix',\n",
" 'stoltzfus',\n",
" 'ruth',\n",
" 'fluorescent',\n",
" 'improves',\n",
" 'rebenga',\n",
" 'russells',\n",
" 'deliberation',\n",
" 'zsa',\n",
" 'dardino',\n",
" 'macs',\n",
" 'servile',\n",
" 'jlb',\n",
" 'apallonia',\n",
" 'crossbows',\n",
" 'locus',\n",
" 'mislead',\n",
" 'corey',\n",
" 'blundered',\n",
" 'jeopardizes',\n",
" 'disorganized',\n",
" 'discuss',\n",
" 'longish',\n",
" 'tieing',\n",
" 'ledger',\n",
" 'speechifying',\n",
" 'amitabhz',\n",
" 'bbc',\n",
" 'chimayo',\n",
" 'pranked',\n",
" 'superman',\n",
" 'aggravated',\n",
" 'rifleman',\n",
" 'yvone',\n",
" 'radiant',\n",
" 'galico',\n",
" 'debris',\n",
" 'waking',\n",
" 'btw',\n",
" 'havnt',\n",
" 'francen',\n",
" 'chattered',\n",
" 'scathed',\n",
" 'pic',\n",
" 'ceremonies',\n",
" 'watergate',\n",
" 'betsy',\n",
" 'majorca',\n",
" 'meercat',\n",
" 'noirs',\n",
" 'grunts',\n",
" 'drecky',\n",
" 'tribulations',\n",
" 'avery',\n",
" 'talladega',\n",
" 'eights',\n",
" 'dumbing',\n",
" 'alloimono',\n",
" 'scrutinising',\n",
" 'geta',\n",
" 'beltrami',\n",
" 'pvc',\n",
" 'horse',\n",
" 'tiburon',\n",
" 'huitime',\n",
" 'ripple',\n",
" 'loitering',\n",
" 'forensics',\n",
" 'nearly',\n",
" 'elizabethan',\n",
" 'ellington',\n",
" 'uzi',\n",
" 'sicily',\n",
" 'camion',\n",
" 'motivated',\n",
" 'rung',\n",
" 'gao',\n",
" 'licitates',\n",
" 'protocol',\n",
" 'smirker',\n",
" 'torin',\n",
" 'newlywed',\n",
" 'rich',\n",
" 'dismay',\n",
" 'skyler',\n",
" 'moonwalks',\n",
" 'haranguing',\n",
" 'sunburst',\n",
" 'grifter',\n",
" 'undersold',\n",
" 'chearator',\n",
" 'marino',\n",
" 'scala',\n",
" 'conditioner',\n",
" 'ulysses',\n",
" 'lamarre',\n",
" 'figueroa',\n",
" 'flane',\n",
" 'allllllll',\n",
" 'slide',\n",
" 'lateness',\n",
" 'selbst',\n",
" 'gandhis',\n",
" 'dramatizing',\n",
" 'catchphrase',\n",
" 'doable',\n",
" 'stadiums',\n",
" 'alexanderplatz',\n",
" 'pandemonium',\n",
" 'misrepresents',\n",
" 'earth',\n",
" 'mounties',\n",
" 'seeker',\n",
" 'cheat',\n",
" 'outbreaks',\n",
" 'snowstorm',\n",
" 'baur',\n",
" 'schedules',\n",
" 'bathetic',\n",
" 'incorrect',\n",
" 'johnathon',\n",
" 'rosanne',\n",
" 'mundanely',\n",
" 'cauldrons',\n",
" 'forrest',\n",
" 'poky',\n",
" 'legislation',\n",
" 'womanness',\n",
" 'spender',\n",
" 'crazy',\n",
" 'rational',\n",
" 'terrell',\n",
" 'zero',\n",
" 'coincides',\n",
" 'thoughout',\n",
" 'mathew',\n",
" 'narnia',\n",
" 'naseeruddin',\n",
" 'bucks',\n",
" 'affronts',\n",
" 'topple',\n",
" 'degree',\n",
" 'preyed',\n",
" 'passionately',\n",
" 'defeats',\n",
" 'torchwood',\n",
" 'sources',\n",
" 'botticelli',\n",
" 'compactor',\n",
" 'kosturica',\n",
" 'waiving',\n",
" 'gunnar',\n",
" 'stiffler',\n",
" 'fwd',\n",
" 'kawajiri',\n",
" 'eleanor',\n",
" 'sistahs',\n",
" 'soulhunter',\n",
" 'belies',\n",
" 'wrathful',\n",
" 'americans',\n",
" 'ferdinandvongalitzien',\n",
" 'kendra',\n",
" 'weirdy',\n",
" 'unforgivably',\n",
" 'chepart',\n",
" 'tatta',\n",
" 'departmentthe',\n",
" 'dig',\n",
" 'blatty',\n",
" 'marionettes',\n",
" 'atop',\n",
" 'chim',\n",
" 'saurian',\n",
" 'woes',\n",
" 'cloudscape',\n",
" 'resignedly',\n",
" 'unrooted',\n",
" 'keuck',\n",
" 'hitlerian',\n",
" 'stylings',\n",
" 'crewed',\n",
" 'bedeviled',\n",
" 'unfurnished',\n",
" 'reedus',\n",
" 'circumstances',\n",
" 'grasped',\n",
" 'smurfettes',\n",
" 'fn',\n",
" 'dishwashers',\n",
" 'roadie',\n",
" 'ruthlessness',\n",
" 'refrains',\n",
" 'lampooning',\n",
" 'semblance',\n",
" 'richart',\n",
" 'legions',\n",
" 'gwenneth',\n",
" 'enmity',\n",
" 'assess',\n",
" 'manufacturer',\n",
" 'bullosa',\n",
" 'outrun',\n",
" 'hogan',\n",
" 'chekov',\n",
" 'blithe',\n",
" 'code',\n",
" 'drillings',\n",
" 'revolvers',\n",
" 'aredavid',\n",
" 'robespierre',\n",
" 'achcha',\n",
" 'boyfriendhe',\n",
" 'wallow',\n",
" 'toga',\n",
" 'graphed',\n",
" 'tonking',\n",
" 'going',\n",
" 'bosnians',\n",
" 'willy',\n",
" 'rohauer',\n",
" 'fim',\n",
" 'forbidding',\n",
" 'yew',\n",
" 'rationalised',\n",
" 'shimomo',\n",
" 'opposition',\n",
" 'landis',\n",
" 'minded',\n",
" 'despicableness',\n",
" 'easting',\n",
" 'arghhhhh',\n",
" 'ebb',\n",
" 'trialat',\n",
" 'protected',\n",
" 'negras',\n",
" 'rick',\n",
" 'muti',\n",
" 'tracker',\n",
" 'shawl',\n",
" 'differentiates',\n",
" 'sweetheart',\n",
" 'deepened',\n",
" 'manmohan',\n",
" 'trevethyn',\n",
" 'brain',\n",
" 'incomprehensibly',\n",
" 'piercing',\n",
" 'pasadena',\n",
" 'shtick',\n",
" 'ute',\n",
" 'viggo',\n",
" 'supersedes',\n",
" 'ack',\n",
" 'cites',\n",
" 'taurus',\n",
" 'relevent',\n",
" 'minidress',\n",
" 'philosopher',\n",
" 'bel',\n",
" 'mahattan',\n",
" 'moden',\n",
" 'compiling',\n",
" 'advertising',\n",
" 'rogues',\n",
" 'unimaginative',\n",
" 'subpaar',\n",
" 'ademir',\n",
" 'darkly',\n",
" 'saturate',\n",
" 'fledgling',\n",
" 'breaths',\n",
" 'padre',\n",
" 'aszombi',\n",
" 'pachabel',\n",
" 'incalculable',\n",
" 'ozone',\n",
" 'sped',\n",
" 'mpho',\n",
" 'rawail',\n",
" 'forbid',\n",
" 'synth',\n",
" 'guttersnipe',\n",
" 'reputedly',\n",
" 'holiness',\n",
" 'unessential',\n",
" 'hampden',\n",
" 'asylum',\n",
" 'bolye',\n",
" 'strangers',\n",
" 'rantzen',\n",
" 'farrellys',\n",
" 'vigourous',\n",
" 'cantinflas',\n",
" 'enshrined',\n",
" 'boris',\n",
" 'expetations',\n",
" 'replaying',\n",
" 'prestige',\n",
" 'bukater',\n",
" 'overpaid',\n",
" 'exhude',\n",
" 'backsides',\n",
" 'topless',\n",
" 'sufferings',\n",
" 'nitwits',\n",
" 'cordova',\n",
" 'incensed',\n",
" 'danira',\n",
" 'unrelenting',\n",
" 'disabling',\n",
" 'ferdy',\n",
" 'gerard',\n",
" 'drewitt',\n",
" 'mero',\n",
" 'monsters',\n",
" 'precautions',\n",
" 'lamping',\n",
" 'relinquish',\n",
" 'demy',\n",
" 'drink',\n",
" 'chamberlin',\n",
" 'unjustifiably',\n",
" 'cove',\n",
" 'floodwaters',\n",
" 'searing',\n",
" 'isral',\n",
" 'ling',\n",
" 'grossness',\n",
" 'pickier',\n",
" 'pax',\n",
" 'wierd',\n",
" 'tereasa',\n",
" 'smog',\n",
" 'girotti',\n",
" 'spat',\n",
" 'sera',\n",
" 'noxious',\n",
" 'misbehaving',\n",
" 'scouts',\n",
" 'refreshments',\n",
" 'autobiographic',\n",
" 'shi',\n",
" 'toyomichi',\n",
" 'bits',\n",
" 'psychotics',\n",
" 'barzell',\n",
" 'colt',\n",
" 'shivering',\n",
" 'pugilist',\n",
" 'gladiator',\n",
" 'dryer',\n",
" 'reissues',\n",
" 'scrivener',\n",
" 'predicable',\n",
" 'objection',\n",
" 'marmalade',\n",
" 'seems',\n",
" 'spellbind',\n",
" 'trifecta',\n",
" 'innovator',\n",
" 'shriekfest',\n",
" 'inthused',\n",
" 'contestants',\n",
" 'goody',\n",
" 'samotri',\n",
" 'serviced',\n",
" 'nozires',\n",
" 'ins',\n",
" 'mutilating',\n",
" 'dupes',\n",
" 'launius',\n",
" 'widescreen',\n",
" 'joo',\n",
" 'discretionary',\n",
" 'enlivens',\n",
" 'bushes',\n",
" 'chills',\n",
" 'header',\n",
" 'activist',\n",
" 'gethsemane',\n",
" 'phoenixs',\n",
" 'wreathed',\n",
" 'sacrine',\n",
" 'electrifyingly',\n",
" 'basely',\n",
" 'ghidora',\n",
" 'binder',\n",
" 'dogfights',\n",
" 'sugar',\n",
" 'doddsville',\n",
" 'porkys',\n",
" 'scattershot',\n",
" 'refunded',\n",
" 'rudely',\n",
" 'insteadit',\n",
" 'zatichi',\n",
" 'eurotrash',\n",
" 'radioraptus',\n",
" 'hurls',\n",
" 'boogeman',\n",
" 'weighs',\n",
" 'danniele',\n",
" 'converging',\n",
" 'hypothermia',\n",
" 'glorfindel',\n",
" 'birthdays',\n",
" 'attentive',\n",
" 'mallepa',\n",
" 'spacewalk',\n",
" 'manoy',\n",
" 'bombshells',\n",
" 'farts',\n",
" 'lyoko',\n",
" 'southron',\n",
" 'destruction',\n",
" 'flemming',\n",
" 'manhole',\n",
" 'elainor',\n",
" 'bowersock',\n",
" 'lowly',\n",
" 'wfst',\n",
" 'limousines',\n",
" 'skolimowski',\n",
" 'saban',\n",
" 'koen',\n",
" 'malaysia',\n",
" 'uwi',\n",
" 'cyd',\n",
" 'apeing',\n",
" 'bonecrushing',\n",
" 'dini',\n",
" 'merest',\n",
" 'janina',\n",
" 'chemotrodes',\n",
" 'trials',\n",
" 'authorize',\n",
" 'whilhelm',\n",
" 'asthmatic',\n",
" 'broads',\n",
" 'missteps',\n",
" 'embittered',\n",
" 'chandeliers',\n",
" 'seeming',\n",
" 'miscalculate',\n",
" 'recommeded',\n",
" 'schoolwork',\n",
" 'coy',\n",
" 'mcconaughey',\n",
" 'philosophically',\n",
" 'waver',\n",
" 'fanny',\n",
" 'mestressat',\n",
" 'unwatchably',\n",
" 'saggy',\n",
" 'topness',\n",
" 'dwellings',\n",
" 'breakup',\n",
" 'hasselhoff',\n",
" 'superstars',\n",
" 'replay',\n",
" 'aggravates',\n",
" 'balances',\n",
" 'urging',\n",
" 'snidely',\n",
" 'aleksandar',\n",
" 'hildy',\n",
" 'kazuhiro',\n",
" 'slayer',\n",
" 'tangy',\n",
" 'brussels',\n",
" 'horne',\n",
" 'masayuki',\n",
" 'molden',\n",
" 'unravel',\n",
" 'goodtime',\n",
" 'interrogates',\n",
" 'bismillahhirrahmannirrahim',\n",
" 'rowboat',\n",
" 'dumann',\n",
" 'datedness',\n",
" 'astrotheology',\n",
" 'dekhiye',\n",
" 'valga',\n",
" 'kata',\n",
" 'wipes',\n",
" 'hostilities',\n",
" 'sentimentalising',\n",
" 'documentary',\n",
" 'salesman',\n",
" 'virtue',\n",
" 'unreasonably',\n",
" 'haver',\n",
" 'cei',\n",
" 'unglamorised',\n",
" 'balky',\n",
" 'complementary',\n",
" 'paychecks',\n",
" 'mnica',\n",
" 'wada',\n",
" 'ily',\n",
" 'prc',\n",
" 'ennobling',\n",
" 'functionality',\n",
" 'dissociated',\n",
" 'elk',\n",
" 'throbbing',\n",
" 'tempe',\n",
" 'linoleum',\n",
" 'photogrsphed',\n",
" 'bottacin',\n",
" 'hipper',\n",
" 'titillating',\n",
" 'barging',\n",
" 'untie',\n",
" 'sacchetti',\n",
" 'gnat',\n",
" 'roedel',\n",
" 'cohabitation',\n",
" 'performs',\n",
" 'sales',\n",
" 'migrs',\n",
" 'teachs',\n",
" 'nanavati',\n",
" 'fresco',\n",
" 'davison',\n",
" 'obstinate',\n",
" 'burglar',\n",
" 'masue',\n",
" 'dickory',\n",
" 'grills',\n",
" 'appelagate',\n",
" 'linkage',\n",
" 'enables',\n",
" 'loesser',\n",
" 'patties',\n",
" 'prudent',\n",
" 'mallorquins',\n",
" 'nativetex',\n",
" 'suprise',\n",
" 'drippy',\n",
" 'quill',\n",
" 'speeded',\n",
" 'farscape',\n",
" 'saddening',\n",
" 'centuries',\n",
" 'mos',\n",
" 'improvisationally',\n",
" 'neccessarily',\n",
" 'transmitter',\n",
" 'tankers',\n",
" 'latte',\n",
" 'mechanisation',\n",
" 'faracy',\n",
" 'synthetically',\n",
" 'thoughtless',\n",
" 'rake',\n",
" 'ropes',\n",
" 'desirable',\n",
" 'whitewashed',\n",
" 'donal',\n",
" 'crabby',\n",
" 'lifeless',\n",
" 'perfidy',\n",
" 'teresa',\n",
" 'bulldog',\n",
" 'cockamamie',\n",
" 'rasberries',\n",
" 'notethe',\n",
" 'captivity',\n",
" 'chiseling',\n",
" 'smaller',\n",
" 'clampets',\n",
" 'alerts',\n",
" 'tough',\n",
" 'wellingtonian',\n",
" 'aaaahhhhhhh',\n",
" 'dither',\n",
" 'incertitude',\n",
" 'florentine',\n",
" 'imperioli',\n",
" 'licking',\n",
" 'disparagement',\n",
" 'artfully',\n",
" 'feds',\n",
" 'fumiya',\n",
" 'tearfully',\n",
" 'lanchester',\n",
" 'undertaken',\n",
" 'longlost',\n",
" 'netted',\n",
" 'carrell',\n",
" 'uncompelling',\n",
" 'reliefs',\n",
" 'leona',\n",
" 'autorenfilm',\n",
" 'unfriendly',\n",
" 'typewriter',\n",
" 'shifted',\n",
" 'bertrand',\n",
" 'blesses',\n",
" 'tricking',\n",
" 'fireflies',\n",
" 'zanes',\n",
" 'unknowingly',\n",
" 'unnerve',\n",
" 'caning',\n",
" 'flat',\n",
" 'recluse',\n",
" 'dcreasy',\n",
" 'chipmunk',\n",
" 'dipper',\n",
" 'musee',\n",
" 'cousin',\n",
" 'shys',\n",
" 'berserkers',\n",
" 'eve',\n",
" 'conflagration',\n",
" 'irks',\n",
" 'restricts',\n",
" 'parsing',\n",
" 'positronic',\n",
" 'copout',\n",
" 'khala',\n",
" 'swiftness',\n",
" 'higginson',\n",
" 'imprint',\n",
" 'walter',\n",
" 'sundance',\n",
" 'whispering',\n",
" 'thematically',\n",
" 'underimpressed',\n",
" 'uno',\n",
" 'expressly',\n",
" 'russkies',\n",
" 'discos',\n",
" 'shaping',\n",
" 'verson',\n",
" 'prototype',\n",
" 'chapman',\n",
" 'trafficker',\n",
" 'semetary',\n",
" 'unrealistically',\n",
" 'lifewell',\n",
" 'rivas',\n",
" 'consequent',\n",
" 'katsu',\n",
" 'titantic',\n",
" 'jalees',\n",
" 'ranee',\n",
" 'shipbuilding',\n",
" 'gambles',\n",
" 'dispenses',\n",
" 'disfigurement',\n",
" 'bright',\n",
" 'cristian',\n",
" 'puertorricans',\n",
" 'constituent',\n",
" 'capta',\n",
" 'jewel',\n",
" 'erect',\n",
" 'farah',\n",
" 'despondently',\n",
" 'avoide',\n",
" 'inconnu',\n",
" 'headquarters',\n",
" 'sanguisga',\n",
" ...]"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(vocab)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0., 0., 0., ..., 0., 0., 0.]])"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"\n",
"layer_0 = np.zeros((1,vocab_size))\n",
"layer_0"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAiIAAAFKCAYAAAAg+zSAAAAABGdBTUEAALGPC/xhBQAAACBjSFJN\nAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAB1WlUWHRYTUw6Y29tLmFkb2Jl\nLnhtcAAAAAAAPHg6eG1wbWV0YSB4bWxuczp4PSJhZG9iZTpuczptZXRhLyIgeDp4bXB0az0iWE1Q\nIENvcmUgNS40LjAiPgogICA8cmRmOlJERiB4bWxuczpyZGY9Imh0dHA6Ly93d3cudzMub3JnLzE5\nOTkvMDIvMjItcmRmLXN5bnRheC1ucyMiPgogICAgICA8cmRmOkRlc2NyaXB0aW9uIHJkZjphYm91\ndD0iIgogICAgICAgICAgICB4bWxuczp0aWZmPSJodHRwOi8vbnMuYWRvYmUuY29tL3RpZmYvMS4w\nLyI+CiAgICAgICAgIDx0aWZmOkNvbXByZXNzaW9uPjE8L3RpZmY6Q29tcHJlc3Npb24+CiAgICAg\nICAgIDx0aWZmOk9yaWVudGF0aW9uPjE8L3RpZmY6T3JpZW50YXRpb24+CiAgICAgICAgIDx0aWZm\nOlBob3RvbWV0cmljSW50ZXJwcmV0YXRpb24+MjwvdGlmZjpQaG90b21ldHJpY0ludGVycHJldGF0\naW9uPgogICAgICA8L3JkZjpEZXNjcmlwdGlvbj4KICAgPC9yZGY6UkRGPgo8L3g6eG1wbWV0YT4K\nAtiABQAAQABJREFUeAHtnXvQXVV5/1daZxy1BUpJp1MhE5BSSSAgqBAV5BIuGaQJBoEUATEJAiXY\ncMsUTfMDK9MAMXKRAEmAgGkASUiGIgQSsEQgKGDCJV6GYkywfzRWibc/OuO8v/1Zuo7r3e/e5+zr\n2ZfzfWbOe/bZe12e9V373eu7n/WsZ40aCsRIhIAQEAJCQAgIASFQAQJ/UkGdqlIICAEhIASEgBAQ\nAhYBERHdCEJACAgBISAEhEBlCIiIVAa9KhYCQkAICAEhIARERHQPCAEhIASEgBAQApUhICJSGfSq\nWAgIASEgBISAEBAR0T0gBISAEBACQkAIVIaAiEhl0KtiISAEhIAQEAJCQERE94AQEAJCQAgIASFQ\nGQIiIpVBr4qFgBAQAkJACAgBERHdA0JACAgBISAEhEBlCIiIVAa9KhYCQkAICAEhIARERHQPCAEh\nIASEgBAQApUhICJSGfSqWAgIASEgBISAEBAR0T0gBISAEBACQkAIVIaAiEhl0KtiISAEhIAQEAJC\nQERE94AQEAJCQAgIASFQGQIiIpVBr4qFgBAQAkJACAgBERHdA0JACAgBISAEhEBlCIiIVAa9KhYC\nQkAICAEhIARERHQPCAEhIASEgBAQApUhICJSGfSqWAgIASEgBISAEBAR0T0gBISAEBACQkAIVIaA\niEhl0KtiISAEhIAQEAJCQERE94AQEAJCQAgIASFQGQIiIpVBr4qFgBAQAkJACAgBERHdA0JACAgB\nISAEhEBlCIiIVAa9KhYCQkAICAEhIARERHQPCAEhIASEgBAQApUhICJSAPQXX3yxGTVqlPnlL39Z\nQGkqQggIASEgBITA4CAgIjI4fR3Z0iVLlpgHH3ww8ppOCgEhIASEgBAoG4FRQ4GUXYnKry8CJ510\nknnf+95nbrvttvoqKc2EgBAQAkKgtQjIItLarlXDhIAQEAJCQAjUHwERkfr3kTQUAkJACAgBIdBa\nBERECuhanFXHjBkzrKR169ZZB9atW7daHwymQHBodZ8bbrhhWHp+kJbr+G1wHM7Db65FiV9f1HXO\noSO6ItRPXU888YRZvHhxRy853Fp49EcICAEhIAT6hICISMlAz5kzxyxbtszMmDHD4I7D5/HHHzfr\n16+3RCOq+u9973tm/Pjx5oMf/GAnD/kmTZpk
LrjgAjN9+vSobKnOXXnllbbsE0880Vx00UWdenbb\nbbdU5SixEBACQkAICIE8CLwjT2bl7Y3Az372M/P0008bf4DHsrHPPvtYsoGFY9asWcMKwkJx5513\njjgPeTjqqKPMxIkTzWGHHWb4LRECQkAICAEh0GQEZBEpufcuvPDCYSTEVTdu3DhLRrZt2+ZOdb4h\nGWFy4i4eeeSR1oJxyy23uFP6FgJCQAgIASHQWAREREruuoMPPrhrDb/4xS9GXD/rrLNGnPNPHHPM\nMWbHjh1m06ZN/mkdCwEhIASEgBBoHAIiIiV3mT8lk7SqPfbYo2vS3Xff3V7ftWtX13S6KASEgBAQ\nAkKg7giIiNS9h7roJyLSBRxdEgJCQAgIgUYgICJSw256++23u2q1fft2ez28ZLhrJl0UAkJACAgB\nIVBDBEREatgp999/f1etnnrqKevoiuNqWOLigOBPgl+JRAgIASEgBIRAnRAQEalTb/xBl5dffjk2\ncBkb1EFU5s2bN0xzlvSyJHjjxo3DzrsfN910kzvUtxAQAkJACAiB2iAgIlKbrvijItdff725/fbb\nbRRU38JBNNQzzzzTLt8NL+/FKXb27NnmqquuGkZi3nrrLRsA7Uc/+pElKn+s5fdHe+65p3nhhRfC\np/VbCAgBISAEhEBfEBAR6QvM6Sph1cxLL71k9t13X3PQQQd1wq8TjZVAZ3E75RLgjOuQGBdKHisJ\nQlC1KMGysnPnzk56QstLhIAQEAJCQAj0C4FRQejwoX5Vpnq6IwAJILR7VFTV7jl1VQgIASEgBIRA\nMxGQRaSZ/SathYAQEAJCQAi0AgERkVZ0oxohBISAEBACQqCZCIiINLPfpLUQEAJCQAgIgVYgICLS\nim5UI4SAEBACQkAINBMBOas2s9+ktRAQAkJACAiBViAgi0grulGNEAJCQAgIASHQTARERJrZb9Ja\nCAgBISAEhEArEBARaUU3qhFCQAgIASEgBJqJgIhIM/tNWgsBISAEhIAQaAUCIiKt6MaRjfjVr35l\nHn744ZEXdEYICAEhIASEQI0Q0KqZGnVG0aq8973vNd/97nfN3/zN3xRdtMoTAkJACAgBIVAIArKI\nFAJjPQuZPn26WbZsWT2Vk1ZCQAgIASEgBAIEZBFp8W3wwx/+0Bx33HHmpz/9aYtbqaYJASEgBIRA\nkxGQRaTJvddD97/7u78zo0ePNhs2bOiRUpeFgBAQAkJACFSDgIhINbj3rdY5c+aYlStX9q0+VSQE\nhIAQEAJCIA0CmppJg1YD0/73f/+3wWn1l7/8pfnzP//zBrZAKgsBISAEhECbEZBFpM29G7SNFTMz\nZswwq1evbnlL1TwhIASEgBBoIgIiIk3stZQ6s3pm0aJFKXMpuRAQAkJACAiB8hEQESkf48prOP74\n483OnTsNq2gkQkAICAEhIATqhICISJ16o0RdLrzwQrNkyZISa1DRQkAICAEhIATSIyBn1fSYNTKH\nYoo0stuktBAQAkKg9QjIItL6Lv59A4kpwkf7zwxIh6uZQkAICIGGICAi0pCOKkLN2bNnmxUrVhRR\nlMoQAkJACAgBIVAIApqaKQTGZhTCjry77babDfmujfCa0WfSUggIASHQdgRkEWl7D3vtI6DZ5Zdf\nbh566CHvrA6FgBAQAkJACFSHgIhIddhXUvPkyZPNXXfdVUndqlQICAEhIASEQBgBEZEwIi3/TUwR\n5Lvf/W7LW6rmCQEhIASEQBMQEBFpQi8VrONnP/tZ88ADDxRcqooTAkJACAgBIZAeATmrpses8Tm0\nEV7ju1ANEAJCQAi0BgERkdZ0ZbqGnH766ebss882p512WrqMSt1KBJiq27p1q3n11VfNtm3bzBtv\nvGG2bNkyoq3Tpk0ze+yxh5kwYYIZP368+fCHP6xdnUegpBNCQAikQUBEJA1aLUpLYLNbbrnFPPXU\nUy1qlZqS
BoENGzaYxx57zKxcudKMHj3aTJo0yRx88MFm3Lhxdpk3AfB8wZL205/+1Lz11lvmtdde\nM08//bT9QE5OPfVU88l
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.display import Image\n",
"Image(filename='sentiment_network.png')"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'': 0,\n",
" 'inhabitants': 1,\n",
" 'goku': 2,\n",
" 'stunts': 3,\n",
" 'catepillar': 4,\n",
" 'kristensen': 5,\n",
" 'goddess': 7,\n",
" 'offing': 49797,\n",
" 'distroy': 8,\n",
" 'unexplainably': 9,\n",
" 'concoctions': 10,\n",
" 'petite': 11,\n",
" 'paramilitary': 24759,\n",
" 'scribe': 12,\n",
" 'stevson': 13,\n",
" 'senegal': 6,\n",
" 'sctv': 14,\n",
" 'soundscape': 15,\n",
" 'rana': 16,\n",
" 'immortalizer': 18,\n",
" 'rene': 67354,\n",
" 'eko': 23,\n",
" 'planning': 20,\n",
" 'akiva': 21,\n",
" 'plod': 22,\n",
" 'orderly': 24,\n",
" 'zeleznice': 25,\n",
" 'critize': 29,\n",
" 'baguettes': 25649,\n",
" 'jefferies': 30,\n",
" 'uncertainties': 61695,\n",
" 'mountainbillies': 31,\n",
" 'steinbichler': 32,\n",
" 'vowel': 33,\n",
" 'rafe': 34,\n",
" 'donig': 68719,\n",
" 'tulipe': 36,\n",
" 'clot': 37,\n",
" 'hack': 12526,\n",
" 'distended': 38,\n",
" 'cornered': 37116,\n",
" 'impatiently': 40,\n",
" 'batrice': 12525,\n",
" 'unfortuntly': 41,\n",
" 'lung': 42,\n",
" 'scapegoats': 43,\n",
" 'pscychosexual': 45,\n",
" 'outbid': 46,\n",
" 'obit': 47,\n",
" 'sideshows': 48,\n",
" 'jugde': 49,\n",
" 'kevloun': 51,\n",
" 'quartier': 53,\n",
" 'harp': 61948,\n",
" 'unravelling': 54,\n",
" 'antiques': 56,\n",
" 'strutts': 57,\n",
" 'tilts': 58,\n",
" 'disconcert': 59,\n",
" 'dossiers': 60,\n",
" 'sorriest': 61,\n",
" 'craftsman': 49412,\n",
" 'blart': 62,\n",
" 'dependence': 37120,\n",
" 'sated': 61698,\n",
" 'iberia': 63,\n",
" 'sagan': 72,\n",
" 'frmann': 65,\n",
" 'daniell': 66,\n",
" 'rays': 67,\n",
" 'pried': 68,\n",
" 'khoobsurat': 69,\n",
" 'leavitt': 70,\n",
" 'caiano': 71,\n",
" 'attractiveness': 73,\n",
" 'kitaparaporn': 74,\n",
" 'hamilton': 75,\n",
" 'massages': 76,\n",
" 'horgan': 78,\n",
" 'chemist': 79,\n",
" 'audrey': 80,\n",
" 'yeow': 55655,\n",
" 'jana': 81,\n",
" 'dutch': 82,\n",
" 'pinchot': 24773,\n",
" 'override': 83,\n",
" 'dwervick': 63223,\n",
" 'spasms': 84,\n",
" 'resumed': 85,\n",
" 'tamale': 66259,\n",
" 'calibanian': 49636,\n",
" 'stinson': 86,\n",
" 'widows': 87,\n",
" 'stonewall': 88,\n",
" 'palatial': 89,\n",
" 'neuman': 90,\n",
" 'abandon': 91,\n",
" 'lemmings': 65314,\n",
" 'anglophile': 92,\n",
" 'ertha': 61706,\n",
" 'chevette': 94,\n",
" 'unscary': 95,\n",
" 'spoilerific': 97,\n",
" 'neworleans': 67639,\n",
" 'metamorphose': 17,\n",
" 'brigand': 99,\n",
" 'cheating': 41603,\n",
" 'clued': 101,\n",
" 'dermatonecrotic': 102,\n",
" 'grady': 103,\n",
" 'mulligan': 104,\n",
" 'ol': 105,\n",
" 'incubation': 107,\n",
" 'plaintiffs': 110,\n",
" 'snden': 109,\n",
" 'fk': 111,\n",
" 'deply': 112,\n",
" 'franchot': 113,\n",
" 'henstridge': 19,\n",
" 'cyhper': 114,\n",
" 'verbose': 26,\n",
" 'mazovia': 116,\n",
" 'elizabeth': 117,\n",
" 'palestine': 118,\n",
" 'robby': 119,\n",
" 'wongo': 120,\n",
" 'moshing': 121,\n",
" 'mstified': 12543,\n",
" 'eeeee': 122,\n",
" 'doltish': 123,\n",
" 'bree': 124,\n",
" 'postponed': 125,\n",
" 'debacles': 127,\n",
" 'amplify': 27,\n",
" 'kamm': 128,\n",
" 'phantom': 18893,\n",
" 'boylen': 136,\n",
" 'rolando': 131,\n",
" 'premises': 133,\n",
" 'bruck': 134,\n",
" 'loosely': 135,\n",
" 'wodehousian': 139,\n",
" 'onishi': 70389,\n",
" 'encapsuling': 140,\n",
" 'partly': 141,\n",
" 'stadling': 144,\n",
" 'calms': 143,\n",
" 'darkie': 148,\n",
" 'wheeling': 147,\n",
" 'ursla': 15875,\n",
" 'subsidized': 49420,\n",
" 'mckellar': 149,\n",
" 'ooookkkk': 151,\n",
" 'milky': 152,\n",
" 'unfolded': 153,\n",
" 'degrades': 154,\n",
" 'authenticating': 155,\n",
" 'writeup': 12548,\n",
" 'rotheroe': 156,\n",
" 'beart': 157,\n",
" 'intoxicants': 160,\n",
" 'grispin': 159,\n",
" 'cannes': 61718,\n",
" 'antithetical': 70398,\n",
" 'nnette': 161,\n",
" 'tsukamoto': 163,\n",
" 'antwones': 44205,\n",
" 'stows': 164,\n",
" 'suddenness': 165,\n",
" 'vol': 61720,\n",
" 'waqt': 166,\n",
" 'camazotz': 168,\n",
" 'paps': 55042,\n",
" 'shakher': 170,\n",
" 'terminate': 63868,\n",
" 'kotex': 56419,\n",
" 'delinquency': 171,\n",
" 'bromwell': 25214,\n",
" 'insecticide': 173,\n",
" 'charlton': 174,\n",
" 'nakada': 177,\n",
" 'titted': 24791,\n",
" 'urbane': 178,\n",
" 'depicted': 54491,\n",
" 'sadomasochistic': 179,\n",
" 'hyping': 181,\n",
" 'yr': 182,\n",
" 'hebert': 183,\n",
" 'waxwork': 12990,\n",
" 'deathrow': 185,\n",
" 'nourishes': 24792,\n",
" 'unmediated': 187,\n",
" 'tamper': 37143,\n",
" 'soad': 190,\n",
" 'alphabet': 189,\n",
" 'donen': 191,\n",
" 'lord': 192,\n",
" 'recess': 193,\n",
" 'watchably': 61023,\n",
" 'handsome': 194,\n",
" 'vignettes': 196,\n",
" 'pairings': 198,\n",
" 'uselful': 199,\n",
" 'sanders': 200,\n",
" 'outbursts': 72891,\n",
" 'nots': 201,\n",
" 'hatsumomo': 202,\n",
" 'actioned': 18292,\n",
" 'krimi': 24797,\n",
" 'appleby': 203,\n",
" 'tampax': 204,\n",
" 'sprinkling': 205,\n",
" 'defacing': 206,\n",
" 'lofty': 207,\n",
" 'verger': 213,\n",
" 'tablespoons': 211,\n",
" 'bernhard': 212,\n",
" 'goosebump': 64565,\n",
" 'acumen': 214,\n",
" 'percentages': 215,\n",
" 'wendingo': 216,\n",
" 'resonating': 217,\n",
" 'vntoarea': 218,\n",
" 'redundancies': 219,\n",
" 'strictly': 57081,\n",
" 'pitied': 221,\n",
" 'belying': 222,\n",
" 'michelangelo': 53153,\n",
" 'gleefulness': 223,\n",
" 'environmentalist': 24803,\n",
" 'gitane': 226,\n",
" 'corrected': 66547,\n",
" 'journalist': 227,\n",
" 'focusing': 228,\n",
" 'plethora': 229,\n",
" 'his': 39,\n",
" 'citizen': 230,\n",
" 'south': 55579,\n",
" 'clunkers': 232,\n",
" 'pendulous': 55991,\n",
" 'mounds': 24805,\n",
" 'deplorable': 233,\n",
" 'forgive': 234,\n",
" 'proplems': 235,\n",
" 'bankers': 237,\n",
" 'aqua': 238,\n",
" 'donated': 239,\n",
" 'disbelieving': 240,\n",
" 'acomplication': 241,\n",
" 'contrasted': 243,\n",
" 'muzzle': 44,\n",
" 'amphibians': 72141,\n",
" 'springs': 246,\n",
" 'reformatted': 49443,\n",
" 'toolbox': 247,\n",
" 'contacting': 248,\n",
" 'washrooms': 250,\n",
" 'raving': 251,\n",
" 'dynamism': 252,\n",
" 'mae': 253,\n",
" 'disharmony': 255,\n",
" 'molls': 72979,\n",
" 'dewaere': 12569,\n",
" 'untutored': 256,\n",
" 'icarus': 257,\n",
" 'taint': 258,\n",
" 'kargil': 259,\n",
" 'captain': 260,\n",
" 'paucity': 261,\n",
" 'fits': 262,\n",
" 'tumbles': 263,\n",
" 'amer': 264,\n",
" 'bueller': 265,\n",
" 'cleansed': 267,\n",
" 'shara': 269,\n",
" 'humma': 270,\n",
" 'outa': 272,\n",
" 'piglets': 273,\n",
" 'gombell': 274,\n",
" 'supermen': 275,\n",
" 'superlow': 276,\n",
" 'kubanskie': 280,\n",
" 'goode': 278,\n",
" 'disorganised': 45570,\n",
" 'zenith': 281,\n",
" 'ananda': 282,\n",
" 'matlin': 284,\n",
" 'particolare': 50,\n",
" 'presumptuous': 286,\n",
" 'rerun': 287,\n",
" 'toyko': 288,\n",
" 'bilb': 291,\n",
" 'sundry': 290,\n",
" 'fugly': 292,\n",
" 'orchestrating': 293,\n",
" 'prosaically': 294,\n",
" 'moveis': 296,\n",
" 'conelly': 297,\n",
" 'estrange': 298,\n",
" 'elfriede': 49455,\n",
" 'masterful': 52,\n",
" 'seasonings': 300,\n",
" 'quincey': 303,\n",
" 'frowning': 49456,\n",
" 'painkillers': 53444,\n",
" 'high': 25515,\n",
" 'flesh': 304,\n",
" 'tootsie': 305,\n",
" 'ai': 306,\n",
" 'tenma': 307,\n",
" 'duguay': 71257,\n",
" 'appropriations': 308,\n",
" 'ides': 310,\n",
" 'rui': 61734,\n",
" 'surrogacy': 311,\n",
" 'pungent': 312,\n",
" 'damaso': 314,\n",
" 'authoritarian': 61736,\n",
" 'caribou': 315,\n",
" 'ro': 318,\n",
" 'supplying': 317,\n",
" 'yuy': 319,\n",
" 'debuted': 321,\n",
" 'mounts': 323,\n",
" 'interpolated': 324,\n",
" 'aetv': 325,\n",
" 'plummer': 326,\n",
" 'asunder': 331,\n",
" 'airfix': 333,\n",
" 'dubiel': 329,\n",
" 'clavichord': 330,\n",
" 'crafty': 50465,\n",
" 'sublety': 332,\n",
" 'stoltzfus': 334,\n",
" 'ruth': 335,\n",
" 'fluorescent': 336,\n",
" 'improves': 337,\n",
" 'russells': 339,\n",
" 'tick': 43838,\n",
" 'zsa': 341,\n",
" 'macs': 343,\n",
" 'jlb': 345,\n",
" 'locus': 348,\n",
" 'mislead': 349,\n",
" 'merly': 49461,\n",
" 'corey': 350,\n",
" 'blundered': 351,\n",
" 'humourless': 3568,\n",
" 'disorganized': 353,\n",
" 'discuss': 354,\n",
" 'sharifi': 45391,\n",
" 'tieing': 356,\n",
" 'kats': 34784,\n",
" 'bbc': 360,\n",
" 'pranked': 362,\n",
" 'superman': 363,\n",
" 'holroyd': 9223,\n",
" 'aggravated': 364,\n",
" 'rifleman': 365,\n",
" 'yvone': 366,\n",
" 'vaugier': 24820,\n",
" 'radiant': 367,\n",
" 'galico': 368,\n",
" 'debris': 369,\n",
" 'btw': 371,\n",
" 'denote': 24822,\n",
" 'havnt': 372,\n",
" 'francen': 373,\n",
" 'chattered': 374,\n",
" 'scathed': 375,\n",
" 'pic': 376,\n",
" 'ceremonies': 377,\n",
" 'everyplace': 65309,\n",
" 'betsy': 379,\n",
" 'finster': 37176,\n",
" 'meercat': 381,\n",
" 'noirs': 382,\n",
" 'grunts': 383,\n",
" 'tribulations': 385,\n",
" 'apparatus': 47673,\n",
" 'martnez': 25825,\n",
" 'telethons': 24825,\n",
" 'talladega': 387,\n",
" 'alloimono': 390,\n",
" 'situations': 64,\n",
" 'scrutinising': 391,\n",
" 'geta': 392,\n",
" 'beltrami': 393,\n",
" 'pvc': 394,\n",
" 'horse': 395,\n",
" 'tiburon': 396,\n",
" 'huitime': 397,\n",
" 'ripple': 398,\n",
" 'exceed': 61748,\n",
" 'loitering': 399,\n",
" 'forensics': 400,\n",
" 'nearly': 401,\n",
" 'ellington': 403,\n",
" 'uzi': 404,\n",
" 'rung': 408,\n",
" 'pillaged': 24829,\n",
" 'gao': 409,\n",
" 'licitates': 410,\n",
" 'protocol': 411,\n",
" 'smirker': 412,\n",
" 'torin': 413,\n",
" 'vizier': 31853,\n",
" 'newlywed': 414,\n",
" 'dismay': 416,\n",
" 'moonwalks': 418,\n",
" 'skyler': 417,\n",
" 'invested': 18455,\n",
" 'grifter': 421,\n",
" 'undersold': 422,\n",
" 'chearator': 423,\n",
" 'marino': 424,\n",
" 'scala': 425,\n",
" 'conditioner': 426,\n",
" 'lamarre': 428,\n",
" 'figueroa': 429,\n",
" 'mcinnerny': 61753,\n",
" 'allllllll': 431,\n",
" 'slide': 432,\n",
" 'lateness': 433,\n",
" 'selbst': 434,\n",
" 'dramatizing': 436,\n",
" 'doable': 438,\n",
" 'hollywoodize': 27207,\n",
" 'alexanderplatz': 440,\n",
" 'wholesome': 45745,\n",
" 'pandemonium': 441,\n",
" 'earth': 443,\n",
" 'mounties': 444,\n",
" 'seeker': 445,\n",
" 'cheat': 446,\n",
" 'outbreaks': 447,\n",
" 'savagely': 61759,\n",
" 'snowstorm': 448,\n",
" 'baur': 449,\n",
" 'schedules': 450,\n",
" 'bathetic': 451,\n",
" 'johnathon': 453,\n",
" 'origonal': 57843,\n",
" 'rosanne': 454,\n",
" 'cauldrons': 456,\n",
" 'forrest': 457,\n",
" 'poky': 458,\n",
" 'aristos': 54856,\n",
" 'womanness': 460,\n",
" 'spender': 461,\n",
" 'pagliai': 37108,\n",
" 'rational': 463,\n",
" 'terrell': 464,\n",
" 'affronts': 472,\n",
" 'concise': 49476,\n",
" 'mathew': 468,\n",
" 'narnia': 469,\n",
" 'naseeruddin': 470,\n",
" 'bucks': 471,\n",
" 'proceeds': 69809,\n",
" 'topple': 473,\n",
" 'degree': 474,\n",
" 'passionately': 476,\n",
" 'defeats': 477,\n",
" 'gras': 49477,\n",
" 'sources': 479,\n",
" 'pflug': 49976,\n",
" 'botticelli': 480,\n",
" 'fwd': 486,\n",
" 'waiving': 483,\n",
" 'gunnar': 484,\n",
" 'stiffler': 485,\n",
" 'unwise': 49480,\n",
" 'kawajiri': 487,\n",
" 'sistahs': 489,\n",
" 'swallowed': 30511,\n",
" 'soulhunter': 490,\n",
" 'belies': 491,\n",
" 'wrathful': 492,\n",
" 'badmouth': 16696,\n",
" 'floradora': 61766,\n",
" 'unforgivably': 497,\n",
" 'weirdy': 496,\n",
" 'violation': 63309,\n",
" 'chepart': 498,\n",
" 'departmentthe': 500,\n",
" 'posehn': 49483,\n",
" 'peyote': 37188,\n",
" 'psychiatrically': 24846,\n",
" 'marionettes': 503,\n",
" 'blatty': 502,\n",
" 'atop': 504,\n",
" 'debases': 25135,\n",
" 'henze': 24845,\n",
" 'unrooted': 510,\n",
" 'cloudscape': 508,\n",
" 'resignedly': 509,\n",
" 'begin': 49917,\n",
" 'hitlerian': 512,\n",
" 'reedus': 517,\n",
" 'crewed': 514,\n",
" 'bedeviled': 515,\n",
" 'unfurnished': 516,\n",
" 'herrmann': 12602,\n",
" 'circumstances': 518,\n",
" 'grasped': 519,\n",
" 'fn': 521,\n",
" 'beefed': 22200,\n",
" 'scwatch': 64018,\n",
" 'dishwashers': 522,\n",
" 'roadie': 523,\n",
" 'ruthlessness': 524,\n",
" 'migrant': 12605,\n",
" 'refrains': 525,\n",
" 'preponderance': 44377,\n",
" 'lampooning': 526,\n",
" 'richart': 528,\n",
" 'gwenneth': 530,\n",
" 'enmity': 531,\n",
" 'vortex': 61772,\n",
" 'assess': 532,\n",
" 'manufacturer': 533,\n",
" 'bullosa': 534,\n",
" 'citizenship': 61774,\n",
" 'chekov': 537,\n",
" 'hogan': 536,\n",
" 'blithe': 538,\n",
" 'aredavid': 542,\n",
" 'drillings': 540,\n",
" 'revolvers': 541,\n",
" 'boyfriendhe': 545,\n",
" 'achcha': 544,\n",
" 'wallow': 546,\n",
" 'toga': 547,\n",
" 'bosnians': 551,\n",
" 'going': 550,\n",
" 'willy': 552,\n",
" 'fim': 554,\n",
" 'forbidding': 555,\n",
" 'delete': 56779,\n",
" 'rationalised': 557,\n",
" 'shimomo': 558,\n",
" 'opposition': 559,\n",
" 'landis': 560,\n",
" 'minded': 561,\n",
" 'arghhhhh': 564,\n",
" 'trialat': 566,\n",
" 'protected': 567,\n",
" 'negras': 568,\n",
" 'tracker': 571,\n",
" 'muti': 570,\n",
" 'dinky': 49489,\n",
" 'shawl': 572,\n",
" 'differentiates': 573,\n",
" 'dipaolo': 61779,\n",
" 'sweetheart': 574,\n",
" 'manmohan': 576,\n",
" 'enamored': 66265,\n",
" 'trevethyn': 577,\n",
" 'brain': 578,\n",
" 'incomprehensibly': 579,\n",
" 'pasadena': 581,\n",
" 'bruton': 59142,\n",
" 'shtick': 582,\n",
" 'ute': 583,\n",
" 'viggo': 584,\n",
" 'relevent': 589,\n",
" 'cites': 587,\n",
" 'greenaways': 61781,\n",
" 'minidress': 590,\n",
" 'philosopher': 591,\n",
" 'mahattan': 593,\n",
" 'moden': 594,\n",
" 'compiling': 595,\n",
" 'unimaginative': 598,\n",
" 'rogues': 597,\n",
" 'subpaar': 599,\n",
" 'darkly': 601,\n",
" 'saturate': 602,\n",
" 'fledgling': 603,\n",
" 'breaths': 604,\n",
" 'sceam': 37206,\n",
" 'empathized': 58870,\n",
" 'aszombi': 606,\n",
" 'incalculable': 608,\n",
" 'formations': 28596,\n",
" 'hampden': 619,\n",
" 'rawail': 612,\n",
" 'forbid': 613,\n",
" 'holiness': 617,\n",
" 'unessential': 618,\n",
" 'reputedly': 616,\n",
" 'wage': 63181,\n",
" 'kewpie': 24860,\n",
" 'asylum': 620,\n",
" 'bolye': 621,\n",
" 'celticism': 63189,\n",
" 'strangers': 622,\n",
" 'rantzen': 623,\n",
" 'farrellys': 624,\n",
" 'marathon': 93,\n",
" 'cantinflas': 626,\n",
" 'disproportionately': 12617,\n",
" 'bared': 67212,\n",
" 'enshrined': 627,\n",
" 'expetations': 629,\n",
" 'replaying': 630,\n",
" 'topless': 636,\n",
" 'bukater': 632,\n",
" 'overpaid': 633,\n",
" 'exhude': 634,\n",
" 'nitwits': 638,\n",
" 'tsst': 51554,\n",
" 'sufferings': 637,\n",
" 'ci': 24693,\n",
" 'eponymously': 96,\n",
" 'ferdy': 644,\n",
" 'danira': 641,\n",
" 'unrelenting': 642,\n",
" 'disabling': 643,\n",
" 'gerard': 645,\n",
" 'drewitt': 646,\n",
" 'lamping': 650,\n",
" 'demy': 652,\n",
" 'wicklow': 37214,\n",
" 'relinquish': 651,\n",
" 'feminized': 64196,\n",
" 'drink': 653,\n",
" 'chamberlin': 654,\n",
" 'floodwaters': 657,\n",
" 'searing': 658,\n",
" 'isral': 659,\n",
" 'ling': 660,\n",
" 'grossness': 661,\n",
" 'sassier': 24865,\n",
" 'pickier': 662,\n",
" 'pax': 663,\n",
" 'fleashens': 98,\n",
" 'wierd': 664,\n",
" 'tereasa': 665,\n",
" 'smog': 666,\n",
" 'girotti': 667,\n",
" 'zooey': 64814,\n",
" 'spat': 668,\n",
" 'sera': 669,\n",
" 'misbehaving': 671,\n",
" 'scouts': 672,\n",
" 'refreshments': 673,\n",
" 'itll': 39668,\n",
" 'toyomichi': 676,\n",
" 'politeness': 100,\n",
" 'bits': 677,\n",
" 'psychotics': 678,\n",
" 'optimistic': 61796,\n",
" 'barzell': 679,\n",
" 'colt': 680,\n",
" 'anita': 49501,\n",
" 'shivering': 681,\n",
" 'utah': 59297,\n",
" 'scrivener': 686,\n",
" 'predicable': 687,\n",
" 'dryer': 684,\n",
" 'reissues': 685,\n",
" 'sexier': 26115,\n",
" 'spellbind': 691,\n",
" 'marmalade': 689,\n",
" 'seems': 690,\n",
" 'wyke': 37223,\n",
" 'innovator': 693,\n",
" 'inthused': 695,\n",
" 'scatman': 6309,\n",
" 'contestants': 696,\n",
" 'bertolucci': 106,\n",
" 'serviced': 699,\n",
" 'nozires': 700,\n",
" 'ins': 701,\n",
" 'mutilating': 702,\n",
" 'dupes': 703,\n",
" 'launius': 704,\n",
" 'widescreen': 705,\n",
" 'joo': 706,\n",
" 'discretionary': 707,\n",
" 'enlivens': 708,\n",
" 'manos': 55596,\n",
" 'bushes': 709,\n",
" 'header': 711,\n",
" 'activist': 712,\n",
" 'gethsemane': 713,\n",
" 'phoenixs': 714,\n",
" 'wreathed': 715,\n",
" 'oldboy': 108,\n",
" 'electrifyingly': 717,\n",
" 'inseparability': 24874,\n",
" 'ghidora': 719,\n",
" 'binder': 720,\n",
" 'tibet': 51530,\n",
" 'doddsville': 723,\n",
" 'sugar': 722,\n",
" 'porkys': 724,\n",
" 'hopefully': 37226,\n",
" 'scattershot': 725,\n",
" 'refunded': 726,\n",
" 'rudely': 727,\n",
" 'enacts': 67435,\n",
" 'insteadit': 728,\n",
" 'nightwatch': 61803,\n",
" 'eurotrash': 730,\n",
" 'radioraptus': 731,\n",
" 'unreservedly': 73710,\n",
" 'vall': 49508,\n",
" 'boogeman': 733,\n",
" 'flunked': 24880,\n",
" 'weighs': 734,\n",
" 'glorfindel': 738,\n",
" 'hypothermia': 737,\n",
" 'misled': 64919,\n",
" 'toiletries': 71501,\n",
" 'birthdays': 739,\n",
" 'attentive': 740,\n",
" 'mallepa': 741,\n",
" 'manoy': 743,\n",
" 'bombshells': 744,\n",
" 'glorifying': 115,\n",
" 'southron': 747,\n",
" 'destruction': 748,\n",
" 'manhole': 750,\n",
" 'elainor': 751,\n",
" 'bounder': 13003,\n",
" 'bowersock': 752,\n",
" 'lowly': 753,\n",
" 'wfst': 754,\n",
" 'limousines': 755,\n",
" 'skolimowski': 756,\n",
" 'saban': 757,\n",
" 'malaysia': 759,\n",
" 'cyd': 761,\n",
" 'bonecrushing': 763,\n",
" 'merest': 765,\n",
" 'janina': 766,\n",
" 'chemotrodes': 767,\n",
" 'trials': 768,\n",
" 'whilhelm': 770,\n",
" 'asthmatic': 771,\n",
" 'missteps': 773,\n",
" 'melyvn': 24885,\n",
" 'embittered': 774,\n",
" 'profit': 37234,\n",
" 'seeming': 776,\n",
" 'miscalculate': 777,\n",
" 'recommeded': 778,\n",
" 'mankin': 37235,\n",
" 'schoolwork': 779,\n",
" 'coy': 780,\n",
" 'mcconaughey': 781,\n",
" 'waver': 783,\n",
" 'unwatchably': 786,\n",
" 'saggy': 787,\n",
" 'breakup': 790,\n",
" 'pufnstuf': 37237,\n",
" 'superstars': 792,\n",
" 'replay': 793,\n",
" 'aggravates': 794,\n",
" 'urging': 796,\n",
" 'snidely': 797,\n",
" 'aleksandar': 798,\n",
" 'hildy': 799,\n",
" 'kazuhiro': 800,\n",
" 'slayer': 801,\n",
" 'tangy': 802,\n",
" 'horne': 804,\n",
" 'masayuki': 805,\n",
" 'molden': 806,\n",
" 'unravel': 807,\n",
" 'goodtime': 808,\n",
" 'rowboat': 811,\n",
" 'dekhiye': 815,\n",
" 'datedness': 813,\n",
" 'astrotheology': 814,\n",
" 'suriani': 59610,\n",
" 'hostilities': 819,\n",
" 'wipes': 818,\n",
" 'sentimentalising': 820,\n",
" 'documentary': 821,\n",
" 'virtue': 823,\n",
" 'unreasonably': 824,\n",
" 'cei': 826,\n",
" 'hobbled': 37240,\n",
" 'unglamorised': 827,\n",
" 'balky': 828,\n",
" 'complementary': 829,\n",
" 'paychecks': 830,\n",
" 'tughlaq': 45551,\n",
" 'functionality': 836,\n",
" 'ily': 833,\n",
" 'prc': 834,\n",
" 'ennobling': 835,\n",
" 'dissociated': 837,\n",
" 'elk': 838,\n",
" 'throbbing': 839,\n",
" 'tempe': 840,\n",
" 'linoleum': 841,\n",
" 'bottacin': 843,\n",
" 'hipper': 844,\n",
" 'barging': 846,\n",
" 'untie': 847,\n",
" 'sacchetti': 848,\n",
" 'gnat': 849,\n",
" 'roedel': 850,\n",
" 'performs': 852,\n",
" 'nanavati': 856,\n",
" 'migrs': 854,\n",
" 'teachs': 855,\n",
" 'gunslinger': 126,\n",
" 'fresco': 857,\n",
" 'davison': 858,\n",
" 'jet': 59446,\n",
" 'burglar': 860,\n",
" 'jerker': 69267,\n",
" 'masue': 861,\n",
" 'dickory': 862,\n",
" 'muggy': 46634,\n",
" 'grills': 863,\n",
" 'figment': 28693,\n",
" 'monogamistic': 49527,\n",
" 'appelagate': 864,\n",
" 'linkage': 865,\n",
" 'loesser': 867,\n",
" 'patties': 868,\n",
" 'prudent': 869,\n",
" 'mallorquins': 870,\n",
" 'nativetex': 871,\n",
" 'suprise': 872,\n",
" 'quill': 874,\n",
" 'angsty': 71451,\n",
" 'speeded': 875,\n",
" 'farscape': 876,\n",
" 'herman': 129,\n",
" 'saddening': 877,\n",
" 'centuries': 878,\n",
" 'mos': 879,\n",
" 'neccessarily': 881,\n",
" 'tankers': 883,\n",
" 'latte': 884,\n",
" 'faracy': 886,\n",
" 'stilts': 24897,\n",
" 'synthetically': 887,\n",
" 'thoughtless': 888,\n",
" 'authoring': 62813,\n",
" 'rake': 889,\n",
" 'ropes': 890,\n",
" 'whitewashed': 892,\n",
" 'donal': 893,\n",
" 'arching': 4910,\n",
" 'cockamamie': 899,\n",
" 'lifeless': 895,\n",
" 'perfidy': 896,\n",
" 'teresa': 897,\n",
" 'bulldog': 898,\n",
" 'vingh': 73726,\n",
" 'evacuees': 65858,\n",
" 'rasberries': 900,\n",
" 'chiseling': 903,\n",
" 'clampets': 905,\n",
" 'grecianized': 138,\n",
" 'smaller': 904,\n",
" 'kluznick': 62184,\n",
" 'alerts': 906,\n",
" 'aaaahhhhhhh': 909,\n",
" 'wellingtonian': 908,\n",
" 'dither': 910,\n",
" 'incertitude': 911,\n",
" 'florentine': 912,\n",
" 'imperioli': 913,\n",
" 'licking': 914,\n",
" 'disparagement': 915,\n",
" 'artfully': 916,\n",
" 'feds': 917,\n",
" 'fumiya': 918,\n",
" 'jbl': 52774,\n",
" 'tearfully': 919,\n",
" 'welfare': 24905,\n",
" 'idyllically': 49534,\n",
" 'isha': 43702,\n",
" 'lanchester': 920,\n",
" 'undertaken': 921,\n",
" 'longlost': 922,\n",
" 'netted': 923,\n",
" 'carrell': 924,\n",
" 'uncompelling': 925,\n",
" 'stems': 37258,\n",
" 'reliefs': 926,\n",
" 'leona': 927,\n",
" 'autorenfilm': 928,\n",
" 'unfriendly': 929,\n",
" 'typewriter': 930,\n",
" 'shifted': 931,\n",
" 'bertrand': 932,\n",
" 'blesses': 933,\n",
" 'leukemia': 12666,\n",
" 'posative': 142,\n",
" 'tricking': 934,\n",
" 'zanes': 936,\n",
" 'dashboard': 12667,\n",
" 'unknowingly': 937,\n",
" 'flatmates': 51897,\n",
" 'unnerve': 938,\n",
" 'caning': 939,\n",
" 'shortland': 146,\n",
" 'recluse': 941,\n",
" 'dcreasy': 942,\n",
" 'scratchiness': 24911,\n",
" 'pms': 30930,\n",
" 'chipmunk': 943,\n",
" 'tkachenko': 49537,\n",
" 'dipper': 944,\n",
" 'europeans': 61601,\n",
" 'berserkers': 948,\n",
" 'shys': 947,\n",
" 'monte': 68505,\n",
" 'eve': 949,\n",
" 'luxury': 61828,\n",
" 'conflagration': 950,\n",
" 'water': 46389,\n",
" 'irks': 951,\n",
" 'positronic': 954,\n",
" 'cushy': 150,\n",
" 'swiftness': 957,\n",
" 'underimpressed': 964,\n",
" 'imprint': 959,\n",
" 'sundance': 961,\n",
" 'aida': 31951,\n",
" 'thematically': 963,\n",
" 'uno': 965,\n",
" 'expressly': 966,\n",
" 'russkies': 967,\n",
" 'discos': 968,\n",
" 'shaping': 969,\n",
" 'verson': 970,\n",
" 'blushed': 61831,\n",
" 'prototype': 971,\n",
" 'lifewell': 976,\n",
" 'trafficker': 973,\n",
" 'crucifixions': 62188,\n",
" 'unrealistically': 975,\n",
" 'rivas': 977,\n",
" 'consequent': 978,\n",
" 'katsu': 979,\n",
" 'titantic': 980,\n",
" 'jalees': 981,\n",
" 'ranee': 982,\n",
" 'gambles': 984,\n",
" 'dispenses': 985,\n",
" 'disfigurement': 986,\n",
" 'bright': 987,\n",
" 'cristian': 988,\n",
" 'subculture': 37268,\n",
" 'capta': 991,\n",
" 'jewel': 992,\n",
" 'erect': 993,\n",
" 'avoide': 996,\n",
" 'inconnu': 997,\n",
" 'headquarters': 998,\n",
" 'babbling': 1000,\n",
" 'pac': 1001,\n",
" 'performace': 1003,\n",
" 'dorrit': 1004,\n",
" 'runners': 1005,\n",
" 'sentimentality': 1006,\n",
" 'marred': 1007,\n",
" 'commemorative': 1008,\n",
" 'helpers': 1012,\n",
" 'chiles': 1011,\n",
" 'snowy': 1013,\n",
" 'cheddar': 1014,\n",
" 'neath': 158,\n",
" 'outshine': 1016,\n",
" 'nadu': 1019,\n",
" 'wellbeing': 1020,\n",
" 'envisioned': 43779,\n",
" 'fanaticism': 1021,\n",
" 'morrisette': 12687,\n",
" 'sesame': 1024,\n",
" 'gran': 1023,\n",
" 'marlina': 1025,\n",
" 'artificiality': 1030,\n",
" 'coinsidence': 1027,\n",
" 'founders': 1028,\n",
" 'dismissably': 1029,\n",
" 'dracht': 66299,\n",
" 'scavengers': 1031,\n",
" 'neese': 12685,\n",
" 'pangborn': 1034,\n",
" 'elmore': 1039,\n",
" 'bristol': 71162,\n",
" 'lillies': 1035,\n",
" 'parkers': 1036,\n",
" 'skipped': 1038,\n",
" 'clipboard': 1042,\n",
" 'jucier': 1041,\n",
" 'haifa': 1043,\n",
" ...}"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# map each vocabulary word to an index (its column in layer_0)\n",
"word2index = {}\n",
"\n",
"for i, word in enumerate(vocab):\n",
"    word2index[word] = i\n",
"word2index"
]
},
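{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above can also be written as a dictionary comprehension. The sketch below is only an illustration on a made-up three-word vocabulary (`toy_vocab` and `toy_word2index` are hypothetical names, not part of this notebook). Because `vocab` is a set, its iteration order is not guaranteed, so the indices shown in the output above can differ between runs; sorting the words first, as assumed here, makes the mapping reproducible.\n",
"\n",
"```python\n",
"# Hypothetical toy vocabulary standing in for the notebook's `vocab` set\n",
"toy_vocab = {'great', 'movie', 'terrible'}\n",
"\n",
"# Sort first so every run assigns the same index to the same word\n",
"toy_word2index = {word: i for i, word in enumerate(sorted(toy_vocab))}\n",
"\n",
"print(toy_word2index)\n",
"```\n",
"\n",
"Either ordering works for training, as long as the same mapping is used to build every input vector."
]
},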
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def update_input_layer(review):\n",
"    \n",
"    global layer_0\n",
"    \n",
"    # clear out previous state, reset the layer to be all 0s\n",
"    layer_0 *= 0\n",
"    for word in review.split(\" \"):\n",
"        # count each occurrence of the word in its vocabulary slot\n",
"        layer_0[0][word2index[word]] += 1\n",
"\n",
"update_input_layer(reviews[0])"
]
},
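{
"cell_type": "markdown",
"metadata": {},
"source": [
"`update_input_layer` relies on a global array and raises a `KeyError` for any word that is missing from `word2index`. As a rough sketch of an alternative (the function `count_words` and the toy data are hypothetical, and a plain Python list stands in for the NumPy array used in this notebook), the same counting can be done without global state while skipping unknown words:\n",
"\n",
"```python\n",
"def count_words(review, word2index):\n",
"    # Start from a fresh all-zero vector, one slot per vocabulary word\n",
"    counts = [0] * len(word2index)\n",
"    for word in review.split(' '):\n",
"        index = word2index.get(word)  # None for out-of-vocabulary words\n",
"        if index is not None:\n",
"            counts[index] += 1\n",
"    return counts\n",
"\n",
"# Hypothetical toy mapping and review\n",
"toy_word2index = {'great': 0, 'movie': 1, 'terrible': 2}\n",
"toy_counts = count_words('great great movie unseen', toy_word2index)\n",
"print(toy_counts)\n",
"```\n",
"\n",
"Returning a new vector costs an allocation per review, which is the trade-off against the in-place reset used above."
]
},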
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 18., 0., 0., ..., 0., 0., 0.]])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"layer_0"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_target_for_label(label):\n",
"    # map the text label to the numeric target the network predicts\n",
"    if label == 'POSITIVE':\n",
"        return 1\n",
"    else:\n",
"        return 0"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'POSITIVE'"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"labels[0]"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_target_for_label(labels[0])"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'NEGATIVE'"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"labels[1]"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_target_for_label(labels[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Project 3: Building a Neural Network"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"- Start with your neural network from the last chapter\n",
"- Use a 3-layer neural network (input, hidden, and output layers)\n",
"- Use no non-linearity in the hidden layer\n",
"- Use our functions to create the training data\n",
"- Create a \"pre_process_data\" function that builds the vocabulary for our training-data-generating functions\n",
"- Modify \"train\" to train over the entire corpus"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Where to Get Help if You Need it\n",
"- Re-watch previous week's Udacity Lectures\n",
"- Chapters 3-5 - [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) - (40% Off: **traskud17**)"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import time\n",
"import sys\n",
"import numpy as np\n",
"\n",
"# Let's tweak our network from before to model these phenomena\n",
"class SentimentNetwork:\n",
"    def __init__(self, reviews, labels, hidden_nodes=10, learning_rate=0.1):\n",
"\n",
"        # seed the random number generator so runs are reproducible\n",
"        np.random.seed(1)\n",
"\n",
"        self.pre_process_data(reviews, labels)\n",
"\n",
"        self.init_network(len(self.review_vocab), hidden_nodes, 1, learning_rate)\n",
"\n",
"    def pre_process_data(self, reviews, labels):\n",
"\n",
"        # collect every distinct word that appears in the reviews\n",
"        review_vocab = set()\n",
"        for review in reviews:\n",
"            for word in review.split(\" \"):\n",
"                review_vocab.add(word)\n",
"        self.review_vocab = list(review_vocab)\n",
"\n",
"        # collect every distinct label\n",
"        label_vocab = set()\n",
"        for label in labels:\n",
"            label_vocab.add(label)\n",
"\n",
"        self.label_vocab = list(label_vocab)\n",
"\n",
"        self.review_vocab_size = len(self.review_vocab)\n",
"        self.label_vocab_size = len(self.label_vocab)\n",
"\n",
"        # map each word to its position in the input layer\n",
"        self.word2index = {}\n",
"        for i, word in enumerate(self.review_vocab):\n",
"            self.word2index[word] = i\n",
"\n",
"        # map each label to an index\n",
"        self.label2index = {}\n",
"        for i, label in enumerate(self.label_vocab):\n",
"            self.label2index[label] = i\n",
"\n",
"    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):\n",
"        # Set number of nodes in input, hidden and output layers.\n",
"        self.input_nodes = input_nodes\n",
"        self.hidden_nodes = hidden_nodes\n",
"        self.output_nodes = output_nodes\n",
"\n",
"        # Initialize weights\n",
"        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))\n",
"\n",
"        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,\n",
"                                            (self.hidden_nodes, self.output_nodes))\n",
"\n",
"        self.learning_rate = learning_rate\n",
"\n",
"        self.layer_0 = np.zeros((1,input_nodes))\n",
"\n",
"    def update_input_layer(self, review):\n",
"\n",
"        # clear out previous state, reset the layer to be all 0s\n",
"        self.layer_0 *= 0\n",
"        for word in review.split(\" \"):\n",
"            # skip words that were not seen when the vocabulary was built\n",
"            if(word in self.word2index):\n",
"                self.layer_0[0][self.word2index[word]] += 1\n",
"\n",
"    def get_target_for_label(self, label):\n",
"        if(label == 'POSITIVE'):\n",
"            return 1\n",
"        else:\n",
"            return 0\n",
"\n",
"    def sigmoid(self, x):\n",
"        return 1 / (1 + np.exp(-x))\n",
"\n",
"    def sigmoid_output_2_derivative(self, output):\n",
"        return output * (1 - output)\n",
"\n",
"    def train(self, training_reviews, training_labels):\n",
"\n",
"        assert(len(training_reviews) == len(training_labels))\n",
"\n",
"        correct_so_far = 0\n",
"\n",
"        start = time.time()\n",
"\n",
"        for i in range(len(training_reviews)):\n",
"\n",
"            review = training_reviews[i]\n",
"            label = training_labels[i]\n",
"\n",
"            ### Forward pass ###\n",
"\n",
"            # Input layer\n",
"            self.update_input_layer(review)\n",
"\n",
"            # Hidden layer (no non-linearity)\n",
"            layer_1 = self.layer_0.dot(self.weights_0_1)\n",
"\n",
"            # Output layer\n",
"            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))\n",
"\n",
"            ### Backward pass ###\n",
"\n",
"            # Output error: the difference between the actual output and the desired target\n",
"            layer_2_error = layer_2 - self.get_target_for_label(label)\n",
"            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)\n",
"\n",
"            # Backpropagated error\n",
"            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer\n",
"            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error\n",
"\n",
"            # Update the weights\n",
"            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step\n",
"            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step\n",
"\n",
"            # count the prediction as correct if it lands on the right side of 0.5\n",
"            if(np.abs(layer_2_error) < 0.5):\n",
"                correct_so_far += 1\n",
"\n",
"            # guard against a zero elapsed time on the first iteration\n",
"            elapsed = time.time() - start\n",
"            reviews_per_second = i / elapsed if elapsed > 0 else 0.0\n",
"\n",
"            sys.stdout.write(\"\\rProgress:\" + str(100 * i/float(len(training_reviews)))[:4] + \"% Speed(reviews/sec):\" + str(reviews_per_second)[0:5] + \" #Correct:\" + str(correct_so_far) + \" #Trained:\" + str(i+1) + \" Training Accuracy:\" + str(correct_so_far * 100 / float(i+1))[:4] + \"%\")\n",
"            if(i % 2500 == 0):\n",
"                print(\"\")\n",
"\n",
"    def test(self, testing_reviews, testing_labels):\n",
"\n",
"        correct = 0\n",
"\n",
"        start = time.time()\n",
"\n",
"        for i in range(len(testing_reviews)):\n",
"            pred = self.run(testing_reviews[i])\n",
"            if(pred == testing_labels[i]):\n",
"                correct += 1\n",
"\n",
"            # guard against a zero elapsed time on the first iteration\n",
"            elapsed = time.time() - start\n",
"            reviews_per_second = i / elapsed if elapsed > 0 else 0.0\n",
"\n",
"            sys.stdout.write(\"\\rProgress:\" + str(100 * i/float(len(testing_reviews)))[:4] \\\n",
"                             + \"% Speed(reviews/sec):\" + str(reviews_per_second)[0:5] \\\n",
"                             + \" #Correct:\" + str(correct) + \" #Tested:\" + str(i+1) + \" Testing Accuracy:\" + str(correct * 100 / float(i+1))[:4] + \"%\")\n",
"\n",
"    def run(self, review):\n",
"\n",
"        # Input layer\n",
"        self.update_input_layer(review.lower())\n",
"\n",
"        # Hidden layer (no non-linearity)\n",
"        layer_1 = self.layer_0.dot(self.weights_0_1)\n",
"\n",
"        # Output layer\n",
"        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))\n",
"\n",
"        # layer_2 has shape (1, 1); compare its single value against the 0.5 threshold\n",
"        if(layer_2[0][0] > 0.5):\n",
"            return \"POSITIVE\"\n",
"        else:\n",
"            return \"NEGATIVE\""
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Progress:99.9% Speed(reviews/sec):587.5% #Correct:500 #Tested:1000 Testing Accuracy:50.0%"
]
}
],
"source": [
"# evaluate our model before training (just to show how horrible it is)\n",
"mlp.test(reviews[-1000:],labels[-1000:])"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%\n",
"Progress:10.4% Speed(reviews/sec):89.58 #Correct:1250 #Trained:2501 Training Accuracy:49.9%\n",
"Progress:20.8% Speed(reviews/sec):95.03 #Correct:2500 #Trained:5001 Training Accuracy:49.9%\n",
"Progress:27.4% Speed(reviews/sec):95.46 #Correct:3295 #Trained:6592 Training Accuracy:49.9%"
]
},
{
"ename": "KeyboardInterrupt",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-62-d0f5d85ad402>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# train the network\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mmlp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mreviews\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1000\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mlabels\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1000\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-59-6334c4ec4642>\u001b[0m in \u001b[0;36mtrain\u001b[0;34m(self, training_reviews, training_labels)\u001b[0m\n\u001b[1;32m 117\u001b[0m \u001b[0;31m# TODO: Update the weights\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 118\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mweights_1_2\u001b[0m \u001b[0;34m-=\u001b[0m \u001b[0mlayer_1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlayer_2_delta\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlearning_rate\u001b[0m \u001b[0;31m# update hidden-to-output weights with gradient descent step\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 119\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mweights_0_1\u001b[0m \u001b[0;34m-=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlayer_0\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlayer_1_delta\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlearning_rate\u001b[0m \u001b[0;31m# update input-to-hidden weights with gradient descent step\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 120\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 121\u001b[0m \u001b[0;32mif\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mabs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlayer_2_error\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m0.5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mKeyboardInterrupt\u001b[0m: "
]
}
],
"source": [
"# train the network\n",
"mlp.train(reviews[:-1000],labels[:-1000])"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.01)"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%\n",
"Progress:10.4% Speed(reviews/sec):96.39 #Correct:1247 #Trained:2501 Training Accuracy:49.8%\n",
"Progress:20.8% Speed(reviews/sec):99.31 #Correct:2497 #Trained:5001 Training Accuracy:49.9%\n",
"Progress:22.8% Speed(reviews/sec):99.02 #Correct:2735 #Trained:5476 Training Accuracy:49.9%"
]
},
{
"ename": "KeyboardInterrupt",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-64-d0f5d85ad402>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# train the network\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mmlp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mreviews\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1000\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mlabels\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1000\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-59-6334c4ec4642>\u001b[0m in \u001b[0;36mtrain\u001b[0;34m(self, training_reviews, training_labels)\u001b[0m\n\u001b[1;32m 117\u001b[0m \u001b[0;31m# TODO: Update the weights\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 118\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mweights_1_2\u001b[0m \u001b[0;34m-=\u001b[0m \u001b[0mlayer_1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlayer_2_delta\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlearning_rate\u001b[0m \u001b[0;31m# update hidden-to-output weights with gradient descent step\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 119\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mweights_0_1\u001b[0m \u001b[0;34m-=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlayer_0\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlayer_1_delta\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlearning_rate\u001b[0m \u001b[0;31m# update input-to-hidden weights with gradient descent step\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 120\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 121\u001b[0m \u001b[0;32mif\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mabs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlayer_2_error\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m0.5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mKeyboardInterrupt\u001b[0m: "
]
}
],
"source": [
"# train the network\n",
"mlp.train(reviews[:-1000],labels[:-1000])"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.001)"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%\n",
"Progress:10.4% Speed(reviews/sec):98.77 #Correct:1267 #Trained:2501 Training Accuracy:50.6%\n",
"Progress:20.8% Speed(reviews/sec):98.79 #Correct:2640 #Trained:5001 Training Accuracy:52.7%\n",
"Progress:31.2% Speed(reviews/sec):98.58 #Correct:4109 #Trained:7501 Training Accuracy:54.7%\n",
"Progress:41.6% Speed(reviews/sec):93.78 #Correct:5638 #Trained:10001 Training Accuracy:56.3%\n",
"Progress:52.0% Speed(reviews/sec):91.76 #Correct:7246 #Trained:12501 Training Accuracy:57.9%\n",
"Progress:62.5% Speed(reviews/sec):92.42 #Correct:8841 #Trained:15001 Training Accuracy:58.9%\n",
"Progress:69.4% Speed(reviews/sec):92.58 #Correct:9934 #Trained:16668 Training Accuracy:59.5%"
]
},
{
"ename": "KeyboardInterrupt",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-66-d0f5d85ad402>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# train the network\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mmlp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mreviews\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1000\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mlabels\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1000\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-59-6334c4ec4642>\u001b[0m in \u001b[0;36mtrain\u001b[0;34m(self, training_reviews, training_labels)\u001b[0m\n\u001b[1;32m 117\u001b[0m \u001b[0;31m# TODO: Update the weights\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 118\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mweights_1_2\u001b[0m \u001b[0;34m-=\u001b[0m \u001b[0mlayer_1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlayer_2_delta\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlearning_rate\u001b[0m \u001b[0;31m# update hidden-to-output weights with gradient descent step\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 119\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mweights_0_1\u001b[0m \u001b[0;34m-=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlayer_0\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlayer_1_delta\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlearning_rate\u001b[0m \u001b[0;31m# update input-to-hidden weights with gradient descent step\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 120\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 121\u001b[0m \u001b[0;32mif\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mabs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlayer_2_error\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m0.5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mKeyboardInterrupt\u001b[0m: "
]
}
],
"source": [
"# train the network\n",
"mlp.train(reviews[:-1000],labels[:-1000])"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}