{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sentiment Classification & How To \"Frame Problems\" for a Neural Network\n", "\n", "by Andrew Trask\n", "\n", "- **Twitter**: @iamtrask\n", "- **Blog**: http://iamtrask.github.io" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What You Should Already Know\n", "\n", "- neural networks, forward and back-propagation\n", "- stochastic gradient descent\n", "- mean squared error\n", "- train/test splits\n", "\n", "### Where to Get Help If You Need It\n", "- Re-watch previous Udacity lectures\n", "- Leverage the recommended course reading material: [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (40% off with code **traskud17**)\n", "- Shoot me a tweet @iamtrask\n", "\n", "\n", "### Tutorial Outline:\n", "\n", "- Intro: The Importance of \"Framing a Problem\"\n", "\n", "\n", "- Curate a Dataset\n", "- Developing a \"Predictive Theory\"\n", "- **PROJECT 1**: Quick Theory Validation\n", "\n", "\n", "- Transforming Text to Numbers\n", "- **PROJECT 2**: Creating the Input/Output Data\n", "\n", "\n", "- Putting it all together in a Neural Network\n", "- **PROJECT 3**: Building our Neural Network\n", "\n", "\n", "- Understanding Neural Noise\n", "- **PROJECT 4**: Making Learning Faster by Reducing Noise\n", "\n", "\n", "- Analyzing Inefficiencies in our Network\n", "- **PROJECT 5**: Making our Network Train and Run Faster\n", "\n", "\n", "- Further Noise Reduction\n", "- **PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary\n", "\n", "\n", "- Analysis: What's going on in the weights?" 
] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "56bb3cba-260c-4ebe-9ed6-b995b4c72aa3" } }, "source": [ "# Lesson: Curate a Dataset" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "nbpresent": { "id": "eba2b193-0419-431e-8db9-60f34dd3fe83" } }, "outputs": [], "source": [ "def pretty_print_review_and_label(i):\n", "    # Show the label of review i next to the first 80 characters of its text\n", "    print(labels[i] + \"\\t:\\t\" + reviews[i][:80] + \"...\")\n", "\n", "# What we know! One review per line; strip only the trailing newline\n", "with open('reviews.txt', 'r') as g:\n", "    reviews = [line.rstrip('\\n') for line in g]\n", "\n", "# What we WANT to know! One label per line, upper-cased to POSITIVE or NEGATIVE\n", "with open('labels.txt', 'r') as g:\n", "    labels = [line.rstrip('\\n').upper() for line in g]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "25000" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(reviews)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "nbpresent": { "id": "bb95574b-21a0-4213-ae50-34363cf4f87f" } }, "outputs": [ { "data": { "text/plain": [ "'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life such as teachers . my years in the teaching profession lead me to believe that bromwell high s satire is much closer to reality than is teachers . the scramble to survive financially the insightful students who can see right through their pathetic teachers pomp the pettiness of the whole situation all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . 
what a pity that it isn t '" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews[0]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "nbpresent": { "id": "e0408810-c424-4ed4-afb9-1735e9ddbd0a" } }, "outputs": [ { "data": { "text/plain": [ "'POSITIVE'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "labels[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson: Develop a Predictive Theory" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "nbpresent": { "id": "e67a709f-234f-4493-bae6-4fb192141ee0" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "labels.txt \t : \t reviews.txt\n", "\n", "NEGATIVE\t:\tthis movie is terrible but it has some good effects . ...\n", "POSITIVE\t:\tadrian pasdar is excellent is this film . he makes a fascinating woman . ...\n", "NEGATIVE\t:\tcomment this movie is impossible . is terrible very improbable bad interpretat...\n", "POSITIVE\t:\texcellent episode movie ala pulp fiction . days suicides . it doesnt get more...\n", "NEGATIVE\t:\tif you haven t seen this it s terrible . it is pure trash . 
i saw this about ...\n", "POSITIVE\t:\tthis schiffer guy is a real genius the movie is of excellent quality and both e...\n" ] } ], "source": [ "print(\"labels.txt \\t : \\t reviews.txt\\n\")\n", "pretty_print_review_and_label(2137)\n", "pretty_print_review_and_label(12816)\n", "pretty_print_review_and_label(6267)\n", "pretty_print_review_and_label(21934)\n", "pretty_print_review_and_label(5297)\n", "pretty_print_review_and_label(4998)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }