{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sentiment Classification & How To \"Frame Problems\" for a Neural Network\n",
"\n",
"by Andrew Trask\n",
"\n",
"- **Twitter**: @iamtrask\n",
"- **Blog**: http://iamtrask.github.io"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What You Should Already Know\n",
"\n",
"- neural networks, forward and back-propagation\n",
"- stochastic gradient descent\n",
"- mean squared error\n",
"- train/test splits\n",
"\n",
"### Where to Get Help if You Need It\n",
"- Re-watch previous Udacity Lectures\n",
"- Leverage the recommended Course Reading Material - [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (40% Off: **traskud17**)\n",
"- Shoot me a tweet @iamtrask\n",
"\n",
"\n",
"### Tutorial Outline:\n",
"\n",
"- Intro: The Importance of \"Framing a Problem\"\n",
"\n",
"\n",
"- Curate a Dataset\n",
"- Developing a \"Predictive Theory\"\n",
"- **PROJECT 1**: Quick Theory Validation\n",
"\n",
"\n",
"- Transforming Text to Numbers\n",
"- **PROJECT 2**: Creating the Input/Output Data\n",
"\n",
"\n",
"- Putting it all together in a Neural Network\n",
"- **PROJECT 3**: Building our Neural Network\n",
"\n",
"\n",
"- Understanding Neural Noise\n",
"- **PROJECT 4**: Making Learning Faster by Reducing Noise\n",
"\n",
"\n",
"- Analyzing Inefficiencies in our Network\n",
"- **PROJECT 5**: Making our Network Train and Run Faster\n",
"\n",
"\n",
"- Further Noise Reduction\n",
"- **PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary\n",
"\n",
"\n",
"- Analysis: What's going on in the weights?"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "56bb3cba-260c-4ebe-9ed6-b995b4c72aa3"
}
},
"source": [
"# Lesson: Curate a Dataset"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "eba2b193-0419-431e-8db9-60f34dd3fe83"
}
},
"outputs": [],
"source": [
"def pretty_print_review_and_label(i):\n",
"    # Print the label and the first 80 characters of review i.\n",
"    print(labels[i] + \"\\t:\\t\" + reviews[i][:80] + \"...\")\n",
"\n",
"# What we know: one review per line.\n",
"with open('reviews.txt', 'r') as g:\n",
"    reviews = [line.rstrip('\\n') for line in g]\n",
"\n",
"# What we WANT to know: one label per line, upper-cased.\n",
"with open('labels.txt', 'r') as g:\n",
"    labels = [line.rstrip('\\n').upper() for line in g]"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"25000"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(reviews)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "bb95574b-21a0-4213-ae50-34363cf4f87f"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life such as teachers . my years in the teaching profession lead me to believe that bromwell high s satire is much closer to reality than is teachers . the scramble to survive financially the insightful students who can see right through their pathetic teachers pomp the pettiness of the whole situation all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn t '"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reviews[0]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "e0408810-c424-4ed4-afb9-1735e9ddbd0a"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'POSITIVE'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"labels[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lesson: Develop a Predictive Theory"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "e67a709f-234f-4493-bae6-4fb192141ee0"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"labels.txt \t : \t reviews.txt\n",
"\n",
"NEGATIVE\t:\tthis movie is terrible but it has some good effects . ...\n",
"POSITIVE\t:\tadrian pasdar is excellent is this film . he makes a fascinating woman . ...\n",
"NEGATIVE\t:\tcomment this movie is impossible . is terrible very improbable bad interpretat...\n",
"POSITIVE\t:\texcellent episode movie ala pulp fiction . days suicides . it doesnt get more...\n",
"NEGATIVE\t:\tif you haven t seen this it s terrible . it is pure trash . i saw this about ...\n",
"POSITIVE\t:\tthis schiffer guy is a real genius the movie is of excellent quality and both e...\n"
]
}
],
"source": [
"print(\"labels.txt \\t : \\t reviews.txt\\n\")\n",
"pretty_print_review_and_label(2137)\n",
"pretty_print_review_and_label(12816)\n",
"pretty_print_review_and_label(6267)\n",
"pretty_print_review_and_label(21934)\n",
"pretty_print_review_and_label(5297)\n",
"pretty_print_review_and_label(4998)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}