{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sentiment Classification & How To \"Frame Problems\" for a Neural Network\n",
    "\n",
    "by Andrew Trask\n",
    "\n",
    "- **Twitter**: @iamtrask\n",
    "- **Blog**: http://iamtrask.github.io"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What You Should Already Know\n",
    "\n",
    "- neural networks, forward and back-propagation\n",
    "- stochastic gradient descent\n",
    "- mean squared error\n",
    "- train/test splits\n",
    "\n",
    "### Where to Get Help If You Need It\n",
    "- Re-watch previous Udacity lectures\n",
    "- Use the recommended course reading material: [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (40% off: **traskud17**)\n",
    "- Shoot me a tweet @iamtrask\n",
    "\n",
    "\n",
    "### Tutorial Outline\n",
    "\n",
    "- Intro: The Importance of \"Framing a Problem\"\n",
    "\n",
    "\n",
    "- Curate a Dataset\n",
    "- Developing a \"Predictive Theory\"\n",
    "- **PROJECT 1**: Quick Theory Validation\n",
    "\n",
    "\n",
    "- Transforming Text to Numbers\n",
    "- **PROJECT 2**: Creating the Input/Output Data\n",
    "\n",
    "\n",
    "- Putting It All Together in a Neural Network\n",
    "- **PROJECT 3**: Building Our Neural Network\n",
    "\n",
    "\n",
    "- Understanding Neural Noise\n",
    "- **PROJECT 4**: Making Learning Faster by Reducing Noise\n",
    "\n",
    "\n",
    "- Analyzing Inefficiencies in Our Network\n",
    "- **PROJECT 5**: Making Our Network Train and Run Faster\n",
    "\n",
    "\n",
    "- Further Noise Reduction\n",
    "- **PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary\n",
    "\n",
    "\n",
    "- Analysis: What's Going On in the Weights?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "56bb3cba-260c-4ebe-9ed6-b995b4c72aa3"
    }
   },
   "source": [
    "# Lesson: Curate a Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false,
    "nbpresent": {
     "id": "eba2b193-0419-431e-8db9-60f34dd3fe83"
    }
   },
   "outputs": [],
   "source": [
    "def pretty_print_review_and_label(i):\n",
    "    # Print the label and the first 80 characters of review i\n",
    "    print(labels[i] + \"\\t:\\t\" + reviews[i][:80] + \"...\")\n",
    "\n",
    "# What we know: one review per line\n",
    "with open('reviews.txt', 'r') as g:\n",
    "    reviews = [line.rstrip('\\n') for line in g]\n",
    "\n",
    "# What we WANT to know: one label per line, upper-cased\n",
    "with open('labels.txt', 'r') as g:\n",
    "    labels = [line.rstrip('\\n').upper() for line in g]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "25000"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(reviews)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false,
    "nbpresent": {
     "id": "bb95574b-21a0-4213-ae50-34363cf4f87f"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life such as teachers . my years in the teaching profession lead me to believe that bromwell high s satire is much closer to reality than is teachers . the scramble to survive financially the insightful students who can see right through their pathetic teachers pomp the pettiness of the whole situation all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn t '"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false,
    "nbpresent": {
     "id": "e0408810-c424-4ed4-afb9-1735e9ddbd0a"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'POSITIVE'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "labels[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lesson: Develop a Predictive Theory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false,
    "nbpresent": {
     "id": "e67a709f-234f-4493-bae6-4fb192141ee0"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "labels.txt \t : \t reviews.txt\n",
      "\n",
      "NEGATIVE\t:\tthis movie is terrible but it has some good effects . ...\n",
      "POSITIVE\t:\tadrian pasdar is excellent is this film . he makes a fascinating woman . ...\n",
      "NEGATIVE\t:\tcomment this movie is impossible . is terrible very improbable bad interpretat...\n",
      "POSITIVE\t:\texcellent episode movie ala pulp fiction . days suicides . it doesnt get more...\n",
      "NEGATIVE\t:\tif you haven t seen this it s terrible . it is pure trash . i saw this about ...\n",
      "POSITIVE\t:\tthis schiffer guy is a real genius the movie is of excellent quality and both e...\n"
     ]
    }
   ],
   "source": [
    "print(\"labels.txt \\t : \\t reviews.txt\\n\")\n",
    "pretty_print_review_and_label(2137)\n",
    "pretty_print_review_and_label(12816)\n",
    "pretty_print_review_and_label(6267)\n",
    "pretty_print_review_and_label(21934)\n",
    "pretty_print_review_and_label(5297)\n",
    "pretty_print_review_and_label(4998)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}