Branch: master
Commit 42975de5e8 by spike, 7 years ago

.gitattributes (vendored): 8 lines added

@@ -0,0 +1,8 @@
**/data_batch_* filter=lfs diff=lfs merge=lfs -text
**/preprocess*.p filter=lfs diff=lfs merge=lfs -text
**/*.zip filter=lfs diff=lfs merge=lfs -text
**/*.pickle filter=lfs diff=lfs merge=lfs -text
**/*.gz filter=lfs diff=lfs merge=lfs -text
**/*.txt filter=lfs diff=lfs merge=lfs -text
**/*data*-of* filter=lfs diff=lfs merge=lfs -text
**/logs filter=lfs diff=lfs merge=lfs -text

@@ -0,0 +1,600 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Skip-gram word2vec\n",
"\n",
"In this notebook, I'll lead you through using TensorFlow to implement the word2vec algorithm using the skip-gram architecture. By implementing this, you'll learn about embedding words for use in natural language processing. This will come in handy when dealing with things like translations.\n",
"\n",
"## Readings\n",
"\n",
"Here are the resources I used to build this notebook. I suggest reading these either beforehand or while you're working on this material.\n",
"\n",
"* A really good [conceptual overview](http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/) of word2vec from Chris McCormick \n",
"* [First word2vec paper](https://arxiv.org/pdf/1301.3781.pdf) from Mikolov et al.\n",
"* [NIPS paper](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) with improvements for word2vec also from Mikolov et al.\n",
"* An [implementation of word2vec](http://www.thushv.com/natural_language_processing/word2vec-part-1-nlp-with-deep-learning-with-tensorflow-skip-gram/) from Thushan Ganegedara\n",
"* TensorFlow [word2vec tutorial](https://www.tensorflow.org/tutorials/word2vec)\n",
"\n",
"## Word embeddings\n",
"\n",
"When you're dealing with language and words, you end up with tens of thousands of classes to predict, one for each word. Trying to one-hot encode these words is massively inefficient, you'll have one element set to 1 and the other 50,000 set to 0. The word2vec algorithm finds much more efficient representations by finding vectors that represent the words. These vectors also contain semantic information about the words. Words that show up in similar contexts, such as \"black\", \"white\", and \"red\" will have vectors near each other. There are two architectures for implementing word2vec, CBOW (Continuous Bag-Of-Words) and Skip-gram.\n",
"\n",
"<img src=\"assets/word2vec_architectures.png\" width=\"500\">\n",
"\n",
"In this implementation, we'll be using the skip-gram architecture because it performs better than CBOW. Here, we pass in a word and try to predict the words surrounding it in the text. In this way, we can train the network to learn representations for words that show up in similar contexts.\n",
"\n",
"First up, importing packages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import time\n",
"\n",
"import numpy as np\n",
"import tensorflow as tf\n",
"\n",
"import utils"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the [text8 dataset](http://mattmahoney.net/dc/textdata.html), a file of cleaned up Wikipedia articles from Matt Mahoney. The next cell will download the data set to the `data` folder. Then you can extract it and delete the archive file to save storage space."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from urllib.request import urlretrieve\n",
"from os.path import isfile, isdir\n",
"from tqdm import tqdm\n",
"import zipfile\n",
"\n",
"dataset_folder_path = 'data'\n",
"dataset_filename = 'text8.zip'\n",
"dataset_name = 'Text8 Dataset'\n",
"\n",
"class DLProgress(tqdm):\n",
" last_block = 0\n",
"\n",
" def hook(self, block_num=1, block_size=1, total_size=None):\n",
" self.total = total_size\n",
" self.update((block_num - self.last_block) * block_size)\n",
" self.last_block = block_num\n",
"\n",
"if not isfile(dataset_filename):\n",
" with DLProgress(unit='B', unit_scale=True, miniters=1, desc=dataset_name) as pbar:\n",
" urlretrieve(\n",
" 'http://mattmahoney.net/dc/text8.zip',\n",
" dataset_filename,\n",
" pbar.hook)\n",
"\n",
"if not isdir(dataset_folder_path):\n",
" with zipfile.ZipFile(dataset_filename) as zip_ref:\n",
" zip_ref.extractall(dataset_folder_path)\n",
" \n",
"with open('data/text8') as f:\n",
" text = f.read()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preprocessing\n",
"\n",
"Here I'm fixing up the text to make training easier. This comes from the `utils` module I wrote. The `preprocess` function coverts any punctuation into tokens, so a period is changed to ` <PERIOD> `. In this data set, there aren't any periods, but it will help in other NLP problems. I'm also removing all words that show up five or fewer times in the dataset. This will greatly reduce issues due to noise in the data and improve the quality of the vector representations. If you want to write your own functions for this stuff, go for it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"words = utils.preprocess(text)\n",
"print(words[:30])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(\"Total words: {}\".format(len(words)))\n",
"print(\"Unique words: {}\".format(len(set(words))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here I'm creating dictionaries to covert words to integers and backwards, integers to words. The integers are assigned in descending frequency order, so the most frequent word (\"the\") is given the integer 0 and the next most frequent is 1 and so on. The words are converted to integers and stored in the list `int_words`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"vocab_to_int, int_to_vocab = utils.create_lookup_tables(words)\n",
"int_words = [vocab_to_int[word] for word in words]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Subsampling\n",
"\n",
"Words that show up often such as \"the\", \"of\", and \"for\" don't provide much context to the nearby words. If we discard some of them, we can remove some of the noise from our data and in return get faster training and better representations. This process is called subsampling by Mikolov. For each word $w_i$ in the training set, we'll discard it with probability given by \n",
"\n",
"$$ P(w_i) = 1 - \\sqrt{\\frac{t}{f(w_i)}} $$\n",
"\n",
"where $t$ is a threshold parameter and $f(w_i)$ is the frequency of word $w_i$ in the total dataset.\n",
"\n",
"I'm going to leave this up to you as an exercise. This is more of a programming challenge, than about deep learning specifically. But, being able to prepare your data for your network is an important skill to have. Check out my solution to see how I did it.\n",
"\n",
"> **Exercise:** Implement subsampling for the words in `int_words`. That is, go through `int_words` and discard each word given the probablility $P(w_i)$ shown above. Note that $P(w_i)$ is the probability that a word is discarded. Assign the subsampled data to `train_words`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## Your code here\n",
"train_words = # The final subsampled word list"
]
},
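{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the formula concrete: with a threshold of $t = 10^{-5}$, a word that makes up 1% of the corpus ($f(w_i) = 0.01$) is dropped with probability $1 - \\sqrt{10^{-5}/0.01} \\approx 0.97$, while genuinely rare words are essentially always kept. Below is one possible sketch of the subsampling step (an illustration, not necessarily how the solution notebook does it); it assumes the threshold $t = 10^{-5}$ and computes word frequencies from `int_words` itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## Example sketch (assumes a threshold of t = 1e-5)\n",
"from collections import Counter\n",
"import random\n",
"\n",
"threshold = 1e-5\n",
"word_counts = Counter(int_words)\n",
"total_count = len(int_words)\n",
"freqs = {word: count/total_count for word, count in word_counts.items()}\n",
"p_drop = {word: 1 - np.sqrt(threshold/freqs[word]) for word in word_counts}\n",
"# keep each word with probability 1 - P(w_i)\n",
"train_words = [word for word in int_words if random.random() < (1 - p_drop[word])]"
]
},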
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Making batches"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Now that our data is in good shape, we need to get it into the proper form to pass it into our network. With the skip-gram architecture, for each word in the text, we want to grab all the words in a window around that word, with size $C$. \n",
"\n",
"From [Mikolov et al.](https://arxiv.org/pdf/1301.3781.pdf): \n",
"\n",
"\"Since the more distant words are usually less related to the current word than those close to it, we give less weight to the distant words by sampling less from those words in our training examples... If we choose $C = 5$, for each training word we will select randomly a number $R$ in range $< 1; C >$, and then use $R$ words from history and $R$ words from the future of the current word as correct labels.\"\n",
"\n",
"> **Exercise:** Implement a function `get_target` that receives a list of words, an index, and a window size, then returns a list of words in the window around the index. Make sure to use the algorithm described above, where you choose a random number of words from the window."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"def get_target(words, idx, window_size=5):\n",
" ''' Get a list of words in a window around an index. '''\n",
" \n",
" # Your code here\n",
" \n",
" return"
]
},
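{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, here's one way the windowing could work, following the scheme quoted above: draw a random $R$ in `[1, window_size]`, then take the $R$ words before and the $R$ words after the index, clipping at the start of the list. This is a sketch to illustrate the idea; the set is only there to drop duplicate targets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Example sketch of the windowing logic\n",
"def get_target_example(words, idx, window_size=5):\n",
"    ''' Get a list of words in a random-size window around an index. '''\n",
"    R = np.random.randint(1, window_size+1)\n",
"    start = idx - R if (idx - R) > 0 else 0\n",
"    stop = idx + R\n",
"    # R words of history and R words of future, excluding the word at idx\n",
"    target_words = set(words[start:idx] + words[idx+1:stop+1])\n",
"    return list(target_words)"
]
},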
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a function that returns batches for our network. The idea is that it grabs `batch_size` words from a words list. Then for each of those words, it gets the target words in the window. I haven't found a way to pass in a random number of target words and get it to work with the architecture, so I make one row per input-target pair. This is a generator function by the way, helps save memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"def get_batches(words, batch_size, window_size=5):\n",
" ''' Create a generator of word batches as a tuple (inputs, targets) '''\n",
" \n",
" n_batches = len(words)//batch_size\n",
" \n",
" # only full batches\n",
" words = words[:n_batches*batch_size]\n",
" \n",
" for idx in range(0, len(words), batch_size):\n",
" x, y = [], []\n",
" batch = words[idx:idx+batch_size]\n",
" for ii in range(len(batch)):\n",
" batch_x = batch[ii]\n",
" batch_y = get_target(batch, ii, window_size)\n",
" y.extend(batch_y)\n",
" x.extend([batch_x]*len(batch_y))\n",
" yield x, y\n",
" "
]
},
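{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to sanity-check the batching (assuming you've implemented `get_target` and built `train_words` above) is to pull one small batch and look at the input/target pairs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Peek at one small batch: each input word is repeated once per target word\n",
"example_batches = get_batches(train_words, batch_size=4, window_size=2)\n",
"x_example, y_example = next(example_batches)\n",
"print(x_example)\n",
"print(y_example)"
]
},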
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Building the graph\n",
"\n",
"From Chris McCormick's blog, we can see the general structure of our network.\n",
"![embedding_network](./assets/skip_gram_net_arch.png)\n",
"\n",
"The input words are passed in as one-hot encoded vectors. This will go into a hidden layer of linear units, then into a softmax layer. We'll use the softmax layer to make a prediction like normal.\n",
"\n",
"The idea here is to train the hidden layer weight matrix to find efficient representations for our words. This weight matrix is usually called the embedding matrix or embedding look-up table. We can discard the softmax layer becuase we don't really care about making predictions with this network. We just want the embedding matrix so we can use it in other networks we build from the dataset.\n",
"\n",
"I'm going to have you build the graph in stages now. First off, creating the `inputs` and `labels` placeholders like normal.\n",
"\n",
"> **Exercise:** Assign `inputs` and `labels` using `tf.placeholder`. We're going to be passing in integers, so set the data types to `tf.int32`. The batches we're passing in will have varying sizes, so set the batch sizes to [`None`]. To make things work later, you'll need to set the second dimension of `labels` to `None` or `1`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"train_graph = tf.Graph()\n",
"with train_graph.as_default():\n",
" inputs = \n",
" labels = "
]
},
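{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way you could fill that in (the names and shapes here are just a reasonable choice, not the only one): `inputs` is a flat batch of word ids, and `labels` gets a second dimension so it lines up with what `tf.nn.sampled_softmax_loss` expects later."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Example sketch of the placeholders\n",
"with train_graph.as_default():\n",
"    inputs = tf.placeholder(tf.int32, [None], name='inputs')\n",
"    labels = tf.placeholder(tf.int32, [None, None], name='labels')"
]
},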
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Embedding\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"The embedding matrix has a size of the number of words by the number of neurons in the hidden layer. So, if you have 10,000 words and 300 hidden units, the matrix will have size $10,000 \\times 300$. Remember that we're using one-hot encoded vectors for our inputs. When you do the matrix multiplication of the one-hot vector with the embedding matrix, you end up selecting only one row out of the entire matrix:\n",
"\n",
"![one-hot matrix multiplication](assets/matrix_mult_w_one_hot.png)\n",
"\n",
"You don't actually need to do the matrix multiplication, you just need to select the row in the embedding matrix that corresponds to the input word. Then, the embedding matrix becomes a lookup table, you're looking up a vector the size of the hidden layer that represents the input word.\n",
"\n",
"<img src=\"assets/word2vec_weight_matrix_lookup_table.png\" width=500>\n",
"\n",
"\n",
"> **Exercise:** Tensorflow provides a convenient function [`tf.nn.embedding_lookup`](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup) that does this lookup for us. You pass in the embedding matrix and a tensor of integers, then it returns rows in the matrix corresponding to those integers. Below, set the number of embedding features you'll use (200 is a good start), create the embedding matrix variable, and use [`tf.nn.embedding_lookup`](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup) to get the embedding tensors. For the embedding matrix, I suggest you initialize it with a uniform random numbers between -1 and 1 using [tf.random_uniform](https://www.tensorflow.org/api_docs/python/tf/random_uniform). This [TensorFlow tutorial](https://www.tensorflow.org/tutorials/word2vec) will help if you get stuck."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"n_vocab = len(int_to_vocab)\n",
"n_embedding = # Number of embedding features \n",
"with train_graph.as_default():\n",
" embedding = # create embedding weight matrix here\n",
" embed = # use tf.nn.embedding_lookup to get the hidden layer output"
]
},
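{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of this step (200 embedding features and a uniform initialization in $[-1, 1]$ follow the hints above; they're assumptions, not required values):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Example sketch: the embedding matrix as a lookup table\n",
"n_embedding = 200  # number of embedding features (an assumption; try other sizes)\n",
"with train_graph.as_default():\n",
"    embedding = tf.Variable(tf.random_uniform((n_vocab, n_embedding), -1, 1))\n",
"    embed = tf.nn.embedding_lookup(embedding, inputs)  # one row per input word id"
]
},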
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Negative sampling\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For every example we give the network, we train it using the output from the softmax layer. That means for each input, we're making very small changes to millions of weights even though we only have one true example. This makes training the network very inefficient. We can approximate the loss from the softmax layer by only updating a small subset of all the weights at once. We'll update the weights for the correct label, but only a small number of incorrect labels. This is called [\"negative sampling\"](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf). Tensorflow has a convenient function to do this, [`tf.nn.sampled_softmax_loss`](https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss).\n",
"\n",
"> **Exercise:** Below, create weights and biases for the softmax layer. Then, use [`tf.nn.sampled_softmax_loss`](https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss) to calculate the loss. Be sure to read the documentation to figure out how it works."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"# Number of negative labels to sample\n",
"n_sampled = 100\n",
"with train_graph.as_default():\n",
" softmax_w = # create softmax weight matrix here\n",
" softmax_b = # create softmax biases here\n",
" \n",
" # Calculate the loss using negative sampling\n",
" loss = tf.nn.sampled_softmax_loss \n",
" \n",
" cost = tf.reduce_mean(loss)\n",
" optimizer = tf.train.AdamOptimizer().minimize(cost)"
]
},
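{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a sketch of how those pieces could fit together with `tf.nn.sampled_softmax_loss`. The truncated-normal initialization with a standard deviation of 0.1 is an assumption, not a requirement; the important part is that the weights have shape `(n_vocab, n_embedding)`, the biases have shape `(n_vocab,)`, and `labels` and `embed` are passed in as the true classes and the hidden-layer activations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Example sketch of the sampled softmax loss (assumes inputs, labels, embed from above)\n",
"with train_graph.as_default():\n",
"    softmax_w = tf.Variable(tf.truncated_normal((n_vocab, n_embedding), stddev=0.1))\n",
"    softmax_b = tf.Variable(tf.zeros(n_vocab))\n",
"    \n",
"    loss = tf.nn.sampled_softmax_loss(weights=softmax_w, biases=softmax_b,\n",
"                                      labels=labels, inputs=embed,\n",
"                                      num_sampled=n_sampled, num_classes=n_vocab)\n",
"    \n",
"    cost = tf.reduce_mean(loss)\n",
"    optimizer = tf.train.AdamOptimizer().minimize(cost)"
]
},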
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Validation\n",
"\n",
"This code is from Thushan Ganegedara's implementation. Here we're going to choose a few common words and few uncommon words. Then, we'll print out the closest words to them. It's a nice way to check that our embedding table is grouping together words with similar semantic meanings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"with train_graph.as_default():\n",
" ## From Thushan Ganegedara's implementation\n",
" valid_size = 16 # Random set of words to evaluate similarity on.\n",
" valid_window = 100\n",
" # pick 8 samples from (0,100) and (1000,1100) each ranges. lower id implies more frequent \n",
" valid_examples = np.array(random.sample(range(valid_window), valid_size//2))\n",
" valid_examples = np.append(valid_examples, \n",
" random.sample(range(1000,1000+valid_window), valid_size//2))\n",
"\n",
" valid_dataset = tf.constant(valid_examples, dtype=tf.int32)\n",
" \n",
" # We use the cosine distance:\n",
" norm = tf.sqrt(tf.reduce_sum(tf.square(embedding), 1, keep_dims=True))\n",
" normalized_embedding = embedding / norm\n",
" valid_embedding = tf.nn.embedding_lookup(normalized_embedding, valid_dataset)\n",
" similarity = tf.matmul(valid_embedding, tf.transpose(normalized_embedding))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"# If the checkpoints directory doesn't exist:\n",
"!mkdir checkpoints"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training\n",
"\n",
"Below is the code to train the network. Every 100 batches it reports the training loss. Every 1000 batches, it'll print out the validation words."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"epochs = 10\n",
"batch_size = 1000\n",
"window_size = 10\n",
"\n",
"with train_graph.as_default():\n",
" saver = tf.train.Saver()\n",
"\n",
"with tf.Session(graph=train_graph) as sess:\n",
" iteration = 1\n",
" loss = 0\n",
" sess.run(tf.global_variables_initializer())\n",
"\n",
" for e in range(1, epochs+1):\n",
" batches = get_batches(train_words, batch_size, window_size)\n",
" start = time.time()\n",
" for x, y in batches:\n",
" \n",
" feed = {inputs: x,\n",
" labels: np.array(y)[:, None]}\n",
" train_loss, _ = sess.run([cost, optimizer], feed_dict=feed)\n",
" \n",
" loss += train_loss\n",
" \n",
" if iteration % 100 == 0: \n",
" end = time.time()\n",
" print(\"Epoch {}/{}\".format(e, epochs),\n",
" \"Iteration: {}\".format(iteration),\n",
" \"Avg. Training loss: {:.4f}\".format(loss/100),\n",
" \"{:.4f} sec/batch\".format((end-start)/100))\n",
" loss = 0\n",
" start = time.time()\n",
" \n",
" if iteration % 1000 == 0:\n",
" ## From Thushan Ganegedara's implementation\n",
" # note that this is expensive (~20% slowdown if computed every 500 steps)\n",
" sim = similarity.eval()\n",
" for i in range(valid_size):\n",
" valid_word = int_to_vocab[valid_examples[i]]\n",
" top_k = 8 # number of nearest neighbors\n",
" nearest = (-sim[i, :]).argsort()[1:top_k+1]\n",
" log = 'Nearest to %s:' % valid_word\n",
" for k in range(top_k):\n",
" close_word = int_to_vocab[nearest[k]]\n",
" log = '%s %s,' % (log, close_word)\n",
" print(log)\n",
" \n",
" iteration += 1\n",
" save_path = saver.save(sess, \"checkpoints/text8.ckpt\")\n",
" embed_mat = sess.run(normalized_embedding)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Restore the trained network if you need to:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"with train_graph.as_default():\n",
" saver = tf.train.Saver()\n",
"\n",
"with tf.Session(graph=train_graph) as sess:\n",
" saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))\n",
" embed_mat = sess.run(embedding)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualizing the word vectors\n",
"\n",
"Below we'll use T-SNE to visualize how our high-dimensional word vectors cluster together. T-SNE is used to project these vectors into two dimensions while preserving local stucture. Check out [this post from Christopher Olah](http://colah.github.io/posts/2014-10-Visualizing-MNIST/) to learn more about T-SNE and other ways to visualize high-dimensional data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'retina'\n",
"\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.manifold import TSNE"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"viz_words = 500\n",
"tsne = TSNE()\n",
"embed_tsne = tsne.fit_transform(embed_mat[:viz_words, :])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=(14, 14))\n",
"for idx in range(viz_words):\n",
" plt.scatter(*embed_tsne[idx, :], color='steelblue')\n",
" plt.annotate(int_to_vocab[idx], (embed_tsne[idx, 0], embed_tsne[idx, 1]), alpha=0.7)"
]
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@@ -0,0 +1,599 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Skip-gram word2vec\n",
"\n",
"In this notebook, I'll lead you through using TensorFlow to implement the word2vec algorithm using the skip-gram architecture. By implementing this, you'll learn about embedding words for use in natural language processing. This will come in handy when dealing with things like translations.\n",
"\n",
"## Readings\n",
"\n",
"Here are the resources I used to build this notebook. I suggest reading these either beforehand or while you're working on this material.\n",
"\n",
"* A really good [conceptual overview](http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/) of word2vec from Chris McCormick \n",
"* [First word2vec paper](https://arxiv.org/pdf/1301.3781.pdf) from Mikolov et al.\n",
"* [NIPS paper](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) with improvements for word2vec also from Mikolov et al.\n",
"* An [implementation of word2vec](http://www.thushv.com/natural_language_processing/word2vec-part-1-nlp-with-deep-learning-with-tensorflow-skip-gram/) from Thushan Ganegedara\n",
"* TensorFlow [word2vec tutorial](https://www.tensorflow.org/tutorials/word2vec)\n",
"\n",
"## Word embeddings\n",
"\n",
"When you're dealing with language and words, you end up with tens of thousands of classes to predict, one for each word. Trying to one-hot encode these words is massively inefficient, you'll have one element set to 1 and the other 50,000 set to 0. The word2vec algorithm finds much more efficient representations by finding vectors that represent the words. These vectors also contain semantic information about the words. Words that show up in similar contexts, such as \"black\", \"white\", and \"red\" will have vectors near each other. There are two architectures for implementing word2vec, CBOW (Continuous Bag-Of-Words) and Skip-gram.\n",
"\n",
"<img src=\"assets/word2vec_architectures.png\" width=\"500\">\n",
"\n",
"In this implementation, we'll be using the skip-gram architecture because it performs better than CBOW. Here, we pass in a word and try to predict the words surrounding it in the text. In this way, we can train the network to learn representations for words that show up in similar contexts.\n",
"\n",
"First up, importing packages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import time\n",
"\n",
"import numpy as np\n",
"import tensorflow as tf\n",
"\n",
"import utils"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the [text8 dataset](http://mattmahoney.net/dc/textdata.html), a file of cleaned up Wikipedia articles from Matt Mahoney. The next cell will download the data set to the `data` folder. Then you can extract it and delete the archive file to save storage space."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from urllib.request import urlretrieve\n",
"from os.path import isfile, isdir\n",
"from tqdm import tqdm\n",
"import zipfile\n",
"\n",
"dataset_folder_path = 'data'\n",
"dataset_filename = 'text8.zip'\n",
"dataset_name = 'Text8 Dataset'\n",
"\n",
"class DLProgress(tqdm):\n",
" last_block = 0\n",
"\n",
" def hook(self, block_num=1, block_size=1, total_size=None):\n",
" self.total = total_size\n",
" self.update((block_num - self.last_block) * block_size)\n",
" self.last_block = block_num\n",
"\n",
"if not isfile(dataset_filename):\n",
" with DLProgress(unit='B', unit_scale=True, miniters=1, desc=dataset_name) as pbar:\n",
" urlretrieve(\n",
" 'http://mattmahoney.net/dc/text8.zip',\n",
" dataset_filename,\n",
" pbar.hook)\n",
"\n",
"if not isdir(dataset_folder_path):\n",
" with zipfile.ZipFile(dataset_filename) as zip_ref:\n",
" zip_ref.extractall(dataset_folder_path)\n",
" \n",
"with open('data/text8') as f:\n",
" text = f.read()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preprocessing\n",
"\n",
"Here I'm fixing up the text to make training easier. This comes from the `utils` module I wrote. The `preprocess` function coverts any punctuation into tokens, so a period is changed to ` <PERIOD> `. In this data set, there aren't any periods, but it will help in other NLP problems. I'm also removing all words that show up five or fewer times in the dataset. This will greatly reduce issues due to noise in the data and improve the quality of the vector representations. If you want to write your own functions for this stuff, go for it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"words = utils.preprocess(text)\n",
"print(words[:30])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(\"Total words: {}\".format(len(words)))\n",
"print(\"Unique words: {}\".format(len(set(words))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here I'm creating dictionaries to covert words to integers and backwards, integers to words. The integers are assigned in descending frequency order, so the most frequent word (\"the\") is given the integer 0 and the next most frequent is 1 and so on. The words are converted to integers and stored in the list `int_words`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"vocab_to_int, int_to_vocab = utils.create_lookup_tables(words)\n",
"int_words = [vocab_to_int[word] for word in words]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Subsampling\n",
"\n",
"Words that show up often such as \"the\", \"of\", and \"for\" don't provide much context to the nearby words. If we discard some of them, we can remove some of the noise from our data and in return get faster training and better representations. This process is called subsampling by Mikolov. For each word $w_i$ in the training set, we'll discard it with probability given by \n",
"\n",
"$$ P(w_i) = 1 - \\sqrt{\\frac{t}{f(w_i)}} $$\n",
"\n",
"where $t$ is a threshold parameter and $f(w_i)$ is the frequency of word $w_i$ in the total dataset.\n",
"\n",
"I'm going to leave this up to you as an exercise. This is more of a programming challenge, than about deep learning specifically. But, being able to prepare your data for your network is an important skill to have. Check out my solution to see how I did it.\n",
"\n",
"> **Exercise:** Implement subsampling for the words in `int_words`. That is, go through `int_words` and discard each word given the probablility $P(w_i)$ shown above. Note that $P(w_i)$ is the probability that a word is discarded. Assign the subsampled data to `train_words`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## Your code here\n",
"train_words = # The final subsampled word list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Making batches"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Now that our data is in good shape, we need to get it into the proper form to pass it into our network. With the skip-gram architecture, for each word in the text, we want to grab all the words in a window around that word, with size $C$. \n",
"\n",
"From [Mikolov et al.](https://arxiv.org/pdf/1301.3781.pdf): \n",
"\n",
"\"Since the more distant words are usually less related to the current word than those close to it, we give less weight to the distant words by sampling less from those words in our training examples... If we choose $C = 5$, for each training word we will select randomly a number $R$ in range $< 1; C >$, and then use $R$ words from history and $R$ words from the future of the current word as correct labels.\"\n",
"\n",
"> **Exercise:** Implement a function `get_target` that receives a list of words, an index, and a window size, then returns a list of words in the window around the index. Make sure to use the algorithm described above, where you choose a random number of words from the window."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"def get_target(words, idx, window_size=5):\n",
" ''' Get a list of words in a window around an index. '''\n",
" \n",
" # Your code here\n",
" \n",
" return"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a function that returns batches for our network. The idea is that it grabs `batch_size` words from a words list. Then for each of those words, it gets the target words in the window. I haven't found a way to pass in a random number of target words and get it to work with the architecture, so I make one row per input-target pair. This is a generator function by the way, helps save memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"def get_batches(words, batch_size, window_size=5):\n",
" ''' Create a generator of word batches as a tuple (inputs, targets) '''\n",
" \n",
" n_batches = len(words)//batch_size\n",
" \n",
" # only full batches\n",
" words = words[:n_batches*batch_size]\n",
" \n",
" for idx in range(0, len(words), batch_size):\n",
" x, y = [], []\n",
" batch = words[idx:idx+batch_size]\n",
" for ii in range(len(batch)):\n",
" batch_x = batch[ii]\n",
" batch_y = get_target(batch, ii, window_size)\n",
" y.extend(batch_y)\n",
" x.extend([batch_x]*len(batch_y))\n",
" yield x, y\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Building the graph\n",
"\n",
"From Chris McCormick's blog, we can see the general structure of our network.\n",
"![embedding_network](./assets/skip_gram_net_arch.png)\n",
"\n",
"The input words are passed in as one-hot encoded vectors. This will go into a hidden layer of linear units, then into a softmax layer. We'll use the softmax layer to make a prediction like normal.\n",
"\n",
"The idea here is to train the hidden layer weight matrix to find efficient representations for our words. This weight matrix is usually called the embedding matrix or embedding look-up table. We can discard the softmax layer becuase we don't really care about making predictions with this network. We just want the embedding matrix so we can use it in other networks we build from the dataset.\n",
"\n",
"I'm going to have you build the graph in stages now. First off, creating the `inputs` and `labels` placeholders like normal.\n",
"\n",
"> **Exercise:** Assign `inputs` and `labels` using `tf.placeholder`. We're going to be passing in integers, so set the data types to `tf.int32`. The batches we're passing in will have varying sizes, so set the batch sizes to [`None`]. To make things work later, you'll need to set the second dimension of `labels` to `None` or `1`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"train_graph = tf.Graph()\n",
"with train_graph.as_default():\n",
" inputs = \n",
" labels = "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Embedding\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"The embedding matrix has a size of the number of words by the number of neurons in the hidden layer. So, if you have 10,000 words and 300 hidden units, the matrix will have size $10,000 \\times 300$. Remember that we're using one-hot encoded vectors for our inputs. When you do the matrix multiplication of the one-hot vector with the embedding matrix, you end up selecting only one row out of the entire matrix:\n",
"\n",
"![one-hot matrix multiplication](assets/matrix_mult_w_one_hot.png)\n",
"\n",
"You don't actually need to do the matrix multiplication, you just need to select the row in the embedding matrix that corresponds to the input word. Then, the embedding matrix becomes a lookup table, you're looking up a vector the size of the hidden layer that represents the input word.\n",
"\n",
"<img src=\"assets/word2vec_weight_matrix_lookup_table.png\" width=500>\n",
"\n",
"\n",
"> **Exercise:** Tensorflow provides a convenient function [`tf.nn.embedding_lookup`](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup) that does this lookup for us. You pass in the embedding matrix and a tensor of integers, then it returns rows in the matrix corresponding to those integers. Below, set the number of embedding features you'll use (200 is a good start), create the embedding matrix variable, and use [`tf.nn.embedding_lookup`](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup) to get the embedding tensors. For the embedding matrix, I suggest you initialize it with a uniform random numbers between -1 and 1 using [tf.random_uniform](https://www.tensorflow.org/api_docs/python/tf/random_uniform). This [TensorFlow tutorial](https://www.tensorflow.org/tutorials/word2vec) will help if you get stuck."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"n_vocab = len(int_to_vocab)\n",
"n_embedding = # Number of embedding features \n",
"with train_graph.as_default():\n",
" embedding = # create embedding weight matrix here\n",
" embed = # use tf.nn.embedding_lookup to get the hidden layer output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Negative sampling\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For every example we give the network, we train it using the output from the softmax layer. That means for each input, we're making very small changes to millions of weights even though we only have one true example. This makes training the network very inefficient. We can approximate the loss from the softmax layer by only updating a small subset of all the weights at once. We'll update the weights for the correct label, but only a small number of incorrect labels. This is called [\"negative sampling\"](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf). Tensorflow has a convenient function to do this, [`tf.nn.sampled_softmax_loss`](https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss).\n",
"\n",
"> **Exercise:** Below, create weights and biases for the softmax layer. Then, use [`tf.nn.sampled_softmax_loss`](https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss) to calculate the loss. Be sure to read the documentation to figure out how it works."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"# Number of negative labels to sample\n",
"n_sampled = 100\n",
"with train_graph.as_default():\n",
" softmax_w = # create softmax weight matrix here\n",
" softmax_b = # create softmax biases here\n",
" \n",
" # Calculate the loss using negative sampling\n",
" loss = tf.nn.sampled_softmax_loss \n",
" \n",
" cost = tf.reduce_mean(loss)\n",
" optimizer = tf.train.AdamOptimizer().minimize(cost)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Validation\n",
"\n",
"This code is from Thushan Ganegedara's implementation. Here we're going to choose a few common words and few uncommon words. Then, we'll print out the closest words to them. It's a nice way to check that our embedding table is grouping together words with similar semantic meanings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"with train_graph.as_default():\n",
" ## From Thushan Ganegedara's implementation\n",
" valid_size = 16 # Random set of words to evaluate similarity on.\n",
" valid_window = 100\n",
" # pick 8 samples from (0,100) and (1000,1100) each ranges. lower id implies more frequent \n",
" valid_examples = np.array(random.sample(range(valid_window), valid_size//2))\n",
" valid_examples = np.append(valid_examples, \n",
" random.sample(range(1000,1000+valid_window), valid_size//2))\n",
"\n",
" valid_dataset = tf.constant(valid_examples, dtype=tf.int32)\n",
" \n",
" # We use the cosine distance:\n",
" norm = tf.sqrt(tf.reduce_sum(tf.square(embedding), 1, keep_dims=True))\n",
" normalized_embedding = embedding / norm\n",
" valid_embedding = tf.nn.embedding_lookup(normalized_embedding, valid_dataset)\n",
" similarity = tf.matmul(valid_embedding, tf.transpose(normalized_embedding))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"# If the checkpoints directory doesn't exist:\n",
"!mkdir checkpoints"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training\n",
"\n",
"Below is the code to train the network. Every 100 batches it reports the training loss. Every 1000 batches, it'll print out the validation words."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"epochs = 10\n",
"batch_size = 1000\n",
"window_size = 10\n",
"\n",
"with train_graph.as_default():\n",
" saver = tf.train.Saver()\n",
"\n",
"with tf.Session(graph=train_graph) as sess:\n",
" iteration = 1\n",
" loss = 0\n",
" sess.run(tf.global_variables_initializer())\n",
"\n",
" for e in range(1, epochs+1):\n",
" batches = get_batches(train_words, batch_size, window_size)\n",
" start = time.time()\n",
" for x, y in batches:\n",
" \n",
" feed = {inputs: x,\n",
" labels: np.array(y)[:, None]}\n",
" train_loss, _ = sess.run([cost, optimizer], feed_dict=feed)\n",
" \n",
" loss += train_loss\n",
" \n",
" if iteration % 100 == 0: \n",
" end = time.time()\n",
" print(\"Epoch {}/{}\".format(e, epochs),\n",
" \"Iteration: {}\".format(iteration),\n",
" \"Avg. Training loss: {:.4f}\".format(loss/100),\n",
" \"{:.4f} sec/batch\".format((end-start)/100))\n",
" loss = 0\n",
" start = time.time()\n",
" \n",
" if iteration % 1000 == 0:\n",
" ## From Thushan Ganegedara's implementation\n",
" # note that this is expensive (~20% slowdown if computed every 500 steps)\n",
" sim = similarity.eval()\n",
" for i in range(valid_size):\n",
" valid_word = int_to_vocab[valid_examples[i]]\n",
" top_k = 8 # number of nearest neighbors\n",
" nearest = (-sim[i, :]).argsort()[1:top_k+1]\n",
" log = 'Nearest to %s:' % valid_word\n",
" for k in range(top_k):\n",
" close_word = int_to_vocab[nearest[k]]\n",
" log = '%s %s,' % (log, close_word)\n",
" print(log)\n",
" \n",
" iteration += 1\n",
" save_path = saver.save(sess, \"checkpoints/text8.ckpt\")\n",
" embed_mat = sess.run(normalized_embedding)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Restore the trained network if you need to:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"with train_graph.as_default():\n",
" saver = tf.train.Saver()\n",
"\n",
"with tf.Session(graph=train_graph) as sess:\n",
" saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))\n",
" embed_mat = sess.run(embedding)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualizing the word vectors\n",
"\n",
"Below we'll use T-SNE to visualize how our high-dimensional word vectors cluster together. T-SNE is used to project these vectors into two dimensions while preserving local stucture. Check out [this post from Christopher Olah](http://colah.github.io/posts/2014-10-Visualizing-MNIST/) to learn more about T-SNE and other ways to visualize high-dimensional data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'retina'\n",
"\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.manifold import TSNE"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"viz_words = 500\n",
"tsne = TSNE()\n",
"embed_tsne = tsne.fit_transform(embed_mat[:viz_words, :])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=(14, 14))\n",
"for idx in range(viz_words):\n",
" plt.scatter(*embed_tsne[idx, :], color='steelblue')\n",
" plt.annotate(int_to_vocab[idx], (embed_tsne[idx, 0], embed_tsne[idx, 1]), alpha=0.7)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because one or more lines are too long

Binary file not shown (new image, 12 KiB).

Binary file not shown (new image, 121 KiB).

Binary file not shown (new image, 110 KiB).

Binary file not shown (new image, 19 KiB).

@@ -0,0 +1,59 @@
import re
from collections import Counter

import numpy as np


def preprocess(text):
    # Replace punctuation with tokens so we can use them in our model
    text = text.lower()
    text = text.replace('.', ' <PERIOD> ')
    text = text.replace(',', ' <COMMA> ')
    text = text.replace('"', ' <QUOTATION_MARK> ')
    text = text.replace(';', ' <SEMICOLON> ')
    text = text.replace('!', ' <EXCLAMATION_MARK> ')
    text = text.replace('?', ' <QUESTION_MARK> ')
    text = text.replace('(', ' <LEFT_PAREN> ')
    text = text.replace(')', ' <RIGHT_PAREN> ')
    text = text.replace('--', ' <HYPHENS> ')
    # text = text.replace('\n', ' <NEW_LINE> ')
    text = text.replace(':', ' <COLON> ')
    words = text.split()

    # Remove all words with 5 or fewer occurrences
    word_counts = Counter(words)
    trimmed_words = [word for word in words if word_counts[word] > 5]

    return trimmed_words


def get_batches(int_text, batch_size, seq_length):
    """
    Return batches of input and target
    :param int_text: Text with the words replaced by their ids
    :param batch_size: The size of batch
    :param seq_length: The length of sequence
    :return: A list where each item is a tuple of (batch of input, batch of target).
    """
    n_batches = int(len(int_text) / (batch_size * seq_length))

    # Drop the last few characters to make only full batches
    xdata = np.array(int_text[: n_batches * batch_size * seq_length])
    ydata = np.array(int_text[1: n_batches * batch_size * seq_length + 1])

    x_batches = np.split(xdata.reshape(batch_size, -1), n_batches, 1)
    y_batches = np.split(ydata.reshape(batch_size, -1), n_batches, 1)

    return list(zip(x_batches, y_batches))


def create_lookup_tables(words):
    """
    Create lookup tables for vocabulary
    :param words: Input list of words
    :return: A tuple of dicts: the first maps words to integers, the second maps integers back to words.
    """
    word_counts = Counter(words)
    sorted_vocab = sorted(word_counts, key=word_counts.get, reverse=True)
    int_to_vocab = {ii: word for ii, word in enumerate(sorted_vocab)}
    vocab_to_int = {word: ii for ii, word in int_to_vocab.items()}

    return vocab_to_int, int_to_vocab

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b92c8628622948bf0828c43a1b315d1517c3d7fd63654730d0dea379b8e88175
size 5607

File diff suppressed because one or more lines are too long

@@ -0,0 +1,732 @@
instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600
6,2011-01-06,1,0,1,0,4,1,1,0.204348,0.233209,0.518261,0.0895652,88,1518,1606
7,2011-01-07,1,0,1,0,5,1,2,0.196522,0.208839,0.498696,0.168726,148,1362,1510
8,2011-01-08,1,0,1,0,6,0,2,0.165,0.162254,0.535833,0.266804,68,891,959
9,2011-01-09,1,0,1,0,0,0,1,0.138333,0.116175,0.434167,0.36195,54,768,822
10,2011-01-10,1,0,1,0,1,1,1,0.150833,0.150888,0.482917,0.223267,41,1280,1321
11,2011-01-11,1,0,1,0,2,1,2,0.169091,0.191464,0.686364,0.122132,43,1220,1263
12,2011-01-12,1,0,1,0,3,1,1,0.172727,0.160473,0.599545,0.304627,25,1137,1162
13,2011-01-13,1,0,1,0,4,1,1,0.165,0.150883,0.470417,0.301,38,1368,1406
14,2011-01-14,1,0,1,0,5,1,1,0.16087,0.188413,0.537826,0.126548,54,1367,1421
15,2011-01-15,1,0,1,0,6,0,2,0.233333,0.248112,0.49875,0.157963,222,1026,1248
16,2011-01-16,1,0,1,0,0,0,1,0.231667,0.234217,0.48375,0.188433,251,953,1204
17,2011-01-17,1,0,1,1,1,0,2,0.175833,0.176771,0.5375,0.194017,117,883,1000
18,2011-01-18,1,0,1,0,2,1,2,0.216667,0.232333,0.861667,0.146775,9,674,683
19,2011-01-19,1,0,1,0,3,1,2,0.292174,0.298422,0.741739,0.208317,78,1572,1650
20,2011-01-20,1,0,1,0,4,1,2,0.261667,0.25505,0.538333,0.195904,83,1844,1927
21,2011-01-21,1,0,1,0,5,1,1,0.1775,0.157833,0.457083,0.353242,75,1468,1543
22,2011-01-22,1,0,1,0,6,0,1,0.0591304,0.0790696,0.4,0.17197,93,888,981
23,2011-01-23,1,0,1,0,0,0,1,0.0965217,0.0988391,0.436522,0.2466,150,836,986
24,2011-01-24,1,0,1,0,1,1,1,0.0973913,0.11793,0.491739,0.15833,86,1330,1416
25,2011-01-25,1,0,1,0,2,1,2,0.223478,0.234526,0.616957,0.129796,186,1799,1985
26,2011-01-26,1,0,1,0,3,1,3,0.2175,0.2036,0.8625,0.29385,34,472,506
27,2011-01-27,1,0,1,0,4,1,1,0.195,0.2197,0.6875,0.113837,15,416,431
28,2011-01-28,1,0,1,0,5,1,2,0.203478,0.223317,0.793043,0.1233,38,1129,1167
29,2011-01-29,1,0,1,0,6,0,1,0.196522,0.212126,0.651739,0.145365,123,975,1098
30,2011-01-30,1,0,1,0,0,0,1,0.216522,0.250322,0.722174,0.0739826,140,956,1096
31,2011-01-31,1,0,1,0,1,1,2,0.180833,0.18625,0.60375,0.187192,42,1459,1501
32,2011-02-01,1,0,2,0,2,1,2,0.192174,0.23453,0.829565,0.053213,47,1313,1360
33,2011-02-02,1,0,2,0,3,1,2,0.26,0.254417,0.775417,0.264308,72,1454,1526
34,2011-02-03,1,0,2,0,4,1,1,0.186957,0.177878,0.437826,0.277752,61,1489,1550
35,2011-02-04,1,0,2,0,5,1,2,0.211304,0.228587,0.585217,0.127839,88,1620,1708
36,2011-02-05,1,0,2,0,6,0,2,0.233333,0.243058,0.929167,0.161079,100,905,1005
37,2011-02-06,1,0,2,0,0,0,1,0.285833,0.291671,0.568333,0.1418,354,1269,1623
38,2011-02-07,1,0,2,0,1,1,1,0.271667,0.303658,0.738333,0.0454083,120,1592,1712
39,2011-02-08,1,0,2,0,2,1,1,0.220833,0.198246,0.537917,0.36195,64,1466,1530
40,2011-02-09,1,0,2,0,3,1,2,0.134783,0.144283,0.494783,0.188839,53,1552,1605
41,2011-02-10,1,0,2,0,4,1,1,0.144348,0.149548,0.437391,0.221935,47,1491,1538
42,2011-02-11,1,0,2,0,5,1,1,0.189091,0.213509,0.506364,0.10855,149,1597,1746
43,2011-02-12,1,0,2,0,6,0,1,0.2225,0.232954,0.544167,0.203367,288,1184,1472
44,2011-02-13,1,0,2,0,0,0,1,0.316522,0.324113,0.457391,0.260883,397,1192,1589
45,2011-02-14,1,0,2,0,1,1,1,0.415,0.39835,0.375833,0.417908,208,1705,1913
46,2011-02-15,1,0,2,0,2,1,1,0.266087,0.254274,0.314348,0.291374,140,1675,1815
47,2011-02-16,1,0,2,0,3,1,1,0.318261,0.3162,0.423478,0.251791,218,1897,2115
48,2011-02-17,1,0,2,0,4,1,1,0.435833,0.428658,0.505,0.230104,259,2216,2475
49,2011-02-18,1,0,2,0,5,1,1,0.521667,0.511983,0.516667,0.264925,579,2348,2927
50,2011-02-19,1,0,2,0,6,0,1,0.399167,0.391404,0.187917,0.507463,532,1103,1635
51,2011-02-20,1,0,2,0,0,0,1,0.285217,0.27733,0.407826,0.223235,639,1173,1812
52,2011-02-21,1,0,2,1,1,0,2,0.303333,0.284075,0.605,0.307846,195,912,1107
53,2011-02-22,1,0,2,0,2,1,1,0.182222,0.186033,0.577778,0.195683,74,1376,1450
54,2011-02-23,1,0,2,0,3,1,1,0.221739,0.245717,0.423043,0.094113,139,1778,1917
55,2011-02-24,1,0,2,0,4,1,2,0.295652,0.289191,0.697391,0.250496,100,1707,1807
56,2011-02-25,1,0,2,0,5,1,2,0.364348,0.350461,0.712174,0.346539,120,1341,1461
57,2011-02-26,1,0,2,0,6,0,1,0.2825,0.282192,0.537917,0.186571,424,1545,1969
58,2011-02-27,1,0,2,0,0,0,1,0.343478,0.351109,0.68,0.125248,694,1708,2402
59,2011-02-28,1,0,2,0,1,1,2,0.407273,0.400118,0.876364,0.289686,81,1365,1446
60,2011-03-01,1,0,3,0,2,1,1,0.266667,0.263879,0.535,0.216425,137,1714,1851
61,2011-03-02,1,0,3,0,3,1,1,0.335,0.320071,0.449583,0.307833,231,1903,2134
62,2011-03-03,1,0,3,0,4,1,1,0.198333,0.200133,0.318333,0.225754,123,1562,1685
63,2011-03-04,1,0,3,0,5,1,2,0.261667,0.255679,0.610417,0.203346,214,1730,1944
64,2011-03-05,1,0,3,0,6,0,2,0.384167,0.378779,0.789167,0.251871,640,1437,2077
65,2011-03-06,1,0,3,0,0,0,2,0.376522,0.366252,0.948261,0.343287,114,491,605
66,2011-03-07,1,0,3,0,1,1,1,0.261739,0.238461,0.551304,0.341352,244,1628,1872
67,2011-03-08,1,0,3,0,2,1,1,0.2925,0.3024,0.420833,0.12065,316,1817,2133
68,2011-03-09,1,0,3,0,3,1,2,0.295833,0.286608,0.775417,0.22015,191,1700,1891
69,2011-03-10,1,0,3,0,4,1,3,0.389091,0.385668,0,0.261877,46,577,623
70,2011-03-11,1,0,3,0,5,1,2,0.316522,0.305,0.649565,0.23297,247,1730,1977
71,2011-03-12,1,0,3,0,6,0,1,0.329167,0.32575,0.594583,0.220775,724,1408,2132
72,2011-03-13,1,0,3,0,0,0,1,0.384348,0.380091,0.527391,0.270604,982,1435,2417
73,2011-03-14,1,0,3,0,1,1,1,0.325217,0.332,0.496957,0.136926,359,1687,2046
74,2011-03-15,1,0,3,0,2,1,2,0.317391,0.318178,0.655652,0.184309,289,1767,2056
75,2011-03-16,1,0,3,0,3,1,2,0.365217,0.36693,0.776522,0.203117,321,1871,2192
76,2011-03-17,1,0,3,0,4,1,1,0.415,0.410333,0.602917,0.209579,424,2320,2744
77,2011-03-18,1,0,3,0,5,1,1,0.54,0.527009,0.525217,0.231017,884,2355,3239
78,2011-03-19,1,0,3,0,6,0,1,0.4725,0.466525,0.379167,0.368167,1424,1693,3117
79,2011-03-20,1,0,3,0,0,0,1,0.3325,0.32575,0.47375,0.207721,1047,1424,2471
80,2011-03-21,2,0,3,0,1,1,2,0.430435,0.409735,0.737391,0.288783,401,1676,2077
81,2011-03-22,2,0,3,0,2,1,1,0.441667,0.440642,0.624583,0.22575,460,2243,2703
82,2011-03-23,2,0,3,0,3,1,2,0.346957,0.337939,0.839565,0.234261,203,1918,2121
83,2011-03-24,2,0,3,0,4,1,2,0.285,0.270833,0.805833,0.243787,166,1699,1865
84,2011-03-25,2,0,3,0,5,1,1,0.264167,0.256312,0.495,0.230725,300,1910,2210
85,2011-03-26,2,0,3,0,6,0,1,0.265833,0.257571,0.394167,0.209571,981,1515,2496
86,2011-03-27,2,0,3,0,0,0,2,0.253043,0.250339,0.493913,0.1843,472,1221,1693
87,2011-03-28,2,0,3,0,1,1,1,0.264348,0.257574,0.302174,0.212204,222,1806,2028
88,2011-03-29,2,0,3,0,2,1,1,0.3025,0.292908,0.314167,0.226996,317,2108,2425
89,2011-03-30,2,0,3,0,3,1,2,0.3,0.29735,0.646667,0.172888,168,1368,1536
90,2011-03-31,2,0,3,0,4,1,3,0.268333,0.257575,0.918333,0.217646,179,1506,1685
91,2011-04-01,2,0,4,0,5,1,2,0.3,0.283454,0.68625,0.258708,307,1920,2227
92,2011-04-02,2,0,4,0,6,0,2,0.315,0.315637,0.65375,0.197146,898,1354,2252
93,2011-04-03,2,0,4,0,0,0,1,0.378333,0.378767,0.48,0.182213,1651,1598,3249
94,2011-04-04,2,0,4,0,1,1,1,0.573333,0.542929,0.42625,0.385571,734,2381,3115
95,2011-04-05,2,0,4,0,2,1,2,0.414167,0.39835,0.642083,0.388067,167,1628,1795
96,2011-04-06,2,0,4,0,3,1,1,0.390833,0.387608,0.470833,0.263063,413,2395,2808
97,2011-04-07,2,0,4,0,4,1,1,0.4375,0.433696,0.602917,0.162312,571,2570,3141
98,2011-04-08,2,0,4,0,5,1,2,0.335833,0.324479,0.83625,0.226992,172,1299,1471
99,2011-04-09,2,0,4,0,6,0,2,0.3425,0.341529,0.8775,0.133083,879,1576,2455
100,2011-04-10,2,0,4,0,0,0,2,0.426667,0.426737,0.8575,0.146767,1188,1707,2895
101,2011-04-11,2,0,4,0,1,1,2,0.595652,0.565217,0.716956,0.324474,855,2493,3348
102,2011-04-12,2,0,4,0,2,1,2,0.5025,0.493054,0.739167,0.274879,257,1777,2034
103,2011-04-13,2,0,4,0,3,1,2,0.4125,0.417283,0.819167,0.250617,209,1953,2162
104,2011-04-14,2,0,4,0,4,1,1,0.4675,0.462742,0.540417,0.1107,529,2738,3267
105,2011-04-15,2,0,4,1,5,0,1,0.446667,0.441913,0.67125,0.226375,642,2484,3126
106,2011-04-16,2,0,4,0,6,0,3,0.430833,0.425492,0.888333,0.340808,121,674,795
107,2011-04-17,2,0,4,0,0,0,1,0.456667,0.445696,0.479583,0.303496,1558,2186,3744
108,2011-04-18,2,0,4,0,1,1,1,0.5125,0.503146,0.5425,0.163567,669,2760,3429
109,2011-04-19,2,0,4,0,2,1,2,0.505833,0.489258,0.665833,0.157971,409,2795,3204
110,2011-04-20,2,0,4,0,3,1,1,0.595,0.564392,0.614167,0.241925,613,3331,3944
111,2011-04-21,2,0,4,0,4,1,1,0.459167,0.453892,0.407083,0.325258,745,3444,4189
112,2011-04-22,2,0,4,0,5,1,2,0.336667,0.321954,0.729583,0.219521,177,1506,1683
113,2011-04-23,2,0,4,0,6,0,2,0.46,0.450121,0.887917,0.230725,1462,2574,4036
114,2011-04-24,2,0,4,0,0,0,2,0.581667,0.551763,0.810833,0.192175,1710,2481,4191
115,2011-04-25,2,0,4,0,1,1,1,0.606667,0.5745,0.776667,0.185333,773,3300,4073
116,2011-04-26,2,0,4,0,2,1,1,0.631667,0.594083,0.729167,0.3265,678,3722,4400
117,2011-04-27,2,0,4,0,3,1,2,0.62,0.575142,0.835417,0.3122,547,3325,3872
118,2011-04-28,2,0,4,0,4,1,2,0.6175,0.578929,0.700833,0.320908,569,3489,4058
119,2011-04-29,2,0,4,0,5,1,1,0.51,0.497463,0.457083,0.240063,878,3717,4595
120,2011-04-30,2,0,4,0,6,0,1,0.4725,0.464021,0.503333,0.235075,1965,3347,5312
121,2011-05-01,2,0,5,0,0,0,2,0.451667,0.448204,0.762083,0.106354,1138,2213,3351
122,2011-05-02,2,0,5,0,1,1,2,0.549167,0.532833,0.73,0.183454,847,3554,4401
123,2011-05-03,2,0,5,0,2,1,2,0.616667,0.582079,0.697083,0.342667,603,3848,4451
124,2011-05-04,2,0,5,0,3,1,2,0.414167,0.40465,0.737083,0.328996,255,2378,2633
125,2011-05-05,2,0,5,0,4,1,1,0.459167,0.441917,0.444167,0.295392,614,3819,4433
126,2011-05-06,2,0,5,0,5,1,1,0.479167,0.474117,0.59,0.228246,894,3714,4608
127,2011-05-07,2,0,5,0,6,0,1,0.52,0.512621,0.54125,0.16045,1612,3102,4714
128,2011-05-08,2,0,5,0,0,0,1,0.528333,0.518933,0.631667,0.0746375,1401,2932,4333
129,2011-05-09,2,0,5,0,1,1,1,0.5325,0.525246,0.58875,0.176,664,3698,4362
130,2011-05-10,2,0,5,0,2,1,1,0.5325,0.522721,0.489167,0.115671,694,4109,4803
131,2011-05-11,2,0,5,0,3,1,1,0.5425,0.5284,0.632917,0.120642,550,3632,4182
132,2011-05-12,2,0,5,0,4,1,1,0.535,0.523363,0.7475,0.189667,695,4169,4864
133,2011-05-13,2,0,5,0,5,1,2,0.5125,0.4943,0.863333,0.179725,692,3413,4105
134,2011-05-14,2,0,5,0,6,0,2,0.520833,0.500629,0.9225,0.13495,902,2507,3409
135,2011-05-15,2,0,5,0,0,0,2,0.5625,0.536,0.867083,0.152979,1582,2971,4553
136,2011-05-16,2,0,5,0,1,1,1,0.5775,0.550512,0.787917,0.126871,773,3185,3958
137,2011-05-17,2,0,5,0,2,1,2,0.561667,0.538529,0.837917,0.277354,678,3445,4123
138,2011-05-18,2,0,5,0,3,1,2,0.55,0.527158,0.87,0.201492,536,3319,3855
139,2011-05-19,2,0,5,0,4,1,2,0.530833,0.510742,0.829583,0.108213,735,3840,4575
140,2011-05-20,2,0,5,0,5,1,1,0.536667,0.529042,0.719583,0.125013,909,4008,4917
141,2011-05-21,2,0,5,0,6,0,1,0.6025,0.571975,0.626667,0.12065,2258,3547,5805
142,2011-05-22,2,0,5,0,0,0,1,0.604167,0.5745,0.749583,0.148008,1576,3084,4660
143,2011-05-23,2,0,5,0,1,1,2,0.631667,0.590296,0.81,0.233842,836,3438,4274
144,2011-05-24,2,0,5,0,2,1,2,0.66,0.604813,0.740833,0.207092,659,3833,4492
145,2011-05-25,2,0,5,0,3,1,1,0.660833,0.615542,0.69625,0.154233,740,4238,4978
146,2011-05-26,2,0,5,0,4,1,1,0.708333,0.654688,0.6775,0.199642,758,3919,4677
147,2011-05-27,2,0,5,0,5,1,1,0.681667,0.637008,0.65375,0.240679,871,3808,4679
148,2011-05-28,2,0,5,0,6,0,1,0.655833,0.612379,0.729583,0.230092,2001,2757,4758
149,2011-05-29,2,0,5,0,0,0,1,0.6675,0.61555,0.81875,0.213938,2355,2433,4788
150,2011-05-30,2,0,5,1,1,0,1,0.733333,0.671092,0.685,0.131225,1549,2549,4098
151,2011-05-31,2,0,5,0,2,1,1,0.775,0.725383,0.636667,0.111329,673,3309,3982
152,2011-06-01,2,0,6,0,3,1,2,0.764167,0.720967,0.677083,0.207092,513,3461,3974
153,2011-06-02,2,0,6,0,4,1,1,0.715,0.643942,0.305,0.292287,736,4232,4968
154,2011-06-03,2,0,6,0,5,1,1,0.62,0.587133,0.354167,0.253121,898,4414,5312
155,2011-06-04,2,0,6,0,6,0,1,0.635,0.594696,0.45625,0.123142,1869,3473,5342
156,2011-06-05,2,0,6,0,0,0,2,0.648333,0.616804,0.6525,0.138692,1685,3221,4906
157,2011-06-06,2,0,6,0,1,1,1,0.678333,0.621858,0.6,0.121896,673,3875,4548
158,2011-06-07,2,0,6,0,2,1,1,0.7075,0.65595,0.597917,0.187808,763,4070,4833
159,2011-06-08,2,0,6,0,3,1,1,0.775833,0.727279,0.622083,0.136817,676,3725,4401
160,2011-06-09,2,0,6,0,4,1,2,0.808333,0.757579,0.568333,0.149883,563,3352,3915
161,2011-06-10,2,0,6,0,5,1,1,0.755,0.703292,0.605,0.140554,815,3771,4586
162,2011-06-11,2,0,6,0,6,0,1,0.725,0.678038,0.654583,0.15485,1729,3237,4966
163,2011-06-12,2,0,6,0,0,0,1,0.6925,0.643325,0.747917,0.163567,1467,2993,4460
164,2011-06-13,2,0,6,0,1,1,1,0.635,0.601654,0.494583,0.30535,863,4157,5020
165,2011-06-14,2,0,6,0,2,1,1,0.604167,0.591546,0.507083,0.269283,727,4164,4891
166,2011-06-15,2,0,6,0,3,1,1,0.626667,0.587754,0.471667,0.167912,769,4411,5180
167,2011-06-16,2,0,6,0,4,1,2,0.628333,0.595346,0.688333,0.206471,545,3222,3767
168,2011-06-17,2,0,6,0,5,1,1,0.649167,0.600383,0.735833,0.143029,863,3981,4844
169,2011-06-18,2,0,6,0,6,0,1,0.696667,0.643954,0.670417,0.119408,1807,3312,5119
170,2011-06-19,2,0,6,0,0,0,2,0.699167,0.645846,0.666667,0.102,1639,3105,4744
171,2011-06-20,2,0,6,0,1,1,2,0.635,0.595346,0.74625,0.155475,699,3311,4010
172,2011-06-21,3,0,6,0,2,1,2,0.680833,0.637646,0.770417,0.171025,774,4061,4835
173,2011-06-22,3,0,6,0,3,1,1,0.733333,0.693829,0.7075,0.172262,661,3846,4507
174,2011-06-23,3,0,6,0,4,1,2,0.728333,0.693833,0.703333,0.238804,746,4044,4790
175,2011-06-24,3,0,6,0,5,1,1,0.724167,0.656583,0.573333,0.222025,969,4022,4991
176,2011-06-25,3,0,6,0,6,0,1,0.695,0.643313,0.483333,0.209571,1782,3420,5202
177,2011-06-26,3,0,6,0,0,0,1,0.68,0.637629,0.513333,0.0945333,1920,3385,5305
178,2011-06-27,3,0,6,0,1,1,2,0.6825,0.637004,0.658333,0.107588,854,3854,4708
179,2011-06-28,3,0,6,0,2,1,1,0.744167,0.692558,0.634167,0.144283,732,3916,4648
180,2011-06-29,3,0,6,0,3,1,1,0.728333,0.654688,0.497917,0.261821,848,4377,5225
181,2011-06-30,3,0,6,0,4,1,1,0.696667,0.637008,0.434167,0.185312,1027,4488,5515
182,2011-07-01,3,0,7,0,5,1,1,0.7225,0.652162,0.39625,0.102608,1246,4116,5362
183,2011-07-02,3,0,7,0,6,0,1,0.738333,0.667308,0.444583,0.115062,2204,2915,5119
184,2011-07-03,3,0,7,0,0,0,2,0.716667,0.668575,0.6825,0.228858,2282,2367,4649
185,2011-07-04,3,0,7,1,1,0,2,0.726667,0.665417,0.637917,0.0814792,3065,2978,6043
186,2011-07-05,3,0,7,0,2,1,1,0.746667,0.696338,0.590417,0.126258,1031,3634,4665
187,2011-07-06,3,0,7,0,3,1,1,0.72,0.685633,0.743333,0.149883,784,3845,4629
188,2011-07-07,3,0,7,0,4,1,1,0.75,0.686871,0.65125,0.1592,754,3838,4592
189,2011-07-08,3,0,7,0,5,1,2,0.709167,0.670483,0.757917,0.225129,692,3348,4040
190,2011-07-09,3,0,7,0,6,0,1,0.733333,0.664158,0.609167,0.167912,1988,3348,5336
191,2011-07-10,3,0,7,0,0,0,1,0.7475,0.690025,0.578333,0.183471,1743,3138,4881
192,2011-07-11,3,0,7,0,1,1,1,0.7625,0.729804,0.635833,0.282337,723,3363,4086
193,2011-07-12,3,0,7,0,2,1,1,0.794167,0.739275,0.559167,0.200254,662,3596,4258
194,2011-07-13,3,0,7,0,3,1,1,0.746667,0.689404,0.631667,0.146133,748,3594,4342
195,2011-07-14,3,0,7,0,4,1,1,0.680833,0.635104,0.47625,0.240667,888,4196,5084
196,2011-07-15,3,0,7,0,5,1,1,0.663333,0.624371,0.59125,0.182833,1318,4220,5538
197,2011-07-16,3,0,7,0,6,0,1,0.686667,0.638263,0.585,0.208342,2418,3505,5923
198,2011-07-17,3,0,7,0,0,0,1,0.719167,0.669833,0.604167,0.245033,2006,3296,5302
199,2011-07-18,3,0,7,0,1,1,1,0.746667,0.703925,0.65125,0.215804,841,3617,4458
200,2011-07-19,3,0,7,0,2,1,1,0.776667,0.747479,0.650417,0.1306,752,3789,4541
201,2011-07-20,3,0,7,0,3,1,1,0.768333,0.74685,0.707083,0.113817,644,3688,4332
202,2011-07-21,3,0,7,0,4,1,2,0.815,0.826371,0.69125,0.222021,632,3152,3784
203,2011-07-22,3,0,7,0,5,1,1,0.848333,0.840896,0.580417,0.1331,562,2825,3387
204,2011-07-23,3,0,7,0,6,0,1,0.849167,0.804287,0.5,0.131221,987,2298,3285
205,2011-07-24,3,0,7,0,0,0,1,0.83,0.794829,0.550833,0.169171,1050,2556,3606
206,2011-07-25,3,0,7,0,1,1,1,0.743333,0.720958,0.757083,0.0908083,568,3272,3840
207,2011-07-26,3,0,7,0,2,1,1,0.771667,0.696979,0.540833,0.200258,750,3840,4590
208,2011-07-27,3,0,7,0,3,1,1,0.775,0.690667,0.402917,0.183463,755,3901,4656
209,2011-07-28,3,0,7,0,4,1,1,0.779167,0.7399,0.583333,0.178479,606,3784,4390
210,2011-07-29,3,0,7,0,5,1,1,0.838333,0.785967,0.5425,0.174138,670,3176,3846
211,2011-07-30,3,0,7,0,6,0,1,0.804167,0.728537,0.465833,0.168537,1559,2916,4475
212,2011-07-31,3,0,7,0,0,0,1,0.805833,0.729796,0.480833,0.164813,1524,2778,4302
213,2011-08-01,3,0,8,0,1,1,1,0.771667,0.703292,0.550833,0.156717,729,3537,4266
214,2011-08-02,3,0,8,0,2,1,1,0.783333,0.707071,0.49125,0.20585,801,4044,4845
215,2011-08-03,3,0,8,0,3,1,2,0.731667,0.679937,0.6575,0.135583,467,3107,3574
216,2011-08-04,3,0,8,0,4,1,2,0.71,0.664788,0.7575,0.19715,799,3777,4576
217,2011-08-05,3,0,8,0,5,1,1,0.710833,0.656567,0.630833,0.184696,1023,3843,4866
218,2011-08-06,3,0,8,0,6,0,2,0.716667,0.676154,0.755,0.22825,1521,2773,4294
219,2011-08-07,3,0,8,0,0,0,1,0.7425,0.715292,0.752917,0.201487,1298,2487,3785
220,2011-08-08,3,0,8,0,1,1,1,0.765,0.703283,0.592083,0.192175,846,3480,4326
221,2011-08-09,3,0,8,0,2,1,1,0.775,0.724121,0.570417,0.151121,907,3695,4602
222,2011-08-10,3,0,8,0,3,1,1,0.766667,0.684983,0.424167,0.200258,884,3896,4780
223,2011-08-11,3,0,8,0,4,1,1,0.7175,0.651521,0.42375,0.164796,812,3980,4792
224,2011-08-12,3,0,8,0,5,1,1,0.708333,0.654042,0.415,0.125621,1051,3854,4905
225,2011-08-13,3,0,8,0,6,0,2,0.685833,0.645858,0.729583,0.211454,1504,2646,4150
226,2011-08-14,3,0,8,0,0,0,2,0.676667,0.624388,0.8175,0.222633,1338,2482,3820
227,2011-08-15,3,0,8,0,1,1,1,0.665833,0.616167,0.712083,0.208954,775,3563,4338
228,2011-08-16,3,0,8,0,2,1,1,0.700833,0.645837,0.578333,0.236329,721,4004,4725
229,2011-08-17,3,0,8,0,3,1,1,0.723333,0.666671,0.575417,0.143667,668,4026,4694
230,2011-08-18,3,0,8,0,4,1,1,0.711667,0.662258,0.654583,0.233208,639,3166,3805
231,2011-08-19,3,0,8,0,5,1,2,0.685,0.633221,0.722917,0.139308,797,3356,4153
232,2011-08-20,3,0,8,0,6,0,1,0.6975,0.648996,0.674167,0.104467,1914,3277,5191
233,2011-08-21,3,0,8,0,0,0,1,0.710833,0.675525,0.77,0.248754,1249,2624,3873
234,2011-08-22,3,0,8,0,1,1,1,0.691667,0.638254,0.47,0.27675,833,3925,4758
235,2011-08-23,3,0,8,0,2,1,1,0.640833,0.606067,0.455417,0.146763,1281,4614,5895
236,2011-08-24,3,0,8,0,3,1,1,0.673333,0.630692,0.605,0.253108,949,4181,5130
237,2011-08-25,3,0,8,0,4,1,2,0.684167,0.645854,0.771667,0.210833,435,3107,3542
238,2011-08-26,3,0,8,0,5,1,1,0.7,0.659733,0.76125,0.0839625,768,3893,4661
239,2011-08-27,3,0,8,0,6,0,2,0.68,0.635556,0.85,0.375617,226,889,1115
240,2011-08-28,3,0,8,0,0,0,1,0.707059,0.647959,0.561765,0.304659,1415,2919,4334
241,2011-08-29,3,0,8,0,1,1,1,0.636667,0.607958,0.554583,0.159825,729,3905,4634
242,2011-08-30,3,0,8,0,2,1,1,0.639167,0.594704,0.548333,0.125008,775,4429,5204
243,2011-08-31,3,0,8,0,3,1,1,0.656667,0.611121,0.597917,0.0833333,688,4370,5058
244,2011-09-01,3,0,9,0,4,1,1,0.655,0.614921,0.639167,0.141796,783,4332,5115
245,2011-09-02,3,0,9,0,5,1,2,0.643333,0.604808,0.727083,0.139929,875,3852,4727
246,2011-09-03,3,0,9,0,6,0,1,0.669167,0.633213,0.716667,0.185325,1935,2549,4484
247,2011-09-04,3,0,9,0,0,0,1,0.709167,0.665429,0.742083,0.206467,2521,2419,4940
248,2011-09-05,3,0,9,1,1,0,2,0.673333,0.625646,0.790417,0.212696,1236,2115,3351
249,2011-09-06,3,0,9,0,2,1,3,0.54,0.5152,0.886957,0.343943,204,2506,2710
250,2011-09-07,3,0,9,0,3,1,3,0.599167,0.544229,0.917083,0.0970208,118,1878,1996
251,2011-09-08,3,0,9,0,4,1,3,0.633913,0.555361,0.939565,0.192748,153,1689,1842
252,2011-09-09,3,0,9,0,5,1,2,0.65,0.578946,0.897917,0.124379,417,3127,3544
253,2011-09-10,3,0,9,0,6,0,1,0.66,0.607962,0.75375,0.153608,1750,3595,5345
254,2011-09-11,3,0,9,0,0,0,1,0.653333,0.609229,0.71375,0.115054,1633,3413,5046
255,2011-09-12,3,0,9,0,1,1,1,0.644348,0.60213,0.692174,0.088913,690,4023,4713
256,2011-09-13,3,0,9,0,2,1,1,0.650833,0.603554,0.7125,0.141804,701,4062,4763
257,2011-09-14,3,0,9,0,3,1,1,0.673333,0.6269,0.697083,0.1673,647,4138,4785
258,2011-09-15,3,0,9,0,4,1,2,0.5775,0.553671,0.709167,0.271146,428,3231,3659
259,2011-09-16,3,0,9,0,5,1,2,0.469167,0.461475,0.590417,0.164183,742,4018,4760
260,2011-09-17,3,0,9,0,6,0,2,0.491667,0.478512,0.718333,0.189675,1434,3077,4511
261,2011-09-18,3,0,9,0,0,0,1,0.5075,0.490537,0.695,0.178483,1353,2921,4274
262,2011-09-19,3,0,9,0,1,1,2,0.549167,0.529675,0.69,0.151742,691,3848,4539
263,2011-09-20,3,0,9,0,2,1,2,0.561667,0.532217,0.88125,0.134954,438,3203,3641
264,2011-09-21,3,0,9,0,3,1,2,0.595,0.550533,0.9,0.0964042,539,3813,4352
265,2011-09-22,3,0,9,0,4,1,2,0.628333,0.554963,0.902083,0.128125,555,4240,4795
266,2011-09-23,4,0,9,0,5,1,2,0.609167,0.522125,0.9725,0.0783667,258,2137,2395
267,2011-09-24,4,0,9,0,6,0,2,0.606667,0.564412,0.8625,0.0783833,1776,3647,5423
268,2011-09-25,4,0,9,0,0,0,2,0.634167,0.572637,0.845,0.0503792,1544,3466,5010
269,2011-09-26,4,0,9,0,1,1,2,0.649167,0.589042,0.848333,0.1107,684,3946,4630
270,2011-09-27,4,0,9,0,2,1,2,0.636667,0.574525,0.885417,0.118171,477,3643,4120
271,2011-09-28,4,0,9,0,3,1,2,0.635,0.575158,0.84875,0.148629,480,3427,3907
272,2011-09-29,4,0,9,0,4,1,1,0.616667,0.574512,0.699167,0.172883,653,4186,4839
273,2011-09-30,4,0,9,0,5,1,1,0.564167,0.544829,0.6475,0.206475,830,4372,5202
274,2011-10-01,4,0,10,0,6,0,2,0.41,0.412863,0.75375,0.292296,480,1949,2429
275,2011-10-02,4,0,10,0,0,0,2,0.356667,0.345317,0.791667,0.222013,616,2302,2918
276,2011-10-03,4,0,10,0,1,1,2,0.384167,0.392046,0.760833,0.0833458,330,3240,3570
277,2011-10-04,4,0,10,0,2,1,1,0.484167,0.472858,0.71,0.205854,486,3970,4456
278,2011-10-05,4,0,10,0,3,1,1,0.538333,0.527138,0.647917,0.17725,559,4267,4826
279,2011-10-06,4,0,10,0,4,1,1,0.494167,0.480425,0.620833,0.134954,639,4126,4765
280,2011-10-07,4,0,10,0,5,1,1,0.510833,0.504404,0.684167,0.0223917,949,4036,4985
281,2011-10-08,4,0,10,0,6,0,1,0.521667,0.513242,0.70125,0.0454042,2235,3174,5409
282,2011-10-09,4,0,10,0,0,0,1,0.540833,0.523983,0.7275,0.06345,2397,3114,5511
283,2011-10-10,4,0,10,1,1,0,1,0.570833,0.542925,0.73375,0.0423042,1514,3603,5117
284,2011-10-11,4,0,10,0,2,1,2,0.566667,0.546096,0.80875,0.143042,667,3896,4563
285,2011-10-12,4,0,10,0,3,1,3,0.543333,0.517717,0.90625,0.24815,217,2199,2416
286,2011-10-13,4,0,10,0,4,1,2,0.589167,0.551804,0.896667,0.141787,290,2623,2913
287,2011-10-14,4,0,10,0,5,1,2,0.550833,0.529675,0.71625,0.223883,529,3115,3644
288,2011-10-15,4,0,10,0,6,0,1,0.506667,0.498725,0.483333,0.258083,1899,3318,5217
289,2011-10-16,4,0,10,0,0,0,1,0.511667,0.503154,0.486667,0.281717,1748,3293,5041
290,2011-10-17,4,0,10,0,1,1,1,0.534167,0.510725,0.579583,0.175379,713,3857,4570
291,2011-10-18,4,0,10,0,2,1,2,0.5325,0.522721,0.701667,0.110087,637,4111,4748
292,2011-10-19,4,0,10,0,3,1,3,0.541739,0.513848,0.895217,0.243339,254,2170,2424
293,2011-10-20,4,0,10,0,4,1,1,0.475833,0.466525,0.63625,0.422275,471,3724,4195
294,2011-10-21,4,0,10,0,5,1,1,0.4275,0.423596,0.574167,0.221396,676,3628,4304
295,2011-10-22,4,0,10,0,6,0,1,0.4225,0.425492,0.629167,0.0926667,1499,2809,4308
296,2011-10-23,4,0,10,0,0,0,1,0.421667,0.422333,0.74125,0.0995125,1619,2762,4381
297,2011-10-24,4,0,10,0,1,1,1,0.463333,0.457067,0.772083,0.118792,699,3488,4187
298,2011-10-25,4,0,10,0,2,1,1,0.471667,0.463375,0.622917,0.166658,695,3992,4687
299,2011-10-26,4,0,10,0,3,1,2,0.484167,0.472846,0.720417,0.148642,404,3490,3894
300,2011-10-27,4,0,10,0,4,1,2,0.47,0.457046,0.812917,0.197763,240,2419,2659
301,2011-10-28,4,0,10,0,5,1,2,0.330833,0.318812,0.585833,0.229479,456,3291,3747
302,2011-10-29,4,0,10,0,6,0,3,0.254167,0.227913,0.8825,0.351371,57,570,627
303,2011-10-30,4,0,10,0,0,0,1,0.319167,0.321329,0.62375,0.176617,885,2446,3331
304,2011-10-31,4,0,10,0,1,1,1,0.34,0.356063,0.703333,0.10635,362,3307,3669
305,2011-11-01,4,0,11,0,2,1,1,0.400833,0.397088,0.68375,0.135571,410,3658,4068
306,2011-11-02,4,0,11,0,3,1,1,0.3775,0.390133,0.71875,0.0820917,370,3816,4186
307,2011-11-03,4,0,11,0,4,1,1,0.408333,0.405921,0.702083,0.136817,318,3656,3974
308,2011-11-04,4,0,11,0,5,1,2,0.403333,0.403392,0.6225,0.271779,470,3576,4046
309,2011-11-05,4,0,11,0,6,0,1,0.326667,0.323854,0.519167,0.189062,1156,2770,3926
310,2011-11-06,4,0,11,0,0,0,1,0.348333,0.362358,0.734583,0.0920542,952,2697,3649
311,2011-11-07,4,0,11,0,1,1,1,0.395,0.400871,0.75875,0.057225,373,3662,4035
312,2011-11-08,4,0,11,0,2,1,1,0.408333,0.412246,0.721667,0.0690375,376,3829,4205
313,2011-11-09,4,0,11,0,3,1,1,0.4,0.409079,0.758333,0.0621958,305,3804,4109
314,2011-11-10,4,0,11,0,4,1,2,0.38,0.373721,0.813333,0.189067,190,2743,2933
315,2011-11-11,4,0,11,1,5,0,1,0.324167,0.306817,0.44625,0.314675,440,2928,3368
316,2011-11-12,4,0,11,0,6,0,1,0.356667,0.357942,0.552917,0.212062,1275,2792,4067
317,2011-11-13,4,0,11,0,0,0,1,0.440833,0.43055,0.458333,0.281721,1004,2713,3717
318,2011-11-14,4,0,11,0,1,1,1,0.53,0.524612,0.587083,0.306596,595,3891,4486
319,2011-11-15,4,0,11,0,2,1,2,0.53,0.507579,0.68875,0.199633,449,3746,4195
320,2011-11-16,4,0,11,0,3,1,3,0.456667,0.451988,0.93,0.136829,145,1672,1817
321,2011-11-17,4,0,11,0,4,1,2,0.341667,0.323221,0.575833,0.305362,139,2914,3053
322,2011-11-18,4,0,11,0,5,1,1,0.274167,0.272721,0.41,0.168533,245,3147,3392
323,2011-11-19,4,0,11,0,6,0,1,0.329167,0.324483,0.502083,0.224496,943,2720,3663
324,2011-11-20,4,0,11,0,0,0,2,0.463333,0.457058,0.684583,0.18595,787,2733,3520
325,2011-11-21,4,0,11,0,1,1,3,0.4475,0.445062,0.91,0.138054,220,2545,2765
326,2011-11-22,4,0,11,0,2,1,3,0.416667,0.421696,0.9625,0.118792,69,1538,1607
327,2011-11-23,4,0,11,0,3,1,2,0.440833,0.430537,0.757917,0.335825,112,2454,2566
328,2011-11-24,4,0,11,1,4,0,1,0.373333,0.372471,0.549167,0.167304,560,935,1495
329,2011-11-25,4,0,11,0,5,1,1,0.375,0.380671,0.64375,0.0988958,1095,1697,2792
330,2011-11-26,4,0,11,0,6,0,1,0.375833,0.385087,0.681667,0.0684208,1249,1819,3068
331,2011-11-27,4,0,11,0,0,0,1,0.459167,0.4558,0.698333,0.208954,810,2261,3071
332,2011-11-28,4,0,11,0,1,1,1,0.503478,0.490122,0.743043,0.142122,253,3614,3867
333,2011-11-29,4,0,11,0,2,1,2,0.458333,0.451375,0.830833,0.258092,96,2818,2914
334,2011-11-30,4,0,11,0,3,1,1,0.325,0.311221,0.613333,0.271158,188,3425,3613
335,2011-12-01,4,0,12,0,4,1,1,0.3125,0.305554,0.524583,0.220158,182,3545,3727
336,2011-12-02,4,0,12,0,5,1,1,0.314167,0.331433,0.625833,0.100754,268,3672,3940
337,2011-12-03,4,0,12,0,6,0,1,0.299167,0.310604,0.612917,0.0957833,706,2908,3614
338,2011-12-04,4,0,12,0,0,0,1,0.330833,0.3491,0.775833,0.0839583,634,2851,3485
339,2011-12-05,4,0,12,0,1,1,2,0.385833,0.393925,0.827083,0.0622083,233,3578,3811
340,2011-12-06,4,0,12,0,2,1,3,0.4625,0.4564,0.949583,0.232583,126,2468,2594
341,2011-12-07,4,0,12,0,3,1,3,0.41,0.400246,0.970417,0.266175,50,655,705
342,2011-12-08,4,0,12,0,4,1,1,0.265833,0.256938,0.58,0.240058,150,3172,3322
343,2011-12-09,4,0,12,0,5,1,1,0.290833,0.317542,0.695833,0.0827167,261,3359,3620
344,2011-12-10,4,0,12,0,6,0,1,0.275,0.266412,0.5075,0.233221,502,2688,3190
345,2011-12-11,4,0,12,0,0,0,1,0.220833,0.253154,0.49,0.0665417,377,2366,2743
346,2011-12-12,4,0,12,0,1,1,1,0.238333,0.270196,0.670833,0.06345,143,3167,3310
347,2011-12-13,4,0,12,0,2,1,1,0.2825,0.301138,0.59,0.14055,155,3368,3523
348,2011-12-14,4,0,12,0,3,1,2,0.3175,0.338362,0.66375,0.0609583,178,3562,3740
349,2011-12-15,4,0,12,0,4,1,2,0.4225,0.412237,0.634167,0.268042,181,3528,3709
350,2011-12-16,4,0,12,0,5,1,2,0.375,0.359825,0.500417,0.260575,178,3399,3577
351,2011-12-17,4,0,12,0,6,0,2,0.258333,0.249371,0.560833,0.243167,275,2464,2739
352,2011-12-18,4,0,12,0,0,0,1,0.238333,0.245579,0.58625,0.169779,220,2211,2431
353,2011-12-19,4,0,12,0,1,1,1,0.276667,0.280933,0.6375,0.172896,260,3143,3403
354,2011-12-20,4,0,12,0,2,1,2,0.385833,0.396454,0.595417,0.0615708,216,3534,3750
355,2011-12-21,1,0,12,0,3,1,2,0.428333,0.428017,0.858333,0.2214,107,2553,2660
356,2011-12-22,1,0,12,0,4,1,2,0.423333,0.426121,0.7575,0.047275,227,2841,3068
357,2011-12-23,1,0,12,0,5,1,1,0.373333,0.377513,0.68625,0.274246,163,2046,2209
358,2011-12-24,1,0,12,0,6,0,1,0.3025,0.299242,0.5425,0.190304,155,856,1011
359,2011-12-25,1,0,12,0,0,0,1,0.274783,0.279961,0.681304,0.155091,303,451,754
360,2011-12-26,1,0,12,1,1,0,1,0.321739,0.315535,0.506957,0.239465,430,887,1317
361,2011-12-27,1,0,12,0,2,1,2,0.325,0.327633,0.7625,0.18845,103,1059,1162
362,2011-12-28,1,0,12,0,3,1,1,0.29913,0.279974,0.503913,0.293961,255,2047,2302
363,2011-12-29,1,0,12,0,4,1,1,0.248333,0.263892,0.574167,0.119412,254,2169,2423
364,2011-12-30,1,0,12,0,5,1,1,0.311667,0.318812,0.636667,0.134337,491,2508,2999
365,2011-12-31,1,0,12,0,6,0,1,0.41,0.414121,0.615833,0.220154,665,1820,2485
366,2012-01-01,1,1,1,0,0,0,1,0.37,0.375621,0.6925,0.192167,686,1608,2294
367,2012-01-02,1,1,1,1,1,0,1,0.273043,0.252304,0.381304,0.329665,244,1707,1951
368,2012-01-03,1,1,1,0,2,1,1,0.15,0.126275,0.44125,0.365671,89,2147,2236
369,2012-01-04,1,1,1,0,3,1,2,0.1075,0.119337,0.414583,0.1847,95,2273,2368
370,2012-01-05,1,1,1,0,4,1,1,0.265833,0.278412,0.524167,0.129987,140,3132,3272
371,2012-01-06,1,1,1,0,5,1,1,0.334167,0.340267,0.542083,0.167908,307,3791,4098
372,2012-01-07,1,1,1,0,6,0,1,0.393333,0.390779,0.531667,0.174758,1070,3451,4521
373,2012-01-08,1,1,1,0,0,0,1,0.3375,0.340258,0.465,0.191542,599,2826,3425
374,2012-01-09,1,1,1,0,1,1,2,0.224167,0.247479,0.701667,0.0989,106,2270,2376
375,2012-01-10,1,1,1,0,2,1,1,0.308696,0.318826,0.646522,0.187552,173,3425,3598
376,2012-01-11,1,1,1,0,3,1,2,0.274167,0.282821,0.8475,0.131221,92,2085,2177
377,2012-01-12,1,1,1,0,4,1,2,0.3825,0.381938,0.802917,0.180967,269,3828,4097
378,2012-01-13,1,1,1,0,5,1,1,0.274167,0.249362,0.5075,0.378108,174,3040,3214
379,2012-01-14,1,1,1,0,6,0,1,0.18,0.183087,0.4575,0.187183,333,2160,2493
380,2012-01-15,1,1,1,0,0,0,1,0.166667,0.161625,0.419167,0.251258,284,2027,2311
381,2012-01-16,1,1,1,1,1,0,1,0.19,0.190663,0.5225,0.231358,217,2081,2298
382,2012-01-17,1,1,1,0,2,1,2,0.373043,0.364278,0.716087,0.34913,127,2808,2935
383,2012-01-18,1,1,1,0,3,1,1,0.303333,0.275254,0.443333,0.415429,109,3267,3376
384,2012-01-19,1,1,1,0,4,1,1,0.19,0.190038,0.4975,0.220158,130,3162,3292
385,2012-01-20,1,1,1,0,5,1,2,0.2175,0.220958,0.45,0.20275,115,3048,3163
386,2012-01-21,1,1,1,0,6,0,2,0.173333,0.174875,0.83125,0.222642,67,1234,1301
387,2012-01-22,1,1,1,0,0,0,2,0.1625,0.16225,0.79625,0.199638,196,1781,1977
388,2012-01-23,1,1,1,0,1,1,2,0.218333,0.243058,0.91125,0.110708,145,2287,2432
389,2012-01-24,1,1,1,0,2,1,1,0.3425,0.349108,0.835833,0.123767,439,3900,4339
390,2012-01-25,1,1,1,0,3,1,1,0.294167,0.294821,0.64375,0.161071,467,3803,4270
391,2012-01-26,1,1,1,0,4,1,2,0.341667,0.35605,0.769583,0.0733958,244,3831,4075
392,2012-01-27,1,1,1,0,5,1,2,0.425,0.415383,0.74125,0.342667,269,3187,3456
393,2012-01-28,1,1,1,0,6,0,1,0.315833,0.326379,0.543333,0.210829,775,3248,4023
394,2012-01-29,1,1,1,0,0,0,1,0.2825,0.272721,0.31125,0.24005,558,2685,3243
395,2012-01-30,1,1,1,0,1,1,1,0.269167,0.262625,0.400833,0.215792,126,3498,3624
396,2012-01-31,1,1,1,0,2,1,1,0.39,0.381317,0.416667,0.261817,324,4185,4509
397,2012-02-01,1,1,2,0,3,1,1,0.469167,0.466538,0.507917,0.189067,304,4275,4579
398,2012-02-02,1,1,2,0,4,1,2,0.399167,0.398971,0.672917,0.187187,190,3571,3761
399,2012-02-03,1,1,2,0,5,1,1,0.313333,0.309346,0.526667,0.178496,310,3841,4151
400,2012-02-04,1,1,2,0,6,0,2,0.264167,0.272725,0.779583,0.121896,384,2448,2832
401,2012-02-05,1,1,2,0,0,0,2,0.265833,0.264521,0.687917,0.175996,318,2629,2947
402,2012-02-06,1,1,2,0,1,1,1,0.282609,0.296426,0.622174,0.1538,206,3578,3784
403,2012-02-07,1,1,2,0,2,1,1,0.354167,0.361104,0.49625,0.147379,199,4176,4375
404,2012-02-08,1,1,2,0,3,1,2,0.256667,0.266421,0.722917,0.133721,109,2693,2802
405,2012-02-09,1,1,2,0,4,1,1,0.265,0.261988,0.562083,0.194037,163,3667,3830
406,2012-02-10,1,1,2,0,5,1,2,0.280833,0.293558,0.54,0.116929,227,3604,3831
407,2012-02-11,1,1,2,0,6,0,3,0.224167,0.210867,0.73125,0.289796,192,1977,2169
408,2012-02-12,1,1,2,0,0,0,1,0.1275,0.101658,0.464583,0.409212,73,1456,1529
409,2012-02-13,1,1,2,0,1,1,1,0.2225,0.227913,0.41125,0.167283,94,3328,3422
410,2012-02-14,1,1,2,0,2,1,2,0.319167,0.333946,0.50875,0.141179,135,3787,3922
411,2012-02-15,1,1,2,0,3,1,1,0.348333,0.351629,0.53125,0.1816,141,4028,4169
412,2012-02-16,1,1,2,0,4,1,2,0.316667,0.330162,0.752917,0.091425,74,2931,3005
413,2012-02-17,1,1,2,0,5,1,1,0.343333,0.351629,0.634583,0.205846,349,3805,4154
414,2012-02-18,1,1,2,0,6,0,1,0.346667,0.355425,0.534583,0.190929,1435,2883,4318
415,2012-02-19,1,1,2,0,0,0,2,0.28,0.265788,0.515833,0.253112,618,2071,2689
416,2012-02-20,1,1,2,1,1,0,1,0.28,0.273391,0.507826,0.229083,502,2627,3129
417,2012-02-21,1,1,2,0,2,1,1,0.287826,0.295113,0.594348,0.205717,163,3614,3777
418,2012-02-22,1,1,2,0,3,1,1,0.395833,0.392667,0.567917,0.234471,394,4379,4773
419,2012-02-23,1,1,2,0,4,1,1,0.454167,0.444446,0.554583,0.190913,516,4546,5062
420,2012-02-24,1,1,2,0,5,1,2,0.4075,0.410971,0.7375,0.237567,246,3241,3487
421,2012-02-25,1,1,2,0,6,0,1,0.290833,0.255675,0.395833,0.421642,317,2415,2732
422,2012-02-26,1,1,2,0,0,0,1,0.279167,0.268308,0.41,0.205229,515,2874,3389
423,2012-02-27,1,1,2,0,1,1,1,0.366667,0.357954,0.490833,0.268033,253,4069,4322
424,2012-02-28,1,1,2,0,2,1,1,0.359167,0.353525,0.395833,0.193417,229,4134,4363
425,2012-02-29,1,1,2,0,3,1,2,0.344348,0.34847,0.804783,0.179117,65,1769,1834
426,2012-03-01,1,1,3,0,4,1,1,0.485833,0.475371,0.615417,0.226987,325,4665,4990
427,2012-03-02,1,1,3,0,5,1,2,0.353333,0.359842,0.657083,0.144904,246,2948,3194
428,2012-03-03,1,1,3,0,6,0,2,0.414167,0.413492,0.62125,0.161079,956,3110,4066
429,2012-03-04,1,1,3,0,0,0,1,0.325833,0.303021,0.403333,0.334571,710,2713,3423
430,2012-03-05,1,1,3,0,1,1,1,0.243333,0.241171,0.50625,0.228858,203,3130,3333
431,2012-03-06,1,1,3,0,2,1,1,0.258333,0.255042,0.456667,0.200875,221,3735,3956
432,2012-03-07,1,1,3,0,3,1,1,0.404167,0.3851,0.513333,0.345779,432,4484,4916
433,2012-03-08,1,1,3,0,4,1,1,0.5275,0.524604,0.5675,0.441563,486,4896,5382
434,2012-03-09,1,1,3,0,5,1,2,0.410833,0.397083,0.407083,0.4148,447,4122,4569
435,2012-03-10,1,1,3,0,6,0,1,0.2875,0.277767,0.350417,0.22575,968,3150,4118
436,2012-03-11,1,1,3,0,0,0,1,0.361739,0.35967,0.476957,0.222587,1658,3253,4911
437,2012-03-12,1,1,3,0,1,1,1,0.466667,0.459592,0.489167,0.207713,838,4460,5298
438,2012-03-13,1,1,3,0,2,1,1,0.565,0.542929,0.6175,0.23695,762,5085,5847
439,2012-03-14,1,1,3,0,3,1,1,0.5725,0.548617,0.507083,0.115062,997,5315,6312
440,2012-03-15,1,1,3,0,4,1,1,0.5575,0.532825,0.579583,0.149883,1005,5187,6192
441,2012-03-16,1,1,3,0,5,1,2,0.435833,0.436229,0.842083,0.113192,548,3830,4378
442,2012-03-17,1,1,3,0,6,0,2,0.514167,0.505046,0.755833,0.110704,3155,4681,7836
443,2012-03-18,1,1,3,0,0,0,2,0.4725,0.464,0.81,0.126883,2207,3685,5892
444,2012-03-19,1,1,3,0,1,1,1,0.545,0.532821,0.72875,0.162317,982,5171,6153
445,2012-03-20,1,1,3,0,2,1,1,0.560833,0.538533,0.807917,0.121271,1051,5042,6093
446,2012-03-21,2,1,3,0,3,1,2,0.531667,0.513258,0.82125,0.0895583,1122,5108,6230
447,2012-03-22,2,1,3,0,4,1,1,0.554167,0.531567,0.83125,0.117562,1334,5537,6871
448,2012-03-23,2,1,3,0,5,1,2,0.601667,0.570067,0.694167,0.1163,2469,5893,8362
449,2012-03-24,2,1,3,0,6,0,2,0.5025,0.486733,0.885417,0.192783,1033,2339,3372
450,2012-03-25,2,1,3,0,0,0,2,0.4375,0.437488,0.880833,0.220775,1532,3464,4996
451,2012-03-26,2,1,3,0,1,1,1,0.445833,0.43875,0.477917,0.386821,795,4763,5558
452,2012-03-27,2,1,3,0,2,1,1,0.323333,0.315654,0.29,0.187192,531,4571,5102
453,2012-03-28,2,1,3,0,3,1,1,0.484167,0.47095,0.48125,0.291671,674,5024,5698
454,2012-03-29,2,1,3,0,4,1,1,0.494167,0.482304,0.439167,0.31965,834,5299,6133
455,2012-03-30,2,1,3,0,5,1,2,0.37,0.375621,0.580833,0.138067,796,4663,5459
456,2012-03-31,2,1,3,0,6,0,2,0.424167,0.421708,0.738333,0.250617,2301,3934,6235
457,2012-04-01,2,1,4,0,0,0,2,0.425833,0.417287,0.67625,0.172267,2347,3694,6041
458,2012-04-02,2,1,4,0,1,1,1,0.433913,0.427513,0.504348,0.312139,1208,4728,5936
459,2012-04-03,2,1,4,0,2,1,1,0.466667,0.461483,0.396667,0.100133,1348,5424,6772
460,2012-04-04,2,1,4,0,3,1,1,0.541667,0.53345,0.469583,0.180975,1058,5378,6436
461,2012-04-05,2,1,4,0,4,1,1,0.435,0.431163,0.374167,0.219529,1192,5265,6457
462,2012-04-06,2,1,4,0,5,1,1,0.403333,0.390767,0.377083,0.300388,1807,4653,6460
463,2012-04-07,2,1,4,0,6,0,1,0.4375,0.426129,0.254167,0.274871,3252,3605,6857
464,2012-04-08,2,1,4,0,0,0,1,0.5,0.492425,0.275833,0.232596,2230,2939,5169
465,2012-04-09,2,1,4,0,1,1,1,0.489167,0.476638,0.3175,0.358196,905,4680,5585
466,2012-04-10,2,1,4,0,2,1,1,0.446667,0.436233,0.435,0.249375,819,5099,5918
467,2012-04-11,2,1,4,0,3,1,1,0.348696,0.337274,0.469565,0.295274,482,4380,4862
468,2012-04-12,2,1,4,0,4,1,1,0.3975,0.387604,0.46625,0.290429,663,4746,5409
469,2012-04-13,2,1,4,0,5,1,1,0.4425,0.431808,0.408333,0.155471,1252,5146,6398
470,2012-04-14,2,1,4,0,6,0,1,0.495,0.487996,0.502917,0.190917,2795,4665,7460
471,2012-04-15,2,1,4,0,0,0,1,0.606667,0.573875,0.507917,0.225129,2846,4286,7132
472,2012-04-16,2,1,4,1,1,0,1,0.664167,0.614925,0.561667,0.284829,1198,5172,6370
473,2012-04-17,2,1,4,0,2,1,1,0.608333,0.598487,0.390417,0.273629,989,5702,6691
474,2012-04-18,2,1,4,0,3,1,2,0.463333,0.457038,0.569167,0.167912,347,4020,4367
475,2012-04-19,2,1,4,0,4,1,1,0.498333,0.493046,0.6125,0.0659292,846,5719,6565
476,2012-04-20,2,1,4,0,5,1,1,0.526667,0.515775,0.694583,0.149871,1340,5950,7290
477,2012-04-21,2,1,4,0,6,0,1,0.57,0.542921,0.682917,0.283587,2541,4083,6624
478,2012-04-22,2,1,4,0,0,0,3,0.396667,0.389504,0.835417,0.344546,120,907,1027
479,2012-04-23,2,1,4,0,1,1,2,0.321667,0.301125,0.766667,0.303496,195,3019,3214
480,2012-04-24,2,1,4,0,2,1,1,0.413333,0.405283,0.454167,0.249383,518,5115,5633
481,2012-04-25,2,1,4,0,3,1,1,0.476667,0.470317,0.427917,0.118792,655,5541,6196
482,2012-04-26,2,1,4,0,4,1,2,0.498333,0.483583,0.756667,0.176625,475,4551,5026
483,2012-04-27,2,1,4,0,5,1,1,0.4575,0.452637,0.400833,0.347633,1014,5219,6233
484,2012-04-28,2,1,4,0,6,0,2,0.376667,0.377504,0.489583,0.129975,1120,3100,4220
485,2012-04-29,2,1,4,0,0,0,1,0.458333,0.450121,0.587083,0.116908,2229,4075,6304
486,2012-04-30,2,1,4,0,1,1,2,0.464167,0.457696,0.57,0.171638,665,4907,5572
487,2012-05-01,2,1,5,0,2,1,2,0.613333,0.577021,0.659583,0.156096,653,5087,5740
488,2012-05-02,2,1,5,0,3,1,1,0.564167,0.537896,0.797083,0.138058,667,5502,6169
489,2012-05-03,2,1,5,0,4,1,2,0.56,0.537242,0.768333,0.133696,764,5657,6421
490,2012-05-04,2,1,5,0,5,1,1,0.6275,0.590917,0.735417,0.162938,1069,5227,6296
491,2012-05-05,2,1,5,0,6,0,2,0.621667,0.584608,0.756667,0.152992,2496,4387,6883
492,2012-05-06,2,1,5,0,0,0,2,0.5625,0.546737,0.74,0.149879,2135,4224,6359
493,2012-05-07,2,1,5,0,1,1,2,0.5375,0.527142,0.664167,0.230721,1008,5265,6273
494,2012-05-08,2,1,5,0,2,1,2,0.581667,0.557471,0.685833,0.296029,738,4990,5728
495,2012-05-09,2,1,5,0,3,1,2,0.575,0.553025,0.744167,0.216412,620,4097,4717
496,2012-05-10,2,1,5,0,4,1,1,0.505833,0.491783,0.552083,0.314063,1026,5546,6572
497,2012-05-11,2,1,5,0,5,1,1,0.533333,0.520833,0.360417,0.236937,1319,5711,7030
498,2012-05-12,2,1,5,0,6,0,1,0.564167,0.544817,0.480417,0.123133,2622,4807,7429
499,2012-05-13,2,1,5,0,0,0,1,0.6125,0.585238,0.57625,0.225117,2172,3946,6118
500,2012-05-14,2,1,5,0,1,1,2,0.573333,0.5499,0.789583,0.212692,342,2501,2843
501,2012-05-15,2,1,5,0,2,1,2,0.611667,0.576404,0.794583,0.147392,625,4490,5115
502,2012-05-16,2,1,5,0,3,1,1,0.636667,0.595975,0.697917,0.122512,991,6433,7424
503,2012-05-17,2,1,5,0,4,1,1,0.593333,0.572613,0.52,0.229475,1242,6142,7384
504,2012-05-18,2,1,5,0,5,1,1,0.564167,0.551121,0.523333,0.136817,1521,6118,7639
505,2012-05-19,2,1,5,0,6,0,1,0.6,0.566908,0.45625,0.083975,3410,4884,8294
506,2012-05-20,2,1,5,0,0,0,1,0.620833,0.583967,0.530417,0.254367,2704,4425,7129
507,2012-05-21,2,1,5,0,1,1,2,0.598333,0.565667,0.81125,0.233204,630,3729,4359
508,2012-05-22,2,1,5,0,2,1,2,0.615,0.580825,0.765833,0.118167,819,5254,6073
509,2012-05-23,2,1,5,0,3,1,2,0.621667,0.584612,0.774583,0.102,766,4494,5260
510,2012-05-24,2,1,5,0,4,1,1,0.655,0.6067,0.716667,0.172896,1059,5711,6770
511,2012-05-25,2,1,5,0,5,1,1,0.68,0.627529,0.747083,0.14055,1417,5317,6734
512,2012-05-26,2,1,5,0,6,0,1,0.6925,0.642696,0.7325,0.198992,2855,3681,6536
513,2012-05-27,2,1,5,0,0,0,1,0.69,0.641425,0.697083,0.215171,3283,3308,6591
514,2012-05-28,2,1,5,1,1,0,1,0.7125,0.6793,0.67625,0.196521,2557,3486,6043
515,2012-05-29,2,1,5,0,2,1,1,0.7225,0.672992,0.684583,0.2954,880,4863,5743
516,2012-05-30,2,1,5,0,3,1,2,0.656667,0.611129,0.67,0.134329,745,6110,6855
517,2012-05-31,2,1,5,0,4,1,1,0.68,0.631329,0.492917,0.195279,1100,6238,7338
518,2012-06-01,2,1,6,0,5,1,2,0.654167,0.607962,0.755417,0.237563,533,3594,4127
519,2012-06-02,2,1,6,0,6,0,1,0.583333,0.566288,0.549167,0.186562,2795,5325,8120
520,2012-06-03,2,1,6,0,0,0,1,0.6025,0.575133,0.493333,0.184087,2494,5147,7641
521,2012-06-04,2,1,6,0,1,1,1,0.5975,0.578283,0.487083,0.284833,1071,5927,6998
522,2012-06-05,2,1,6,0,2,1,2,0.540833,0.525892,0.613333,0.209575,968,6033,7001
523,2012-06-06,2,1,6,0,3,1,1,0.554167,0.542292,0.61125,0.077125,1027,6028,7055
524,2012-06-07,2,1,6,0,4,1,1,0.6025,0.569442,0.567083,0.15735,1038,6456,7494
525,2012-06-08,2,1,6,0,5,1,1,0.649167,0.597862,0.467917,0.175383,1488,6248,7736
526,2012-06-09,2,1,6,0,6,0,1,0.710833,0.648367,0.437083,0.144287,2708,4790,7498
527,2012-06-10,2,1,6,0,0,0,1,0.726667,0.663517,0.538333,0.133721,2224,4374,6598
528,2012-06-11,2,1,6,0,1,1,2,0.720833,0.659721,0.587917,0.207713,1017,5647,6664
529,2012-06-12,2,1,6,0,2,1,2,0.653333,0.597875,0.833333,0.214546,477,4495,4972
530,2012-06-13,2,1,6,0,3,1,1,0.655833,0.611117,0.582083,0.343279,1173,6248,7421
531,2012-06-14,2,1,6,0,4,1,1,0.648333,0.624383,0.569583,0.253733,1180,6183,7363
532,2012-06-15,2,1,6,0,5,1,1,0.639167,0.599754,0.589583,0.176617,1563,6102,7665
533,2012-06-16,2,1,6,0,6,0,1,0.631667,0.594708,0.504167,0.166667,2963,4739,7702
534,2012-06-17,2,1,6,0,0,0,1,0.5925,0.571975,0.59875,0.144904,2634,4344,6978
535,2012-06-18,2,1,6,0,1,1,2,0.568333,0.544842,0.777917,0.174746,653,4446,5099
536,2012-06-19,2,1,6,0,2,1,1,0.688333,0.654692,0.69,0.148017,968,5857,6825
537,2012-06-20,2,1,6,0,3,1,1,0.7825,0.720975,0.592083,0.113812,872,5339,6211
538,2012-06-21,3,1,6,0,4,1,1,0.805833,0.752542,0.567917,0.118787,778,5127,5905
539,2012-06-22,3,1,6,0,5,1,1,0.7775,0.724121,0.57375,0.182842,964,4859,5823
540,2012-06-23,3,1,6,0,6,0,1,0.731667,0.652792,0.534583,0.179721,2657,4801,7458
541,2012-06-24,3,1,6,0,0,0,1,0.743333,0.674254,0.479167,0.145525,2551,4340,6891
542,2012-06-25,3,1,6,0,1,1,1,0.715833,0.654042,0.504167,0.300383,1139,5640,6779
543,2012-06-26,3,1,6,0,2,1,1,0.630833,0.594704,0.373333,0.347642,1077,6365,7442
544,2012-06-27,3,1,6,0,3,1,1,0.6975,0.640792,0.36,0.271775,1077,6258,7335
545,2012-06-28,3,1,6,0,4,1,1,0.749167,0.675512,0.4225,0.17165,921,5958,6879
546,2012-06-29,3,1,6,0,5,1,1,0.834167,0.786613,0.48875,0.165417,829,4634,5463
547,2012-06-30,3,1,6,0,6,0,1,0.765,0.687508,0.60125,0.161071,1455,4232,5687
548,2012-07-01,3,1,7,0,0,0,1,0.815833,0.750629,0.51875,0.168529,1421,4110,5531
549,2012-07-02,3,1,7,0,1,1,1,0.781667,0.702038,0.447083,0.195267,904,5323,6227
550,2012-07-03,3,1,7,0,2,1,1,0.780833,0.70265,0.492083,0.126237,1052,5608,6660
551,2012-07-04,3,1,7,1,3,0,1,0.789167,0.732337,0.53875,0.13495,2562,4841,7403
552,2012-07-05,3,1,7,0,4,1,1,0.8275,0.761367,0.457917,0.194029,1405,4836,6241
553,2012-07-06,3,1,7,0,5,1,1,0.828333,0.752533,0.450833,0.146142,1366,4841,6207
554,2012-07-07,3,1,7,0,6,0,1,0.861667,0.804913,0.492083,0.163554,1448,3392,4840
555,2012-07-08,3,1,7,0,0,0,1,0.8225,0.790396,0.57375,0.125629,1203,3469,4672
556,2012-07-09,3,1,7,0,1,1,2,0.710833,0.654054,0.683333,0.180975,998,5571,6569
557,2012-07-10,3,1,7,0,2,1,2,0.720833,0.664796,0.6675,0.151737,954,5336,6290
558,2012-07-11,3,1,7,0,3,1,1,0.716667,0.650271,0.633333,0.151733,975,6289,7264
559,2012-07-12,3,1,7,0,4,1,1,0.715833,0.654683,0.529583,0.146775,1032,6414,7446
560,2012-07-13,3,1,7,0,5,1,2,0.731667,0.667933,0.485833,0.08085,1511,5988,7499
561,2012-07-14,3,1,7,0,6,0,2,0.703333,0.666042,0.699167,0.143679,2355,4614,6969
562,2012-07-15,3,1,7,0,0,0,1,0.745833,0.705196,0.717917,0.166667,1920,4111,6031
563,2012-07-16,3,1,7,0,1,1,1,0.763333,0.724125,0.645,0.164187,1088,5742,6830
564,2012-07-17,3,1,7,0,2,1,1,0.818333,0.755683,0.505833,0.114429,921,5865,6786
565,2012-07-18,3,1,7,0,3,1,1,0.793333,0.745583,0.577083,0.137442,799,4914,5713
566,2012-07-19,3,1,7,0,4,1,1,0.77,0.714642,0.600417,0.165429,888,5703,6591
567,2012-07-20,3,1,7,0,5,1,2,0.665833,0.613025,0.844167,0.208967,747,5123,5870
568,2012-07-21,3,1,7,0,6,0,3,0.595833,0.549912,0.865417,0.2133,1264,3195,4459
569,2012-07-22,3,1,7,0,0,0,2,0.6675,0.623125,0.7625,0.0939208,2544,4866,7410
570,2012-07-23,3,1,7,0,1,1,1,0.741667,0.690017,0.694167,0.138683,1135,5831,6966
571,2012-07-24,3,1,7,0,2,1,1,0.750833,0.70645,0.655,0.211454,1140,6452,7592
572,2012-07-25,3,1,7,0,3,1,1,0.724167,0.654054,0.45,0.1648,1383,6790,8173
573,2012-07-26,3,1,7,0,4,1,1,0.776667,0.739263,0.596667,0.284813,1036,5825,6861
574,2012-07-27,3,1,7,0,5,1,1,0.781667,0.734217,0.594583,0.152992,1259,5645,6904
575,2012-07-28,3,1,7,0,6,0,1,0.755833,0.697604,0.613333,0.15735,2234,4451,6685
576,2012-07-29,3,1,7,0,0,0,1,0.721667,0.667933,0.62375,0.170396,2153,4444,6597
577,2012-07-30,3,1,7,0,1,1,1,0.730833,0.684987,0.66875,0.153617,1040,6065,7105
578,2012-07-31,3,1,7,0,2,1,1,0.713333,0.662896,0.704167,0.165425,968,6248,7216
579,2012-08-01,3,1,8,0,3,1,1,0.7175,0.667308,0.6775,0.141179,1074,6506,7580
580,2012-08-02,3,1,8,0,4,1,1,0.7525,0.707088,0.659583,0.129354,983,6278,7261
581,2012-08-03,3,1,8,0,5,1,2,0.765833,0.722867,0.6425,0.215792,1328,5847,7175
582,2012-08-04,3,1,8,0,6,0,1,0.793333,0.751267,0.613333,0.257458,2345,4479,6824
583,2012-08-05,3,1,8,0,0,0,1,0.769167,0.731079,0.6525,0.290421,1707,3757,5464
584,2012-08-06,3,1,8,0,1,1,2,0.7525,0.710246,0.654167,0.129354,1233,5780,7013
585,2012-08-07,3,1,8,0,2,1,2,0.735833,0.697621,0.70375,0.116908,1278,5995,7273
586,2012-08-08,3,1,8,0,3,1,2,0.75,0.707717,0.672917,0.1107,1263,6271,7534
587,2012-08-09,3,1,8,0,4,1,1,0.755833,0.699508,0.620417,0.1561,1196,6090,7286
588,2012-08-10,3,1,8,0,5,1,2,0.715833,0.667942,0.715833,0.238813,1065,4721,5786
589,2012-08-11,3,1,8,0,6,0,2,0.6925,0.638267,0.732917,0.206479,2247,4052,6299
590,2012-08-12,3,1,8,0,0,0,1,0.700833,0.644579,0.530417,0.122512,2182,4362,6544
591,2012-08-13,3,1,8,0,1,1,1,0.720833,0.662254,0.545417,0.136212,1207,5676,6883
592,2012-08-14,3,1,8,0,2,1,1,0.726667,0.676779,0.686667,0.169158,1128,5656,6784
593,2012-08-15,3,1,8,0,3,1,1,0.706667,0.654037,0.619583,0.169771,1198,6149,7347
594,2012-08-16,3,1,8,0,4,1,1,0.719167,0.654688,0.519167,0.141796,1338,6267,7605
595,2012-08-17,3,1,8,0,5,1,1,0.723333,0.2424,0.570833,0.231354,1483,5665,7148
596,2012-08-18,3,1,8,0,6,0,1,0.678333,0.618071,0.603333,0.177867,2827,5038,7865
597,2012-08-19,3,1,8,0,0,0,2,0.635833,0.603554,0.711667,0.08645,1208,3341,4549
598,2012-08-20,3,1,8,0,1,1,2,0.635833,0.595967,0.734167,0.129979,1026,5504,6530
599,2012-08-21,3,1,8,0,2,1,1,0.649167,0.601025,0.67375,0.0727708,1081,5925,7006
600,2012-08-22,3,1,8,0,3,1,1,0.6675,0.621854,0.677083,0.0702833,1094,6281,7375
601,2012-08-23,3,1,8,0,4,1,1,0.695833,0.637008,0.635833,0.0845958,1363,6402,7765
602,2012-08-24,3,1,8,0,5,1,2,0.7025,0.6471,0.615,0.0721458,1325,6257,7582
603,2012-08-25,3,1,8,0,6,0,2,0.661667,0.618696,0.712917,0.244408,1829,4224,6053
604,2012-08-26,3,1,8,0,0,0,2,0.653333,0.595996,0.845833,0.228858,1483,3772,5255
605,2012-08-27,3,1,8,0,1,1,1,0.703333,0.654688,0.730417,0.128733,989,5928,6917
606,2012-08-28,3,1,8,0,2,1,1,0.728333,0.66605,0.62,0.190925,935,6105,7040
607,2012-08-29,3,1,8,0,3,1,1,0.685,0.635733,0.552083,0.112562,1177,6520,7697
608,2012-08-30,3,1,8,0,4,1,1,0.706667,0.652779,0.590417,0.0771167,1172,6541,7713
609,2012-08-31,3,1,8,0,5,1,1,0.764167,0.6894,0.5875,0.168533,1433,5917,7350
610,2012-09-01,3,1,9,0,6,0,2,0.753333,0.702654,0.638333,0.113187,2352,3788,6140
611,2012-09-02,3,1,9,0,0,0,2,0.696667,0.649,0.815,0.0640708,2613,3197,5810
612,2012-09-03,3,1,9,1,1,0,1,0.7075,0.661629,0.790833,0.151121,1965,4069,6034
613,2012-09-04,3,1,9,0,2,1,1,0.725833,0.686888,0.755,0.236321,867,5997,6864
614,2012-09-05,3,1,9,0,3,1,1,0.736667,0.708983,0.74125,0.187808,832,6280,7112
615,2012-09-06,3,1,9,0,4,1,2,0.696667,0.655329,0.810417,0.142421,611,5592,6203
616,2012-09-07,3,1,9,0,5,1,1,0.703333,0.657204,0.73625,0.171646,1045,6459,7504
617,2012-09-08,3,1,9,0,6,0,2,0.659167,0.611121,0.799167,0.281104,1557,4419,5976
618,2012-09-09,3,1,9,0,0,0,1,0.61,0.578925,0.5475,0.224496,2570,5657,8227
619,2012-09-10,3,1,9,0,1,1,1,0.583333,0.565654,0.50375,0.258713,1118,6407,7525
620,2012-09-11,3,1,9,0,2,1,1,0.5775,0.554292,0.52,0.0920542,1070,6697,7767
621,2012-09-12,3,1,9,0,3,1,1,0.599167,0.570075,0.577083,0.131846,1050,6820,7870
622,2012-09-13,3,1,9,0,4,1,1,0.6125,0.579558,0.637083,0.0827208,1054,6750,7804
623,2012-09-14,3,1,9,0,5,1,1,0.633333,0.594083,0.6725,0.103863,1379,6630,8009
624,2012-09-15,3,1,9,0,6,0,1,0.608333,0.585867,0.501667,0.247521,3160,5554,8714
625,2012-09-16,3,1,9,0,0,0,1,0.58,0.563125,0.57,0.0901833,2166,5167,7333
626,2012-09-17,3,1,9,0,1,1,2,0.580833,0.55305,0.734583,0.151742,1022,5847,6869
627,2012-09-18,3,1,9,0,2,1,2,0.623333,0.565067,0.8725,0.357587,371,3702,4073
628,2012-09-19,3,1,9,0,3,1,1,0.5525,0.540404,0.536667,0.215175,788,6803,7591
629,2012-09-20,3,1,9,0,4,1,1,0.546667,0.532192,0.618333,0.118167,939,6781,7720
630,2012-09-21,3,1,9,0,5,1,1,0.599167,0.571971,0.66875,0.154229,1250,6917,8167
631,2012-09-22,3,1,9,0,6,0,1,0.65,0.610488,0.646667,0.283583,2512,5883,8395
632,2012-09-23,4,1,9,0,0,0,1,0.529167,0.518933,0.467083,0.223258,2454,5453,7907
633,2012-09-24,4,1,9,0,1,1,1,0.514167,0.502513,0.492917,0.142404,1001,6435,7436
634,2012-09-25,4,1,9,0,2,1,1,0.55,0.544179,0.57,0.236321,845,6693,7538
635,2012-09-26,4,1,9,0,3,1,1,0.635,0.596613,0.630833,0.2444,787,6946,7733
636,2012-09-27,4,1,9,0,4,1,2,0.65,0.607975,0.690833,0.134342,751,6642,7393
637,2012-09-28,4,1,9,0,5,1,2,0.619167,0.585863,0.69,0.164179,1045,6370,7415
638,2012-09-29,4,1,9,0,6,0,1,0.5425,0.530296,0.542917,0.227604,2589,5966,8555
639,2012-09-30,4,1,9,0,0,0,1,0.526667,0.517663,0.583333,0.134958,2015,4874,6889
640,2012-10-01,4,1,10,0,1,1,2,0.520833,0.512,0.649167,0.0908042,763,6015,6778
641,2012-10-02,4,1,10,0,2,1,3,0.590833,0.542333,0.871667,0.104475,315,4324,4639
642,2012-10-03,4,1,10,0,3,1,2,0.6575,0.599133,0.79375,0.0665458,728,6844,7572
643,2012-10-04,4,1,10,0,4,1,2,0.6575,0.607975,0.722917,0.117546,891,6437,7328
644,2012-10-05,4,1,10,0,5,1,1,0.615,0.580187,0.6275,0.10635,1516,6640,8156
645,2012-10-06,4,1,10,0,6,0,1,0.554167,0.538521,0.664167,0.268025,3031,4934,7965
646,2012-10-07,4,1,10,0,0,0,2,0.415833,0.419813,0.708333,0.141162,781,2729,3510
647,2012-10-08,4,1,10,1,1,0,2,0.383333,0.387608,0.709583,0.189679,874,4604,5478
648,2012-10-09,4,1,10,0,2,1,2,0.446667,0.438112,0.761667,0.1903,601,5791,6392
649,2012-10-10,4,1,10,0,3,1,1,0.514167,0.503142,0.630833,0.187821,780,6911,7691
650,2012-10-11,4,1,10,0,4,1,1,0.435,0.431167,0.463333,0.181596,834,6736,7570
651,2012-10-12,4,1,10,0,5,1,1,0.4375,0.433071,0.539167,0.235092,1060,6222,7282
652,2012-10-13,4,1,10,0,6,0,1,0.393333,0.391396,0.494583,0.146142,2252,4857,7109
653,2012-10-14,4,1,10,0,0,0,1,0.521667,0.508204,0.640417,0.278612,2080,4559,6639
654,2012-10-15,4,1,10,0,1,1,2,0.561667,0.53915,0.7075,0.296037,760,5115,5875
655,2012-10-16,4,1,10,0,2,1,1,0.468333,0.460846,0.558333,0.182221,922,6612,7534
656,2012-10-17,4,1,10,0,3,1,1,0.455833,0.450108,0.692917,0.101371,979,6482,7461
657,2012-10-18,4,1,10,0,4,1,2,0.5225,0.512625,0.728333,0.236937,1008,6501,7509
658,2012-10-19,4,1,10,0,5,1,2,0.563333,0.537896,0.815,0.134954,753,4671,5424
659,2012-10-20,4,1,10,0,6,0,1,0.484167,0.472842,0.572917,0.117537,2806,5284,8090
660,2012-10-21,4,1,10,0,0,0,1,0.464167,0.456429,0.51,0.166054,2132,4692,6824
661,2012-10-22,4,1,10,0,1,1,1,0.4875,0.482942,0.568333,0.0814833,830,6228,7058
662,2012-10-23,4,1,10,0,2,1,1,0.544167,0.530304,0.641667,0.0945458,841,6625,7466
663,2012-10-24,4,1,10,0,3,1,1,0.5875,0.558721,0.63625,0.0727792,795,6898,7693
664,2012-10-25,4,1,10,0,4,1,2,0.55,0.529688,0.800417,0.124375,875,6484,7359
665,2012-10-26,4,1,10,0,5,1,2,0.545833,0.52275,0.807083,0.132467,1182,6262,7444
666,2012-10-27,4,1,10,0,6,0,2,0.53,0.515133,0.72,0.235692,2643,5209,7852
667,2012-10-28,4,1,10,0,0,0,2,0.4775,0.467771,0.694583,0.398008,998,3461,4459
668,2012-10-29,4,1,10,0,1,1,3,0.44,0.4394,0.88,0.3582,2,20,22
669,2012-10-30,4,1,10,0,2,1,2,0.318182,0.309909,0.825455,0.213009,87,1009,1096
670,2012-10-31,4,1,10,0,3,1,2,0.3575,0.3611,0.666667,0.166667,419,5147,5566
671,2012-11-01,4,1,11,0,4,1,2,0.365833,0.369942,0.581667,0.157346,466,5520,5986
672,2012-11-02,4,1,11,0,5,1,1,0.355,0.356042,0.522083,0.266175,618,5229,5847
673,2012-11-03,4,1,11,0,6,0,2,0.343333,0.323846,0.49125,0.270529,1029,4109,5138
674,2012-11-04,4,1,11,0,0,0,1,0.325833,0.329538,0.532917,0.179108,1201,3906,5107
675,2012-11-05,4,1,11,0,1,1,1,0.319167,0.308075,0.494167,0.236325,378,4881,5259
676,2012-11-06,4,1,11,0,2,1,1,0.280833,0.281567,0.567083,0.173513,466,5220,5686
677,2012-11-07,4,1,11,0,3,1,2,0.295833,0.274621,0.5475,0.304108,326,4709,5035
678,2012-11-08,4,1,11,0,4,1,1,0.352174,0.341891,0.333478,0.347835,340,4975,5315
679,2012-11-09,4,1,11,0,5,1,1,0.361667,0.355413,0.540833,0.214558,709,5283,5992
680,2012-11-10,4,1,11,0,6,0,1,0.389167,0.393937,0.645417,0.0578458,2090,4446,6536
681,2012-11-11,4,1,11,0,0,0,1,0.420833,0.421713,0.659167,0.1275,2290,4562,6852
682,2012-11-12,4,1,11,1,1,0,1,0.485,0.475383,0.741667,0.173517,1097,5172,6269
683,2012-11-13,4,1,11,0,2,1,2,0.343333,0.323225,0.662917,0.342046,327,3767,4094
684,2012-11-14,4,1,11,0,3,1,1,0.289167,0.281563,0.552083,0.199625,373,5122,5495
685,2012-11-15,4,1,11,0,4,1,2,0.321667,0.324492,0.620417,0.152987,320,5125,5445
686,2012-11-16,4,1,11,0,5,1,1,0.345,0.347204,0.524583,0.171025,484,5214,5698
687,2012-11-17,4,1,11,0,6,0,1,0.325,0.326383,0.545417,0.179729,1313,4316,5629
688,2012-11-18,4,1,11,0,0,0,1,0.3425,0.337746,0.692917,0.227612,922,3747,4669
689,2012-11-19,4,1,11,0,1,1,2,0.380833,0.375621,0.623333,0.235067,449,5050,5499
690,2012-11-20,4,1,11,0,2,1,2,0.374167,0.380667,0.685,0.082725,534,5100,5634
691,2012-11-21,4,1,11,0,3,1,1,0.353333,0.364892,0.61375,0.103246,615,4531,5146
692,2012-11-22,4,1,11,1,4,0,1,0.34,0.350371,0.580417,0.0528708,955,1470,2425
693,2012-11-23,4,1,11,0,5,1,1,0.368333,0.378779,0.56875,0.148021,1603,2307,3910
694,2012-11-24,4,1,11,0,6,0,1,0.278333,0.248742,0.404583,0.376871,532,1745,2277
695,2012-11-25,4,1,11,0,0,0,1,0.245833,0.257583,0.468333,0.1505,309,2115,2424
696,2012-11-26,4,1,11,0,1,1,1,0.313333,0.339004,0.535417,0.04665,337,4750,5087
697,2012-11-27,4,1,11,0,2,1,2,0.291667,0.281558,0.786667,0.237562,123,3836,3959
698,2012-11-28,4,1,11,0,3,1,1,0.296667,0.289762,0.50625,0.210821,198,5062,5260
699,2012-11-29,4,1,11,0,4,1,1,0.28087,0.298422,0.555652,0.115522,243,5080,5323
700,2012-11-30,4,1,11,0,5,1,1,0.298333,0.323867,0.649583,0.0584708,362,5306,5668
701,2012-12-01,4,1,12,0,6,0,2,0.298333,0.316904,0.806667,0.0597042,951,4240,5191
702,2012-12-02,4,1,12,0,0,0,2,0.3475,0.359208,0.823333,0.124379,892,3757,4649
703,2012-12-03,4,1,12,0,1,1,1,0.4525,0.455796,0.7675,0.0827208,555,5679,6234
704,2012-12-04,4,1,12,0,2,1,1,0.475833,0.469054,0.73375,0.174129,551,6055,6606
705,2012-12-05,4,1,12,0,3,1,1,0.438333,0.428012,0.485,0.324021,331,5398,5729
706,2012-12-06,4,1,12,0,4,1,1,0.255833,0.258204,0.50875,0.174754,340,5035,5375
707,2012-12-07,4,1,12,0,5,1,2,0.320833,0.321958,0.764167,0.1306,349,4659,5008
708,2012-12-08,4,1,12,0,6,0,2,0.381667,0.389508,0.91125,0.101379,1153,4429,5582
709,2012-12-09,4,1,12,0,0,0,2,0.384167,0.390146,0.905417,0.157975,441,2787,3228
710,2012-12-10,4,1,12,0,1,1,2,0.435833,0.435575,0.925,0.190308,329,4841,5170
711,2012-12-11,4,1,12,0,2,1,2,0.353333,0.338363,0.596667,0.296037,282,5219,5501
712,2012-12-12,4,1,12,0,3,1,2,0.2975,0.297338,0.538333,0.162937,310,5009,5319
713,2012-12-13,4,1,12,0,4,1,1,0.295833,0.294188,0.485833,0.174129,425,5107,5532
714,2012-12-14,4,1,12,0,5,1,1,0.281667,0.294192,0.642917,0.131229,429,5182,5611
715,2012-12-15,4,1,12,0,6,0,1,0.324167,0.338383,0.650417,0.10635,767,4280,5047
716,2012-12-16,4,1,12,0,0,0,2,0.3625,0.369938,0.83875,0.100742,538,3248,3786
717,2012-12-17,4,1,12,0,1,1,2,0.393333,0.4015,0.907083,0.0982583,212,4373,4585
718,2012-12-18,4,1,12,0,2,1,1,0.410833,0.409708,0.66625,0.221404,433,5124,5557
719,2012-12-19,4,1,12,0,3,1,1,0.3325,0.342162,0.625417,0.184092,333,4934,5267
720,2012-12-20,4,1,12,0,4,1,2,0.33,0.335217,0.667917,0.132463,314,3814,4128
721,2012-12-21,1,1,12,0,5,1,2,0.326667,0.301767,0.556667,0.374383,221,3402,3623
722,2012-12-22,1,1,12,0,6,0,1,0.265833,0.236113,0.44125,0.407346,205,1544,1749
723,2012-12-23,1,1,12,0,0,0,1,0.245833,0.259471,0.515417,0.133083,408,1379,1787
724,2012-12-24,1,1,12,0,1,1,2,0.231304,0.2589,0.791304,0.0772304,174,746,920
725,2012-12-25,1,1,12,1,2,0,2,0.291304,0.294465,0.734783,0.168726,440,573,1013
726,2012-12-26,1,1,12,0,3,1,3,0.243333,0.220333,0.823333,0.316546,9,432,441
727,2012-12-27,1,1,12,0,4,1,2,0.254167,0.226642,0.652917,0.350133,247,1867,2114
728,2012-12-28,1,1,12,0,5,1,2,0.253333,0.255046,0.59,0.155471,644,2451,3095
729,2012-12-29,1,1,12,0,6,0,2,0.253333,0.2424,0.752917,0.124383,159,1182,1341
730,2012-12-30,1,1,12,0,0,0,1,0.255833,0.2317,0.483333,0.350754,364,1432,1796
731,2012-12-31,1,1,12,0,1,1,2,0.215833,0.223487,0.5775,0.154846,439,2290,2729
189 188 2011-07-07 3 0 7 0 4 1 1 0.75 0.686871 0.65125 0.1592 754 3838 4592
190 189 2011-07-08 3 0 7 0 5 1 2 0.709167 0.670483 0.757917 0.225129 692 3348 4040
191 190 2011-07-09 3 0 7 0 6 0 1 0.733333 0.664158 0.609167 0.167912 1988 3348 5336
192 191 2011-07-10 3 0 7 0 0 0 1 0.7475 0.690025 0.578333 0.183471 1743 3138 4881
193 192 2011-07-11 3 0 7 0 1 1 1 0.7625 0.729804 0.635833 0.282337 723 3363 4086
194 193 2011-07-12 3 0 7 0 2 1 1 0.794167 0.739275 0.559167 0.200254 662 3596 4258
195 194 2011-07-13 3 0 7 0 3 1 1 0.746667 0.689404 0.631667 0.146133 748 3594 4342
196 195 2011-07-14 3 0 7 0 4 1 1 0.680833 0.635104 0.47625 0.240667 888 4196 5084
197 196 2011-07-15 3 0 7 0 5 1 1 0.663333 0.624371 0.59125 0.182833 1318 4220 5538
198 197 2011-07-16 3 0 7 0 6 0 1 0.686667 0.638263 0.585 0.208342 2418 3505 5923
199 198 2011-07-17 3 0 7 0 0 0 1 0.719167 0.669833 0.604167 0.245033 2006 3296 5302
200 199 2011-07-18 3 0 7 0 1 1 1 0.746667 0.703925 0.65125 0.215804 841 3617 4458
201 200 2011-07-19 3 0 7 0 2 1 1 0.776667 0.747479 0.650417 0.1306 752 3789 4541
202 201 2011-07-20 3 0 7 0 3 1 1 0.768333 0.74685 0.707083 0.113817 644 3688 4332
203 202 2011-07-21 3 0 7 0 4 1 2 0.815 0.826371 0.69125 0.222021 632 3152 3784
204 203 2011-07-22 3 0 7 0 5 1 1 0.848333 0.840896 0.580417 0.1331 562 2825 3387
205 204 2011-07-23 3 0 7 0 6 0 1 0.849167 0.804287 0.5 0.131221 987 2298 3285
206 205 2011-07-24 3 0 7 0 0 0 1 0.83 0.794829 0.550833 0.169171 1050 2556 3606
207 206 2011-07-25 3 0 7 0 1 1 1 0.743333 0.720958 0.757083 0.0908083 568 3272 3840
208 207 2011-07-26 3 0 7 0 2 1 1 0.771667 0.696979 0.540833 0.200258 750 3840 4590
209 208 2011-07-27 3 0 7 0 3 1 1 0.775 0.690667 0.402917 0.183463 755 3901 4656
210 209 2011-07-28 3 0 7 0 4 1 1 0.779167 0.7399 0.583333 0.178479 606 3784 4390
211 210 2011-07-29 3 0 7 0 5 1 1 0.838333 0.785967 0.5425 0.174138 670 3176 3846
212 211 2011-07-30 3 0 7 0 6 0 1 0.804167 0.728537 0.465833 0.168537 1559 2916 4475
213 212 2011-07-31 3 0 7 0 0 0 1 0.805833 0.729796 0.480833 0.164813 1524 2778 4302
214 213 2011-08-01 3 0 8 0 1 1 1 0.771667 0.703292 0.550833 0.156717 729 3537 4266
215 214 2011-08-02 3 0 8 0 2 1 1 0.783333 0.707071 0.49125 0.20585 801 4044 4845
216 215 2011-08-03 3 0 8 0 3 1 2 0.731667 0.679937 0.6575 0.135583 467 3107 3574
217 216 2011-08-04 3 0 8 0 4 1 2 0.71 0.664788 0.7575 0.19715 799 3777 4576
218 217 2011-08-05 3 0 8 0 5 1 1 0.710833 0.656567 0.630833 0.184696 1023 3843 4866
219 218 2011-08-06 3 0 8 0 6 0 2 0.716667 0.676154 0.755 0.22825 1521 2773 4294
220 219 2011-08-07 3 0 8 0 0 0 1 0.7425 0.715292 0.752917 0.201487 1298 2487 3785
221 220 2011-08-08 3 0 8 0 1 1 1 0.765 0.703283 0.592083 0.192175 846 3480 4326
222 221 2011-08-09 3 0 8 0 2 1 1 0.775 0.724121 0.570417 0.151121 907 3695 4602
223 222 2011-08-10 3 0 8 0 3 1 1 0.766667 0.684983 0.424167 0.200258 884 3896 4780
224 223 2011-08-11 3 0 8 0 4 1 1 0.7175 0.651521 0.42375 0.164796 812 3980 4792
225 224 2011-08-12 3 0 8 0 5 1 1 0.708333 0.654042 0.415 0.125621 1051 3854 4905
226 225 2011-08-13 3 0 8 0 6 0 2 0.685833 0.645858 0.729583 0.211454 1504 2646 4150
227 226 2011-08-14 3 0 8 0 0 0 2 0.676667 0.624388 0.8175 0.222633 1338 2482 3820
228 227 2011-08-15 3 0 8 0 1 1 1 0.665833 0.616167 0.712083 0.208954 775 3563 4338
229 228 2011-08-16 3 0 8 0 2 1 1 0.700833 0.645837 0.578333 0.236329 721 4004 4725
230 229 2011-08-17 3 0 8 0 3 1 1 0.723333 0.666671 0.575417 0.143667 668 4026 4694
231 230 2011-08-18 3 0 8 0 4 1 1 0.711667 0.662258 0.654583 0.233208 639 3166 3805
232 231 2011-08-19 3 0 8 0 5 1 2 0.685 0.633221 0.722917 0.139308 797 3356 4153
233 232 2011-08-20 3 0 8 0 6 0 1 0.6975 0.648996 0.674167 0.104467 1914 3277 5191
234 233 2011-08-21 3 0 8 0 0 0 1 0.710833 0.675525 0.77 0.248754 1249 2624 3873
235 234 2011-08-22 3 0 8 0 1 1 1 0.691667 0.638254 0.47 0.27675 833 3925 4758
236 235 2011-08-23 3 0 8 0 2 1 1 0.640833 0.606067 0.455417 0.146763 1281 4614 5895
237 236 2011-08-24 3 0 8 0 3 1 1 0.673333 0.630692 0.605 0.253108 949 4181 5130
238 237 2011-08-25 3 0 8 0 4 1 2 0.684167 0.645854 0.771667 0.210833 435 3107 3542
239 238 2011-08-26 3 0 8 0 5 1 1 0.7 0.659733 0.76125 0.0839625 768 3893 4661
240 239 2011-08-27 3 0 8 0 6 0 2 0.68 0.635556 0.85 0.375617 226 889 1115
241 240 2011-08-28 3 0 8 0 0 0 1 0.707059 0.647959 0.561765 0.304659 1415 2919 4334
242 241 2011-08-29 3 0 8 0 1 1 1 0.636667 0.607958 0.554583 0.159825 729 3905 4634
243 242 2011-08-30 3 0 8 0 2 1 1 0.639167 0.594704 0.548333 0.125008 775 4429 5204
244 243 2011-08-31 3 0 8 0 3 1 1 0.656667 0.611121 0.597917 0.0833333 688 4370 5058
245 244 2011-09-01 3 0 9 0 4 1 1 0.655 0.614921 0.639167 0.141796 783 4332 5115
246 245 2011-09-02 3 0 9 0 5 1 2 0.643333 0.604808 0.727083 0.139929 875 3852 4727
247 246 2011-09-03 3 0 9 0 6 0 1 0.669167 0.633213 0.716667 0.185325 1935 2549 4484
248 247 2011-09-04 3 0 9 0 0 0 1 0.709167 0.665429 0.742083 0.206467 2521 2419 4940
249 248 2011-09-05 3 0 9 1 1 0 2 0.673333 0.625646 0.790417 0.212696 1236 2115 3351
250 249 2011-09-06 3 0 9 0 2 1 3 0.54 0.5152 0.886957 0.343943 204 2506 2710
251 250 2011-09-07 3 0 9 0 3 1 3 0.599167 0.544229 0.917083 0.0970208 118 1878 1996
252 251 2011-09-08 3 0 9 0 4 1 3 0.633913 0.555361 0.939565 0.192748 153 1689 1842
253 252 2011-09-09 3 0 9 0 5 1 2 0.65 0.578946 0.897917 0.124379 417 3127 3544
254 253 2011-09-10 3 0 9 0 6 0 1 0.66 0.607962 0.75375 0.153608 1750 3595 5345
255 254 2011-09-11 3 0 9 0 0 0 1 0.653333 0.609229 0.71375 0.115054 1633 3413 5046
256 255 2011-09-12 3 0 9 0 1 1 1 0.644348 0.60213 0.692174 0.088913 690 4023 4713
257 256 2011-09-13 3 0 9 0 2 1 1 0.650833 0.603554 0.7125 0.141804 701 4062 4763
258 257 2011-09-14 3 0 9 0 3 1 1 0.673333 0.6269 0.697083 0.1673 647 4138 4785
259 258 2011-09-15 3 0 9 0 4 1 2 0.5775 0.553671 0.709167 0.271146 428 3231 3659
260 259 2011-09-16 3 0 9 0 5 1 2 0.469167 0.461475 0.590417 0.164183 742 4018 4760
261 260 2011-09-17 3 0 9 0 6 0 2 0.491667 0.478512 0.718333 0.189675 1434 3077 4511
262 261 2011-09-18 3 0 9 0 0 0 1 0.5075 0.490537 0.695 0.178483 1353 2921 4274
263 262 2011-09-19 3 0 9 0 1 1 2 0.549167 0.529675 0.69 0.151742 691 3848 4539
264 263 2011-09-20 3 0 9 0 2 1 2 0.561667 0.532217 0.88125 0.134954 438 3203 3641
265 264 2011-09-21 3 0 9 0 3 1 2 0.595 0.550533 0.9 0.0964042 539 3813 4352
266 265 2011-09-22 3 0 9 0 4 1 2 0.628333 0.554963 0.902083 0.128125 555 4240 4795
267 266 2011-09-23 4 0 9 0 5 1 2 0.609167 0.522125 0.9725 0.0783667 258 2137 2395
268 267 2011-09-24 4 0 9 0 6 0 2 0.606667 0.564412 0.8625 0.0783833 1776 3647 5423
269 268 2011-09-25 4 0 9 0 0 0 2 0.634167 0.572637 0.845 0.0503792 1544 3466 5010
270 269 2011-09-26 4 0 9 0 1 1 2 0.649167 0.589042 0.848333 0.1107 684 3946 4630
271 270 2011-09-27 4 0 9 0 2 1 2 0.636667 0.574525 0.885417 0.118171 477 3643 4120
272 271 2011-09-28 4 0 9 0 3 1 2 0.635 0.575158 0.84875 0.148629 480 3427 3907
273 272 2011-09-29 4 0 9 0 4 1 1 0.616667 0.574512 0.699167 0.172883 653 4186 4839
274 273 2011-09-30 4 0 9 0 5 1 1 0.564167 0.544829 0.6475 0.206475 830 4372 5202
275 274 2011-10-01 4 0 10 0 6 0 2 0.41 0.412863 0.75375 0.292296 480 1949 2429
276 275 2011-10-02 4 0 10 0 0 0 2 0.356667 0.345317 0.791667 0.222013 616 2302 2918
277 276 2011-10-03 4 0 10 0 1 1 2 0.384167 0.392046 0.760833 0.0833458 330 3240 3570
278 277 2011-10-04 4 0 10 0 2 1 1 0.484167 0.472858 0.71 0.205854 486 3970 4456
279 278 2011-10-05 4 0 10 0 3 1 1 0.538333 0.527138 0.647917 0.17725 559 4267 4826
280 279 2011-10-06 4 0 10 0 4 1 1 0.494167 0.480425 0.620833 0.134954 639 4126 4765
281 280 2011-10-07 4 0 10 0 5 1 1 0.510833 0.504404 0.684167 0.0223917 949 4036 4985
282 281 2011-10-08 4 0 10 0 6 0 1 0.521667 0.513242 0.70125 0.0454042 2235 3174 5409
283 282 2011-10-09 4 0 10 0 0 0 1 0.540833 0.523983 0.7275 0.06345 2397 3114 5511
284 283 2011-10-10 4 0 10 1 1 0 1 0.570833 0.542925 0.73375 0.0423042 1514 3603 5117
285 284 2011-10-11 4 0 10 0 2 1 2 0.566667 0.546096 0.80875 0.143042 667 3896 4563
286 285 2011-10-12 4 0 10 0 3 1 3 0.543333 0.517717 0.90625 0.24815 217 2199 2416
287 286 2011-10-13 4 0 10 0 4 1 2 0.589167 0.551804 0.896667 0.141787 290 2623 2913
288 287 2011-10-14 4 0 10 0 5 1 2 0.550833 0.529675 0.71625 0.223883 529 3115 3644
289 288 2011-10-15 4 0 10 0 6 0 1 0.506667 0.498725 0.483333 0.258083 1899 3318 5217
290 289 2011-10-16 4 0 10 0 0 0 1 0.511667 0.503154 0.486667 0.281717 1748 3293 5041
291 290 2011-10-17 4 0 10 0 1 1 1 0.534167 0.510725 0.579583 0.175379 713 3857 4570
292 291 2011-10-18 4 0 10 0 2 1 2 0.5325 0.522721 0.701667 0.110087 637 4111 4748
293 292 2011-10-19 4 0 10 0 3 1 3 0.541739 0.513848 0.895217 0.243339 254 2170 2424
294 293 2011-10-20 4 0 10 0 4 1 1 0.475833 0.466525 0.63625 0.422275 471 3724 4195
295 294 2011-10-21 4 0 10 0 5 1 1 0.4275 0.423596 0.574167 0.221396 676 3628 4304
296 295 2011-10-22 4 0 10 0 6 0 1 0.4225 0.425492 0.629167 0.0926667 1499 2809 4308
297 296 2011-10-23 4 0 10 0 0 0 1 0.421667 0.422333 0.74125 0.0995125 1619 2762 4381
298 297 2011-10-24 4 0 10 0 1 1 1 0.463333 0.457067 0.772083 0.118792 699 3488 4187
299 298 2011-10-25 4 0 10 0 2 1 1 0.471667 0.463375 0.622917 0.166658 695 3992 4687
300 299 2011-10-26 4 0 10 0 3 1 2 0.484167 0.472846 0.720417 0.148642 404 3490 3894
301 300 2011-10-27 4 0 10 0 4 1 2 0.47 0.457046 0.812917 0.197763 240 2419 2659
302 301 2011-10-28 4 0 10 0 5 1 2 0.330833 0.318812 0.585833 0.229479 456 3291 3747
303 302 2011-10-29 4 0 10 0 6 0 3 0.254167 0.227913 0.8825 0.351371 57 570 627
304 303 2011-10-30 4 0 10 0 0 0 1 0.319167 0.321329 0.62375 0.176617 885 2446 3331
305 304 2011-10-31 4 0 10 0 1 1 1 0.34 0.356063 0.703333 0.10635 362 3307 3669
306 305 2011-11-01 4 0 11 0 2 1 1 0.400833 0.397088 0.68375 0.135571 410 3658 4068
307 306 2011-11-02 4 0 11 0 3 1 1 0.3775 0.390133 0.71875 0.0820917 370 3816 4186
308 307 2011-11-03 4 0 11 0 4 1 1 0.408333 0.405921 0.702083 0.136817 318 3656 3974
309 308 2011-11-04 4 0 11 0 5 1 2 0.403333 0.403392 0.6225 0.271779 470 3576 4046
310 309 2011-11-05 4 0 11 0 6 0 1 0.326667 0.323854 0.519167 0.189062 1156 2770 3926
311 310 2011-11-06 4 0 11 0 0 0 1 0.348333 0.362358 0.734583 0.0920542 952 2697 3649
312 311 2011-11-07 4 0 11 0 1 1 1 0.395 0.400871 0.75875 0.057225 373 3662 4035
313 312 2011-11-08 4 0 11 0 2 1 1 0.408333 0.412246 0.721667 0.0690375 376 3829 4205
314 313 2011-11-09 4 0 11 0 3 1 1 0.4 0.409079 0.758333 0.0621958 305 3804 4109
315 314 2011-11-10 4 0 11 0 4 1 2 0.38 0.373721 0.813333 0.189067 190 2743 2933
316 315 2011-11-11 4 0 11 1 5 0 1 0.324167 0.306817 0.44625 0.314675 440 2928 3368
317 316 2011-11-12 4 0 11 0 6 0 1 0.356667 0.357942 0.552917 0.212062 1275 2792 4067
318 317 2011-11-13 4 0 11 0 0 0 1 0.440833 0.43055 0.458333 0.281721 1004 2713 3717
319 318 2011-11-14 4 0 11 0 1 1 1 0.53 0.524612 0.587083 0.306596 595 3891 4486
320 319 2011-11-15 4 0 11 0 2 1 2 0.53 0.507579 0.68875 0.199633 449 3746 4195
321 320 2011-11-16 4 0 11 0 3 1 3 0.456667 0.451988 0.93 0.136829 145 1672 1817
322 321 2011-11-17 4 0 11 0 4 1 2 0.341667 0.323221 0.575833 0.305362 139 2914 3053
323 322 2011-11-18 4 0 11 0 5 1 1 0.274167 0.272721 0.41 0.168533 245 3147 3392
324 323 2011-11-19 4 0 11 0 6 0 1 0.329167 0.324483 0.502083 0.224496 943 2720 3663
325 324 2011-11-20 4 0 11 0 0 0 2 0.463333 0.457058 0.684583 0.18595 787 2733 3520
326 325 2011-11-21 4 0 11 0 1 1 3 0.4475 0.445062 0.91 0.138054 220 2545 2765
327 326 2011-11-22 4 0 11 0 2 1 3 0.416667 0.421696 0.9625 0.118792 69 1538 1607
328 327 2011-11-23 4 0 11 0 3 1 2 0.440833 0.430537 0.757917 0.335825 112 2454 2566
329 328 2011-11-24 4 0 11 1 4 0 1 0.373333 0.372471 0.549167 0.167304 560 935 1495
330 329 2011-11-25 4 0 11 0 5 1 1 0.375 0.380671 0.64375 0.0988958 1095 1697 2792
331 330 2011-11-26 4 0 11 0 6 0 1 0.375833 0.385087 0.681667 0.0684208 1249 1819 3068
332 331 2011-11-27 4 0 11 0 0 0 1 0.459167 0.4558 0.698333 0.208954 810 2261 3071
333 332 2011-11-28 4 0 11 0 1 1 1 0.503478 0.490122 0.743043 0.142122 253 3614 3867
334 333 2011-11-29 4 0 11 0 2 1 2 0.458333 0.451375 0.830833 0.258092 96 2818 2914
335 334 2011-11-30 4 0 11 0 3 1 1 0.325 0.311221 0.613333 0.271158 188 3425 3613
336 335 2011-12-01 4 0 12 0 4 1 1 0.3125 0.305554 0.524583 0.220158 182 3545 3727
337 336 2011-12-02 4 0 12 0 5 1 1 0.314167 0.331433 0.625833 0.100754 268 3672 3940
338 337 2011-12-03 4 0 12 0 6 0 1 0.299167 0.310604 0.612917 0.0957833 706 2908 3614
339 338 2011-12-04 4 0 12 0 0 0 1 0.330833 0.3491 0.775833 0.0839583 634 2851 3485
340 339 2011-12-05 4 0 12 0 1 1 2 0.385833 0.393925 0.827083 0.0622083 233 3578 3811
341 340 2011-12-06 4 0 12 0 2 1 3 0.4625 0.4564 0.949583 0.232583 126 2468 2594
342 341 2011-12-07 4 0 12 0 3 1 3 0.41 0.400246 0.970417 0.266175 50 655 705
343 342 2011-12-08 4 0 12 0 4 1 1 0.265833 0.256938 0.58 0.240058 150 3172 3322
344 343 2011-12-09 4 0 12 0 5 1 1 0.290833 0.317542 0.695833 0.0827167 261 3359 3620
345 344 2011-12-10 4 0 12 0 6 0 1 0.275 0.266412 0.5075 0.233221 502 2688 3190
346 345 2011-12-11 4 0 12 0 0 0 1 0.220833 0.253154 0.49 0.0665417 377 2366 2743
347 346 2011-12-12 4 0 12 0 1 1 1 0.238333 0.270196 0.670833 0.06345 143 3167 3310
348 347 2011-12-13 4 0 12 0 2 1 1 0.2825 0.301138 0.59 0.14055 155 3368 3523
349 348 2011-12-14 4 0 12 0 3 1 2 0.3175 0.338362 0.66375 0.0609583 178 3562 3740
350 349 2011-12-15 4 0 12 0 4 1 2 0.4225 0.412237 0.634167 0.268042 181 3528 3709
351 350 2011-12-16 4 0 12 0 5 1 2 0.375 0.359825 0.500417 0.260575 178 3399 3577
352 351 2011-12-17 4 0 12 0 6 0 2 0.258333 0.249371 0.560833 0.243167 275 2464 2739
353 352 2011-12-18 4 0 12 0 0 0 1 0.238333 0.245579 0.58625 0.169779 220 2211 2431
354 353 2011-12-19 4 0 12 0 1 1 1 0.276667 0.280933 0.6375 0.172896 260 3143 3403
355 354 2011-12-20 4 0 12 0 2 1 2 0.385833 0.396454 0.595417 0.0615708 216 3534 3750
356 355 2011-12-21 1 0 12 0 3 1 2 0.428333 0.428017 0.858333 0.2214 107 2553 2660
357 356 2011-12-22 1 0 12 0 4 1 2 0.423333 0.426121 0.7575 0.047275 227 2841 3068
358 357 2011-12-23 1 0 12 0 5 1 1 0.373333 0.377513 0.68625 0.274246 163 2046 2209
359 358 2011-12-24 1 0 12 0 6 0 1 0.3025 0.299242 0.5425 0.190304 155 856 1011
360 359 2011-12-25 1 0 12 0 0 0 1 0.274783 0.279961 0.681304 0.155091 303 451 754
361 360 2011-12-26 1 0 12 1 1 0 1 0.321739 0.315535 0.506957 0.239465 430 887 1317
362 361 2011-12-27 1 0 12 0 2 1 2 0.325 0.327633 0.7625 0.18845 103 1059 1162
363 362 2011-12-28 1 0 12 0 3 1 1 0.29913 0.279974 0.503913 0.293961 255 2047 2302
364 363 2011-12-29 1 0 12 0 4 1 1 0.248333 0.263892 0.574167 0.119412 254 2169 2423
365 364 2011-12-30 1 0 12 0 5 1 1 0.311667 0.318812 0.636667 0.134337 491 2508 2999
366 365 2011-12-31 1 0 12 0 6 0 1 0.41 0.414121 0.615833 0.220154 665 1820 2485
367 366 2012-01-01 1 1 1 0 0 0 1 0.37 0.375621 0.6925 0.192167 686 1608 2294
368 367 2012-01-02 1 1 1 1 1 0 1 0.273043 0.252304 0.381304 0.329665 244 1707 1951
369 368 2012-01-03 1 1 1 0 2 1 1 0.15 0.126275 0.44125 0.365671 89 2147 2236
370 369 2012-01-04 1 1 1 0 3 1 2 0.1075 0.119337 0.414583 0.1847 95 2273 2368
371 370 2012-01-05 1 1 1 0 4 1 1 0.265833 0.278412 0.524167 0.129987 140 3132 3272
372 371 2012-01-06 1 1 1 0 5 1 1 0.334167 0.340267 0.542083 0.167908 307 3791 4098
373 372 2012-01-07 1 1 1 0 6 0 1 0.393333 0.390779 0.531667 0.174758 1070 3451 4521
374 373 2012-01-08 1 1 1 0 0 0 1 0.3375 0.340258 0.465 0.191542 599 2826 3425
375 374 2012-01-09 1 1 1 0 1 1 2 0.224167 0.247479 0.701667 0.0989 106 2270 2376
376 375 2012-01-10 1 1 1 0 2 1 1 0.308696 0.318826 0.646522 0.187552 173 3425 3598
377 376 2012-01-11 1 1 1 0 3 1 2 0.274167 0.282821 0.8475 0.131221 92 2085 2177
378 377 2012-01-12 1 1 1 0 4 1 2 0.3825 0.381938 0.802917 0.180967 269 3828 4097
379 378 2012-01-13 1 1 1 0 5 1 1 0.274167 0.249362 0.5075 0.378108 174 3040 3214
380 379 2012-01-14 1 1 1 0 6 0 1 0.18 0.183087 0.4575 0.187183 333 2160 2493
381 380 2012-01-15 1 1 1 0 0 0 1 0.166667 0.161625 0.419167 0.251258 284 2027 2311
382 381 2012-01-16 1 1 1 1 1 0 1 0.19 0.190663 0.5225 0.231358 217 2081 2298
383 382 2012-01-17 1 1 1 0 2 1 2 0.373043 0.364278 0.716087 0.34913 127 2808 2935
384 383 2012-01-18 1 1 1 0 3 1 1 0.303333 0.275254 0.443333 0.415429 109 3267 3376
385 384 2012-01-19 1 1 1 0 4 1 1 0.19 0.190038 0.4975 0.220158 130 3162 3292
386 385 2012-01-20 1 1 1 0 5 1 2 0.2175 0.220958 0.45 0.20275 115 3048 3163
387 386 2012-01-21 1 1 1 0 6 0 2 0.173333 0.174875 0.83125 0.222642 67 1234 1301
388 387 2012-01-22 1 1 1 0 0 0 2 0.1625 0.16225 0.79625 0.199638 196 1781 1977
389 388 2012-01-23 1 1 1 0 1 1 2 0.218333 0.243058 0.91125 0.110708 145 2287 2432
390 389 2012-01-24 1 1 1 0 2 1 1 0.3425 0.349108 0.835833 0.123767 439 3900 4339
391 390 2012-01-25 1 1 1 0 3 1 1 0.294167 0.294821 0.64375 0.161071 467 3803 4270
392 391 2012-01-26 1 1 1 0 4 1 2 0.341667 0.35605 0.769583 0.0733958 244 3831 4075
393 392 2012-01-27 1 1 1 0 5 1 2 0.425 0.415383 0.74125 0.342667 269 3187 3456
394 393 2012-01-28 1 1 1 0 6 0 1 0.315833 0.326379 0.543333 0.210829 775 3248 4023
395 394 2012-01-29 1 1 1 0 0 0 1 0.2825 0.272721 0.31125 0.24005 558 2685 3243
396 395 2012-01-30 1 1 1 0 1 1 1 0.269167 0.262625 0.400833 0.215792 126 3498 3624
397 396 2012-01-31 1 1 1 0 2 1 1 0.39 0.381317 0.416667 0.261817 324 4185 4509
398 397 2012-02-01 1 1 2 0 3 1 1 0.469167 0.466538 0.507917 0.189067 304 4275 4579
399 398 2012-02-02 1 1 2 0 4 1 2 0.399167 0.398971 0.672917 0.187187 190 3571 3761
400 399 2012-02-03 1 1 2 0 5 1 1 0.313333 0.309346 0.526667 0.178496 310 3841 4151
401 400 2012-02-04 1 1 2 0 6 0 2 0.264167 0.272725 0.779583 0.121896 384 2448 2832
402 401 2012-02-05 1 1 2 0 0 0 2 0.265833 0.264521 0.687917 0.175996 318 2629 2947
403 402 2012-02-06 1 1 2 0 1 1 1 0.282609 0.296426 0.622174 0.1538 206 3578 3784
404 403 2012-02-07 1 1 2 0 2 1 1 0.354167 0.361104 0.49625 0.147379 199 4176 4375
405 404 2012-02-08 1 1 2 0 3 1 2 0.256667 0.266421 0.722917 0.133721 109 2693 2802
406 405 2012-02-09 1 1 2 0 4 1 1 0.265 0.261988 0.562083 0.194037 163 3667 3830
407 406 2012-02-10 1 1 2 0 5 1 2 0.280833 0.293558 0.54 0.116929 227 3604 3831
408 407 2012-02-11 1 1 2 0 6 0 3 0.224167 0.210867 0.73125 0.289796 192 1977 2169
409 408 2012-02-12 1 1 2 0 0 0 1 0.1275 0.101658 0.464583 0.409212 73 1456 1529
410 409 2012-02-13 1 1 2 0 1 1 1 0.2225 0.227913 0.41125 0.167283 94 3328 3422
411 410 2012-02-14 1 1 2 0 2 1 2 0.319167 0.333946 0.50875 0.141179 135 3787 3922
412 411 2012-02-15 1 1 2 0 3 1 1 0.348333 0.351629 0.53125 0.1816 141 4028 4169
413 412 2012-02-16 1 1 2 0 4 1 2 0.316667 0.330162 0.752917 0.091425 74 2931 3005
414 413 2012-02-17 1 1 2 0 5 1 1 0.343333 0.351629 0.634583 0.205846 349 3805 4154
415 414 2012-02-18 1 1 2 0 6 0 1 0.346667 0.355425 0.534583 0.190929 1435 2883 4318
416 415 2012-02-19 1 1 2 0 0 0 2 0.28 0.265788 0.515833 0.253112 618 2071 2689
417 416 2012-02-20 1 1 2 1 1 0 1 0.28 0.273391 0.507826 0.229083 502 2627 3129
418 417 2012-02-21 1 1 2 0 2 1 1 0.287826 0.295113 0.594348 0.205717 163 3614 3777
419 418 2012-02-22 1 1 2 0 3 1 1 0.395833 0.392667 0.567917 0.234471 394 4379 4773
420 419 2012-02-23 1 1 2 0 4 1 1 0.454167 0.444446 0.554583 0.190913 516 4546 5062
421 420 2012-02-24 1 1 2 0 5 1 2 0.4075 0.410971 0.7375 0.237567 246 3241 3487
422 421 2012-02-25 1 1 2 0 6 0 1 0.290833 0.255675 0.395833 0.421642 317 2415 2732
423 422 2012-02-26 1 1 2 0 0 0 1 0.279167 0.268308 0.41 0.205229 515 2874 3389
424 423 2012-02-27 1 1 2 0 1 1 1 0.366667 0.357954 0.490833 0.268033 253 4069 4322
425 424 2012-02-28 1 1 2 0 2 1 1 0.359167 0.353525 0.395833 0.193417 229 4134 4363
426 425 2012-02-29 1 1 2 0 3 1 2 0.344348 0.34847 0.804783 0.179117 65 1769 1834
427 426 2012-03-01 1 1 3 0 4 1 1 0.485833 0.475371 0.615417 0.226987 325 4665 4990
428 427 2012-03-02 1 1 3 0 5 1 2 0.353333 0.359842 0.657083 0.144904 246 2948 3194
429 428 2012-03-03 1 1 3 0 6 0 2 0.414167 0.413492 0.62125 0.161079 956 3110 4066
430 429 2012-03-04 1 1 3 0 0 0 1 0.325833 0.303021 0.403333 0.334571 710 2713 3423
431 430 2012-03-05 1 1 3 0 1 1 1 0.243333 0.241171 0.50625 0.228858 203 3130 3333
432 431 2012-03-06 1 1 3 0 2 1 1 0.258333 0.255042 0.456667 0.200875 221 3735 3956
433 432 2012-03-07 1 1 3 0 3 1 1 0.404167 0.3851 0.513333 0.345779 432 4484 4916
434 433 2012-03-08 1 1 3 0 4 1 1 0.5275 0.524604 0.5675 0.441563 486 4896 5382
435 434 2012-03-09 1 1 3 0 5 1 2 0.410833 0.397083 0.407083 0.4148 447 4122 4569
436 435 2012-03-10 1 1 3 0 6 0 1 0.2875 0.277767 0.350417 0.22575 968 3150 4118
437 436 2012-03-11 1 1 3 0 0 0 1 0.361739 0.35967 0.476957 0.222587 1658 3253 4911
438 437 2012-03-12 1 1 3 0 1 1 1 0.466667 0.459592 0.489167 0.207713 838 4460 5298
439 438 2012-03-13 1 1 3 0 2 1 1 0.565 0.542929 0.6175 0.23695 762 5085 5847
440 439 2012-03-14 1 1 3 0 3 1 1 0.5725 0.548617 0.507083 0.115062 997 5315 6312
441 440 2012-03-15 1 1 3 0 4 1 1 0.5575 0.532825 0.579583 0.149883 1005 5187 6192
442 441 2012-03-16 1 1 3 0 5 1 2 0.435833 0.436229 0.842083 0.113192 548 3830 4378
443 442 2012-03-17 1 1 3 0 6 0 2 0.514167 0.505046 0.755833 0.110704 3155 4681 7836
444 443 2012-03-18 1 1 3 0 0 0 2 0.4725 0.464 0.81 0.126883 2207 3685 5892
445 444 2012-03-19 1 1 3 0 1 1 1 0.545 0.532821 0.72875 0.162317 982 5171 6153
446 445 2012-03-20 1 1 3 0 2 1 1 0.560833 0.538533 0.807917 0.121271 1051 5042 6093
447 446 2012-03-21 2 1 3 0 3 1 2 0.531667 0.513258 0.82125 0.0895583 1122 5108 6230
448 447 2012-03-22 2 1 3 0 4 1 1 0.554167 0.531567 0.83125 0.117562 1334 5537 6871
449 448 2012-03-23 2 1 3 0 5 1 2 0.601667 0.570067 0.694167 0.1163 2469 5893 8362
450 449 2012-03-24 2 1 3 0 6 0 2 0.5025 0.486733 0.885417 0.192783 1033 2339 3372
451 450 2012-03-25 2 1 3 0 0 0 2 0.4375 0.437488 0.880833 0.220775 1532 3464 4996
452 451 2012-03-26 2 1 3 0 1 1 1 0.445833 0.43875 0.477917 0.386821 795 4763 5558
453 452 2012-03-27 2 1 3 0 2 1 1 0.323333 0.315654 0.29 0.187192 531 4571 5102
454 453 2012-03-28 2 1 3 0 3 1 1 0.484167 0.47095 0.48125 0.291671 674 5024 5698
455 454 2012-03-29 2 1 3 0 4 1 1 0.494167 0.482304 0.439167 0.31965 834 5299 6133
456 455 2012-03-30 2 1 3 0 5 1 2 0.37 0.375621 0.580833 0.138067 796 4663 5459
457 456 2012-03-31 2 1 3 0 6 0 2 0.424167 0.421708 0.738333 0.250617 2301 3934 6235
458 457 2012-04-01 2 1 4 0 0 0 2 0.425833 0.417287 0.67625 0.172267 2347 3694 6041
459 458 2012-04-02 2 1 4 0 1 1 1 0.433913 0.427513 0.504348 0.312139 1208 4728 5936
460 459 2012-04-03 2 1 4 0 2 1 1 0.466667 0.461483 0.396667 0.100133 1348 5424 6772
461 460 2012-04-04 2 1 4 0 3 1 1 0.541667 0.53345 0.469583 0.180975 1058 5378 6436
462 461 2012-04-05 2 1 4 0 4 1 1 0.435 0.431163 0.374167 0.219529 1192 5265 6457
463 462 2012-04-06 2 1 4 0 5 1 1 0.403333 0.390767 0.377083 0.300388 1807 4653 6460
464 463 2012-04-07 2 1 4 0 6 0 1 0.4375 0.426129 0.254167 0.274871 3252 3605 6857
465 464 2012-04-08 2 1 4 0 0 0 1 0.5 0.492425 0.275833 0.232596 2230 2939 5169
466 465 2012-04-09 2 1 4 0 1 1 1 0.489167 0.476638 0.3175 0.358196 905 4680 5585
467 466 2012-04-10 2 1 4 0 2 1 1 0.446667 0.436233 0.435 0.249375 819 5099 5918
468 467 2012-04-11 2 1 4 0 3 1 1 0.348696 0.337274 0.469565 0.295274 482 4380 4862
469 468 2012-04-12 2 1 4 0 4 1 1 0.3975 0.387604 0.46625 0.290429 663 4746 5409
470 469 2012-04-13 2 1 4 0 5 1 1 0.4425 0.431808 0.408333 0.155471 1252 5146 6398
471 470 2012-04-14 2 1 4 0 6 0 1 0.495 0.487996 0.502917 0.190917 2795 4665 7460
472 471 2012-04-15 2 1 4 0 0 0 1 0.606667 0.573875 0.507917 0.225129 2846 4286 7132
473 472 2012-04-16 2 1 4 1 1 0 1 0.664167 0.614925 0.561667 0.284829 1198 5172 6370
474 473 2012-04-17 2 1 4 0 2 1 1 0.608333 0.598487 0.390417 0.273629 989 5702 6691
475 474 2012-04-18 2 1 4 0 3 1 2 0.463333 0.457038 0.569167 0.167912 347 4020 4367
476 475 2012-04-19 2 1 4 0 4 1 1 0.498333 0.493046 0.6125 0.0659292 846 5719 6565
477 476 2012-04-20 2 1 4 0 5 1 1 0.526667 0.515775 0.694583 0.149871 1340 5950 7290
478 477 2012-04-21 2 1 4 0 6 0 1 0.57 0.542921 0.682917 0.283587 2541 4083 6624
479 478 2012-04-22 2 1 4 0 0 0 3 0.396667 0.389504 0.835417 0.344546 120 907 1027
480 479 2012-04-23 2 1 4 0 1 1 2 0.321667 0.301125 0.766667 0.303496 195 3019 3214
481 480 2012-04-24 2 1 4 0 2 1 1 0.413333 0.405283 0.454167 0.249383 518 5115 5633
482 481 2012-04-25 2 1 4 0 3 1 1 0.476667 0.470317 0.427917 0.118792 655 5541 6196
483 482 2012-04-26 2 1 4 0 4 1 2 0.498333 0.483583 0.756667 0.176625 475 4551 5026
484 483 2012-04-27 2 1 4 0 5 1 1 0.4575 0.452637 0.400833 0.347633 1014 5219 6233
485 484 2012-04-28 2 1 4 0 6 0 2 0.376667 0.377504 0.489583 0.129975 1120 3100 4220
486 485 2012-04-29 2 1 4 0 0 0 1 0.458333 0.450121 0.587083 0.116908 2229 4075 6304
487 486 2012-04-30 2 1 4 0 1 1 2 0.464167 0.457696 0.57 0.171638 665 4907 5572
488 487 2012-05-01 2 1 5 0 2 1 2 0.613333 0.577021 0.659583 0.156096 653 5087 5740
489 488 2012-05-02 2 1 5 0 3 1 1 0.564167 0.537896 0.797083 0.138058 667 5502 6169
490 489 2012-05-03 2 1 5 0 4 1 2 0.56 0.537242 0.768333 0.133696 764 5657 6421
491 490 2012-05-04 2 1 5 0 5 1 1 0.6275 0.590917 0.735417 0.162938 1069 5227 6296
492 491 2012-05-05 2 1 5 0 6 0 2 0.621667 0.584608 0.756667 0.152992 2496 4387 6883
493 492 2012-05-06 2 1 5 0 0 0 2 0.5625 0.546737 0.74 0.149879 2135 4224 6359
494 493 2012-05-07 2 1 5 0 1 1 2 0.5375 0.527142 0.664167 0.230721 1008 5265 6273
495 494 2012-05-08 2 1 5 0 2 1 2 0.581667 0.557471 0.685833 0.296029 738 4990 5728
496 495 2012-05-09 2 1 5 0 3 1 2 0.575 0.553025 0.744167 0.216412 620 4097 4717
497 496 2012-05-10 2 1 5 0 4 1 1 0.505833 0.491783 0.552083 0.314063 1026 5546 6572
498 497 2012-05-11 2 1 5 0 5 1 1 0.533333 0.520833 0.360417 0.236937 1319 5711 7030
499 498 2012-05-12 2 1 5 0 6 0 1 0.564167 0.544817 0.480417 0.123133 2622 4807 7429
500 499 2012-05-13 2 1 5 0 0 0 1 0.6125 0.585238 0.57625 0.225117 2172 3946 6118
501 500 2012-05-14 2 1 5 0 1 1 2 0.573333 0.5499 0.789583 0.212692 342 2501 2843
502 501 2012-05-15 2 1 5 0 2 1 2 0.611667 0.576404 0.794583 0.147392 625 4490 5115
503 502 2012-05-16 2 1 5 0 3 1 1 0.636667 0.595975 0.697917 0.122512 991 6433 7424
504 503 2012-05-17 2 1 5 0 4 1 1 0.593333 0.572613 0.52 0.229475 1242 6142 7384
505 504 2012-05-18 2 1 5 0 5 1 1 0.564167 0.551121 0.523333 0.136817 1521 6118 7639
506 505 2012-05-19 2 1 5 0 6 0 1 0.6 0.566908 0.45625 0.083975 3410 4884 8294
507 506 2012-05-20 2 1 5 0 0 0 1 0.620833 0.583967 0.530417 0.254367 2704 4425 7129
508 507 2012-05-21 2 1 5 0 1 1 2 0.598333 0.565667 0.81125 0.233204 630 3729 4359
509 508 2012-05-22 2 1 5 0 2 1 2 0.615 0.580825 0.765833 0.118167 819 5254 6073
510 509 2012-05-23 2 1 5 0 3 1 2 0.621667 0.584612 0.774583 0.102 766 4494 5260
511 510 2012-05-24 2 1 5 0 4 1 1 0.655 0.6067 0.716667 0.172896 1059 5711 6770
512 511 2012-05-25 2 1 5 0 5 1 1 0.68 0.627529 0.747083 0.14055 1417 5317 6734
513 512 2012-05-26 2 1 5 0 6 0 1 0.6925 0.642696 0.7325 0.198992 2855 3681 6536
514 513 2012-05-27 2 1 5 0 0 0 1 0.69 0.641425 0.697083 0.215171 3283 3308 6591
515 514 2012-05-28 2 1 5 1 1 0 1 0.7125 0.6793 0.67625 0.196521 2557 3486 6043
516 515 2012-05-29 2 1 5 0 2 1 1 0.7225 0.672992 0.684583 0.2954 880 4863 5743
517 516 2012-05-30 2 1 5 0 3 1 2 0.656667 0.611129 0.67 0.134329 745 6110 6855
518 517 2012-05-31 2 1 5 0 4 1 1 0.68 0.631329 0.492917 0.195279 1100 6238 7338
519 518 2012-06-01 2 1 6 0 5 1 2 0.654167 0.607962 0.755417 0.237563 533 3594 4127
520 519 2012-06-02 2 1 6 0 6 0 1 0.583333 0.566288 0.549167 0.186562 2795 5325 8120
521 520 2012-06-03 2 1 6 0 0 0 1 0.6025 0.575133 0.493333 0.184087 2494 5147 7641
522 521 2012-06-04 2 1 6 0 1 1 1 0.5975 0.578283 0.487083 0.284833 1071 5927 6998
523 522 2012-06-05 2 1 6 0 2 1 2 0.540833 0.525892 0.613333 0.209575 968 6033 7001
524 523 2012-06-06 2 1 6 0 3 1 1 0.554167 0.542292 0.61125 0.077125 1027 6028 7055
525 524 2012-06-07 2 1 6 0 4 1 1 0.6025 0.569442 0.567083 0.15735 1038 6456 7494
526 525 2012-06-08 2 1 6 0 5 1 1 0.649167 0.597862 0.467917 0.175383 1488 6248 7736
527 526 2012-06-09 2 1 6 0 6 0 1 0.710833 0.648367 0.437083 0.144287 2708 4790 7498
528 527 2012-06-10 2 1 6 0 0 0 1 0.726667 0.663517 0.538333 0.133721 2224 4374 6598
529 528 2012-06-11 2 1 6 0 1 1 2 0.720833 0.659721 0.587917 0.207713 1017 5647 6664
530 529 2012-06-12 2 1 6 0 2 1 2 0.653333 0.597875 0.833333 0.214546 477 4495 4972
531 530 2012-06-13 2 1 6 0 3 1 1 0.655833 0.611117 0.582083 0.343279 1173 6248 7421
532 531 2012-06-14 2 1 6 0 4 1 1 0.648333 0.624383 0.569583 0.253733 1180 6183 7363
533 532 2012-06-15 2 1 6 0 5 1 1 0.639167 0.599754 0.589583 0.176617 1563 6102 7665
534 533 2012-06-16 2 1 6 0 6 0 1 0.631667 0.594708 0.504167 0.166667 2963 4739 7702
535 534 2012-06-17 2 1 6 0 0 0 1 0.5925 0.571975 0.59875 0.144904 2634 4344 6978
536 535 2012-06-18 2 1 6 0 1 1 2 0.568333 0.544842 0.777917 0.174746 653 4446 5099
537 536 2012-06-19 2 1 6 0 2 1 1 0.688333 0.654692 0.69 0.148017 968 5857 6825
538 537 2012-06-20 2 1 6 0 3 1 1 0.7825 0.720975 0.592083 0.113812 872 5339 6211
539 538 2012-06-21 3 1 6 0 4 1 1 0.805833 0.752542 0.567917 0.118787 778 5127 5905
540 539 2012-06-22 3 1 6 0 5 1 1 0.7775 0.724121 0.57375 0.182842 964 4859 5823
541 540 2012-06-23 3 1 6 0 6 0 1 0.731667 0.652792 0.534583 0.179721 2657 4801 7458
542 541 2012-06-24 3 1 6 0 0 0 1 0.743333 0.674254 0.479167 0.145525 2551 4340 6891
543 542 2012-06-25 3 1 6 0 1 1 1 0.715833 0.654042 0.504167 0.300383 1139 5640 6779
544 543 2012-06-26 3 1 6 0 2 1 1 0.630833 0.594704 0.373333 0.347642 1077 6365 7442
545 544 2012-06-27 3 1 6 0 3 1 1 0.6975 0.640792 0.36 0.271775 1077 6258 7335
546 545 2012-06-28 3 1 6 0 4 1 1 0.749167 0.675512 0.4225 0.17165 921 5958 6879
547 546 2012-06-29 3 1 6 0 5 1 1 0.834167 0.786613 0.48875 0.165417 829 4634 5463
548 547 2012-06-30 3 1 6 0 6 0 1 0.765 0.687508 0.60125 0.161071 1455 4232 5687
549 548 2012-07-01 3 1 7 0 0 0 1 0.815833 0.750629 0.51875 0.168529 1421 4110 5531
550 549 2012-07-02 3 1 7 0 1 1 1 0.781667 0.702038 0.447083 0.195267 904 5323 6227
551 550 2012-07-03 3 1 7 0 2 1 1 0.780833 0.70265 0.492083 0.126237 1052 5608 6660
552 551 2012-07-04 3 1 7 1 3 0 1 0.789167 0.732337 0.53875 0.13495 2562 4841 7403
553 552 2012-07-05 3 1 7 0 4 1 1 0.8275 0.761367 0.457917 0.194029 1405 4836 6241
554 553 2012-07-06 3 1 7 0 5 1 1 0.828333 0.752533 0.450833 0.146142 1366 4841 6207
555 554 2012-07-07 3 1 7 0 6 0 1 0.861667 0.804913 0.492083 0.163554 1448 3392 4840
556 555 2012-07-08 3 1 7 0 0 0 1 0.8225 0.790396 0.57375 0.125629 1203 3469 4672
557 556 2012-07-09 3 1 7 0 1 1 2 0.710833 0.654054 0.683333 0.180975 998 5571 6569
558 557 2012-07-10 3 1 7 0 2 1 2 0.720833 0.664796 0.6675 0.151737 954 5336 6290
559 558 2012-07-11 3 1 7 0 3 1 1 0.716667 0.650271 0.633333 0.151733 975 6289 7264
560 559 2012-07-12 3 1 7 0 4 1 1 0.715833 0.654683 0.529583 0.146775 1032 6414 7446
561 560 2012-07-13 3 1 7 0 5 1 2 0.731667 0.667933 0.485833 0.08085 1511 5988 7499
562 561 2012-07-14 3 1 7 0 6 0 2 0.703333 0.666042 0.699167 0.143679 2355 4614 6969
563 562 2012-07-15 3 1 7 0 0 0 1 0.745833 0.705196 0.717917 0.166667 1920 4111 6031
564 563 2012-07-16 3 1 7 0 1 1 1 0.763333 0.724125 0.645 0.164187 1088 5742 6830
565 564 2012-07-17 3 1 7 0 2 1 1 0.818333 0.755683 0.505833 0.114429 921 5865 6786
566 565 2012-07-18 3 1 7 0 3 1 1 0.793333 0.745583 0.577083 0.137442 799 4914 5713
567 566 2012-07-19 3 1 7 0 4 1 1 0.77 0.714642 0.600417 0.165429 888 5703 6591
568 567 2012-07-20 3 1 7 0 5 1 2 0.665833 0.613025 0.844167 0.208967 747 5123 5870
569 568 2012-07-21 3 1 7 0 6 0 3 0.595833 0.549912 0.865417 0.2133 1264 3195 4459
570 569 2012-07-22 3 1 7 0 0 0 2 0.6675 0.623125 0.7625 0.0939208 2544 4866 7410
571 570 2012-07-23 3 1 7 0 1 1 1 0.741667 0.690017 0.694167 0.138683 1135 5831 6966
572 571 2012-07-24 3 1 7 0 2 1 1 0.750833 0.70645 0.655 0.211454 1140 6452 7592
573 572 2012-07-25 3 1 7 0 3 1 1 0.724167 0.654054 0.45 0.1648 1383 6790 8173
574 573 2012-07-26 3 1 7 0 4 1 1 0.776667 0.739263 0.596667 0.284813 1036 5825 6861
575 574 2012-07-27 3 1 7 0 5 1 1 0.781667 0.734217 0.594583 0.152992 1259 5645 6904
576 575 2012-07-28 3 1 7 0 6 0 1 0.755833 0.697604 0.613333 0.15735 2234 4451 6685
577 576 2012-07-29 3 1 7 0 0 0 1 0.721667 0.667933 0.62375 0.170396 2153 4444 6597
578 577 2012-07-30 3 1 7 0 1 1 1 0.730833 0.684987 0.66875 0.153617 1040 6065 7105
579 578 2012-07-31 3 1 7 0 2 1 1 0.713333 0.662896 0.704167 0.165425 968 6248 7216
580 579 2012-08-01 3 1 8 0 3 1 1 0.7175 0.667308 0.6775 0.141179 1074 6506 7580
581 580 2012-08-02 3 1 8 0 4 1 1 0.7525 0.707088 0.659583 0.129354 983 6278 7261
582 581 2012-08-03 3 1 8 0 5 1 2 0.765833 0.722867 0.6425 0.215792 1328 5847 7175
583 582 2012-08-04 3 1 8 0 6 0 1 0.793333 0.751267 0.613333 0.257458 2345 4479 6824
584 583 2012-08-05 3 1 8 0 0 0 1 0.769167 0.731079 0.6525 0.290421 1707 3757 5464
585 584 2012-08-06 3 1 8 0 1 1 2 0.7525 0.710246 0.654167 0.129354 1233 5780 7013
586 585 2012-08-07 3 1 8 0 2 1 2 0.735833 0.697621 0.70375 0.116908 1278 5995 7273
587 586 2012-08-08 3 1 8 0 3 1 2 0.75 0.707717 0.672917 0.1107 1263 6271 7534
588 587 2012-08-09 3 1 8 0 4 1 1 0.755833 0.699508 0.620417 0.1561 1196 6090 7286
589 588 2012-08-10 3 1 8 0 5 1 2 0.715833 0.667942 0.715833 0.238813 1065 4721 5786
590 589 2012-08-11 3 1 8 0 6 0 2 0.6925 0.638267 0.732917 0.206479 2247 4052 6299
591 590 2012-08-12 3 1 8 0 0 0 1 0.700833 0.644579 0.530417 0.122512 2182 4362 6544
592 591 2012-08-13 3 1 8 0 1 1 1 0.720833 0.662254 0.545417 0.136212 1207 5676 6883
593 592 2012-08-14 3 1 8 0 2 1 1 0.726667 0.676779 0.686667 0.169158 1128 5656 6784
594 593 2012-08-15 3 1 8 0 3 1 1 0.706667 0.654037 0.619583 0.169771 1198 6149 7347
595 594 2012-08-16 3 1 8 0 4 1 1 0.719167 0.654688 0.519167 0.141796 1338 6267 7605
596 595 2012-08-17 3 1 8 0 5 1 1 0.723333 0.2424 0.570833 0.231354 1483 5665 7148
597 596 2012-08-18 3 1 8 0 6 0 1 0.678333 0.618071 0.603333 0.177867 2827 5038 7865
598 597 2012-08-19 3 1 8 0 0 0 2 0.635833 0.603554 0.711667 0.08645 1208 3341 4549
599 598 2012-08-20 3 1 8 0 1 1 2 0.635833 0.595967 0.734167 0.129979 1026 5504 6530
600 599 2012-08-21 3 1 8 0 2 1 1 0.649167 0.601025 0.67375 0.0727708 1081 5925 7006
601 600 2012-08-22 3 1 8 0 3 1 1 0.6675 0.621854 0.677083 0.0702833 1094 6281 7375
602 601 2012-08-23 3 1 8 0 4 1 1 0.695833 0.637008 0.635833 0.0845958 1363 6402 7765
603 602 2012-08-24 3 1 8 0 5 1 2 0.7025 0.6471 0.615 0.0721458 1325 6257 7582
604 603 2012-08-25 3 1 8 0 6 0 2 0.661667 0.618696 0.712917 0.244408 1829 4224 6053
605 604 2012-08-26 3 1 8 0 0 0 2 0.653333 0.595996 0.845833 0.228858 1483 3772 5255
606 605 2012-08-27 3 1 8 0 1 1 1 0.703333 0.654688 0.730417 0.128733 989 5928 6917
607 606 2012-08-28 3 1 8 0 2 1 1 0.728333 0.66605 0.62 0.190925 935 6105 7040
608 607 2012-08-29 3 1 8 0 3 1 1 0.685 0.635733 0.552083 0.112562 1177 6520 7697
609 608 2012-08-30 3 1 8 0 4 1 1 0.706667 0.652779 0.590417 0.0771167 1172 6541 7713
610 609 2012-08-31 3 1 8 0 5 1 1 0.764167 0.6894 0.5875 0.168533 1433 5917 7350
611 610 2012-09-01 3 1 9 0 6 0 2 0.753333 0.702654 0.638333 0.113187 2352 3788 6140
612 611 2012-09-02 3 1 9 0 0 0 2 0.696667 0.649 0.815 0.0640708 2613 3197 5810
613 612 2012-09-03 3 1 9 1 1 0 1 0.7075 0.661629 0.790833 0.151121 1965 4069 6034
614 613 2012-09-04 3 1 9 0 2 1 1 0.725833 0.686888 0.755 0.236321 867 5997 6864
615 614 2012-09-05 3 1 9 0 3 1 1 0.736667 0.708983 0.74125 0.187808 832 6280 7112
616 615 2012-09-06 3 1 9 0 4 1 2 0.696667 0.655329 0.810417 0.142421 611 5592 6203
617 616 2012-09-07 3 1 9 0 5 1 1 0.703333 0.657204 0.73625 0.171646 1045 6459 7504
618 617 2012-09-08 3 1 9 0 6 0 2 0.659167 0.611121 0.799167 0.281104 1557 4419 5976
619 618 2012-09-09 3 1 9 0 0 0 1 0.61 0.578925 0.5475 0.224496 2570 5657 8227
620 619 2012-09-10 3 1 9 0 1 1 1 0.583333 0.565654 0.50375 0.258713 1118 6407 7525
621 620 2012-09-11 3 1 9 0 2 1 1 0.5775 0.554292 0.52 0.0920542 1070 6697 7767
622 621 2012-09-12 3 1 9 0 3 1 1 0.599167 0.570075 0.577083 0.131846 1050 6820 7870
623 622 2012-09-13 3 1 9 0 4 1 1 0.6125 0.579558 0.637083 0.0827208 1054 6750 7804
624 623 2012-09-14 3 1 9 0 5 1 1 0.633333 0.594083 0.6725 0.103863 1379 6630 8009
625 624 2012-09-15 3 1 9 0 6 0 1 0.608333 0.585867 0.501667 0.247521 3160 5554 8714
626 625 2012-09-16 3 1 9 0 0 0 1 0.58 0.563125 0.57 0.0901833 2166 5167 7333
627 626 2012-09-17 3 1 9 0 1 1 2 0.580833 0.55305 0.734583 0.151742 1022 5847 6869
628 627 2012-09-18 3 1 9 0 2 1 2 0.623333 0.565067 0.8725 0.357587 371 3702 4073
629 628 2012-09-19 3 1 9 0 3 1 1 0.5525 0.540404 0.536667 0.215175 788 6803 7591
630 629 2012-09-20 3 1 9 0 4 1 1 0.546667 0.532192 0.618333 0.118167 939 6781 7720
631 630 2012-09-21 3 1 9 0 5 1 1 0.599167 0.571971 0.66875 0.154229 1250 6917 8167
632 631 2012-09-22 3 1 9 0 6 0 1 0.65 0.610488 0.646667 0.283583 2512 5883 8395
633 632 2012-09-23 4 1 9 0 0 0 1 0.529167 0.518933 0.467083 0.223258 2454 5453 7907
634 633 2012-09-24 4 1 9 0 1 1 1 0.514167 0.502513 0.492917 0.142404 1001 6435 7436
635 634 2012-09-25 4 1 9 0 2 1 1 0.55 0.544179 0.57 0.236321 845 6693 7538
636 635 2012-09-26 4 1 9 0 3 1 1 0.635 0.596613 0.630833 0.2444 787 6946 7733
637 636 2012-09-27 4 1 9 0 4 1 2 0.65 0.607975 0.690833 0.134342 751 6642 7393
638 637 2012-09-28 4 1 9 0 5 1 2 0.619167 0.585863 0.69 0.164179 1045 6370 7415
639 638 2012-09-29 4 1 9 0 6 0 1 0.5425 0.530296 0.542917 0.227604 2589 5966 8555
640 639 2012-09-30 4 1 9 0 0 0 1 0.526667 0.517663 0.583333 0.134958 2015 4874 6889
641 640 2012-10-01 4 1 10 0 1 1 2 0.520833 0.512 0.649167 0.0908042 763 6015 6778
642 641 2012-10-02 4 1 10 0 2 1 3 0.590833 0.542333 0.871667 0.104475 315 4324 4639
643 642 2012-10-03 4 1 10 0 3 1 2 0.6575 0.599133 0.79375 0.0665458 728 6844 7572
644 643 2012-10-04 4 1 10 0 4 1 2 0.6575 0.607975 0.722917 0.117546 891 6437 7328
645 644 2012-10-05 4 1 10 0 5 1 1 0.615 0.580187 0.6275 0.10635 1516 6640 8156
646 645 2012-10-06 4 1 10 0 6 0 1 0.554167 0.538521 0.664167 0.268025 3031 4934 7965
647 646 2012-10-07 4 1 10 0 0 0 2 0.415833 0.419813 0.708333 0.141162 781 2729 3510
648 647 2012-10-08 4 1 10 1 1 0 2 0.383333 0.387608 0.709583 0.189679 874 4604 5478
649 648 2012-10-09 4 1 10 0 2 1 2 0.446667 0.438112 0.761667 0.1903 601 5791 6392
650 649 2012-10-10 4 1 10 0 3 1 1 0.514167 0.503142 0.630833 0.187821 780 6911 7691
651 650 2012-10-11 4 1 10 0 4 1 1 0.435 0.431167 0.463333 0.181596 834 6736 7570
652 651 2012-10-12 4 1 10 0 5 1 1 0.4375 0.433071 0.539167 0.235092 1060 6222 7282
653 652 2012-10-13 4 1 10 0 6 0 1 0.393333 0.391396 0.494583 0.146142 2252 4857 7109
654 653 2012-10-14 4 1 10 0 0 0 1 0.521667 0.508204 0.640417 0.278612 2080 4559 6639
655 654 2012-10-15 4 1 10 0 1 1 2 0.561667 0.53915 0.7075 0.296037 760 5115 5875
656 655 2012-10-16 4 1 10 0 2 1 1 0.468333 0.460846 0.558333 0.182221 922 6612 7534
657 656 2012-10-17 4 1 10 0 3 1 1 0.455833 0.450108 0.692917 0.101371 979 6482 7461
658 657 2012-10-18 4 1 10 0 4 1 2 0.5225 0.512625 0.728333 0.236937 1008 6501 7509
659 658 2012-10-19 4 1 10 0 5 1 2 0.563333 0.537896 0.815 0.134954 753 4671 5424
660 659 2012-10-20 4 1 10 0 6 0 1 0.484167 0.472842 0.572917 0.117537 2806 5284 8090
661 660 2012-10-21 4 1 10 0 0 0 1 0.464167 0.456429 0.51 0.166054 2132 4692 6824
662 661 2012-10-22 4 1 10 0 1 1 1 0.4875 0.482942 0.568333 0.0814833 830 6228 7058
663 662 2012-10-23 4 1 10 0 2 1 1 0.544167 0.530304 0.641667 0.0945458 841 6625 7466
664 663 2012-10-24 4 1 10 0 3 1 1 0.5875 0.558721 0.63625 0.0727792 795 6898 7693
665 664 2012-10-25 4 1 10 0 4 1 2 0.55 0.529688 0.800417 0.124375 875 6484 7359
666 665 2012-10-26 4 1 10 0 5 1 2 0.545833 0.52275 0.807083 0.132467 1182 6262 7444
667 666 2012-10-27 4 1 10 0 6 0 2 0.53 0.515133 0.72 0.235692 2643 5209 7852
668 667 2012-10-28 4 1 10 0 0 0 2 0.4775 0.467771 0.694583 0.398008 998 3461 4459
669 668 2012-10-29 4 1 10 0 1 1 3 0.44 0.4394 0.88 0.3582 2 20 22
670 669 2012-10-30 4 1 10 0 2 1 2 0.318182 0.309909 0.825455 0.213009 87 1009 1096
671 670 2012-10-31 4 1 10 0 3 1 2 0.3575 0.3611 0.666667 0.166667 419 5147 5566
672 671 2012-11-01 4 1 11 0 4 1 2 0.365833 0.369942 0.581667 0.157346 466 5520 5986
673 672 2012-11-02 4 1 11 0 5 1 1 0.355 0.356042 0.522083 0.266175 618 5229 5847
674 673 2012-11-03 4 1 11 0 6 0 2 0.343333 0.323846 0.49125 0.270529 1029 4109 5138
675 674 2012-11-04 4 1 11 0 0 0 1 0.325833 0.329538 0.532917 0.179108 1201 3906 5107
676 675 2012-11-05 4 1 11 0 1 1 1 0.319167 0.308075 0.494167 0.236325 378 4881 5259
677 676 2012-11-06 4 1 11 0 2 1 1 0.280833 0.281567 0.567083 0.173513 466 5220 5686
678 677 2012-11-07 4 1 11 0 3 1 2 0.295833 0.274621 0.5475 0.304108 326 4709 5035
679 678 2012-11-08 4 1 11 0 4 1 1 0.352174 0.341891 0.333478 0.347835 340 4975 5315
680 679 2012-11-09 4 1 11 0 5 1 1 0.361667 0.355413 0.540833 0.214558 709 5283 5992
681 680 2012-11-10 4 1 11 0 6 0 1 0.389167 0.393937 0.645417 0.0578458 2090 4446 6536
682 681 2012-11-11 4 1 11 0 0 0 1 0.420833 0.421713 0.659167 0.1275 2290 4562 6852
683 682 2012-11-12 4 1 11 1 1 0 1 0.485 0.475383 0.741667 0.173517 1097 5172 6269
684 683 2012-11-13 4 1 11 0 2 1 2 0.343333 0.323225 0.662917 0.342046 327 3767 4094
685 684 2012-11-14 4 1 11 0 3 1 1 0.289167 0.281563 0.552083 0.199625 373 5122 5495
686 685 2012-11-15 4 1 11 0 4 1 2 0.321667 0.324492 0.620417 0.152987 320 5125 5445
687 686 2012-11-16 4 1 11 0 5 1 1 0.345 0.347204 0.524583 0.171025 484 5214 5698
688 687 2012-11-17 4 1 11 0 6 0 1 0.325 0.326383 0.545417 0.179729 1313 4316 5629
689 688 2012-11-18 4 1 11 0 0 0 1 0.3425 0.337746 0.692917 0.227612 922 3747 4669
690 689 2012-11-19 4 1 11 0 1 1 2 0.380833 0.375621 0.623333 0.235067 449 5050 5499
691 690 2012-11-20 4 1 11 0 2 1 2 0.374167 0.380667 0.685 0.082725 534 5100 5634
692 691 2012-11-21 4 1 11 0 3 1 1 0.353333 0.364892 0.61375 0.103246 615 4531 5146
693 692 2012-11-22 4 1 11 1 4 0 1 0.34 0.350371 0.580417 0.0528708 955 1470 2425
694 693 2012-11-23 4 1 11 0 5 1 1 0.368333 0.378779 0.56875 0.148021 1603 2307 3910
695 694 2012-11-24 4 1 11 0 6 0 1 0.278333 0.248742 0.404583 0.376871 532 1745 2277
696 695 2012-11-25 4 1 11 0 0 0 1 0.245833 0.257583 0.468333 0.1505 309 2115 2424
697 696 2012-11-26 4 1 11 0 1 1 1 0.313333 0.339004 0.535417 0.04665 337 4750 5087
698 697 2012-11-27 4 1 11 0 2 1 2 0.291667 0.281558 0.786667 0.237562 123 3836 3959
699 698 2012-11-28 4 1 11 0 3 1 1 0.296667 0.289762 0.50625 0.210821 198 5062 5260
700 699 2012-11-29 4 1 11 0 4 1 1 0.28087 0.298422 0.555652 0.115522 243 5080 5323
701 700 2012-11-30 4 1 11 0 5 1 1 0.298333 0.323867 0.649583 0.0584708 362 5306 5668
702 701 2012-12-01 4 1 12 0 6 0 2 0.298333 0.316904 0.806667 0.0597042 951 4240 5191
703 702 2012-12-02 4 1 12 0 0 0 2 0.3475 0.359208 0.823333 0.124379 892 3757 4649
704 703 2012-12-03 4 1 12 0 1 1 1 0.4525 0.455796 0.7675 0.0827208 555 5679 6234
705 704 2012-12-04 4 1 12 0 2 1 1 0.475833 0.469054 0.73375 0.174129 551 6055 6606
706 705 2012-12-05 4 1 12 0 3 1 1 0.438333 0.428012 0.485 0.324021 331 5398 5729
707 706 2012-12-06 4 1 12 0 4 1 1 0.255833 0.258204 0.50875 0.174754 340 5035 5375
708 707 2012-12-07 4 1 12 0 5 1 2 0.320833 0.321958 0.764167 0.1306 349 4659 5008
709 708 2012-12-08 4 1 12 0 6 0 2 0.381667 0.389508 0.91125 0.101379 1153 4429 5582
710 709 2012-12-09 4 1 12 0 0 0 2 0.384167 0.390146 0.905417 0.157975 441 2787 3228
711 710 2012-12-10 4 1 12 0 1 1 2 0.435833 0.435575 0.925 0.190308 329 4841 5170
712 711 2012-12-11 4 1 12 0 2 1 2 0.353333 0.338363 0.596667 0.296037 282 5219 5501
713 712 2012-12-12 4 1 12 0 3 1 2 0.2975 0.297338 0.538333 0.162937 310 5009 5319
714 713 2012-12-13 4 1 12 0 4 1 1 0.295833 0.294188 0.485833 0.174129 425 5107 5532
715 714 2012-12-14 4 1 12 0 5 1 1 0.281667 0.294192 0.642917 0.131229 429 5182 5611
716 715 2012-12-15 4 1 12 0 6 0 1 0.324167 0.338383 0.650417 0.10635 767 4280 5047
717 716 2012-12-16 4 1 12 0 0 0 2 0.3625 0.369938 0.83875 0.100742 538 3248 3786
718 717 2012-12-17 4 1 12 0 1 1 2 0.393333 0.4015 0.907083 0.0982583 212 4373 4585
719 718 2012-12-18 4 1 12 0 2 1 1 0.410833 0.409708 0.66625 0.221404 433 5124 5557
720 719 2012-12-19 4 1 12 0 3 1 1 0.3325 0.342162 0.625417 0.184092 333 4934 5267
721 720 2012-12-20 4 1 12 0 4 1 2 0.33 0.335217 0.667917 0.132463 314 3814 4128
722 721 2012-12-21 1 1 12 0 5 1 2 0.326667 0.301767 0.556667 0.374383 221 3402 3623
723 722 2012-12-22 1 1 12 0 6 0 1 0.265833 0.236113 0.44125 0.407346 205 1544 1749
724 723 2012-12-23 1 1 12 0 0 0 1 0.245833 0.259471 0.515417 0.133083 408 1379 1787
725 724 2012-12-24 1 1 12 0 1 1 2 0.231304 0.2589 0.791304 0.0772304 174 746 920
726 725 2012-12-25 1 1 12 1 2 0 2 0.291304 0.294465 0.734783 0.168726 440 573 1013
727 726 2012-12-26 1 1 12 0 3 1 3 0.243333 0.220333 0.823333 0.316546 9 432 441
728 727 2012-12-27 1 1 12 0 4 1 2 0.254167 0.226642 0.652917 0.350133 247 1867 2114
729 728 2012-12-28 1 1 12 0 5 1 2 0.253333 0.255046 0.59 0.155471 644 2451 3095
730 729 2012-12-29 1 1 12 0 6 0 2 0.253333 0.2424 0.752917 0.124383 159 1182 1341
731 730 2012-12-30 1 1 12 0 0 0 1 0.255833 0.2317 0.483333 0.350754 364 1432 1796
732 731 2012-12-31 1 1 12 0 1 1 2 0.215833 0.223487 0.5775 0.154846 439 2290 2729
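
The block above appears to be the day-level bike-sharing data used elsewhere in this repository (same layout as the UCI Bike Sharing day.csv, with each row prefixed by the diff's own line number): instant, dteday, season, yr, mnth, holiday, weekday, workingday, weathersit, temp, atemp, hum, windspeed, casual, registered, cnt. Below is a minimal, hypothetical sketch for loading it and one-hot encoding the categorical fields; the pandas dependency, the local file name, and the presence of a header row are assumptions, not part of this commit.

# Hypothetical loading sketch (not part of this commit); assumes pandas is available
# and the data is saved locally as 'Bike-Sharing-Dataset/day.csv' with its header row.
import pandas as pd

rides = pd.read_csv('Bike-Sharing-Dataset/day.csv')

# Expand the categorical fields into one-hot columns and drop fields that are
# redundant as network inputs (a common preprocessing step for this data).
for field in ['season', 'weathersit', 'mnth', 'weekday']:
    rides = pd.concat([rides, pd.get_dummies(rides[field], prefix=field)], axis=1)
rides = rides.drop(['instant', 'dteday', 'season', 'weathersit', 'mnth', 'weekday', 'atemp'], axis=1)

print(rides.head())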

File diff suppressed because one or more lines are too long

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:346a4fc838c6474b144db51a5783842bfd0ddee5bd13e6faebad2dabcbf897e0
size 293284

@ -0,0 +1,2 @@
model_checkpoint_path: "image_classification"
all_model_checkpoint_paths: "image_classification"

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:54636561a3ce25bd3e19253c6b0d8538147b0ae398331ac4a2d86c6d987368cd
size 31035704

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:766b2cef9fbc745cf056b3152224f7cf77163b330ea9a15f9392beb8b89bc5a8
size 31035320

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0f00d98ebfb30b3ec0ad19f9756dc2630b89003e10525f5e148445e82aa6a1f9
size 31035999

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f7bb240661948b8f4d53e36ec720d8306f5668bd0071dcb4e6c947f78e9682b
size 31035696

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d91802434d8376bbaeeadf58a737e3a1b12ac839077e931237e0dcd43adcb154
size 31035623

@ -0,0 +1 @@
<meta HTTP-EQUIV="REFRESH" content="0; url=http://www.cs.toronto.edu/~kriz/cifar.html">

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6d958be074577803d12ecdefd02955f39262c83c16fe9348329d7fe0b5c001ce
size 170498071

File diff suppressed because one or more lines are too long

@ -0,0 +1,165 @@
import pickle
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelBinarizer
def _load_label_names():
"""
Load the label names from file
"""
return ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
def load_cfar10_batch(cifar10_dataset_folder_path, batch_id):
"""
Load a batch of the dataset
"""
with open(cifar10_dataset_folder_path + '/data_batch_' + str(batch_id), mode='rb') as file:
batch = pickle.load(file, encoding='latin1')
features = batch['data'].reshape((len(batch['data']), 3, 32, 32)).transpose(0, 2, 3, 1)
labels = batch['labels']
return features, labels
def display_stats(cifar10_dataset_folder_path, batch_id, sample_id):
"""
    Display stats of the dataset
"""
batch_ids = list(range(1, 6))
if batch_id not in batch_ids:
print('Batch Id out of Range. Possible Batch Ids: {}'.format(batch_ids))
return None
features, labels = load_cfar10_batch(cifar10_dataset_folder_path, batch_id)
if not (0 <= sample_id < len(features)):
print('{} samples in batch {}. {} is out of range.'.format(len(features), batch_id, sample_id))
return None
print('\nStats of batch {}:'.format(batch_id))
print('Samples: {}'.format(len(features)))
print('Label Counts: {}'.format(dict(zip(*np.unique(labels, return_counts=True)))))
print('First 20 Labels: {}'.format(labels[:20]))
sample_image = features[sample_id]
sample_label = labels[sample_id]
label_names = _load_label_names()
print('\nExample of Image {}:'.format(sample_id))
print('Image - Min Value: {} Max Value: {}'.format(sample_image.min(), sample_image.max()))
print('Image - Shape: {}'.format(sample_image.shape))
print('Label - Label Id: {} Name: {}'.format(sample_label, label_names[sample_label]))
plt.axis('off')
plt.imshow(sample_image)
def _preprocess_and_save(normalize, one_hot_encode, features, labels, filename):
"""
Preprocess data and save it to file
"""
features = normalize(features)
labels = one_hot_encode(labels)
pickle.dump((features, labels), open(filename, 'wb'))
def preprocess_and_save_data(cifar10_dataset_folder_path, normalize, one_hot_encode):
"""
Preprocess Training and Validation Data
"""
n_batches = 5
valid_features = []
valid_labels = []
for batch_i in range(1, n_batches + 1):
features, labels = load_cfar10_batch(cifar10_dataset_folder_path, batch_i)
validation_count = int(len(features) * 0.1)
        # Preprocess and save a batch of training data
_preprocess_and_save(
normalize,
one_hot_encode,
features[:-validation_count],
labels[:-validation_count],
'preprocess_batch_' + str(batch_i) + '.p')
# Use a portion of training batch for validation
valid_features.extend(features[-validation_count:])
valid_labels.extend(labels[-validation_count:])
# Preprocess and Save all validation data
_preprocess_and_save(
normalize,
one_hot_encode,
np.array(valid_features),
np.array(valid_labels),
'preprocess_validation.p')
with open(cifar10_dataset_folder_path + '/test_batch', mode='rb') as file:
batch = pickle.load(file, encoding='latin1')
    # load the test data
test_features = batch['data'].reshape((len(batch['data']), 3, 32, 32)).transpose(0, 2, 3, 1)
test_labels = batch['labels']
    # Preprocess and save all test data
_preprocess_and_save(
normalize,
one_hot_encode,
np.array(test_features),
np.array(test_labels),
'preprocess_training.p')
def batch_features_labels(features, labels, batch_size):
"""
Split features and labels into batches
"""
for start in range(0, len(features), batch_size):
end = min(start + batch_size, len(features))
yield features[start:end], labels[start:end]
def load_preprocess_training_batch(batch_id, batch_size):
"""
Load the Preprocessed Training data and return them in batches of <batch_size> or less
"""
filename = 'preprocess_batch_' + str(batch_id) + '.p'
features, labels = pickle.load(open(filename, mode='rb'))
# Return the training data in batches of size <batch_size> or less
return batch_features_labels(features, labels, batch_size)
def display_image_predictions(features, labels, predictions):
n_classes = 10
label_names = _load_label_names()
label_binarizer = LabelBinarizer()
label_binarizer.fit(range(n_classes))
label_ids = label_binarizer.inverse_transform(np.array(labels))
fig, axies = plt.subplots(nrows=4, ncols=2)
fig.tight_layout()
fig.suptitle('Softmax Predictions', fontsize=20, y=1.1)
n_predictions = 3
margin = 0.05
ind = np.arange(n_predictions)
width = (1. - 2. * margin) / n_predictions
for image_i, (feature, label_id, pred_indicies, pred_values) in enumerate(zip(features, label_ids, predictions.indices, predictions.values)):
pred_names = [label_names[pred_i] for pred_i in pred_indicies]
correct_name = label_names[label_id]
axies[image_i][0].imshow(feature*255)
axies[image_i][0].set_title(correct_name)
axies[image_i][0].set_axis_off()
axies[image_i][1].barh(ind + margin, pred_values[::-1], width)
axies[image_i][1].set_yticks(ind + margin)
axies[image_i][1].set_yticklabels(pred_names[::-1])
axies[image_i][1].set_xticks([0, 0.5, 1.0])
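
A minimal usage sketch for the helper functions above (not part of the commit): the module name 'helper', the CIFAR-10 extract path, and the normalize/one_hot_encode implementations shown here are illustrative assumptions; the project notebook is expected to supply its own versions.

# Hypothetical driver for the helpers above; assumes this file is importable as 'helper'
# and that the CIFAR-10 archive has been extracted to 'cifar-10-batches-py'.
import numpy as np
import helper

def normalize(x):
    # Scale pixel values from the 0-255 range into [0, 1]
    return x / 255.0

def one_hot_encode(labels):
    # Turn a list of label ids (0-9) into one-hot rows
    encoded = np.zeros((len(labels), 10))
    encoded[np.arange(len(labels)), labels] = 1
    return encoded

helper.display_stats('cifar-10-batches-py', batch_id=1, sample_id=5)
helper.preprocess_and_save_data('cifar-10-batches-py', normalize, one_hot_encode)

for features, labels in helper.load_preprocess_training_batch(batch_id=1, batch_size=64):
    print(features.shape, labels.shape)  # expected: (64, 32, 32, 3) (64, 10)
    break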

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e64b4233415742ee2ef69338ffbc2d61210fa3092abd42369a5fd55e41748e46
size 23940328

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:56ee02f2c8292680e92c5a415a29d4b5dd53bf02608edcfe5a83f8017b8a6a5c
size 221904211

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:35bf0c70458928128eb21122c36a32a15bca438842b9062363ad7cef5b5ab8be
size 221904211

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bf3b607a39cf3a68951b513ce602869ce38bcb86d369042fac95adf0cdbcdb0d
size 221904211

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:85216cbda3a030e02d55de2cee83124019ff9d1c95d239c166dff276278fc4d9
size 221904211

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:855256a55dc3ea09e29e46f05070f09207ce5ad603ff137fb12fcc617efaed01
size 221904211

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec95efd0e1e788eabcaa51a460868cb63d55ed774a2cca4c5cacfb777360f9c0
size 246560211

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6db0027275dd6c8f3a8d012bc0e2c6422c638f8dc9db30ce3de8298f0ea10e95
size 123280211

@ -0,0 +1,199 @@
import os
import numpy as np
import tensorflow as tf
import random
from unittest.mock import MagicMock
def _print_success_message():
return print('Tests Passed')
def test_folder_path(cifar10_dataset_folder_path):
assert cifar10_dataset_folder_path is not None,\
'Cifar-10 data folder not set.'
assert cifar10_dataset_folder_path[-1] != '/',\
'The "/" shouldn\'t be added to the end of the path.'
assert os.path.exists(cifar10_dataset_folder_path),\
'Path not found.'
assert os.path.isdir(cifar10_dataset_folder_path),\
'{} is not a folder.'.format(os.path.basename(cifar10_dataset_folder_path))
train_files = [cifar10_dataset_folder_path + '/data_batch_' + str(batch_id) for batch_id in range(1, 6)]
other_files = [cifar10_dataset_folder_path + '/batches.meta', cifar10_dataset_folder_path + '/test_batch']
missing_files = [path for path in train_files + other_files if not os.path.exists(path)]
assert not missing_files,\
'Missing files in directory: {}'.format(missing_files)
print('All files found!')
def test_normalize(normalize):
test_shape = (np.random.choice(range(1000)), 32, 32, 3)
test_numbers = np.random.choice(range(256), test_shape)
normalize_out = normalize(test_numbers)
assert type(normalize_out).__module__ == np.__name__,\
'Not Numpy Object'
assert normalize_out.shape == test_shape,\
'Incorrect Shape. {} shape found'.format(normalize_out.shape)
assert normalize_out.max() <= 1 and normalize_out.min() >= 0,\
'Incorrect Range. {} to {} found'.format(normalize_out.min(), normalize_out.max())
_print_success_message()
def test_one_hot_encode(one_hot_encode):
test_shape = np.random.choice(range(1000))
test_numbers = np.random.choice(range(10), test_shape)
one_hot_out = one_hot_encode(test_numbers)
assert type(one_hot_out).__module__ == np.__name__,\
'Not Numpy Object'
assert one_hot_out.shape == (test_shape, 10),\
'Incorrect Shape. {} shape found'.format(one_hot_out.shape)
n_encode_tests = 5
test_pairs = list(zip(test_numbers, one_hot_out))
test_indices = np.random.choice(len(test_numbers), n_encode_tests)
labels = [test_pairs[test_i][0] for test_i in test_indices]
enc_labels = np.array([test_pairs[test_i][1] for test_i in test_indices])
new_enc_labels = one_hot_encode(labels)
assert np.array_equal(enc_labels, new_enc_labels),\
'Encodings returned different results for the same numbers.\n' \
'For the first call it returned:\n' \
'{}\n' \
'For the second call it returned\n' \
'{}\n' \
'Make sure you save the map of labels to encodings outside of the function.'.format(enc_labels, new_enc_labels)
_print_success_message()
def test_nn_image_inputs(neural_net_image_input):
image_shape = (32, 32, 3)
nn_inputs_out_x = neural_net_image_input(image_shape)
assert nn_inputs_out_x.get_shape().as_list() == [None, image_shape[0], image_shape[1], image_shape[2]],\
'Incorrect Image Shape. Found {} shape'.format(nn_inputs_out_x.get_shape().as_list())
assert nn_inputs_out_x.op.type == 'Placeholder',\
'Incorrect Image Type. Found {} type'.format(nn_inputs_out_x.op.type)
assert nn_inputs_out_x.name == 'x:0', \
'Incorrect Name. Found {}'.format(nn_inputs_out_x.name)
print('Image Input Tests Passed.')
def test_nn_label_inputs(neural_net_label_input):
n_classes = 10
nn_inputs_out_y = neural_net_label_input(n_classes)
assert nn_inputs_out_y.get_shape().as_list() == [None, n_classes],\
'Incorrect Label Shape. Found {} shape'.format(nn_inputs_out_y.get_shape().as_list())
assert nn_inputs_out_y.op.type == 'Placeholder',\
'Incorrect Label Type. Found {} type'.format(nn_inputs_out_y.op.type)
assert nn_inputs_out_y.name == 'y:0', \
'Incorrect Name. Found {}'.format(nn_inputs_out_y.name)
print('Label Input Tests Passed.')
def test_nn_keep_prob_inputs(neural_net_keep_prob_input):
nn_inputs_out_k = neural_net_keep_prob_input()
assert nn_inputs_out_k.get_shape().ndims is None,\
'Too many dimensions found for keep prob. Found {} dimensions. It should be a scalar (0-Dimension Tensor).'.format(nn_inputs_out_k.get_shape().ndims)
assert nn_inputs_out_k.op.type == 'Placeholder',\
'Incorrect keep prob Type. Found {} type'.format(nn_inputs_out_k.op.type)
assert nn_inputs_out_k.name == 'keep_prob:0', \
'Incorrect Name. Found {}'.format(nn_inputs_out_k.name)
print('Keep Prob Tests Passed.')
def test_con_pool(conv2d_maxpool):
test_x = tf.placeholder(tf.float32, [None, 32, 32, 5])
test_num_outputs = 10
test_con_k = (2, 2)
test_con_s = (4, 4)
test_pool_k = (2, 2)
test_pool_s = (2, 2)
conv2d_maxpool_out = conv2d_maxpool(test_x, test_num_outputs, test_con_k, test_con_s, test_pool_k, test_pool_s)
assert conv2d_maxpool_out.get_shape().as_list() == [None, 4, 4, 10],\
'Incorrect Shape. Found {} shape'.format(conv2d_maxpool_out.get_shape().as_list())
_print_success_message()
def test_flatten(flatten):
test_x = tf.placeholder(tf.float32, [None, 10, 30, 6])
flat_out = flatten(test_x)
assert flat_out.get_shape().as_list() == [None, 10*30*6],\
'Incorrect Shape. Found {} shape'.format(flat_out.get_shape().as_list())
_print_success_message()
def test_fully_conn(fully_conn):
test_x = tf.placeholder(tf.float32, [None, 128])
test_num_outputs = 40
fc_out = fully_conn(test_x, test_num_outputs)
assert fc_out.get_shape().as_list() == [None, 40],\
'Incorrect Shape. Found {} shape'.format(fc_out.get_shape().as_list())
_print_success_message()
def test_output(output):
test_x = tf.placeholder(tf.float32, [None, 128])
test_num_outputs = 40
output_out = output(test_x, test_num_outputs)
assert output_out.get_shape().as_list() == [None, 40],\
'Incorrect Shape. Found {} shape'.format(output_out.get_shape().as_list())
_print_success_message()
def test_conv_net(conv_net):
test_x = tf.placeholder(tf.float32, [None, 32, 32, 3])
test_k = tf.placeholder(tf.float32)
logits_out = conv_net(test_x, test_k)
assert logits_out.get_shape().as_list() == [None, 10],\
'Incorrect Model Output. Found {}'.format(logits_out.get_shape().as_list())
print('Neural Network Built!')
def test_train_nn(train_neural_network):
mock_session = tf.Session()
test_x = np.random.rand(128, 32, 32, 3)
test_y = np.random.rand(128, 10)
test_k = np.random.rand(1)
test_optimizer = tf.train.AdamOptimizer()
mock_session.run = MagicMock()
train_neural_network(mock_session, test_optimizer, test_k, test_x, test_y)
assert mock_session.run.called, 'Session not used'
_print_success_message()
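# A minimal usage sketch, added only as an illustration (the __main__ guard and
# the example normalizer below are assumptions, not part of the original file):
# running this module directly exercises one of the checks above.
if __name__ == '__main__':
    def _example_normalize(x):
        # Scale pixel values from [0, 255] down to the [0, 1] range
        return x / 255
    test_normalize(_example_normalize)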

@ -0,0 +1,705 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Anna KaRNNa\n",
"\n",
"In this notebook, I'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.\n",
"\n",
"This network is based off of Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). Also, some information [here at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.\n",
"\n",
"<img src=\"assets/charseq.jpeg\" width=\"500\">"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import time\n",
"from collections import namedtuple\n",
"\n",
"import numpy as np\n",
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we'll load the text file and convert it into integers for our network to use. Here I'm creating a couple dictionaries to convert the characters to and from integers. Encoding the characters as integers makes it easier to use as input in the network."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"with open('anna.txt', 'r') as f:\n",
" text=f.read()\n",
"vocab = set(text)\n",
"vocab_to_int = {c: i for i, c in enumerate(vocab)}\n",
"int_to_vocab = dict(enumerate(vocab))\n",
"chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check out the first 100 characters, make sure everything is peachy. According to the [American Book Review](http://americanbookreview.org/100bestlines.asp), this is the 6th best first line of a book ever."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'Chapter 1\\n\\n\\nHappy families are all alike; every unhappy family is unhappy in its own\\nway.\\n\\nEverythin'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"text[:100]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And we can see the characters encoded as integers."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([47, 21, 34, 65, 42, 26, 54, 15, 1, 32, 32, 32, 19, 34, 65, 65, 57,\n",
" 15, 74, 34, 5, 80, 27, 80, 26, 68, 15, 34, 54, 26, 15, 34, 27, 27,\n",
" 15, 34, 27, 80, 7, 26, 16, 15, 26, 78, 26, 54, 57, 15, 66, 24, 21,\n",
" 34, 65, 65, 57, 15, 74, 34, 5, 80, 27, 57, 15, 80, 68, 15, 66, 24,\n",
" 21, 34, 65, 65, 57, 15, 80, 24, 15, 80, 42, 68, 15, 40, 2, 24, 32,\n",
" 2, 34, 57, 12, 32, 32, 17, 78, 26, 54, 57, 42, 21, 80, 24], dtype=int32)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chars[:100]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the network is working with individual characters, it's similar to a classification problem in which we are trying to predict the next character from the previous text. Here's how many 'classes' our network has to pick from."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"82"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.max(chars)+1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Making training and validation batches\n",
"\n",
"Now I need to split up the data into batches, and into training and validation sets. I should be making a test set here, but I'm not going to worry about that. My test will be if the network can generate new text.\n",
"\n",
"Here I'll make both input and target arrays. The targets are the same as the inputs, except shifted one character over. I'll also drop the last bit of data so that I'll only have completely full batches.\n",
"\n",
"The idea here is to make a 2D matrix where the number of rows is equal to the batch size. Each row will be one long concatenated string from the character data. We'll split this data into a training set and validation set using the `split_frac` keyword. This will keep 90% of the batches in the training set, the other 10% in the validation set."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def split_data(chars, batch_size, num_steps, split_frac=0.9):\n",
" \"\"\" \n",
" Split character data into training and validation sets, inputs and targets for each set.\n",
" \n",
" Arguments\n",
" ---------\n",
" chars: character array\n",
" batch_size: Size of examples in each of batch\n",
" num_steps: Number of sequence steps to keep in the input and pass to the network\n",
" split_frac: Fraction of batches to keep in the training set\n",
" \n",
" \n",
" Returns train_x, train_y, val_x, val_y\n",
" \"\"\"\n",
" \n",
" slice_size = batch_size * num_steps\n",
" n_batches = int(len(chars) / slice_size)\n",
" \n",
" # Drop the last few characters to make only full batches\n",
" x = chars[: n_batches*slice_size]\n",
" y = chars[1: n_batches*slice_size + 1]\n",
" \n",
" # Split the data into batch_size slices, then stack them into a 2D matrix \n",
" x = np.stack(np.split(x, batch_size))\n",
" y = np.stack(np.split(y, batch_size))\n",
" \n",
" # Now x and y are arrays with dimensions batch_size x n_batches*num_steps\n",
" \n",
" # Split into training and validation sets, keep the first split_frac batches for training\n",
" split_idx = int(n_batches*split_frac)\n",
" train_x, train_y= x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]\n",
" val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]\n",
" \n",
" return train_x, train_y, val_x, val_y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I'll make my data sets and we can check out what's going on here. Here I'm going to use a batch size of 10 and 50 sequence steps."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"train_x, train_y, val_x, val_y = split_data(chars, 10, 50)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(10, 178650)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_x.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at the size of this array, we see that we have rows equal to the batch size. When we want to get a batch out of here, we can grab a subset of this array that contains all the rows but has a width equal to the number of steps in the sequence. The first batch looks like this:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[47, 21, 34, 65, 42, 26, 54, 15, 1, 32, 32, 32, 19, 34, 65, 65, 57,\n",
" 15, 74, 34, 5, 80, 27, 80, 26, 68, 15, 34, 54, 26, 15, 34, 27, 27,\n",
" 15, 34, 27, 80, 7, 26, 16, 15, 26, 78, 26, 54, 57, 15, 66, 24],\n",
" [15, 34, 5, 15, 24, 40, 42, 15, 37, 40, 80, 24, 37, 15, 42, 40, 15,\n",
" 68, 42, 34, 57, 36, 72, 15, 34, 24, 68, 2, 26, 54, 26, 38, 15, 35,\n",
" 24, 24, 34, 36, 15, 68, 5, 80, 27, 80, 24, 37, 36, 15, 11, 66],\n",
" [78, 80, 24, 12, 32, 32, 72, 67, 26, 68, 36, 15, 80, 42, 0, 68, 15,\n",
" 68, 26, 42, 42, 27, 26, 38, 12, 15, 10, 21, 26, 15, 65, 54, 80, 49,\n",
" 26, 15, 80, 68, 15, 5, 34, 37, 24, 80, 74, 80, 49, 26, 24, 42],\n",
" [24, 15, 38, 66, 54, 80, 24, 37, 15, 21, 80, 68, 15, 49, 40, 24, 78,\n",
" 26, 54, 68, 34, 42, 80, 40, 24, 15, 2, 80, 42, 21, 15, 21, 80, 68,\n",
" 32, 11, 54, 40, 42, 21, 26, 54, 15, 2, 34, 68, 15, 42, 21, 80],\n",
" [15, 80, 42, 15, 80, 68, 36, 15, 68, 80, 54, 81, 72, 15, 68, 34, 80,\n",
" 38, 15, 42, 21, 26, 15, 40, 27, 38, 15, 5, 34, 24, 36, 15, 37, 26,\n",
" 42, 42, 80, 24, 37, 15, 66, 65, 36, 15, 34, 24, 38, 32, 49, 54],\n",
" [15, 79, 42, 15, 2, 34, 68, 32, 40, 24, 27, 57, 15, 2, 21, 26, 24,\n",
" 15, 42, 21, 26, 15, 68, 34, 5, 26, 15, 26, 78, 26, 24, 80, 24, 37,\n",
" 15, 21, 26, 15, 49, 34, 5, 26, 15, 42, 40, 15, 42, 21, 26, 80],\n",
" [21, 26, 24, 15, 49, 40, 5, 26, 15, 74, 40, 54, 15, 5, 26, 36, 72,\n",
" 15, 68, 21, 26, 15, 68, 34, 80, 38, 36, 15, 34, 24, 38, 15, 2, 26,\n",
" 24, 42, 15, 11, 34, 49, 7, 15, 80, 24, 42, 40, 15, 42, 21, 26],\n",
" [16, 15, 11, 66, 42, 15, 24, 40, 2, 15, 68, 21, 26, 15, 2, 40, 66,\n",
" 27, 38, 15, 54, 26, 34, 38, 80, 27, 57, 15, 21, 34, 78, 26, 15, 68,\n",
" 34, 49, 54, 80, 74, 80, 49, 26, 38, 36, 15, 24, 40, 42, 15, 5],\n",
" [42, 15, 80, 68, 24, 0, 42, 12, 15, 10, 21, 26, 57, 0, 54, 26, 15,\n",
" 65, 54, 40, 65, 54, 80, 26, 42, 40, 54, 68, 15, 40, 74, 15, 34, 15,\n",
" 68, 40, 54, 42, 36, 32, 11, 66, 42, 15, 2, 26, 0, 54, 26, 15],\n",
" [15, 68, 34, 80, 38, 15, 42, 40, 15, 21, 26, 54, 68, 26, 27, 74, 36,\n",
" 15, 34, 24, 38, 15, 11, 26, 37, 34, 24, 15, 34, 37, 34, 80, 24, 15,\n",
" 74, 54, 40, 5, 15, 42, 21, 26, 15, 11, 26, 37, 80, 24, 24, 80]], dtype=int32)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_x[:,:50]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'll write another function to grab batches out of the arrays made by `split_data`. Here each batch will be a sliding window on these arrays with size `batch_size X num_steps`. For example, if we want our network to train on a sequence of 100 characters, `num_steps = 100`. For the next batch, we'll shift this window the next sequence of `num_steps` characters. In this way we can feed batches to the network and the cell states will continue through on each batch."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_batch(arrs, num_steps):\n",
" batch_size, slice_size = arrs[0].shape\n",
" \n",
" n_batches = int(slice_size/num_steps)\n",
" for b in range(n_batches):\n",
" yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]"
]
},
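{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (a sketch that reuses the `batch_size=10`, `num_steps=50` split made above), the first thing the generator yields should be a pair of `(10, 50)` arrays:\n",
"\n",
"```\n",
"x, y = next(get_batch([train_x, train_y], 50))\n",
"x.shape, y.shape   # ((10, 50), (10, 50))\n",
"```"
]
},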
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the model\n",
"\n",
"Below is a function where I build the graph for the network."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"def build_rnn(num_classes, batch_size=50, num_steps=50, lstm_size=128, num_layers=2,\n",
" learning_rate=0.001, grad_clip=5, sampling=False):\n",
" \n",
" # When we're using this network for sampling later, we'll be passing in\n",
" # one character at a time, so providing an option for that\n",
" if sampling == True:\n",
" batch_size, num_steps = 1, 1\n",
"\n",
" tf.reset_default_graph()\n",
" \n",
" # Declare placeholders we'll feed into the graph\n",
" inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')\n",
" targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')\n",
" \n",
" # Keep probability placeholder for drop out layers\n",
" keep_prob = tf.placeholder(tf.float32, name='keep_prob')\n",
" \n",
" # One-hot encoding the input and target characters\n",
" x_one_hot = tf.one_hot(inputs, num_classes)\n",
" y_one_hot = tf.one_hot(targets, num_classes)\n",
"\n",
" ### Build the RNN layers\n",
" # Use a basic LSTM cell\n",
" lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)\n",
" \n",
" # Add dropout to the cell\n",
" drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)\n",
" \n",
" # Stack up multiple LSTM layers, for deep learning\n",
" cell = tf.contrib.rnn.MultiRNNCell([drop] * num_layers)\n",
" initial_state = cell.zero_state(batch_size, tf.float32)\n",
"\n",
" ### Run the data through the RNN layers\n",
" # This makes a list where each element is one step in the sequence\n",
" rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(x_one_hot, num_steps, 1)]\n",
" \n",
" # Run each sequence step through the RNN and collect the outputs\n",
" outputs, state = tf.contrib.rnn.static_rnn(cell, rnn_inputs, initial_state=initial_state)\n",
" final_state = state\n",
" \n",
" # Reshape output so it's a bunch of rows, one output row for each step for each batch\n",
" seq_output = tf.concat(outputs, axis=1)\n",
" output = tf.reshape(seq_output, [-1, lstm_size])\n",
" \n",
" # Now connect the RNN outputs to a softmax layer\n",
" with tf.variable_scope('softmax'):\n",
" softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1))\n",
" softmax_b = tf.Variable(tf.zeros(num_classes))\n",
" \n",
" # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch\n",
" # of rows of logit outputs, one for each step and batch\n",
" logits = tf.matmul(output, softmax_w) + softmax_b\n",
" \n",
" # Use softmax to get the probabilities for predicted characters\n",
" preds = tf.nn.softmax(logits, name='predictions')\n",
" \n",
" # Reshape the targets to match the logits\n",
" y_reshaped = tf.reshape(y_one_hot, [-1, num_classes])\n",
" loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped)\n",
" cost = tf.reduce_mean(loss)\n",
"\n",
" # Optimizer for training, using gradient clipping to control exploding gradients\n",
" tvars = tf.trainable_variables()\n",
" grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)\n",
" train_op = tf.train.AdamOptimizer(learning_rate)\n",
" optimizer = train_op.apply_gradients(zip(grads, tvars))\n",
" \n",
" # Export the nodes\n",
" # NOTE: I'm using a namedtuple here because I think they are cool\n",
" export_nodes = ['inputs', 'targets', 'initial_state', 'final_state',\n",
" 'keep_prob', 'cost', 'preds', 'optimizer']\n",
" Graph = namedtuple('Graph', export_nodes)\n",
" local_dict = locals()\n",
" graph = Graph(*[local_dict[each] for each in export_nodes])\n",
" \n",
" return graph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hyperparameters\n",
"\n",
"Here I'm defining the hyperparameters for the network. \n",
"\n",
"* `batch_size` - Number of sequences running through the network in one pass.\n",
"* `num_steps` - Number of characters in the sequence the network is trained on. Larger is better typically, the network will learn more long range dependencies. But it takes longer to train. 100 is typically a good number here.\n",
"* `lstm_size` - The number of units in the hidden layers.\n",
"* `num_layers` - Number of hidden LSTM layers to use\n",
"* `learning_rate` - Learning rate for training\n",
"* `keep_prob` - The dropout keep probability when training. If you're network is overfitting, try decreasing this.\n",
"\n",
"Here's some good advice from Andrej Karpathy on training the network. I'm going to write it in here for your benefit, but also link to [where it originally came from](https://github.com/karpathy/char-rnn#tips-and-tricks).\n",
"\n",
"> ## Tips and Tricks\n",
"\n",
">### Monitoring Validation Loss vs. Training Loss\n",
">If you're somewhat new to Machine Learning or Neural Networks it can take a bit of expertise to get good models. The most important quantity to keep track of is the difference between your training loss (printed during training) and the validation loss (printed once in a while when the RNN is run on the validation data (by default every 1000 iterations)). In particular:\n",
"\n",
"> - If your training loss is much lower than validation loss then this means the network might be **overfitting**. Solutions to this are to decrease your network size, or to increase dropout. For example you could try dropout of 0.5 and so on.\n",
"> - If your training/validation loss are about equal then your model is **underfitting**. Increase the size of your model (either number of layers or the raw number of neurons per layer)\n",
"\n",
"> ### Approximate number of parameters\n",
"\n",
"> The two most important parameters that control the model are `lstm_size` and `num_layers`. I would advise that you always use `num_layers` of either 2/3. The `lstm_size` can be adjusted based on how much data you have. The two important quantities to keep track of here are:\n",
"\n",
"> - The number of parameters in your model. This is printed when you start training.\n",
"> - The size of your dataset. 1MB file is approximately 1 million characters.\n",
"\n",
">These two should be about the same order of magnitude. It's a little tricky to tell. Here are some examples:\n",
"\n",
"> - I have a 100MB dataset and I'm using the default parameter settings (which currently print 150K parameters). My data size is significantly larger (100 mil >> 0.15 mil), so I expect to heavily underfit. I am thinking I can comfortably afford to make `lstm_size` larger.\n",
"> - I have a 10MB dataset and running a 10 million parameter model. I'm slightly nervous and I'm carefully monitoring my validation loss. If it's larger than my training loss then I may want to try to increase dropout a bit and see if that helps the validation loss.\n",
"\n",
"> ### Best models strategy\n",
"\n",
">The winning strategy to obtaining very good models (if you have the compute time) is to always err on making the network larger (as large as you're willing to wait for it to compute) and then try different dropout values (between 0,1). Whatever model has the best validation performance (the loss, written in the checkpoint filename, low is good) is the one you should use in the end.\n",
"\n",
">It is very common in deep learning to run many different models with many different hyperparameter settings, and in the end take whatever checkpoint gave the best validation performance.\n",
"\n",
">By the way, the size of your training and validation splits are also parameters. Make sure you have a decent amount of data in your validation set or otherwise the validation performance will be noisy and not very informative.\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"batch_size = 100\n",
"num_steps = 100 \n",
"lstm_size = 512\n",
"num_layers = 2\n",
"learning_rate = 0.001\n",
"keep_prob = 0.5"
]
},
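{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough check against Karpathy's advice above, here's a back-of-the-envelope parameter count for these settings (a sketch, assuming the ~82 character classes found earlier; a `BasicLSTMCell` with input size `i` and hidden size `h` has roughly `4*h*(i + h + 1)` parameters):\n",
"\n",
"```\n",
"layer_1 = 4 * 512 * (82 + 512 + 1)    # ~1.2 million\n",
"layer_2 = 4 * 512 * (512 + 512 + 1)   # ~2.1 million\n",
"softmax = 512 * 82 + 82               # ~42 thousand\n",
"layer_1 + layer_2 + softmax           # ~3.4 million parameters\n",
"```\n",
"\n",
"The Anna Karenina text is on the order of 2 million characters (you can check with `len(chars)`), so this model size is in the same ballpark as the data size, which is what the advice above recommends."
]
},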
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training\n",
"\n",
"Time for training which is pretty straightforward. Here I pass in some data, and get an LSTM state back. Then I pass that state back in to the network so the next batch can continue the state from the previous batch. And every so often (set by `save_every_n`) I calculate the validation loss and save a checkpoint.\n",
"\n",
"Here I'm saving checkpoints with the format\n",
"\n",
"`i{iteration number}_l{# hidden layer units}_v{validation loss}.ckpt`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true,
"scrolled": true
},
"outputs": [],
"source": [
"epochs = 20\n",
"# Save every N iterations\n",
"save_every_n = 200\n",
"train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)\n",
"\n",
"model = build_rnn(len(vocab), \n",
" batch_size=batch_size,\n",
" num_steps=num_steps,\n",
" learning_rate=learning_rate,\n",
" lstm_size=lstm_size,\n",
" num_layers=num_layers)\n",
"\n",
"saver = tf.train.Saver(max_to_keep=100)\n",
"with tf.Session() as sess:\n",
" sess.run(tf.global_variables_initializer())\n",
" \n",
" # Use the line below to load a checkpoint and resume training\n",
" #saver.restore(sess, 'checkpoints/______.ckpt')\n",
" \n",
" n_batches = int(train_x.shape[1]/num_steps)\n",
" iterations = n_batches * epochs\n",
" for e in range(epochs):\n",
" \n",
" # Train network\n",
" new_state = sess.run(model.initial_state)\n",
" loss = 0\n",
" for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):\n",
" iteration = e*n_batches + b\n",
" start = time.time()\n",
" feed = {model.inputs: x,\n",
" model.targets: y,\n",
" model.keep_prob: keep_prob,\n",
" model.initial_state: new_state}\n",
" batch_loss, new_state, _ = sess.run([model.cost, model.final_state, model.optimizer], \n",
" feed_dict=feed)\n",
" loss += batch_loss\n",
" end = time.time()\n",
" print('Epoch {}/{} '.format(e+1, epochs),\n",
" 'Iteration {}/{}'.format(iteration, iterations),\n",
" 'Training loss: {:.4f}'.format(loss/b),\n",
" '{:.4f} sec/batch'.format((end-start)))\n",
" \n",
" \n",
" if (iteration%save_every_n == 0) or (iteration == iterations):\n",
" # Check performance, notice dropout has been set to 1\n",
" val_loss = []\n",
" new_state = sess.run(model.initial_state)\n",
" for x, y in get_batch([val_x, val_y], num_steps):\n",
" feed = {model.inputs: x,\n",
" model.targets: y,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" batch_loss, new_state = sess.run([model.cost, model.final_state], feed_dict=feed)\n",
" val_loss.append(batch_loss)\n",
"\n",
" print('Validation loss:', np.mean(val_loss),\n",
" 'Saving checkpoint!')\n",
" saver.save(sess, \"checkpoints/i{}_l{}_v{:.3f}.ckpt\".format(iteration, lstm_size, np.mean(val_loss)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Saved checkpoints\n",
"\n",
"Read up on saving and loading checkpoints here: https://www.tensorflow.org/programmers_guide/variables"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"tf.train.get_checkpoint_state('checkpoints')"
]
},
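{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't want to copy the checkpoint name by hand, here's a minimal sketch (assuming checkpoints have already been written to the `checkpoints` folder) for grabbing the most recent one to use for sampling below:\n",
"\n",
"```\n",
"ckpt = tf.train.get_checkpoint_state('checkpoints')\n",
"checkpoint = ckpt.model_checkpoint_path   # path to the latest checkpoint\n",
"```"
]
},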
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sampling\n",
"\n",
"Now that the network is trained, we'll can use it to generate new text. The idea is that we pass in a character, then the network will predict the next character. We can use the new one, to predict the next one. And we keep doing this to generate all new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.\n",
"\n",
"The network gives us predictions for each character. To reduce noise and make things a little less random, I'm going to only choose a new character from the top N most likely characters.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def pick_top_n(preds, vocab_size, top_n=5):\n",
" p = np.squeeze(preds)\n",
" p[np.argsort(p)[:-top_n]] = 0\n",
" p = p / np.sum(p)\n",
" c = np.random.choice(vocab_size, 1, p=p)[0]\n",
" return c"
]
},
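{
"cell_type": "markdown",
"metadata": {},
"source": [
"A tiny check of `pick_top_n` (a sketch with a made-up distribution, not from the trained network): with `top_n=2` only the two most likely characters can be drawn, and their probabilities are renormalized before sampling.\n",
"\n",
"```\n",
"probs = np.array([[0.5, 0.3, 0.1, 0.1]])\n",
"pick_top_n(probs, vocab_size=4, top_n=2)   # always returns 0 or 1\n",
"```"
]
},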
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def sample(checkpoint, n_samples, lstm_size, vocab_size, prime=\"The \"):\n",
" samples = [c for c in prime]\n",
" model = build_rnn(vocab_size, lstm_size=lstm_size, sampling=True)\n",
" saver = tf.train.Saver()\n",
" with tf.Session() as sess:\n",
" saver.restore(sess, checkpoint)\n",
" new_state = sess.run(model.initial_state)\n",
" for c in prime:\n",
" x = np.zeros((1, 1))\n",
" x[0,0] = vocab_to_int[c]\n",
" feed = {model.inputs: x,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" preds, new_state = sess.run([model.preds, model.final_state], \n",
" feed_dict=feed)\n",
"\n",
" c = pick_top_n(preds, len(vocab))\n",
" samples.append(int_to_vocab[c])\n",
"\n",
" for i in range(n_samples):\n",
" x[0,0] = c\n",
" feed = {model.inputs: x,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" preds, new_state = sess.run([model.preds, model.final_state], \n",
" feed_dict=feed)\n",
"\n",
" c = pick_top_n(preds, len(vocab))\n",
" samples.append(int_to_vocab[c])\n",
" \n",
" return ''.join(samples)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, pass in the path to a checkpoint and sample from the network."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"checkpoint = \"checkpoints/____.ckpt\"\n",
"samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because one or more lines are too long

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:568bb39fc78aca98e9db91c4329dcd1aa5ec4c3df3aca2064ce5d6f023ae16c9
size 2025486

Binary file not shown.


File diff suppressed because one or more lines are too long

@ -0,0 +1,59 @@
name: dlnd-tf-lab
dependencies:
- openssl=1.0.2j
- pip>=8.1.2
- psutil=4.4.1
- python>=3.4.0
- readline=6.2
- setuptools=27.2.0
- sqlite=3.13.0
- tk=8.5.18
- wheel=0.29.0
- xz=5.2.2
- zlib=1.2.8
- pip:
- appnope==0.1.0
- cycler==0.10.0
- decorator==4.0.10
- entrypoints==0.2.2
- ipykernel==4.5.0
- ipython==5.1.0
- ipython-genutils==0.1.0
- ipywidgets==5.2.2
- jinja2==2.8
- jsonschema==2.5.1
- jupyter==1.0.0
- jupyter-client==4.4.0
- jupyter-console==5.0.0
- jupyter-core==4.2.0
- markupsafe==0.23
- matplotlib==1.5.3
- mistune==0.7.3
- nbconvert==4.2.0
- nbformat==4.1.0
- notebook==4.2.3
- numpy==1.11.2
- pexpect==4.2.1
- pickleshare==0.7.4
- pillow==3.4.2
- prompt-toolkit==1.0.8
- protobuf==3.1.0.post1
- ptyprocess==0.5.1
- pygments==2.1.3
- pyparsing==2.1.10
- python-dateutil==2.5.3
- pytz==2016.7
- pyzmq==16.0.0
- qtconsole==4.2.1
- scikit-learn==0.18
- scipy==0.18.1
- simplegeneric==0.8.1
- six==1.10.0
- sklearn==0.0
- tensorflow>=0.12.1
- terminado==0.6
- tornado==4.4.2
- tqdm==4.8.4
- traitlets==4.3.1
- wcwidth==0.1.7
- widgetsnbextension==1.2.6

@ -0,0 +1,80 @@
name: dlnd-tf-lab
channels: !!python/tuple
- defaults
dependencies:
- bleach=1.5.0=py35_0
- bzip2=1.0.6=vc14_3
- colorama=0.3.7=py35_0
- cycler=0.10.0=py35_0
- decorator=4.0.11=py35_0
- entrypoints=0.2.2=py35_1
- freetype=2.5.5=vc14_2
- html5lib=0.999=py35_0
- icu=57.1=vc14_0
- ipykernel=4.5.2=py35_0
- ipython=5.2.2=py35_0
- ipython_genutils=0.1.0=py35_0
- ipywidgets=5.2.2=py35_1
- jinja2=2.9.4=py35_0
- jpeg=9b=vc14_0
- jsonschema=2.5.1=py35_0
- jupyter=1.0.0=py35_3
- jupyter_client=4.4.0=py35_0
- jupyter_console=5.0.0=py35_0
- jupyter_core=4.3.0=py35_0
- libpng=1.6.27=vc14_0
- libtiff=4.0.6=vc14_3
- markupsafe=0.23=py35_2
- matplotlib=2.0.0=np112py35_0
- mistune=0.7.3=py35_0
- mkl=2017.0.1=0
- nbconvert=5.1.1=py35_0
- nbformat=4.2.0=py35_0
- notebook=4.3.1=py35_1
- numpy=1.12.0=py35_0
- olefile=0.44=py35_0
- openssl=1.0.2k=vc14_0
- pandas=0.19.2=np112py35_1
- pandocfilters=1.4.1=py35_0
- path.py=10.1=py35_0
- pickleshare=0.7.4=py35_0
- pillow=4.0.0=py35_1
- pip=9.0.1=py35_1
- prompt_toolkit=1.0.9=py35_0
- pygments=2.1.3=py35_0
- pyparsing=2.1.4=py35_0
- pyqt=5.6.0=py35_2
- python=3.5.2=0
- python-dateutil=2.6.0=py35_0
- pytz=2016.10=py35_0
- pyzmq=16.0.2=py35_0
- qt=5.6.2=vc14_3
- qtconsole=4.2.1=py35_2
- scikit-learn=0.18.1=np112py35_1
- scipy=0.18.1=np112py35_1
- setuptools=27.2.0=py35_1
- simplegeneric=0.8.1=py35_1
- sip=4.18=py35_0
- six=1.10.0=py35_0
- testpath=0.3=py35_0
- tk=8.5.18=vc14_0
- tornado=4.4.2=py35_0
- traitlets=4.3.1=py35_0
- vs2015_runtime=14.0.25123=0
- wcwidth=0.1.7=py35_0
- wheel=0.29.0=py35_0
- widgetsnbextension=1.2.6=py35_0
- win_unicode_console=0.5=py35_0
- zlib=1.2.8=vc14_3
- pip:
- ipython-genutils==0.1.0
- jupyter-client==4.4.0
- jupyter-console==5.0.0
- jupyter-core==4.3.0
- prompt-toolkit==1.0.9
- protobuf==3.2.0
- tensorflow==1.0.0
- tqdm==4.11.2
- win-unicode-console==0.5
prefix: C:\Users\Mat\Anaconda3\envs\dlnd-tf-lab

Binary file not shown.


Binary file not shown.


Binary file not shown.


Binary file not shown.


File diff suppressed because one or more lines are too long

@ -0,0 +1,112 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Solutions\n",
"## Problem 1\n",
"Implement the Min-Max scaling function ($X'=a+{\\frac {\\left(X-X_{\\min }\\right)\\left(b-a\\right)}{X_{\\max }-X_{\\min }}}$) with the parameters:\n",
"\n",
"$X_{\\min }=0$\n",
"\n",
"$X_{\\max }=255$\n",
"\n",
"$a=0.1$\n",
"\n",
"$b=0.9$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Problem 1 - Implement Min-Max scaling for grayscale image data\n",
"def normalize_grayscale(image_data):\n",
" \"\"\"\n",
" Normalize the image data with Min-Max scaling to a range of [0.1, 0.9]\n",
" :param image_data: The image data to be normalized\n",
" :return: Normalized image data\n",
" \"\"\"\n",
" a = 0.1\n",
" b = 0.9\n",
" grayscale_min = 0\n",
" grayscale_max = 255\n",
" return a + ( ( (image_data - grayscale_min)*(b - a) )/( grayscale_max - grayscale_min ) )"
]
},
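{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick check of the solution (a sketch, assuming NumPy is imported as `np`): the endpoints of the grayscale range should map exactly to $a=0.1$ and $b=0.9$.\n",
"\n",
"```\n",
"normalize_grayscale(np.array([0, 128, 255]))\n",
"# array([ 0.1       ,  0.50156863,  0.9       ])\n",
"```"
]
},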
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problem 2\n",
"- Use [tf.placeholder()](https://www.tensorflow.org/api_docs/python/io_ops.html#placeholder) for `features` and `labels` since they are the inputs to the model.\n",
"- Any math operations must have the same type on both sides of the operator. The weights are float32, so the `features` and `labels` must also be float32.\n",
"- Use [tf.Variable()](https://www.tensorflow.org/api_docs/python/state_ops.html#Variable) to allow `weights` and `biases` to be modified.\n",
"- The `weights` must be the dimensions of features by labels. The number of features is the size of the image, 28*28=784. The size of labels is 10.\n",
"- The `biases` must be the dimensions of the labels, which is 10."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"features_count = 784\n",
"labels_count = 10\n",
"\n",
"# Problem 2 - Set the features and labels tensors\n",
"features = tf.placeholder(tf.float32)\n",
"labels = tf.placeholder(tf.float32)\n",
"\n",
"# Problem 2 - Set the weights and biases tensors\n",
"weights = tf.Variable(tf.truncated_normal((features_count, labels_count)))\n",
"biases = tf.Variable(tf.zeros(labels_count))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Problem 3\n",
"Configuration 1\n",
"* **Epochs:** 1\n",
"* **Learning Rate:** 0.1\n",
"\n",
"Configuration 2\n",
"* **Epochs:** 4 or 5\n",
"* **Learning Rate:** 0.2"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4a205367d7c771405d2334bf0cb2ed4dfb651aa1273dd60454d825091e83a0a5
size 508160488

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a9c56d4c0daf330a47778058319132b0f12b50cc592fc354fd6e77be636fc744
size 6125664

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8000f214a988720bb02f2d0f21eec838ccf2cb36ee644979597b0ea44a8bf5c3
size 132817335

File diff suppressed because one or more lines are too long

@ -0,0 +1,269 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Handwritten Number Recognition with TFLearn and MNIST\n",
"\n",
"In this notebook, we'll be building a neural network that recognizes handwritten numbers 0-9. \n",
"\n",
"This kind of neural network is used in a variety of real-world applications including: recognizing phone numbers and sorting postal mail by address. To build the network, we'll be using the **MNIST** data set, which consists of images of handwritten numbers and their correct labels 0-9.\n",
"\n",
"We'll be using [TFLearn](http://tflearn.org/), a high-level library built on top of TensorFlow to build the neural network. We'll start off by importing all the modules we'll need, then load the data, and finally build the network."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Import Numpy, TensorFlow, TFLearn, and MNIST data\n",
"import numpy as np\n",
"import tensorflow as tf\n",
"import tflearn\n",
"import tflearn.datasets.mnist as mnist"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieving training and test data\n",
"\n",
"The MNIST data set already contains both training and test data. There are 55,000 data points of training data, and 10,000 points of test data.\n",
"\n",
"Each MNIST data point has:\n",
"1. an image of a handwritten digit and \n",
"2. a corresponding label (a number 0-9 that identifies the image)\n",
"\n",
"We'll call the images, which will be the input to our neural network, **X** and their corresponding labels **Y**.\n",
"\n",
"We're going to want our labels as *one-hot vectors*, which are vectors that holds mostly 0's and one 1. It's easiest to see this in a example. As a one-hot vector, the number 0 is represented as [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], and 4 is represented as [0, 0, 0, 0, 1, 0, 0, 0, 0, 0].\n",
"\n",
"### Flattened data\n",
"\n",
"For this example, we'll be using *flattened* data or a representation of MNIST images in one dimension rather than two. So, each handwritten number image, which is 28x28 pixels, will be represented as a one dimensional array of 784 pixel values. \n",
"\n",
"Flattening the data throws away information about the 2D structure of the image, but it simplifies our data so that all of the training data can be contained in one array whose shape is [55000, 784]; the first dimension is the number of training images and the second dimension is the number of pixels in each image. This is the kind of data that is easy to analyze using a simple neural network."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Retrieve the training and test data\n",
"trainX, trainY, testX, testY = mnist.load_data(one_hot=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize the training data\n",
"\n",
"Provided below is a function that will help you visualize the MNIST data. By passing in the index of a training example, the function `show_digit` will display that training image along with it's corresponding label in the title."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Visualizing the data\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"# Function for displaying a training image by it's index in the MNIST set\n",
"def show_digit(index):\n",
" label = trainY[index].argmax(axis=0)\n",
" # Reshape 784 array into 28x28 image\n",
" image = trainX[index].reshape([28,28])\n",
" plt.title('Training data, index: %d, Label: %d' % (index, label))\n",
" plt.imshow(image, cmap='gray_r')\n",
" plt.show()\n",
" \n",
"# Display the first (index 0) training image\n",
"show_digit(0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Building the network\n",
"\n",
"TFLearn lets you build the network by defining the layers in that network. \n",
"\n",
"For this example, you'll define:\n",
"\n",
"1. The input layer, which tells the network the number of inputs it should expect for each piece of MNIST data. \n",
"2. Hidden layers, which recognize patterns in data and connect the input to the output layer, and\n",
"3. The output layer, which defines how the network learns and outputs a label for a given image.\n",
"\n",
"Let's start with the input layer; to define the input layer, you'll define the type of data that the network expects. For example,\n",
"\n",
"```\n",
"net = tflearn.input_data([None, 100])\n",
"```\n",
"\n",
"would create a network with 100 inputs. The number of inputs to your network needs to match the size of your data. For this example, we're using 784 element long vectors to encode our input data, so we need **784 input units**.\n",
"\n",
"\n",
"### Adding layers\n",
"\n",
"To add new hidden layers, you use \n",
"\n",
"```\n",
"net = tflearn.fully_connected(net, n_units, activation='ReLU')\n",
"```\n",
"\n",
"This adds a fully connected layer where every unit (or node) in the previous layer is connected to every unit in this layer. The first argument `net` is the network you created in the `tflearn.input_data` call, it designates the input to the hidden layer. You can set the number of units in the layer with `n_units`, and set the activation function with the `activation` keyword. You can keep adding layers to your network by repeated calling `tflearn.fully_connected(net, n_units)`. \n",
"\n",
"Then, to set how you train the network, use:\n",
"\n",
"```\n",
"net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')\n",
"```\n",
"\n",
"Again, this is passing in the network you've been building. The keywords: \n",
"\n",
"* `optimizer` sets the training method, here stochastic gradient descent\n",
"* `learning_rate` is the learning rate\n",
"* `loss` determines how the network error is calculated. In this example, with categorical cross-entropy.\n",
"\n",
"Finally, you put all this together to create the model with `tflearn.DNN(net)`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise:** Below in the `build_model()` function, you'll put together the network using TFLearn. You get to choose how many layers to use, how many hidden units, etc.\n",
"\n",
"**Hint:** The final output layer must have 10 output nodes (one for each digit 0-9). It's also recommended to use a `softmax` activation layer as your final output layer. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Define the neural network\n",
"def build_model():\n",
" # This resets all parameters and variables, leave this here\n",
" tf.reset_default_graph()\n",
" \n",
" #### Your code ####\n",
" # Include the input layer, hidden layer(s), and set how you want to train the model\n",
" \n",
" # This model assumes that your network is named \"net\" \n",
" model = tflearn.DNN(net)\n",
" return model"
]
},
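{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to fill in the exercise, shown here only as a sketch (the hidden layer sizes are arbitrary choices, not an official solution), using the TFLearn calls described above:\n",
"\n",
"```\n",
"net = tflearn.input_data([None, 784])\n",
"net = tflearn.fully_connected(net, 128, activation='ReLU')\n",
"net = tflearn.fully_connected(net, 32, activation='ReLU')\n",
"net = tflearn.fully_connected(net, 10, activation='softmax')\n",
"net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')\n",
"```"
]
},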
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Build the model\n",
"model = build_model()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the network\n",
"\n",
"Now that we've constructed the network, saved as the variable `model`, we can fit it to the data. Here we use the `model.fit` method. You pass in the training features `trainX` and the training targets `trainY`. Below I set `validation_set=0.1` which reserves 10% of the data set as the validation set. You can also set the batch size and number of epochs with the `batch_size` and `n_epoch` keywords, respectively. \n",
"\n",
"Too few epochs don't effectively train your network, and too many take a long time to execute. Choose wisely!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Training\n",
"model.fit(trainX, trainY, validation_set=0.1, show_metric=True, batch_size=100, n_epoch=20)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Testing\n",
"After you're satisified with the training output and accuracy, you can then run the network on the **test data set** to measure it's performance! Remember, only do this after you've done the training and are satisfied with the results.\n",
"\n",
"A good result will be **higher than 95% accuracy**. Some simple models have been known to get up to 99.7% accuracy!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Compare the labels that our model predicts with the actual labels\n",
"\n",
"# Find the indices of the most confident prediction for each item. That tells us the predicted digit for that sample.\n",
"predictions = np.array(model.predict(testX)).argmax(axis=1)\n",
"\n",
"# Calculate the accuracy, which is the percentage of times the predicated labels matched the actual labels\n",
"actual = testY.argmax(axis=1)\n",
"test_accuracy = np.mean(predictions == actual, axis=0)\n",
"\n",
"# Print out the result\n",
"print(\"Test accuracy: \", test_accuracy)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,629 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sentiment analysis with TFLearn\n",
"\n",
"In this notebook, we'll continue Andrew Trask's work by building a network for sentiment analysis on the movie review data. Instead of a network written with Numpy, we'll be using [TFLearn](http://tflearn.org/), a high-level library built on top of TensorFlow. TFLearn makes it simpler to build networks just by defining the layers. It takes care of most of the details for you.\n",
"\n",
"We'll start off by importing all the modules we'll need, then load and prepare the data."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import tensorflow as tf\n",
"import tflearn\n",
"from tflearn.data_utils import to_categorical"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing the data\n",
"\n",
"Following along with Andrew, our goal here is to convert our reviews into word vectors. The word vectors will have elements representing words in the total vocabulary. If the second position represents the word 'the', for each review we'll count up the number of times 'the' appears in the text and set the second position to that count. I'll show you examples as we build the input data from the reviews data. Check out Andrew's notebook and video for more about this."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read the data\n",
"\n",
"Use the pandas library to read the reviews and postive/negative labels from comma-separated files. The data we're using has already been preprocessed a bit and we know it uses only lower case characters. If we were working from raw data, where we didn't know it was all lower case, we would want to add a step here to convert it. That's so we treat different variations of the same word, like `The`, `the`, and `THE`, all the same way."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"reviews = pd.read_csv('reviews.txt', header=None)\n",
"labels = pd.read_csv('labels.txt', header=None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Counting word frequency\n",
"\n",
"To start off we'll need to count how often each word appears in the data. We'll use this count to create a vocabulary we'll use to encode the review data. This resulting count is known as a [bag of words](https://en.wikipedia.org/wiki/Bag-of-words_model). We'll use it to select our vocabulary and build the word vectors. You should have seen how to do this in Andrew's lesson. Try to implement it here using the [Counter class](https://docs.python.org/2/library/collections.html#collections.Counter).\n",
"\n",
"> **Exercise:** Create the bag of words from the reviews data and assign it to `total_counts`. The reviews are stores in the `reviews` [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html). If you want the reviews as a Numpy array, use `reviews.values`. You can iterate through the rows in the DataFrame with `for idx, row in reviews.iterrows():` ([documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html)). When you break up the reviews into words, use `.split(' ')` instead of `.split()` so your results match ours."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total words in data set: 74074\n"
]
}
],
"source": [
"from collections import Counter\n",
"total_counts = Counter()\n",
"for _, row in reviews.iterrows():\n",
" total_counts.update(row[0].split(' '))\n",
"print(\"Total words in data set: \", len(total_counts))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's keep the first 10000 most frequent words. As Andrew noted, most of the words in the vocabulary are rarely used so they will have little effect on our predictions. Below, we'll sort `vocab` by the count value and keep the 10000 most frequent words."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['', 'the', '.', 'and', 'a', 'of', 'to', 'is', 'br', 'it', 'in', 'i', 'this', 'that', 's', 'was', 'as', 'for', 'with', 'movie', 'but', 'film', 'you', 'on', 't', 'not', 'he', 'are', 'his', 'have', 'be', 'one', 'all', 'at', 'they', 'by', 'an', 'who', 'so', 'from', 'like', 'there', 'her', 'or', 'just', 'about', 'out', 'if', 'has', 'what', 'some', 'good', 'can', 'more', 'she', 'when', 'very', 'up', 'time', 'no']\n"
]
}
],
"source": [
"vocab = sorted(total_counts, key=total_counts.get, reverse=True)[:10000]\n",
"print(vocab[:60])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What's the last word in our vocabulary? We can use this to judge if 10000 is too few. If the last word is pretty common, we probably need to keep more words."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"float : 30\n"
]
}
],
"source": [
"print(vocab[-1], ': ', total_counts[vocab[-1]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The last word in our vocabulary shows up in 30 reviews out of 25000. I think it's fair to say this is a tiny proportion of reviews. We are probably fine with this number of words.\n",
"\n",
"**Note:** When you run, you may see a different word from the one shown above, but it will also have the value `30`. That's because there are many words tied for that number of counts, and the `Counter` class does not guarantee which one will be returned in the case of a tie.\n",
"\n",
"Now for each review in the data, we'll make a word vector. First we need to make a mapping of word to index, pretty easy to do with a dictionary comprehension.\n",
"\n",
"> **Exercise:** Create a dictionary called `word2idx` that maps each word in the vocabulary to an index. The first word in `vocab` has index `0`, the second word has index `1`, and so on."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"word2idx = {word: i for i, word in enumerate(vocab)}"
]
},
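{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check (based on the `vocab` printout above, where `'the'` is the second entry):\n",
"\n",
"```\n",
"word2idx['the']   # 1\n",
"```"
]
},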
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Text to vector function\n",
"\n",
"Now we can write a function that converts a some text to a word vector. The function will take a string of words as input and return a vector with the words counted up. Here's the general algorithm to do this:\n",
"\n",
"* Initialize the word vector with [np.zeros](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html), it should be the length of the vocabulary.\n",
"* Split the input string of text into a list of words with `.split(' ')`. Again, if you call `.split()` instead, you'll get slightly different results than what we show here.\n",
"* For each word in that list, increment the element in the index associated with that word, which you get from `word2idx`.\n",
"\n",
"**Note:** Since all words aren't in the `vocab` dictionary, you'll get a key error if you run into one of those words. You can use the `.get` method of the `word2idx` dictionary to specify a default returned value when you make a key error. For example, `word2idx.get(word, None)` returns `None` if `word` doesn't exist in the dictionary."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def text_to_vector(text):\n",
" word_vector = np.zeros(len(vocab), dtype=np.int_)\n",
" for word in text.split(' '):\n",
" idx = word2idx.get(word, None)\n",
" if idx is None:\n",
" continue\n",
" else:\n",
" word_vector[idx] += 1\n",
" return np.array(word_vector)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you do this right, the following code should return\n",
"\n",
"```\n",
"text_to_vector('The tea is for a party to celebrate '\n",
" 'the movie so she has no time for a cake')[:65]\n",
" \n",
"array([0, 1, 0, 0, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0])\n",
"``` "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 0, 0, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"text_to_vector('The tea is for a party to celebrate '\n",
" 'the movie so she has no time for a cake')[:65]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, run through our entire review data set and convert each review to a word vector."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"word_vectors = np.zeros((len(reviews), len(vocab)), dtype=np.int_)\n",
"for ii, (_, text) in enumerate(reviews.iterrows()):\n",
" word_vectors[ii] = text_to_vector(text[0])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 18, 9, 27, 1, 4, 4, 6, 4, 0, 2, 2, 5, 0,\n",
" 4, 1, 0, 2, 0, 0, 0, 0, 0, 0],\n",
" [ 5, 4, 8, 1, 7, 3, 1, 2, 0, 4, 0, 0, 0,\n",
" 1, 2, 0, 0, 1, 3, 0, 0, 0, 1],\n",
" [ 78, 24, 12, 4, 17, 5, 20, 2, 8, 8, 2, 1, 1,\n",
" 2, 8, 0, 5, 5, 4, 0, 2, 1, 4],\n",
" [167, 53, 23, 0, 22, 23, 13, 14, 8, 10, 8, 12, 9,\n",
" 4, 11, 2, 11, 5, 11, 0, 5, 3, 0],\n",
" [ 19, 10, 11, 4, 6, 2, 2, 5, 0, 1, 2, 3, 1,\n",
" 0, 0, 0, 3, 1, 0, 1, 0, 0, 0]])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Printing out the first 5 word vectors\n",
"word_vectors[:5, :23]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train, Validation, Test sets\n",
"\n",
"Now that we have the word_vectors, we're ready to split our data into train, validation, and test sets. Remember that we train on the train data, use the validation data to set the hyperparameters, and at the very end measure the network performance on the test data. Here we're using the function `to_categorical` from TFLearn to reshape the target data so that we'll have two output units and can classify with a softmax activation function. We actually won't be creating the validation set here, TFLearn will do that for us later."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"Y = (labels=='positive').astype(np.int_)\n",
"records = len(labels)\n",
"\n",
"shuffle = np.arange(records)\n",
"np.random.shuffle(shuffle)\n",
"test_fraction = 0.9\n",
"\n",
"train_split, test_split = shuffle[:int(records*test_fraction)], shuffle[int(records*test_fraction):]\n",
"trainX, trainY = word_vectors[train_split,:], to_categorical(Y.values[train_split], 2)\n",
"testX, testY = word_vectors[test_split,:], to_categorical(Y.values[test_split], 2)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0., 1.],\n",
" [ 1., 0.],\n",
" [ 1., 0.],\n",
" ..., \n",
" [ 1., 0.],\n",
" [ 1., 0.],\n",
" [ 0., 1.]])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"trainY"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the network\n",
"\n",
"[TFLearn](http://tflearn.org/) lets you build the network by [defining the layers](http://tflearn.org/layers/core/). \n",
"\n",
"### Input layer\n",
"\n",
"For the input layer, you just need to tell it how many units you have. For example, \n",
"\n",
"```\n",
"net = tflearn.input_data([None, 100])\n",
"```\n",
"\n",
"would create a network with 100 input units. The first element in the list, `None` in this case, sets the batch size. Setting it to `None` here leaves it at the default batch size.\n",
"\n",
"The number of inputs to your network needs to match the size of your data. For this example, we're using 10000 element long vectors to encode our input data, so we need 10000 input units.\n",
"\n",
"\n",
"### Adding layers\n",
"\n",
"To add new hidden layers, you use \n",
"\n",
"```\n",
"net = tflearn.fully_connected(net, n_units, activation='ReLU')\n",
"```\n",
"\n",
"This adds a fully connected layer where every unit in the previous layer is connected to every unit in this layer. The first argument `net` is the network you created in the `tflearn.input_data` call. It's telling the network to use the output of the previous layer as the input to this layer. You can set the number of units in the layer with `n_units`, and set the activation function with the `activation` keyword. You can keep adding layers to your network by repeated calling `net = tflearn.fully_connected(net, n_units)`.\n",
"\n",
"### Output layer\n",
"\n",
"The last layer you add is used as the output layer. There for, you need to set the number of units to match the target data. In this case we are predicting two classes, positive or negative sentiment. You also need to set the activation function so it's appropriate for your model. Again, we're trying to predict if some input data belongs to one of two classes, so we should use softmax.\n",
"\n",
"```\n",
"net = tflearn.fully_connected(net, 2, activation='softmax')\n",
"```\n",
"\n",
"### Training\n",
"To set how you train the network, use \n",
"\n",
"```\n",
"net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')\n",
"```\n",
"\n",
"Again, this is passing in the network you've been building. The keywords: \n",
"\n",
"* `optimizer` sets the training method, here stochastic gradient descent\n",
"* `learning_rate` is the learning rate\n",
"* `loss` determines how the network error is calculated. In this example, with the categorical cross-entropy.\n",
"\n",
"Finally you put all this together to create the model with `tflearn.DNN(net)`. So it ends up looking something like \n",
"\n",
"```\n",
"net = tflearn.input_data([None, 10]) # Input\n",
"net = tflearn.fully_connected(net, 5, activation='ReLU') # Hidden\n",
"net = tflearn.fully_connected(net, 2, activation='softmax') # Output\n",
"net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')\n",
"model = tflearn.DNN(net)\n",
"```\n",
"\n",
"> **Exercise:** Below in the `build_model()` function, you'll put together the network using TFLearn. You get to choose how many layers to use, how many hidden units, etc."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Network building\n",
"def build_model():\n",
" # This resets all parameters and variables, leave this here\n",
" tf.reset_default_graph()\n",
" \n",
" # Inputs\n",
" net = tflearn.input_data([None, 10000])\n",
"\n",
" # Hidden layer(s)\n",
" net = tflearn.fully_connected(net, 200, activation='ReLU')\n",
" net = tflearn.fully_connected(net, 25, activation='ReLU')\n",
"\n",
" # Output layer\n",
" net = tflearn.fully_connected(net, 2, activation='softmax')\n",
" net = tflearn.regression(net, optimizer='sgd', \n",
" learning_rate=0.1, \n",
" loss='categorical_crossentropy')\n",
" \n",
" model = tflearn.DNN(net)\n",
" return model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Intializing the model\n",
"\n",
"Next we need to call the `build_model()` function to actually build the model. In my solution I haven't included any arguments to the function, but you can add arguments so you can change parameters in the model if you want.\n",
"\n",
"> **Note:** You might get a bunch of warnings here. TFLearn uses a lot of deprecated code in TensorFlow. Hopefully it gets updated to the new TensorFlow version soon."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:From //anaconda3/envs/dl/lib/python3.5/site-packages/tflearn/summaries.py:46 in get_summary.: scalar_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.\n",
"Instructions for updating:\n",
"Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.\n",
"WARNING:tensorflow:From //anaconda3/envs/dl/lib/python3.5/site-packages/tflearn/summaries.py:46 in get_summary.: scalar_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.\n",
"Instructions for updating:\n",
"Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.\n",
"WARNING:tensorflow:From //anaconda3/envs/dl/lib/python3.5/site-packages/tflearn/helpers/trainer.py:766 in create_summaries.: merge_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.\n",
"Instructions for updating:\n",
"Please switch to tf.summary.merge.\n",
"WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.\n",
"WARNING:tensorflow:From //anaconda3/envs/dl/lib/python3.5/site-packages/tflearn/helpers/trainer.py:130 in __init__.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.\n",
"Instructions for updating:\n",
"Use `tf.global_variables_initializer` instead.\n"
]
}
],
"source": [
"model = build_model()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the network\n",
"\n",
"Now that we've constructed the network, saved as the variable `model`, we can fit it to the data. Here we use the `model.fit` method. You pass in the training features `trainX` and the training targets `trainY`. Below I set `validation_set=0.1` which reserves 10% of the data set as the validation set. You can also set the batch size and number of epochs with the `batch_size` and `n_epoch` keywords, respectively. Below is the code to fit our the network to our word vectors.\n",
"\n",
"You can rerun `model.fit` to train the network further if you think you can increase the validation accuracy. Remember, all hyperparameter adjustments must be done using the validation set. **Only use the test set after you're completely done training the network.**"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training Step: 15900 | total loss: \u001b[1m\u001b[32m0.28905\u001b[0m\u001b[0m\n",
"| SGD | epoch: 100 | loss: 0.28905 - acc: 0.8768 | val_loss: 0.45618 - val_acc: 0.8351 -- iter: 20250/20250\n",
"Training Step: 15900 | total loss: \u001b[1m\u001b[32m0.28905\u001b[0m\u001b[0m\n",
"| SGD | epoch: 100 | loss: 0.28905 - acc: 0.8768 | val_loss: 0.45618 - val_acc: 0.8351 -- iter: 20250/20250\n",
"--\n"
]
}
],
"source": [
"# Training\n",
"model.fit(trainX, trainY, validation_set=0.1, show_metric=True, batch_size=128, n_epoch=100)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Testing\n",
"\n",
"After you're satisified with your hyperparameters, you can run the network on the test set to measure it's performance. Remember, *only do this after finalizing the hyperparameters*."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test accuracy: 0.8504\n"
]
}
],
"source": [
"predictions = (np.array(model.predict(testX))[:,0] >= 0.5).astype(np.int_)\n",
"test_accuracy = np.mean(predictions == testY[:,0], axis=0)\n",
"print(\"Test accuracy: \", test_accuracy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Try out your own text!"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Helper function that uses your model to predict sentiment\n",
"def test_sentence(sentence):\n",
" positive_prob = model.predict([text_to_vector(sentence.lower())])[0][1]\n",
" print('Sentence: {}'.format(sentence))\n",
" print('P(positive) = {:.3f} :'.format(positive_prob), \n",
" 'Positive' if positive_prob > 0.5 else 'Negative')"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sentence: Moonlight is by far the best movie of 2016.\n",
"P(positive) = 0.932 : Positive\n",
"Sentence: It's amazing anyone could be talented enough to make something this spectacularly awful\n",
"P(positive) = 0.002 : Negative\n"
]
}
],
"source": [
"sentence = \"Moonlight is by far the best movie of 2016.\"\n",
"test_sentence(sentence)\n",
"\n",
"sentence = \"It's amazing anyone could be talented enough to make something this spectacularly awful\"\n",
"test_sentence(sentence)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,487 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sentiment analysis with TFLearn\n",
"\n",
"In this notebook, we'll continue Andrew Trask's work by building a network for sentiment analysis on the movie review data. Instead of a network written with Numpy, we'll be using [TFLearn](http://tflearn.org/), a high-level library built on top of TensorFlow. TFLearn makes it simpler to build networks just by defining the layers. It takes care of most of the details for you.\n",
"\n",
"We'll start off by importing all the modules we'll need, then load and prepare the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import tensorflow as tf\n",
"import tflearn\n",
"from tflearn.data_utils import to_categorical"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing the data\n",
"\n",
"Following along with Andrew, our goal here is to convert our reviews into word vectors. The word vectors will have elements representing words in the total vocabulary. If the second position represents the word 'the', for each review we'll count up the number of times 'the' appears in the text and set the second position to that count. I'll show you examples as we build the input data from the reviews data. Check out Andrew's notebook and video for more about this."
]
},
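{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea concrete, here's a rough sketch using a tiny made-up vocabulary (the real vocabulary we build below will have thousands of words):\n",
"\n",
"```\n",
"vocab = ['the', 'movie', 'is', 'great']   # each word gets a fixed position in the vector\n",
"text = 'the movie is the worst'\n",
"# counts: the=2, movie=1, is=1, great=0  ->  word vector [2, 1, 1, 0]\n",
"# 'worst' isn't in this toy vocabulary, so it's simply ignored\n",
"```"
]
},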
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read the data\n",
"\n",
"Use the pandas library to read the reviews and postive/negative labels from comma-separated files. The data we're using has already been preprocessed a bit and we know it uses only lower case characters. If we were working from raw data, where we didn't know it was all lower case, we would want to add a step here to convert it. That's so we treat different variations of the same word, like `The`, `the`, and `THE`, all the same way."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"reviews = pd.read_csv('reviews.txt', header=None)\n",
"labels = pd.read_csv('labels.txt', header=None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Counting word frequency\n",
"\n",
"To start off we'll need to count how often each word appears in the data. We'll use this count to create a vocabulary we'll use to encode the review data. This resulting count is known as a [bag of words](https://en.wikipedia.org/wiki/Bag-of-words_model). We'll use it to select our vocabulary and build the word vectors. You should have seen how to do this in Andrew's lesson. Try to implement it here using the [Counter class](https://docs.python.org/2/library/collections.html#collections.Counter).\n",
"\n",
"> **Exercise:** Create the bag of words from the reviews data and assign it to `total_counts`. The reviews are stores in the `reviews` [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html). If you want the reviews as a Numpy array, use `reviews.values`. You can iterate through the rows in the DataFrame with `for idx, row in reviews.iterrows():` ([documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html)). When you break up the reviews into words, use `.split(' ')` instead of `.split()` so your results match ours."
]
},
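{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you haven't used `Counter` before, here's a quick sketch of how it tallies words, using a toy string rather than the review data:\n",
"\n",
"```\n",
"from collections import Counter\n",
"\n",
"counts = Counter('the cat sat on the mat'.split(' '))\n",
"counts         # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})\n",
"counts['the']  # 2\n",
"```"
]
},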
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"total_counts = # bag of words here\n",
"\n",
"print(\"Total words in data set: \", len(total_counts))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's keep the first 10000 most frequent words. As Andrew noted, most of the words in the vocabulary are rarely used so they will have little effect on our predictions. Below, we'll sort `vocab` by the count value and keep the 10000 most frequent words."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"vocab = sorted(total_counts, key=total_counts.get, reverse=True)[:10000]\n",
"print(vocab[:60])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What's the last word in our vocabulary? We can use this to judge if 10000 is too few. If the last word is pretty common, we probably need to keep more words."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(vocab[-1], ': ', total_counts[vocab[-1]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The last word in our vocabulary shows up in 30 reviews out of 25000. I think it's fair to say this is a tiny proportion of reviews. We are probably fine with this number of words.\n",
"\n",
"**Note:** When you run, you may see a different word from the one shown above, but it will also have the value `30`. That's because there are many words tied for that number of counts, and the `Counter` class does not guarantee which one will be returned in the case of a tie.\n",
"\n",
"Now for each review in the data, we'll make a word vector. First we need to make a mapping of word to index, pretty easy to do with a dictionary comprehension.\n",
"\n",
"> **Exercise:** Create a dictionary called `word2idx` that maps each word in the vocabulary to an index. The first word in `vocab` has index `0`, the second word has index `1`, and so on."
]
},
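{
"cell_type": "markdown",
"metadata": {},
"source": [
"If dictionary comprehensions are new to you, here's the general pattern on a toy list of letters; the same idea applies to the words in `vocab`:\n",
"\n",
"```\n",
"letters = ['a', 'b', 'c']\n",
"letter2idx = {letter: i for i, letter in enumerate(letters)}\n",
"letter2idx   # {'a': 0, 'b': 1, 'c': 2}\n",
"```"
]
},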
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"word2idx = ## create the word-to-index dictionary here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Text to vector function\n",
"\n",
"Now we can write a function that converts a some text to a word vector. The function will take a string of words as input and return a vector with the words counted up. Here's the general algorithm to do this:\n",
"\n",
"* Initialize the word vector with [np.zeros](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html), it should be the length of the vocabulary.\n",
"* Split the input string of text into a list of words with `.split(' ')`. Again, if you call `.split()` instead, you'll get slightly different results than what we show here.\n",
"* For each word in that list, increment the element in the index associated with that word, which you get from `word2idx`.\n",
"\n",
"**Note:** Since all words aren't in the `vocab` dictionary, you'll get a key error if you run into one of those words. You can use the `.get` method of the `word2idx` dictionary to specify a default returned value when you make a key error. For example, `word2idx.get(word, None)` returns `None` if `word` doesn't exist in the dictionary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def text_to_vector(text):\n",
" \n",
" pass"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you do this right, the following code should return\n",
"\n",
"```\n",
"text_to_vector('The tea is for a party to celebrate '\n",
" 'the movie so she has no time for a cake')[:65]\n",
" \n",
"array([0, 1, 0, 0, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0])\n",
"``` "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"text_to_vector('The tea is for a party to celebrate '\n",
" 'the movie so she has no time for a cake')[:65]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, run through our entire review data set and convert each review to a word vector."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"word_vectors = np.zeros((len(reviews), len(vocab)), dtype=np.int_)\n",
"for ii, (_, text) in enumerate(reviews.iterrows()):\n",
" word_vectors[ii] = text_to_vector(text[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Printing out the first 5 word vectors\n",
"word_vectors[:5, :23]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train, Validation, Test sets\n",
"\n",
"Now that we have the word_vectors, we're ready to split our data into train, validation, and test sets. Remember that we train on the train data, use the validation data to set the hyperparameters, and at the very end measure the network performance on the test data. Here we're using the function `to_categorical` from TFLearn to reshape the target data so that we'll have two output units and can classify with a softmax activation function. We actually won't be creating the validation set here, TFLearn will do that for us later."
]
},
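{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you haven't seen `to_categorical` before, it one-hot encodes the labels into one column per class. As a small sketch of what to expect from it here:\n",
"\n",
"```\n",
"to_categorical([0, 1, 1], 2)\n",
"# array([[ 1.,  0.],\n",
"#        [ 0.,  1.],\n",
"#        [ 0.,  1.]])\n",
"```"
]
},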
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"Y = (labels=='positive').astype(np.int_)\n",
"records = len(labels)\n",
"\n",
"shuffle = np.arange(records)\n",
"np.random.shuffle(shuffle)\n",
"test_fraction = 0.9\n",
"\n",
"train_split, test_split = shuffle[:int(records*test_fraction)], shuffle[int(records*test_fraction):]\n",
"trainX, trainY = word_vectors[train_split,:], to_categorical(Y.values[train_split], 2)\n",
"testX, testY = word_vectors[test_split,:], to_categorical(Y.values[test_split], 2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"trainY"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the network\n",
"\n",
"[TFLearn](http://tflearn.org/) lets you build the network by [defining the layers](http://tflearn.org/layers/core/). \n",
"\n",
"### Input layer\n",
"\n",
"For the input layer, you just need to tell it how many units you have. For example, \n",
"\n",
"```\n",
"net = tflearn.input_data([None, 100])\n",
"```\n",
"\n",
"would create a network with 100 input units. The first element in the list, `None` in this case, sets the batch size. Setting it to `None` here leaves it at the default batch size.\n",
"\n",
"The number of inputs to your network needs to match the size of your data. For this example, we're using 10000 element long vectors to encode our input data, so we need 10000 input units.\n",
"\n",
"\n",
"### Adding layers\n",
"\n",
"To add new hidden layers, you use \n",
"\n",
"```\n",
"net = tflearn.fully_connected(net, n_units, activation='ReLU')\n",
"```\n",
"\n",
"This adds a fully connected layer where every unit in the previous layer is connected to every unit in this layer. The first argument `net` is the network you created in the `tflearn.input_data` call. It's telling the network to use the output of the previous layer as the input to this layer. You can set the number of units in the layer with `n_units`, and set the activation function with the `activation` keyword. You can keep adding layers to your network by repeated calling `net = tflearn.fully_connected(net, n_units)`.\n",
"\n",
"### Output layer\n",
"\n",
"The last layer you add is used as the output layer. Therefore, you need to set the number of units to match the target data. In this case we are predicting two classes, positive or negative sentiment. You also need to set the activation function so it's appropriate for your model. Again, we're trying to predict if some input data belongs to one of two classes, so we should use softmax.\n",
"\n",
"```\n",
"net = tflearn.fully_connected(net, 2, activation='softmax')\n",
"```\n",
"\n",
"### Training\n",
"To set how you train the network, use \n",
"\n",
"```\n",
"net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')\n",
"```\n",
"\n",
"Again, this is passing in the network you've been building. The keywords: \n",
"\n",
"* `optimizer` sets the training method, here stochastic gradient descent\n",
"* `learning_rate` is the learning rate\n",
"* `loss` determines how the network error is calculated. In this example, with the categorical cross-entropy.\n",
"\n",
"Finally you put all this together to create the model with `tflearn.DNN(net)`. So it ends up looking something like \n",
"\n",
"```\n",
"net = tflearn.input_data([None, 10]) # Input\n",
"net = tflearn.fully_connected(net, 5, activation='ReLU') # Hidden\n",
"net = tflearn.fully_connected(net, 2, activation='softmax') # Output\n",
"net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')\n",
"model = tflearn.DNN(net)\n",
"```\n",
"\n",
"> **Exercise:** Below in the `build_model()` function, you'll put together the network using TFLearn. You get to choose how many layers to use, how many hidden units, etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Network building\n",
"def build_model():\n",
" # This resets all parameters and variables, leave this here\n",
" tf.reset_default_graph()\n",
" \n",
" #### Your code ####\n",
" \n",
" model = tflearn.DNN(net)\n",
" return model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Intializing the model\n",
"\n",
"Next we need to call the `build_model()` function to actually build the model. In my solution I haven't included any arguments to the function, but you can add arguments so you can change parameters in the model if you want.\n",
"\n",
"> **Note:** You might get a bunch of warnings here. TFLearn uses a lot of deprecated code in TensorFlow. Hopefully it gets updated to the new TensorFlow version soon."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = build_model()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the network\n",
"\n",
"Now that we've constructed the network, saved as the variable `model`, we can fit it to the data. Here we use the `model.fit` method. You pass in the training features `trainX` and the training targets `trainY`. Below I set `validation_set=0.1` which reserves 10% of the data set as the validation set. You can also set the batch size and number of epochs with the `batch_size` and `n_epoch` keywords, respectively. Below is the code to fit our the network to our word vectors.\n",
"\n",
"You can rerun `model.fit` to train the network further if you think you can increase the validation accuracy. Remember, all hyperparameter adjustments must be done using the validation set. **Only use the test set after you're completely done training the network.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [],
"source": [
"# Training\n",
"model.fit(trainX, trainY, validation_set=0.1, show_metric=True, batch_size=128, n_epoch=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Testing\n",
"\n",
"After you're satisified with your hyperparameters, you can run the network on the test set to measure its performance. Remember, *only do this after finalizing the hyperparameters*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"predictions = (np.array(model.predict(testX))[:,0] >= 0.5).astype(np.int_)\n",
"test_accuracy = np.mean(predictions == testY[:,0], axis=0)\n",
"print(\"Test accuracy: \", test_accuracy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Try out your own text!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Helper function that uses your model to predict sentiment\n",
"def test_sentence(sentence):\n",
" positive_prob = model.predict([text_to_vector(sentence.lower())])[0][1]\n",
" print('Sentence: {}'.format(sentence))\n",
" print('P(positive) = {:.3f} :'.format(positive_prob), \n",
" 'Positive' if positive_prob > 0.5 else 'Negative')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sentence = \"Moonlight is by far the best movie of 2016.\"\n",
"test_sentence(sentence)\n",
"\n",
"sentence = \"It's amazing anyone could be talented enough to make something this spectacularly awful\"\n",
"test_sentence(sentence)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:301d18424e95957a000293f5d8393239451d7f54622a8f21613172aebe64ca06
size 225000

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7c2f21e5c8f5240859910ba5331b9baa88310aa978dbd8747ebd4ff6ebbefa6
size 33678267

File diff suppressed because it is too large

File diff suppressed because it is too large

Binary file not shown.


@ -0,0 +1,88 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sentiment Classification & How To \"Frame Problems\" for a Neural Network\n",
"\n",
"by Andrew Trask\n",
"\n",
"- **Twitter**: @iamtrask\n",
"- **Blog**: http://iamtrask.github.io"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What You Should Already Know\n",
"\n",
"- neural networks, forward and back-propagation\n",
"- stochastic gradient descent\n",
"- mean squared error\n",
"- and train/test splits\n",
"\n",
"### Where to Get Help if You Need it\n",
"- Re-watch previous Udacity Lectures\n",
"- Leverage the recommended Course Reading Material - [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (40% Off: **traskud17**)\n",
"- Shoot me a tweet @iamtrask\n",
"\n",
"\n",
"### Tutorial Outline:\n",
"\n",
"- Intro: The Importance of \"Framing a Problem\"\n",
"\n",
"\n",
"- Curate a Dataset\n",
"- Developing a \"Predictive Theory\"\n",
"- **PROJECT 1**: Quick Theory Validation\n",
"\n",
"\n",
"- Transforming Text to Numbers\n",
"- **PROJECT 2**: Creating the Input/Output Data\n",
"\n",
"\n",
"- Putting it all together in a Neural Network\n",
"- **PROJECT 3**: Building our Neural Network\n",
"\n",
"\n",
"- Understanding Neural Noise\n",
"- **PROJECT 4**: Making Learning Faster by Reducing Noise\n",
"\n",
"\n",
"- Analyzing Inefficiencies in our Network\n",
"- **PROJECT 5**: Making our Network Train and Run Faster\n",
"\n",
"\n",
"- Further Noise Reduction\n",
"- **PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary\n",
"\n",
"\n",
"- Analysis: What's going on in the weights?"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

@ -0,0 +1,236 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sentiment Classification & How To \"Frame Problems\" for a Neural Network\n",
"\n",
"by Andrew Trask\n",
"\n",
"- **Twitter**: @iamtrask\n",
"- **Blog**: http://iamtrask.github.io"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What You Should Already Know\n",
"\n",
"- neural networks, forward and back-propagation\n",
"- stochastic gradient descent\n",
"- mean squared error\n",
"- and train/test splits\n",
"\n",
"### Where to Get Help if You Need it\n",
"- Re-watch previous Udacity Lectures\n",
"- Leverage the recommended Course Reading Material - [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (40% Off: **traskud17**)\n",
"- Shoot me a tweet @iamtrask\n",
"\n",
"\n",
"### Tutorial Outline:\n",
"\n",
"- Intro: The Importance of \"Framing a Problem\"\n",
"\n",
"\n",
"- Curate a Dataset\n",
"- Developing a \"Predictive Theory\"\n",
"- **PROJECT 1**: Quick Theory Validation\n",
"\n",
"\n",
"- Transforming Text to Numbers\n",
"- **PROJECT 2**: Creating the Input/Output Data\n",
"\n",
"\n",
"- Putting it all together in a Neural Network\n",
"- **PROJECT 3**: Building our Neural Network\n",
"\n",
"\n",
"- Understanding Neural Noise\n",
"- **PROJECT 4**: Making Learning Faster by Reducing Noise\n",
"\n",
"\n",
"- Analyzing Inefficiencies in our Network\n",
"- **PROJECT 5**: Making our Network Train and Run Faster\n",
"\n",
"\n",
"- Further Noise Reduction\n",
"- **PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary\n",
"\n",
"\n",
"- Analysis: What's going on in the weights?"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "56bb3cba-260c-4ebe-9ed6-b995b4c72aa3"
}
},
"source": [
"# Lesson: Curate a Dataset"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "eba2b193-0419-431e-8db9-60f34dd3fe83"
}
},
"outputs": [],
"source": [
"def pretty_print_review_and_label(i):\n",
" print(labels[i] + \"\\t:\\t\" + reviews[i][:80] + \"...\")\n",
"\n",
"g = open('reviews.txt','r') # What we know!\n",
"reviews = list(map(lambda x:x[:-1],g.readlines()))\n",
"g.close()\n",
"\n",
"g = open('labels.txt','r') # What we WANT to know!\n",
"labels = list(map(lambda x:x[:-1].upper(),g.readlines()))\n",
"g.close()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"25000"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(reviews)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "bb95574b-21a0-4213-ae50-34363cf4f87f"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life such as teachers . my years in the teaching profession lead me to believe that bromwell high s satire is much closer to reality than is teachers . the scramble to survive financially the insightful students who can see right through their pathetic teachers pomp the pettiness of the whole situation all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn t '"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reviews[0]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "e0408810-c424-4ed4-afb9-1735e9ddbd0a"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'POSITIVE'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"labels[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lesson: Develop a Predictive Theory"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "e67a709f-234f-4493-bae6-4fb192141ee0"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"labels.txt \t : \t reviews.txt\n",
"\n",
"NEGATIVE\t:\tthis movie is terrible but it has some good effects . ...\n",
"POSITIVE\t:\tadrian pasdar is excellent is this film . he makes a fascinating woman . ...\n",
"NEGATIVE\t:\tcomment this movie is impossible . is terrible very improbable bad interpretat...\n",
"POSITIVE\t:\texcellent episode movie ala pulp fiction . days suicides . it doesnt get more...\n",
"NEGATIVE\t:\tif you haven t seen this it s terrible . it is pure trash . i saw this about ...\n",
"POSITIVE\t:\tthis schiffer guy is a real genius the movie is of excellent quality and both e...\n"
]
}
],
"source": [
"print(\"labels.txt \\t : \\t reviews.txt\\n\")\n",
"pretty_print_review_and_label(2137)\n",
"pretty_print_review_and_label(12816)\n",
"pretty_print_review_and_label(6267)\n",
"pretty_print_review_and_label(21934)\n",
"pretty_print_review_and_label(5297)\n",
"pretty_print_review_and_label(4998)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:301d18424e95957a000293f5d8393239451d7f54622a8f21613172aebe64ca06
size 225000

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7c2f21e5c8f5240859910ba5331b9baa88310aa978dbd8747ebd4ff6ebbefa6
size 33678267

Binary file not shown.


Binary file not shown.


Binary file not shown.


Binary file not shown.


Binary file not shown.


@ -0,0 +1,847 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Anna KaRNNa\n",
"\n",
"In this notebook, I'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.\n",
"\n",
"This network is based off of Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). Also, some information [here at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.\n",
"\n",
"<img src=\"assets/charseq.jpeg\" width=\"500\">"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import time\n",
"from collections import namedtuple\n",
"\n",
"import numpy as np\n",
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we'll load the text file and convert it into integers for our network to use."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"with open('anna.txt', 'r') as f:\n",
" text=f.read()\n",
"vocab = set(text)\n",
"vocab_to_int = {c: i for i, c in enumerate(vocab)}\n",
"int_to_vocab = dict(enumerate(vocab))\n",
"chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'Chapter 1\\n\\n\\nHappy families are all alike; every unhappy family is unhappy in its own\\nway.\\n\\nEverythin'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"text[:100]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([26, 49, 22, 46, 33, 70, 51, 37, 36, 31, 31, 31, 65, 22, 46, 46, 39,\n",
" 37, 23, 22, 4, 77, 32, 77, 70, 81, 37, 22, 51, 70, 37, 22, 32, 32,\n",
" 37, 22, 32, 77, 0, 70, 64, 37, 70, 16, 70, 51, 39, 37, 20, 74, 49,\n",
" 22, 46, 46, 39, 37, 23, 22, 4, 77, 32, 39, 37, 77, 81, 37, 20, 74,\n",
" 49, 22, 46, 46, 39, 37, 77, 74, 37, 77, 33, 81, 37, 21, 75, 74, 31,\n",
" 75, 22, 39, 13, 31, 31, 24, 16, 70, 51, 39, 33, 49, 77, 74], dtype=int32)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chars[:100]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I need to split up the data into batches, and into training and validation sets. I should be making a test set here, but I'm not going to worry about that. My test will be if the network can generate new text.\n",
"\n",
"Here I'll make both input and target arrays. The targets are the same as the inputs, except shifted one character over. I'll also drop the last bit of data so that I'll only have completely full batches.\n",
"\n",
"The idea here is to make a 2D matrix where the number of rows is equal to the number of batches. Each row will be one long concatenated string from the character data. We'll split this data into a training set and validation set using the `split_frac` keyword. This will keep 90% of the batches in the training set, the other 10% in the validation set."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def split_data(chars, batch_size, num_steps, split_frac=0.9):\n",
" \"\"\" \n",
" Split character data into training and validation sets, inputs and targets for each set.\n",
" \n",
" Arguments\n",
" ---------\n",
" chars: character array\n",
" batch_size: Size of examples in each of batch\n",
" num_steps: Number of sequence steps to keep in the input and pass to the network\n",
" split_frac: Fraction of batches to keep in the training set\n",
" \n",
" \n",
" Returns train_x, train_y, val_x, val_y\n",
" \"\"\"\n",
" \n",
" \n",
" slice_size = batch_size * num_steps\n",
" n_batches = int(len(chars) / slice_size)\n",
" \n",
" # Drop the last few characters to make only full batches\n",
" x = chars[: n_batches*slice_size]\n",
" y = chars[1: n_batches*slice_size + 1]\n",
" \n",
" # Split the data into batch_size slices, then stack them into a 2D matrix \n",
" x = np.stack(np.split(x, batch_size))\n",
" y = np.stack(np.split(y, batch_size))\n",
" \n",
" # Now x and y are arrays with dimensions batch_size x n_batches*num_steps\n",
" \n",
" # Split into training and validation sets, keep the virst split_frac batches for training\n",
" split_idx = int(n_batches*split_frac)\n",
" train_x, train_y= x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]\n",
" val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]\n",
" \n",
" return train_x, train_y, val_x, val_y"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"train_x, train_y, val_x, val_y = split_data(chars, 10, 200)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(10, 178400)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_x.shape"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[26, 49, 22, 46, 33, 70, 51, 37, 36, 31],\n",
" [11, 74, 53, 37, 49, 70, 37, 4, 21, 16],\n",
" [37, 1, 22, 33, 1, 49, 77, 74, 69, 37],\n",
" [21, 33, 49, 70, 51, 37, 75, 21, 20, 32],\n",
" [37, 33, 49, 70, 37, 32, 22, 74, 53, 2],\n",
" [37, 52, 49, 51, 21, 20, 69, 49, 37, 32],\n",
" [33, 37, 33, 21, 31, 53, 21, 13, 31, 31],\n",
" [21, 37, 49, 70, 51, 81, 70, 32, 23, 54],\n",
" [49, 22, 33, 37, 77, 81, 37, 33, 49, 70],\n",
" [70, 51, 81, 70, 32, 23, 37, 22, 74, 53]], dtype=int32)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_x[:,:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'll write another function to grab batches out of the arrays made by split data. Here each batch will be a sliding window on these arrays with size `batch_size X num_steps`. For example, if we want our network to train on a sequence of 100 characters, `num_steps = 100`. For the next batch, we'll shift this window the next sequence of `num_steps` characters. In this way we can feed batches to the network and the cell states will continue through on each batch."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_batch(arrs, num_steps):\n",
" batch_size, slice_size = arrs[0].shape\n",
" \n",
" n_batches = int(slice_size/num_steps)\n",
" for b in range(n_batches):\n",
" yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"def build_rnn(num_classes, batch_size=50, num_steps=50, lstm_size=128, num_layers=2,\n",
" learning_rate=0.001, grad_clip=5, sampling=False):\n",
" \n",
" if sampling == True:\n",
" batch_size, num_steps = 1, 1\n",
"\n",
" tf.reset_default_graph()\n",
" \n",
" # Declare placeholders we'll feed into the graph\n",
" with tf.name_scope('inputs'):\n",
" inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')\n",
" x_one_hot = tf.one_hot(inputs, num_classes, name='x_one_hot')\n",
" \n",
" with tf.name_scope('targets'):\n",
" targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')\n",
" y_one_hot = tf.one_hot(targets, num_classes, name='y_one_hot')\n",
" y_reshaped = tf.reshape(y_one_hot, [-1, num_classes])\n",
" \n",
" keep_prob = tf.placeholder(tf.float32, name='keep_prob')\n",
" \n",
" # Build the RNN layers\n",
" with tf.name_scope(\"RNN_layers\"):\n",
" lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)\n",
" drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)\n",
" cell = tf.contrib.rnn.MultiRNNCell([drop] * num_layers)\n",
" \n",
" with tf.name_scope(\"RNN_init_state\"):\n",
" initial_state = cell.zero_state(batch_size, tf.float32)\n",
"\n",
" # Run the data through the RNN layers\n",
" with tf.name_scope(\"RNN_forward\"):\n",
" rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(x_one_hot, num_steps, 1)]\n",
" outputs, state = tf.contrib.rnn.static_rnn(cell, rnn_inputs, initial_state=initial_state)\n",
" \n",
" final_state = state\n",
" \n",
" # Reshape output so it's a bunch of rows, one row for each cell output\n",
" with tf.name_scope('sequence_reshape'):\n",
" seq_output = tf.concat(outputs, axis=1,name='seq_output')\n",
" output = tf.reshape(seq_output, [-1, lstm_size], name='graph_output')\n",
" \n",
" # Now connect the RNN putputs to a softmax layer and calculate the cost\n",
" with tf.name_scope('logits'):\n",
" softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1),\n",
" name='softmax_w')\n",
" softmax_b = tf.Variable(tf.zeros(num_classes), name='softmax_b')\n",
" logits = tf.matmul(output, softmax_w) + softmax_b\n",
"\n",
" with tf.name_scope('predictions'):\n",
" preds = tf.nn.softmax(logits, name='predictions')\n",
" \n",
" \n",
" with tf.name_scope('cost'):\n",
" loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped, name='loss')\n",
" cost = tf.reduce_mean(loss, name='cost')\n",
"\n",
" # Optimizer for training, using gradient clipping to control exploding gradients\n",
" with tf.name_scope('train'):\n",
" tvars = tf.trainable_variables()\n",
" grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)\n",
" train_op = tf.train.AdamOptimizer(learning_rate)\n",
" optimizer = train_op.apply_gradients(zip(grads, tvars))\n",
" \n",
" # Export the nodes \n",
" export_nodes = ['inputs', 'targets', 'initial_state', 'final_state',\n",
" 'keep_prob', 'cost', 'preds', 'optimizer']\n",
" Graph = namedtuple('Graph', export_nodes)\n",
" local_dict = locals()\n",
" graph = Graph(*[local_dict[each] for each in export_nodes])\n",
" \n",
" return graph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hyperparameters\n",
"\n",
"Here I'm defining the hyperparameters for the network. The two you probably haven't seen before are `lstm_size` and `num_layers`. These set the number of hidden units in the LSTM layers and the number of LSTM layers, respectively. Of course, making these bigger will improve the network's performance but you'll have to watch out for overfitting. If your validation loss is much larger than the training loss, you're probably overfitting. Decrease the size of the network or decrease the dropout keep probability."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"batch_size = 100\n",
"num_steps = 100\n",
"lstm_size = 512\n",
"num_layers = 2\n",
"learning_rate = 0.001"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Write out the graph for TensorBoard"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = build_rnn(len(vocab), \n",
" batch_size=batch_size,\n",
" num_steps=num_steps,\n",
" learning_rate=learning_rate,\n",
" lstm_size=lstm_size,\n",
" num_layers=num_layers)\n",
"\n",
"with tf.Session() as sess:\n",
" \n",
" sess.run(tf.global_variables_initializer())\n",
" file_writer = tf.summary.FileWriter('./logs/3', sess.graph)"
]
},
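{
"cell_type": "markdown",
"metadata": {},
"source": [
"To actually inspect the graph, one option (assuming TensorBoard came along with your TensorFlow install) is to point it at the log directory from a terminal, e.g. `tensorboard --logdir ./logs/3`, and then open the address it prints in a browser."
]
},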
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training\n",
"\n",
"Time for training which is is pretty straightforward. Here I pass in some data, and get an LSTM state back. Then I pass that state back in to the network so the next batch can continue the state from the previous batch. And every so often (set by `save_every_n`) I calculate the validation loss and save a checkpoint."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"!mkdir -p checkpoints/anna"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/10 Iteration 1/1780 Training loss: 4.4195 1.3313 sec/batch\n",
"Epoch 1/10 Iteration 2/1780 Training loss: 4.3756 0.1287 sec/batch\n",
"Epoch 1/10 Iteration 3/1780 Training loss: 4.2069 0.1276 sec/batch\n",
"Epoch 1/10 Iteration 4/1780 Training loss: 4.5396 0.1185 sec/batch\n",
"Epoch 1/10 Iteration 5/1780 Training loss: 4.4190 0.1206 sec/batch\n",
"Epoch 1/10 Iteration 6/1780 Training loss: 4.3547 0.1233 sec/batch\n",
"Epoch 1/10 Iteration 7/1780 Training loss: 4.2792 0.1188 sec/batch\n",
"Epoch 1/10 Iteration 8/1780 Training loss: 4.2018 0.1170 sec/batch\n",
"Epoch 1/10 Iteration 9/1780 Training loss: 4.1251 0.1187 sec/batch\n",
"Epoch 1/10 Iteration 10/1780 Training loss: 4.0558 0.1174 sec/batch\n",
"Epoch 1/10 Iteration 11/1780 Training loss: 3.9946 0.1190 sec/batch\n",
"Epoch 1/10 Iteration 12/1780 Training loss: 3.9451 0.1193 sec/batch\n",
"Epoch 1/10 Iteration 13/1780 Training loss: 3.9011 0.1210 sec/batch\n",
"Epoch 1/10 Iteration 14/1780 Training loss: 3.8632 0.1185 sec/batch\n",
"Epoch 1/10 Iteration 15/1780 Training loss: 3.8275 0.1199 sec/batch\n",
"Epoch 1/10 Iteration 16/1780 Training loss: 3.7945 0.1211 sec/batch\n",
"Epoch 1/10 Iteration 17/1780 Training loss: 3.7649 0.1215 sec/batch\n",
"Epoch 1/10 Iteration 18/1780 Training loss: 3.7400 0.1214 sec/batch\n",
"Epoch 1/10 Iteration 19/1780 Training loss: 3.7164 0.1247 sec/batch\n",
"Epoch 1/10 Iteration 20/1780 Training loss: 3.6933 0.1212 sec/batch\n",
"Epoch 1/10 Iteration 21/1780 Training loss: 3.6728 0.1203 sec/batch\n",
"Epoch 1/10 Iteration 22/1780 Training loss: 3.6538 0.1207 sec/batch\n",
"Epoch 1/10 Iteration 23/1780 Training loss: 3.6359 0.1200 sec/batch\n",
"Epoch 1/10 Iteration 24/1780 Training loss: 3.6198 0.1229 sec/batch\n",
"Epoch 1/10 Iteration 25/1780 Training loss: 3.6041 0.1204 sec/batch\n",
"Epoch 1/10 Iteration 26/1780 Training loss: 3.5904 0.1202 sec/batch\n",
"Epoch 1/10 Iteration 27/1780 Training loss: 3.5774 0.1189 sec/batch\n",
"Epoch 1/10 Iteration 28/1780 Training loss: 3.5642 0.1214 sec/batch\n",
"Epoch 1/10 Iteration 29/1780 Training loss: 3.5522 0.1231 sec/batch\n",
"Epoch 1/10 Iteration 30/1780 Training loss: 3.5407 0.1199 sec/batch\n",
"Epoch 1/10 Iteration 31/1780 Training loss: 3.5309 0.1180 sec/batch\n",
"Epoch 1/10 Iteration 32/1780 Training loss: 3.5207 0.1179 sec/batch\n",
"Epoch 1/10 Iteration 33/1780 Training loss: 3.5109 0.1224 sec/batch\n",
"Epoch 1/10 Iteration 34/1780 Training loss: 3.5021 0.1206 sec/batch\n",
"Epoch 1/10 Iteration 35/1780 Training loss: 3.4931 0.1241 sec/batch\n",
"Epoch 1/10 Iteration 36/1780 Training loss: 3.4850 0.1169 sec/batch\n",
"Epoch 1/10 Iteration 37/1780 Training loss: 3.4767 0.1204 sec/batch\n",
"Epoch 1/10 Iteration 38/1780 Training loss: 3.4688 0.1202 sec/batch\n",
"Epoch 1/10 Iteration 39/1780 Training loss: 3.4611 0.1213 sec/batch\n"
]
},
{
"ename": "KeyboardInterrupt",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-15-09fa3beeed23>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 33\u001b[0m model.initial_state: new_state}\n\u001b[1;32m 34\u001b[0m batch_loss, new_state, _ = sess.run([model.cost, model.final_state, model.optimizer], \n\u001b[0;32m---> 35\u001b[0;31m feed_dict=feed)\n\u001b[0m\u001b[1;32m 36\u001b[0m \u001b[0mloss\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0mbatch_loss\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 37\u001b[0m \u001b[0mend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtime\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self, fetches, feed_dict, options, run_metadata)\u001b[0m\n\u001b[1;32m 765\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 766\u001b[0m result = self._run(None, fetches, feed_dict, options_ptr,\n\u001b[0;32m--> 767\u001b[0;31m run_metadata_ptr)\n\u001b[0m\u001b[1;32m 768\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mrun_metadata\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 769\u001b[0m \u001b[0mproto_data\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf_session\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTF_GetBuffer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrun_metadata_ptr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_run\u001b[0;34m(self, handle, fetches, feed_dict, options, run_metadata)\u001b[0m\n\u001b[1;32m 963\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mfinal_fetches\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0mfinal_targets\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 964\u001b[0m results = self._do_run(handle, final_targets, final_fetches,\n\u001b[0;32m--> 965\u001b[0;31m feed_dict_string, options, run_metadata)\n\u001b[0m\u001b[1;32m 966\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 967\u001b[0m \u001b[0mresults\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_do_run\u001b[0;34m(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)\u001b[0m\n\u001b[1;32m 1013\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mhandle\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1014\u001b[0m return self._do_call(_run_fn, self._session, feed_dict, fetch_list,\n\u001b[0;32m-> 1015\u001b[0;31m target_list, options, run_metadata)\n\u001b[0m\u001b[1;32m 1016\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1017\u001b[0m return self._do_call(_prun_fn, self._session, handle, feed_dict,\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_do_call\u001b[0;34m(self, fn, *args)\u001b[0m\n\u001b[1;32m 1020\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_do_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1021\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1022\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1023\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0merrors\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mOpError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1024\u001b[0m \u001b[0mmessage\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcompat\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mas_text\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmessage\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_run_fn\u001b[0;34m(session, feed_dict, fetch_list, target_list, options, run_metadata)\u001b[0m\n\u001b[1;32m 1002\u001b[0m return tf_session.TF_Run(session, options,\n\u001b[1;32m 1003\u001b[0m \u001b[0mfeed_dict\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfetch_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget_list\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1004\u001b[0;31m status, run_metadata)\n\u001b[0m\u001b[1;32m 1005\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1006\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_prun_fn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msession\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhandle\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfeed_dict\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfetch_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mKeyboardInterrupt\u001b[0m: "
]
}
],
"source": [
"epochs = 10\n",
"save_every_n = 200\n",
"train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)\n",
"\n",
"model = build_rnn(len(vocab), \n",
" batch_size=batch_size,\n",
" num_steps=num_steps,\n",
" learning_rate=learning_rate,\n",
" lstm_size=lstm_size,\n",
" num_layers=num_layers)\n",
"\n",
"saver = tf.train.Saver(max_to_keep=100)\n",
"\n",
"with tf.Session() as sess:\n",
" sess.run(tf.global_variables_initializer())\n",
" \n",
" # Use the line below to load a checkpoint and resume training\n",
" #saver.restore(sess, 'checkpoints/anna20.ckpt')\n",
" \n",
" n_batches = int(train_x.shape[1]/num_steps)\n",
" iterations = n_batches * epochs\n",
" for e in range(epochs):\n",
" \n",
" # Train network\n",
" new_state = sess.run(model.initial_state)\n",
" loss = 0\n",
" for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):\n",
" iteration = e*n_batches + b\n",
" start = time.time()\n",
" feed = {model.inputs: x,\n",
" model.targets: y,\n",
" model.keep_prob: 0.5,\n",
" model.initial_state: new_state}\n",
" batch_loss, new_state, _ = sess.run([model.cost, model.final_state, model.optimizer], \n",
" feed_dict=feed)\n",
" loss += batch_loss\n",
" end = time.time()\n",
" print('Epoch {}/{} '.format(e+1, epochs),\n",
" 'Iteration {}/{}'.format(iteration, iterations),\n",
" 'Training loss: {:.4f}'.format(loss/b),\n",
" '{:.4f} sec/batch'.format((end-start)))\n",
" \n",
" \n",
" if (iteration%save_every_n == 0) or (iteration == iterations):\n",
" # Check performance, notice dropout has been set to 1\n",
" val_loss = []\n",
" new_state = sess.run(model.initial_state)\n",
" for x, y in get_batch([val_x, val_y], num_steps):\n",
" feed = {model.inputs: x,\n",
" model.targets: y,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" batch_loss, new_state = sess.run([model.cost, model.final_state], feed_dict=feed)\n",
" val_loss.append(batch_loss)\n",
"\n",
" print('Validation loss:', np.mean(val_loss),\n",
" 'Saving checkpoint!')\n",
" saver.save(sess, \"checkpoints/anna/i{}_l{}_{:.3f}.ckpt\".format(iteration, lstm_size, np.mean(val_loss)))"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"model_checkpoint_path: \"checkpoints/anna/i3560_l512_1.122.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i200_l512_2.432.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i400_l512_1.980.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i600_l512_1.750.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i800_l512_1.595.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1000_l512_1.484.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1200_l512_1.407.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1400_l512_1.349.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1600_l512_1.292.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1800_l512_1.255.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2000_l512_1.224.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2200_l512_1.204.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2400_l512_1.187.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2600_l512_1.172.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2800_l512_1.160.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3000_l512_1.148.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3200_l512_1.137.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3400_l512_1.129.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3560_l512_1.122.ckpt\""
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tf.train.get_checkpoint_state('checkpoints/anna')"
]
},
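{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an added convenience (not part of the original run, so the cell below is left unexecuted), `tf.train.latest_checkpoint` pulls out just the path of the most recent checkpoint from the record above, which is handy when you only want to sample from the newest model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Added illustration: fetch only the most recent checkpoint path from checkpoints/anna.\n",
"# It reads the same record shown above, so it should match model_checkpoint_path.\n",
"latest = tf.train.latest_checkpoint('checkpoints/anna')\n",
"print(latest)"
]
},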
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sampling\n",
"\n",
"Now that the network is trained, we'll can use it to generate new text. The idea is that we pass in a character, then the network will predict the next character. We can use the new one, to predict the next one. And we keep doing this to generate all new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.\n",
"\n",
"The network gives us predictions for each character. To reduce noise and make things a little less random, I'm going to only choose a new character from the top N most likely characters.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def pick_top_n(preds, vocab_size, top_n=5):\n",
" p = np.squeeze(preds)\n",
" p[np.argsort(p)[:-top_n]] = 0\n",
" p = p / np.sum(p)\n",
" c = np.random.choice(vocab_size, 1, p=p)[0]\n",
" return c"
]
},
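{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this concrete, here's a quick sanity check on `pick_top_n` (an added illustration, not part of the original notebook run). With a made-up distribution over a tiny 6-character vocabulary and `top_n=3`, only the three most likely indices can ever be sampled, and their probabilities are renormalized before drawing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Added sanity check (toy numbers, not from the trained model):\n",
"# a fake prediction vector over a 6-character vocabulary.\n",
"toy_preds = np.array([[0.02, 0.30, 0.05, 0.40, 0.20, 0.03]])\n",
"# With top_n=3, only indices 3, 1 and 4 (the three largest) can be drawn.\n",
"# Note: pick_top_n zeroes the small entries of toy_preds in place via the squeezed view.\n",
"print([pick_top_n(toy_preds, vocab_size=6, top_n=3) for _ in range(10)])"
]
},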
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def sample(checkpoint, n_samples, lstm_size, vocab_size, prime=\"The \"):\n",
" prime = \"Far\"\n",
" samples = [c for c in prime]\n",
" model = build_rnn(vocab_size, lstm_size=lstm_size, sampling=True)\n",
" saver = tf.train.Saver()\n",
" with tf.Session() as sess:\n",
" saver.restore(sess, checkpoint)\n",
" new_state = sess.run(model.initial_state)\n",
" for c in prime:\n",
" x = np.zeros((1, 1))\n",
" x[0,0] = vocab_to_int[c]\n",
" feed = {model.inputs: x,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" preds, new_state = sess.run([model.preds, model.final_state], \n",
" feed_dict=feed)\n",
"\n",
" c = pick_top_n(preds, len(vocab))\n",
" samples.append(int_to_vocab[c])\n",
"\n",
" for i in range(n_samples):\n",
" x[0,0] = c\n",
" feed = {model.inputs: x,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" preds, new_state = sess.run([model.preds, model.final_state], \n",
" feed_dict=feed)\n",
"\n",
" c = pick_top_n(preds, len(vocab))\n",
" samples.append(int_to_vocab[c])\n",
" \n",
" return ''.join(samples)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farlathit that if had so\n",
"like it that it were. He could not trouble to his wife, and there was\n",
"anything in them of the side of his weaky in the creature at his forteren\n",
"to him.\n",
"\n",
"\"What is it? I can't bread to those,\" said Stepan Arkadyevitch. \"It's not\n",
"my children, and there is an almost this arm, true it mays already,\n",
"and tell you what I have say to you, and was not looking at the peasant,\n",
"why is, I don't know him out, and she doesn't speak to me immediately, as\n",
"you would say the countess and the more frest an angelembre, and time and\n",
"things's silent, but I was not in my stand that is in my head. But if he\n",
"say, and was so feeling with his soul. A child--in his soul of his\n",
"soul of his soul. He should not see that any of that sense of. Here he\n",
"had not been so composed and to speak for as in a whole picture, but\n",
"all the setting and her excellent and society, who had been delighted\n",
"and see to anywing had been being troed to thousand words on them,\n",
"we liked him.\n",
"\n",
"That set in her money at the table, he came into the party. The capable\n",
"of his she could not be as an old composure.\n",
"\n",
"\"That's all something there will be down becime by throe is\n",
"such a silent, as in a countess, I should state it out and divorct.\n",
"The discussion is not for me. I was that something was simply they are\n",
"all three manshess of a sensitions of mind it all.\"\n",
"\n",
"\"No,\" he thought, shouted and lifting his soul. \"While it might see your\n",
"honser and she, I could burst. And I had been a midelity. And I had a\n",
"marnief are through the countess,\" he said, looking at him, a chosing\n",
"which they had been carried out and still solied, and there was a sen that\n",
"was to be completely, and that this matter of all the seconds of it, and\n",
"a concipation were to her husband, who came up and conscaously, that he\n",
"was not the station. All his fourse she was always at the country,,\n",
"to speak oft, and though they were to hear the delightful throom and\n",
"whether they came towards the morning, and his living and a coller and\n",
"hold--the children. \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i3560_l512_1.122.ckpt\"\n",
"samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farnt him oste wha sorind thans tout thint asd an sesand an hires on thime sind thit aled, ban thand and out hore as the ter hos ton ho te that, was tis tart al the hand sostint him sore an tit an son thes, win he se ther san ther hher tas tarereng,.\n",
"\n",
"Anl at an ades in ond hesiln, ad hhe torers teans, wast tar arering tho this sos alten sorer has hhas an siton ther him he had sin he ard ate te anling the sosin her ans and\n",
"arins asd and ther ale te tot an tand tanginge wath and ho ald, so sot th asend sat hare sother horesinnd, he hesense wing ante her so tith tir sherinn, anded and to the toul anderin he sorit he torsith she se atere an ting ot hand and thit hhe so the te wile har\n",
"ens ont in the sersise, and we he seres tar aterer, to ato tat or has he he wan ton here won and sen heren he sosering, to to theer oo adent har herere the wosh oute, was serild ward tous hed astend..\n",
"\n",
"I's sint on alt in har tor tit her asd hade shithans ored he talereng an soredendere tim tot hees. Tise sor and \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i200_l512_2.432.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fard as astice her said he celatice of to seress in the raice, and to be the some and sere allats to that said to that the sark and a cast a the wither ald the pacinesse of her had astition, he said to the sount as she west at hissele. Af the cond it he was a fact onthis astisarianing.\n",
"\n",
"\n",
"\"Or a ton to to be that's a more at aspestale as the sont of anstiring as\n",
"thours and trey.\n",
"\n",
"The same wo dangring the\n",
"raterst, who sore and somethy had ast out an of his book. \"We had's beane were that, and a morted a thay he had to tere. Then to\n",
"her homent andertersed his his ancouted to the pirsted, the soution for of the pirsice inthirgest and stenciol, with the hard and and\n",
"a colrice of to be oneres,\n",
"the song to this anderssad.\n",
"The could ounterss the said to serom of\n",
"soment a carsed of sheres of she\n",
"torded\n",
"har and want in their of hould, but\n",
"her told in that in he tad a the same to her. Serghing an her has and with the seed, and the camt ont his about of the\n",
"sail, the her then all houg ant or to hus to \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i600_l512_1.750.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farrat, his felt has at it.\n",
"\n",
"\"When the pose ther hor exceed\n",
"to his sheant was,\" weat a sime of his sounsed. The coment and the facily that which had began terede a marilicaly whice whether the pose of his hand, at she was alligated herself the same on she had to\n",
"taiking to his forthing and streath how to hand\n",
"began in a lang at some at it, this he cholded not set all her. \"Wo love that is setthing. Him anstering as seen that.\"\n",
"\n",
"\"Yes in the man that say the mare a crances is it?\" said Sergazy Ivancatching. \"You doon think were somether is ifficult of a mone of\n",
"though the most at the countes that the\n",
"mean on the come to say the most, to\n",
"his feesing of\n",
"a man she, whilo he\n",
"sained and well, that he would still at to said. He wind at his for the sore in the most\n",
"of hoss and almoved to see him. They have betine the sumper into at he his stire, and what he was that at the so steate of the\n",
"sound, and shin should have a geest of shall feet on the conderation to she had been at that imporsing the dre\n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i1000_l512_1.484.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
},
"toc": {
"colors": {
"hover_highlight": "#DAA520",
"running_highlight": "#FF0000",
"selected_highlight": "#FFD700"
},
"moveMenuLeft": true,
"nav_menu": {
"height": "111px",
"width": "251px"
},
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 4,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,794 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Anna KaRNNa\n",
"\n",
"In this notebook, I'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.\n",
"\n",
"This network is based off of Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). Also, some information [here at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.\n",
"\n",
"<img src=\"assets/charseq.jpeg\" width=\"500\">"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import time\n",
"from collections import namedtuple\n",
"\n",
"import numpy as np\n",
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we'll load the text file and convert it into integers for our network to use."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"with open('anna.txt', 'r') as f:\n",
" text=f.read()\n",
"vocab = set(text)\n",
"vocab_to_int = {c: i for i, c in enumerate(vocab)}\n",
"int_to_vocab = dict(enumerate(vocab))\n",
"chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'Chapter 1\\n\\n\\nHappy families are all alike; every unhappy family is unhappy in its own\\nway.\\n\\nEverythin'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"text[:100]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([82, 78, 3, 48, 15, 79, 77, 50, 30, 20, 20, 20, 38, 3, 48, 48, 8,\n",
" 50, 10, 3, 9, 33, 4, 33, 79, 43, 50, 3, 77, 79, 50, 3, 4, 4,\n",
" 50, 3, 4, 33, 17, 79, 64, 50, 79, 44, 79, 77, 8, 50, 49, 70, 78,\n",
" 3, 48, 48, 8, 50, 10, 3, 9, 33, 4, 8, 50, 33, 43, 50, 49, 70,\n",
" 78, 3, 48, 48, 8, 50, 33, 70, 50, 33, 15, 43, 50, 55, 62, 70, 20,\n",
" 62, 3, 8, 22, 20, 20, 80, 44, 79, 77, 8, 15, 78, 33, 70], dtype=int32)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chars[:100]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I need to split up the data into batches, and into training and validation sets. I should be making a test set here, but I'm not going to worry about that. My test will be if the network can generate new text.\n",
"\n",
"Here I'll make both input and target arrays. The targets are the same as the inputs, except shifted one character over. I'll also drop the last bit of data so that I'll only have completely full batches.\n",
"\n",
"The idea here is to make a 2D matrix where the number of rows is equal to the number of batches. Each row will be one long concatenated string from the character data. We'll split this data into a training set and validation set using the `split_frac` keyword. This will keep 90% of the batches in the training set, the other 10% in the validation set."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def split_data(chars, batch_size, num_steps, split_frac=0.9):\n",
" \"\"\" \n",
" Split character data into training and validation sets, inputs and targets for each set.\n",
" \n",
" Arguments\n",
" ---------\n",
" chars: character array\n",
" batch_size: Size of examples in each of batch\n",
" num_steps: Number of sequence steps to keep in the input and pass to the network\n",
" split_frac: Fraction of batches to keep in the training set\n",
" \n",
" \n",
" Returns train_x, train_y, val_x, val_y\n",
" \"\"\"\n",
" \n",
" \n",
" slice_size = batch_size * num_steps\n",
" n_batches = int(len(chars) / slice_size)\n",
" \n",
" # Drop the last few characters to make only full batches\n",
" x = chars[: n_batches*slice_size]\n",
" y = chars[1: n_batches*slice_size + 1]\n",
" \n",
" # Split the data into batch_size slices, then stack them into a 2D matrix \n",
" x = np.stack(np.split(x, batch_size))\n",
" y = np.stack(np.split(y, batch_size))\n",
" \n",
" # Now x and y are arrays with dimensions batch_size x n_batches*num_steps\n",
" \n",
" # Split into training and validation sets, keep the virst split_frac batches for training\n",
" split_idx = int(n_batches*split_frac)\n",
" train_x, train_y= x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]\n",
" val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]\n",
" \n",
" return train_x, train_y, val_x, val_y"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"train_x, train_y, val_x, val_y = split_data(chars, 10, 200)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(10, 178400)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_x.shape"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[82, 78, 3, 48, 15, 79, 77, 50, 30, 20],\n",
" [67, 70, 58, 50, 78, 79, 50, 9, 55, 44],\n",
" [50, 65, 3, 15, 65, 78, 33, 70, 32, 50],\n",
" [55, 15, 78, 79, 77, 50, 62, 55, 49, 4],\n",
" [50, 15, 78, 79, 50, 4, 3, 70, 58, 18],\n",
" [50, 51, 78, 77, 55, 49, 32, 78, 50, 4],\n",
" [15, 50, 15, 55, 20, 58, 55, 22, 20, 20],\n",
" [55, 50, 78, 79, 77, 43, 79, 4, 10, 56],\n",
" [78, 3, 15, 50, 33, 43, 50, 15, 78, 79],\n",
" [79, 77, 43, 79, 4, 10, 50, 3, 70, 58]], dtype=int32)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_x[:,:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'll write another function to grab batches out of the arrays made by split data. Here each batch will be a sliding window on these arrays with size `batch_size X num_steps`. For example, if we want our network to train on a sequence of 100 characters, `num_steps = 100`. For the next batch, we'll shift this window the next sequence of `num_steps` characters. In this way we can feed batches to the network and the cell states will continue through on each batch."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_batch(arrs, num_steps):\n",
" batch_size, slice_size = arrs[0].shape\n",
" \n",
" n_batches = int(slice_size/num_steps)\n",
" for b in range(n_batches):\n",
" yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]"
]
},
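{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check (an added cell, not in the original run), we can pull the first window out of `get_batch` for the `split_data(chars, 10, 200)` arrays above: each yielded pair should be `batch_size x num_steps`, with the targets shifted one character relative to the inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Added illustration: peek at the first sliding window from get_batch.\n",
"x, y = next(get_batch([train_x, train_y], 200))\n",
"print(x.shape, y.shape)  # both (10, 200) for the split_data(chars, 10, 200) call above\n",
"print(x[0, :10])\n",
"print(y[0, :10])         # y is x shifted one character to the left"
]
},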
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"def build_rnn(num_classes, batch_size=50, num_steps=50, lstm_size=128, num_layers=2,\n",
" learning_rate=0.001, grad_clip=5, sampling=False):\n",
" \n",
" if sampling == True:\n",
" batch_size, num_steps = 1, 1\n",
"\n",
" tf.reset_default_graph()\n",
" \n",
" # Declare placeholders we'll feed into the graph\n",
" \n",
" inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')\n",
" x_one_hot = tf.one_hot(inputs, num_classes, name='x_one_hot')\n",
"\n",
"\n",
" targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')\n",
" y_one_hot = tf.one_hot(targets, num_classes, name='y_one_hot')\n",
" y_reshaped = tf.reshape(y_one_hot, [-1, num_classes])\n",
" \n",
" keep_prob = tf.placeholder(tf.float32, name='keep_prob')\n",
" \n",
" # Build the RNN layers\n",
" \n",
" lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)\n",
" drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)\n",
" cell = tf.contrib.rnn.MultiRNNCell([drop] * num_layers)\n",
"\n",
" initial_state = cell.zero_state(batch_size, tf.float32)\n",
"\n",
" # Run the data through the RNN layers\n",
" rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(x_one_hot, num_steps, 1)]\n",
" outputs, state = tf.contrib.rnn.static_rnn(cell, rnn_inputs, initial_state=initial_state)\n",
" \n",
" final_state = tf.identity(state, name='final_state')\n",
" \n",
" # Reshape output so it's a bunch of rows, one row for each cell output\n",
" \n",
" seq_output = tf.concat(outputs, axis=1,name='seq_output')\n",
" output = tf.reshape(seq_output, [-1, lstm_size], name='graph_output')\n",
" \n",
" # Now connect the RNN putputs to a softmax layer and calculate the cost\n",
" softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1),\n",
" name='softmax_w')\n",
" softmax_b = tf.Variable(tf.zeros(num_classes), name='softmax_b')\n",
" logits = tf.matmul(output, softmax_w) + softmax_b\n",
"\n",
" preds = tf.nn.softmax(logits, name='predictions')\n",
" \n",
" loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped, name='loss')\n",
" cost = tf.reduce_mean(loss, name='cost')\n",
"\n",
" # Optimizer for training, using gradient clipping to control exploding gradients\n",
" tvars = tf.trainable_variables()\n",
" grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)\n",
" train_op = tf.train.AdamOptimizer(learning_rate)\n",
" optimizer = train_op.apply_gradients(zip(grads, tvars))\n",
"\n",
" # Export the nodes \n",
" export_nodes = ['inputs', 'targets', 'initial_state', 'final_state',\n",
" 'keep_prob', 'cost', 'preds', 'optimizer']\n",
" Graph = namedtuple('Graph', export_nodes)\n",
" local_dict = locals()\n",
" graph = Graph(*[local_dict[each] for each in export_nodes])\n",
" \n",
" return graph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hyperparameters\n",
"\n",
"Here I'm defining the hyperparameters for the network. The two you probably haven't seen before are `lstm_size` and `num_layers`. These set the number of hidden units in the LSTM layers and the number of LSTM layers, respectively. Of course, making these bigger will improve the network's performance but you'll have to watch out for overfitting. If your validation loss is much larger than the training loss, you're probably overfitting. Decrease the size of the network or decrease the dropout keep probability."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"batch_size = 100\n",
"num_steps = 100\n",
"lstm_size = 512\n",
"num_layers = 2\n",
"learning_rate = 0.001"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Write out the graph for TensorBoard"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = build_rnn(len(vocab),\n",
" batch_size=batch_size,\n",
" num_steps=num_steps,\n",
" learning_rate=learning_rate,\n",
" lstm_size=lstm_size,\n",
" num_layers=num_layers)\n",
"\n",
"with tf.Session() as sess:\n",
" \n",
" sess.run(tf.global_variables_initializer())\n",
" file_writer = tf.summary.FileWriter('./logs/1', sess.graph)"
]
},
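{
"cell_type": "markdown",
"metadata": {},
"source": [
"To actually look at the graph, launch TensorBoard from the directory containing this notebook, for example with `tensorboard --logdir ./logs`, and open the Graphs tab in the page it serves (added note; the exact port and URL depend on your setup)."
]
},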
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training\n",
"\n",
"Time for training which is is pretty straightforward. Here I pass in some data, and get an LSTM state back. Then I pass that state back in to the network so the next batch can continue the state from the previous batch. And every so often (set by `save_every_n`) I calculate the validation loss and save a checkpoint."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"!mkdir -p checkpoints/anna"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true,
"scrolled": true
},
"outputs": [
{
"ename": "ValueError",
"evalue": "Expected state to be a tuple of length 2, but received: Tensor(\"initial_state:0\", shape=(2, 2, 100, 512), dtype=float32)",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-20-4190d11347ea>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0mlearning_rate\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlearning_rate\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mlstm_size\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlstm_size\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 10\u001b[0;31m num_layers=num_layers)\n\u001b[0m\u001b[1;32m 11\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0msaver\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mSaver\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmax_to_keep\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m100\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m<ipython-input-19-a7e675cc0f3d>\u001b[0m in \u001b[0;36mbuild_rnn\u001b[0;34m(num_classes, batch_size, num_steps, lstm_size, num_layers, learning_rate, grad_clip, sampling)\u001b[0m\n\u001b[1;32m 25\u001b[0m \u001b[0;31m# Run the data through the RNN layers\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 26\u001b[0m \u001b[0mrnn_inputs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msqueeze\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msqueeze_dims\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msplit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_one_hot\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnum_steps\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 27\u001b[0;31m \u001b[0moutputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstate\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcontrib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrnn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstatic_rnn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcell\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrnn_inputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minitial_state\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minitial_state\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 28\u001b[0m \u001b[0mfinal_state\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0midentity\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstate\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'final_state'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 29\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn.py\u001b[0m in \u001b[0;36mstatic_rnn\u001b[0;34m(cell, inputs, initial_state, dtype, sequence_length, scope)\u001b[0m\n\u001b[1;32m 195\u001b[0m state_size=cell.state_size)\n\u001b[1;32m 196\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 197\u001b[0;31m \u001b[0;34m(\u001b[0m\u001b[0moutput\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstate\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcall_cell\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 198\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 199\u001b[0m \u001b[0moutputs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutput\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 182\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtime\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mvarscope\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreuse_variables\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 183\u001b[0m \u001b[0;31m# pylint: disable=cell-var-from-loop\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 184\u001b[0;31m \u001b[0mcall_cell\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mcell\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minput_\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstate\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 185\u001b[0m \u001b[0;31m# pylint: enable=cell-var-from-loop\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 186\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0msequence_length\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, inputs, state, scope)\u001b[0m\n\u001b[1;32m 647\u001b[0m raise ValueError(\n\u001b[1;32m 648\u001b[0m \u001b[0;34m\"Expected state to be a tuple of length %d, but received: %s\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 649\u001b[0;31m % (len(self.state_size), state))\n\u001b[0m\u001b[1;32m 650\u001b[0m \u001b[0mcur_state\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstate\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 651\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: Expected state to be a tuple of length 2, but received: Tensor(\"initial_state:0\", shape=(2, 2, 100, 512), dtype=float32)"
]
}
],
"source": [
"epochs = 1\n",
"save_every_n = 200\n",
"train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)\n",
"\n",
"model = build_rnn(len(vocab), \n",
" batch_size=batch_size,\n",
" num_steps=num_steps,\n",
" learning_rate=learning_rate,\n",
" lstm_size=lstm_size,\n",
" num_layers=num_layers)\n",
"\n",
"saver = tf.train.Saver(max_to_keep=100)\n",
"\n",
"with tf.Session() as sess:\n",
" sess.run(tf.global_variables_initializer())\n",
" \n",
" # Use the line below to load a checkpoint and resume training\n",
" #saver.restore(sess, 'checkpoints/anna20.ckpt')\n",
" \n",
" n_batches = int(train_x.shape[1]/num_steps)\n",
" iterations = n_batches * epochs\n",
" for e in range(epochs):\n",
" \n",
" # Train network\n",
" new_state = sess.run(model.initial_state)\n",
" loss = 0\n",
" for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):\n",
" iteration = e*n_batches + b\n",
" start = time.time()\n",
" feed = {model.inputs: x,\n",
" model.targets: y,\n",
" model.keep_prob: 0.5,\n",
" model.initial_state: new_state}\n",
" batch_loss, new_state, _ = sess.run([model.cost, model.final_state, model.optimizer], \n",
" feed_dict=feed)\n",
" loss += batch_loss\n",
" end = time.time()\n",
" print('Epoch {}/{} '.format(e+1, epochs),\n",
" 'Iteration {}/{}'.format(iteration, iterations),\n",
" 'Training loss: {:.4f}'.format(loss/b),\n",
" '{:.4f} sec/batch'.format((end-start)))\n",
" \n",
" \n",
" if (iteration%save_every_n == 0) or (iteration == iterations):\n",
" # Check performance, notice dropout has been set to 1\n",
" val_loss = []\n",
" new_state = sess.run(model.initial_state)\n",
" for x, y in get_batch([val_x, val_y], num_steps):\n",
" feed = {model.inputs: x,\n",
" model.targets: y,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" batch_loss, new_state = sess.run([model.cost, model.final_state], feed_dict=feed)\n",
" val_loss.append(batch_loss)\n",
"\n",
" print('Validation loss:', np.mean(val_loss),\n",
" 'Saving checkpoint!')\n",
" saver.save(sess, \"checkpoints/anna/i{}_l{}_{:.3f}.ckpt\".format(iteration, lstm_size, np.mean(val_loss)))"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"model_checkpoint_path: \"checkpoints/anna/i3560_l512_1.122.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i200_l512_2.432.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i400_l512_1.980.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i600_l512_1.750.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i800_l512_1.595.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1000_l512_1.484.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1200_l512_1.407.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1400_l512_1.349.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1600_l512_1.292.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1800_l512_1.255.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2000_l512_1.224.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2200_l512_1.204.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2400_l512_1.187.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2600_l512_1.172.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2800_l512_1.160.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3000_l512_1.148.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3200_l512_1.137.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3400_l512_1.129.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3560_l512_1.122.ckpt\""
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tf.train.get_checkpoint_state('checkpoints/anna')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sampling\n",
"\n",
"Now that the network is trained, we'll can use it to generate new text. The idea is that we pass in a character, then the network will predict the next character. We can use the new one, to predict the next one. And we keep doing this to generate all new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.\n",
"\n",
"The network gives us predictions for each character. To reduce noise and make things a little less random, I'm going to only choose a new character from the top N most likely characters.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def pick_top_n(preds, vocab_size, top_n=5):\n",
" p = np.squeeze(preds)\n",
" p[np.argsort(p)[:-top_n]] = 0\n",
" p = p / np.sum(p)\n",
" c = np.random.choice(vocab_size, 1, p=p)[0]\n",
" return c"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def sample(checkpoint, n_samples, lstm_size, vocab_size, prime=\"The \"):\n",
" prime = \"Far\"\n",
" samples = [c for c in prime]\n",
" model = build_rnn(vocab_size, lstm_size=lstm_size, sampling=True)\n",
" saver = tf.train.Saver()\n",
" with tf.Session() as sess:\n",
" saver.restore(sess, checkpoint)\n",
" new_state = sess.run(model.initial_state)\n",
" for c in prime:\n",
" x = np.zeros((1, 1))\n",
" x[0,0] = vocab_to_int[c]\n",
" feed = {model.inputs: x,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" preds, new_state = sess.run([model.preds, model.final_state], \n",
" feed_dict=feed)\n",
"\n",
" c = pick_top_n(preds, len(vocab))\n",
" samples.append(int_to_vocab[c])\n",
"\n",
" for i in range(n_samples):\n",
" x[0,0] = c\n",
" feed = {model.inputs: x,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" preds, new_state = sess.run([model.preds, model.final_state], \n",
" feed_dict=feed)\n",
"\n",
" c = pick_top_n(preds, len(vocab))\n",
" samples.append(int_to_vocab[c])\n",
" \n",
" return ''.join(samples)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farlathit that if had so\n",
"like it that it were. He could not trouble to his wife, and there was\n",
"anything in them of the side of his weaky in the creature at his forteren\n",
"to him.\n",
"\n",
"\"What is it? I can't bread to those,\" said Stepan Arkadyevitch. \"It's not\n",
"my children, and there is an almost this arm, true it mays already,\n",
"and tell you what I have say to you, and was not looking at the peasant,\n",
"why is, I don't know him out, and she doesn't speak to me immediately, as\n",
"you would say the countess and the more frest an angelembre, and time and\n",
"things's silent, but I was not in my stand that is in my head. But if he\n",
"say, and was so feeling with his soul. A child--in his soul of his\n",
"soul of his soul. He should not see that any of that sense of. Here he\n",
"had not been so composed and to speak for as in a whole picture, but\n",
"all the setting and her excellent and society, who had been delighted\n",
"and see to anywing had been being troed to thousand words on them,\n",
"we liked him.\n",
"\n",
"That set in her money at the table, he came into the party. The capable\n",
"of his she could not be as an old composure.\n",
"\n",
"\"That's all something there will be down becime by throe is\n",
"such a silent, as in a countess, I should state it out and divorct.\n",
"The discussion is not for me. I was that something was simply they are\n",
"all three manshess of a sensitions of mind it all.\"\n",
"\n",
"\"No,\" he thought, shouted and lifting his soul. \"While it might see your\n",
"honser and she, I could burst. And I had been a midelity. And I had a\n",
"marnief are through the countess,\" he said, looking at him, a chosing\n",
"which they had been carried out and still solied, and there was a sen that\n",
"was to be completely, and that this matter of all the seconds of it, and\n",
"a concipation were to her husband, who came up and conscaously, that he\n",
"was not the station. All his fourse she was always at the country,,\n",
"to speak oft, and though they were to hear the delightful throom and\n",
"whether they came towards the morning, and his living and a coller and\n",
"hold--the children. \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i3560_l512_1.122.ckpt\"\n",
"samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farnt him oste wha sorind thans tout thint asd an sesand an hires on thime sind thit aled, ban thand and out hore as the ter hos ton ho te that, was tis tart al the hand sostint him sore an tit an son thes, win he se ther san ther hher tas tarereng,.\n",
"\n",
"Anl at an ades in ond hesiln, ad hhe torers teans, wast tar arering tho this sos alten sorer has hhas an siton ther him he had sin he ard ate te anling the sosin her ans and\n",
"arins asd and ther ale te tot an tand tanginge wath and ho ald, so sot th asend sat hare sother horesinnd, he hesense wing ante her so tith tir sherinn, anded and to the toul anderin he sorit he torsith she se atere an ting ot hand and thit hhe so the te wile har\n",
"ens ont in the sersise, and we he seres tar aterer, to ato tat or has he he wan ton here won and sen heren he sosering, to to theer oo adent har herere the wosh oute, was serild ward tous hed astend..\n",
"\n",
"I's sint on alt in har tor tit her asd hade shithans ored he talereng an soredendere tim tot hees. Tise sor and \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i200_l512_2.432.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fard as astice her said he celatice of to seress in the raice, and to be the some and sere allats to that said to that the sark and a cast a the wither ald the pacinesse of her had astition, he said to the sount as she west at hissele. Af the cond it he was a fact onthis astisarianing.\n",
"\n",
"\n",
"\"Or a ton to to be that's a more at aspestale as the sont of anstiring as\n",
"thours and trey.\n",
"\n",
"The same wo dangring the\n",
"raterst, who sore and somethy had ast out an of his book. \"We had's beane were that, and a morted a thay he had to tere. Then to\n",
"her homent andertersed his his ancouted to the pirsted, the soution for of the pirsice inthirgest and stenciol, with the hard and and\n",
"a colrice of to be oneres,\n",
"the song to this anderssad.\n",
"The could ounterss the said to serom of\n",
"soment a carsed of sheres of she\n",
"torded\n",
"har and want in their of hould, but\n",
"her told in that in he tad a the same to her. Serghing an her has and with the seed, and the camt ont his about of the\n",
"sail, the her then all houg ant or to hus to \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i600_l512_1.750.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farrat, his felt has at it.\n",
"\n",
"\"When the pose ther hor exceed\n",
"to his sheant was,\" weat a sime of his sounsed. The coment and the facily that which had began terede a marilicaly whice whether the pose of his hand, at she was alligated herself the same on she had to\n",
"taiking to his forthing and streath how to hand\n",
"began in a lang at some at it, this he cholded not set all her. \"Wo love that is setthing. Him anstering as seen that.\"\n",
"\n",
"\"Yes in the man that say the mare a crances is it?\" said Sergazy Ivancatching. \"You doon think were somether is ifficult of a mone of\n",
"though the most at the countes that the\n",
"mean on the come to say the most, to\n",
"his feesing of\n",
"a man she, whilo he\n",
"sained and well, that he would still at to said. He wind at his for the sore in the most\n",
"of hoss and almoved to see him. They have betine the sumper into at he his stire, and what he was that at the so steate of the\n",
"sound, and shin should have a geest of shall feet on the conderation to she had been at that imporsing the dre\n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i1000_l512_1.484.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
},
"toc": {
"colors": {
"hover_highlight": "#DAA520",
"running_highlight": "#FF0000",
"selected_highlight": "#FFD700"
},
"moveMenuLeft": true,
"nav_menu": {
"height": "123px",
"width": "335px"
},
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 4,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because it is too large

@ -0,0 +1,847 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Anna KaRNNa\n",
"\n",
"In this notebook, I'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.\n",
"\n",
"This network is based off of Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). Also, some information [here at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.\n",
"\n",
"<img src=\"assets/charseq.jpeg\" width=\"500\">"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import time\n",
"from collections import namedtuple\n",
"\n",
"import numpy as np\n",
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we'll load the text file and convert it into integers for our network to use."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"with open('anna.txt', 'r') as f:\n",
" text=f.read()\n",
"vocab = set(text)\n",
"vocab_to_int = {c: i for i, c in enumerate(vocab)}\n",
"int_to_vocab = dict(enumerate(vocab))\n",
"chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'Chapter 1\\n\\n\\nHappy families are all alike; every unhappy family is unhappy in its own\\nway.\\n\\nEverythin'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"text[:100]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([26, 49, 22, 46, 33, 70, 51, 37, 36, 31, 31, 31, 65, 22, 46, 46, 39,\n",
" 37, 23, 22, 4, 77, 32, 77, 70, 81, 37, 22, 51, 70, 37, 22, 32, 32,\n",
" 37, 22, 32, 77, 0, 70, 64, 37, 70, 16, 70, 51, 39, 37, 20, 74, 49,\n",
" 22, 46, 46, 39, 37, 23, 22, 4, 77, 32, 39, 37, 77, 81, 37, 20, 74,\n",
" 49, 22, 46, 46, 39, 37, 77, 74, 37, 77, 33, 81, 37, 21, 75, 74, 31,\n",
" 75, 22, 39, 13, 31, 31, 24, 16, 70, 51, 39, 33, 49, 77, 74], dtype=int32)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chars[:100]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I need to split up the data into batches, and into training and validation sets. I should be making a test set here, but I'm not going to worry about that. My test will be if the network can generate new text.\n",
"\n",
"Here I'll make both input and target arrays. The targets are the same as the inputs, except shifted one character over. I'll also drop the last bit of data so that I'll only have completely full batches.\n",
"\n",
"The idea here is to make a 2D matrix where the number of rows is equal to the number of batches. Each row will be one long concatenated string from the character data. We'll split this data into a training set and validation set using the `split_frac` keyword. This will keep 90% of the batches in the training set, the other 10% in the validation set."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def split_data(chars, batch_size, num_steps, split_frac=0.9):\n",
" \"\"\" \n",
" Split character data into training and validation sets, inputs and targets for each set.\n",
" \n",
" Arguments\n",
" ---------\n",
" chars: character array\n",
" batch_size: Size of examples in each of batch\n",
" num_steps: Number of sequence steps to keep in the input and pass to the network\n",
" split_frac: Fraction of batches to keep in the training set\n",
" \n",
" \n",
" Returns train_x, train_y, val_x, val_y\n",
" \"\"\"\n",
" \n",
" \n",
" slice_size = batch_size * num_steps\n",
" n_batches = int(len(chars) / slice_size)\n",
" \n",
" # Drop the last few characters to make only full batches\n",
" x = chars[: n_batches*slice_size]\n",
" y = chars[1: n_batches*slice_size + 1]\n",
" \n",
" # Split the data into batch_size slices, then stack them into a 2D matrix \n",
" x = np.stack(np.split(x, batch_size))\n",
" y = np.stack(np.split(y, batch_size))\n",
" \n",
" # Now x and y are arrays with dimensions batch_size x n_batches*num_steps\n",
" \n",
" # Split into training and validation sets, keep the virst split_frac batches for training\n",
" split_idx = int(n_batches*split_frac)\n",
" train_x, train_y= x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]\n",
" val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]\n",
" \n",
" return train_x, train_y, val_x, val_y"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"train_x, train_y, val_x, val_y = split_data(chars, 10, 200)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(10, 178400)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_x.shape"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[26, 49, 22, 46, 33, 70, 51, 37, 36, 31],\n",
" [11, 74, 53, 37, 49, 70, 37, 4, 21, 16],\n",
" [37, 1, 22, 33, 1, 49, 77, 74, 69, 37],\n",
" [21, 33, 49, 70, 51, 37, 75, 21, 20, 32],\n",
" [37, 33, 49, 70, 37, 32, 22, 74, 53, 2],\n",
" [37, 52, 49, 51, 21, 20, 69, 49, 37, 32],\n",
" [33, 37, 33, 21, 31, 53, 21, 13, 31, 31],\n",
" [21, 37, 49, 70, 51, 81, 70, 32, 23, 54],\n",
" [49, 22, 33, 37, 77, 81, 37, 33, 49, 70],\n",
" [70, 51, 81, 70, 32, 23, 37, 22, 74, 53]], dtype=int32)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_x[:,:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'll write another function to grab batches out of the arrays made by split data. Here each batch will be a sliding window on these arrays with size `batch_size X num_steps`. For example, if we want our network to train on a sequence of 100 characters, `num_steps = 100`. For the next batch, we'll shift this window the next sequence of `num_steps` characters. In this way we can feed batches to the network and the cell states will continue through on each batch."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_batch(arrs, num_steps):\n",
" batch_size, slice_size = arrs[0].shape\n",
" \n",
" n_batches = int(slice_size/num_steps)\n",
" for b in range(n_batches):\n",
" yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"def build_rnn(num_classes, batch_size=50, num_steps=50, lstm_size=128, num_layers=2,\n",
" learning_rate=0.001, grad_clip=5, sampling=False):\n",
" \n",
" if sampling == True:\n",
" batch_size, num_steps = 1, 1\n",
"\n",
" tf.reset_default_graph()\n",
" \n",
" # Declare placeholders we'll feed into the graph\n",
" with tf.name_scope('inputs'):\n",
" inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')\n",
" x_one_hot = tf.one_hot(inputs, num_classes, name='x_one_hot')\n",
" \n",
" with tf.name_scope('targets'):\n",
" targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')\n",
" y_one_hot = tf.one_hot(targets, num_classes, name='y_one_hot')\n",
" y_reshaped = tf.reshape(y_one_hot, [-1, num_classes])\n",
" \n",
" keep_prob = tf.placeholder(tf.float32, name='keep_prob')\n",
" \n",
" # Build the RNN layers\n",
" with tf.name_scope(\"RNN_layers\"):\n",
" lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)\n",
" drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)\n",
" cell = tf.contrib.rnn.MultiRNNCell([drop] * num_layers)\n",
" \n",
" with tf.name_scope(\"RNN_init_state\"):\n",
" initial_state = cell.zero_state(batch_size, tf.float32)\n",
"\n",
" # Run the data through the RNN layers\n",
" with tf.name_scope(\"RNN_forward\"):\n",
" rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(x_one_hot, num_steps, 1)]\n",
" outputs, state = tf.contrib.rnn.static_rnn(cell, rnn_inputs, initial_state=initial_state)\n",
" \n",
" final_state = state\n",
" \n",
" # Reshape output so it's a bunch of rows, one row for each cell output\n",
" with tf.name_scope('sequence_reshape'):\n",
" seq_output = tf.concat(outputs, axis=1,name='seq_output')\n",
" output = tf.reshape(seq_output, [-1, lstm_size], name='graph_output')\n",
" \n",
" # Now connect the RNN putputs to a softmax layer and calculate the cost\n",
" with tf.name_scope('logits'):\n",
" softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1),\n",
" name='softmax_w')\n",
" softmax_b = tf.Variable(tf.zeros(num_classes), name='softmax_b')\n",
" logits = tf.matmul(output, softmax_w) + softmax_b\n",
"\n",
" with tf.name_scope('predictions'):\n",
" preds = tf.nn.softmax(logits, name='predictions')\n",
" \n",
" \n",
" with tf.name_scope('cost'):\n",
" loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped, name='loss')\n",
" cost = tf.reduce_mean(loss, name='cost')\n",
"\n",
" # Optimizer for training, using gradient clipping to control exploding gradients\n",
" with tf.name_scope('train'):\n",
" tvars = tf.trainable_variables()\n",
" grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)\n",
" train_op = tf.train.AdamOptimizer(learning_rate)\n",
" optimizer = train_op.apply_gradients(zip(grads, tvars))\n",
" \n",
" # Export the nodes \n",
" export_nodes = ['inputs', 'targets', 'initial_state', 'final_state',\n",
" 'keep_prob', 'cost', 'preds', 'optimizer']\n",
" Graph = namedtuple('Graph', export_nodes)\n",
" local_dict = locals()\n",
" graph = Graph(*[local_dict[each] for each in export_nodes])\n",
" \n",
" return graph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hyperparameters\n",
"\n",
"Here I'm defining the hyperparameters for the network. The two you probably haven't seen before are `lstm_size` and `num_layers`. These set the number of hidden units in the LSTM layers and the number of LSTM layers, respectively. Of course, making these bigger will improve the network's performance but you'll have to watch out for overfitting. If your validation loss is much larger than the training loss, you're probably overfitting. Decrease the size of the network or decrease the dropout keep probability."
]
},
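{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sanity check on model size, the sketch below estimates how many parameters the LSTM stack has for a given `lstm_size` and `num_layers`, assuming the standard LSTM parameterization of `4 * hidden * (input + hidden + 1)` weights per layer plus the softmax layer on top, and assuming `vocab` from the preprocessing cells above. The `approx_param_count` helper exists only for this back-of-the-envelope estimate; it is not part of the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Back-of-the-envelope parameter count for the character RNN (estimate only)\n",
"def approx_param_count(num_classes, lstm_size, num_layers):\n",
"    total = 0\n",
"    input_dim = num_classes              # one-hot input to the first layer\n",
"    for _ in range(num_layers):\n",
"        # four gates, each with an (input + hidden) x hidden weight matrix plus a bias\n",
"        total += 4 * lstm_size * (input_dim + lstm_size + 1)\n",
"        input_dim = lstm_size            # deeper layers see the previous layer's output\n",
"    total += lstm_size * num_classes + num_classes   # softmax layer on top\n",
"    return total\n",
"\n",
"for size in [128, 256, 512]:\n",
"    print(size, approx_param_count(len(vocab), size, num_layers=2))"
]
},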
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"batch_size = 100\n",
"num_steps = 100\n",
"lstm_size = 512\n",
"num_layers = 2\n",
"learning_rate = 0.001"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Write out the graph for TensorBoard"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = build_rnn(len(vocab), \n",
" batch_size=batch_size,\n",
" num_steps=num_steps,\n",
" learning_rate=learning_rate,\n",
" lstm_size=lstm_size,\n",
" num_layers=num_layers)\n",
"\n",
"with tf.Session() as sess:\n",
" \n",
" sess.run(tf.global_variables_initializer())\n",
" file_writer = tf.summary.FileWriter('./logs/3', sess.graph)"
]
},
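{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the graph written out, running `tensorboard --logdir ./logs` from this directory and opening the address it prints (http://localhost:6006 by default) should show the graph, with the name scopes above grouping the nodes."
]
},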
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training\n",
"\n",
"Time for training which is is pretty straightforward. Here I pass in some data, and get an LSTM state back. Then I pass that state back in to the network so the next batch can continue the state from the previous batch. And every so often (set by `save_every_n`) I calculate the validation loss and save a checkpoint."
]
},
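{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make that state-passing pattern concrete before the real loop, here is a toy sketch with nothing model-specific in it: `fake_step` is just a stand-in for `sess.run`. The only point is that whatever comes out of one step gets fed back in with the next batch, so information carries across the whole sequence instead of resetting at every window."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Toy illustration of carrying a state across batches (not the real model)\n",
"def fake_step(batch, state):\n",
"    # pretend this is sess.run returning a new LSTM state; here it just accumulates a sum\n",
"    return state + batch.sum()\n",
"\n",
"state = 0.0\n",
"for batch in np.split(np.arange(12.0), 4):   # four small 'batches'\n",
"    state = fake_step(batch, state)          # feed the previous state back in\n",
"    print(batch, '-> state =', state)"
]
},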
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"!mkdir -p checkpoints/anna"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/10 Iteration 1/1780 Training loss: 4.4195 1.3313 sec/batch\n",
"Epoch 1/10 Iteration 2/1780 Training loss: 4.3756 0.1287 sec/batch\n",
"Epoch 1/10 Iteration 3/1780 Training loss: 4.2069 0.1276 sec/batch\n",
"Epoch 1/10 Iteration 4/1780 Training loss: 4.5396 0.1185 sec/batch\n",
"Epoch 1/10 Iteration 5/1780 Training loss: 4.4190 0.1206 sec/batch\n",
"Epoch 1/10 Iteration 6/1780 Training loss: 4.3547 0.1233 sec/batch\n",
"Epoch 1/10 Iteration 7/1780 Training loss: 4.2792 0.1188 sec/batch\n",
"Epoch 1/10 Iteration 8/1780 Training loss: 4.2018 0.1170 sec/batch\n",
"Epoch 1/10 Iteration 9/1780 Training loss: 4.1251 0.1187 sec/batch\n",
"Epoch 1/10 Iteration 10/1780 Training loss: 4.0558 0.1174 sec/batch\n",
"Epoch 1/10 Iteration 11/1780 Training loss: 3.9946 0.1190 sec/batch\n",
"Epoch 1/10 Iteration 12/1780 Training loss: 3.9451 0.1193 sec/batch\n",
"Epoch 1/10 Iteration 13/1780 Training loss: 3.9011 0.1210 sec/batch\n",
"Epoch 1/10 Iteration 14/1780 Training loss: 3.8632 0.1185 sec/batch\n",
"Epoch 1/10 Iteration 15/1780 Training loss: 3.8275 0.1199 sec/batch\n",
"Epoch 1/10 Iteration 16/1780 Training loss: 3.7945 0.1211 sec/batch\n",
"Epoch 1/10 Iteration 17/1780 Training loss: 3.7649 0.1215 sec/batch\n",
"Epoch 1/10 Iteration 18/1780 Training loss: 3.7400 0.1214 sec/batch\n",
"Epoch 1/10 Iteration 19/1780 Training loss: 3.7164 0.1247 sec/batch\n",
"Epoch 1/10 Iteration 20/1780 Training loss: 3.6933 0.1212 sec/batch\n",
"Epoch 1/10 Iteration 21/1780 Training loss: 3.6728 0.1203 sec/batch\n",
"Epoch 1/10 Iteration 22/1780 Training loss: 3.6538 0.1207 sec/batch\n",
"Epoch 1/10 Iteration 23/1780 Training loss: 3.6359 0.1200 sec/batch\n",
"Epoch 1/10 Iteration 24/1780 Training loss: 3.6198 0.1229 sec/batch\n",
"Epoch 1/10 Iteration 25/1780 Training loss: 3.6041 0.1204 sec/batch\n",
"Epoch 1/10 Iteration 26/1780 Training loss: 3.5904 0.1202 sec/batch\n",
"Epoch 1/10 Iteration 27/1780 Training loss: 3.5774 0.1189 sec/batch\n",
"Epoch 1/10 Iteration 28/1780 Training loss: 3.5642 0.1214 sec/batch\n",
"Epoch 1/10 Iteration 29/1780 Training loss: 3.5522 0.1231 sec/batch\n",
"Epoch 1/10 Iteration 30/1780 Training loss: 3.5407 0.1199 sec/batch\n",
"Epoch 1/10 Iteration 31/1780 Training loss: 3.5309 0.1180 sec/batch\n",
"Epoch 1/10 Iteration 32/1780 Training loss: 3.5207 0.1179 sec/batch\n",
"Epoch 1/10 Iteration 33/1780 Training loss: 3.5109 0.1224 sec/batch\n",
"Epoch 1/10 Iteration 34/1780 Training loss: 3.5021 0.1206 sec/batch\n",
"Epoch 1/10 Iteration 35/1780 Training loss: 3.4931 0.1241 sec/batch\n",
"Epoch 1/10 Iteration 36/1780 Training loss: 3.4850 0.1169 sec/batch\n",
"Epoch 1/10 Iteration 37/1780 Training loss: 3.4767 0.1204 sec/batch\n",
"Epoch 1/10 Iteration 38/1780 Training loss: 3.4688 0.1202 sec/batch\n",
"Epoch 1/10 Iteration 39/1780 Training loss: 3.4611 0.1213 sec/batch\n"
]
},
{
"ename": "KeyboardInterrupt",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-15-09fa3beeed23>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 33\u001b[0m model.initial_state: new_state}\n\u001b[1;32m 34\u001b[0m batch_loss, new_state, _ = sess.run([model.cost, model.final_state, model.optimizer], \n\u001b[0;32m---> 35\u001b[0;31m feed_dict=feed)\n\u001b[0m\u001b[1;32m 36\u001b[0m \u001b[0mloss\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0mbatch_loss\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 37\u001b[0m \u001b[0mend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtime\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self, fetches, feed_dict, options, run_metadata)\u001b[0m\n\u001b[1;32m 765\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 766\u001b[0m result = self._run(None, fetches, feed_dict, options_ptr,\n\u001b[0;32m--> 767\u001b[0;31m run_metadata_ptr)\n\u001b[0m\u001b[1;32m 768\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mrun_metadata\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 769\u001b[0m \u001b[0mproto_data\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf_session\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTF_GetBuffer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrun_metadata_ptr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_run\u001b[0;34m(self, handle, fetches, feed_dict, options, run_metadata)\u001b[0m\n\u001b[1;32m 963\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mfinal_fetches\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0mfinal_targets\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 964\u001b[0m results = self._do_run(handle, final_targets, final_fetches,\n\u001b[0;32m--> 965\u001b[0;31m feed_dict_string, options, run_metadata)\n\u001b[0m\u001b[1;32m 966\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 967\u001b[0m \u001b[0mresults\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_do_run\u001b[0;34m(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)\u001b[0m\n\u001b[1;32m 1013\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mhandle\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1014\u001b[0m return self._do_call(_run_fn, self._session, feed_dict, fetch_list,\n\u001b[0;32m-> 1015\u001b[0;31m target_list, options, run_metadata)\n\u001b[0m\u001b[1;32m 1016\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1017\u001b[0m return self._do_call(_prun_fn, self._session, handle, feed_dict,\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_do_call\u001b[0;34m(self, fn, *args)\u001b[0m\n\u001b[1;32m 1020\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_do_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1021\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1022\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1023\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0merrors\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mOpError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1024\u001b[0m \u001b[0mmessage\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcompat\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mas_text\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmessage\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_run_fn\u001b[0;34m(session, feed_dict, fetch_list, target_list, options, run_metadata)\u001b[0m\n\u001b[1;32m 1002\u001b[0m return tf_session.TF_Run(session, options,\n\u001b[1;32m 1003\u001b[0m \u001b[0mfeed_dict\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfetch_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget_list\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1004\u001b[0;31m status, run_metadata)\n\u001b[0m\u001b[1;32m 1005\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1006\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_prun_fn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msession\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhandle\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfeed_dict\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfetch_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mKeyboardInterrupt\u001b[0m: "
]
}
],
"source": [
"epochs = 10\n",
"save_every_n = 200\n",
"train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)\n",
"\n",
"model = build_rnn(len(vocab), \n",
" batch_size=batch_size,\n",
" num_steps=num_steps,\n",
" learning_rate=learning_rate,\n",
" lstm_size=lstm_size,\n",
" num_layers=num_layers)\n",
"\n",
"saver = tf.train.Saver(max_to_keep=100)\n",
"\n",
"with tf.Session() as sess:\n",
" sess.run(tf.global_variables_initializer())\n",
" \n",
" # Use the line below to load a checkpoint and resume training\n",
" #saver.restore(sess, 'checkpoints/anna20.ckpt')\n",
" \n",
" n_batches = int(train_x.shape[1]/num_steps)\n",
" iterations = n_batches * epochs\n",
" for e in range(epochs):\n",
" \n",
" # Train network\n",
" new_state = sess.run(model.initial_state)\n",
" loss = 0\n",
" for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):\n",
" iteration = e*n_batches + b\n",
" start = time.time()\n",
" feed = {model.inputs: x,\n",
" model.targets: y,\n",
" model.keep_prob: 0.5,\n",
" model.initial_state: new_state}\n",
" batch_loss, new_state, _ = sess.run([model.cost, model.final_state, model.optimizer], \n",
" feed_dict=feed)\n",
" loss += batch_loss\n",
" end = time.time()\n",
" print('Epoch {}/{} '.format(e+1, epochs),\n",
" 'Iteration {}/{}'.format(iteration, iterations),\n",
" 'Training loss: {:.4f}'.format(loss/b),\n",
" '{:.4f} sec/batch'.format((end-start)))\n",
" \n",
" \n",
" if (iteration%save_every_n == 0) or (iteration == iterations):\n",
" # Check performance, notice dropout has been set to 1\n",
" val_loss = []\n",
" new_state = sess.run(model.initial_state)\n",
" for x, y in get_batch([val_x, val_y], num_steps):\n",
" feed = {model.inputs: x,\n",
" model.targets: y,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" batch_loss, new_state = sess.run([model.cost, model.final_state], feed_dict=feed)\n",
" val_loss.append(batch_loss)\n",
"\n",
" print('Validation loss:', np.mean(val_loss),\n",
" 'Saving checkpoint!')\n",
" saver.save(sess, \"checkpoints/anna/i{}_l{}_{:.3f}.ckpt\".format(iteration, lstm_size, np.mean(val_loss)))"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"model_checkpoint_path: \"checkpoints/anna/i3560_l512_1.122.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i200_l512_2.432.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i400_l512_1.980.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i600_l512_1.750.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i800_l512_1.595.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1000_l512_1.484.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1200_l512_1.407.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1400_l512_1.349.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1600_l512_1.292.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1800_l512_1.255.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2000_l512_1.224.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2200_l512_1.204.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2400_l512_1.187.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2600_l512_1.172.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2800_l512_1.160.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3000_l512_1.148.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3200_l512_1.137.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3400_l512_1.129.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3560_l512_1.122.ckpt\""
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tf.train.get_checkpoint_state('checkpoints/anna')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sampling\n",
"\n",
"Now that the network is trained, we'll can use it to generate new text. The idea is that we pass in a character, then the network will predict the next character. We can use the new one, to predict the next one. And we keep doing this to generate all new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.\n",
"\n",
"The network gives us predictions for each character. To reduce noise and make things a little less random, I'm going to only choose a new character from the top N most likely characters.\n",
"\n"
]
},
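{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a small standalone example of that top-N idea on a made-up probability vector, mirroring what `pick_top_n` below does: zero out everything except the N largest probabilities, renormalize, and sample from what is left."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Standalone demo of top-N filtering on a toy distribution\n",
"probs = np.array([0.02, 0.40, 0.05, 0.30, 0.15, 0.08])\n",
"top_n = 3\n",
"\n",
"p = probs.copy()\n",
"p[np.argsort(p)[:-top_n]] = 0        # keep only the three largest entries\n",
"p = p / np.sum(p)                    # renormalize so it is a distribution again\n",
"print(p)                             # roughly [0, 0.47, 0, 0.35, 0.18, 0]\n",
"print(np.random.choice(len(p), p=p)) # sample an index from the reduced distribution"
]
},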
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def pick_top_n(preds, vocab_size, top_n=5):\n",
" p = np.squeeze(preds)\n",
" p[np.argsort(p)[:-top_n]] = 0\n",
" p = p / np.sum(p)\n",
" c = np.random.choice(vocab_size, 1, p=p)[0]\n",
" return c"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def sample(checkpoint, n_samples, lstm_size, vocab_size, prime=\"The \"):\n",
" prime = \"Far\"\n",
" samples = [c for c in prime]\n",
" model = build_rnn(vocab_size, lstm_size=lstm_size, sampling=True)\n",
" saver = tf.train.Saver()\n",
" with tf.Session() as sess:\n",
" saver.restore(sess, checkpoint)\n",
" new_state = sess.run(model.initial_state)\n",
" for c in prime:\n",
" x = np.zeros((1, 1))\n",
" x[0,0] = vocab_to_int[c]\n",
" feed = {model.inputs: x,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" preds, new_state = sess.run([model.preds, model.final_state], \n",
" feed_dict=feed)\n",
"\n",
" c = pick_top_n(preds, len(vocab))\n",
" samples.append(int_to_vocab[c])\n",
"\n",
" for i in range(n_samples):\n",
" x[0,0] = c\n",
" feed = {model.inputs: x,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" preds, new_state = sess.run([model.preds, model.final_state], \n",
" feed_dict=feed)\n",
"\n",
" c = pick_top_n(preds, len(vocab))\n",
" samples.append(int_to_vocab[c])\n",
" \n",
" return ''.join(samples)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farlathit that if had so\n",
"like it that it were. He could not trouble to his wife, and there was\n",
"anything in them of the side of his weaky in the creature at his forteren\n",
"to him.\n",
"\n",
"\"What is it? I can't bread to those,\" said Stepan Arkadyevitch. \"It's not\n",
"my children, and there is an almost this arm, true it mays already,\n",
"and tell you what I have say to you, and was not looking at the peasant,\n",
"why is, I don't know him out, and she doesn't speak to me immediately, as\n",
"you would say the countess and the more frest an angelembre, and time and\n",
"things's silent, but I was not in my stand that is in my head. But if he\n",
"say, and was so feeling with his soul. A child--in his soul of his\n",
"soul of his soul. He should not see that any of that sense of. Here he\n",
"had not been so composed and to speak for as in a whole picture, but\n",
"all the setting and her excellent and society, who had been delighted\n",
"and see to anywing had been being troed to thousand words on them,\n",
"we liked him.\n",
"\n",
"That set in her money at the table, he came into the party. The capable\n",
"of his she could not be as an old composure.\n",
"\n",
"\"That's all something there will be down becime by throe is\n",
"such a silent, as in a countess, I should state it out and divorct.\n",
"The discussion is not for me. I was that something was simply they are\n",
"all three manshess of a sensitions of mind it all.\"\n",
"\n",
"\"No,\" he thought, shouted and lifting his soul. \"While it might see your\n",
"honser and she, I could burst. And I had been a midelity. And I had a\n",
"marnief are through the countess,\" he said, looking at him, a chosing\n",
"which they had been carried out and still solied, and there was a sen that\n",
"was to be completely, and that this matter of all the seconds of it, and\n",
"a concipation were to her husband, who came up and conscaously, that he\n",
"was not the station. All his fourse she was always at the country,,\n",
"to speak oft, and though they were to hear the delightful throom and\n",
"whether they came towards the morning, and his living and a coller and\n",
"hold--the children. \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i3560_l512_1.122.ckpt\"\n",
"samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farnt him oste wha sorind thans tout thint asd an sesand an hires on thime sind thit aled, ban thand and out hore as the ter hos ton ho te that, was tis tart al the hand sostint him sore an tit an son thes, win he se ther san ther hher tas tarereng,.\n",
"\n",
"Anl at an ades in ond hesiln, ad hhe torers teans, wast tar arering tho this sos alten sorer has hhas an siton ther him he had sin he ard ate te anling the sosin her ans and\n",
"arins asd and ther ale te tot an tand tanginge wath and ho ald, so sot th asend sat hare sother horesinnd, he hesense wing ante her so tith tir sherinn, anded and to the toul anderin he sorit he torsith she se atere an ting ot hand and thit hhe so the te wile har\n",
"ens ont in the sersise, and we he seres tar aterer, to ato tat or has he he wan ton here won and sen heren he sosering, to to theer oo adent har herere the wosh oute, was serild ward tous hed astend..\n",
"\n",
"I's sint on alt in har tor tit her asd hade shithans ored he talereng an soredendere tim tot hees. Tise sor and \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i200_l512_2.432.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fard as astice her said he celatice of to seress in the raice, and to be the some and sere allats to that said to that the sark and a cast a the wither ald the pacinesse of her had astition, he said to the sount as she west at hissele. Af the cond it he was a fact onthis astisarianing.\n",
"\n",
"\n",
"\"Or a ton to to be that's a more at aspestale as the sont of anstiring as\n",
"thours and trey.\n",
"\n",
"The same wo dangring the\n",
"raterst, who sore and somethy had ast out an of his book. \"We had's beane were that, and a morted a thay he had to tere. Then to\n",
"her homent andertersed his his ancouted to the pirsted, the soution for of the pirsice inthirgest and stenciol, with the hard and and\n",
"a colrice of to be oneres,\n",
"the song to this anderssad.\n",
"The could ounterss the said to serom of\n",
"soment a carsed of sheres of she\n",
"torded\n",
"har and want in their of hould, but\n",
"her told in that in he tad a the same to her. Serghing an her has and with the seed, and the camt ont his about of the\n",
"sail, the her then all houg ant or to hus to \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i600_l512_1.750.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farrat, his felt has at it.\n",
"\n",
"\"When the pose ther hor exceed\n",
"to his sheant was,\" weat a sime of his sounsed. The coment and the facily that which had began terede a marilicaly whice whether the pose of his hand, at she was alligated herself the same on she had to\n",
"taiking to his forthing and streath how to hand\n",
"began in a lang at some at it, this he cholded not set all her. \"Wo love that is setthing. Him anstering as seen that.\"\n",
"\n",
"\"Yes in the man that say the mare a crances is it?\" said Sergazy Ivancatching. \"You doon think were somether is ifficult of a mone of\n",
"though the most at the countes that the\n",
"mean on the come to say the most, to\n",
"his feesing of\n",
"a man she, whilo he\n",
"sained and well, that he would still at to said. He wind at his for the sore in the most\n",
"of hoss and almoved to see him. They have betine the sumper into at he his stire, and what he was that at the so steate of the\n",
"sound, and shin should have a geest of shall feet on the conderation to she had been at that imporsing the dre\n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i1000_l512_1.484.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
},
"toc": {
"colors": {
"hover_highlight": "#DAA520",
"running_highlight": "#FF0000",
"selected_highlight": "#FFD700"
},
"moveMenuLeft": true,
"nav_menu": {
"height": "111px",
"width": "251px"
},
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 4,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because it is too large

@ -0,0 +1,794 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Anna KaRNNa\n",
"\n",
"In this notebook, I'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.\n",
"\n",
"This network is based off of Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). Also, some information [here at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.\n",
"\n",
"<img src=\"assets/charseq.jpeg\" width=\"500\">"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import time\n",
"from collections import namedtuple\n",
"\n",
"import numpy as np\n",
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we'll load the text file and convert it into integers for our network to use."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"with open('anna.txt', 'r') as f:\n",
" text=f.read()\n",
"vocab = set(text)\n",
"vocab_to_int = {c: i for i, c in enumerate(vocab)}\n",
"int_to_vocab = dict(enumerate(vocab))\n",
"chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)"
]
},
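{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, we can encode a short string with `vocab_to_int` and decode it back with `int_to_vocab` to confirm the mapping is lossless. This cell is purely illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Round-trip a short string through the encoding\n",
"encoded = [vocab_to_int[c] for c in 'Happy families']\n",
"decoded = ''.join(int_to_vocab[i] for i in encoded)\n",
"print(encoded)\n",
"print(decoded)   # should print 'Happy families' again"
]
},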
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'Chapter 1\\n\\n\\nHappy families are all alike; every unhappy family is unhappy in its own\\nway.\\n\\nEverythin'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"text[:100]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([82, 78, 3, 48, 15, 79, 77, 50, 30, 20, 20, 20, 38, 3, 48, 48, 8,\n",
" 50, 10, 3, 9, 33, 4, 33, 79, 43, 50, 3, 77, 79, 50, 3, 4, 4,\n",
" 50, 3, 4, 33, 17, 79, 64, 50, 79, 44, 79, 77, 8, 50, 49, 70, 78,\n",
" 3, 48, 48, 8, 50, 10, 3, 9, 33, 4, 8, 50, 33, 43, 50, 49, 70,\n",
" 78, 3, 48, 48, 8, 50, 33, 70, 50, 33, 15, 43, 50, 55, 62, 70, 20,\n",
" 62, 3, 8, 22, 20, 20, 80, 44, 79, 77, 8, 15, 78, 33, 70], dtype=int32)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chars[:100]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I need to split up the data into batches, and into training and validation sets. I should be making a test set here, but I'm not going to worry about that. My test will be if the network can generate new text.\n",
"\n",
"Here I'll make both input and target arrays. The targets are the same as the inputs, except shifted one character over. I'll also drop the last bit of data so that I'll only have completely full batches.\n",
"\n",
"The idea here is to make a 2D matrix where the number of rows is equal to the number of batches. Each row will be one long concatenated string from the character data. We'll split this data into a training set and validation set using the `split_frac` keyword. This will keep 90% of the batches in the training set, the other 10% in the validation set."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def split_data(chars, batch_size, num_steps, split_frac=0.9):\n",
" \"\"\" \n",
" Split character data into training and validation sets, inputs and targets for each set.\n",
" \n",
" Arguments\n",
" ---------\n",
" chars: character array\n",
" batch_size: Size of examples in each of batch\n",
" num_steps: Number of sequence steps to keep in the input and pass to the network\n",
" split_frac: Fraction of batches to keep in the training set\n",
" \n",
" \n",
" Returns train_x, train_y, val_x, val_y\n",
" \"\"\"\n",
" \n",
" \n",
" slice_size = batch_size * num_steps\n",
" n_batches = int(len(chars) / slice_size)\n",
" \n",
" # Drop the last few characters to make only full batches\n",
" x = chars[: n_batches*slice_size]\n",
" y = chars[1: n_batches*slice_size + 1]\n",
" \n",
" # Split the data into batch_size slices, then stack them into a 2D matrix \n",
" x = np.stack(np.split(x, batch_size))\n",
" y = np.stack(np.split(y, batch_size))\n",
" \n",
" # Now x and y are arrays with dimensions batch_size x n_batches*num_steps\n",
" \n",
" # Split into training and validation sets, keep the virst split_frac batches for training\n",
" split_idx = int(n_batches*split_frac)\n",
" train_x, train_y= x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]\n",
" val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]\n",
" \n",
" return train_x, train_y, val_x, val_y"
]
},
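{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is the same function on a toy array, just to make the reshaping and the column-wise split easy to see on something small. The numbers are made up (21 fake characters, `batch_size=2`, `num_steps=5`, and a 50/50 split so both halves are visible); nothing here touches the real data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Toy run of split_data so the reshaping is easy to see\n",
"toy = np.arange(21)                                   # pretend these are 21 encoded characters\n",
"tx, ty, vx, vy = split_data(toy, 2, 5, split_frac=0.5)\n",
"print(tx)   # two rows, first half of the columns\n",
"print(ty)   # same rows, shifted one character ahead\n",
"print(vx)   # the remaining columns form the validation set"
]
},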
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"train_x, train_y, val_x, val_y = split_data(chars, 10, 200)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(10, 178400)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_x.shape"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[82, 78, 3, 48, 15, 79, 77, 50, 30, 20],\n",
" [67, 70, 58, 50, 78, 79, 50, 9, 55, 44],\n",
" [50, 65, 3, 15, 65, 78, 33, 70, 32, 50],\n",
" [55, 15, 78, 79, 77, 50, 62, 55, 49, 4],\n",
" [50, 15, 78, 79, 50, 4, 3, 70, 58, 18],\n",
" [50, 51, 78, 77, 55, 49, 32, 78, 50, 4],\n",
" [15, 50, 15, 55, 20, 58, 55, 22, 20, 20],\n",
" [55, 50, 78, 79, 77, 43, 79, 4, 10, 56],\n",
" [78, 3, 15, 50, 33, 43, 50, 15, 78, 79],\n",
" [79, 77, 43, 79, 4, 10, 50, 3, 70, 58]], dtype=int32)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_x[:,:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'll write another function to grab batches out of the arrays made by split data. Here each batch will be a sliding window on these arrays with size `batch_size X num_steps`. For example, if we want our network to train on a sequence of 100 characters, `num_steps = 100`. For the next batch, we'll shift this window the next sequence of `num_steps` characters. In this way we can feed batches to the network and the cell states will continue through on each batch."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_batch(arrs, num_steps):\n",
" batch_size, slice_size = arrs[0].shape\n",
" \n",
" n_batches = int(slice_size/num_steps)\n",
" for b in range(n_batches):\n",
" yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]"
]
},
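{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick shape check on the generator, assuming the `train_x`/`train_y` arrays from the `split_data` call above: the first window should be `batch_size x num_steps`, with the targets shifted one character relative to the inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Pull the first window out of the generator and check it\n",
"x, y = next(get_batch([train_x, train_y], 200))\n",
"print(x.shape, y.shape)   # (10, 200) each for the split above\n",
"print(x[0, :10])          # first ten encoded characters of the first row\n",
"print(y[0, :10])          # the same characters, shifted one step ahead"
]
},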
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"def build_rnn(num_classes, batch_size=50, num_steps=50, lstm_size=128, num_layers=2,\n",
" learning_rate=0.001, grad_clip=5, sampling=False):\n",
" \n",
" if sampling == True:\n",
" batch_size, num_steps = 1, 1\n",
"\n",
" tf.reset_default_graph()\n",
" \n",
" # Declare placeholders we'll feed into the graph\n",
" \n",
" inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')\n",
" x_one_hot = tf.one_hot(inputs, num_classes, name='x_one_hot')\n",
"\n",
"\n",
" targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')\n",
" y_one_hot = tf.one_hot(targets, num_classes, name='y_one_hot')\n",
" y_reshaped = tf.reshape(y_one_hot, [-1, num_classes])\n",
" \n",
" keep_prob = tf.placeholder(tf.float32, name='keep_prob')\n",
" \n",
" # Build the RNN layers\n",
" \n",
" lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)\n",
" drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)\n",
" cell = tf.contrib.rnn.MultiRNNCell([drop] * num_layers)\n",
"\n",
" initial_state = cell.zero_state(batch_size, tf.float32)\n",
"\n",
" # Run the data through the RNN layers\n",
" rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(x_one_hot, num_steps, 1)]\n",
" outputs, state = tf.contrib.rnn.static_rnn(cell, rnn_inputs, initial_state=initial_state)\n",
" \n",
" final_state = tf.identity(state, name='final_state')\n",
" \n",
" # Reshape output so it's a bunch of rows, one row for each cell output\n",
" \n",
" seq_output = tf.concat(outputs, axis=1,name='seq_output')\n",
" output = tf.reshape(seq_output, [-1, lstm_size], name='graph_output')\n",
" \n",
" # Now connect the RNN putputs to a softmax layer and calculate the cost\n",
" softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1),\n",
" name='softmax_w')\n",
" softmax_b = tf.Variable(tf.zeros(num_classes), name='softmax_b')\n",
" logits = tf.matmul(output, softmax_w) + softmax_b\n",
"\n",
" preds = tf.nn.softmax(logits, name='predictions')\n",
" \n",
" loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped, name='loss')\n",
" cost = tf.reduce_mean(loss, name='cost')\n",
"\n",
" # Optimizer for training, using gradient clipping to control exploding gradients\n",
" tvars = tf.trainable_variables()\n",
" grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)\n",
" train_op = tf.train.AdamOptimizer(learning_rate)\n",
" optimizer = train_op.apply_gradients(zip(grads, tvars))\n",
"\n",
" # Export the nodes \n",
" export_nodes = ['inputs', 'targets', 'initial_state', 'final_state',\n",
" 'keep_prob', 'cost', 'preds', 'optimizer']\n",
" Graph = namedtuple('Graph', export_nodes)\n",
" local_dict = locals()\n",
" graph = Graph(*[local_dict[each] for each in export_nodes])\n",
" \n",
" return graph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hyperparameters\n",
"\n",
"Here I'm defining the hyperparameters for the network. The two you probably haven't seen before are `lstm_size` and `num_layers`. These set the number of hidden units in the LSTM layers and the number of LSTM layers, respectively. Of course, making these bigger will improve the network's performance but you'll have to watch out for overfitting. If your validation loss is much larger than the training loss, you're probably overfitting. Decrease the size of the network or decrease the dropout keep probability."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"batch_size = 100\n",
"num_steps = 100\n",
"lstm_size = 512\n",
"num_layers = 2\n",
"learning_rate = 0.001"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Write out the graph for TensorBoard"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = build_rnn(len(vocab),\n",
" batch_size=batch_size,\n",
" num_steps=num_steps,\n",
" learning_rate=learning_rate,\n",
" lstm_size=lstm_size,\n",
" num_layers=num_layers)\n",
"\n",
"with tf.Session() as sess:\n",
" \n",
" sess.run(tf.global_variables_initializer())\n",
" file_writer = tf.summary.FileWriter('./logs/1', sess.graph)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training\n",
"\n",
"Time for training which is is pretty straightforward. Here I pass in some data, and get an LSTM state back. Then I pass that state back in to the network so the next batch can continue the state from the previous batch. And every so often (set by `save_every_n`) I calculate the validation loss and save a checkpoint."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"!mkdir -p checkpoints/anna"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true,
"scrolled": true
},
"outputs": [
{
"ename": "ValueError",
"evalue": "Expected state to be a tuple of length 2, but received: Tensor(\"initial_state:0\", shape=(2, 2, 100, 512), dtype=float32)",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-20-4190d11347ea>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0mlearning_rate\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlearning_rate\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mlstm_size\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlstm_size\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 10\u001b[0;31m num_layers=num_layers)\n\u001b[0m\u001b[1;32m 11\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0msaver\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mSaver\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmax_to_keep\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m100\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m<ipython-input-19-a7e675cc0f3d>\u001b[0m in \u001b[0;36mbuild_rnn\u001b[0;34m(num_classes, batch_size, num_steps, lstm_size, num_layers, learning_rate, grad_clip, sampling)\u001b[0m\n\u001b[1;32m 25\u001b[0m \u001b[0;31m# Run the data through the RNN layers\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 26\u001b[0m \u001b[0mrnn_inputs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msqueeze\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msqueeze_dims\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msplit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_one_hot\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnum_steps\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 27\u001b[0;31m \u001b[0moutputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstate\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcontrib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrnn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstatic_rnn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcell\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrnn_inputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minitial_state\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minitial_state\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 28\u001b[0m \u001b[0mfinal_state\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0midentity\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstate\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'final_state'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 29\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn.py\u001b[0m in \u001b[0;36mstatic_rnn\u001b[0;34m(cell, inputs, initial_state, dtype, sequence_length, scope)\u001b[0m\n\u001b[1;32m 195\u001b[0m state_size=cell.state_size)\n\u001b[1;32m 196\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 197\u001b[0;31m \u001b[0;34m(\u001b[0m\u001b[0moutput\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstate\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcall_cell\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 198\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 199\u001b[0m \u001b[0moutputs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutput\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 182\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtime\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mvarscope\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreuse_variables\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 183\u001b[0m \u001b[0;31m# pylint: disable=cell-var-from-loop\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 184\u001b[0;31m \u001b[0mcall_cell\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mcell\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minput_\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstate\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 185\u001b[0m \u001b[0;31m# pylint: enable=cell-var-from-loop\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 186\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0msequence_length\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/home/mat/miniconda3/envs/tf-gpu/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, inputs, state, scope)\u001b[0m\n\u001b[1;32m 647\u001b[0m raise ValueError(\n\u001b[1;32m 648\u001b[0m \u001b[0;34m\"Expected state to be a tuple of length %d, but received: %s\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 649\u001b[0;31m % (len(self.state_size), state))\n\u001b[0m\u001b[1;32m 650\u001b[0m \u001b[0mcur_state\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstate\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 651\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: Expected state to be a tuple of length 2, but received: Tensor(\"initial_state:0\", shape=(2, 2, 100, 512), dtype=float32)"
]
}
],
"source": [
"epochs = 1\n",
"save_every_n = 200\n",
"train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)\n",
"\n",
"model = build_rnn(len(vocab), \n",
" batch_size=batch_size,\n",
" num_steps=num_steps,\n",
" learning_rate=learning_rate,\n",
" lstm_size=lstm_size,\n",
" num_layers=num_layers)\n",
"\n",
"saver = tf.train.Saver(max_to_keep=100)\n",
"\n",
"with tf.Session() as sess:\n",
" sess.run(tf.global_variables_initializer())\n",
" \n",
" # Use the line below to load a checkpoint and resume training\n",
" #saver.restore(sess, 'checkpoints/anna20.ckpt')\n",
" \n",
" n_batches = int(train_x.shape[1]/num_steps)\n",
" iterations = n_batches * epochs\n",
" for e in range(epochs):\n",
" \n",
" # Train network\n",
" new_state = sess.run(model.initial_state)\n",
" loss = 0\n",
" for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):\n",
" iteration = e*n_batches + b\n",
" start = time.time()\n",
" feed = {model.inputs: x,\n",
" model.targets: y,\n",
" model.keep_prob: 0.5,\n",
" model.initial_state: new_state}\n",
" batch_loss, new_state, _ = sess.run([model.cost, model.final_state, model.optimizer], \n",
" feed_dict=feed)\n",
" loss += batch_loss\n",
" end = time.time()\n",
" print('Epoch {}/{} '.format(e+1, epochs),\n",
" 'Iteration {}/{}'.format(iteration, iterations),\n",
" 'Training loss: {:.4f}'.format(loss/b),\n",
" '{:.4f} sec/batch'.format((end-start)))\n",
" \n",
" \n",
" if (iteration%save_every_n == 0) or (iteration == iterations):\n",
" # Check performance, notice dropout has been set to 1\n",
" val_loss = []\n",
" new_state = sess.run(model.initial_state)\n",
" for x, y in get_batch([val_x, val_y], num_steps):\n",
" feed = {model.inputs: x,\n",
" model.targets: y,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" batch_loss, new_state = sess.run([model.cost, model.final_state], feed_dict=feed)\n",
" val_loss.append(batch_loss)\n",
"\n",
" print('Validation loss:', np.mean(val_loss),\n",
" 'Saving checkpoint!')\n",
" saver.save(sess, \"checkpoints/anna/i{}_l{}_{:.3f}.ckpt\".format(iteration, lstm_size, np.mean(val_loss)))"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"model_checkpoint_path: \"checkpoints/anna/i3560_l512_1.122.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i200_l512_2.432.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i400_l512_1.980.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i600_l512_1.750.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i800_l512_1.595.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1000_l512_1.484.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1200_l512_1.407.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1400_l512_1.349.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1600_l512_1.292.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i1800_l512_1.255.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2000_l512_1.224.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2200_l512_1.204.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2400_l512_1.187.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2600_l512_1.172.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i2800_l512_1.160.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3000_l512_1.148.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3200_l512_1.137.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3400_l512_1.129.ckpt\"\n",
"all_model_checkpoint_paths: \"checkpoints/anna/i3560_l512_1.122.ckpt\""
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tf.train.get_checkpoint_state('checkpoints/anna')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sampling\n",
"\n",
"Now that the network is trained, we'll can use it to generate new text. The idea is that we pass in a character, then the network will predict the next character. We can use the new one, to predict the next one. And we keep doing this to generate all new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.\n",
"\n",
"The network gives us predictions for each character. To reduce noise and make things a little less random, I'm going to only choose a new character from the top N most likely characters.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def pick_top_n(preds, vocab_size, top_n=5):\n",
" p = np.squeeze(preds)\n",
" p[np.argsort(p)[:-top_n]] = 0\n",
" p = p / np.sum(p)\n",
" c = np.random.choice(vocab_size, 1, p=p)[0]\n",
" return c"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def sample(checkpoint, n_samples, lstm_size, vocab_size, prime=\"The \"):\n",
" prime = \"Far\"\n",
" samples = [c for c in prime]\n",
" model = build_rnn(vocab_size, lstm_size=lstm_size, sampling=True)\n",
" saver = tf.train.Saver()\n",
" with tf.Session() as sess:\n",
" saver.restore(sess, checkpoint)\n",
" new_state = sess.run(model.initial_state)\n",
" for c in prime:\n",
" x = np.zeros((1, 1))\n",
" x[0,0] = vocab_to_int[c]\n",
" feed = {model.inputs: x,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" preds, new_state = sess.run([model.preds, model.final_state], \n",
" feed_dict=feed)\n",
"\n",
" c = pick_top_n(preds, len(vocab))\n",
" samples.append(int_to_vocab[c])\n",
"\n",
" for i in range(n_samples):\n",
" x[0,0] = c\n",
" feed = {model.inputs: x,\n",
" model.keep_prob: 1.,\n",
" model.initial_state: new_state}\n",
" preds, new_state = sess.run([model.preds, model.final_state], \n",
" feed_dict=feed)\n",
"\n",
" c = pick_top_n(preds, len(vocab))\n",
" samples.append(int_to_vocab[c])\n",
" \n",
" return ''.join(samples)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farlathit that if had so\n",
"like it that it were. He could not trouble to his wife, and there was\n",
"anything in them of the side of his weaky in the creature at his forteren\n",
"to him.\n",
"\n",
"\"What is it? I can't bread to those,\" said Stepan Arkadyevitch. \"It's not\n",
"my children, and there is an almost this arm, true it mays already,\n",
"and tell you what I have say to you, and was not looking at the peasant,\n",
"why is, I don't know him out, and she doesn't speak to me immediately, as\n",
"you would say the countess and the more frest an angelembre, and time and\n",
"things's silent, but I was not in my stand that is in my head. But if he\n",
"say, and was so feeling with his soul. A child--in his soul of his\n",
"soul of his soul. He should not see that any of that sense of. Here he\n",
"had not been so composed and to speak for as in a whole picture, but\n",
"all the setting and her excellent and society, who had been delighted\n",
"and see to anywing had been being troed to thousand words on them,\n",
"we liked him.\n",
"\n",
"That set in her money at the table, he came into the party. The capable\n",
"of his she could not be as an old composure.\n",
"\n",
"\"That's all something there will be down becime by throe is\n",
"such a silent, as in a countess, I should state it out and divorct.\n",
"The discussion is not for me. I was that something was simply they are\n",
"all three manshess of a sensitions of mind it all.\"\n",
"\n",
"\"No,\" he thought, shouted and lifting his soul. \"While it might see your\n",
"honser and she, I could burst. And I had been a midelity. And I had a\n",
"marnief are through the countess,\" he said, looking at him, a chosing\n",
"which they had been carried out and still solied, and there was a sen that\n",
"was to be completely, and that this matter of all the seconds of it, and\n",
"a concipation were to her husband, who came up and conscaously, that he\n",
"was not the station. All his fourse she was always at the country,,\n",
"to speak oft, and though they were to hear the delightful throom and\n",
"whether they came towards the morning, and his living and a coller and\n",
"hold--the children. \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i3560_l512_1.122.ckpt\"\n",
"samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farnt him oste wha sorind thans tout thint asd an sesand an hires on thime sind thit aled, ban thand and out hore as the ter hos ton ho te that, was tis tart al the hand sostint him sore an tit an son thes, win he se ther san ther hher tas tarereng,.\n",
"\n",
"Anl at an ades in ond hesiln, ad hhe torers teans, wast tar arering tho this sos alten sorer has hhas an siton ther him he had sin he ard ate te anling the sosin her ans and\n",
"arins asd and ther ale te tot an tand tanginge wath and ho ald, so sot th asend sat hare sother horesinnd, he hesense wing ante her so tith tir sherinn, anded and to the toul anderin he sorit he torsith she se atere an ting ot hand and thit hhe so the te wile har\n",
"ens ont in the sersise, and we he seres tar aterer, to ato tat or has he he wan ton here won and sen heren he sosering, to to theer oo adent har herere the wosh oute, was serild ward tous hed astend..\n",
"\n",
"I's sint on alt in har tor tit her asd hade shithans ored he talereng an soredendere tim tot hees. Tise sor and \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i200_l512_2.432.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fard as astice her said he celatice of to seress in the raice, and to be the some and sere allats to that said to that the sark and a cast a the wither ald the pacinesse of her had astition, he said to the sount as she west at hissele. Af the cond it he was a fact onthis astisarianing.\n",
"\n",
"\n",
"\"Or a ton to to be that's a more at aspestale as the sont of anstiring as\n",
"thours and trey.\n",
"\n",
"The same wo dangring the\n",
"raterst, who sore and somethy had ast out an of his book. \"We had's beane were that, and a morted a thay he had to tere. Then to\n",
"her homent andertersed his his ancouted to the pirsted, the soution for of the pirsice inthirgest and stenciol, with the hard and and\n",
"a colrice of to be oneres,\n",
"the song to this anderssad.\n",
"The could ounterss the said to serom of\n",
"soment a carsed of sheres of she\n",
"torded\n",
"har and want in their of hould, but\n",
"her told in that in he tad a the same to her. Serghing an her has and with the seed, and the camt ont his about of the\n",
"sail, the her then all houg ant or to hus to \n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i600_l512_1.750.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Farrat, his felt has at it.\n",
"\n",
"\"When the pose ther hor exceed\n",
"to his sheant was,\" weat a sime of his sounsed. The coment and the facily that which had began terede a marilicaly whice whether the pose of his hand, at she was alligated herself the same on she had to\n",
"taiking to his forthing and streath how to hand\n",
"began in a lang at some at it, this he cholded not set all her. \"Wo love that is setthing. Him anstering as seen that.\"\n",
"\n",
"\"Yes in the man that say the mare a crances is it?\" said Sergazy Ivancatching. \"You doon think were somether is ifficult of a mone of\n",
"though the most at the countes that the\n",
"mean on the come to say the most, to\n",
"his feesing of\n",
"a man she, whilo he\n",
"sained and well, that he would still at to said. He wind at his for the sore in the most\n",
"of hoss and almoved to see him. They have betine the sumper into at he his stire, and what he was that at the so steate of the\n",
"sound, and shin should have a geest of shall feet on the conderation to she had been at that imporsing the dre\n"
]
}
],
"source": [
"checkpoint = \"checkpoints/anna/i1000_l512_1.484.ckpt\"\n",
"samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime=\"Far\")\n",
"print(samp)"
]
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
},
"toc": {
"colors": {
"hover_highlight": "#DAA520",
"running_highlight": "#FF0000",
"selected_highlight": "#FFD700"
},
"moveMenuLeft": true,
"nav_menu": {
"height": "123px",
"width": "335px"
},
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 4,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:568bb39fc78aca98e9db91c4329dcd1aa5ec4c3df3aca2064ce5d6f023ae16c9
size 2025486

Some files were not shown because too many files have changed in this diff
