- Installation and Setup
- Machine Learning Primer
- What's TensorFlow?
- Basic TensorFlow concepts
- MNIST Example
- Softmax
- Convolutional Neural Networks
Ensure that you have the following installed:
Materials are available here
\[H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)\]
\[H(\text{fair dice roll}) = -\sum_{i=1}^6 \frac{1}{6} \log_2 \frac{1}{6} \approx 2.58\]
\[H(20/80 \text{ coin toss}) = -\frac{1}{5}\log_2 \frac{1}{5}-\frac{4}{5}\log_2 \frac{4}{5} \approx 0.72\]
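The two numeric values above only come out as stated when the logarithm is taken in base 2, so the entropy is measured in bits. A minimal sketch that checks both; the helper name `entropy` is an illustrative choice, not something from the slides:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy of a discrete distribution, in units set by `base`."""
    # Terms with p == 0 contribute nothing, so skip them to avoid log(0).
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_die = [1 / 6] * 6
biased_coin = [0.2, 0.8]

print(round(entropy(fair_die), 2))     # 2.58
print(round(entropy(biased_coin), 2))  # 0.72
```

The fair die is more uncertain than the 20/80 coin, which is why its entropy is higher.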
Recall from linear algebra that:
All are tensors of order n. Similarly, tensors can be transformed with operations. TensorFlow provides a library of algorithms to perform tensor operations efficiently.
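As a small illustration, assuming NumPy arrays stand in for tensors, the order of a tensor corresponds to its number of axes: a scalar has order 0, a vector order 1, a matrix order 2. The variable names below are illustrative only.

```python
import numpy as np

scalar = np.array(5.0)              # order 0
vector = np.array([1.0, 2.0, 3.0])  # order 1
matrix = np.eye(2)                  # order 2
cube = np.zeros((2, 3, 4))          # order 3

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(name, "order:", t.ndim, "shape:", t.shape)
```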
Simple linear regression model:
\[w_0 + w_1 x = \hat{y}\]
Linear regression can be represented as a graph
\[a_i^{(2)} = g(w_{i0} + w_{i1}x_1 + w_{i2}x_2 + w_{i3}x_3)\]
For more complex models, it can be helpful to visualize your graph. TensorBoard provides this visualization tool.
\[g(u) = \max(0, u)\]
\[S(x) = \frac{1}{1+e^{-x}}\]
\[S_j(\boldsymbol{z}) = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \text{ for }j=1,\dots,K\]
\[\sum_{j=1}^K S_j(\boldsymbol{z}) = 1\]
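The three functions above (ReLU, sigmoid, softmax) can be sketched in NumPy as follows. This is an illustration rather than TensorFlow's implementation; subtracting the maximum inside `softmax` is a common numerical-stability step that is not part of the formula itself.

```python
import numpy as np

def relu(u):
    return np.maximum(0.0, u)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    # Subtracting the max does not change the result but avoids overflow in exp.
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

z = np.array([1.0, 2.0, 3.0])
print(relu(np.array([-1.0, 0.5])))
print(sigmoid(0.0))  # 0.5
print(softmax(z), softmax(z).sum())
```

The softmax outputs are positive and sum to 1, so they can be read as class probabilities.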
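TensorFlow separates the definition of computations from their execution.
Phases:
- Define the computation graph
- Use a session to execute operations in the graph

    import tensorflow as tf
    a = tf.add(3, 5)

What is a?

    print(a)  # prints a Tensor object, e.g. Tensor("Add:0", shape=(), dtype=int32), not 8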
Create a session, and within it, evaluate the graph:

    sess = tf.Session()
    print(sess.run(a))
    sess.close()
Alternatively:
    with tf.Session() as sess:
        print(sess.run(a))
tf.summary.FileWriter serializes the graph into a format that TensorBoard can read:

    tf.summary.FileWriter("logs", tf.get_default_graph()).close()

Then start TensorBoard from the command line:

    tensorboard --logdir=logs

Or in Jupyter:

    !tensorboard --logdir=logs
Try to generate the following graph: \((x+y)^{xy}\) where \(x=2,y=3\)
Useful functions: tf.add, tf.multiply, tf.pow

    x = 2
    y = 3
    op1 = tf.add(x, y)
    op2 = tf.multiply(x, y)
    op3 = tf.pow(op1, op2)
    with tf.Session() as sess:
        # Store the value under a new name so the op3 node is not overwritten
        result = sess.run(op3)
        print(result)  # 15625
    W1 = tf.ones((2, 2))
    W2 = tf.Variable(tf.zeros((2, 2)), name="weights")
    with tf.Session() as sess:
        print(sess.run(W1))
        # Variables must be initialized before they can be read
        sess.run(tf.global_variables_initializer())
        print(sess.run(W2))
To create a 3-dimensional variable with shape [1,2,3]:

    my_var = tf.get_variable("my_var", [1, 2, 3])
You may optionally specify the dtype and initializer to tf.get_variable:

    my_int_variable = tf.get_variable("my_int_variable", [1, 2, 3],
                                      dtype=tf.int32,
                                      initializer=tf.zeros_initializer)
You can initialize a tf.Variable to have the value of a tf.Tensor:

    other_variable = tf.get_variable("other_variable", dtype=tf.int32,
                                     initializer=tf.constant([23, 42]))
Use tf.assign to assign a value to a variable:

    state = tf.Variable(0, name="counter")
    new_value = tf.add(state, tf.constant(1))
    update = tf.assign(state, new_value)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(state))
        for _ in range(3):
            sess.run(update)
            print(sess.run(state))
    input1 = tf.constant(3.0)
    input2 = tf.constant(2.0)
    input3 = tf.constant(5.0)
    intermed = tf.add(input2, input3)
    mul = tf.multiply(input1, intermed)
    with tf.Session() as sess:
        # Fetch several tensors in a single run call
        result = sess.run([mul, intermed])
        print(result)
- tf.placeholder variables represent our input data
- feed_dict is a Python dictionary that maps tf.placeholder variables to data

    input1 = tf.placeholder(tf.float32)
    input2 = tf.placeholder(tf.float32)
    output = tf.multiply(input1, input2)
    with tf.Session() as sess:
        print(sess.run([output], feed_dict={input1: [7.], input2: [2.]}))
    import tensorflow as tf
    import numpy as np
    import seaborn
    import matplotlib.pyplot as plt
    # %matplotlib inline
\[L = \sum (y - \hat{y})^2\]
Given the following function, fit a linear model
\[y = x + 20 \sin(x/10)\]
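The data for this exercise can be generated with NumPy. The sketch below assumes that x covers [0, 100) in 1000 evenly spaced samples; the slides do not state the range, so treat it as an assumption. It also computes the closed-form least-squares line with `np.polyfit` as a reference for what a trained linear model should approach.

```python
import numpy as np

# Assumption: 1000 samples of x spread evenly over [0, 100).
X_data = np.linspace(0, 100, 1000, endpoint=False)
y_data = X_data + 20 * np.sin(X_data / 10)

# Closed-form least-squares line y_hat = w_1 * x + w_0, for reference.
w_1, w_0 = np.polyfit(X_data, y_data, deg=1)
y_hat = w_1 * X_data + w_0

# The squared-error loss L = sum (y - y_hat)^2 defined above.
loss = np.sum((y_data - y_hat) ** 2)
print(X_data.shape, w_1, w_0, loss)
```

Because the sine term is not linear in x, no straight line reaches zero loss; the fitted line only captures the overall trend.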
    # Define data size and batch size
    n_samples = 1000
    batch_size = 100
    # TensorFlow is particular about shapes, so reshape to column vectors
    X_data = np.reshape(X_data, (n_samples, 1))
    y_data = np.reshape(y_data, (n_samples, 1))
    # Define placeholders for input
    X = tf.placeholder(tf.float32, shape=(batch_size, 1))
    y = tf.placeholder(tf.float32, shape=(batch_size, 1))
The loss function is defined as: \[J(W,b) = \frac{1}{N}\sum_{i=1}^{N}(y_i-(W x_i+b))^2\]
    # Define variables to be learned
    W = tf.get_variable("weights", (1, 1),
                        initializer=tf.random_normal_initializer())
    b = tf.get_variable("bias", (1,),
                        initializer=tf.constant_initializer(0.0))
    y_pred = tf.matmul(X, W) + b
    loss = tf.reduce_sum((y - y_pred)**2 / n_samples)
    # Define optimizer operation
    opt_operation = tf.train.AdamOptimizer().minimize(loss)

    with tf.Session() as sess:
        # Initialize all variables in graph
        sess.run(tf.global_variables_initializer())
        # Gradient descent for 500 steps:
        for _ in range(500):
            # Select from random mini batch
            indices = np.random.choice(n_samples, batch_size)
            X_batch, y_batch = X_data[indices], y_data[indices]
            # Do gradient descent step
            _, loss_val = sess.run([opt_operation, loss],
                                   feed_dict={X: X_batch, y: y_batch})
        print(sess.run([W, b]))
        # Display results
        plt.scatter(X_data, y_data)
        plt.scatter(X_data, sess.run(W) * X_data + sess.run(b), c='g')
To download and read in the data automatically:
    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
One hot encoding
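With one-hot encoding, each label becomes a vector that is 0 everywhere except for a 1 at the index of its class. A minimal NumPy sketch; the helper `one_hot` is illustrative and is not how the MNIST loader is implemented:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a [len(labels), num_classes] array with a single 1 per row."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

labels = np.array([3, 0, 9])
encoded = one_hot(labels, num_classes=10)
print(encoded)
# argmax along the class axis recovers the original labels
print(encoded.argmax(axis=1))  # [3 0 9]
```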
The MNIST data is split into three parts:
- training data (mnist.train)
- test data (mnist.test)
- validation data (mnist.validation)

Every MNIST data point has two parts: an image and its label.
    img_size = 28
    img_size_flat = img_size * img_size
    img_shape = (img_size, img_size)
    num_classes = 10

    x = tf.placeholder(tf.float32, [None, img_size_flat])
    y_true = tf.placeholder(tf.float32, [None, num_classes])
    y_true_cls = tf.placeholder(tf.int64, [None])
- x is a placeholder, a value that we will input when we ask TensorFlow to run a computation
- Its shape is [None, 784]
- None means that the first dimension of x can be of any length, so any number of images can be fed

    weights = tf.Variable(tf.zeros([img_size_flat, num_classes]))
    biases = tf.Variable(tf.zeros([num_classes]))
- weights has shape [784, 10], as we want to multiply the 784-dimensional image vectors by weights to produce 10-dimensional vectors of evidence
- Multiply x with weights and add biases
- The result has shape [num_images, 10], because x has shape [num_images, 784] and weights has shape [784, 10]
- logits is typical TensorFlow terminology for these unnormalized scores

    logits = tf.matmul(x, weights) + biases
    y_pred = tf.nn.softmax(logits)
    y_pred_cls = tf.argmax(y_pred, axis=1)
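The shape rule for the matrix product can be checked without TensorFlow. The NumPy sketch below uses an arbitrary batch of 5 random "images" as stand-in data; only the shapes matter here.

```python
import numpy as np

# num_images is an arbitrary choice for illustration.
num_images, img_size_flat, num_classes = 5, 784, 10

x = np.random.rand(num_images, img_size_flat).astype(np.float32)
weights = np.zeros((img_size_flat, num_classes), dtype=np.float32)
biases = np.zeros(num_classes, dtype=np.float32)

# [5, 784] @ [784, 10] -> [5, 10]; biases is broadcast across the rows.
logits = x @ weights + biases
print(logits.shape)  # (5, 10)
```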
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                                            labels=y_true)
    cost = tf.reduce_mean(cross_entropy)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(cost)
    correct_prediction = tf.equal(y_pred_cls, y_true_cls)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
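To make the cost and accuracy definitions concrete, here is a NumPy sketch of the same calculation on two toy examples: softmax followed by cross-entropy per example, then the mean as the cost. It mirrors the math, not TensorFlow's internal implementation, and the function name is illustrative.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # Log-softmax computed stably by subtracting the row-wise max first.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

losses = softmax_cross_entropy(logits, labels)
cost = losses.mean()
accuracy = (logits.argmax(axis=1) == labels.argmax(axis=1)).mean()
print(losses, cost, accuracy)
```

The first example has uniform logits, so its loss is log(3); the second puts most of the probability on the true class, so its loss is much smaller.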
    # Assumes session (a tf.Session) and batch_size are defined elsewhere
    def optimize(num_iterations):
        for i in range(num_iterations):
            x_batch, y_true_batch = mnist.train.next_batch(batch_size)
            feed_dict_train = {x: x_batch, y_true: y_true_batch}
            session.run(optimizer, feed_dict=feed_dict_train)
Using small batches of random data is called stochastic training. It is more feasible than training on the entire data set at every step.
    feed_dict_test = {x: mnist.test.images,
                      y_true: mnist.test.labels,
                      y_true_cls: mnist.test.cls}

    def print_accuracy():
        # Use TensorFlow to compute the accuracy.
        acc = session.run(accuracy, feed_dict=feed_dict_test)
        # Print the accuracy.
        print("Accuracy on test-set: {0:.1%}".format(acc))
An accuracy of about 91% is poor: at that rate, a 6-digit ZIP code would be read correctly only about 57% of the time.
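The 57% figure follows from multiplying the per-digit accuracy six times, under the assumption (not stated in the slides) that errors on different digits are independent:

```python
digit_accuracy = 0.91  # approximate per-digit accuracy quoted above
num_digits = 6

# Assumption: errors on different digits are independent.
code_accuracy = digit_accuracy ** num_digits
print(f"{code_accuracy:.0%}")  # 57%
```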
Helper functions to create ReLU neurons
    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.05)
        return tf.Variable(initial)

    def new_biases(length):
        return tf.Variable(tf.constant(0.05, shape=[length]))
Input is a 4-dim tensor:
Output is another 4-dim tensor:
    def new_conv_layer(input,               # The previous layer.
                       num_input_channels,  # Num. channels in prev. layer.
                       filter_size,         # Width and height of each filter.
                       num_filters,         # Number of filters.
                       use_pooling=True):   # Use 2x2 max-pooling.
        # ...
        return layer, weights
    def flatten_layer(layer):
        # ...
        # Return both the flattened layer and the number of features
        return layer_flat, num_features
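The number of features that comes out of the flatten step depends on how the convolutional layers change the image size. The sketch below tracks shapes only. It assumes stride-1 convolutions with 'SAME' padding and 2x2 max-pooling with stride 2, and the filter counts 16 and 36 are hypothetical values chosen for illustration; none of these settings are given in the slides.

```python
def conv_output_shape(height, width, num_filters, use_pooling=True):
    """Shape (height, width, channels) after one convolutional layer.

    Assumes a stride-1 convolution with 'SAME' padding (spatial size
    unchanged), optionally followed by 2x2 max-pooling with stride 2
    (spatial size halved, rounded up).
    """
    if use_pooling:
        height = (height + 1) // 2
        width = (width + 1) // 2
    return height, width, num_filters

# Hypothetical filter counts, chosen only for illustration.
num_filters1, num_filters2 = 16, 36

shape1 = conv_output_shape(28, 28, num_filters1)
shape2 = conv_output_shape(shape1[0], shape1[1], num_filters2)

# Flattening multiplies the three dimensions together.
num_features = shape2[0] * shape2[1] * shape2[2]
print(shape1, shape2, num_features)  # (14, 14, 16) (7, 7, 36) 1764
```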
The input is assumed to be a 2-dim tensor of shape [num_images, num_inputs]; the output is a 2-dim tensor of shape [num_images, num_outputs].
    def new_fc_layer(input,           # The previous layer.
                     num_inputs,      # Num. inputs from prev. layer.
                     num_outputs,     # Num. outputs.
                     use_relu=True):  # Use Rectified Linear Unit (ReLU)?
        # create new weights and biases
        # calculate new layer
        # use ReLU?
        return layer
- x is the placeholder variable for input images
- Its data type is float32 and its shape is [None, img_size_flat]
- The convolutional layers expect x to be encoded as a 4-dim tensor, so its shape is [num_images, img_height, img_width, num_channels]

    x = tf.placeholder(tf.float32, shape=[None, img_size_flat], name='x')
    x_image = tf.reshape(x, [-1, img_size, img_size, num_channels])
    y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')
    y_true_cls = tf.argmax(y_true, axis=1)
The first convolutional layer takes x_image as input and creates num_filters1 different filters:

    layer_conv1, weights_conv1 = \
        new_conv_layer(input=x_image,
                       num_input_channels=num_channels,
                       filter_size=filter_size1,
                       num_filters=num_filters1,
                       use_pooling=True)
    layer_conv2, weights_conv2 = \
        new_conv_layer(input=layer_conv1,
                       num_input_channels=num_filters1,
                       filter_size=filter_size2,
                       num_filters=num_filters2,
                       use_pooling=True)

    layer_flat, num_features = flatten_layer(layer_conv2)

    layer_fc1 = new_fc_layer(input=layer_flat,
                             num_inputs=num_features,
                             num_outputs=fc_size,
                             use_relu=True)

    layer_fc2 = new_fc_layer(input=layer_fc1,
                             num_inputs=fc_size,
                             num_outputs=num_classes,
                             use_relu=False)
    y_pred = tf.nn.softmax(layer_fc2)
    y_pred_cls = tf.argmax(y_pred, axis=1)
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
                                                            labels=y_true)
    cost = tf.reduce_mean(cross_entropy)
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)
    correct_prediction = tf.equal(y_pred_cls, y_true_cls)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Use tf.train.Saver to save the graph and the trained weights:

    saver = tf.train.Saver()
    model_path = "./tmp/model.ckpt"
    # sess is the active tf.Session that holds the trained variables
    save_path = saver.save(sess, model_path)
    print("Model saved in file: %s" % save_path)
    saver = tf.train.Saver()
    model_path = "./tmp/model.ckpt"
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Restoring overwrites the freshly initialized variables with the saved values
        saver.restore(sess, model_path)
        print("Accuracy:", accuracy.eval({x: mnist.test.images,
                                          y_true: mnist.test.labels}))
Thank You