- Installation and Setup
- Machine Learning Primer
- What's TensorFlow?
- Basic TensorFlow concepts
- MNIST Example
- Softmax
- Convolutional Neural Networks
Ensure that you have the following installed:
Materials are available here
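Entropy measures the uncertainty of a random variable X (in bits, when the logarithm is base 2):
\[H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)\]
\[H(\text{fair die roll}) = -\sum_{i=1}^6 \frac{1}{6} \log_2 \frac{1}{6} \approx 2.58\]
\[H(20/80 \text{ coin toss}) = -\frac{1}{5}\log_2 \frac{1}{5}-\frac{4}{5}\log_2 \frac{4}{5} \approx 0.72\]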
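The entropy formula can be checked numerically. This is a minimal standard-library sketch; the function name `entropy` is illustrative, and outcomes with probability 0 are skipped because \(0 \log 0\) is taken to be 0.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum p * log2(p), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(round(entropy([1/6] * 6), 2))   # fair die        → 2.58
print(round(entropy([0.2, 0.8]), 2))  # 20/80 coin toss → 0.72
```

The biased coin has lower entropy than a fair coin (1 bit) because its outcome is easier to predict.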
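Recall from linear algebra that scalars, vectors, and matrices are all tensors, of order 0, 1, and 2 respectively; more generally, an n-dimensional array is a tensor of order n. Like matrices, tensors can be transformed with operations, and TensorFlow provides a library of algorithms to perform tensor operations efficiently.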
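NumPy arrays make tensor order concrete: the number of dimensions (`ndim`) is the order. A small sketch, using NumPy only for illustration (TensorFlow tensors expose the same idea through their shape):

```python
import numpy as np

scalar = np.array(5.0)                       # order 0, shape ()
vector = np.array([1.0, 2.0, 3.0])           # order 1, shape (3,)
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])  # order 2, shape (2, 2)
cube = np.zeros((2, 3, 4))                   # order 3, shape (2, 3, 4)

for t in (scalar, vector, matrix, cube):
    print(t.ndim, t.shape)

# An operation on a tensor: summing over one axis lowers the order by one.
print(cube.sum(axis=0).shape)
```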
Simple linear regression model:
\[\hat{y} = w_0 + w_1 x\]
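Linear regression can be represented as a computational graph. A neural network builds on the same idea: unit i in layer 2 applies an activation function g to a weighted sum of the inputs \(x_1, x_2, x_3\):
\[a_i^{(2)} = g(w_{i0} + w_{i1}x_1 + w_{i2}x_2 + w_{i3}x_3)\]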
For more complex models, it can be helpful to visualize the graph. TensorBoard provides this visualization tool.
Common choices for g are the rectified linear unit (ReLU):
\[g(u) = \max(0, u)\]
and the sigmoid:
\[S(x) = \frac{1}{1+e^{-x}}\]
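The two activation functions and the neuron equation above translate directly into Python. This is a minimal standard-library sketch; the function names and example weights are illustrative, and `weights[0]` plays the role of the bias term \(w_{i0}\).

```python
import math

def relu(u):
    """Rectified linear unit: g(u) = max(0, u)."""
    return max(0.0, u)

def sigmoid(x):
    """Logistic sigmoid: S(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(weights, inputs, g):
    """One unit: a_i = g(w_i0 + w_i1*x_1 + ... + w_in*x_n); weights[0] is the bias w_i0."""
    u = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return g(u)

print(relu(-2.0), relu(3.0))
print(sigmoid(0.0))
print(neuron([0.5, 1.0, -2.0, 0.25], [1.0, 1.0, 4.0], relu))
```

This sigmoid is written for readability; for very negative x, `math.exp(-x)` overflows, so a robust version would need a guard.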
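The softmax function turns a vector \(\boldsymbol{z}\) of K scores into probabilities that sum to 1:
\[S_j(\boldsymbol{z}) = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \text{ for } j=1,\dots,K\]
\[\sum_{j=1}^K S_j(\boldsymbol{z}) = 1\]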
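A direct implementation of the softmax formula. This standard-library sketch subtracts the largest score before exponentiating, a common trick that cancels in the ratio (so the result is unchanged) but keeps `exp` from overflowing for large scores.

```python
import math

def softmax(z):
    """S_j(z) = exp(z_j) / sum_k exp(z_k), computed with the max-shift for stability."""
    m = max(z)
    exps = [math.exp(zj - m) for zj in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print(sum(probs))  # equal to 1 up to rounding
```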
TensorFlow separates the definition of computations from their execution.
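Phases:
- Assemble a graph.
- Use a session to execute operations in the graph.
import tensorflow as tf
a = tf.add(3, 5)
What is a?
print(a)
This does not print 8. a is a node in the graph, so print shows a description such as Tensor("Add:0", shape=(), dtype=int32); its value is only computed when the graph is executed in a session.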
Create a session, and within it, evaluate the graph
sess = tf.Session()
print(sess.run(a))
sess.close()
Alternatively, use a with block, which closes the session automatically:
with tf.Session() as sess:
    print(sess.run(a))
tf.summary.FileWriter serializes the graph into a format that TensorBoard can read:
tf.summary.FileWriter("logs", tf.get_default_graph()).close()
Then launch TensorBoard from the command line:
tensorboard --logdir=logs
Or in Jupyter:
!tensorboard --logdir=logs
Exercise: build the graph for \((x+y)^{xy}\) where \(x=2, y=3\).
Useful functions: tf.add, tf.multiply, tf.pow
x = 2
y = 3
op1 = tf.add(x, y)
op2 = tf.multiply(x, y)
op3 = tf.pow(op1, op2)
with tf.Session() as sess:
    result = sess.run(op3)
    print(result)  # (2 + 3) ** (2 * 3) = 15625
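TensorFlow's "define the graph, then run it" model can be mimicked in a few lines of plain Python, which may help when reading the solution above. This is only an analogy, not how TensorFlow is implemented: `Node` and `constant` are hypothetical names, and evaluation is a simple recursive walk over the graph.

```python
class Node:
    """A hypothetical graph node: an operation plus the nodes it depends on."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def run(self):
        # Evaluate the dependencies first, then apply this node's operation.
        return self.fn(*(node.run() for node in self.inputs))

def constant(value):
    """A node with no inputs that always produces `value`."""
    return Node(lambda: value)

# Phase 1: assemble the graph for (x + y) ** (x * y). Nothing is computed yet.
x = constant(2)
y = constant(3)
op1 = Node(lambda a, b: a + b, x, y)
op2 = Node(lambda a, b: a * b, x, y)
op3 = Node(lambda a, b: a ** b, op1, op2)

# Phase 2: execute the graph.
print(op3.run())  # → 15625
```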
W1 = tf.ones((2,2))
W2 = tf.Variable(tf.zeros((2,2)), name="weights")
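with tf.Session() as sess:
    print(sess.run(W1))  # tf.ones creates a constant tensor; no initialization needed
    # Variables must be initialized before their value can be read
    sess.run(tf.global_variables_initializer())
    print(sess.run(W2))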
To create a 3-dimensional variable with shape [1,2,3]:
my_var = tf.get_variable("my_var", [1,2,3])
You may optionally specify the dtype and initializer to tf.get_variable:
my_int_variable = tf.get_variable("my_int_variable", [1, 2, 3],
                                  dtype=tf.int32,
                                  initializer=tf.zeros_initializer)
You can also initialize a variable with the value of a tf.Tensor. In that case, do not specify the shape; the shape of the initializer tensor is used:
other_variable = tf.get_variable("other_variable", dtype=tf.int32,
                                 initializer=tf.constant([23, 42]))
Use tf.assign to assign a value to a variable
state = tf.Variable(0, name="counter")
new_value = tf.add(state, tf.constant(1))
update = tf.assign(state, new_value)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(state))      # 0
    for _ in range(3):
        sess.run(update)        # state <- state + 1
        print(sess.run(state))  # 1, then 2, then 3
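Pass a list of tensors to sess.run to fetch several values in one call:
input1 = tf.constant(3.0)
input2 = tf.constant(2.0)
input3 = tf.constant(5.0)
intermed = tf.add(input2, input3)
mul = tf.multiply(input1, intermed)
with tf.Session() as sess:
    result = sess.run([mul, intermed])
    print(result)  # mul = 3 * (2 + 5) = 21.0, intermed = 7.0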
tf.placeholder variables represent our input data.
feed_dict is a Python dictionary that maps tf.placeholder variables to the data fed into them when the graph runs.
input1 = tf.placeholder(tf.float32)
input2 = tf.placeholder(tf.float32)
output = tf.multiply(input1, input2)
with tf.Session() as sess:
    print(sess.run([output], feed_dict={input1: [7.], input2: [2.]}))
import tensorflow as tf
import numpy as np
import seaborn
import matplotlib.pyplot as plt
# %matplotlib inline
Linear regression is trained by minimizing the squared-error loss:
\[L = \sum_i (y_i - \hat{y}_i)^2\]
Given data generated from the following function, fit a linear model:
\[y = x + 20 \sin(x/10)\]
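# Define data size and batch size
n_samples = 1000
batch_size = 100
# Generate data from the function above
# (assumed range: 1000 evenly spaced points in [0, 100))
X_data = np.arange(100, step=0.1)
y_data = X_data + 20 * np.sin(X_data / 10)
# TensorFlow is particular about shapes, so resize
X_data = np.reshape(X_data, (n_samples, 1))
y_data = np.reshape(y_data, (n_samples, 1))
# Define placeholders for input
X = tf.placeholder(tf.float32, shape=(batch_size, 1))
y = tf.placeholder(tf.float32, shape=(batch_size, 1))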
The loss function (mean squared error) is defined as: \[J(W,b) = \frac{1}{N}\sum_{i=1}^{N}(y_i-(Wx_i+b))^2\]
# Define variables to be learned
W = tf.get_variable("weights", (1, 1),
                    initializer=tf.random_normal_initializer())
b = tf.get_variable("bias", (1,),
                    initializer=tf.constant_initializer(0.0))
y_pred = tf.matmul(X, W) + b
# Mean squared error over the mini-batch, i.e. J(W, b) with N = batch_size
loss = tf.reduce_mean((y - y_pred) ** 2)
# Define optimizer operation
opt_operation = tf.train.AdamOptimizer().minimize(loss)
with tf.Session() as sess:
    # Initialize all variables in graph
    sess.run(tf.global_variables_initializer())
    # Run 500 optimization steps
    for _ in range(500):
        # Select a random mini-batch
        indices = np.random.choice(n_samples, batch_size)
        X_batch, y_batch = X_data[indices], y_data[indices]
        # Do one optimization step
        _, loss_val = sess.run([opt_operation, loss],
                               feed_dict={X: X_batch, y: y_batch})
    print(sess.run([W, b]))
    # Display results (this must run inside the session to read W and b)
    plt.scatter(X_data, y_data)
    plt.scatter(X_data, sess.run(W) * X_data + sess.run(b), c='g')
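For intuition about what the optimizer does, here is a minimal NumPy sketch that minimizes the same \(J(W, b)\) with hand-derived gradients. It is an illustration under assumptions, not the TensorFlow code above: it swaps Adam for plain full-batch gradient descent (so the result is deterministic), generates 1000 points from \(y = x + 20 \sin(x/10)\) over an assumed range [0, 100), and rescales x to [0, 1] so that one learning rate works for both W and b. The name `fit_linear_gd` and the values `lr=0.5`, `steps=2000` are hypothetical.

```python
import numpy as np

# Assumption: 1000 evenly spaced points in [0, 100), matching n_samples = 1000.
X_data = np.arange(100, step=0.1)
y_data = X_data + 20 * np.sin(X_data / 10)

def fit_linear_gd(x, y, lr=0.5, steps=2000):
    """Minimize J(W, b) = mean((y - (W*x + b))**2) by full-batch gradient descent."""
    scale = np.abs(x).max()       # rescale x to [0, 1] so a single learning rate converges
    xs = x / scale
    W, b = 0.0, 0.0
    for _ in range(steps):
        residual = (W * xs + b) - y             # prediction minus target
        grad_W = 2.0 * np.mean(residual * xs)   # dJ/dW
        grad_b = 2.0 * np.mean(residual)        # dJ/db
        W -= lr * grad_W
        b -= lr * grad_b
    return W / scale, b           # undo the rescaling of the slope

W, b = fit_linear_gd(X_data, y_data)
print(W, b)
```

Because \(J\) is a convex quadratic in W and b, the result can be checked against the closed-form least-squares fit, for example `np.polyfit(X_data, y_data, 1)`.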
To download and read in the data automatically:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
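With one_hot=True, labels are one-hot encoded: digit d becomes a 10-dimensional vector that is 1 at index d and 0 elsewhere (for example, 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]).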
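One-hot encoding and its inverse take one line each in NumPy. A small sketch; `one_hot` is an illustrative helper, not part of the MNIST loader:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Convert integer class labels to one-hot rows: 1.0 at the label's index, 0.0 elsewhere."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

Y = one_hot([3, 0, 9])
print(Y)
print(np.argmax(Y, axis=1))  # argmax recovers the original labels
```

The argmax step is the same idea used later to turn one-hot labels and softmax outputs back into class numbers.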
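The MNIST data is split into three parts:
- mnist.train: 55,000 training examples
- mnist.test: 10,000 test examples
- mnist.validation: 5,000 validation examples
Every MNIST data point has 2 parts: an image of a handwritten digit and its label.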
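img_size = 28                          # MNIST images are 28 x 28 pixels
img_size_flat = img_size * img_size    # 784 pixels when flattened into a vector
img_shape = (img_size, img_size)
num_classes = 10                       # one class per digit 0-9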
x = tf.placeholder(tf.float32, [None, img_size_flat])
y_true = tf.placeholder(tf.float32, [None, num_classes])
y_true_cls = tf.placeholder(tf.int64, [None])
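x is a placeholder, a value that we will input when we ask TensorFlow to run a computation. Its shape is [None, 784]: None means that the first dimension (the number of images) can be of any length.
weights = tf.Variable(tf.zeros([img_size_flat, num_classes]))
biases = tf.Variable(tf.zeros([num_classes]))
weights has shape [784, 10] because we want to multiply the 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence for the 10 classes.
The model multiplies x with weights and adds biases. Since x has shape [num_images, 784] and weights has shape [784, 10], the result has shape [num_images, 10]. logits is typical TensorFlow terminology for these unnormalized scores, which softmax turns into probabilities.
logits = tf.matmul(x, weights) + biases
y_pred = tf.nn.softmax(logits)
y_pred_cls = tf.argmax(y_pred, axis=1)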
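A NumPy sketch of the same forward pass confirms the shapes described above. It uses random stand-in images (an assumption; no MNIST data is loaded). With zero-initialized weights and biases every logit is 0, so softmax assigns probability 0.1 to each class.

```python
import numpy as np

img_size_flat, num_classes = 784, 10
rng = np.random.default_rng(0)
x = rng.random((5, img_size_flat))                # 5 stand-in flattened images: [num_images, 784]
weights = np.zeros((img_size_flat, num_classes))  # [784, 10], zero-initialized as above
biases = np.zeros(num_classes)                    # [10]

logits = x @ weights + biases                     # [num_images, 10]
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
y_pred = exp / exp.sum(axis=1, keepdims=True)     # softmax over each row
y_pred_cls = np.argmax(y_pred, axis=1)            # predicted digit per image

print(logits.shape, y_pred.shape, y_pred_cls.shape)
```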
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                                        labels=y_true)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(cost)
correct_prediction = tf.equal(y_pred_cls, y_true_cls)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
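To see what the cross-entropy cost and the accuracy node compute, here is a NumPy sketch on two hand-made examples. The function names and data are illustrative; the log-softmax formulation is one common numerically stable way to compute cross-entropy, not necessarily TensorFlow's exact internals.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy -sum_j y_j * log(softmax(logits)_j), via a stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

def accuracy(logits, labels_cls):
    """Fraction of examples whose arg-max class matches the true class."""
    return np.mean(np.argmax(logits, axis=1) == labels_cls)

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([[1.0, 0.0, 0.0],    # one-hot: class 0
                   [0.0, 1.0, 0.0]])   # one-hot: class 1
cost = np.mean(softmax_cross_entropy(logits, labels))
print(cost, accuracy(logits, np.argmax(labels, axis=1)))
```

The second example is misclassified (class 2 has the highest score), so accuracy is 0.5, and its large cross-entropy dominates the mean cost.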
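session = tf.Session()
session.run(tf.global_variables_initializer())

def optimize(num_iterations):
    for i in range(num_iterations):
        # Get a random batch of training examples
        x_batch, y_true_batch = mnist.train.next_batch(batch_size)
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}
        # Run one optimization step on this batch
        session.run(optimizer, feed_dict=feed_dict_train)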
Using small batches of random data is called stochastic training. Each step uses only batch_size examples, which is much cheaper than computing the gradient on the entire data set.
feed_dict_test = {x: mnist.test.images,
                  y_true: mnist.test.labels,
                  # class numbers derived from the one-hot labels
                  y_true_cls: np.argmax(mnist.test.labels, axis=1)}
def print_accuracy():
    # Use TensorFlow to compute the accuracy.
    acc = session.run(accuracy, feed_dict=feed_dict_test)
    # Print the accuracy.
    print("Accuracy on test-set: {0:.1%}".format(acc))
An accuracy of approximately 91% is very bad: a 6-digit ZIP code is read correctly only if all six digits are, which would happen about \(0.91^6 \approx 57\%\) of the time.
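The ZIP code figure is a one-line calculation, assuming each digit is classified independently with the same accuracy (the helper name is illustrative):

```python
def prob_all_correct(per_digit_accuracy, num_digits):
    """Probability that every digit is classified correctly, assuming independent errors."""
    return per_digit_accuracy ** num_digits

print(round(prob_all_correct(0.91, 6), 3))  # → 0.568
```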
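Helper functions to create weights and biases for ReLU neurons. Weights are initialized with a small amount of noise to break symmetry, and biases with a slightly positive value to avoid "dead" ReLU neurons.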
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.05)
    return tf.Variable(initial)

def new_biases(length):
    return tf.Variable(tf.constant(0.05, shape=[length]))
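Input is a 4-dim tensor of shape [num_images, img_height, img_width, num_channels].
Output is another 4-dim tensor of shape [num_images, out_height, out_width, num_filters]: each filter produces one output channel, and 2x2 max-pooling downsamples the height and width by a factor of 2.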
def new_conv_layer(input,              # The previous layer.
                   num_input_channels, # Num. channels in prev. layer.
                   filter_size,        # Width and height of each filter.
                   num_filters,        # Number of filters.
                   use_pooling=True):  # Use 2x2 max-pooling.
    # ...
    return layer, weights
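To make the elided body of new_conv_layer more concrete, here is a NumPy sketch of what one such layer computes for a single image. It assumes stride 1, zero "SAME" padding, one bias per filter, 2x2 max-pooling, and a ReLU; these choices and the names `conv2d_same`, `max_pool_2x2`, and `conv_layer_np` are assumptions for illustration, not the deck's TensorFlow implementation (which would typically use `tf.nn.conv2d` and `tf.nn.max_pool`).

```python
import numpy as np

def conv2d_same(image, filters):
    """Stride-1 2-D cross-correlation (what CNN libraries call convolution) with zero 'SAME' padding.

    image:   [height, width, in_channels]
    filters: [filter_size, filter_size, in_channels, num_filters]
    returns: [height, width, num_filters]
    """
    k = filters.shape[0]
    pad_lo = (k - 1) // 2          # for even k, the extra row/column of padding goes bottom/right
    pad_hi = k - 1 - pad_lo
    padded = np.pad(image, ((pad_lo, pad_hi), (pad_lo, pad_hi), (0, 0)))
    height, width, _ = image.shape
    out = np.zeros((height, width, filters.shape[3]))
    for i in range(height):
        for j in range(width):
            patch = padded[i:i + k, j:j + k, :]  # [k, k, in_channels]
            # Weighted sum of the patch for every filter at once -> [num_filters]
            out[i, j, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return out

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 (assumes even height and width)."""
    height, width, channels = x.shape
    return x.reshape(height // 2, 2, width // 2, 2, channels).max(axis=(1, 3))

def conv_layer_np(image, filters, biases):
    """Convolution plus bias, then 2x2 max-pooling, then ReLU."""
    return np.maximum(max_pool_2x2(conv2d_same(image, filters) + biases), 0.0)

rng = np.random.default_rng(0)
image = rng.random((28, 28, 1))                     # one grayscale MNIST-sized image
filters = rng.normal(0, 0.05, size=(5, 5, 1, 16))   # hypothetical: 16 filters of size 5x5
biases = np.full(16, 0.05)
print(conv_layer_np(image, filters, biases).shape)  # → (14, 14, 16)
```

Because ReLU is monotonic, applying it before or after max-pooling gives the same result; pooling first simply means ReLU runs on fewer values.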
def flatten_layer(layer):
    # ...
    # Return both the flattened layer and the number of features
    return layer_flat, num_features
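Flattening only reshapes the 4-dim conv output into the 2-dim input a fully-connected layer expects. A NumPy sketch of that reshape; `flatten_layer_np` is a hypothetical name, and the 7 x 7 x 36 input shape is just an example.

```python
import numpy as np

def flatten_layer_np(layer):
    """Reshape [num_images, height, width, channels] to [num_images, height * width * channels]."""
    num_features = int(np.prod(layer.shape[1:]))
    return layer.reshape(-1, num_features), num_features

layer = np.zeros((8, 7, 7, 36))  # hypothetical conv output: 8 images, 7x7 pixels, 36 channels
layer_flat, num_features = flatten_layer_np(layer)
print(layer_flat.shape, num_features)  # → (8, 1764) 1764
```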
new_fc_layer assumes that its input is a 2-dim tensor of shape [num_images, num_inputs]; its output is a 2-dim tensor of shape [num_images, num_outputs].
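def new_fc_layer(input,          # The previous layer.
                 num_inputs,     # Num. inputs from prev. layer.
                 num_outputs,    # Num. outputs.
                 use_relu=True): # Use Rectified Linear Unit (ReLU)?
    # Create new weights and biases
    weights = weight_variable(shape=[num_inputs, num_outputs])
    biases = new_biases(length=num_outputs)
    # Calculate the layer as the matrix product of input and weights, plus biases
    layer = tf.matmul(input, weights) + biases
    # Use ReLU?
    if use_relu:
        layer = tf.nn.relu(layer)
    return layer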
x is the placeholder variable for the input images. Its data type is float32 and its shape is [None, img_size_flat].
The convolutional layers expect x to be encoded as a 4-dim tensor, so we reshape it to [num_images, img_height, img_width, num_channels]. Passing -1 as the first dimension lets TensorFlow infer num_images.
num_channels = 1  # MNIST images are grayscale
x = tf.placeholder(tf.float32, shape=[None, img_size_flat], name='x')
x_image = tf.reshape(x, [-1, img_size, img_size, num_channels])
y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')
y_true_cls = tf.argmax(y_true, axis=1)
The first convolutional layer takes x_image as input and creates num_filters1 different filters, each with width and height equal to filter_size1.
layer_conv1, weights_conv1 = \
    new_conv_layer(input=x_image,
                   num_input_channels=num_channels,
                   filter_size=filter_size1,
                   num_filters=num_filters1,
                   use_pooling=True)
layer_conv2, weights_conv2 = \
    new_conv_layer(input=layer_conv1,
                   num_input_channels=num_filters1,
                   filter_size=filter_size2,
                   num_filters=num_filters2,
                   use_pooling=True)
layer_flat, num_features = flatten_layer(layer_conv2)
layer_fc1 = new_fc_layer(input=layer_flat,
                         num_inputs=num_features,
                         num_outputs=fc_size,
                         use_relu=True)
layer_fc2 = new_fc_layer(input=layer_fc1,
                         num_inputs=fc_size,
                         num_outputs=num_classes,
                         use_relu=False)
y_pred = tf.nn.softmax(layer_fc2)
y_pred_cls = tf.argmax(y_pred, axis=1)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
                                                        labels=y_true)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)
correct_prediction = tf.equal(y_pred_cls, y_true_cls)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
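Use tf.train.Saver to save the graph and the trained weights:
import os
saver = tf.train.Saver()
model_path = "./tmp/model.ckpt"
os.makedirs(os.path.dirname(model_path), exist_ok=True)  # the checkpoint directory must exist
# sess is the tf.Session in which the model was trained
save_path = saver.save(sess, model_path)
print("Model saved in file: %s" % save_path)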
To restore the trained weights later and evaluate the model:
saver = tf.train.Saver()
model_path = "./tmp/model.ckpt"
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, model_path)
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y_true: mnist.test.labels}))
Thank You