## Agenda

• Installation and Setup
• Machine Learning Primer
• What's TensorFlow?
• Basic TensorFlow concepts
• MNIST Example
• Softmax
• Convolutional Neural Networks

## NUS Hackers

• a student-run organization committed to the spread of hacker culture & free/open-source software
• Events:

## What is Machine Learning?

• "The science of getting computers to act without being explicitly programmed" - Andrew Ng
• The primary aim is to allow computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly

## Using TensorFlow to sort cucumbers

• Makoto Koike used deep learning and TensorFlow to sort cucumbers by size, shape, color and other attributes

## Structure in data

• some interpretations of "structure in data":
• given some data, one can predict other data points with some confidence
• one can compress the data, i.e., store the same amount of information in less space
\begin{align*} A = \{1, 2, 6, 4, 7, 9, 0\} \\ B = \{1, 2, 1, 2, 1, 2, 1\} \end{align*}
• we might say that $$B$$ has apparent structure (a repeating pattern) while $$A$$ does not

## Entropy

• the amount of structure can be quantified by the entropy of the process

$H(X) = -\sum_{i=1}^{N} p(x_i) \log p(x_i)$

• If entropy increases, uncertainty in prediction increases

## Entropy (examples)

• Example: a fair die (logs taken base 2, so entropy is measured in bits)

$H(\text{fair die roll}) = -\sum_{i=1}^6 \frac{1}{6} \log_2 \frac{1}{6}=2.58$

• Example: biased 20:80 coin

$H(20/80 \text{ coin toss}) = -\frac{1}{5}\log_2 \frac{1}{5}-\frac{4}{5}\log_2 \frac{4}{5} = 0.72$

• the biased coin toss has lower entropy; predicting its outcome is easier than predicting a fair die roll (see the sketch below)
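
A quick numerical check of both examples (a minimal NumPy sketch; log base 2, so entropy is in bits):

import numpy as np

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum_i p(x_i) * log2 p(x_i)."""
    probs = np.asarray(probs)
    return -np.sum(probs * np.log2(probs))

print(entropy([1/6] * 6))   # fair die roll: ~2.58 bits
print(entropy([0.2, 0.8]))  # 20/80 biased coin: ~0.72 bits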

## What are Tensors?

Recall from linear algebra that:

• Scalar: an array in 0-D
• Vector: an array in 1-D
• Matrix: an array in 2-D

All are tensors of order n: a scalar is an order-0 tensor, a vector order-1, a matrix order-2. Similarly, tensors can be transformed with operations. TensorFlow provides a library of algorithms to perform tensor operations efficiently.
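
For instance, a minimal sketch using the TF 1.x API assumed throughout these slides:

import tensorflow as tf

scalar = tf.constant(3.0)            # 0-D tensor, shape ()
vector = tf.constant([1.0, 2.0])     # 1-D tensor, shape (2,)
matrix = tf.constant([[1.0, 2.0],
                      [3.0, 4.0]])   # 2-D tensor, shape (2, 2)

print(scalar.shape, vector.shape, matrix.shape)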

## Example

Simple linear regression model:

$w_0 + w_1 x = \hat{y}$

• $$w_0$$ and $$w_1$$ are weights that are determined during training
• $$\hat{y}$$ is the predicted outcome, to be compared with actual observations $$y$$
• Goal: build a model that can find values of $$w_0$$ and $$w_1$$ that minimize prediction error

## Graph Representation of ML Models

Can represent linear regression as a graph

• operations are represented as nodes
• graph shows how data is transformed by nodes and what is passed between them

## Graph Representation of ML Models (1)

$a_i^{(2)} = g(w_{i0} + w_{i1}x_1 + w_{i2}x_2 + w_{i3}x_3)$

For more complex models, it can be helpful to visualize your graph. TensorBoard provides this visualization tool.

## Activation Functions

• A popular function is the rectified linear unit (ReLU):

$g(u) = \max(0, u)$

## Gradient Descent

• gradient descent is a way to minimize an objective function
• one takes steps proportional to the negative of the gradient of the function at the current point, as in the toy sketch below
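
As a toy illustration (plain Python, not from the original slides), minimizing $$f(w) = (w-3)^2$$ with gradient $$f'(w) = 2(w-3)$$:

w = 0.0                  # initial guess
learning_rate = 0.1
for _ in range(100):
    grad = 2 * (w - 3)   # gradient of f at the current point
    w -= learning_rate * grad
print(w)                 # approaches the minimum at w = 3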

## Model Output

• output depends on the activation function used, but is generally any real number in $$(-\infty, \infty)$$
• For binary classification, an additional sigmoid function can be applied to bring the output into the range $$[0,1]$$

$S(x) = \frac{1}{1+e^{-x}}$
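
A quick numerical sanity check (a small NumPy sketch):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(sigmoid(0))    # 0.5
print(sigmoid(10))   # ~1.0
print(sigmoid(-10))  # ~0.0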

## Softmax Function

• for multi-class prediction a softmax function is used:

$S_j(\boldsymbol{z}) = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \text{ for } j=1,\dots,K$

• squashes a $$K$$-dimensional vector $$\boldsymbol{z}$$ to a $$K$$-dimensional vector whose entries sum to 1:

$\sum_{j=1}^K S_j(\boldsymbol{z}) = 1$

• the outcome is usually represented with one-hot encoding, e.g. for dice roll 3: $$(0,0,1,0,0,0)$$ — see the sketch below
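
A minimal NumPy sketch of the softmax, confirming that the outputs sum to 1:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
print(softmax(z))        # [0.659 0.242 0.099]
print(softmax(z).sum())  # 1.0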

## What is TensorFlow?

• "TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms"
• Originally developed by the Google Brain Team to conduct machine learning and deep neural network research
• General enough to be applicable to a wide variety of other domains

## Data Flow Graphs

TensorFlow separates the definition of computations from their execution

Phases:

1. assemble the graph
2. use a session to execute operations in the graph
import tensorflow as tf

# Example graph node (an assumed example, so that `a` on the next slide is defined)
a = tf.add(3, 5)


## How to get the value of a?

print(a)  # prints the Tensor object, not the computed value 8


Create a session, and within it, evaluate the graph

sess = tf.Session()
print(sess.run(a))
sess.close()


Alternatively:

with tf.Session() as sess:
    print(sess.run(a))


## Visualizing with TensorBoard

• tf.summary.FileWriter serializes the graph into a format that TensorBoard can read
tf.summary.FileWriter("logs", tf.get_default_graph()).close()

• in the same directory, run:
tensorboard --logdir=logs


Or in Jupyter:

!tensorboard --logdir=logs


## Practice with More Graphs

Try to generate the following graph: $$(x+y)^{xy}$$ where $$x=2,y=3$$

Useful functions: tf.add, tf.multiply, tf.pow

## Solution

x = 2
y = 3
op1 = tf.add(x, y)       # x + y
op2 = tf.multiply(x, y)  # x * y
op3 = tf.pow(op1, op2)   # (x + y) ** (x * y)
with tf.Session() as sess:
    op3 = sess.run(op3)  # 15625


## TensorFlow Variables

• TensorFlow variables are used to represent shared, persistent state manipulated by your program
• Variables hold and update parameters in your model during training
• Variables contain tensors
• Variables must be initialized before use; plain tensors such as tf.ones need no initialization
W1 = tf.ones((2,2))
W2 = tf.Variable(tf.zeros((2,2)), name="weights")

with tf.Session() as sess:
    print(sess.run(W1))
    sess.run(tf.global_variables_initializer())
    print(sess.run(W2))


## Creating Variables

To create a 3-dimensional variable with shape [1,2,3]:

my_var = tf.get_variable("my_var", [1,2,3])


You may optionally specify the dtype and initializer to tf.get_variable:

my_int_variable = tf.get_variable("my_int_variable", [1, 2, 3],
                                  dtype=tf.int32,
                                  initializer=tf.zeros_initializer)


Can initialize a tf.Variable to have the value of a tf.Tensor:

other_variable = tf.get_variable("other_variable", dtype=tf.int32,
                                 initializer=tf.constant([23, 42]))


## Updating Variable State

Use tf.assign to assign a value to a variable

state = tf.Variable(0, name="counter")
one = tf.constant(1)
new_value = tf.add(state, one)  # assumed definition: increment the counter
update = tf.assign(state, new_value)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(state))
    for _ in range(3):
        sess.run(update)
        print(sess.run(state))


## Fetching Variable State

input1 = tf.constant(3.0)
input2 = tf.constant(2.0)
input3 = tf.constant(5.0)
intermed = tf.add(input2, input3)  # assumed definition of the missing intermed
mul = tf.multiply(input1, intermed)

with tf.Session() as sess:
    result = sess.run([mul, intermed])
    print(result)  # [21.0, 7.0]


## TensorFlow Placeholders

• tf.placeholder variables represent our input data
• feed_dict is a Python dictionary that maps tf.placeholder variables to data
input1 = tf.placeholder(tf.float32)
input2 = tf.placeholder(tf.float32)

output = tf.multiply(input1, input2)

with tf.Session() as sess:
    print(sess.run([output], feed_dict={input1: [7.], input2: [2.]}))


## Imports

import tensorflow as tf
import numpy as np
import seaborn
import matplotlib.pyplot as plt
# %matplotlib inline


## Recap

• we have two weights $$w_0$$ and $$w_1$$; we want the model to figure out good weights by minimizing prediction error
• define the following loss function:

$L = \sum (y - \hat{y})^2$

Given the following function, fit a linear model

$y = x + 20 \sin(x/10)$
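
The following slides assume X_data and y_data already exist; one plausible way to generate them from the function above (an assumption, since the original omits this step) is:

import numpy as np

# Sample y = x + 20 sin(x/10); the range [0, 100] and 1000 points
# (matching n_samples on the next slide) are assumptions.
X_data = np.linspace(0, 100, 1000)
y_data = X_data + 20 * np.sin(X_data / 10)
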

## Define Variables and Placeholders

# Define data size and batch size
n_samples = 1000
batch_size = 100

# TensorFlow is particular about shapes, so resize
X_data = np.reshape(X_data, (n_samples, 1))
y_data = np.reshape(y_data, (n_samples, 1))

# Define placeholders for input
X = tf.placeholder(tf.float32, shape=(batch_size, 1))
y = tf.placeholder(tf.float32, shape=(batch_size, 1))


## Loss Function

The loss function is defined as: $J(W,b) = \frac{1}{N}\sum_{i=1}^{N}(y_i-(W x_i+b))^2$

# Define variables to be learned
W = tf.get_variable("weights", (1,1),
                    initializer=tf.random_normal_initializer())
b = tf.get_variable("bias", (1,),
                    initializer=tf.constant_initializer(0.0))
y_pred = tf.matmul(X, W) + b
loss = tf.reduce_sum((y - y_pred)**2 / n_samples)


## Define Optimizer and Train Model

# Define optimizer operation (assumed: Adam with default settings, as the slide
# uses opt_operation without defining it)
opt_operation = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:
    # Initialize all variables in graph
    sess.run(tf.global_variables_initializer())
    # Gradient descent for 500 steps:
    for _ in range(500):
        # Select a random mini-batch
        indices = np.random.choice(n_samples, batch_size)
        X_batch, y_batch = X_data[indices], y_data[indices]
        _, loss_val = sess.run([opt_operation, loss],
                               feed_dict={X: X_batch, y: y_batch})
    print(sess.run([W, b]))
    # Display results
    plt.scatter(X_data, y_data)
    plt.scatter(X_data, sess.run(W) * X_data + sess.run(b), c='g')


## Introduction

• MNIST is the hello world of machine learning
• Simple computer vision dataset, consists of images of handwritten digits
• We are going to train a model to predict what the digits are

## Importing MNIST Data

from tensorflow.examples.tutorials.mnist import input_data

# Download and load the data; labels are one-hot encoded
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)


One-hot encoding:

• labels have been converted to a vector of length equal to the number of classes
• the ith element is 1 and the rest are 0, e.g. digit 1: $$[0,1,0,\dots]$$

## MNIST Data

The MNIST data is split into three parts:

1. 55,000 data points of training data (mnist.train)
2. 10,000 data points of test data (mnist.test)
3. 5,000 data points of validation data (mnist.validation)

Every MNIST data point has 2 parts:

1. an image of a handwritten digit (call it "x")
2. corresponding label (call it "y")

## Data Dimensions

img_size = 28                        # MNIST images are 28 x 28 pixels
img_size_flat = img_size * img_size  # each image flattened to a vector of length 784
img_shape = (img_size, img_size)
num_channels = 1                     # grayscale images (used by the CNN later)
num_classes = 10                     # one class per digit 0-9


## Defining Our Model

x = tf.placeholder(tf.float32, [None, img_size_flat])
y_true = tf.placeholder(tf.float32, [None, num_classes])
y_true_cls = tf.placeholder(tf.int64, [None])

• x is a placeholder, a value that we will input when we ask TensorFlow to run a computation
• we represent each MNIST image as a row in a 2-D tensor of floating-point numbers with shape [None, 784]
• None means that x can be of any length

## Variables to be Optimized

weights = tf.Variable(tf.zeros([img_size_flat, num_classes]))
biases = tf.Variable(tf.zeros([num_classes]))

• weights has shape [784, 10]: we multiply the 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence
• biases has shape [10] so it can be added to the output

## Model

• multiplies the images in the placeholder variable x with the weights and adds the biases
• the result is a matrix of shape [num_images, 10], since x has shape [num_images, 784] and weights has shape [784, 10]
• logits is typical TensorFlow terminology for the raw scores that go into the softmax
logits = tf.matmul(x, weights) + biases
y_pred = tf.nn.softmax(logits)
y_pred_cls = tf.argmax(y_pred, axis = 1)


## Optimization Method

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                                        labels=y_true)
cost = tf.reduce_mean(cross_entropy)
# Assumed: the optimizer op used by optimize() below was not on the slide;
# learning rate 0.5 follows the classic MNIST tutorial
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(cost)
correct_prediction = tf.equal(y_pred_cls, y_true_cls)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


## TensorFlow Run

batch_size = 100                # assumed batch size
session = tf.Session()          # assumed: session created before training
session.run(tf.global_variables_initializer())

def optimize(num_iterations):
    for i in range(num_iterations):
        x_batch, y_true_batch = mnist.train.next_batch(batch_size)
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}
        session.run(optimizer, feed_dict=feed_dict_train)


Using small batches of random data is called stochastic training; it is more feasible than training on the entire data set at each step

## Evaluating Our Model

# Assumed: integer class labels were not defined on the slide
mnist.test.cls = np.argmax(mnist.test.labels, axis=1)

feed_dict_test = {x: mnist.test.images,
                  y_true: mnist.test.labels,
                  y_true_cls: mnist.test.cls}

def print_accuracy():
    # Use TensorFlow to compute the accuracy.
    acc = session.run(accuracy, feed_dict=feed_dict_test)

    # Print the accuracy.
    print("Accuracy on test-set: {0:.1%}".format(acc))


Approximately 91% is actually quite bad: at that per-digit accuracy, a 6-digit ZIP code would only be read correctly about 57% of the time ($$0.91^6 \approx 0.57$$)

## Introduction

• Convolutional networks work by moving a small filter across the input image
• filters are re-used for recognizing patterns throughout the entire input image
• this makes convolutional networks much more powerful than fully-connected networks with the same number of variables

## Hyper Parameters

• Convolution:
  • number of features
  • size of features
• Pooling:
  • window size
  • window stride
• Fully connected:
  • number of neurons
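
The slides below refer to these hyper-parameters by name; concrete values are not given in the original, so the following are assumed, typical tutorial settings:

# Assumed hyper-parameter values
filter_size1 = 5   # conv layer 1: 5x5 filters
num_filters1 = 16  # conv layer 1: 16 filters
filter_size2 = 5   # conv layer 2: 5x5 filters
num_filters2 = 36  # conv layer 2: 36 filters
fc_size = 128      # fully-connected layer: 128 neurons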

## Weight Initialization

Helper functions to create ReLU neurons

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.05)
    return tf.Variable(initial)

def new_biases(length):
    return tf.Variable(tf.constant(0.05, shape=[length]))


## Creating a new Convolutional Layer

Input is a 4-dim tensor:

1. image number
2. y-axis of each image
3. x-axis of each image
4. channels of each image

Output is another 4-dim tensor:

1. image number, same as input
2. y-axis of each image, might be smaller if pooling is used
3. x-axis of each image, might be smaller if pooling is used
4. channels produced by the convolutional filters

## Helper Function for Creating a New Layer

def new_conv_layer(input,              # The previous layer.
                   num_input_channels, # Num. channels in prev. layer.
                   filter_size,        # Width and height of each filter.
                   num_filters,        # Number of filters.
                   use_pooling=True):  # Use 2x2 max-pooling.
    # ...

    return layer, weights
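
The body is elided on the slide; a minimal sketch of one possible implementation (assuming stride 1 with 'SAME' padding for the convolution and 2x2 max-pooling, and reusing the weight helpers defined earlier):

def new_conv_layer(input, num_input_channels, filter_size, num_filters,
                   use_pooling=True):
    # Filter weights have shape [height, width, in_channels, out_channels].
    shape = [filter_size, filter_size, num_input_channels, num_filters]
    weights = weight_variable(shape)
    biases = new_biases(length=num_filters)

    # 2-D convolution with stride 1 and zero ('SAME') padding.
    layer = tf.nn.conv2d(input=input,
                         filter=weights,
                         strides=[1, 1, 1, 1],
                         padding='SAME')
    layer += biases

    if use_pooling:
        # 2x2 max-pooling halves the image resolution.
        layer = tf.nn.max_pool(value=layer,
                               ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1],
                               padding='SAME')

    # ReLU activation: max(0, u), applied element-wise.
    layer = tf.nn.relu(layer)

    return layer, weights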


## Flattening a Layer

• the convolutional layer produces an output tensor with 4 dimensions
• the fully-connected layer expects a 2-dim tensor as input, so the 4-dim tensor must first be flattened to a 2-dim tensor
def flatten_layer(layer):
    # ...

    # return both the flattened layer and the number of features
    return layer_flat, num_features
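
The body is elided here too; a sketch of one possible implementation:

def flatten_layer(layer):
    # layer has shape [num_images, img_height, img_width, num_channels].
    layer_shape = layer.get_shape()

    # num_features = img_height * img_width * num_channels.
    num_features = layer_shape[1:4].num_elements()

    # Reshape to [num_images, num_features]; -1 infers the first dimension.
    layer_flat = tf.reshape(layer, [-1, num_features])

    return layer_flat, num_features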


## Creating a Fully-Connected Layer

It is assumed that the input is a 2-dim tensor of shape [num_images, num_inputs]; the output is a 2-dim tensor of shape [num_images, num_outputs]

def new_fc_layer(input,          # The previous layer.
                 num_inputs,     # Num. inputs from prev. layer.
                 num_outputs,    # Num. outputs.
                 use_relu=True): # Use Rectified Linear Unit (ReLU)?
    # create new weights and biases
    # calculate new layer
    # use ReLU?

    return layer
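
A sketch of the elided body, again reusing the weight helpers from before:

def new_fc_layer(input, num_inputs, num_outputs, use_relu=True):
    # Create new weights and biases.
    weights = weight_variable([num_inputs, num_outputs])
    biases = new_biases(length=num_outputs)

    # Calculate the new layer: input * W + b.
    layer = tf.matmul(input, weights) + biases

    # Optionally apply the ReLU activation.
    if use_relu:
        layer = tf.nn.relu(layer)

    return layer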


## Placeholder Variables

• x is the placeholder variable for input images
  • data-type is set to float32
  • shape is set to [None, img_size_flat]
• convolutional layers expect x to be encoded as a 4-dim tensor, so we reshape it to [num_images, img_height, img_width, num_channels]
• we also have placeholders for the true labels
x = tf.placeholder(tf.float32, shape=[None, img_size_flat], name='x')
x_image = tf.reshape(x, [-1, img_size, img_size, num_channels])
y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')
y_true_cls = tf.argmax(y_true, axis=1)


## First Convolutional Layer

• takes x_image as input and creates num_filters1 different filters
• each filter has width and height equal to filter_size1
• down-samples the image to half its size using max-pooling
layer_conv1, weights_conv1 = \
    new_conv_layer(input=x_image,
                   num_input_channels=num_channels,
                   filter_size=filter_size1,
                   num_filters=num_filters1,
                   use_pooling=True)


## Second Convolutional Layer

• takes as input the output of the first convolutional layer
• number of input channels = number of filters in the first convolutional layer
layer_conv2, weights_conv2 = \
    new_conv_layer(input=layer_conv1,
                   num_input_channels=num_filters1,
                   filter_size=filter_size2,
                   num_filters=num_filters2,
                   use_pooling=True)


## Flatten Layer

• to use the output of the convolutional layer as input to a fully-connected network, the 4-dim tensor must be reshaped to a 2-dim tensor
layer_flat, num_features = flatten_layer(layer_conv2)


## Fully-Connected Layer 1

layer_fc1 = new_fc_layer(input=layer_flat,
                         num_inputs=num_features,
                         num_outputs=fc_size,
                         use_relu=True)


## Fully-Connected Layer 2

layer_fc2 = new_fc_layer(input=layer_fc1,
                         num_inputs=fc_size,
                         num_outputs=num_classes,
                         use_relu=False)


## Cost Function and Optimization Method

y_pred = tf.nn.softmax(layer_fc2)
y_pred_cls = tf.argmax(y_pred, axis=1)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
                                                        labels=y_true)
cost = tf.reduce_mean(cross_entropy)
correct_prediction = tf.equal(y_pred_cls, y_true_cls)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
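
The slide's title promises an optimization method, but the code stops at the accuracy op; a common choice (an assumption here, not from the original) is the Adam optimizer:

# Assumed optimizer; a learning rate of 1e-4 is typical for this model
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)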


## Exporting the Model

• We can export the model for use in our own applications
• use tf.train.Saver to save the graph and the trained weights
saver = tf.train.Saver()  # create the Saver first (missing from the original slide)
model_path = "./tmp/model.ckpt"
save_path = saver.save(sess, model_path)
print("Model saved in file: %s" % save_path)


## Restoring the Session

saver = tf.train.Saver()
model_path = "./tmp/model.ckpt"
with tf.Session() as sess:
    # Restoring the checkpoint initializes the saved variables,
    # so no separate initialization is needed.
    saver.restore(sess, model_path)
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y_true: mnist.test.labels}))