NVIDIA Jetson Xavier - Parsing a Tensorflow model for TensorRT

TensorRT can also be used on previously generated Tensorflow models to allow for faster inference times. This is a more common case of deployment, where the convolutional neural network is trained on a host with more resources, and then transferred to an embedded system for inference.

At the end of this guide you will be able to:

  • Convert a TensorFlow SavedModel to a Frozen Graph.
  • Load a Frozen Graph for inference.
  • Run a TensorRT inference engine on Xavier.

To follow this guide you will need:

  • Jetson Xavier with JetPack 4.1

And a host development computer with:

  • Tensorflow
  • TensorRT
  • A trained Tensorflow model

Step 1: Install Prerequisites

  1. Install JetPack
  2. Install Tensorflow on host
  3. Follow the instructions to train a model on the host. You can also use another trained model of your choosing.
  4. Install TensorRT and its tools on the host computer. First, download the .deb package from the NVIDIA download page and install TensorRT:
sudo dpkg -i  <your-deb-package>
sudo apt-get update
sudo apt-get install tensorrt
sudo apt-get install python-libnvinfer-dev
sudo apt-get install uff-converter-tf
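
After installing the packages, it can help to confirm that the Python bindings are visible on the host before moving on. A minimal sanity check, assuming the tensorrt and uff Python packages were installed by the steps above, is:

# quick sanity check that the host installation is visible from Python (illustrative only)
import tensorflow as tf   # known bug: TensorFlow must be imported before TensorRT
import tensorrt as trt
import uff

print("TensorFlow:", tf.__version__)
print("TensorRT and the uff converter imported successfully")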

Step 3: Generate the UFF

UFF (Universal Framework Format) is the unified format TensorRT uses for neural network models.
There are two ways to generate a TensorRT engine from TensorFlow. If you are training your own model with TensorFlow, you can add the uff module to the Python training script and generate the UFF from the model stream. If you already have a trained model checkpoint or a frozen graph, you can convert the frozen protobuf to UFF.

Tensorflow Modelstream to UFF

This step is done on the host computer. The UFF Toolkit installed in the previous step allows you to convert TensorFlow models to UFF. The UFF parser can then build TensorRT engines from these UFF models.

You will need the following includes:

import tensorflow as tf  # there is a known bug where TensorFlow needs to be imported before TensorRT
import uff # to convert the graph from a serialized frozen TensorFlow model to UFF.
import numpy as np
import time
import os

Create your model. For this example we use the LeNet-5 model to classify handwritten digits:

STARTER_LEARNING_RATE = 1e-4
BATCH_SIZE = 10
NUM_CLASSES = 10
MAX_STEPS = 3000
IMAGE_SIZE = 28
IMAGE_PIXELS = IMAGE_SIZE ** 2
OUTPUT_NAMES = ["fc2/Relu"]

def WeightsVariable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1, name='weights'))

def BiasVariable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape, name='biases'))

def Conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    filter_size = W.get_shape().as_list()
    pad_size = filter_size[0]//2
    pad_mat = np.array([[0,0],[pad_size,pad_size],[pad_size,pad_size],[0,0]])
    x = tf.pad(x, pad_mat)
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='VALID')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def MaxPool2x2(x, k=2):
    # MaxPool2D wrapper
    pad_size = k//2
    pad_mat = np.array([[0,0],[pad_size,pad_size],[pad_size,pad_size],[0,0]])
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='VALID')

def network(images):
    # Convolution 1
    with tf.name_scope('conv1'):
        weights = WeightsVariable([5,5,1,32])
        biases = BiasVariable([32])
        conv1 = tf.nn.relu(Conv2d(images, weights, biases))
        pool1 = MaxPool2x2(conv1)
    # Convolution 2
    with tf.name_scope('conv2'):
        weights = WeightsVariable([5,5,32,64])
        biases = BiasVariable([64])
        conv2 = tf.nn.relu(Conv2d(pool1, weights, biases))
        pool2 = MaxPool2x2(conv2)
        pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
    # Fully Connected 1
    with tf.name_scope('fc1'):
        weights = WeightsVariable([7 * 7 * 64, 1024])
        biases = BiasVariable([1024])
        fc1 = tf.nn.relu(tf.matmul(pool2_flat, weights) + biases)
    # Fully Connected 2
    with tf.name_scope('fc2'):
        weights = WeightsVariable([1024, 10])
        biases = BiasVariable([10])
        fc2 = tf.nn.relu(tf.matmul(fc1, weights) + biases)
    return fc2

def loss_metrics(logits, labels):
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                                   logits=logits,
                                                                   name='softmax')
    return tf.reduce_mean(cross_entropy, name='softmax_mean')


def training(loss):
    tf.summary.scalar('loss', loss)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    learning_rate = tf.train.exponential_decay(STARTER_LEARNING_RATE,
                                               global_step,
                                               100000,
                                               0.75,
                                               staircase=True)
    tf.summary.scalar('learning_rate', learning_rate)
    optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op

def evaluation(logits, labels):
    correct = tf.nn.in_top_k(logits, labels, 1)
    return tf.reduce_sum(tf.cast(correct, tf.int32))

def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_set,
            summary):
    true_count = 0
    steps_per_epoch = data_set.num_examples // BATCH_SIZE
    num_examples = steps_per_epoch * BATCH_SIZE
    for step in range(steps_per_epoch):
        feed_dict = fill_feed_dict(data_set,
                                   images_placeholder,
                                   labels_placeholder)
        log, correctness = sess.run([summary, eval_correct], feed_dict=feed_dict)
        true_count += correctness
    precision = float(true_count) / num_examples
    tf.summary.scalar('precision', tf.constant(precision))
    print('Num examples %d, Num Correct: %d Precision @ 1: %0.04f' %
          (num_examples, true_count, precision))
    return log

def placeholder_inputs(batch_size):
    images_placeholder = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))
    labels_placeholder = tf.placeholder(tf.int32, shape=(None))
    return images_placeholder, labels_placeholder

def fill_feed_dict(data_set, images_pl, labels_pl):
    images_feed, labels_feed = data_set.next_batch(BATCH_SIZE)
    feed_dict = {
        images_pl: np.reshape(images_feed, (-1,28,28,1)),
        labels_pl: labels_feed,
    }
    return feed_dict

def run_training(data_sets):
    with tf.Graph().as_default():
        images_placeholder, labels_placeholder = placeholder_inputs(BATCH_SIZE)
        logits = network(images_placeholder)
        loss = loss_metrics(logits, labels_placeholder)
        train_op = training(loss)
        eval_correct = evaluation(logits, labels_placeholder)
        summary = tf.summary.merge_all()
        init = tf.global_variables_initializer()
        saver = tf.train.Saver()
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
        sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
        summary_writer = tf.summary.FileWriter("/tmp/tensorflow/mnist/log",
                                               graph=tf.get_default_graph())
        test_writer = tf.summary.FileWriter("/tmp/tensorflow/mnist/log/validation",
                                            graph=tf.get_default_graph())
        sess.run(init)
        for step in range(MAX_STEPS):
            start_time = time.time()
            feed_dict = fill_feed_dict(data_sets.train,
                                       images_placeholder,
                                       labels_placeholder)
            _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
            duration = time.time() - start_time
            if step % 100 == 0:
                print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
                summary_str = sess.run(summary, feed_dict=feed_dict)
                summary_writer.add_summary(summary_str, step)
                summary_writer.flush()
            if (step + 1) % 1000 == 0 or (step + 1) == MAX_STEPS:
                checkpoint_file = os.path.join("/tmp/tensorflow/mnist/log", "model.ckpt")
                saver.save(sess, checkpoint_file, global_step=step)
                print('Validation Data Eval:')
                log = do_eval(sess,
                              eval_correct,
                              images_placeholder,
                              labels_placeholder,
                              data_sets.validation,
                              summary)
                test_writer.add_summary(log, step)
        # Return sess
        graphdef = tf.get_default_graph().as_graph_def()
        frozen_graph = tf.graph_util.convert_variables_to_constants(sess,
                                                                    graphdef,
                                                                    OUTPUT_NAMES)
        return tf.graph_util.remove_training_nodes(frozen_graph)

The important part of the above code is this:

graphdef = tf.get_default_graph().as_graph_def()
frozen_graph = tf.graph_util.convert_variables_to_constants(sess,graphdef, OUTPUT_NAMES)
return tf.graph_util.remove_training_nodes(frozen_graph)

In these lines we generate the frozen graph from the session and the graph definition, and remove the training nodes. Now we load the TensorFlow MNIST data loader and run training. The model has summaries included, so you can visualize training in TensorBoard:

MNIST_DATASETS = tf.contrib.learn.datasets.load_dataset("mnist")
tf_model = run_training(MNIST_DATASETS)

Finally, to generate the UFF, run:

uff_model = uff.from_tensorflow(tf_model, ["fc2/Relu"], output_filename="saved_model.uff")

This function call should output something like this:

Using output node fc2/Relu
Converting to UFF graph
No. nodes: 28
UFF Output written to model.uff

The function uff.from_tensorflow also accepts the following optional parameters (see the sketch after this list):

  • quiet=[true|false]: To suppress logging
  • input_nodes=[...]: To allow you to define a set of input nodes in the graph
  • text=[true|false]: To save a human-readable version of the UFF model
  • list_nodes=[true|false]: To list the nodes on the graph
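
As a rough illustration of how those flags can be combined, the conversion call above could be extended as shown below. This is a sketch only; the exact keyword names and accepted values should be checked against the uff converter version installed on your host:

# hypothetical example combining the optional flags listed above
uff_model = uff.from_tensorflow(tf_model,
                                ["fc2/Relu"],               # output node names
                                output_filename="saved_model.uff",
                                text=True,                  # also write a human-readable version of the model
                                list_nodes=False,           # set to True to print every node in the graph
                                quiet=False)                # keep the conversion log visible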

Tensorflow saved session to UFF

For this section, you will need a saved TensorFlow checkpoint folder. First, we need to load the checkpoint file. The tf.train.Saver object saves and restores variables to/from checkpoint files. Note that loading a checkpoint generated with a different TensorFlow version may result in errors. Unfortunately, to load a TensorFlow checkpoint you need to know the output node name. A trick to get this from an unknown model is to load it in TensorBoard:

tensorboard --logdir=route/to/checkpoint/dir

You will get a message similar to this:

TensorBoard 1.10.0 at http://mtaylor-laptop:6006 (Press CTRL+C to quit)

Open that address in your browser, go to the Graphs tab, and analyze the graph to determine the output node name. In this example the output node name is ArgMax because its input is the resnet_model/final_dense tensor.

(Figure: ResNet output node in the TensorBoard graph view)
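
If you prefer not to start TensorBoard, the node names can also be inspected programmatically. A minimal sketch, assuming a standard TensorFlow 1.x checkpoint directory (the path below is a placeholder), is:

# list the node names of a checkpointed graph (illustrative helper, TF 1.x API)
import tensorflow as tf

checkpoint = tf.train.get_checkpoint_state("route/to/checkpoint/dir")
saver = tf.train.import_meta_graph(checkpoint.model_checkpoint_path + '.meta',
                                   clear_devices=True)

graph_def = tf.get_default_graph().as_graph_def()
for node in graph_def.node:
    print(node.op, node.name)   # candidate output nodes are usually near the end of this list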

Add the following code to a Python file:

import tensorflow as tf
from tensorflow.contrib import * # this include isn't used but solves some common tf errors
import uff # to convert the graph from a serialized frozen TensorFlow model to UFF.

#Load checkpoint
checkpoint = tf.train.get_checkpoint_state("route/to/checkpoint/folder") #Get all checkpoint names present on the given folder
input_checkpoint = checkpoint.model_checkpoint_path

#Devices should be cleared to allow Tensorflow to control placement of graph when loading on different machines
saver = tf.train.import_meta_graph(input_checkpoint + '.meta',  clear_devices=True)

#Get the graph_def
graph = tf.get_default_graph()
input_graph_def = graph.as_graph_def()

#output names array
output_nodes_names = ['YOUR','MODEL', 'OUTPUT', 'NODES']

with tf.Session(graph=graph) as sess:
  saver.restore(sess, input_checkpoint)
  frozen_graph = tf.graph_util.convert_variables_to_constants(sess, input_graph_def, output_nodes_names)
  frozen_graph = tf.graph_util.remove_training_nodes(frozen_graph)
  uff_model = uff.from_tensorflow(frozen_graph, output_nodes_names, output_filename="model.uff")

This Python code opens the checkpoint folder and generates the file "model.uff". Note that not all TensorFlow models can be converted to UFF. For example, ResNet can't be converted to UFF because it uses the ArgMax layer, which isn't supported by the UFF converter at this time.
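
Because of such unsupported layers, it can save time to scan the frozen graph for operation types before attempting the conversion. A minimal sketch follows; the frozen_graph variable is the one produced in the script above, and the set of supported operations depends on your uff converter version, so treat this as illustrative only:

# print the set of operation types used by the frozen graph (illustrative check)
ops_in_graph = sorted({node.op for node in frozen_graph.node})
print("Operations used by this model:")
for op in ops_in_graph:
    print(" ", op)
# if an operation such as ArgMax appears here and the uff converter rejects it,
# the model cannot be exported to UFF without modifying the graph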

Step 4: Load the uff file and perform inference

Up to this point, everything was running on the host computer. However, the engine should be created on the target platform (Xavier) because TensorRT runs device-specific profiling during the optimization phase. Since the Python API isn't supported on Xavier at this time, the UFF must be loaded with the C++ API instead.

Loading the UFF is covered by an example provided by NVIDIA with TensorRT named sample_uff_mnist. For more details on this example, please refer to the C++ API section.


