NVIDIA Transfer Learning Toolkit


Introduction to NVIDIA Transfer Learning Toolkit

This guide shows how to train a model with the NVIDIA Transfer Learning Toolkit (TLT) and how to deploy it to DeepStream.

Configuring Docker

Note: If you installed the image from JetPack, this step is not necessary.

Troubleshooting the Docker configuration

  • Error: docker: Error response from daemon: could not select device driver "" with capabilities: gpu.
    • Solution: NVIDIA drivers might need to be reinstalled. Follow the instructions in the collabnix blog on New Docker CLI API Support for NVIDIA GPUs to reinstall them. You might need to restart the device before connecting.
  • Error: ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
    • Solution: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.X/compat/

Downloading the container

  • Create an account in NVIDIA's NGC
  • Go to Account -> Setup, generate an API Key, and follow the Docker instructions shown there. You might need to run the following commands so that your user can run docker and store the login credentials without sudo:
sudo usermod -aG docker $USER
newgrp docker
docker pull nvcr.io/nvidia/tlt-streamanalytics:v1.0.1_py2
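
For reference, the NGC login that those instructions perform looks like the following; the username is literally $oauthtoken, and the password is the API key you generated (shown here as a placeholder):

docker login nvcr.io
# Username: $oauthtoken
# Password: <your NGC API key>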

Running the container

  • Create a local directory to host the dataset and other outputs so that they persist outside the container.
  mkdir -p ~/work/tlt-experiments
  cd ~/work/tlt-experiments
  • Run the container
  docker run --runtime=nvidia -it -v ~/work/tlt-experiments:/workspace/tlt-experiments nvcr.io/nvidia/tlt-streamanalytics:v1.0.1_py2
    • Adjust the image tag if you are using a different version of the container.
    • Docker might need to be reinstalled with the following commands to correctly use NVIDIA software:
sudo apt purge docker*
sudo apt install docker.io

Downloading a pre-trained model

List the available models:

ngc registry model list "nvidia/iva/tlt_*_classification"

Download a model:

ngc registry model download-version nvidia/iva/[model name] 

For example:

ngc registry model download-version nvidia/iva/tlt_mobilenet_v2_classification:1 

Training a model

First, convert the dataset to TFRecords:

tlt-dataset-convert -d convert_config.txt -o tfrecords/tfrecord
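
The conversion spec (convert_config.txt above) describes where the images and labels live and how to partition the data. The following is a minimal sketch for a KITTI-format dataset; the paths and split values are assumptions for this example:

kitti_config {
    root_directory_path: "/workspace/tlt-experiments/data"
    image_dir_name: "training/image_2"
    label_dir_name: "training/label_2"
    image_extension: ".png"
    partition_mode: "random"
    num_partitions: 2
    val_split: 20
    num_shards: 10
}
image_directory_path: "/workspace/tlt-experiments/data"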

Training is driven by an experiment specification file (experiment_config.json in the commands below) that ties together the dataset, model, and training parameters. The following is a good starting point:

dataset_config {
    data_sources: {
	tfrecords_path: "/workspace/tlt-experiments/tfrecords/*"
	image_directory_path: "/workspace/tlt-experiments/"
    }
    image_extension: "png"
    target_class_mapping {
	key: "car"
	value: "car"
    }
    target_class_mapping {
	key: "automobile"
	value: "car"
    }
    target_class_mapping {
	key: "heavy_truck"
	value: "car"
    }
    target_class_mapping {
	key: "person"
	value: "pedestrian"
    }
    target_class_mapping {
	key: "rider"
	value: "cyclist"
    }
    validation_fold: 0
}

model_config {
    arch: "resnet"
    pretrained_model_file: "pre-trained-models/tlt_resnet18_detectnet_v2_v1/resnet18.hdf5"
    freeze_blocks: 0
    freeze_blocks: 1
    all_projections: True
    num_layers: 18
    use_pooling: False
    use_batch_norm: True
    dropout_rate: 0.1
    training_precision: {
	backend_floatx: FLOAT32
    }
    objective_set: {
	cov {}
	bbox {
	    scale: 35.0
	    offset: 0.5
	}
    }
}

evaluation_config {
    average_precision_mode: INTEGRATE
    validation_period_during_training: 10
    first_validation_epoch: 1
    minimum_detection_ground_truth_overlap {
	key: "car"
	value: 0.7
    }
    minimum_detection_ground_truth_overlap {
	key: "pedestrian"
	value: 0.5
    }
    minimum_detection_ground_truth_overlap {
	key: "cyclist"
	value: 0.5
    }
    evaluation_box_config {
	key: "car"
	value {
	    minimum_height: 4
	    maximum_height: 9999
	    minimum_width: 4
	    maximum_width: 9999
	}
    }
    evaluation_box_config {
	key: "pedestrian"
	value {
	    minimum_height: 4
	    maximum_height: 9999
	    minimum_width: 4
	    maximum_width: 9999
	}
    }
    evaluation_box_config {
	key: "cyclist"
	value {
	    minimum_height: 4
	    maximum_height: 9999
	    minimum_width: 4
	    maximum_width: 9999
	}
    }
}

bbox_rasterizer_config {
    target_class_config {
	key: "car"
	value: {
	    cov_center_x: 0.5
	    cov_center_y: 0.5
	    cov_radius_x: 0.4
	    cov_radius_y: 0.4
	    bbox_min_radius: 1.0
	}
    }
    target_class_config {
	key: "cyclist"
	value: {
	    cov_center_x: 0.5
	    cov_center_y: 0.5
	    cov_radius_x: 0.4
	    cov_radius_y: 0.4
	    bbox_min_radius: 1.0
	}
    }
    target_class_config {
	key: "pedestrian"
	value: {
	    cov_center_x: 0.5
	    cov_center_y: 0.5
	    cov_radius_x: 0.4
	    cov_radius_y: 0.4
	    bbox_min_radius: 1.0
	}
    }
    deadzone_radius: 0.67
}


postprocessing_config {
    target_class_config {
	key: "car"
	value: {
	    clustering_config {
		coverage_threshold: 0.005
		dbscan_eps: 0.15
		dbscan_min_samples: 0.05
		minimum_bounding_box_height: 20
	    }
	}
    }
    target_class_config {
	key: "cyclist"
	value: {
	    clustering_config {
		coverage_threshold: 0.005
		dbscan_eps: 0.15
		dbscan_min_samples: 0.05
		minimum_bounding_box_height: 20
	    }
	}
    }
    target_class_config {
	key: "pedestrian"
	value: {
	    clustering_config {
		coverage_threshold: 0.005
		dbscan_eps: 0.15
		dbscan_min_samples: 0.05
		minimum_bounding_box_height: 20
	    }
	}
    }
}


cost_function_config {
    target_classes {
	name: "car"
	class_weight: 1.0
	coverage_foreground_weight: 0.05
	objectives {
	    name: "cov"
	    initial_weight: 1.0
	    weight_target: 1.0
	}
	objectives {
	    name: "bbox"
	    initial_weight: 10.0
	    weight_target: 10.0
	}
    }
    target_classes {
	name: "cyclist"
	class_weight: 1.0
	coverage_foreground_weight: 0.05
	objectives {
	    name: "cov"
	    initial_weight: 1.0
	    weight_target: 1.0
	}
	objectives {
	    name: "bbox"
	    initial_weight: 10.0
	    weight_target: 1.0
	}
    }
    target_classes {
	name: "pedestrian"
	class_weight: 1.0
	coverage_foreground_weight: 0.05
	objectives {
	    name: "cov"
	    initial_weight: 1.0
	    weight_target: 1.0
	}
	objectives {
	    name: "bbox"
	    initial_weight: 10.0
	    weight_target: 10.0
	}
    }
    enable_autoweighting: True
    max_objective_weight: 0.9999
    min_objective_weight: 0.0001
}


training_config {
    batch_size_per_gpu: 4
    num_epochs: 240
    learning_rate {
	soft_start_annealing_schedule {
	    min_learning_rate: 5e-6
	    max_learning_rate: 5e-4
	    soft_start: 0.1
	    annealing: 0.7
	}
    }
    regularizer {
	type: L1
	weight: 3e-9
    }
    optimizer {
	adam {
	    epsilon: 1e-08
	    beta1: 0.9
	    beta2: 0.999
	}
    }
    cost_scaling {
	enabled: False
	initial_exponent: 20.0
	increment: 0.005
	decrement: 1.0
    }
    checkpoint_interval: 1
}

# Sample augmentation config
augmentation_config {
    preprocessing {
	output_image_width: 960
	output_image_height: 544
	output_image_channel: 3
	min_bbox_width: 1.0
	min_bbox_height: 1.0
    }
    spatial_augmentation {

	hflip_probability: 0.5
	vflip_probability: 0.0
	zoom_min: 1.0
	zoom_max: 1.0
	translate_max_x: 8.0
	translate_max_y: 8.0
    }
    color_augmentation {
	color_shift_stddev: 0.0
	hue_rotation_max: 25.0
	saturation_shift_max: 0.2
	contrast_scale_max: 0.1
	contrast_center: 0.5
    }
}


  • Run the training:
tlt-train detectnet_v2 --gpus 1 -r results -e experiment_config.json -k key

Resuming Training

If training is interrupted, re-running the same tlt-train command with the same results directory (-r) should resume from the most recent checkpoint; checkpoints are written according to the checkpoint_interval parameter in training_config.

Evaluating the model

  • Run the evaluation:
tlt-evaluate detectnet_v2  -e experiment_config.json -k key -m $MODEL_FILE

This is the same evaluation that runs every N epochs during training, as controlled by the following parameters in the experiment configuration file:

    validation_period_during_training: 10
    first_validation_epoch: 1

Running Inference

tlt-infer detectnet_v2 [-h] -m $MODEL_FILE -i $INPUT_IMAGE_DIR -o $OUTPUT_IMAGE_DIR
                       -bs $BATCH_SIZE -cp $CLUSTER_FILE -k key
                       -lw $LINE_WIDTH

The cluster params file should follow this structure:

{
    "dbscan_criterion": "IOU",
    "dbscan_eps": {
        "cyclist": 0.25,
        "pedestrian": 0.35,
        "default": 0.15,
        "car": 0.3
    },
    "dbscan_min_samples": {
        "cyclist": 0.05,
        "pedestrian": 0.05,
        "default": 0.0,
        "car": 0.05
    },
    "min_cov_to_cluster": {
        "cyclist": 0.005,
        "pedestrian": 0.005,
        "default": 0.005,
        "car": 0.005
    },
    "min_obj_height": {
        "cyclist": 4,
        "pedestrian": 4,
        "car": 4,
        "default": 2
    },
    "target_classes": ["car", "cyclist", "pedestrian"],
    "confidence_th": {
        "pedestrian": 0.6,
        "cyclist": 0.6,
        "car": 0.6
    },
    "confidence_model": {
        "car": { "kind": "aggregate_cov"},
        "pedestrian": { "kind": "aggregate_cov"},
        "cyclist": { "kind": "aggregate_cov"},
        "default": { "kind": "aggregate_cov"}
    },
    "output_map": {
        "car" : "car",
        "cyclist" : "cyclist",
        "pedestrian" : "pedestrian"
    },
    "color": {
        "car": "green",
        "cyclist": "magenta",
        "pedestrian": "cyan",
        "default": "blue"
    },
    "postproc_classes": ["car", "cyclist", "pedestrian"],
    "image_height": 384,
    "image_width": 1248,
    "stride": 16
}
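
Putting it together, an inference run over a directory of test images might look like the following; the file names and batch size are placeholders for this example:

tlt-infer detectnet_v2 -m results/model.step-77844.tlt \
                       -i test_images \
                       -o inferred_images \
                       -bs 16 \
                       -cp cluster_params.json \
                       -k key \
                       -lw 3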

Pruning the model

Not all weights in a network contribute equally to its accuracy. By pruning the network, the less significant weights can be removed, speeding up inference while having only a small impact on accuracy. In a sample run with KITTI, the number of parameters was reduced from 11,555,983 to 743,751 while the car precision only dropped from 73.0718 to 73.0707.

# -eq: arithmetic_mean, geometric_mean, union or intersection;
#      useful for MobileNet and ResNet (default: union)
# -pg, -pth: pruning granularity and threshold (optional)
# -nf: minimum number of filters to keep per layer (optional)
# -el: layers to exclude, separated by spaces; can be left empty (optional)
tlt-prune -pm $MODEL \
          -o $OUTPUT_DIRECTORY \
          -eq $EQUALIZATION_CRITERION \
          -pg $PRUNING_GRANULARITY \
          -pth $PRUNING_THRESHOLD \
          -nf $MIN_FILTERS_PER_LAYER \
          -el $EXCLUDED_LAYERS \
          -k $KEY

Note: NVIDIA recommends adjusting the threshold so that the pruned model keeps within 10-20% of the parameters of the original, unpruned model.
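
For example, pruning the model trained above might look like the following; the paths and threshold are assumptions for this example:

tlt-prune -pm results/model.step-77844.tlt \
          -o results_pruned \
          -eq union \
          -pth 0.1 \
          -k key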

Re-training the model

In order to regain accuracy, NVIDIA recommends retraining the pruned model over the same dataset. To do this, use the tlt-train command as documented in Training a model, with an updated spec file that points to the newly pruned model as the pre-trained model file:

model_config {
    pretrained_model_file: "<path to the pruned model>"
    load_graph: true # Since pruning modifies the network, the graph must be reloaded
    ...
}

For detectnet_v2, it is important that the user set the load_graph option under model_config to true to import the pruned graph. All the other parameters may be retained in the spec file from the previous training.

Deploying to DeepStream

Generating the INT8 calibration file

Running networks in INT8 mode improves performance, but requires a calibration cache at engine creation time. The calibration cache is generated from a calibration tensor file when tlt-export is run with the --data_type flag set to int8. Pre-generating the calibration information and caching it removes the need to calibrate the model on the inference machine. Moving the calibration cache is usually much more convenient than moving the calibration tensor file, since it is a much smaller file and can be moved along with the exported model. Using the calibration cache also speeds up engine creation, since building the cache can take several minutes depending on the size of the tensor file and of the model itself. This can only be done for classification or detectnet_v2 models.

Generate the calibration tensor file from the training data (-m sets the number of batches to sample):

tlt-int8-tensorfile detectnet_v2 -e experiment_config.json -m 10 -o calibration.tensor

Exporting the model

tlt-export results/model.step-77844.tlt                     \
           -o resnet18_detector.etlt                        \
           --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
           -k key                                           \
           --input_dims 3,512,512                           \
           --max_workspace_size 1100000                     \
           --export_module detectnet_v2                     \
           --cal_data_file calibration.tensor               \
           --data_type int8                                 \
           --batches 10                                     \
           --cal_cache_file calibration.bin
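
If INT8 is not needed, the calibration-specific flags can simply be dropped; an FP32 export might look like the following sketch (same assumed file names as above):

tlt-export results/model.step-77844.tlt                     \
           -o resnet18_detector.etlt                        \
           --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
           -k key                                           \
           --input_dims 3,512,512                           \
           --export_module detectnet_v2                     \
           --data_type fp32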

Generating the TensorRT engine

For the Jetson platform, the tlt-converter for JetPack 4.2.2 and JetPack 4.2.3 / 4.3 is available for download in the NVIDIA Developer Zone. Once tlt-converter is downloaded, follow the instructions below to generate a TensorRT engine:

1. Install the OpenSSL package: sudo apt-get install libssl-dev
2. Run tlt-converter using the sample command below to generate the engine:

tlt-converter -k key                                    \
              -d 3,512,512                              \
              -o output_cov/Sigmoid,output_bbox/BiasAdd \
              -e resnet10_kitti_multiclass_v1.engine    \
              -m 16                                     \
              -t fp32                                   \
              resnet18_detector.etlt
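
DeepStream consumes the model through the nvinfer element's configuration file, which can reference either the generated engine or the encoded .etlt model. The following is a minimal sketch using DeepStream 4.x nvinfer property names; the file names, label file, and class count are assumptions for this example:

[property]
gpu-id=0
net-scale-factor=0.0039215686
# TLT-encoded model plus the key used during export (assumed names)
tlt-encoded-model=resnet18_detector.etlt
tlt-model-key=key
model-engine-file=resnet10_kitti_multiclass_v1.engine
labelfile-path=labels.txt
input-dims=3;512;512;0
uff-input-blob-name=input_1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
batch-size=1
network-mode=0   # 0: FP32, 1: INT8, 2: FP16
num-detected-classes=3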

Training a Classifier

This process is quite similar to training a detector (the example above); sample commands and configuration files are shown below.

Classification spec file:

model_config {

  # Model architecture can be chosen from:
  # ['resnet', 'vgg', 'googlenet', 'alexnet', 'mobilenet_v1', 'mobilenet_v2', 'squeezenet']

  arch: "squeezenet"

  # for resnet --> n_layers can be [10, 18, 50]
  # for vgg --> n_layers can be [16, 19]

  # n_layers: 18 # Only relevant for resnet and vgg
  use_bias: True
  use_batch_norm: True
  all_projections: True
  use_pooling: False
  freeze_bn: False 
  freeze_blocks: 0 # When using pretrained weights, not all layers need to be retrained
  freeze_blocks: 1
  freeze_blocks: 2
  freeze_blocks: 3
  freeze_blocks: 4
  freeze_blocks: 5
  freeze_blocks: 6

  # image size should be "3, X, Y", where X,Y >= 16
  input_image_size: "3,112,112"
}

eval_config {
  eval_dataset_path: "test"
  model_path: "results/weights/squeezenet_080.tlt" # The model to evaluate has to be specified here
  top_k: 3 # A prediction counts as correct if the true class is among the top 3 (in this case)
  batch_size: 256
  n_workers: 8

}

train_config {
  train_dataset_path: "train"
  val_dataset_path: "val"
  pretrained_model_path: "tlt_squeezenet_classification_v1/squeezenet.hdf5"
  # optimizer can be chosen from ['adam', 'sgd']

  optimizer: "sgd"
  batch_size_per_gpu: 16
  n_epochs: 80
  n_workers: 16

  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005

  }

  # learning_rate

  lr_config {

    # "step" and "soft_anneal" are supported.

    scheduler: "soft_anneal"

    # "soft_anneal" stands for soft annealing learning rate scheduler.
    # the following 4 parameters should be specified if "soft_anneal" is used.
    learning_rate: 0.005
    soft_start: 0.056
    annealing_points: "0.3, 0.6, 0.8"
    annealing_divider: 10
    # "step" stands for step learning rate scheduler.
    # the following 3 parameters should be specified if "step" is used.
    # learning_rate: 0.006
    # step_size: 10
    # gamma: 0.1
  }
}

Directory Structure:

├── test
│   ├── Abyssinian
│   ├── american_bulldog
│   ├── american_pit_bull_terrier
│   ...
├── train
│   ├── Abyssinian
│   ├── american_bulldog
│   ├── american_pit_bull_terrier
│   ...
├── val
│   ├── Abyssinian
│   ├── american_bulldog
│   ├── american_pit_bull_terrier
│   ...
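
Each of the three splits contains one sub-directory per class holding that class's images. If the raw images are grouped per class under a single directory, a layout like this can be produced with a short shell script; the following is a sketch assuming the source tree is raw/<class>/ and an 80/10/10 split:

#!/bin/bash
# Sketch: split a per-class image tree under raw/ into train/val/test (80/10/10).
for dir in raw/*/; do
    class=$(basename "$dir")
    mkdir -p "train/$class" "val/$class" "test/$class"
    files=("$dir"*)
    n=${#files[@]}
    for i in "${!files[@]}"; do
        if   (( i < n * 80 / 100 )); then cp "${files[$i]}" "train/$class/"
        elif (( i < n * 90 / 100 )); then cp "${files[$i]}" "val/$class/"
        else                              cp "${files[$i]}" "test/$class/"
        fi
    done
done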

Training command:

tlt-train classification --gpus 1 -k key -r results -e pets_classification.json

Evaluating command:

tlt-evaluate classification -e pets_classification.json -k key

Running Inference:

tlt-infer classification -m results/weights/squeezenet_080.tlt -i Beagle.jpg -k key -cm results/classmap.json

