NVIDIA Transfer Learning Toolkit


Introduction to NVIDIA Transfer Learning Toolkit

This guide shows how to train a model with the NVIDIA Transfer Learning Toolkit (TLT) and how to deploy it to DeepStream.

Configuring Docker

Note: If you installed the image from JetPack, this step is not necessary.

Troubleshooting the Docker configuration

  • Error: docker: Error response from daemon: could not select device driver "" with capabilities: gpu.
    • Solution: NVIDIA drivers might need to be reinstalled. Follow the instructions in the collabnix blog on New Docker CLI API Support for NVIDIA GPUs to reinstall them. You might need to restart the device before connecting.
  • Error: ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
    • Solution: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.X/compat/

Downloading the container

  • Create an account in NVIDIA's NGC
  • Go to Account -> Setup, generate an API Key, and follow the Docker instructions shown there. You might need to run the following commands so that your user can run docker and store the login credentials without sudo:
sudo usermod -aG docker $USER
newgrp docker
docker pull nvcr.io/nvidia/tlt-streamanalytics:v1.0.1_py2
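
For reference, the NGC login that those instructions perform looks like the following; the username is literally $oauthtoken, and the password is the API key you generated (shown here as a placeholder):

docker login nvcr.io
# Username: $oauthtoken
# Password: <your NGC API key>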

Running the container

  • Create a local directory to host the dataset and other outputs so that they persist outside the container.
  mkdir -p ~/work/tlt-experiments
  cd ~/work/tlt-experiments
  • Run the container
  docker run --runtime=nvidia -it -v ~/work/tlt-experiments:/workspace/tlt-experiments nvcr.io/nvidia/tlt-streamanalytics:v1.0.1_py2
    • Adjust the image tag if you are using a different version of the container.
    • Docker might need to be reinstalled with the following commands to correctly use NVIDIA software:
sudo apt purge docker*
sudo apt install docker.io

Downloading a pre-trained model

List the available models:

ngc registry model list "nvidia/iva/tlt_*_classification"

Download a model:

ngc registry model download-version nvidia/iva/[model name] 

For example:

ngc registry model download-version nvidia/iva/tlt_mobilenet_v2_classification:1 

Training a model

First, convert the dataset to TFRecords:

tlt-dataset-convert -d convert_config.txt -o tfrecords/tfrecord
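
The conversion spec (convert_config.txt above) describes where the images and labels live and how to partition the data. The following is a minimal sketch for a KITTI-format dataset; the paths and split values are assumptions for this example:

kitti_config {
    root_directory_path: "/workspace/tlt-experiments/data"
    image_dir_name: "training/image_2"
    label_dir_name: "training/label_2"
    image_extension: ".png"
    partition_mode: "random"
    num_partitions: 2
    val_split: 20
    num_shards: 10
}
image_directory_path: "/workspace/tlt-experiments/data"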

Training is driven by an experiment specification file (experiment_config.json in the commands below) that ties together the dataset, model, and training parameters. The following is a good starting point:

dataset_config {
    data_sources: {
	tfrecords_path: "/workspace/tlt-experiments/tfrecords/*"
	image_directory_path: "/workspace/tlt-experiments/"
    }
    image_extension: "png"
    target_class_mapping {
	key: "car"
	value: "car"
    }
    target_class_mapping {
	key: "automobile"
	value: "car"
    }
    target_class_mapping {
	key: "heavy_truck"
	value: "car"
    }
    target_class_mapping {
	key: "person"
	value: "pedestrian"
    }
    target_class_mapping {
	key: "rider"
	value: "cyclist"
    }
    validation_fold: 0
}

model_config {
    arch: "resnet"
    pretrained_model_file: "pre-trained-models/tlt_resnet18_detectnet_v2_v1/resnet18.hdf5"
    freeze_blocks: 0
    freeze_blocks: 1
    all_projections: True
    num_layers: 18
    use_pooling: False
    use_batch_norm: True
    dropout_rate: 0.1
    training_precision: {
	backend_floatx: FLOAT32
    }
    objective_set: {
	cov {}
	bbox {
	    scale: 35.0
	    offset: 0.5
	}
    }
}

evaluation_config {
    average_precision_mode: INTEGRATE
    validation_period_during_training: 10
    first_validation_epoch: 1
    minimum_detection_ground_truth_overlap {
	key: "car"
	value: 0.7
    }
    minimum_detection_ground_truth_overlap {
	key: "pedestrian"
	value: 0.5
    }
    minimum_detection_ground_truth_overlap {
	key: "cyclist"
	value: 0.5
    }
    evaluation_box_config {
	key: "car"
	value {
	    minimum_height: 4
	    maximum_height: 9999
	    minimum_width: 4
	    maximum_width: 9999
	}
    }
    evaluation_box_config {
	key: "pedestrian"
	value {
	    minimum_height: 4
	    maximum_height: 9999
	    minimum_width: 4
	    maximum_width: 9999
	}
    }
    evaluation_box_config {
	key: "cyclist"
	value {
	    minimum_height: 4
	    maximum_height: 9999
	    minimum_width: 4
	    maximum_width: 9999
	}
    }
}

bbox_rasterizer_config {
    target_class_config {
	key: "car"
	value: {
	    cov_center_x: 0.5
	    cov_center_y: 0.5
	    cov_radius_x: 0.4
	    cov_radius_y: 0.4
	    bbox_min_radius: 1.0
	}
    }
    target_class_config {
	key: "cyclist"
	value: {
	    cov_center_x: 0.5
	    cov_center_y: 0.5
	    cov_radius_x: 0.4
	    cov_radius_y: 0.4
	    bbox_min_radius: 1.0
	}
    }
    target_class_config {
	key: "pedestrian"
	value: {
	    cov_center_x: 0.5
	    cov_center_y: 0.5
	    cov_radius_x: 0.4
	    cov_radius_y: 0.4
	    bbox_min_radius: 1.0
	}
    }
    deadzone_radius: 0.67
}


postprocessing_config {
    target_class_config {
	key: "car"
	value: {
	    clustering_config {
		coverage_threshold: 0.005
		dbscan_eps: 0.15
		dbscan_min_samples: 0.05
		minimum_bounding_box_height: 20
	    }
	}
    }
    target_class_config {
	key: "cyclist"
	value: {
	    clustering_config {
		coverage_threshold: 0.005
		dbscan_eps: 0.15
		dbscan_min_samples: 0.05
		minimum_bounding_box_height: 20
	    }
	}
    }
    target_class_config {
	key: "pedestrian"
	value: {
	    clustering_config {
		coverage_threshold: 0.005
		dbscan_eps: 0.15
		dbscan_min_samples: 0.05
		minimum_bounding_box_height: 20
	    }
	}
    }
}


cost_function_config {
    target_classes {
	name: "car"
	class_weight: 1.0
	coverage_foreground_weight: 0.05
	objectives {
	    name: "cov"
	    initial_weight: 1.0
	    weight_target: 1.0
	}
	objectives {
	    name: "bbox"
	    initial_weight: 10.0
	    weight_target: 10.0
	}
    }
    target_classes {
	name: "cyclist"
	class_weight: 1.0
	coverage_foreground_weight: 0.05
	objectives {
	    name: "cov"
	    initial_weight: 1.0
	    weight_target: 1.0
	}
	objectives {
	    name: "bbox"
	    initial_weight: 10.0
	    weight_target: 1.0
	}
    }
    target_classes {
	name: "pedestrian"
	class_weight: 1.0
	coverage_foreground_weight: 0.05
	objectives {
	    name: "cov"
	    initial_weight: 1.0
	    weight_target: 1.0
	}
	objectives {
	    name: "bbox"
	    initial_weight: 10.0
	    weight_target: 10.0
	}
    }
    enable_autoweighting: True
    max_objective_weight: 0.9999
    min_objective_weight: 0.0001
}


training_config {
    batch_size_per_gpu: 4
    num_epochs: 240
    learning_rate {
	soft_start_annealing_schedule {
	    min_learning_rate: 5e-6
	    max_learning_rate: 5e-4
	    soft_start: 0.1
	    annealing: 0.7
	}
    }
    regularizer {
	type: L1
	weight: 3e-9
    }
    optimizer {
	adam {
	    epsilon: 1e-08
	    beta1: 0.9
	    beta2: 0.999
	}
    }
    cost_scaling {
	enabled: False
	initial_exponent: 20.0
	increment: 0.005
	decrement: 1.0
    }
    checkpoint_interval: 1
}

# Sample augmentation config
augmentation_config {
    preprocessing {
	output_image_width: 960
	output_image_height: 544
	output_image_channel: 3
	min_bbox_width: 1.0
	min_bbox_height: 1.0
    }
    spatial_augmentation {

	hflip_probability: 0.5
	vflip_probability: 0.0
	zoom_min: 1.0
	zoom_max: 1.0
	translate_max_x: 8.0
	translate_max_y: 8.0
    }
    color_augmentation {
	color_shift_stddev: 0.0
	hue_rotation_max: 25.0
	saturation_shift_max: 0.2
	contrast_scale_max: 0.1
	contrast_center: 0.5
    }
}


  • Run the training:
tlt-train detectnet_v2 --gpus 1 -r results -e experiment_config.json -k key

Resuming Training

If training is interrupted, re-running the same tlt-train command with the same results directory (-r) should resume from the most recent checkpoint; checkpoints are written according to the checkpoint_interval parameter in training_config.

Evaluating the model

  • Run the evaluation:
tlt-evaluate detectnet_v2  -e experiment_config.json -k key -m $MODEL_FILE

This is the same evaluation that runs every N epochs during training, as controlled by the following parameters in the experiment configuration file:

    validation_period_during_training: 10
    first_validation_epoch: 1

Running Inference

tlt-infer detectnet_v2 [-h] -m $MODEL_FILE -i $INPUT_IMAGE_DIR -o $OUTPUT_IMAGE_DIR
                       -bs $BATCH_SIZE -cp $CLUSTER_FILE -k key
                       -lw $LINE_WIDTH

The cluster params file should follow this structure:

{
    "dbscan_criterion": "IOU",
    "dbscan_eps": {
        "cyclist": 0.25,
        "pedestrian": 0.35,
        "default": 0.15,
        "car": 0.3
    },
    "dbscan_min_samples": {
        "cyclist": 0.05,
        "pedestrian": 0.05,
        "default": 0.0,
        "car": 0.05
    },
    "min_cov_to_cluster": {
        "cyclist": 0.005,
        "pedestrian": 0.005,
        "default": 0.005,
        "car": 0.005
    },
    "min_obj_height": {
        "cyclist": 4,
        "pedestrian": 4,
        "car": 4,
        "default": 2
    },
    "target_classes": ["car", "cyclist", "pedestrian"],
    "confidence_th": {
        "pedestrian": 0.6,
        "cyclist": 0.6,
        "car": 0.6
    },
    "confidence_model": {
        "car": { "kind": "aggregate_cov"},
        "pedestrian": { "kind": "aggregate_cov"},
        "cyclist": { "kind": "aggregate_cov"},
        "default": { "kind": "aggregate_cov"}
    },
    "output_map": {
        "car" : "car",
        "cyclist" : "cyclist",
        "pedestrian" : "pedestrian"
    },
    "color": {
        "car": "green",
        "cyclist": "magenta",
        "pedestrian": "cyan",
        "default": "blue"
    },
    "postproc_classes": ["car", "cyclist", "pedestrian"],
    "image_height": 384,
    "image_width": 1248,
    "stride": 16
}
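
Putting it together, an inference run over a directory of test images might look like the following; the file names and batch size are placeholders for this example:

tlt-infer detectnet_v2 -m results/model.step-77844.tlt \
                       -i test_images \
                       -o inferred_images \
                       -bs 16 \
                       -cp cluster_params.json \
                       -k key \
                       -lw 3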

Pruning the model

Not all weights in a network contribute equally to its accuracy. By pruning the network, the less significant weights can be removed, speeding up inference while having only a small impact on accuracy. In a sample run with KITTI, the number of parameters was reduced from 11,555,983 to 743,751 while the car precision only dropped from 73.0718 to 73.0707.

# -eq: arithmetic_mean, geometric_mean, union or intersection;
#      useful for MobileNet and ResNet (default: union)
# -pg, -pth: pruning granularity and threshold (optional)
# -nf: minimum number of filters to keep per layer (optional)
# -el: layers to exclude, separated by spaces; can be left empty (optional)
tlt-prune -pm $MODEL \
          -o $OUTPUT_DIRECTORY \
          -eq $EQUALIZATION_CRITERION \
          -pg $PRUNING_GRANULARITY \
          -pth $PRUNING_THRESHOLD \
          -nf $MIN_FILTERS_PER_LAYER \
          -el $EXCLUDED_LAYERS \
          -k $KEY

Note: NVIDIA recommends adjusting the threshold so that the pruned model keeps within 10-20% of the parameters of the original, unpruned model.
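
For example, pruning the model trained above might look like the following; the paths and threshold are assumptions for this example:

tlt-prune -pm results/model.step-77844.tlt \
          -o results_pruned \
          -eq union \
          -pth 0.1 \
          -k key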

Re-training the model

In order to regain accuracy, NVIDIA recommends retraining the pruned model over the same dataset. To do this, use the tlt-train command as documented in Training a model, with an updated spec file that points to the newly pruned model as the pre-trained model file:

model_config {
    pretrained_model_file: "<path to the pruned model>"
    load_graph: true # Since pruning modifies the network, the graph must be reloaded
    ...
}

For detectnet_v2, it is important that the user set the load_graph option under model_config to true to import the pruned graph. All the other parameters may be retained in the spec file from the previous training.

Deploying to DeepStream

Generating the INT8 calibration file

Running networks in INT8 mode improves performance, but requires a calibration cache at engine creation time. The calibration cache is generated from a calibration tensor file when tlt-export is run with the --data_type flag set to int8. Pre-generating the calibration information and caching it removes the need to calibrate the model on the inference machine. Moving the calibration cache is usually much more convenient than moving the calibration tensor file, since it is a much smaller file and can be moved along with the exported model. Using the calibration cache also speeds up engine creation, since building the cache can take several minutes depending on the size of the tensor file and of the model itself. This can only be done for classification or detectnet_v2 models.

Generate the calibration tensor file from the training data (-m sets the number of batches to sample):

tlt-int8-tensorfile detectnet_v2 -e experiment_config.json -m 10 -o calibration.tensor

Exporting the model

tlt-export results/model.step-77844.tlt                     \
           -o resnet18_detector.etlt                        \
           --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
           -k key                                           \
           --input_dims 3,512,512                           \
           --max_workspace_size 1100000                     \
           --export_module detectnet_v2                     \
           --cal_data_file calibration.tensor               \
           --data_type int8                                 \
           --batches 10                                     \
           --cal_cache_file calibration.bin
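
If INT8 is not needed, the calibration-specific flags can simply be dropped; an FP32 export might look like the following sketch (same assumed file names as above):

tlt-export results/model.step-77844.tlt                     \
           -o resnet18_detector.etlt                        \
           --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
           -k key                                           \
           --input_dims 3,512,512                           \
           --export_module detectnet_v2                     \
           --data_type fp32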

Generating the TensorRT engine

For the Jetson platform, the tlt-converter for JetPack 4.2.2 and JetPack 4.2.3 / 4.3 is available for download in the NVIDIA Developer Zone. Once tlt-converter is downloaded, follow the instructions below to generate a TensorRT engine:

1. Install the OpenSSL package: sudo apt-get install libssl-dev
2. Run tlt-converter using the sample command below to generate the engine:

tlt-converter -k key                                    \
              -d 3,512,512                              \
              -o output_cov/Sigmoid,output_bbox/BiasAdd \
              -e resnet10_kitti_multiclass_v1.engine    \
              -m 16                                     \
              -t fp32                                   \
              resnet18_detector.etlt
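
DeepStream consumes the model through the nvinfer element's configuration file, which can reference either the generated engine or the encoded .etlt model. The following is a minimal sketch using DeepStream 4.x nvinfer property names; the file names, label file, and class count are assumptions for this example:

[property]
gpu-id=0
net-scale-factor=0.0039215686
# TLT-encoded model plus the key used during export (assumed names)
tlt-encoded-model=resnet18_detector.etlt
tlt-model-key=key
model-engine-file=resnet10_kitti_multiclass_v1.engine
labelfile-path=labels.txt
input-dims=3;512;512;0
uff-input-blob-name=input_1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
batch-size=1
network-mode=0   # 0: FP32, 1: INT8, 2: FP16
num-detected-classes=3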

Training a Classifier

This process is quite similar to training a detector (the example above); sample commands and configuration files are shown below.

Classification spec file:

model_config {

  # Model architecture can be chosen from:
  # ['resnet', 'vgg', 'googlenet', 'alexnet', 'mobilenet_v1', 'mobilenet_v2', 'squeezenet']

  arch: "squeezenet"

  # for resnet --> n_layers can be [10, 18, 50]
  # for vgg --> n_layers can be [16, 19]

  # n_layers: 18 # Only relevant for resnet and vgg
  use_bias: True
  use_batch_norm: True
  all_projections: True
  use_pooling: False
  freeze_bn: False 
  freeze_blocks: 0 # When using pretrained weights, not all layers need to be retrained
  freeze_blocks: 1
  freeze_blocks: 2
  freeze_blocks: 3
  freeze_blocks: 4
  freeze_blocks: 5
  freeze_blocks: 6

  # image size should be "3, X, Y", where X,Y >= 16
  input_image_size: "3,112,112"
}

eval_config {
  eval_dataset_path: "test"
  model_path: "results/weights/squeezenet_080.tlt" # The model to evaluate has to be specified here
  top_k: 3 # A prediction counts as correct if the true class is among the top 3 (in this case)
  batch_size: 256
  n_workers: 8

}

train_config {
  train_dataset_path: "train"
  val_dataset_path: "val"
  pretrained_model_path: "tlt_squeezenet_classification_v1/squeezenet.hdf5"
  # optimizer can be chosen from ['adam', 'sgd']

  optimizer: "sgd"
  batch_size_per_gpu: 16
  n_epochs: 80
  n_workers: 16

  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005

  }

  # learning_rate

  lr_config {

    # "step" and "soft_anneal" are supported.

    scheduler: "soft_anneal"

    # "soft_anneal" stands for soft annealing learning rate scheduler.
    # the following 4 parameters should be specified if "soft_anneal" is used.
    learning_rate: 0.005
    soft_start: 0.056
    annealing_points: "0.3, 0.6, 0.8"
    annealing_divider: 10
    # "step" stands for step learning rate scheduler.
    # the following 3 parameters should be specified if "step" is used.
    # learning_rate: 0.006
    # step_size: 10
    # gamma: 0.1
  }
}

Directory Structure:

├── test
│   ├── Abyssinian
│   ├── american_bulldog
│   ├── american_pit_bull_terrier
│   ...
├── train
│   ├── Abyssinian
│   ├── american_bulldog
│   ├── american_pit_bull_terrier
│   ...
├── val
│   ├── Abyssinian
│   ├── american_bulldog
│   ├── american_pit_bull_terrier
│   ...
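
Each of the three splits contains one sub-directory per class holding that class's images. If the raw images are grouped per class under a single directory, a layout like this can be produced with a short shell script; the following is a sketch assuming the source tree is raw/<class>/ and an 80/10/10 split:

#!/bin/bash
# Sketch: split a per-class image tree under raw/ into train/val/test (80/10/10).
for dir in raw/*/; do
    class=$(basename "$dir")
    mkdir -p "train/$class" "val/$class" "test/$class"
    files=("$dir"*)
    n=${#files[@]}
    for i in "${!files[@]}"; do
        if   (( i < n * 80 / 100 )); then cp "${files[$i]}" "train/$class/"
        elif (( i < n * 90 / 100 )); then cp "${files[$i]}" "val/$class/"
        else                              cp "${files[$i]}" "test/$class/"
        fi
    done
done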

Training command:

tlt-train classification --gpus 1 -k key -r results -e pets_classification.json

Evaluating command:

tlt-evaluate classification -e pets_classification.json -k key

Running Inference:

tlt-infer classification -m results/weights/squeezenet_080.tlt -i Beagle.jpg -k key -cm results/classmap.json

