GstInference - Supported backends - NCSDK
Make sure you also check GstInference's companion project: R2Inference
The Intel® Movidius™ Neural Compute SDK (Intel® Movidius™ NCSDK) enables deployment of deep neural networks (DNNs) on compatible devices such as the Intel® Movidius™ Neural Compute Stick. The NCSDK includes a set of software tools to compile, profile, and validate DNNs, as well as C/C++ and Python APIs for application development.
The NCSDK has two general uses:
- Profiling, tuning, and compiling DNN models.
- Prototyping user applications that run accelerated on neural compute device hardware, using the NCAPI.
Installation
You can install the NCSDK directly on a Linux system, in a Docker container, on a virtual machine, or in a Python virtual environment. All the possible installation paths are documented in the official installation guide.
We also provide an installation guide with troubleshooting on the Intel Movidius Installation wiki page.
Tools
mvNCCheck
Checks the validity of a Caffe or TensorFlow model on a neural compute device. The check runs an inference both on the device and in software, then compares the results to determine whether the network passes or fails. This tool works best with image classification networks. You can check all the available options in the official documentation.
For example, let's test the GoogLeNet Caffe model downloaded by the ncappzoo repo:
mvNCCheck -w bvlc_googlenet.caffemodel -i ../../data/images/nps_electric_guitar.png -s 12 -id 546 deploy.prototxt -S 255 -M 110
- -w indicates the weights file
- -i the input image
- -s the number of shaves
- -id the expected label id for the input image (you can find the id for any imagenet model here)
- -S is the scaling factor applied to the input
- -M is the mean subtracted after scaling
Most of these parameters are available from the model documentation. The command produces the following result:
Blob generated
USB: Transferring Data...
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Result: (1000,)
1) 546 0.99609
2) 402 0.0038853
3) 420 8.9228e-05
4) 327 0.0
5) 339 0.0
Expected: (1000,)
1) 546 0.99609
2) 402 0.0039177
3) 420 9.0837e-05
4) 889 1.2875e-05
5) 486 5.3644e-06
------------------------------------------------------------
Obtained values
------------------------------------------------------------
Obtained Min Pixel Accuracy: 0.0032552085031056777% (max allowed=2%), Pass
Obtained Average Pixel Accuracy: 7.264380030846951e-06% (max allowed=1%), Pass
Obtained Percentage of wrong values: 0.0% (max allowed=0%), Pass
Obtained Pixel-wise L2 error: 0.00011369892179413199% (max allowed=1%), Pass
Obtained Global Sum Difference: 7.236003875732422e-05
------------------------------------------------------------
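When mvNCCheck runs as part of an automated validation step, the summary above can be turned into structured data. The following is a minimal sketch, assuming the report lines keep exactly the format shown above. The `parse_check_report` helper is hypothetical and not part of the NCSDK, and the `Fail` verdict string is an assumption, since only `Pass` appears in the sample output.

```python
import re

# Matches lines such as:
#   Obtained Min Pixel Accuracy: 0.0032% (max allowed=2%), Pass
# The "Fail" alternative is an assumption; the sample output only shows "Pass".
CHECK_LINE = re.compile(
    r"^Obtained (?P<metric>.+?): (?P<value>[0-9.eE+-]+)%"
    r" \(max allowed=(?P<limit>[0-9.eE+-]+)%\), (?P<verdict>Pass|Fail)$"
)


def parse_check_report(text):
    """Return {metric: (value, limit, passed)} for every thresholded metric."""
    results = {}
    for line in text.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match:
            results[match["metric"]] = (
                float(match["value"]),
                float(match["limit"]),
                match["verdict"] == "Pass",
            )
    return results


# Summary lines copied from the mvNCCheck output above.
SAMPLE = """\
Obtained Min Pixel Accuracy: 0.0032552085031056777% (max allowed=2%), Pass
Obtained Average Pixel Accuracy: 7.264380030846951e-06% (max allowed=1%), Pass
Obtained Percentage of wrong values: 0.0% (max allowed=0%), Pass
Obtained Pixel-wise L2 error: 0.00011369892179413199% (max allowed=1%), Pass
Obtained Global Sum Difference: 7.236003875732422e-05
"""

report = parse_check_report(SAMPLE)
all_passed = all(passed for _, _, passed in report.values())
print("network passes" if all_passed else "network fails")
```

The Global Sum Difference line carries no threshold, so the sketch skips it on purpose and only reports the four metrics that have a pass/fail verdict.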
mvNCCompile
Compiles a network and weights files from Caffe or TensorFlow models into a graph file that is compatible with the NCAPI.
For example, given a Caffe model (bvlc_googlenet.caffemodel) and a network description (deploy.prototxt):
mvNCCompile -w bvlc_googlenet.caffemodel -s 12 deploy.prototxt
This command outputs the graph and output_expected.npy files, which are used later by the API.
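If the compile step needs to be scripted, for instance to rebuild graphs for several models, the command line can be assembled from Python. This is only a sketch: `build_compile_cmd` and `compile_graph` are hypothetical helpers, and they use only the `-w` and `-s` flags that appear on this page.

```python
import shutil
import subprocess


def build_compile_cmd(network, weights=None, shaves=None):
    """Assemble an mvNCCompile argument list from the flags shown on this page."""
    cmd = ["mvNCCompile"]
    if weights is not None:
        cmd += ["-w", weights]
    if shaves is not None:
        cmd += ["-s", str(shaves)]
    cmd.append(network)
    return cmd


def compile_graph(cmd):
    """Run the command if the tool is installed; return False when it is not."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


cmd = build_compile_cmd(
    "deploy.prototxt", weights="bvlc_googlenet.caffemodel", shaves=12
)
print(" ".join(cmd))
```

Calling `compile_graph(cmd)` from the directory that holds the model files would then run the same command as the manual example.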
mvNCProfile
Compiles a network, runs it on a connected neural compute device, and outputs profiling information to the terminal and to an HTML file. The profiling data contains the per-layer performance and execution time of the model. The HTML version of the report also contains a graphical representation of the model. For example, to profile the GoogLeNet network:
mvNCProfile deploy.prototxt -s 12
The output looks like:
mvNCProfile v02.00, Copyright @ Intel Corporation 2017
****** WARNING: using empty weights ******
Layer inception_3b/1x1 forced to im2col_v2, because its output is used in concat
/usr/local/bin/ncsdk/Controllers/FileIO.py:65: UserWarning: You are using a large type. Consider reducing your data sizes for best performance
Blob generated
USB: Transferring Data...
Time to Execute : 115.95 ms
USB: Myriad Execution Finished
Time to Execute : 98.03 ms
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Network Summary
Detailed Per Layer Profile
# Name MFLOPs Bandwidth (MB/s) Time (ms)
=======================================================================
0 data 0.0 55877.1 0.005
1 conv1/7x7_s2 236.0 2453.0 5.745
2 pool1/3x3_s2 1.8 1346.8 1.137
3 pool1/norm1 0.0 711.3 0.538
4 conv2/3x3_reduce 25.7 471.6 0.828
5 conv2/3x3 693.6 305.9 11.957
6 conv2/norm2 0.0 771.6 1.488
7 pool2/3x3_s2 1.4 1403.3 0.818
8 inception_3a/1x1 19.3 554.6 0.560
9 inception_3a/3x3_reduce 28.9 458.3 0.703
10 inception_3a/3x3 173.4 319.2 4.716
11 inception_3a/5x5_reduce 4.8 1035.8 0.283
12 inception_3a/5x5 20.1 716.0 0.872
13 inception_3a/pool 1.4 648.5 0.443
14 inception_3a/pool_proj 9.6 657.0 0.455
15 inception_3b/1x1 51.4 446.0 0.999
16 inception_3b/3x3_reduce 51.4 445.1 1.001
17 inception_3b/3x3 346.8 261.0 8.228
18 inception_3b/5x5_reduce 12.8 879.9 0.453
19 inception_3b/5x5 120.4 536.8 2.510
20 inception_3b/pool 1.8 678.7 0.564
21 inception_3b/pool_proj 25.7 631.2 0.656
22 pool3/3x3_s2 0.8 1213.8 0.591
23 inception_4a/1x1 36.1 364.0 0.977
24 inception_4a/3x3_reduce 18.1 490.3 0.545
25 inception_4a/3x3 70.4 306.0 2.187
26 inception_4a/5x5_reduce 3.0 763.2 0.254
27 inception_4a/5x5 7.5 455.1 0.414
28 inception_4a/pool 0.8 604.6 0.297
29 inception_4a/pool_proj 12.0 613.0 0.389
30 inception_4b/1x1 32.1 349.6 0.995
31 inception_4b/3x3_reduce 22.5 385.6 0.780
32 inception_4b/3x3 88.5 280.9 2.888
33 inception_4b/5x5_reduce 4.8 576.7 0.373
34 inception_4b/5x5 15.1 339.7 0.885
35 inception_4b/pool 0.9 617.8 0.310
36 inception_4b/pool_proj 12.8 579.5 0.438
37 inception_4c/1x1 25.7 415.5 0.762
38 inception_4c/3x3_reduce 25.7 410.3 0.771
39 inception_4c/3x3 115.6 288.2 3.462
40 inception_4c/5x5_reduce 4.8 574.7 0.374
41 inception_4c/5x5 15.1 339.7 0.885
42 inception_4c/pool 0.9 615.3 0.311
43 inception_4c/pool_proj 12.8 577.3 0.440
44 inception_4d/1x1 22.5 382.9 0.786
45 inception_4d/3x3_reduce 28.9 489.2 0.679
46 inception_4d/3x3 146.3 402.9 2.981
47 inception_4d/5x5_reduce 6.4 728.9 0.305
48 inception_4d/5x5 20.1 408.5 0.979
49 inception_4d/pool 0.9 629.5 0.304
50 inception_4d/pool_proj 12.8 630.8 0.403
51 inception_4e/1x1 53.0 297.7 1.531
52 inception_4e/3x3_reduce 33.1 277.0 1.294
53 inception_4e/3x3 180.6 290.3 4.902
54 inception_4e/5x5_reduce 6.6 492.8 0.466
55 inception_4e/5x5 40.1 378.6 1.322
56 inception_4e/pool 0.9 633.0 0.312
57 inception_4e/pool_proj 26.5 446.8 0.731
58 pool4/3x3_s2 0.4 1245.4 0.250
59 inception_5a/1x1 20.9 616.4 0.786
60 inception_5a/3x3_reduce 13.0 569.7 0.582
61 inception_5a/3x3 45.2 570.7 1.786
62 inception_5a/5x5_reduce 2.6 329.2 0.391
63 inception_5a/5x5 10.0 459.6 0.601
64 inception_5a/pool 0.4 531.7 0.146
65 inception_5a/pool_proj 10.4 514.9 0.546
66 inception_5b/1x1 31.3 607.0 1.133
67 inception_5b/3x3_reduce 15.7 612.0 0.625
68 inception_5b/3x3 65.0 606.1 2.366
69 inception_5b/5x5_reduce 3.9 375.0 0.410
70 inception_5b/5x5 15.1 475.0 0.866
71 inception_5b/pool 0.4 531.7 0.146
72 inception_5b/pool_proj 10.4 513.7 0.547
73 pool5/7x7_s1 0.1 405.5 0.236
74 loss3/classifier 0.0 2559.7 0.764
75 prob 0.0 10.0 0.192
---------------------------------------------------------------------------------------------
Total inference time 93.66
---------------------------------------------------------------------------------------------
Generating Profile Report 'output_report.html'...
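The per-layer table is easier to act on once it is sorted by execution time. The sketch below assumes each row has the five whitespace-separated columns shown above (index, layer name, MFLOPs, bandwidth, time). The helper names `parse_profile` and `slowest` are hypothetical, and the sample rows are a subset copied from the output above.

```python
import re

# One profile row: index, layer name, MFLOPs, bandwidth (MB/s), time (ms).
ROW = re.compile(r"^(\d+)\s+(\S+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)$")


def parse_profile(text):
    """Return one dict per layer row; non-row lines are ignored."""
    layers = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            layers.append({
                "index": int(match.group(1)),
                "name": match.group(2),
                "mflops": float(match.group(3)),
                "bandwidth_mb_s": float(match.group(4)),
                "time_ms": float(match.group(5)),
            })
    return layers


def slowest(layers, count=3):
    """Return the `count` layers with the largest execution time."""
    return sorted(layers, key=lambda layer: layer["time_ms"], reverse=True)[:count]


# A few rows copied from the mvNCProfile output above.
SAMPLE = """\
0 data 0.0 55877.1 0.005
1 conv1/7x7_s2 236.0 2453.0 5.745
5 conv2/3x3 693.6 305.9 11.957
17 inception_3b/3x3 346.8 261.0 8.228
75 prob 0.0 10.0 0.192
Total inference time 93.66
"""

layers = parse_profile(SAMPLE)
for layer in slowest(layers):
    print(f'{layer["name"]:<20} {layer["time_ms"]:>8.3f} ms')
```

In the sample rows, the `conv2/3x3` layer dominates with 11.957 ms, which makes it the first candidate to look at when tuning the model.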
Options
You can find the full documentation of the C API here and the Python API here. GstInference uses only the C API, and R2Inference takes care of devices, graphs, models and FIFOs. Because of this, we only cover the options that you can change when using the C API through R2Inference.
The following syntax is used to change backend options on GstInference plugins:
backend::<property>
For example, to change the NCSDK API log level of the googlenet plugin, run the pipeline like this:
gst-launch-1.0 \
googlenet name=net model-location=/root/r2inference/examples/r2i/ncsdk/graph_googlenet backend=ncsdk backend::log-level=1 \
videotestsrc ! tee name=t \
t. ! queue ! videoconvert ! videoscale ! net.sink_model \
t. ! queue ! net.sink_bypass \
net.src_bypass ! fakesink
The backend::log-level=1 section of the pipeline sets the NC_RW_LOG_LEVEL option of the NCSDK C API to 1.
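When several backend options have to be set, or the pipeline is generated by a script, the `backend::<property>` arguments can be rendered from a dictionary. This is a minimal sketch that only builds the command string. `backend_args` and `googlenet_element` are hypothetical helpers, and the element name, model path and pads are the ones from the example pipeline.

```python
def backend_args(options):
    """Render {"log-level": 1} as ["backend::log-level=1"]."""
    return [f"backend::{name}={value}" for name, value in options.items()]


def googlenet_element(model_location, backend, options):
    """Build the googlenet element description with its backend options."""
    parts = [
        "googlenet",
        "name=net",
        f"model-location={model_location}",
        f"backend={backend}",
    ]
    parts += backend_args(options)
    return " ".join(parts)


element = googlenet_element(
    "/root/r2inference/examples/r2i/ncsdk/graph_googlenet",
    "ncsdk",
    {"log-level": 1},
)

# Same branches as the example pipeline, joined into one command line.
pipeline = " ".join([
    "gst-launch-1.0",
    element,
    "videotestsrc ! tee name=t",
    "t. ! queue ! videoconvert ! videoscale ! net.sink_model",
    "t. ! queue ! net.sink_bypass",
    "net.src_bypass ! fakesink",
])
print(pipeline)
```

The printed string is equivalent to the multi-line example once the shell line continuations are removed.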
Device Options
All the device options are read-only.
Property | C API Counterpart | Value | Description |
---|---|---|---|
thermal-throttling-level | NC_RO_THERMAL_THROTTLING_LEVEL | Integer (0,1,2) | The current thermal throttling level of the device. |
device-state | NC_RO_DEVICE_STATE | Integer (0,1,2,3) | The current state of the device. |
current-memory-used | NC_RO_DEVICE_CURRENT_MEMORY_USED | Integer | Current memory used on the device. |
memory-size | NC_RO_DEVICE_MEMORY_SIZE | Integer | Total memory available on the device. |
max-fifo-num | NC_RO_DEVICE_MAX_FIFO_NUM | Integer | Max number of fifos. |
allocated-fifo-num | NC_RO_DEVICE_ALLOCATED_FIFO_NUM | Integer | Number of fifos currently allocated. |
max-graph-num | NC_RO_DEVICE_MAX_GRAPH_NUM | Integer | Max number of graphs. |
allocated-graph-num | NC_RO_ALLOCATED_GRAPH_NUM | Integer | Number of graphs currently allocated. |
option-class-limit | NC_RO_DEVICE_OPTION_CLASS_LIMIT | Integer | Highest option class supported. |
device-name | NC_RO_DEVICE_NAME | String | Device name. |
Fifo Options
Most of the R/W options on the FIFO can only be modified between creation and allocation. R2Inference does both in a single method (Engine->Start()), so these options cannot be written. R2Inference also fixes those options to the values our implementation requires, so they are not exposed on the plugin.
Global Options
Pay special attention to the log level enumeration, because its ordering is counterintuitive: 1 is actually the highest log level, 4 is the lowest, and 0 is the default.
Property | C API Counterpart | Value | Description |
---|---|---|---|
log-level | NC_RW_LOG_LEVEL | Integer | NCSDK debug log level from the ncLogLevel_t enum. |
Graph Options
Property | C API Counterpart | Value | Description |
---|---|---|---|
graph-state | NC_RO_GRAPH_STATE | Integer | The current state of the graph from the ncGraphState_t enum. |
graph-input-count | NC_RO_GRAPH_INPUT_TENSOR_DESCRIPTORS | Integer | Array of graph inputs. Returns the size of the array instead of the array itself. |
graph-output-count | NC_RO_GRAPH_OUTPUT_TENSOR_DESCRIPTORS | Integer | Array of graph outputs. Returns the size of the array instead of the array itself. |
graph-debug-info | NC_RO_GRAPH_DEBUG_INFO | String | Debug information. |
graph-name | NC_RO_GRAPH_NAME | String | Graph name. |
graph-option-class-limit | NC_RO_GRAPH_OPTION_CLASS_LIMIT | Integer | The highest option class supported. |
graph-version | NC_RO_GRAPH_VERSION | String | The version ([major, minor]) of the compiled graph. |