Difference between revisions of "GstInference/Supported backends/NCSDK"
Revision as of 13:25, 20 December 2018
Make sure you also check GstInference's companion project: R2Inference
The Intel® Movidius™ Neural Compute SDK (Intel® Movidius™ NCSDK) enables deployment of deep neural networks on compatible devices such as the Intel® Movidius™ Neural Compute Stick. The NCSDK includes a set of software tools to compile, profile, and validate DNNs (Deep Neural Networks), as well as C/C++ and Python APIs for application development.
The NCSDK has two general uses:
- Profiling, tuning, and compiling DNN models.
- Prototyping user applications that run accelerated on neural compute device hardware, using the NCAPI.
Installation
You can install the NCSDK directly on a Linux system, in a Docker container, in a virtual machine, or in a Python virtual environment. All the possible installation paths are documented in the official installation guide (https://movidius.github.io/ncsdk/install.html).
We also provide an installation guide with troubleshooting on the Intel Movidius Installation wiki page.
Tools
mvNCCheck
Checks the validity of a Caffe or TensorFlow model on a neural compute device. The check runs an inference both on the device and in software, then compares the results to determine whether the network passes or fails. This tool works best with image classification networks. You can check all the available options in the official documentation.
For example, let's test the GoogLeNet Caffe model downloaded by the ncappzoo repo:
mvNCCheck -w bvlc_googlenet.caffemodel -i ../../data/images/nps_electric_guitar.png -s 12 -id 546 deploy.prototxt -S 255 -M 110
- -w indicates the weights file
- -i the input image
- -s the number of shaves
- -id the expected label id for the input image (you can find the id for any imagenet model here)
- -S is the scale factor applied to the input values
- -M is the mean subtracted after scaling
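To make the -S and -M options more concrete, here is a minimal Python sketch of the preprocessing they describe. It assumes the image is first loaded with pixel values in the 0 to 1 range, which is an assumption not stated on this page; the function name `preprocess` is hypothetical and is not part of the NCSDK.

```python
def preprocess(pixels, scale=1.0, mean=0.0):
    """Scale each pixel value, then subtract the mean (the order given for -S and -M)."""
    return [p * scale - mean for p in pixels]


# Assumption: pixel values were normalized to the 0..1 range when the image was loaded.
normalized = [0.0, 0.5, 1.0]

# Same values as the example command: -S 255 -M 110
print(preprocess(normalized, scale=255, mean=110))  # prints [-110.0, 17.5, 145.0]
```

With these values the network receives inputs in roughly the -110 to 145 range, which is why both options have to match what the model was trained with.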
Most of these parameters are available from the model documentation. The command produces the following result:
Blob generated
USB: Transferring Data...
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Result: (1000,)
1) 546 0.99609
2) 402 0.0038853
3) 420 8.9228e-05
4) 327 0.0
5) 339 0.0
Expected: (1000,)
1) 546 0.99609
2) 402 0.0039177
3) 420 9.0837e-05
4) 889 1.2875e-05
5) 486 5.3644e-06
------------------------------------------------------------
Obtained values
------------------------------------------------------------
Obtained Min Pixel Accuracy: 0.0032552085031056777% (max allowed=2%), Pass
Obtained Average Pixel Accuracy: 7.264380030846951e-06% (max allowed=1%), Pass
Obtained Percentage of wrong values: 0.0% (max allowed=0%), Pass
Obtained Pixel-wise L2 error: 0.00011369892179413199% (max allowed=1%), Pass
Obtained Global Sum Difference: 7.236003875732422e-05
------------------------------------------------------------
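The idea behind the check (run on the device, run in software, compare) can be sketched in Python using the top-5 values printed above. This is only an illustration: the error metric below (largest absolute difference as a percentage of the largest expected value) is an assumption and not necessarily the formula mvNCCheck uses, classes missing from a top-5 list are treated as 0.0, and `compare_outputs` is a hypothetical name.

```python
def compare_outputs(result, expected, expected_id):
    """Compare device output against the software reference.

    result, expected: dicts mapping class id -> probability.
    Returns (top-1 matches expected_id, max abs error as % of max expected value).
    """
    top_result = max(result, key=result.get)
    ids = set(result) | set(expected)
    # Assumption: a class that is absent from one list has probability 0.0 there.
    max_abs_err = max(abs(result.get(i, 0.0) - expected.get(i, 0.0)) for i in ids)
    error_pct = 100.0 * max_abs_err / max(expected.values())
    return top_result == expected_id, error_pct


# Top-5 values copied from the mvNCCheck output above.
result = {546: 0.99609, 402: 0.0038853, 420: 8.9228e-05, 327: 0.0, 339: 0.0}
expected = {546: 0.99609, 402: 0.0039177, 420: 9.0837e-05, 889: 1.2875e-05, 486: 5.3644e-06}

top1_ok, error_pct = compare_outputs(result, expected, expected_id=546)
print(top1_ok, error_pct < 2.0)  # prints True True
```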
mvNCCompile
Compiles a network and weights files from Caffe or TensorFlow models into a graph file that is compatible with the NCAPI.
For example, given a Caffe model (bvlc_googlenet.caffemodel) and a network description (deploy.prototxt):
mvNCCompile -w bvlc_googlenet.caffemodel -s 12 deploy.prototxt
This command outputs the graph and output_expected.npy files, which are used later by the API.
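If you want to drive the compilation from a script, the command line above can be assembled programmatically. The following Python sketch only builds and prints the command; `build_compile_command` is a hypothetical helper, and actually running the command assumes the NCSDK tools are installed and on the PATH.

```python
import shlex


def build_compile_command(network, weights, shaves):
    """Build the mvNCCompile argument list using only the flags shown above."""
    return ["mvNCCompile", "-w", weights, "-s", str(shaves), network]


command = build_compile_command("deploy.prototxt", "bvlc_googlenet.caffemodel", 12)
print(shlex.join(command))

# To execute it (assumes mvNCCompile is installed and on the PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.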
mvNCProfile
Compiles a network, runs it on a connected neural compute device, and outputs profiling information to the terminal and to an HTML file. The profiling data contains the per-layer performance and the execution time of the model. The HTML version of the report also contains a graphical representation of the model. For example, to profile the GoogLeNet network:
mvNCProfile deploy.prototxt -s 12
The output looks like:
mvNCProfile v02.00, Copyright @ Intel Corporation 2017
****** WARNING: using empty weights ******
Layer inception_3b/1x1 forced to im2col_v2, because its output is used in concat
/usr/local/bin/ncsdk/Controllers/FileIO.py:65: UserWarning: You are using a large type. Consider reducing your data sizes for best performance
Blob generated
USB: Transferring Data...
Time to Execute : 115.95 ms
USB: Myriad Execution Finished
Time to Execute : 98.03 ms
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Network Summary
Detailed Per Layer Profile
# Name MFLOPs Bandwidth (MB/s) time (ms)
=======================================================================
0 data 0.0 55877.1 0.005
1 conv1/7x7_s2 236.0 2453.0 5.745
2 pool1/3x3_s2 1.8 1346.8 1.137
3 pool1/norm1 0.0 711.3 0.538
4 conv2/3x3_reduce 25.7 471.6 0.828
5 conv2/3x3 693.6 305.9 11.957
6 conv2/norm2 0.0 771.6 1.488
7 pool2/3x3_s2 1.4 1403.3 0.818
8 inception_3a/1x1 19.3 554.6 0.560
9 inception_3a/3x3_reduce 28.9 458.3 0.703
10 inception_3a/3x3 173.4 319.2 4.716
11 inception_3a/5x5_reduce 4.8 1035.8 0.283
12 inception_3a/5x5 20.1 716.0 0.872
13 inception_3a/pool 1.4 648.5 0.443
14 inception_3a/pool_proj 9.6 657.0 0.455
15 inception_3b/1x1 51.4 446.0 0.999
16 inception_3b/3x3_reduce 51.4 445.1 1.001
17 inception_3b/3x3 346.8 261.0 8.228
18 inception_3b/5x5_reduce 12.8 879.9 0.453
19 inception_3b/5x5 120.4 536.8 2.510
20 inception_3b/pool 1.8 678.7 0.564
21 inception_3b/pool_proj 25.7 631.2 0.656
22 pool3/3x3_s2 0.8 1213.8 0.591
23 inception_4a/1x1 36.1 364.0 0.977
24 inception_4a/3x3_reduce 18.1 490.3 0.545
25 inception_4a/3x3 70.4 306.0 2.187
26 inception_4a/5x5_reduce 3.0 763.2 0.254
27 inception_4a/5x5 7.5 455.1 0.414
28 inception_4a/pool 0.8 604.6 0.297
29 inception_4a/pool_proj 12.0 613.0 0.389
30 inception_4b/1x1 32.1 349.6 0.995
31 inception_4b/3x3_reduce 22.5 385.6 0.780
32 inception_4b/3x3 88.5 280.9 2.888
33 inception_4b/5x5_reduce 4.8 576.7 0.373
34 inception_4b/5x5 15.1 339.7 0.885
35 inception_4b/pool 0.9 617.8 0.310
36 inception_4b/pool_proj 12.8 579.5 0.438
37 inception_4c/1x1 25.7 415.5 0.762
38 inception_4c/3x3_reduce 25.7 410.3 0.771
39 inception_4c/3x3 115.6 288.2 3.462
40 inception_4c/5x5_reduce 4.8 574.7 0.374
41 inception_4c/5x5 15.1 339.7 0.885
42 inception_4c/pool 0.9 615.3 0.311
43 inception_4c/pool_proj 12.8 577.3 0.440
44 inception_4d/1x1 22.5 382.9 0.786
45 inception_4d/3x3_reduce 28.9 489.2 0.679
46 inception_4d/3x3 146.3 402.9 2.981
47 inception_4d/5x5_reduce 6.4 728.9 0.305
48 inception_4d/5x5 20.1 408.5 0.979
49 inception_4d/pool 0.9 629.5 0.304
50 inception_4d/pool_proj 12.8 630.8 0.403
51 inception_4e/1x1 53.0 297.7 1.531
52 inception_4e/3x3_reduce 33.1 277.0 1.294
53 inception_4e/3x3 180.6 290.3 4.902
54 inception_4e/5x5_reduce 6.6 492.8 0.466
55 inception_4e/5x5 40.1 378.6 1.322
56 inception_4e/pool 0.9 633.0 0.312
57 inception_4e/pool_proj 26.5 446.8 0.731
58 pool4/3x3_s2 0.4 1245.4 0.250
59 inception_5a/1x1 20.9 616.4 0.786
60 inception_5a/3x3_reduce 13.0 569.7 0.582
61 inception_5a/3x3 45.2 570.7 1.786
62 inception_5a/5x5_reduce 2.6 329.2 0.391
63 inception_5a/5x5 10.0 459.6 0.601
64 inception_5a/pool 0.4 531.7 0.146
65 inception_5a/pool_proj 10.4 514.9 0.546
66 inception_5b/1x1 31.3 607.0 1.133
67 inception_5b/3x3_reduce 15.7 612.0 0.625
68 inception_5b/3x3 65.0 606.1 2.366
69 inception_5b/5x5_reduce 3.9 375.0 0.410
70 inception_5b/5x5 15.1 475.0 0.866
71 inception_5b/pool 0.4 531.7 0.146
72 inception_5b/pool_proj 10.4 513.7 0.547
73 pool5/7x7_s1 0.1 405.5 0.236
74 loss3/classifier 0.0 2559.7 0.764
75 prob 0.0 10.0 0.192
---------------------------------------------------------------------------------------------
Total inference time 93.66
---------------------------------------------------------------------------------------------
Generating Profile Report 'output_report.html'...
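The per-layer rows of this report are easy to post-process, for example to find the layers that dominate the inference time. The Python sketch below assumes each row has the five whitespace-separated columns shown above (index, name, MFLOPs, bandwidth in MB/s, time in ms); `parse_profile` and `slowest_layers` are hypothetical helpers, not part of the NCSDK.

```python
import re

# One profile row: index, layer name, MFLOPs, bandwidth (MB/s), time (ms).
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$")


def parse_profile(text):
    """Return one dict per layer row; lines that are not layer rows are skipped."""
    layers = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            layers.append({
                "index": int(match.group(1)),
                "name": match.group(2),
                "mflops": float(match.group(3)),
                "bandwidth_mb_s": float(match.group(4)),
                "time_ms": float(match.group(5)),
            })
    return layers


def slowest_layers(layers, n=3):
    """Return the n layers with the largest execution time."""
    return sorted(layers, key=lambda layer: layer["time_ms"], reverse=True)[:n]


# A few rows copied from the report above.
sample = """\
1 conv1/7x7_s2 236.0 2453.0 5.745
5 conv2/3x3 693.6 305.9 11.957
17 inception_3b/3x3 346.8 261.0 8.228
Total inference time 93.66
"""

for layer in slowest_layers(parse_profile(sample), n=2):
    print(layer["name"], layer["time_ms"])
# prints:
# conv2/3x3 11.957
# inception_3b/3x3 8.228
```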
API
You can find the full documentation of the C API here and the Python API here. GstInference uses only the C API, and R2Inference takes care of devices, graphs, models and FIFOs. Because of this, we will only look at the options that you can change when using the C API through R2Inference.
R2Inference changes the options of the framework via the "IParameters" class. First you need to create an object:
r2i::RuntimeError error;
std::shared_ptr<r2i::IParameters> parameters = factory->MakeParameters (error);
Then call the "Set" or "Get" virtual functions:
parameters->Set(<option>, <value>)
parameters->Get(<option>, <value>)
Device Options
All the device options from the API are read-only.
Option | Value | Description |
---|---|---|
NC_RO_DEVICE_THERMAL_STATS | float array | An array of length NC_RO_DEVICE_THERMAL_STATS with the temperature history of the device in degrees Celsius. |
NC_RO_THERMAL_THROTTLING_LEVEL | 0,1,2 | |
NC_RO_DEVICE_STATE | ncDeviceState_t enum value | |
NC_RO_DEVICE_CURRENT_MEMORY_USED | positive int | Memory used on the device. |
NC_RO_DEVICE_MEMORY_SIZE | positive int | Total memory available on the device. |
NC_RO_DEVICE_MAX_FIFO_NUM | positive int | Max number of fifos. |
NC_RO_DEVICE_ALLOCATED_FIFO_NUM | positive int | Number of fifos currently allocated. |
NC_RO_DEVICE_MAX_GRAPH_NUM | positive int | Max number of graphs. |
NC_RO_ALLOCATED_GRAPH_NUM | positive int | Number of graphs currently allocated. |
NC_RO_DEVICE_OPTION_CLASS_LIMIT | positive int | Highest option class supported. |
NC_RO_DEVICE_FW_VERSION | [major, minor, hardware type, build number] | Device firmware version. |
NC_RO_DEVICE_HW_VERSION | ncDeviceHwVersion_t enum value | |
NC_RO_DEVICE_MVTENSOR_VERSION | [major, minor] | mvtensor library version. |
NC_RO_DEVICE_NAME | string | Device name. |
Fifo Options
FIFO options are read-only if they begin with the prefix NC_RO_FIFO and read/write if they begin with NC_RW_FIFO. Most of the read/write FIFO options can only be modified between creation and allocation. R2Inference does both in a single method (Engine->Start()), so these options cannot be written through R2Inference.
Option | Value | Description |
---|---|---|
NC_RW_FIFO_TYPE | ncFifoType_t enum value | |
NC_RW_FIFO_DATA_TYPE | ncFifoDataType_t enum value | |
NC_RO_FIFO_CAPACITY | positive int | FIFO queue size. |
NC_RO_FIFO_READ_FILL_LEVEL | positive int | Elements on an output FIFO queue. |
NC_RO_FIFO_WRITE_FILL_LEVEL | positive int | Elements on an input FIFO queue. |
NC_RO_FIFO_GRAPH_TENSOR_DESCRIPTOR | ncTensorDescriptor_t struct | Shape of the tensor on the FIFO. |
NC_RO_FIFO_STATE | ncFifoState_t enum value | |
NC_RO_FIFO_NAME | string | FIFO name. |
NC_RO_FIFO_ELEMENT_DATA_SIZE | positive int | Size in bits of the FIFO elements. |
NC_RW_FIFO_HOST_TENSOR_DESCRIPTOR | ncTensorDescriptor_t struct | Shape of the tensor on application. |
Global Options
Pay special attention to the log level enumeration, because it is ordered counterintuitively: 1 is actually the highest log level, 4 is the lowest, and 0 is the default.
Option | Value | Description |
---|---|---|
NC_RW_LOG_LEVEL | ncLogLevel_t enum value | |
NC_RO_API_VERSION | [major, minor, hotfix, release] | API version |
Graph Options
Option | Value | Description |
---|---|---|
NC_RO_GRAPH_STATE | ncGraphState_t enum value | |
NC_RO_GRAPH_TIME_TAKEN | positive floats | Time per layer for the last inference in milliseconds. |
NC_RO_GRAPH_INPUT_TENSOR_DESCRIPTORS | ncTensorDescriptor_t struct | Array of graph inputs. |
NC_RO_GRAPH_OUTPUT_TENSOR_DESCRIPTORS | ncTensorDescriptor_t struct | Array of graph outputs. |
NC_RO_GRAPH_DEBUG_INFO | string | Debug information. |
NC_RO_GRAPH_NAME | string | Graph name. |
NC_RO_GRAPH_OPTION_CLASS_LIMIT | positive int | The highest option class supported. |
NC_RO_GRAPH_VERSION | [major, minor] | The version of the compiled graph. |
NC_RO_GRAPH_TIME_TAKEN_ARRAY_SIZE | positive int | Length of the time array (number of layers). |