NVIDIA Vision Programming Interface (VPI) Demo
Revision as of 11:13, 30 November 2020
General Description
NVIDIA® Vision Programming Interface (VPI) is a library that abstracts heterogeneous video stream computing on NVIDIA embedded devices. VPI provides a common API to use various hardware modules for accelerating computer vision applications. VPI has the following features:
- Support for different processing backends (CPU, GPU (CUDA), and PVA). A backend is a specific hardware module that runs a stage of an algorithm; the PVA (Programmable Vision Accelerator) is only available on the Jetson AGX Xavier.
- VPI allows different backends to be combined in the same processing pipeline. For example, one stage can run on the GPU while the PVA executes another task within the same algorithm.
- Zero copy, shared memory mapping interface to manage data between the different backends.
- The API is designed to minimize memory allocations, which many computer vision algorithms typically require only at their starting stage. Many computer vision applications can be generalized into three stages (initialization, main loop, and cleanup), and the API facilitates building applications that follow this scheme.
- OpenCV and EGL interoperability.
- Synchronization mechanisms that are agnostic of the backend being used: the same VPI synchronization API is used regardless of the hardware accelerator.
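The initialization / main loop / cleanup scheme can be illustrated with a minimal sketch in plain C. It does not use the VPI API: the `Pipeline` type and the `pipeline_init`, `pipeline_process` and `pipeline_cleanup` functions are hypothetical, and the per-frame work (an intensity inversion) is only a placeholder. What it shows is the allocation pattern: buffers are allocated once during initialization and reused on every iteration of the main loop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pipeline state: every buffer is allocated once in init. */
typedef struct {
    size_t width, height;
    unsigned char *work; /* scratch buffer reused by every frame */
} Pipeline;

/* Stage 1: initialization. All allocations happen here. */
int pipeline_init(Pipeline *p, size_t width, size_t height) {
    assert(p != NULL && width > 0 && height > 0);
    p->width = width;
    p->height = height;
    p->work = malloc(width * height);
    return p->work != NULL ? 0 : -1;
}

/* Stage 2: main loop body. Reuses p->work and performs no allocation.
   The inversion is a placeholder for real per-frame processing. */
void pipeline_process(Pipeline *p, const unsigned char *in, unsigned char *out) {
    assert(p != NULL && p->work != NULL && in != NULL && out != NULL);
    size_t n = p->width * p->height;
    for (size_t i = 0; i < n; ++i)
        p->work[i] = (unsigned char)(255 - in[i]);
    memcpy(out, p->work, n);
}

/* Stage 3: cleanup. Releases everything allocated in init. */
void pipeline_cleanup(Pipeline *p) {
    free(p->work);
    p->work = NULL;
}
```

A caller would run `pipeline_init` once, call `pipeline_process` for every frame, and finish with `pipeline_cleanup`, so no allocation happens inside the frame loop.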
Installation
VPI (v0.1) was released with JetPack 4.3. It can be installed using NVIDIA SDK Manager (sdkmanager): https://developer.nvidia.com/nvidia-sdk-manager
VPI installation path: /opt/nvidia/vpi/vpi-0.1
Architecture
VPI is written in C/C++. As shown in the figure below, this library provides a unified API (data structures, events, synchronization) for handling processing loads on different hardware accelerators.
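![Fig 1. General diagram of the VPI library](/wiki/images/thumb/e/e3/Arch_overview.png/400px-Arch_overview.png)
Fig 1. General diagram of the VPI library. Source: https://docs.nvidia.com/vpi/architecture.html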
Modules
- Core: Array, Context, Event, Image, Pyramid, Stream
- BilateralFilter
- BoxFilter
- GaussianFilter
- HarrisCornersDetector
- Convolve2D
- Resample
- KLTBoundingBoxTracker
- SeparableConvolve2D
- Stereo Disparity
- EGL Interoperability
You can find the full documentation of the VPI API here: https://docs.nvidia.com/vpi/usergroup0.html
Samples
There are several samples provided with release 0.1. Samples are installed at /opt/nvidia/vpi/vpi-0.1. All samples are provided as simple CMake projects. Below are some instructions to build and test the samples.
- First, make sure all the required build tools are installed:
sudo apt-get install g++ cmake libopencv-dev
The following samples were tested on the Jetson AGX Xavier.
2D Image Convolution
This sample implements an image convolver with a simple edge detector kernel.
- To build the sample, run the following commands:
cd samples/01-convolve_2d
cmake .
make
- Usage
./vpi_sample_01_convolve_2d <backend> <input image>
The <backend> argument can be CPU, CUDA, or PVA.
The <input image> argument is the path to a png or jpeg image.
The result is an image with the filename edges_<backend>.png depending on the backend you set.
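As a CPU reference for what the sample computes, the sketch below convolves an 8-bit grayscale image with a 3x3 kernel. It is not the VPI implementation; the Laplacian used as `EDGE_KERNEL` is an assumption (the sample's exact kernel is not documented here), and zero padding at the borders is likewise assumed. The output is kept as signed `int` because edge responses can be negative.

```c
#include <assert.h>
#include <stddef.h>

/* Convolve a w x h 8-bit grayscale image with a 3x3 kernel.
   Pixels outside the image are treated as zero. The kernel is flipped,
   so this is true convolution rather than correlation. */
void convolve3x3(const unsigned char *src, int *dst, int w, int h,
                 const int k[3][3]) {
    assert(src != NULL && dst != NULL && w > 0 && h > 0);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int acc = 0;
            for (int j = -1; j <= 1; ++j) {
                for (int i = -1; i <= 1; ++i) {
                    int sx = x - i, sy = y - j; /* flipped kernel offset */
                    if (sx < 0 || sy < 0 || sx >= w || sy >= h)
                        continue; /* zero padding */
                    acc += k[j + 1][i + 1] * src[sy * w + sx];
                }
            }
            dst[y * w + x] = acc;
        }
    }
}

/* Laplacian kernel, assumed here as a "simple edge detector". */
static const int EDGE_KERNEL[3][3] = {
    {0,  1, 0},
    {1, -4, 1},
    {0,  1, 0},
};
```

On a flat region the kernel weights sum to zero, so only intensity changes (edges) produce non-zero output.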
Stereo Disparity Estimator
This sample implements the disparity estimation from two stereo images (left and right).
- To build the sample, run the following commands:
cd samples/02-stereo_disparity
cmake .
make
- Usage
./vpi_sample_02_stereo_disparity <backend> <input image> <input image>
The <backend> argument can be CPU, CUDA, or PVA.
This application requires two images, the left and right images of the stereo pair.
The result is an image with the filename disparity_<backend>.png depending on the backend you set.
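The sample's exact algorithm is not described on this page. To illustrate the idea of disparity estimation, the following sketch uses plain sum-of-absolute-differences (SAD) block matching with a winner-takes-all choice, a swapped-in textbook technique that is not necessarily what VPI does. It assumes a rectified 8-bit grayscale pair in which a point at column x in the left image appears at column x - d in the right image; `disparity_sad`, `max_disp` and the window radius `r` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* SAD between a (2r+1)x(2r+1) window centred at (x, y) in the left image
   and at (x - d, y) in the right image. Coordinates are clamped to the
   image (replicate padding). */
static long window_sad(const unsigned char *L, const unsigned char *R, int w,
                       int h, int x, int y, int d, int r) {
    long s = 0;
    for (int j = -r; j <= r; ++j)
        for (int i = -r; i <= r; ++i) {
            int yy = clampi(y + j, 0, h - 1);
            int lx = clampi(x + i, 0, w - 1);
            int rx = clampi(x + i - d, 0, w - 1);
            s += labs((long)L[yy * w + lx] - (long)R[yy * w + rx]);
        }
    return s;
}

/* For every pixel, pick the disparity in [0, max_disp] with the lowest SAD. */
void disparity_sad(const unsigned char *L, const unsigned char *R,
                   unsigned char *disp, int w, int h, int max_disp, int r) {
    assert(L != NULL && R != NULL && disp != NULL && w > 0 && h > 0);
    assert(max_disp >= 0 && max_disp <= 255 && r >= 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int best_d = 0;
            long best = LONG_MAX;
            int dlim = max_disp < x ? max_disp : x; /* stay inside the row */
            for (int d = 0; d <= dlim; ++d) {
                long s = window_sad(L, R, w, h, x, y, d, r);
                if (s < best) {
                    best = s;
                    best_d = d;
                }
            }
            disp[y * w + x] = (unsigned char)best_d;
        }
}
```

Larger windows give smoother but blurrier disparity maps; production stereo matchers usually add cost aggregation or smoothness terms on top of a matching cost like this one.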
Harris Keypoint Extractor
This sample implements a Harris corner detector.
- To build the sample, run the following commands:
cd samples/03-harris_keypoints
cmake .
make
- Usage
./vpi_sample_03_harris_keypoints <backend> <input image>
The <input image> argument is the path to a png or jpeg image.
The <backend> argument can be CPU or CUDA; this sample does not support the PVA backend.
The result is an image with the filename harris_keypoints_<backend>.png depending on the backend you set.
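For reference, the Harris detector scores each pixel with R = det(M) - k * trace(M)^2, where M is the structure tensor of the image gradients summed over a small window. The sketch below computes that response for one pixel using central-difference gradients and a 3x3 window; these choices, the function name `harris_response`, and the value of k passed by the caller are assumptions, not VPI's parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Pixel value with coordinates clamped to the image border. */
static double px(const unsigned char *img, int w, int h, int x, int y) {
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return (double)img[y * w + x];
}

/* Harris response at (x, y): M is the structure tensor summed over a 3x3
   window of central-difference gradients.
   R > 0: corner, R < 0: edge, R near 0: flat region. */
double harris_response(const unsigned char *img, int w, int h, int x, int y,
                       double k) {
    assert(img != NULL && w > 0 && h > 0);
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (int j = -1; j <= 1; ++j)
        for (int i = -1; i <= 1; ++i) {
            int cx = x + i, cy = y + j;
            double ix = (px(img, w, h, cx + 1, cy) - px(img, w, h, cx - 1, cy)) / 2.0;
            double iy = (px(img, w, h, cx, cy + 1) - px(img, w, h, cx, cy - 1)) / 2.0;
            sxx += ix * ix;
            syy += iy * iy;
            sxy += ix * iy;
        }
    double det = sxx * syy - sxy * sxy;
    double trace = sxx + syy;
    return det - k * trace * trace;
}
```

Values of k around 0.04 to 0.06 are common in the literature; a detector then keeps pixels whose response exceeds a threshold and is a local maximum.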
Image Resampling
This sample implements an image resampler (low-pass filter + downscaling).
- To build the sample, run the following commands:
cd samples/04-resample
cmake .
make
- Usage
./vpi_sample_04_resample <backend> <input image>
The <input image> argument is the path to a png or jpeg image.
The <backend> argument can be CPU or CUDA; this sample does not support the PVA backend.
The result is an image with the filename resampled_<backend>.png depending on the backend you set.
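The resampler's two steps (low-pass filtering, then downscaling) can be sketched on the CPU as a 2x reduction that averages each 2x2 block. Averaging acts as a simple box low-pass filter, and keeping one value per block performs the decimation. This is an illustrative assumption: the filter and scale factor used by VPI's Resample module are not specified here, and `downscale2x` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Halve a w x h 8-bit grayscale image. Each output pixel is the rounded
   mean of a 2x2 input block (box low-pass + decimation). An odd trailing
   row or column is dropped. dst must hold (w / 2) * (h / 2) bytes. */
void downscale2x(const unsigned char *src, int w, int h, unsigned char *dst) {
    assert(src != NULL && dst != NULL && w >= 2 && h >= 2);
    int ow = w / 2, oh = h / 2;
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x) {
            int sum = src[(2 * y) * w + 2 * x] + src[(2 * y) * w + 2 * x + 1]
                    + src[(2 * y + 1) * w + 2 * x] + src[(2 * y + 1) * w + 2 * x + 1];
            dst[y * ow + x] = (unsigned char)((sum + 2) / 4); /* halves round up */
        }
}
```

Filtering before decimating is what prevents aliasing: without the averaging step, fine detail would fold into spurious low-frequency patterns in the smaller image.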
Measure Execution Time
- To build the sample, run the following commands:
cd samples/05-timing
cmake .
make
- Usage
./vpi_sample_05_timing <backend>
The <backend> argument can be CPU, CUDA, or PVA.
The sample does not write the processed image to disk; instead, it prints the elapsed time measured between events for each processing stage (blurring and Gaussian pyramid generation) and the total elapsed time.
- Results
nvidia@nvidia:~/vpi-0.1-samples/samples/05-timing$ ./vpi_sample_05_timing cpu
Input size: 1920 x 1080
NVMEDIA_ARRAY: 53, Version 2.1
NVMEDIA_VPI : 156, Version 2.3
Blurring elapsed time: 270.452057 ms
Gaussian pyramid elapsed time: 55.426395 ms
Total elapsed time: 325.878479 ms

nvidia@nvidia:~/vpi-0.1-samples/samples/05-timing$ ./vpi_sample_05_timing cuda
Input size: 1920 x 1080
NVMEDIA_ARRAY: 53, Version 2.1
NVMEDIA_VPI : 156, Version 2.3
Blurring elapsed time: 8.427521 ms
Gaussian pyramid elapsed time: 5.799936 ms
Total elapsed time: 14.227456 ms

nvidia@nvidia:~/vpi-0.1-samples/samples/05-timing$ ./vpi_sample_05_timing pva
Input size: 1920 x 1080
NVMEDIA_ARRAY: 53, Version 2.1
NVMEDIA_VPI : 156, Version 2.3
Blurring elapsed time: 53.625671 ms
Gaussian pyramid elapsed time: 17.314356 ms
Total elapsed time: 70.940025 ms
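The sample reports the elapsed time between events. The same kind of millisecond figure can be computed on the host from two timestamps, as in the sketch below; it is a plain C11 stopwatch (`timestamp_now` and `elapsed_ms` are hypothetical helpers), not VPI's event API. `TIME_UTC` is a wall-clock time base, so on Linux `clock_gettime` with `CLOCK_MONOTONIC` is usually preferable for benchmarks. For asynchronous backends, the work must have completed before the end timestamp is taken; otherwise only the submission time is measured.

```c
#include <assert.h>
#include <time.h>

/* Milliseconds elapsed between two timestamps. */
double elapsed_ms(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1.0e6;
}

/* Current time from the C11 timespec_get() clock (wall-clock based). */
struct timespec timestamp_now(void) {
    struct timespec ts;
    int rc = timespec_get(&ts, TIME_UTC);
    assert(rc == TIME_UTC);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is defined */
    return ts;
}
```

Applied to the logs above, the CUDA backend's total (14.23 ms) is about 23 times faster than the CPU backend's (325.88 ms), and the PVA backend's total (70.94 ms) is about 4.6 times faster.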
KLT Bounding Box Tracker
This sample implements a bounding box tracker using the Kanade–Lucas–Tomasi (KLT) feature tracker.
- To build the sample, run the following commands:
cd samples/06-klt_tracker
cmake .
make
- Usage
./vpi_sample_06_klt_tracker <backend> <input video> <input bboxes> <output frames>
The <backend> argument can be CPU, CUDA, or PVA.
The <input video> argument is the path to the video from which the frames to be processed are extracted.
The <input bboxes> argument is a text file containing the input bounding boxes. Each entry must follow the structure: <frame> <bbox_x> <bbox_y> <bbox_width> <bbox_height>
The <output frames> argument is the prefix for all output frames.
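The bounding-box file can be read line by line with `sscanf`. The sketch below assumes that each line holds one entry with whitespace-separated fields in the documented order, that `<frame>` is an integer index, and that the coordinates may be fractional; the `BBoxEntry` type and `parse_bbox_line` function are hypothetical, not part of the sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One entry of the <input bboxes> file:
   <frame> <bbox_x> <bbox_y> <bbox_width> <bbox_height> */
typedef struct {
    int frame;
    float x, y, width, height;
} BBoxEntry;

/* Parse one line. Returns 1 on success, 0 if the line does not contain
   five numeric fields or the box has a non-positive size. */
int parse_bbox_line(const char *line, BBoxEntry *out) {
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "%d %f %f %f %f", &out->frame, &out->x, &out->y,
                   &out->width, &out->height);
    return n == 5 && out->width > 0.0f && out->height > 0.0f;
}
```

A reader would call `fgets` in a loop and pass each line to `parse_bbox_line`, skipping or reporting lines that return 0.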
Image Filter (blur)
This sample implements a 2D box filter (mean filter) over an input image.
- To build the sample, run the following commands:
cd samples/tutorial_blur
cmake .
make
- Usage
./vpi_blur <image file name>
This sample uses only the CUDA backend.
The <image file name> argument is the path to a png or jpeg image.
The result is a blurred image saved in png format.
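A box (mean) filter replaces each pixel with the average of a k x k neighborhood. The CPU sketch below shows that computation; it is not the VPI call used by the sample. The function name `box_filter`, the odd window size `k`, and clamp-to-edge border handling are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

static int clamp_int(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Mean filter with a k x k window (k odd). Out-of-image samples are taken
   from the nearest edge pixel; the output is the rounded mean. */
void box_filter(const unsigned char *src, unsigned char *dst, int w, int h, int k) {
    assert(src != NULL && dst != NULL && w > 0 && h > 0);
    assert(k > 0 && k % 2 == 1);
    int r = k / 2, area = k * k;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int sum = 0;
            for (int j = -r; j <= r; ++j)
                for (int i = -r; i <= r; ++i)
                    sum += src[clamp_int(y + j, 0, h - 1) * w
                               + clamp_int(x + i, 0, w - 1)];
            dst[y * w + x] = (unsigned char)((sum + area / 2) / area);
        }
}
```

This direct form costs k * k reads per pixel; because the box kernel is separable, a horizontal pass followed by a vertical pass gives the same result with 2k reads per pixel.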
Demo
- The demo captures images from the camera and processes the live stream frame by frame. Processed frames are stored as PNG files.
- The input frames are encoded with H.264 using GStreamer and stored in a Matroska container.
- The demo can run on all three backends on the Jetson AGX Xavier.
- Usage:
./demo-gstreamer-vpi <backend>
The <backend> argument can be CPU, CUDA, or PVA.
Demo repository: https://gitlab.com/RidgeRun/code-snippets/jetson/xavier/demo-gstreamer-vpi
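The text lists the backends in upper case, while the timing sample's log passes them in lower case (`cpu`, `cuda`, `pva`), so a case-insensitive comparison is a reasonable way to interpret a `<backend>` argument. The sketch below is a hypothetical parser; the `Backend` enum and `parse_backend` function are not taken from the demo repository.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical backend identifiers (not VPI's own types). */
typedef enum { BACKEND_INVALID = -1, BACKEND_CPU, BACKEND_CUDA, BACKEND_PVA } Backend;

/* Case-insensitive string equality. */
static int equals_ci(const char *a, const char *b) {
    for (; *a != '\0' && *b != '\0'; ++a, ++b)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == '\0' && *b == '\0';
}

/* Map a command-line <backend> argument to a Backend value. */
Backend parse_backend(const char *arg) {
    assert(arg != NULL);
    if (equals_ci(arg, "cpu")) return BACKEND_CPU;
    if (equals_ci(arg, "cuda")) return BACKEND_CUDA;
    if (equals_ci(arg, "pva")) return BACKEND_PVA;
    return BACKEND_INVALID;
}
```

Returning an explicit invalid value lets the caller print the usage line and exit instead of silently falling back to a default backend.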
Performance
The following table summarizes throughput data obtained by running the demo application 10 times, processing 300 frames per run, on a Jetson AGX Xavier with the default performance settings. CPU load was measured with tegrastats.
| Backend | CPU load | Frame rate (fps) | Elapsed time per frame (ms) |
|---|---|---|---|
| CPU | ~80% | 25.15 | 39.76 |
| CUDA | ~70% | 325.39 | 3.07 |
| PVA | ~66% | 129.95 | 7.70 |
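The elapsed-time values in the table match the average time per frame implied by the frame rate, t = 1000 / fps milliseconds (for example, 1000 / 25.15 ≈ 39.76 ms for the CPU backend). The small helpers below, with hypothetical names, turn the table's frame rates into per-frame times and speedups.

```c
#include <assert.h>

/* Average processing time per frame, in milliseconds, for a throughput
   given in frames per second. */
double frame_time_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Speedup of a backend relative to a baseline, from their frame rates. */
double speedup(double fps, double baseline_fps) {
    assert(baseline_fps > 0.0);
    return fps / baseline_fps;
}
```

With the table's values, the CUDA backend reaches about 12.9 times the CPU frame rate and the PVA backend about 5.2 times.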
References
https://docs.nvidia.com/vpi/index.html
RidgeRun Resources
Visit our Main Website for the RidgeRun Products and Online Store. Information about RidgeRun Engineering is available in the RidgeRun Professional Services, RidgeRun Subscription Model and Client Engagement Process wiki pages. Please email support@ridgerun.com for technical questions and contactus@ridgerun.com for other queries. Contact details for sponsoring the RidgeRun GStreamer projects are available on the Sponsor Projects page.