Difference between revisions of "NVIDIA Vision Programming Interface (VPI) Demo"

Latest revision as of 06:29, 24 August 2022

General Description

NVIDIA® Vision Programming Interface (VPI) is a library that abstracts heterogeneous video stream computing on NVIDIA embedded devices. VPI provides a common API to use various hardware modules for accelerating computer vision applications. VPI has the following features:

Support for different processing backends ^[1] (CPU, GPU (CUDA), PVA ^[2])
VPI allows different backends to be combined in the same processing pipeline. For example, one stage can run on the GPU while the PVA executes another task within the same algorithm.
Zero-copy, shared memory mapping interface to manage data between the different backends.
The API is designed to minimize memory allocations, which are typically needed only at the initialization stage of many computer vision algorithms. Many computer vision applications can be divided into three stages (initialization, main loop, and cleanup), and the API facilitates building applications that follow this scheme.
OpenCV and EGL interoperability.
Synchronization mechanisms that are agnostic of the backend being used. The same VPI synchronization API is used regardless of the hardware accelerator.

[1] Backend refers to a specific hardware module that runs a stage of an algorithm.
[2] PVA: Programmable Vision Accelerator. This processor is only available on the Jetson AGX Xavier.
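
The three-stage structure mentioned above (initialization, main loop, cleanup) can be illustrated with a minimal sketch. It is plain Python and makes no VPI calls; the function name `run_pipeline` and the pixel inversion used as a processing stage are hypothetical stand-ins, chosen only to show that the working buffer is allocated once and then reused for every frame.

```python
# Illustrative only: plain Python, no VPI calls. All names are hypothetical.

def run_pipeline(frames, width, height):
    """Process 8-bit grayscale frames given as bytes objects of width*height."""
    # 1. Initialization: allocate every buffer once, before the main loop.
    scratch = bytearray(width * height)
    means = []

    # 2. Main loop: reuse the preallocated buffer for each frame.
    for frame in frames:
        if len(frame) != width * height:
            raise ValueError("frame size does not match width * height")
        for i, pixel in enumerate(frame):
            scratch[i] = 255 - pixel  # stand-in for a real processing stage
        means.append(sum(scratch) // len(scratch))  # mean of the inverted frame

    # 3. Cleanup: release what was allocated during initialization.
    del scratch
    return means
```

In a real VPI application the same three stages would create streams and images up front, submit algorithms inside the loop, and destroy the objects at the end.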

Installation

VPI (v0.1) was released with JetPack 4.3. It can be installed using the sdkmanager.

VPI installation path: /opt/nvidia/vpi/vpi-0.1

Architecture

VPI is written in C/C++. As shown in the figure below, this library provides a unified API (data structures, events, synchronization) for handling processing loads on different hardware accelerators.

Fig 1. General diagram of the VPI library. Source: https://docs.nvidia.com/vpi/architecture.html

Modules

Core: Array, Context, Event, Image, Pyramid, Stream
BilateralFilter
BoxFilter
GaussianFilter
HarrisCornersDetector
Convolve2D
Resample
KLTBoundingBoxTracker
SeparableConvolve2D
Stereo Disparity
EGL Interoperability

You can find the full documentation of the VPI API here: https://docs.nvidia.com/vpi/usergroup0.html

Samples

There are several samples provided with release 0.1. Samples are installed at /opt/nvidia/vpi/vpi-0.1. All samples are provided as simple CMake projects. Below are some instructions to build and test the samples.

First, make sure all the required build tools are installed:

sudo apt-get install g++ cmake libopencv-dev

The following samples were tested on a Jetson AGX Xavier.

2D Image Convolution

This sample implements an image convolver with a simple edge detector kernel.

To build the sample, run the following commands:

cd samples/01-convolve_2d
cmake .
make

Usage

./vpi_sample_01_convolve_2d <backend> <input image>

The <backend> argument can be cpu, cuda, or pva.

The <input image> argument is the path to a PNG or JPEG image.

The result is written to edges_<backend>.png, where <backend> is the backend you selected.
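
As a rough sketch of how the sample could be exercised on every backend, the helper below builds the command line described above and the output filename it is expected to produce. The function names, the `dry_run` switch, and the `input.png` path are hypothetical. The lowercase backend names follow the example runs in the timing section and should be treated as an assumption.

```python
import subprocess
from pathlib import Path

# Assumption: backend names are passed in lowercase, as in the timing runs.
BACKENDS = ("cpu", "cuda", "pva")


def convolve_command(backend, image):
    """Build the command line for the 2D convolution sample."""
    return ["./vpi_sample_01_convolve_2d", backend, str(image)]


def expected_output(backend):
    """Name of the file the sample is described as writing."""
    return Path(f"edges_{backend}.png")


def run_all(image, dry_run=True):
    """Run the sample once per backend; with dry_run, only print the commands."""
    for backend in BACKENDS:
        cmd = convolve_command(backend, image)
        if dry_run:
            print(" ".join(cmd), "->", expected_output(backend))
            continue
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        if not expected_output(backend).exists():
            raise FileNotFoundError(expected_output(backend))


if __name__ == "__main__":
    run_all("input.png")  # hypothetical image path; dry run by default
```

Calling `run_all("input.png", dry_run=False)` from the sample's build directory would actually execute the binary for each backend.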

Stereo Disparity Estimator

This sample implements the disparity estimation from two stereo images (left and right).

To build the sample, run the following commands:

cd samples/02-stereo_disparity
cmake .
make

Usage

./vpi_sample_02_stereo_disparity <backend> <input image> <input image>

The <backend> argument can be cpu, cuda, or pva.

This application requires two input images: the left and right images of the stereo pair.

The result is written to disparity_<backend>.png, where <backend> is the backend you selected.

Harris Keypoint Extractor

This sample implements a detector of Harris corners.

To build the sample, run the following commands:

cd samples/03-harris_keypoints
cmake .
make

Usage

./vpi_sample_03_harris_keypoints <backend> <input image>

The <input image> argument is the path to a PNG or JPEG image.

The <backend> argument accepts cpu, cuda, or pva. Note: This sample does not support the PVA backend, so use cpu or cuda.

The result is written to harris_keypoints_<backend>.png, where <backend> is the backend you selected.

Image Resampling

This sample implements an image resampler (low-pass filter + downscaling).

cd samples/04-resample
cmake .
make

Usage

./vpi_sample_04_resample <backend> <input image>

The <input image> argument is the path to a PNG or JPEG image.

The <backend> argument accepts cpu, cuda, or pva. Note: This sample does not support the PVA backend, so use cpu or cuda.

The result is written to resampled_<backend>.png, where <backend> is the backend you selected.

Measure Execution Time

cd samples/05-timing
cmake .
make

Usage

./vpi_sample_05_timing <backend>

The <backend> argument can be cpu, cuda, or pva.

The sample produces a resampled image, which is not written to disk, and prints the elapsed time measured between events.

Results

nvidia@nvidia:~/vpi-0.1-samples/samples/05-timing$ ./vpi_sample_05_timing cpu
Input size: 1920 x 1080
NVMEDIA_ARRAY:   53,  Version 2.1
NVMEDIA_VPI :  156,  Version 2.3
Blurring elapsed time: 270.452057 ms
Gaussian pyramid elapsed time: 55.426395 ms
Total elapsed time: 325.878479 ms

nvidia@nvidia:~/vpi-0.1-samples/samples/05-timing$ ./vpi_sample_05_timing cuda
Input size: 1920 x 1080
NVMEDIA_ARRAY:   53,  Version 2.1
NVMEDIA_VPI :  156,  Version 2.3
Blurring elapsed time: 8.427521 ms
Gaussian pyramid elapsed time: 5.799936 ms
Total elapsed time: 14.227456 ms

nvidia@nvidia:~/vpi-0.1-samples/samples/05-timing$ ./vpi_sample_05_timing pva
Input size: 1920 x 1080
NVMEDIA_ARRAY:   53,  Version 2.1
NVMEDIA_VPI :  156,  Version 2.3
Blurring elapsed time: 53.625671 ms
Gaussian pyramid elapsed time: 17.314356 ms
Total elapsed time: 70.940025 ms
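
To make the comparison between backends easier to read, the timings printed above can be turned into speedup factors relative to the CPU backend. The snippet below is only a convenience sketch: the dictionary layout and the function name `speedup_over_cpu` are hypothetical, and the numbers are copied from the runs shown in this section.

```python
# Elapsed times in milliseconds, copied from the three runs above.
TIMINGS_MS = {
    "cpu": {"blur": 270.452057, "pyramid": 55.426395, "total": 325.878479},
    "cuda": {"blur": 8.427521, "pyramid": 5.799936, "total": 14.227456},
    "pva": {"blur": 53.625671, "pyramid": 17.314356, "total": 70.940025},
}


def speedup_over_cpu(backend, stage="total"):
    """Return how many times faster `backend` is than the CPU for `stage`."""
    return TIMINGS_MS["cpu"][stage] / TIMINGS_MS[backend][stage]


if __name__ == "__main__":
    for backend in ("cuda", "pva"):
        for stage in ("blur", "pyramid", "total"):
            print(f"{backend:4s} {stage:8s} {speedup_over_cpu(backend, stage):6.1f}x")
```

For the total time, this works out to roughly 23x for CUDA and roughly 4.6x for PVA on this particular input and device.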

KLT Bounding Box Tracker

This sample implements a bounding box tracker using the Kanade–Lucas–Tomasi (KLT) feature tracker.

cd samples/06-klt_tracker
cmake .
make

Usage

./vpi_sample_06_klt_tracker <backend> <input video> <input bboxes> <output frames>

The <input video> argument is the path to the video from which the frames to be processed are extracted.

The <backend> argument can be cpu, cuda, or pva.

The <input bboxes> argument is a text file containing the input bounding boxes. Each entry must follow the structure: <frame> <bbox_x> <bbox_y> <bbox_width> <bbox_height>

The <output frames> argument is the prefix for all output frames.
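
The bounding box file structure can be checked before running the tracker with a small parser such as the one sketched below. It assumes one box per line with whitespace-separated fields, an integer frame index, and numeric box values; these details, along with the names `BBox` and `parse_bboxes`, are assumptions made for illustration and are not taken from the VPI sample code.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    frame: int      # frame index at which the box appears
    x: float        # bbox_x
    y: float        # bbox_y
    width: float    # bbox_width
    height: float   # bbox_height


def parse_bboxes(text):
    """Parse '<frame> <bbox_x> <bbox_y> <bbox_width> <bbox_height>' lines."""
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(fields)}")
        frame = int(fields[0])
        x, y, width, height = (float(value) for value in fields[1:])
        boxes.append(BBox(frame, x, y, width, height))
    return boxes
```

A file's contents can be passed in directly, for example `parse_bboxes(open("bboxes.txt").read())`, where `bboxes.txt` is a hypothetical path.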

Image Filter (blur)

This sample implements a 2D box filter (mean filter) over an input image.

cd samples/tutorial_blur
cmake .
make

Usage

./vpi_blur <image file name>

This sample uses only the CUDA backend.

The <image file name> argument is the path to a PNG or JPEG image.

The result is a blurred image saved in PNG format.

Demo

The demo captures images from the camera and processes the live stream frame by frame. Processed frames are stored as PNG files.
Input frames are encoded as H.264 with GStreamer and stored in a Matroska container.
The demo can be run with any of the backends (on a Jetson AGX Xavier).
Usage:

./demo-gstreamer-vpi <backend>

The <backend> argument can be cpu, cuda, or pva.


Fig 2. General diagram of the VPI demo app.

Fig 3. Sample of a processed frame.

Demo repository: https://gitlab.com/RidgeRun/code-snippets/jetson/xavier/demo-gstreamer-vpi

Performance

The following table summarizes the throughput obtained after running the demo app 10 times for 300 frames each on a Jetson AGX Xavier with default performance settings. CPU load measurements were obtained with tegrastats.

Backend	CPU load (%)	Frame rate (fps)	Elapsed time per frame (ms)
CPU	~80%	25.15	39.76
CUDA	~70%	325.39	3.07
PVA	~66%	129.95	7.70
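
The two throughput columns are related by elapsed time = 1000 / frame rate, which makes the table easy to sanity-check. The sketch below recomputes the per-frame time from the reported frame rates and derives the throughput gain over the CPU backend; the names `FPS`, `frame_time_ms`, and `gain_over_cpu` are hypothetical.

```python
# Frame rates (fps) copied from the table above.
FPS = {"cpu": 25.15, "cuda": 325.39, "pva": 129.95}


def frame_time_ms(fps):
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / fps


def gain_over_cpu(backend):
    """Throughput of `backend` relative to the CPU backend."""
    return FPS[backend] / FPS["cpu"]


if __name__ == "__main__":
    for backend, fps in FPS.items():
        print(f"{backend:4s} {frame_time_ms(fps):6.2f} ms/frame "
              f"{gain_over_cpu(backend):5.1f}x vs cpu")
```

With these figures, the CUDA backend delivers roughly 13 times the CPU frame rate and the PVA backend roughly 5 times.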

References

https://docs.nvidia.com/vpi/index.html




@@ Line 1: / Line 1: @@
-= General Description =
+<seo title="NVIDIA Vision Programming Interface | NVIDIA VPI Demo | RidgeRun Developer" titlemode="replace" keywords="GStreamer, Linux SDK, Linux BSP,  Embedded Linux, Device Drivers, NVIDIA, Xilinx, TI, NXP, Freescale, Embedded Linux driver development, Linux Software development, Embedded Linux SDK, Embedded Linux Application development, GStreamer Multimedia Framework, Vision Programming Interface, VPI, VPI Demo, KLT Bounding Box, KLI, Image filter, Jetson, Jetson TX1, Jetson TX2, Xavier, NVIDIA Jetson Xavier, NVIDIA Jetson Xavier NX, Jetson Xavier, Xilinx, TI, i.MX8, i.MX6, Jetson Xavier NX, Jetson Nano, NVIDIA Jetson Orin"  description="Read about NVIDIA Vision Programming Interface (VPI) Demo in this RidgeRun Developer Wiki."></seo>
-NVIDIA® Vision Programming Interface (VPI) is a library that abstracts heterogeneous computing on NVIDIA embedded devices. It aims to provide a common API to use different hardware modules for the acceleration of computer vision applications. It has the following features:
-*Support for different processing backends <ref>Backend refers to a specific hardware module that runs a stage of an algorithm</ref> (CPU, GPU (CUDA), PVA <ref>PVA: Programmable Vision Accelerator, this specific processor is only available in the Jetson AGX Xavier</ref>)
+<table>
-*It allows a combination of different backends in the same processing pipeline. For example, one stage can be executed in the GPU, while the PVA would be executing another task within the same algorithm.
+<tr>
-*Zero copy, shared memory mapping interface to manage data between the different backends.
+<td><div class="clear; float:right">__TOC__</div></td>
-*The API is designed to minimize memory allocations typically required just at the initialization stage of many computer vision algorithms. Many computer vision applications can be generalized as 3 stages: initialization, main loop and clean up, so the API facilitates building applications in this scheme.
+<td>
-*OpenCV and EGL Interoperability.
+{{NVIDIA Preferred Partner logo}}
-*Synchronization mechanisms that are agnostic of the backend being used. It doesn't matter the backend, you use the same API.
+<td>
+<center>
+{{ContactUs Button}}
+</center>
+</tr>
+</table>
+== General Description ==
+NVIDIA® Vision Programming Interface (VPI) is a library that abstracts heterogeneous video stream computing on NVIDIA embedded devices. VPI provides a common API to use various hardware modules for accelerating computer vision applications. VPI has the following features:
+* Support for different processing backends <ref>Backend refers to a specific hardware module that runs a stage of an algorithm</ref> (CPU, GPU (CUDA), PVA <ref>PVA: Programmable Vision Accelerator, this specific processor is only available in the Jetson AGX Xavier</ref>)
+* VPI allows a combination of different backends in the same processing pipeline. For example, one stage can be executed in the GPU, while the PVA would be executing another task within the same algorithm.
+* Zero copy, shared memory mapping interface to manage data between the different backends.
+* The API is designed to minimize initial memory allocations typically required just at the starting stage of many computer vision algorithms. Many computer vision applications can be generalized as 3 stages: initialization, main loop and clean up, so the API facilitates building applications in this scheme.
+* OpenCV and EGL interoperability.
+* Synchronization mechanisms that are agnostic of the backend being used. The same VPI synchronization API is used independent of the hardware accelerator.
 <references/>
-= Installation =
+== Installation ==
-The VPI (v0.1) was released with JetPack 4.3. It can be installed using the [https://developer.nvidia.com/nvidia-sdk-manager sdkmanager].
+VPI (v0.1) was released with JetPack 4.3. It can be installed using the [https://developer.nvidia.com/nvidia-sdk-manager sdkmanager].
 '''VPI installation path:''' /opt/nvidia/vpi/vpi-0.1
-= Architecture =
+== Architecture ==
-VPI is written in C/C++. At it is shown in the figure below, this library provides a unified API (data structures, events, synchronization) for handling processing loads on different hardware accelerators.
+VPI is written in C/C++. As shown in the figure below, this library provides a unified API (data structures, events, synchronization) for handling processing loads on different hardware accelerators.
 [[File:arch_overview.png|400px|thumb|center|'''Fig 1. General diagram of the VPI library. Source: https://docs.nvidia.com/vpi/architecture.html''']]
-= Modules =
+== Modules ==
 *Core: Array, Context, Event, Image, Pyramid, Stream
@@ Line 36: / Line 52: @@
 *EGL Interoperability
-You can find the full documentation of the API here: https://docs.nvidia.com/vpi/usergroup0.html
+You can find the full documentation of the VPI API here: https://docs.nvidia.com/vpi/usergroup0.html
-= Samples =
+== Samples ==
-There are several samples starting at release 0.1. Samples are installed at /opt/nvidia/vpi/vpi-0.1. All samples are provided as simple CMake projects. Below are some instructions to build and test the samples.
+There are several samples provided with release 0.1. Samples are installed at /opt/nvidia/vpi/vpi-0.1. All samples are provided as simple CMake projects. Below are some instructions to build and test the samples.
-*First, make sure to have all building tools required:
+* First, make sure to have all building tools required:
 <pre style="white-space: pre-wrap;">
@@ Line 50: / Line 66: @@
 The following samples were tested in Jetson AGX Xavier
-== 2D Image Convolution ==
+=== 2D Image Convolution ===
 This sample implements an image convolver with a simple edge detector kernel.
-*To build one sample just follows these commands:
+* To build one sample just follows these commands:
 <pre style="white-space: pre-wrap;">
 cd samples/01-convolve_2d
@@ Line 61: / Line 78: @@
 </pre>
-*Usage
+* Usage
   ./vpi_sample_01_convolve_2d <backend> <input image>
-The <backend> argument can be cpu, cuda or pva.
+The <backend> argument can be CPU, CUDA, or PVA.
 The <input image> argument is the path to a png or jpeg image.
@@ Line 70: / Line 88: @@
 The result is an image with the filename '''edges_<backend>.png''' depending on the backend you set.
-== Stereo Disparity Estimator ==
+=== Stereo Disparity Estimator ===
 This sample implements the disparity estimation from two stereo images (left and right).
-*To build one sample just follows these commands:
+* To build one sample just follows these commands:
 <pre style="white-space: pre-wrap;">
 cd samples/02-stereo_disparity
@@ Line 84: / Line 103: @@
   ./vpi_sample_02_stereo_disparity <backend> <input image> <input image>
-The <backend> argument can be cpu, cuda or pva.
+The <backend> argument can be CPU, CUDA, or PVA.
 This application requires two images for the stereo pair estimation
@@ Line 90: / Line 109: @@
 The result is an image with the filename '''disparity_<backend>.png''' depending on the backend you set.
-== Harris Keypoint Extrator ==
+=== Harris Keypoint Extrator ===
 This sample implements a detector of Harris corners.
-*To build one sample just follows these commands:
+* To build one sample just follows these commands:
 <pre style="white-space: pre-wrap;">
 cd samples/03-harris_keypoints
@@ Line 101: / Line 121: @@
 </pre>
-*Usage
+* Usage
   ./vpi_sample_03_harris_keypoints <backend> <input image>
 The <input image> argument is the path to a png or jpeg image.
-The <backend> argument can be cpu, cuda or pva. '''Note:''' This sample does not support the pva backend.
+The <backend> argument can be CPU, CUDA, or PVA. '''Note:''' This sample does not support the PVA backend.
 The result is an image with the filename '''harris_keypoints_<backend>.png''' depending on the backend you set.
-== Image Resampling ==
+=== Image Resampling ===
 This sample implements an image resampler (low-pass filter + downscaling).
@@ Line 120: / Line 141: @@
 </pre>
-*Usage
+* Usage
   ./vpi_sample_04_resample <backend> <input image>
 The <input image> argument is the path to a png or jpeg image.
-The <backend> argument can be cpu, cuda or pva. '''Note:''' This sample does not support the pva backend.
+The <backend> argument can be CPU, CUDA, or PVA. '''Note:''' This sample does not support the PVA backend.
 The result is an image with the filename '''resampled_<backend>.png''' depending on the backend you set.
-== Measure Execution Time ==
+=== Measure Execution Time ===
 <pre style="white-space: pre-wrap;">
@@ Line 137: / Line 159: @@
 </pre>
-*Usage
+* Usage
   ./vpi_sample_05_timing <backend>
-The <backend> argument can be cpu, cuda or pva.
+The <backend> argument can be CPU, CUDA, or PVA.
+The result is a resampled image (though it is not actually written to disk) with performance measurements of elapsed time between events.
-The result is an image resampled (though it is not actually wrote into the disk) and some printed messages with measure of elapsed time between events.
+* Results
-*Results
 <pre style="white-space: pre-wrap;">
 nvidia@nvidia:~/vpi-0.1-samples/samples/05-timing$ ./vpi_sample_05_timing cpu
@@ Line 171: / Line 195: @@
 </pre>
-== KLT Bounding Box Tracker ==
+=== KLT Bounding Box Tracker ===
 This sample implements a bounding box tracker using Kanade–Lucas–Tomasi feature tracker (KLT).
@@ Line 181: / Line 205: @@
 </pre>
-*Usage
+* Usage
   ./vpi_sample_06_klt_tracker <backend> <input video> <input bboxes> <output frames>
 The <input video> argument is the path to a video from which to extract the frames to be processed.
-The <backend> argument can be cpu, cuda or pva.
+The <backend> argument can be CPU, CUDA, or PVA.
 The <input boxes> argument is a text file containing input bounding boxes. It must follow the structure: <frame> <bbox_x> <bbox_y> <bbox_width> <bbox_height>
 The <output frames> argument is the prefix for all output frames.
-== Image Filter (blur) ==
+=== Image Filter (blur) ===
 This sample implements a 2D box filter (mean filter) over an input image.
@@ Line 199: / Line 224: @@
 </pre>
-*Usage
+* Usage
   ./vpi_blur <image file name>
@@ Line 208: / Line 234: @@
 The result is a blurred image saved in png format.
-= Demo =
+== Demo ==
-*The demo captures images from camera, and process the live streaming frame by frame. Processed frames are store as png files.
-*Input frames are stored using H264 (GStreamer) in a mastroska container.
+* The demo captures images from the camera and processes the live streaming frame by frame. Processed frames are store as png files.
-*The demo can be used in all backends (Jetson AGX Xavier)
+* Input frames are stored using H264 (GStreamer) in a mastroska container.
-*Usage:
+* The demo can be used in all backends (Jetson AGX Xavier)
+* Usage:
   ./demo-gstreamer-vpi <backend>
-Where <backend> argument can be: cpu, cuda or pva.
+Where <backend> argument can be: CPU, CUDA, or PVA.
 [[File:demo_vpi_diagram.png|800px|thumb|center|'''Fig 1. General diagram of VPI demo app.''']]
+<br>
 [[File:output_sample_cuda.png|800px|thumb|center|'''Fig 2. Sample of processed frame.''']]
+<br>
+'''Demo repository:''' https://gitlab.com/RidgeRun/code-snippets/jetson/xavier/demo-gstreamer-vpi
-'''Demo repository:''' https://gitlab.com/RidgeRun/code-snippets/jetson/xavier/demo-gstreamer-vpi
+=== Performance ===
-== Performance ==
-The following table summarizes some throughput data obtained after running the demo app 10 times for 300 frames each in a Jetson AGX Xavier with default performance settings. CPU load measurements were obtained with tegrastats.
+The following table summarizes throughput data obtained after running the demo app 10 times for 300 frames each in a Jetson AGX Xavier with default performance settings. CPU load measurements were obtained with tegrastats.
 {| class="wikitable"
@@ Line 247: / Line 277: @@
 |}
-= References =
+== References ==
 https://docs.nvidia.com/vpi/index.html
-[[Category:nvidia]][[Category:vpi]][[Category:cuda]][[Category:computer vision]]
+{{ContactUs}}
+[[Category:CUDA]][[Category:Jetson]][[Category:JetsonNano]][[Category:JetsonTX2]][[Category:NVIDIA Xavier]][[Category:JetsonXavierNX]][[Category:NVIDIA Jetson Orin]]