GstQtOverlay plugin performance on NVIDIA Platforms

From RidgeRun Developer Connection
Jump to: navigation, search


Previous: Performance/Nano Index Next: Troubleshooting






Xavier NX Platform

For testing purposes, take into account the following points:

  • Maximum performance mode enabled: all cores, and Jetson clocks enabled
  • Jetpack 4.4 (4.2.1 or earlier is not recommended)
  • Base installation
  • The GstQtOverlay is surrounded by queues to measure the actual capability of GstQtOverlay

For measuring and contrasting the performances with and without NVMM support, we used the following pipelines:

NVMM:

gst-launch-1.0 videotestsrc pattern=black ! 'video/x-raw, width=1920, height=1080' ! queue ! nvvidconv ! queue ! 'video/x-raw(memory:NVMM)' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf ! queue ! nvvidconv ! 'video/x-raw' ! fakesink sync=false

No NVMM

gst-launch-1.0 videotestsrc pattern=black ! 'video/x-raw, width=1920, height=1080' ! queue ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf ! queue ! fakesink sync=false

The results obtained:

Measurement Jetson NX
No NVMM 190 fps
NVMM
265 fps


With the addition of native UYVY and NV12 support for NVMM memory, we measured the performance for each format between using nvvidconv or the native support. The following pipelines were used:

NVMM using nvvidconv:

gst-launch-1.0 videotestsrc pattern=black ! 'video/x-raw, width=1920, height=1080, format=RGBA' ! queue ! nvvidconv ! queue ! 'video/x-raw(memory:NVMM), format=RGBA' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf ! queue ! nvvidconv ! 'video/x-raw' ! fakesink sync=false

NVMM with native UYVY and NV12 support:

gst-launch-1.0 videotestsrc pattern=black ! 'video/x-raw, width=1920, height=1080, format=NV12' ! queue ! nvvidconv ! queue ! 'video/x-raw(memory:NVMM), format=NV12' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf ! queue ! nvvidconv ! 'video/x-raw' ! fakesink sync=false

The results obtained:

Measurement RGBA UYVY NV12
NVMM with nvvidconv 227 fps
200 fps
177 fps
NVMM with native formats
227 fps
172 fps
177 fps

CPU usage

Take the following pipelines as reference:

No NVMM

gst-launch-1.0 videotestsrc is-live=true ! 'video/x-raw, width=1920, height=1080'  ! qtoverlay qml=gst-libs/gst/qt/main.qml ! 'video/x-raw, width=1920, height=1080' ! fakesink

NVMM:

gst-launch-1.0 videotestsrc is-live=true ! 'video/x-raw, width=1920, height=1080' ! nvvidconv ! 'video/x-raw(memory:NVMM), width=1920, height=1080' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! nvvidconv ! 'video/x-raw, width=1920, height=1080'  ! fakesink sync=false

Results

Measurement No NVMM
NVMM
GstQtOverlay 16.5% 7.3%
Rest of pipeline 14% 27.8%
Total 30.5% 35.1%


The total consumption of the pipeline is higher in the NVMM case since there are more elements. The GstQtOverlay consumes less CPU in NVMM mode.

Nano Platform

For testing purposes, take into account the following points:

  • Maximum performance mode enabled: all cores, and Jetson clocks enabled
  • Jetpack 4.5 (4.2.1 or earlier is not recommended)
  • Base installation
  • The GstQtOverlay is surrounded by queues to measure the actual capability of GstQtOverlay

For measuring and contrasting the performances with and without NVMM support for every format, we used the following pipelines:

NVMM using nvvidconv:

gst-launch-1.0 videotestsrc pattern=black ! 'video/x-raw, width=1920, height=1080, format=NV12' ! queue ! nvvidconv ! queue ! 'video/x-raw(memory:NVMM), format=RGBA' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf ! queue ! nvvidconv ! 'video/x-raw' ! fakesink sync=false

NVMM with native UYVY and NV12 support:

gst-launch-1.0 videotestsrc pattern=black ! 'video/x-raw, width=1920, height=1080, format=NV12' ! queue ! nvvidconv ! queue ! 'video/x-raw(memory:NVMM), format=NV12' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf ! queue ! nvvidconv ! 'video/x-raw' ! fakesink sync=false

No NVMM

gst-launch-1.0 videotestsrc pattern=black ! 'video/x-raw, width=1920, height=1080' ! queue ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf ! queue ! fakesink sync=false

The results obtained:

Measurement RGBA UYVY NV12
No NVMM 57 fps
N/A N/A
NVMM with nvvidconv
121 fps
67 fps 61 fps
NVMM with native formats
123 fps
67 fps 61 fps

While using the native formats may not provide big performance gains here as it still uses nvvidconv to upload to NVMM memory, it allows connecting directly to some cameras that output NV12 in NVMM memory like with the nvarguscamerasrc element.

CPU usage

Taking the following pipelines as reference:

No NVMM

gst-launch-1.0 videotestsrc is-live=true ! 'video/x-raw, width=1920, height=1080'  ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf print-cpu-load=true ! 'video/x-raw, width=1920, height=1080' ! fakesink

NVMM:

gst-launch-1.0 videotestsrc is-live=true ! 'video/x-raw, width=1920, height=1080' ! nvvidconv ! 'video/x-raw(memory:NVMM), width=1920, height=1080' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf print-cpu-load=true ! nvvidconv ! 'video/x-raw, width=1920, height=1080'  ! fakesink sync=false

Results

Measurement No NVMM
NVMM
GstQtOverlay 2% 2%
Rest of pipeline 17% 13%
Total 19% 15%


Previous: Performance/Nano Index Next: Troubleshooting