CUDA ISP for NVIDIA Jetson: Performance

From RidgeRun Developer Connection
Revision as of 14:09, 27 March 2023 by Lleon (talk | contribs) (GStreamer elements performance)








Library API performance

To measure the CUDA ISP API performance, we built a simple example that iterates over the apply methods and records performance metrics for each iteration: the duration of each apply method, the CPU and GPU usage while the example ran, and the CPU RAM and GPU RAM usage. We ran it on a Jetson Nano, Jetson Xavier NX, Jetson Xavier AGX, and Jetson Orin, using three buffer sizes:

  • A minimum 2x2 case, to test the maximum speeds that the apply methods could achieve
  • A medium 1920x1080 case, to illustrate the changes in performance as the buffer size increases
  • A maximum 3840x2160 case, to test performance on large buffers
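
The measurement loop described above can be sketched as follows. This is a minimal Python illustration, not the example program used for these results: `average_duration_us` and `fake_apply` are hypothetical names, and the placeholder workload stands in for a CUDA ISP apply call. When timing GPU work this way, the timed call needs to block until the result is ready, otherwise only the launch overhead is captured.

```python
import time
from statistics import mean


def average_duration_us(apply_fn, iterations=100):
    """Call apply_fn `iterations` times and return the mean duration in microseconds."""
    durations_us = []
    for _ in range(iterations):
        start_ns = time.perf_counter_ns()
        apply_fn()  # stand-in for one apply call on one buffer
        durations_us.append((time.perf_counter_ns() - start_ns) / 1_000)
    return mean(durations_us)


def fake_apply():
    # Hypothetical placeholder workload; this is NOT a CUDA ISP call.
    sum(range(1_000))


print(f"{average_duration_us(fake_apply):.1f} us per call (averaged over 100 iterations)")
```

Averaging over many iterations smooths out one-off costs such as the first call, which is often slower than the rest.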

Jetson Nano

Processing Time

| Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| cudashift | 136 | 135 | 147 |
| cudadebayer | 68 | 53 | 55 |
| cudawhitebalancer | 317 | 5071 | 18903 |
| cudacolorspaceconverter | 55 | 55 | 57 |
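
In the Jetson Nano numbers above, cudawhitebalancer is the only element whose time grows noticeably with the buffer size. A short sketch of that arithmetic, using only the values from the table (the helper name `ns_per_pixel` is ours, introduced for illustration):

```python
# Processing times in microseconds, copied from the Jetson Nano table above.
WHITE_BALANCER_US = {"1080p": 5071, "4K": 18903}
PIXELS = {"1080p": 1920 * 1080, "4K": 3840 * 2160}


def ns_per_pixel(time_us, pixels):
    """Convert a per-buffer time in microseconds to nanoseconds per pixel."""
    return time_us * 1_000 / pixels


for size, time_us in WHITE_BALANCER_US.items():
    print(f"{size}: {ns_per_pixel(time_us, PIXELS[size]):.2f} ns/pixel")

time_ratio = WHITE_BALANCER_US["4K"] / WHITE_BALANCER_US["1080p"]
pixel_ratio = PIXELS["4K"] / PIXELS["1080p"]
print(f"time grows {time_ratio:.2f}x for {pixel_ratio:.0f}x the pixels")
```

The time ratio comes out a little under the pixel ratio, which suggests the cost is roughly proportional to the number of pixels on this platform.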

CPU and CPU RAM usage

| Measurement (averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| CPU usage (%) | 0.797500 | 0.836478 | 0.819940 |
| CPU RAM usage (kB) | 147071 | 146295 | 147580 |

GPU and GPU RAM usage

| Measurement (averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| GPU usage (%) | 0.0 | 25.12 | 94.6 |
| GPU RAM usage (kB) | 91967 | 91733 | 116833 |

Jetson Xavier NX

Processing Time

| Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| cudashift | 93 | 93 | 93 |
| cudadebayer | 39 | 39 | 31 |
| cudawhitebalancer | 375 | 1360 | 4249 |
| cudacolorspaceconverter | 33 | 35 | 34 |

CPU and CPU RAM usage

| Measurement (averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| CPU usage (%) | 0.482488 | 0.523657 | 0.477216 |
| CPU RAM usage (kB) | 171679 | 173539 | 171987 |

GPU and GPU RAM usage

| Measurement (averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| GPU usage (%) | 0.85 | 5.48 | 17.91 |
| GPU RAM usage (kB) | 98719 | 100387 | 106288 |

Jetson Xavier AGX

Processing Time

| Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| cudashift | 129 | 135 | 131 |
| cudadebayer | 54 | 48 | 39 |
| cudawhitebalancer | 667 | 4844 | 8091 |
| cudacolorspaceconverter | 38 | 45 | 52 |
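
To compare the three platforms measured so far, the 4K cudawhitebalancer times can be expressed relative to the Jetson Nano. The sketch below uses only the values from the processing-time tables above; `speedup_over` is a hypothetical helper name.

```python
# cudawhitebalancer processing time for 4K buffers, in microseconds,
# copied from the three platform tables above.
WHITE_BALANCER_4K_US = {
    "Jetson Nano": 18903,
    "Jetson Xavier NX": 4249,
    "Jetson Xavier AGX": 8091,
}


def speedup_over(baseline, times_us):
    """Return how many times faster each platform is than the baseline platform."""
    base = times_us[baseline]
    return {platform: base / t for platform, t in times_us.items()}


speedups = speedup_over("Jetson Nano", WHITE_BALANCER_4K_US)
for platform, factor in speedups.items():
    print(f"{platform}: {factor:.2f}x")
```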

CPU and CPU RAM usage

| Measurement (averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| CPU usage (%) | 0.409836 | 0.491435 | 0.458062 |
| CPU RAM usage (kB) | 172066 | 173613 | 173477 |

GPU and GPU RAM usage

| Measurement (averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| GPU usage (%) | | | |
| GPU RAM usage (kB) | 101984 | 105247 | 107641 |

Jetson Orin

Processing Time

| Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| cudashift | | | |
| cudadebayer | | | |
| cudawhitebalancer | | | |
| cudacolorspaceconverter | | | |

CPU and CPU RAM usage

| Measurement (averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| CPU usage (%) | | | |
| CPU RAM usage (kB) | | | |

GPU and GPU RAM usage

| Measurement (averaged over 100 iterations) | 2x2 Buffers | 1080p Buffers | 4K Buffers |
|---|---|---|---|
| GPU usage (%) | | | |
| GPU RAM usage (kB) | | | |

GStreamer elements performance

To measure the performance of the GStreamer elements, we used two of our GStreamer tools: GstShark and GstPerf.

The tests were run under the following conditions:

  • Maximum performance mode enabled: all cores and Jetson clocks enabled.
  • JetPack 4.6.
  • A patch was applied to v4l2src to enable bayer10 captures. For instructions, see: Apply patch to v4l2src
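
The pipelines on this page enable the GstShark proctime tracer through GST_TRACERS="proctime". A per-element average can be computed from its log output along the lines of the sketch below. The log line shape used here is an assumption and should be checked against your own GstShark output; the sample lines are hand-written for illustration and are not measured data. `mean_proctime_seconds` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# ASSUMED shape of a proctime trace line; verify against real GstShark output.
PROCTIME_RE = re.compile(
    r"proctime, element=\(string\)(?P<element>[^,]+), "
    r"time=\(string\)(?P<h>\d+):(?P<m>\d+):(?P<s>\d+\.\d+)"
)


def mean_proctime_seconds(log_lines):
    """Return {element name: mean processing time in seconds} for matching lines."""
    samples = defaultdict(list)
    for line in log_lines:
        match = PROCTIME_RE.search(line)
        if match:
            seconds = int(match["h"]) * 3600 + int(match["m"]) * 60 + float(match["s"])
            samples[match["element"]].append(seconds)
    return {element: sum(values) / len(values) for element, values in samples.items()}


# Hand-written illustrative lines, NOT measurements from this page.
SAMPLE_LOG = [
    "TRACE GST_TRACER :0:: proctime, element=(string)cudadebayer0, time=(string)0:00:00.001800000;",
    "TRACE GST_TRACER :0:: proctime, element=(string)cudadebayer0, time=(string)0:00:00.002000000;",
    "TRACE GST_TRACER :0:: proctime, element=(string)cudaawb0, time=(string)0:00:00.001000000;",
    "a line that is not a proctime trace",
]

for element, seconds in mean_proctime_seconds(SAMPLE_LOG).items():
    print(f"{element}: {seconds:.6f} s")
```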

Jetson Xavier AGX

For all the elements, the processing time and FPS were measured with an input image with 1920x1200 resolution from a camera sensor.

The following pipeline was used to test the cudadebayer and cudaawb elements with an RGB image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=RGB' ! fakesink

The following pipeline was used for the cudadebayer and cudaawb elements with an I420 image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink

The results obtained:


Xavier AGX

| Element | Output | FPS | Processing time (seconds) |
|---|---|---|---|
| cudadebayer | RGB | 539 | 0.001854 |
| cudadebayer | I420 | 458 | 0.002183 |
| cudaawb | RGB | 752 | 0.001329 |
| cudaawb | I420 | 473 | 0.002111 |
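
The FPS and processing-time values above appear to be related by FPS ≈ 1 / processing time. The sketch below checks that relationship against the Xavier AGX results; it is our reading of the numbers, not a statement of how the tools compute FPS, and `fps_from_proctime` is a hypothetical helper name.

```python
# (element, output format) -> (reported FPS, processing time in seconds),
# copied from the Xavier AGX results above.
AGX_RESULTS = {
    ("cudadebayer", "RGB"): (539, 0.001854),
    ("cudadebayer", "I420"): (458, 0.002183),
    ("cudaawb", "RGB"): (752, 0.001329),
    ("cudaawb", "I420"): (473, 0.002111),
}


def fps_from_proctime(seconds):
    """Frames per second an element could sustain if each frame takes `seconds`."""
    return 1.0 / seconds


for (element, output), (reported_fps, proctime_s) in AGX_RESULTS.items():
    derived = fps_from_proctime(proctime_s)
    print(f"{element} {output}: reported {reported_fps}, derived {derived:.1f}")
```

For these four entries the derived value lands within one frame per second of the reported FPS.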








Jetson Xavier NX

For all the elements, the processing time and FPS were measured with an input image with 4K resolution from a camera sensor.

The following pipeline was used to measure the processing time and FPS of the cudashift element.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, format=rggb' ! cudashift shift=5 ! fakesink

The following pipeline was used to test the cudadebayer and cudaawb elements with an RGB image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! fakesink

The following pipeline was used to test the cudadebayer and cudaawb elements with an I420 image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink


The results obtained:

Xavier NX

| Element | Output | FPS | Processing time (seconds) |
|---|---|---|---|
| cudashift | bayer 8 | 396 | 0.002522 |
| cudadebayer | RGB | 228 | 0.004389 |
| cudadebayer | I420 | 187 | 0.005353 |
| cudaawb | RGB | 370 | 0.002698 |
| cudaawb | I420 | 202 | 0.004952 |








Jetson Nano

For all the elements, the processing time and FPS were measured with an input image with 4K resolution from a camera sensor.

The following pipeline was used to measure the processing time and FPS of the cudashift element.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, format=rggb' ! cudashift shift=0 ! fakesink

The following pipeline was used to test the cudadebayer and cudaawb elements with an RGB image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! fakesink

The following pipeline was used to test the cudadebayer and cudaawb elements with an I420 image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink

The results obtained:

Nano

| Element | Output | FPS | Processing time (seconds) |
|---|---|---|---|
| cudashift | bayer 8 | 92 | 0.01088 |
| cudadebayer | RGB | 51 | 0.01948 |
| cudadebayer | I420 | 36 | 0.02769 |
| cudaawb | RGB | 91 | 0.01096 |
| cudaawb | I420 | 38 | 0.02605 |
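
One way to read the Jetson Nano 4K results is to compare each processing time with the frame period of a target frame rate. The sketch below does this for 30 and 60 fps; these targets are our assumption for illustration and are not part of the measurements, and each element is considered in isolation rather than as part of a full pipeline. `fits_frame_budget` is a hypothetical helper name.

```python
# (element, output format) -> processing time in seconds,
# copied from the Jetson Nano results above.
NANO_PROCTIME_S = {
    ("cudashift", "bayer 8"): 0.01088,
    ("cudadebayer", "RGB"): 0.01948,
    ("cudadebayer", "I420"): 0.02769,
    ("cudaawb", "RGB"): 0.01096,
    ("cudaawb", "I420"): 0.02605,
}


def fits_frame_budget(proctime_s, target_fps):
    """True if one frame is processed within the frame period of target_fps."""
    return proctime_s <= 1.0 / target_fps


for target_fps in (30, 60):  # hypothetical targets, not taken from the measurements
    fitting = [key for key, t in NANO_PROCTIME_S.items() if fits_frame_budget(t, target_fps)]
    print(f"{target_fps} fps ({1000 / target_fps:.1f} ms per frame): {fitting}")
```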








More cameras

This section shows the performance of the elements when running simultaneously on multiple cameras on a Jetson Xavier AGX. For the tests with an RGB output image, the following pipeline was used to measure the processing time and FPS of the cudadebayer and cudaawb elements, with 1920x1200 input images coming from multiple camera sensors.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src device=/dev/video0 io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=RGB' ! fakesink

In the same way, for the tests with an I420 output image, the following pipeline was used to measure the processing time and FPS of the cudadebayer and cudaawb elements, with 1920x1200 input images coming from multiple camera sensors.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src device=/dev/video1 io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink

The results obtained:

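| Element | Output | Number of cameras | FPS | Processing time (seconds) |
|---|---|---|---|---|
| cudadebayer | RGB | Two | 412 | 0.002426 |
| cudadebayer | RGB | Three | 429 | 0.002354 |
| cudadebayer | RGB | Four | 385 | 0.002597 |
| cudadebayer | RGB | Five | 494 | 0.002025 |
| cudadebayer | I420 | Two | 464 | 0.002154 |
| cudadebayer | I420 | Three | 402 | 0.002486 |
| cudadebayer | I420 | Four | 320 | 0.003128 |
| cudadebayer | I420 | Five | 332 | 0.003011 |
| cudaawb | RGB | Two | 397 | 0.002521 |
| cudaawb | RGB | Three | 374 | 0.002672 |
| cudaawb | RGB | Four | 689 | 0.001450 |
| cudaawb | RGB | Five | 347 | 0.002883 |
| cudaawb | I420 | Two | 429 | 0.002330 |
| cudaawb | I420 | Three | 450 | 0.00220 |
| cudaawb | I420 | Four | 289 | 0.003459 |
| cudaawb | I420 | Five | 296 | 0.003375 |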
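
To put the multi-camera numbers in context, they can be expressed as a fraction of the single-camera Xavier AGX results reported earlier on this page, which used the same 1920x1200 input. Treating those single-camera values as the baseline is our assumption; `relative_fps` is a hypothetical helper name.

```python
# Single-camera FPS on the Xavier AGX, copied from the earlier results.
SINGLE_CAMERA_FPS = {
    ("cudadebayer", "RGB"): 539,
    ("cudadebayer", "I420"): 458,
    ("cudaawb", "RGB"): 752,
    ("cudaawb", "I420"): 473,
}

# Multi-camera FPS for two, three, four and five cameras, copied from the results above.
MULTI_CAMERA_FPS = {
    ("cudadebayer", "RGB"): {2: 412, 3: 429, 4: 385, 5: 494},
    ("cudadebayer", "I420"): {2: 464, 3: 402, 4: 320, 5: 332},
    ("cudaawb", "RGB"): {2: 397, 3: 374, 4: 689, 5: 347},
    ("cudaawb", "I420"): {2: 429, 3: 450, 4: 289, 5: 296},
}


def relative_fps(element, output):
    """Return {camera count: multi-camera FPS / single-camera FPS} for one element."""
    baseline = SINGLE_CAMERA_FPS[(element, output)]
    return {n: fps / baseline for n, fps in MULTI_CAMERA_FPS[(element, output)].items()}


for element, output in MULTI_CAMERA_FPS:
    ratios = relative_fps(element, output)
    summary = ", ".join(f"{n} cameras: {r:.0%}" for n, r in ratios.items())
    print(f"{element} {output}: {summary}")
```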











