GstCUDA - Performance Profiling

From RidgeRun Developer Connection
Revision as of 15:16, 17 July 2019


This page presents GstCUDA performance profiling results.


Glass to glass latency

This wiki contains the glass to glass latency measurement results for simple GstCUDA capture and display pipelines on a TX2. It covers all the possible GstCUDA (cudafilter and cudamux) configurations and use cases.

All the measurements were taken with the TX2 in high-performance mode, set by running the following commands:

sudo nvpmodel -m 0 # Reboot after running this so the change takes effect.
reboot
sudo ~/jetson_clocks

Jetpack 3.3 - IMX274 camera 4K@60fps glass to glass latency

Simple Capture to Display pipeline (without GstCUDA)

Use this measurement as the reference when comparing the glass to glass latency of the GstCUDA pipelines below.

  • Glass to Glass latency: 112.2042693 ms (59.9252609 ms with tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v  nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
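
Because the camera runs at 60 fps, it can help to read these latencies as a number of frame periods (about 16.67 ms each). The following is a minimal sketch of that conversion; ms_to_frames is a hypothetical helper written for this page, not part of GstCUDA or GStreamer, and the two values are the baseline figures quoted above.

```shell
#!/bin/sh
# Convert a latency in milliseconds to frame periods at a given frame rate.
# ms_to_frames is a hypothetical helper, not part of GstCUDA.
ms_to_frames() {
  # $1 = latency in ms, $2 = frames per second (defaults to 60)
  LC_ALL=C awk -v ms="$1" -v fps="${2:-60}" \
    'BEGIN { printf "%.1f\n", ms * fps / 1000 }'
}

ms_to_frames 112.2042693   # baseline, untuned pipeline
ms_to_frames 59.9252609    # baseline, tuned/optimized pipeline
```

Pass a second argument to use a frame rate other than 60 fps.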

Cudafilter

NVMM Direct Handling
In-place:True
  • Glass to Glass latency: 178.9331237 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
In-place:False
  • Glass to Glass latency: 230.3850304 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
Unified Memory Allocator
In-place:True
  • Glass to Glass latency: 188.1192285 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
In-place:False
  • Glass to Glass latency: 306.2578894 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! nvvidconv ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false


Cudamux

NVMM Direct Handling
In-place:True
  • Glass to Glass latency: 145.5713375 ms (75.4149314 ms with tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
In-place:False
  • Glass to Glass latency: 332.9231919 ms (182.126088 ms with tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
Unified Memory Allocator
In-place:True
  • Glass to Glass latency: 136.4211149 ms (118.3355796 ms with tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
In-place:False
  • Glass to Glass latency: 197.1957698 ms (197.1957698 ms with tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
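
To compare the cudamux results above with the capture-to-display reference, the added latency and the ratio to the baseline can be computed directly from the reported figures. This is only a sketch: overhead and BASELINE_MS are hypothetical names introduced here, and the numbers are the untuned values listed on this page.

```shell
#!/bin/sh
# Untuned baseline (capture to display without GstCUDA), in ms, from this page.
BASELINE_MS=112.2042693

# overhead is a hypothetical helper: prints the latency added on top of the
# baseline and the ratio to the baseline.
# $1 = label, $2 = measured glass to glass latency in ms
overhead() {
  LC_ALL=C awk -v b="$BASELINE_MS" -v l="$2" -v n="$1" \
    'BEGIN { printf "%s: +%.1f ms (%.2fx baseline)\n", n, l - b, l / b }'
}

overhead "cudamux NVMM in-place=true"     145.5713375
overhead "cudamux NVMM in-place=false"    332.9231919
overhead "cudamux unified in-place=true"  136.4211149
overhead "cudamux unified in-place=false" 197.1957698
```

The same helper works for the cudafilter figures, or for the tuned values if BASELINE_MS is set to the tuned baseline instead.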



