Difference between revisions of "GstCUDA - Performance Profiling"

From RidgeRun Developer Connection

Latest revision as of 13:58, 15 September 2020







This page shows GstCUDA performance profiling.


Problems running the pipelines shown on this page? Please see our GStreamer Debugging guide for help.

Glass to glass latency

This page reports the glass-to-glass latency measured for simple GstCUDA capture-and-display pipelines on a TX2. It covers all the GstCUDA (cudafilter and cudamux) configurations and use cases.

All the measurements were taken with the TX2 in high-performance mode, which was set by running the following commands:

sudo nvpmodel -m 0   # Reboot after running this so the change takes effect.
reboot
sudo ~/jetson_clocks
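
Before taking a measurement it can help to confirm that the board really is in mode 0. The following is a minimal sketch, not part of GstCUDA: `is_max_perf_mode` is a hypothetical helper, and it assumes that `sudo nvpmodel -q` prints the numeric mode ID on its last line. Check the output on your own board before relying on it.

```shell
# Sketch: decide whether the board reports power mode 0 before measuring.
# Assumption: `sudo nvpmodel -q` prints the numeric mode ID on its last line.
# is_max_perf_mode is a hypothetical helper; it reads that output on stdin.
is_max_perf_mode() {
    mode_id=$(tail -n 1 | tr -d '[:space:]')
    [ "$mode_id" = "0" ]
}

# On the TX2 this would be used as:
#   sudo nvpmodel -q | is_max_perf_mode || echo "not in mode 0" >&2
# Offline demonstration with sample text in the assumed format:
if printf 'NV Power Mode: MAXN\n0\n' | is_max_perf_mode; then
    echo "mode 0 active"
fi
```

The helper takes the query output on stdin rather than calling `nvpmodel` itself, so it can be tried on any machine.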

JetPack 3.3 - IMX274 camera 4K@60fps glass to glass latency

Simple Capture to Display pipeline (without GstCUDA)

Use this measurement as the baseline when comparing the glass-to-glass latency of the GstCUDA pipelines below.

  • Glass to Glass latency: 112.2042693 ms (59.9252609 ms with the tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v  nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
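
Since the capture runs at 60 fps, a latency figure is often easier to interpret as a number of frame periods. The sketch below does that conversion with `awk`; `latency_in_frames` is a hypothetical helper name used only for illustration.

```shell
# Sketch: express a latency in frame periods at a given frame rate.
# latency_in_frames is a hypothetical helper, not part of GstCUDA.
latency_in_frames() {
    # $1 = latency in ms, $2 = frames per second
    LC_ALL=C awk -v ms="$1" -v fps="$2" 'BEGIN { printf "%.2f\n", ms * fps / 1000 }'
}

latency_in_frames 112.2042693 60   # baseline pipeline
latency_in_frames 59.9252609 60    # tuned/optimized baseline
```

For the two baseline figures above this gives roughly 6.7 and 3.6 frame periods.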

Cudafilter

NVMM Direct Handling

In-place:True
  • Glass to Glass latency: 178.9331237 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False
  • Glass to Glass latency: 230.3850304 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

Unified Memory Allocator

In-place:True
  • Glass to Glass latency: 188.1192285 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False
  • Glass to Glass latency: 306.2578894 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! nvvidconv ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
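
To see what each cudafilter configuration adds on top of the reference pipeline, subtract the baseline latency (112.2042693 ms) from each measurement. This is a small sketch of that arithmetic; `overhead_ms` is a hypothetical helper name.

```shell
# Sketch: latency added by each cudafilter configuration over the
# baseline capture-to-display pipeline (no GstCUDA).
# overhead_ms is a hypothetical helper, not part of GstCUDA.
overhead_ms() {
    # $1 = measured latency in ms, $2 = baseline latency in ms
    LC_ALL=C awk -v m="$1" -v b="$2" 'BEGIN { printf "%.2f\n", m - b }'
}

baseline=112.2042693
overhead_ms 178.9331237 "$baseline"   # NVMM direct handling, in-place=true
overhead_ms 230.3850304 "$baseline"   # NVMM direct handling, in-place=false
overhead_ms 188.1192285 "$baseline"   # unified memory allocator, in-place=true
overhead_ms 306.2578894 "$baseline"   # unified memory allocator, in-place=false
```

With these figures the in-place NVMM configuration adds the least latency (about 67 ms) and the unified memory allocator with in-place=false adds the most (about 194 ms).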

Cudamux

NVMM Direct Handling

In-place:True
  • Glass to Glass latency: 145.5713375 ms (75.4149314 ms with the tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False
  • Glass to Glass latency: 332.9231919 ms (112.3744414 ms with the tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

Unified Memory Allocator

In-place:True
  • Glass to Glass latency: 136.4211149 ms (118.3355796 ms with the tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False
  • Glass to Glass latency: 197.1957698 ms (197.1957698 ms with the tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
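
The cudamux results list both an original and a tuned/optimized figure, so the effect of tuning can be expressed as a percentage reduction. The sketch below computes it from the numbers reported on this page; `tuning_gain_pct` is a hypothetical helper name.

```shell
# Sketch: percentage latency reduction from the original to the tuned figure.
# tuning_gain_pct is a hypothetical helper, not part of GstCUDA.
tuning_gain_pct() {
    # $1 = original latency in ms, $2 = tuned latency in ms
    LC_ALL=C awk -v u="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (u - t) / u * 100 }'
}

tuning_gain_pct 145.5713375 75.4149314    # NVMM direct handling, in-place=true
tuning_gain_pct 332.9231919 112.3744414   # NVMM direct handling, in-place=false
tuning_gain_pct 136.4211149 118.3355796   # unified memory allocator, in-place=true
tuning_gain_pct 197.1957698 197.1957698   # unified memory allocator, in-place=false
```

On the reported numbers, tuning helps the NVMM direct handling cases the most (roughly 48% and 66%), while the unified memory allocator with in-place=false shows the same figure before and after.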


