Difference between revisions of "GstCUDA - Performance Profiling"

From RidgeRun Developer Connection
(Created page with "{{GstCUDA Page | Example 3: cudadebayer| Home| This page shows GstCUDA performance profiling. __TOC__ ==Description== Ri...")
 
Line 7: Line 7:
 
__TOC__
 
  
==Description==
+
= Glass to glass latency =  
RidgeRun offers add-ons for GstCUDA. These are complete, ready-to-use elements, each executing a specific CUDA algorithm that is integrated into the element code. The add-on elements are based on the GstCUDA framework and show its potential as the basis of a final product.
+
This wiki contains the glass to glass latency measurement results for simple GstCUDA capture and display pipelines on a TX2. It covers all the possible GstCUDA (cudafilter and cudamux) configurations and use cases.
  
 +
All the measurements were taken with the TX2 in high-performance mode, which was set by running the following commands:
 +
<pre>
 +
sudo nvpmodel -m 0 # Reboot after running this so the change can take effect.
 +
sudo reboot
 +
sudo ~/jetson_clocks
 +
</pre>
  
Each add-on is sold separately. They are not included in the GstCUDA framework, but require it to work. So, to acquire an add-on you must have purchased GstCUDA at least once.
+
= Jetpack 3.3 - IMX274 camera 4K@60fps glass to glass latency =
 +
== Simple Capture to Display pipeline (without GstCUDA) ==
 +
This measurement serves as the reference for comparing the glass to glass latency of the GstCUDA pipelines below.
 +
* '''''Glass to Glass latency = 112.2042693 ms'''''
 +
Test pipeline:
 +
<pre>
 +
gst-launch-1.0 -v  nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
 +
</pre>
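
As a reading aid, the reference latency can also be expressed in frame periods at the capture rate of 60 fps. The following is a minimal sketch, assuming a POSIX shell with awk; the helper name latency_in_frames is hypothetical and is not part of GstCUDA or of the measurement setup.

```shell
#!/bin/sh
# latency_in_frames LATENCY_MS FPS
# Hypothetical helper (not part of GstCUDA): converts a latency given in
# milliseconds into the equivalent number of frame periods at FPS.
latency_in_frames() {
    awk -v ms="$1" -v fps="$2" 'BEGIN { printf "%.2f\n", ms * fps / 1000 }'
}

# Reference pipeline (without GstCUDA) at 60 fps.
latency_in_frames 112.2042693 60
```

At 60 fps one frame period is about 16.67 ms, so the reference latency corresponds to a little under seven frame periods.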
  
  
Over time, more add-ons will be added. If you are interested in an add-on that isn't provided yet, please don't hesitate to [http://www.ridgerun.com/contact/ contact us].
+
== Cudafilter ==
 +
=== NVMM Direct Handling ===
 +
==== In-place:True ====
 +
* '''''Glass to Glass latency = 178.9331237 ms'''''
 +
Test pipeline:
 +
<pre>
 +
gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
 +
</pre>
 +
==== In-place:False ====
 +
* '''''Glass to Glass latency = 230.3850304 ms'''''
 +
Test pipeline:
 +
<pre>
 +
gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
 +
</pre>
 +
=== Unified Memory Allocator ===
 +
==== In-place:True ====
 +
* '''''Glass to Glass latency = 188.1192285 ms'''''
 +
Test pipeline:
 +
<pre>
 +
gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
 +
</pre>
 +
==== In-place:False ====
 +
* '''''Glass to Glass latency = 306.2578894 ms'''''
 +
Test pipeline:
 +
<pre>
 +
gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! nvvidconv ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
 +
</pre>
  
  
 
== Add-ons Index ==
The following index gives a detailed description of the GstCUDA add-ons.

<html>
  <div class="toc" style="font-size:80%;">
    <ol>
      <li> <a href=https://developer.ridgerun.com/wiki/index.php?title=GstCUDA_-_cudadebayer>cudadebayer</a></li>
    </ol>
  </div>
</html>
+
== Cudamux ==
+
=== NVMM Direct Handling ===
+
==== In-place:True ====
+
* '''''Glass to Glass latency = 145.5713375 ms'''''
+
Test pipeline:
+
<pre>
+
gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
+
</pre>
+
==== In-place:False ====
+
* '''''Glass to Glass latency = 332.9231919 ms'''''
+
Test pipeline:
 +
<pre>
 +
gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
 +
</pre>
 +
=== Unified Memory Allocator ===
 +
==== In-place:True ====
 +
* '''''Glass to Glass latency = 136.4211149 ms'''''
 +
Test pipeline:
 +
<pre>
 +
gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
 +
</pre>
 +
==== In-place:False ====
 +
* '''''Glass to Glass latency = 197.1957698 ms'''''
 +
Test pipeline:
 +
<pre>
 +
gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
 +
</pre>
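
The measurements on this page can be compared against the reference pipeline by subtracting its latency from each result. The following is a minimal sketch, assuming a POSIX shell with awk; the helper name overhead_ms and the configuration labels are hypothetical, while the latency values are the ones reported above. Note that the cudamux pipelines capture from two cameras, so their difference from the single-camera reference is only indicative.

```shell
#!/bin/sh
# Reference latency of the capture-to-display pipeline without GstCUDA (ms).
BASELINE_MS=112.2042693

# overhead_ms LATENCY_MS
# Hypothetical helper: prints the latency added on top of the reference.
overhead_ms() {
    awk -v ms="$1" -v base="$BASELINE_MS" 'BEGIN { printf "%.2f\n", ms - base }'
}

# Labels are illustrative; the values are the measurements listed on this page.
while read -r name ms; do
    printf '%-42s +%s ms\n' "$name" "$(overhead_ms "$ms")"
done <<EOF
cudafilter/nvmm/in-place=true 178.9331237
cudafilter/nvmm/in-place=false 230.3850304
cudafilter/unified/in-place=true 188.1192285
cudafilter/unified/in-place=false 306.2578894
cudamux/nvmm/in-place=true 145.5713375
cudamux/nvmm/in-place=false 332.9231919
cudamux/unified/in-place=true 136.4211149
cudamux/unified/in-place=false 197.1957698
EOF
```

In every case the in-place=true configuration adds less latency than the corresponding in-place=false configuration.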
  
  
 
|keywords=GstCUDA add-ons,GstCUDA framework}}
 

Revision as of 19:18, 15 July 2019