GstCUDA - Performance Profiling

Latest revision as of 13:58, 15 September 2020

This page presents GstCUDA performance profiling results.

Glass to glass latency

This page contains the glass to glass latency measurement results of simple GstCUDA capture and display pipelines on a TX2. It covers all the possible GstCUDA (cudafilter and cudamux) configurations and use cases.

All measurements were taken with the TX2 in high-performance mode, set by running the following commands:

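# Select the maximum-performance power mode; reboot afterwards so the change takes effect.
sudo nvpmodel -m 0
sudo reboot
# After the reboot, set the clocks to their maximum frequencies.
sudo ~/jetson_clocks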

Jetpack 3.3 - IMX274 camera 4K@60fps glass to glass latency

Simple Capture to Display pipeline (without GstCUDA)

Use this measurement as the reference when comparing the glass to glass latency of the GstCUDA pipelines below.

Glass to Glass latency: 112.2042693 ms (59.9252609 ms with a tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
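
To put the reference figures in perspective, a latency can be expressed as a number of frame periods of the 60 fps capture. The following is a minimal sketch of that arithmetic only. The helper name `latency_in_frames` is hypothetical and not part of GstCUDA; the two latencies are the reference values reported above.

```python
def latency_in_frames(latency_ms: float, fps: float) -> float:
    """Express a latency in milliseconds as a number of frame periods at the given frame rate."""
    frame_period_ms = 1000.0 / fps  # duration of one frame, in milliseconds
    return latency_ms / frame_period_ms


FPS = 60.0                       # capture frame rate used in the test pipeline
REFERENCE_MS = 112.2042693       # reference pipeline, as reported
TUNED_REFERENCE_MS = 59.9252609  # reference pipeline, tuned/optimized, as reported

for label, value_ms in (("reference", REFERENCE_MS), ("tuned reference", TUNED_REFERENCE_MS)):
    frames = latency_in_frames(value_ms, FPS)
    print(f"{label}: {value_ms} ms = {frames:.2f} frame periods at {FPS:g} fps")
```

At 60 fps this works out to roughly 6.7 frame periods for the default reference pipeline and roughly 3.6 for the tuned one.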

Cudafilter

NVMM Direct Handling

In-place:True

Glass to Glass latency: 178.9331237 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False

Glass to Glass latency: 230.3850304 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

Unified Memory Allocator

In-place:True

Glass to Glass latency: 188.1192285 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False

Glass to Glass latency: 306.2578894 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! nvvidconv ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
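
The four cudafilter results can be compared against the reference pipeline by subtracting the reference latency. The sketch below does that subtraction with the figures reported above. The dictionary layout and the `overhead_ms` helper are illustrative assumptions, and the untuned reference value is used because no tuned figures are reported for cudafilter.

```python
REFERENCE_MS = 112.2042693  # capture-to-display pipeline without GstCUDA, as reported

# Reported cudafilter latencies in milliseconds, keyed by (memory handling, in-place setting).
CUDAFILTER_MS = {
    ("NVMM direct handling", "in-place=true"): 178.9331237,
    ("NVMM direct handling", "in-place=false"): 230.3850304,
    ("Unified memory allocator", "in-place=true"): 188.1192285,
    ("Unified memory allocator", "in-place=false"): 306.2578894,
}


def overhead_ms(latency_ms: float, reference_ms: float = REFERENCE_MS) -> float:
    """Latency added on top of the reference pipeline, in milliseconds."""
    return latency_ms - reference_ms


overheads = {config: overhead_ms(value) for config, value in CUDAFILTER_MS.items()}

# List the configurations from the smallest added latency to the largest.
for (memory, in_place), added in sorted(overheads.items(), key=lambda item: item[1]):
    print(f"{memory}, {in_place}: +{added:.2f} ms over the reference")
```

With these figures, NVMM direct handling with in-place=true adds the least latency (about 66.7 ms) and the unified memory allocator with in-place=false adds the most (about 194.1 ms).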

Cudamux

NVMM Direct Handling

In-place:True

Glass to Glass latency: 145.5713375 ms (75.4149314 ms with a tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False

Glass to Glass latency: 332.9231919 ms (112.3744414 ms with a tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

Unified Memory Allocator

In-place:True

Glass to Glass latency: 136.4211149 ms (118.3355796 ms with a tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False

Glass to Glass latency: 197.1957698 ms (197.1957698 ms with a tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
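
The effect of tuning on the cudamux pipelines can be summarized as a percentage reduction in latency. The sketch below computes it from the default and tuned/optimized figures reported above. The `reduction_percent` helper and the dictionary layout are illustrative assumptions. The unified memory allocator with in-place=false lists the same value for both cases, so its reduction comes out as zero.

```python
# Reported cudamux latencies in milliseconds: (default pipeline, tuned/optimized pipeline).
CUDAMUX_MS = {
    ("NVMM direct handling", "in-place=true"): (145.5713375, 75.4149314),
    ("NVMM direct handling", "in-place=false"): (332.9231919, 112.3744414),
    ("Unified memory allocator", "in-place=true"): (136.4211149, 118.3355796),
    ("Unified memory allocator", "in-place=false"): (197.1957698, 197.1957698),
}


def reduction_percent(default_ms: float, tuned_ms: float) -> float:
    """Latency reduction obtained by tuning, as a percentage of the default latency."""
    return 100.0 * (default_ms - tuned_ms) / default_ms


reductions = {
    config: reduction_percent(default_ms, tuned_ms)
    for config, (default_ms, tuned_ms) in CUDAMUX_MS.items()
}

for (memory, in_place), percent in reductions.items():
    print(f"{memory}, {in_place}: {percent:.1f}% lower latency when tuned")
```

With these figures, tuning helps NVMM direct handling the most: roughly 48% with in-place=true and roughly 66% with in-place=false, against roughly 13% for the unified memory allocator with in-place=true.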



