GStreamer Encoding Latency in NVIDIA Jetson Platforms

From RidgeRun Developer Connection
Revision as of 12:09, 16 March 2022 by Jsalas

Introduction

This wiki evaluates the latency, that is, the processing time, of the GStreamer hardware-accelerated encoders available on Jetson platforms.

The results shown here were obtained on a Jetson TX2 platform. The evaluation was aimed at optimizing encoding on Jetpack 3.3.3, but some results from Jetpack 4.5.1 are included to show the improvements in newer releases.

Latency Tests

This evaluation involves two pipelines encoding simultaneously with the H264 codec. The test cases cover the following resolutions:

  • 1920x1080@50FPS
  • 1280x720@50FPS


The starting point pipelines are the following:

  • 1920x1080@50FPS
gst-launch-1.0 videotestsrc is-live=true ! "video/x-raw,format=I420,width=1920,height=1080,framerate=50/1" ! nvvidconv name=nv0 ! "video/x-raw(memory:NVMM)" ! omxh264enc name=enc0 control-rate=variable bitrate=20000000 profile=main ! video/x-h264,stream-format=byte-stream ! fakesink videotestsrc is-live=true ! "video/x-raw,format=I420,width=1920,height=1080,framerate=50/1" ! nvvidconv name=nv1 ! "video/x-raw(memory:NVMM)" ! omxh264enc name=enc1 control-rate=variable bitrate=20000000 profile=main ! video/x-h264,stream-format=byte-stream ! fakesink
  • 1280x720@50FPS
gst-launch-1.0 videotestsrc is-live=true ! "video/x-raw,format=I420,width=1280,height=720,framerate=50/1" ! nvvidconv name=nv0 ! "video/x-raw(memory:NVMM)" ! omxh264enc name=enc0 control-rate=variable bitrate=20000000 profile=main ! video/x-h264,stream-format=byte-stream ! fakesink videotestsrc is-live=true ! "video/x-raw,format=I420,width=1280,height=720,framerate=50/1" ! nvvidconv name=nv1 ! "video/x-raw(memory:NVMM)" ! omxh264enc name=enc1 control-rate=variable bitrate=20000000 profile=main ! video/x-h264,stream-format=byte-stream ! fakesink
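
The two commands above differ only in the width and height caps, so they can be generated from a single template. The following is a minimal sketch under that assumption. The function names omx_h264_branch and dual_omx_h264_pipeline are hypothetical, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the original page).

# Print one videotestsrc -> nvvidconv -> omxh264enc -> fakesink branch.
# $1 = width, $2 = height, $3 = branch index used in the element names
omx_h264_branch() {
    printf '%s' "videotestsrc is-live=true ! \"video/x-raw,format=I420,width=$1,height=$2,framerate=50/1\" ! nvvidconv name=nv$3 ! \"video/x-raw(memory:NVMM)\" ! omxh264enc name=enc$3 control-rate=variable bitrate=20000000 profile=main ! video/x-h264,stream-format=byte-stream ! fakesink"
}

# Print the full gst-launch-1.0 command with two simultaneous branches.
# $1 = width, $2 = height
dual_omx_h264_pipeline() {
    printf 'gst-launch-1.0 %s %s\n' \
        "$(omx_h264_branch "$1" "$2" 0)" \
        "$(omx_h264_branch "$1" "$2" 1)"
}

dual_omx_h264_pipeline 1920 1080
dual_omx_h264_pipeline 1280 720
```

Piping one printed line to sh on a Jetson where these elements are installed should launch the same pipeline as the corresponding command above.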

For this analysis we are only interested in the encoders' processing time, so the plots show only the behavior of the encoders.

Jetpack 3.3 OMX (TX2)

Base

  • 1920x1080
  • 1280x720

Here we can see the following results:

Resolution   Mean Latency   Peak Latency
1920x1080    ~22ms          ~32ms
1280x720     ~15ms          ~18ms

Different Profiles

To check whether the encoder profile affects latency, we tested three profiles at the 1080p resolution:

  • Baseline
  • Main
  • High

We can see that there is no real difference between the baseline and main profiles, but the high profile significantly increases the peak processing time of the encoder:

Profile    Mean Latency   Peak Latency
Baseline   ~22ms          ~32ms
Main       ~22ms          ~32ms
High       ~22ms          ~53ms

Different Bitrates

Another common question is whether the encoder bitrate affects latency. To test it, we tried four values at the 1080p resolution:

  • 4Mbit
  • 20Mbit
  • 60Mbit
  • 80Mbit

None of the values makes a significant difference, so the bitrate does not appear to affect the encoder processing time.

We also tested whether the bitrate control method affects the processing time:

  • Variable
  • Constant

It does not seem to have any effect on latency either.
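
The bitrate and rate-control variations can be expressed as changes to the omxh264enc properties of the starting pipeline. The sketch below prints the encoder description for the four bitrates with variable control, plus one constant-control case. The helper name omx_encoder_props is hypothetical, bitrates are assumed to be in bits per second (consistent with bitrate=20000000 for 20Mbit above), and control-rate=constant and the 20Mbit value for the constant case are assumptions, since the page does not list the exact values used.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original page).
# Print the omxh264enc element description for one test case.
# $1 = control-rate value, $2 = bitrate in Mbit/s
omx_encoder_props() {
    printf 'omxh264enc name=enc0 control-rate=%s bitrate=%s profile=main\n' \
        "$1" "$(($2 * 1000000))"
}

# Bitrate variations with variable rate control.
for mbit in 4 20 60 80; do
    omx_encoder_props variable "$mbit"
done

# Rate-control variation; the 20Mbit bitrate here is an assumption.
omx_encoder_props constant 20
```

Each printed line would replace the omxh264enc element in one branch of the starting pipeline.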

H265

The H265 codec is increasingly used, so it is important to validate its performance too:

  • 1920x1080
  • 1280x720

The plots show fewer peaks, but the mean processing time is higher than with H264 (~28ms vs ~22ms at 1080p and ~25ms vs ~15ms at 720p):

Resolution   Mean Latency   Peak Latency
1920x1080    ~28ms          ~30ms
1280x720     ~25ms          ~28ms
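
The page does not list the H265 pipelines. A plausible sketch, assuming they mirror the H264 ones with the encoder element and output caps swapped, is shown below. The helper name to_h265 is hypothetical, and the profile property is dropped because the page does not say which value, if any, was used for H265.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original page): rewrites an H264
# pipeline description read from stdin into an H265 one.
to_h265() {
    # 1) omxh264enc -> omxh265enc (the same rule also maps nvv4l2h264enc)
    # 2) output caps video/x-h264 -> video/x-h265
    # 3) drop the profile property (assumption: not carried over)
    sed -e 's/h264enc/h265enc/g' \
        -e 's#video/x-h264#video/x-h265#g' \
        -e 's/ profile=[A-Za-z]*//g'
}

echo 'nvvidconv name=nv0 ! "video/x-raw(memory:NVMM)" ! omxh264enc name=enc0 control-rate=variable bitrate=20000000 profile=main ! video/x-h264,stream-format=byte-stream ! fakesink' | to_h265
```

The same filter can be applied to a complete gst-launch-1.0 command line, since it only touches the encoder name, the encoded caps, and the profile property.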

Jetpack 4.5 OMX (TX2)

These tests check whether encoding performance improved in newer releases.

Base

  • 1920x1080
  • 1280x720

Here we can see the following results:

Resolution   Mean Latency   Peak Latency
1920x1080    ~15ms          ~22ms
1280x720     ~12ms          ~20ms

As we can see, the 4.5 release improves the mean latency compared to 3.3 (~15ms vs ~22ms at 1080p and ~12ms vs ~15ms at 720p).

H265

For the H265 codec in this newer release, the results are the following:

  • 1920x1080
  • 1280x720

Resolution   Mean Latency   Peak Latency
1920x1080    ~15ms          ~18ms
1280x720     ~15ms          ~19ms

Here the improvement over the 3.3 release is large: the mean latency drops from ~28ms to ~15ms at 1080p and from ~25ms to ~15ms at 720p.


Jetpack 4.5 V4L2 (TX2)

NVIDIA has reported in several posts that the OMX encoders are deprecated and recommends using the V4L2 encoders nvv4l2h264enc and nvv4l2h265enc in newer releases. The starting point pipelines for these encoders are the following:

  • 1920x1080@50FPS
gst-launch-1.0 videotestsrc is-live=true ! "video/x-raw,format=I420,width=1920,height=1080,framerate=50/1" ! nvvidconv name=nv0 ! "video/x-raw(memory:NVMM)" ! nvv4l2h264enc name=enc0 control-rate=variable_bitrate bitrate=20000000 profile=Main ! video/x-h264,stream-format=byte-stream ! fakesink videotestsrc is-live=true ! "video/x-raw,format=I420,width=1920,height=1080,framerate=50/1" ! nvvidconv name=nv1 ! "video/x-raw(memory:NVMM)" ! nvv4l2h264enc name=enc1 control-rate=variable_bitrate bitrate=20000000 profile=Main ! video/x-h264,stream-format=byte-stream ! fakesink
  • 1280x720@50FPS
gst-launch-1.0 videotestsrc is-live=true ! "video/x-raw,format=I420,width=1280,height=720,framerate=50/1" ! nvvidconv name=nv0 ! "video/x-raw(memory:NVMM)" ! nvv4l2h264enc name=enc0 control-rate=variable_bitrate bitrate=20000000 profile=Main ! video/x-h264,stream-format=byte-stream ! fakesink videotestsrc is-live=true ! "video/x-raw,format=I420,width=1280,height=720,framerate=50/1" ! nvvidconv name=nv1 ! "video/x-raw(memory:NVMM)" ! nvv4l2h264enc name=enc1 control-rate=variable_bitrate bitrate=20000000 profile=Main ! video/x-h264,stream-format=byte-stream ! fakesink

Base

  • 1920x1080
  • 1280x720

Here we can see the following results:

Resolution   Mean Latency   Peak Latency
1920x1080    ~15ms          ~22ms
1280x720     ~12ms          ~20ms

As we can see, the V4L2 encoder has pretty much the same performance as the OMX encoder on Jetpack 4.5.

Maximum Performance

The V4L2 encoders have a property named maxperf-enable that can be set to decrease the processing time:

  • 1920x1080
  • 1280x720

We can see the following results:

Resolution   Mean Latency   Peak Latency
1920x1080    ~8ms           ~12ms
1280x720     ~5ms           ~10ms

These are the best times across all the encoders and tests, so the maxperf-enable property has a significant effect on the processing time.
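
As a minimal sketch of the change, the only difference from the base V4L2 pipelines is the extra maxperf-enable=true property on each encoder. The helper name v4l2_h264_encoder is hypothetical, and it only prints the element description.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original page): prints the
# nvv4l2h264enc element description from the base pipelines, with
# maxperf-enable=true appended when the second argument is "maxperf".
# $1 = branch index, $2 = "maxperf" (optional)
v4l2_h264_encoder() {
    props="nvv4l2h264enc name=enc$1 control-rate=variable_bitrate bitrate=20000000 profile=Main"
    if [ "${2:-}" = "maxperf" ]; then
        props="$props maxperf-enable=true"
    fi
    printf '%s\n' "$props"
}

v4l2_h264_encoder 0          # base configuration
v4l2_h264_encoder 0 maxperf  # maximum performance configuration
```

The second printed line would replace the nvv4l2h264enc element in each branch of the base pipelines.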

Maximum Performance and H265

The maxperf-enable property is also available on the V4L2 H265 encoder. The results for both resolutions are:

  • 1920x1080
  • 1280x720

Resolution   Mean Latency   Peak Latency
1920x1080    ~8ms           ~12ms
1280x720     ~5ms           ~10ms

With maxperf-enable set, the performance appears to be the same for the H264 and H265 codecs.

Quality Tests

The main focus of these tests was to compare different configurations of the OMX and V4L2 encoders for the H264 and H265 codecs. Quality is hard to measure objectively, so we include some visual results for you to evaluate.

The tests vary the same settings covered in the latency section, using the 1080p resolution.

Different Profiles

We changed the profile on both encoders to see whether the quality was significantly affected. The high profile, which is the next one after main, did not seem to improve the quality much. Since the latency section showed that the high profile significantly increases the encoder's peak processing time, it is not a good way to improve quality.

OMX

  • [Image not available]

V4L2

  • [Image not available]

Different Bitrate Control

The bitrate control method was another relevant property to test, so we set both encoders to use variable (VBR) and constant (CBR) bitrates. It did not seem to affect the H264 quality much.

H264

OMX

  • Vbr-cbr.gif

V4L2

  • [Image not available]

H265

On the other hand, for H265 the V4L2 encoder's quality seems to be slightly more affected by this property, while the OMX encoder does not seem to be affected at all.

OMX

  • H265-vbr-cbr.gif

V4L2

  • [Image not available]


Different Bitrates

Another test worth running is the bitrate variation, since this parameter is directly associated with the video quality. Because the latency section showed that the bitrate does not seem to affect the encoder's processing time, raising it is a good way to improve quality without decreasing performance.

The results show that at lower bitrates the H264 encoder behaves better than H265.

OMX

  • [Image not available]

V4L2

  • [Image not available]

Maximum Performance

The maxperf-enable property gives the V4L2 encoders a large latency improvement, so it is important to check whether there is a tradeoff in quality. The results suggest that enabling this property makes no difference in quality for either codec.

  • [Image not available]

OMX vs V4L2

Finally, we compare the OMX and V4L2 encoders. Both give similar results for H264, but for H265 the V4L2 encoder seems to have slightly better quality.


  • [Image not available]

OMX vs V4L2

  • gst-omx is a plugin that wraps the available OpenMAX IL components and exposes them as standard GStreamer elements. The OpenMAX IL API allows library and codec implementers to rapidly and effectively utilize the full acceleration potential of new silicon, regardless of the underlying hardware architecture.
  • gst-v4l2 is a plugin that uses the V4L2 kernel framework. Through the framework it creates V4L2 elements, subscribes to V4L2 events, dequeues events from an element, and sets and gets control values.
  • As per the Jetpack 4.2 release notes, the gst-omx plugins are deprecated and slated for removal in future releases. Starting with that release, NVIDIA provides the gst-v4l2 plugin as the replacement. In the tests above it matched the OMX encoders by default and was faster with maxperf-enable set.

Conclusions

  • The minimum mean time achieved with the OMX encoders on Jetpack 3.3.3 at 1080p is ~22ms, which exceeds the 20ms frame period at 50 FPS, so 50 FPS is not viable.
  • H265 encoding on Jetpack 3.3 is slower than H264, so it can't be used as an alternative.
  • Bitrate does not affect processing time on the OMX encoders, so raising it is a good option to improve quality.
  • The OMX encoders perform better on Jetpack 4.5.1, but if the upgrade is possible we recommend using the V4L2 encoders with the maxperf-enable=true property, which gave the lowest latency of all the results:


Resolution         OMX Jetpack 3.3 Mean Latency   OMX Jetpack 4.5 Mean Latency   V4L2 Jetpack 4.5 Mean Latency (maxperf-enable=true)
1920x1080 (H264)   ~22ms                          ~15ms                          ~8ms
1280x720 (H264)    ~15ms                          ~12ms                          ~5ms
1920x1080 (H265)   ~28ms                          ~15ms                          ~8ms
1280x720 (H265)    ~25ms                          ~15ms                          ~10ms
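
The 50 FPS conclusion follows from comparing each mean processing time with the frame period, 1000 / 50 = 20ms. The sketch below does that arithmetic for the 1080p H264 means in the table. The function names are hypothetical, and it assumes the encoder must finish each frame within one frame period to sustain the frame rate.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the original page).

# Print the frame period in milliseconds for a frame rate in FPS.
frame_period_ms() {
    LC_ALL=C awk -v fps="$1" 'BEGIN { printf "%.1f\n", 1000 / fps }'
}

# Print "yes" if a mean encode time fits within the frame period, else "no".
# $1 = mean latency in ms, $2 = frame rate in FPS
fits_frame_period() {
    LC_ALL=C awk -v mean="$1" -v fps="$2" \
        'BEGIN { if (mean <= 1000 / fps) print "yes"; else print "no" }'
}

frame_period_ms 50
# 1080p H264 means: OMX 3.3, OMX 4.5, V4L2 4.5 with maxperf-enable=true
for mean in 22 15 8; do
    printf '%sms at 50 FPS: %s\n' "$mean" "$(fits_frame_period "$mean" 50)"
done
```

With these values, only the ~22ms case exceeds the 20ms period.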