NVIDIA Jetson Orin/JetPack 5.0.2/Performance Tuning/Software Encoders For Jetson Orin Nano
Efernandez (talk | contribs) (→FFmpeg) |
Revision as of 15:36, 1 February 2023
The Jetson Orin Nano does not include hardware units for video encoding (NVENC), unlike the other members of the NVIDIA Orin family. This means that users must find alternatives to encode their video other than the hardware-accelerated NVENC module, such as CPU-based encoding. CPU-based encoding solutions leave less CPU power for additional tasks and may not achieve the same performance as NVENC-based encoding. Our goal for this page is to evaluate some video encoding alternatives that work on the Jetson Orin Nano. In this section, you will find the results of several video encoding tests that will help assess the Orin Nano encoding capabilities and make it easier to select the solution that suits your product better.
Summary
We tested two options to encode video with H.264: FFmpeg and GStreamer. The tests were made with three different video resolutions, three different encoder presets, and different bitrate configurations. The idea was to obtain the maximum frame rate possible for each configuration and also compare the performance of the encoding tools in terms of CPU usage.
The results showed that the preset significantly changed the number of frames that could be processed each second. For instance, at 1080p with 10 Mb/s the difference between the slowest and fastest preset is close to 40 frames per second. The bitrate also affected the maximum frame rate considerably: the higher the bitrate, the fewer frames can be encoded each second.
Regarding CPU usage of the encoding tools (FFmpeg and GStreamer), at 1080p both tools show very close results, but in most cases GStreamer shows a slightly lower CPU load. The graphs in Figure 1 show a summary of the results explained in this section. The graph on the left shows the difference in maximum frame rate between the tested presets at 10 Mb/s, the graph in the middle summarizes the CPU load at 30 and 60 FPS with the configurations used at 10 Mb/s, and the graph on the right compares the impact of the bitrate on the CPU load for the 1080p 30 FPS settings. The graphs below represent the results from GStreamer with different 1080p configurations.
Experimental Setup
The results presented were obtained in the following hardware setup:
- NVIDIA Jetson Orin Nano (Emulated on a Jetson AGX Orin Developer Kit).
- JetPack 5.0.2.
- 8 GB RAM.
- 6 CPUs.
To learn more about AGX Orin Emulation, please visit the emulation features of the developer kit wiki page.
The videos used for testing have the following specifications (download the test video):
- Resolutions tested: 1920x1080, 1280x720, and 640x480.
- Duration of 15 seconds.
- 30 FPS.
- Pixel format YUV420P.
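As a rough guide to how large these raw test inputs are, the sketch below computes the size of an uncompressed YUV420P clip. It assumes 8-bit samples, no row padding, and a file that holds exactly 15 seconds at 30 FPS; the function name is only illustrative.

```python
def raw_yuv420p_bytes(width: int, height: int, fps: int, seconds: int) -> int:
    """Size in bytes of an uncompressed 8-bit YUV420P clip (assumes no padding)."""
    # One full-resolution luma plane plus two chroma planes subsampled 2x2:
    # 1 + 1/4 + 1/4 = 1.5 bytes per pixel.
    bytes_per_frame = width * height * 3 // 2
    return bytes_per_frame * fps * seconds


# The three resolutions tested on this page, 15 seconds at 30 FPS each.
for w, h in [(1920, 1080), (1280, 720), (640, 480)]:
    size = raw_yuv420p_bytes(w, h, fps=30, seconds=15)
    print(f"{w}x{h}: {size / 1e6:.1f} MB")
```

Under these assumptions the 1080p input is about 1.4 GB, so reading the raw file is itself part of the load on the system during a test.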
Our tests were designed to find:
- The maximum frame rate achievable given a resolution.
- The CPU usage with a fixed frame rate and resolution to emulate real-time processing.
Maximum Frame Rate
For this test, the goal is to obtain the maximum frame rate possible at a certain resolution and with different encoding configurations. The resolutions tested are: 1920x1080, 1280x720, and 640x480. For each resolution we tested three presets of the H.264 encoder: veryslow, medium, and ultrafast, which will affect the quality of encoded video. Finally, we tested the encoder with a variable and fixed bitrate. The variable bitrate average is between 3 Mb/s and 5 Mb/s for most cases.
FFmpeg
The goal of these tests is to show the maximum frame rate that can be obtained for different resolutions, presets, and bitrates. The preset affects the compression quality of the output video: a slower preset provides better quality at the expense of higher CPU utilization. The bitrate also affects quality: the higher the bitrate, the better the quality, but also the more bandwidth needed to stream the video or the more space needed to store it. Table 1 summarizes the tests run and the results for each resolution.
Table 1. FFmpeg maximum frame rate tests summary.
Resolution | Preset | Bitrate (kBits/s) | Max Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%) |
---|---|---|---|---|---|
1920x1080 | veryslow | Variable | 5 | 248 | 79.0 |
1920x1080 | medium | Variable | 24 | 108 | 62.7 |
1920x1080 | ultrafast | Variable | 43 | 69 | 50.7 |
1920x1080 | veryslow | 1000 | 8 | 247 | 71.6 |
1920x1080 | medium | 1000 | 46 | 106 | 55.6 |
1920x1080 | ultrafast | 1000 | 105 | 68 | 33.6 |
1920x1080 | veryslow | 10000 | 5 | 247 | 79.1 |
1920x1080 | medium | 10000 | 24 | 110 | 67.7 |
1920x1080 | ultrafast | 10000 | 71 | 74 | 47.4 |
1280x720 | veryslow | Variable | 812 | 136 | 71.7 |
1280x720 | medium | Variable | 54 | 71 | 63.4 |
1280x720 | ultrafast | Variable | 117 | 54 | 44.3 |
1280x720 | veryslow | 1000 | 19 | 136 | 79.6 |
1280x720 | medium | 1000 | 90 | 72 | 56.1 |
1280x720 | ultrafast | 1000 | 199 | 54 | 47.0 |
640x480 | veryslow | Variable | 28 | 88 | 69.2 |
640x480 | medium | Variable | 121 | 51 | 50.4 |
640x480 | ultrafast | Variable | 222 | 49 | 38.3 |
640x480 | veryslow | 1000 | 52 | 76 | 75.8 |
640x480 | medium | 1000 | 190 | 52 | 55.3 |
640x480 | ultrafast | 1000 | 400 | 45 | 42.6 |
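To make the storage side of the bitrate trade-off concrete, the following sketch estimates the size of an encoded clip from its target bitrate. It is only an approximation: it assumes the encoder hits the target bitrate exactly and ignores container overhead, and the function name is hypothetical.

```python
def encoded_size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate encoded size in MB (10^6 bytes) for a constant bitrate."""
    # kbit/s -> bit/s, times duration, divided by 8 bits per byte.
    size_bytes = bitrate_kbps * 1000 * seconds / 8
    return size_bytes / 1e6


# The two fixed bitrates used in the tests, for the 15-second test video.
for kbps in (1000, 10000):
    print(f"{kbps} kbit/s for 15 s: about {encoded_size_mb(kbps, 15):.2f} MB")
```

For the 15-second test video this gives roughly 1.9 MB at 1 Mb/s and 18.8 MB at 10 Mb/s, a tenfold difference in storage or bandwidth for the same content.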
To give you an idea of how presets affect image quality, Figures 2, 3, and 4 show a frame of the test video for each preset tested on the 1080p video at 1 Mb/s. The ultrafast preset shows a considerable reduction in quality compared to the veryslow preset, so the right preset and bitrate depend on your use case. If you want fast encoding and care less about image quality, a faster preset or a lower bitrate may be useful. If good video quality is needed, a slower preset or a higher bitrate is the way to go. Slower presets also load the CPU more because of the extra work spent on compression. If you want to dive deeper into what these presets configure internally, refer to the Encoding presets for x264 documentation.
For reference, the command used to run the tests with variable bitrate is shown below. The -crf (constant rate factor) value is chosen to produce a video of average quality; the lower the value, the better the quality. You might want to change the preset too; all available presets are listed in the H.264 Video Encoding Guide.
ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -i input1080.yuv -c:v libx264 -crf 22 -preset ultrafast -tune zerolatency output.mp4
The command to specify a fixed bitrate is presented below. The -b:v flag sets the target bitrate, and the -minrate and -maxrate flags make sure the encoder holds that target. The buffer size (-bufsize) must also be specified when the min and max rates are defined; in general, it is set to twice the bitrate.
ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -i input1080.yuv -c:v libx264 -x264-params "nal-hrd=cbr" -b:v 10M -minrate 10M -maxrate 10M -bufsize 20M -preset ultrafast -tune zerolatency output.mp4
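If you need to repeat the fixed-bitrate test for several bitrates, a small helper can keep the rate flags consistent. The sketch below only assembles the argument list of the command above, applying the "buffer size is twice the bitrate" rule; the helper name and its parameters are assumptions made for illustration, and it does not run FFmpeg itself.

```python
import shlex


def ffmpeg_cbr_args(src, dst, size, fps, bitrate_mbps, preset):
    """Build the fixed-bitrate FFmpeg command used on this page as a list."""
    rate = f"{bitrate_mbps}M"
    bufsize = f"{2 * bitrate_mbps}M"  # rule of thumb from the text: 2x bitrate
    return [
        "ffmpeg", "-f", "rawvideo", "-pix_fmt", "yuv420p",
        "-s:v", size, "-r", str(fps), "-i", src,
        "-c:v", "libx264", "-x264-params", "nal-hrd=cbr",
        "-b:v", rate, "-minrate", rate, "-maxrate", rate,
        "-bufsize", bufsize,
        "-preset", preset, "-tune", "zerolatency", dst,
    ]


args = ffmpeg_cbr_args("input1080.yuv", "output.mp4", "1920x1080", 30, 10, "ultrafast")
print(shlex.join(args))  # printable form of the command, for inspection
```

The list can be passed directly to subprocess.run(args, check=True) on a system where FFmpeg with libx264 is installed.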
GStreamer
The tests in this section are similar to those in the FFmpeg section. The idea is to compare the performance as well as the quality of the output video. The encoder used is x264enc. The results are summarized in Table 2.
Table 2. GStreamer maximum frame rate tests summary.
Resolution | Preset | Bitrate (kBits/s) | Max Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%) |
---|---|---|---|---|---|
1920x1080 | veryslow | Variable | 5 | 208 | 76.6 |
1920x1080 | medium | Variable | 21 | 70 | 57.4 |
1920x1080 | ultrafast | Variable | 61 | 29 | 52 |
1920x1080 | veryslow | 1000 | 6 | 207 | 69.48 |
1920x1080 | medium | 1000 | 31 | 73 | 52.4 |
1920x1080 | ultrafast | 1000 | 85 | 34 | 45.2 |
1920x1080 | veryslow | 10000 | 3 | 210 | 82.4 |
1920x1080 | medium | 10000 | 16 | 71 | 70.1 |
1920x1080 | ultrafast | 10000 | 45 | 36 | 51.3 |
1280x720 | veryslow | Variable | 10 | 101 | 81.2 |
1280x720 | medium | Variable | 42 | 34 | 65.9 |
1280x720 | ultrafast | Variable | 117 | 20 | 54.3 |
1280x720 | veryslow | 1000 | 11 | 102 | 80.5 |
1280x720 | medium | 1000 | 52 | 38 | 52.1 |
1280x720 | ultrafast | 1000 | 129 | 20 | 37.0 |
640x480 | veryslow | Variable | 28 | 40 | 76.9 |
640x480 | medium | Variable | 103 | 19 | 53.4 |
640x480 | ultrafast | Variable | 238 | 13 | 31.9 |
640x480 | veryslow | 1000 | 22 | 41 | 73.6 |
640x480 | medium | 1000 | 99 | 18 | 54.2 |
640x480 | ultrafast | 1000 | 237 | 14 | 33.7 |
In terms of video quality, the results are very similar to FFmpeg. Video encoded with the ultrafast preset has slightly worse quality than with the veryslow and medium presets. This is expected and can be compensated by raising the bitrate: the higher the bitrate, the better the quality. Figures 5, 6, and 7 show sample frames taken from the 1920x1080 videos at only 1 Mb/s. At 10 Mb/s the difference is much less noticeable, and the ultrafast preset does not show a significant quality difference from a slower preset.
For this test, the pipeline used for variable bitrate is shown below.
gst-launch-1.0 filesrc location=input1080.yuv ! videoparse width=1920 height=1080 framerate=30/1 format=i420 ! x264enc tune=zerolatency insert-vui=true pass=quant quantizer=22 speed-preset=veryslow ! qtmux ! filesink location=output.mp4
The pipeline to encode with a fixed bitrate is the following. The pass property of the encoder is set to cbr, which means constant bitrate, and then we set the target bitrate with the corresponding property, which is in kBits/s.
gst-launch-1.0 filesrc location=input1080.yuv ! videoparse width=1920 height=1080 framerate=30/1 format=i420 ! x264enc tune=zerolatency insert-vui=true pass=cbr bitrate=10000 speed-preset=veryslow ! qtmux ! filesink location=output.mp4
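Because x264enc takes its bitrate in kBits/s while the FFmpeg commands use values such as 10M, it is easy to mix up the units when switching tools. The sketch below builds the pipeline description for either rate-control mode and converts Mb/s to kBits/s; the function name and its parameters are hypothetical, and only the element properties already used in the pipelines above appear in it.

```python
def x264enc_pipeline(src, dst, width, height, fps, preset,
                     bitrate_mbps=None, quantizer=22):
    """Return a gst-launch-1.0 pipeline description for the tests on this page."""
    if bitrate_mbps is None:
        # Variable bitrate: constant quantizer mode, as in the first pipeline.
        rate_control = f"pass=quant quantizer={quantizer}"
    else:
        # Fixed bitrate: x264enc expects kBits/s, so 10 Mb/s becomes 10000.
        rate_control = f"pass=cbr bitrate={int(bitrate_mbps * 1000)}"
    return (
        f"filesrc location={src} ! "
        f"videoparse width={width} height={height} framerate={fps}/1 format=i420 ! "
        f"x264enc tune=zerolatency insert-vui=true {rate_control} "
        f"speed-preset={preset} ! "
        f"qtmux ! filesink location={dst}"
    )


cbr = x264enc_pipeline("input1080.yuv", "output.mp4", 1920, 1080, 30,
                       "veryslow", bitrate_mbps=10)
vbr = x264enc_pipeline("input1080.yuv", "output.mp4", 1920, 1080, 30, "veryslow")
print("gst-launch-1.0 " + cbr)
print("gst-launch-1.0 " + vbr)
```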
Encoding CPU Usage
For these tests, the goal is to take video streams of 30 and 60 FPS as a live source (like a camera), and evaluate whether FFmpeg and GStreamer are able to fully encode the stream in real time and also evaluate resource usage with different configurations, mainly the CPU usage. Similar to previous tests, we evaluated the encoding with three different presets: veryslow, medium, and ultrafast. Also, two resolutions were tested: 1920x1080 and 1280x720. The variable bitrate average is between 3 Mb/s and 5 Mb/s for most cases.
FFmpeg
Results for a 30 FPS stream are shown in Table 3. The veryslow and medium presets, although they provide the best quality, cannot encode the 1080p stream in real time with variable or 10 Mb/s bitrate. If the bitrate is lowered to 1 Mb/s, the medium preset is able to encode at 30 FPS. The ultrafast preset shows no problem and encodes at a constant 30 FPS with any configuration.
Table 3. FFmpeg real time encoding performance results for a 30 FPS stream.
Resolution | Preset | Bitrate (kBits/s) | Encoding Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%) |
---|---|---|---|---|---|
1920x1080 | veryslow | Variable | 4 | 246 | 63.9 |
1920x1080 | medium | Variable | 20 | 109 | 52.8 |
1920x1080 | ultrafast | Variable | 30 | 69 | 38.0 |
1920x1080 | veryslow | 1000 | 7 | 246 | 59.8 |
1920x1080 | medium | 1000 | 30 | 107 | 47.3 |
1920x1080 | ultrafast | 1000 | 30 | 68 | 30.6 |
1920x1080 | veryslow | 10000 | 3 | 257 | 66.5 |
1920x1080 | medium | 10000 | 16 | 111 | 52.9 |
1920x1080 | ultrafast | 10000 | 30 | 91 | 39.8 |
1280x720 | veryslow | Variable | 9 | 243 | 63.8 |
1280x720 | medium | Variable | 30 | 71 | 49.6 |
1280x720 | ultrafast | Variable | 30 | 54 | 29.2 |
1280x720 | veryslow | 1000 | 11 | 136 | 62.5 |
1280x720 | medium | 1000 | 30 | 73 | 42.0 |
1280x720 | ultrafast | 1000 | 30 | 53 | 14.1 |
For the 60 FPS stream, Table 4 shows the same pattern: neither the veryslow nor the medium preset reaches the desired frame rate at 1080p, regardless of configuration. The ultrafast preset keeps up with the 60 FPS stream only if the bitrate is 1 Mb/s.
Table 4. FFmpeg real time encoding performance results for a 60 FPS stream.
Resolution | Preset | Bitrate (kBits/s) | Encoding Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%) |
---|---|---|---|---|---|
1920x1080 | veryslow | Variable | 5 | 250 | 78.1 |
1920x1080 | medium | Variable | 22 | 113 | 60.9 |
1920x1080 | ultrafast | Variable | 40 | 89 | 50.7 |
1920x1080 | veryslow | 1000 | 8 | 250 | 71.5 |
1920x1080 | medium | 1000 | 42 | 123 | 55.8 |
1920x1080 | ultrafast | 1000 | 60 | 74 | 36.6 |
1920x1080 | veryslow | 10000 | 3 | 273 | 78.8 |
1920x1080 | medium | 10000 | 19 | 116 | 64.6 |
1920x1080 | ultrafast | 10000 | 57 | 71 | 42.4 |
1280x720 | veryslow | Variable | 12 | 256 | 77.0 |
1280x720 | medium | Variable | 52 | 78 | 64.7 |
1280x720 | ultrafast | Variable | 60 | 56 | 39.9 |
1280x720 | veryslow | 1000 | 16 | 137 | 70.3 |
1280x720 | medium | 1000 | 60 | 90 | 54.6 |
1280x720 | ultrafast | 1000 | 60 | 59 | 26.6 |
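One way to read the Encoding Frame Rate column is as a pass/fail check against the source frame rate, or equivalently as a per-frame time budget. The sketch below applies that check to the 1920x1080 rows of Table 4, copied by hand; the helper names are illustrative and not part of any tool.

```python
# (preset, bitrate in kBits/s, encoding frame rate) for the 1920x1080 rows of Table 4.
TABLE4_1080P = [
    ("veryslow", "Variable", 5), ("medium", "Variable", 22), ("ultrafast", "Variable", 40),
    ("veryslow", "1000", 8), ("medium", "1000", 42), ("ultrafast", "1000", 60),
    ("veryslow", "10000", 3), ("medium", "10000", 19), ("ultrafast", "10000", 57),
]


def realtime_configs(rows, target_fps):
    """Return the (preset, bitrate) pairs whose encoding rate meets the target."""
    return [(preset, bitrate) for preset, bitrate, fps in rows if fps >= target_fps]


def ms_per_frame(fps):
    """Average time spent per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


print("Real time at 60 FPS:", realtime_configs(TABLE4_1080P, 60))
print(f"Budget at 60 FPS: {ms_per_frame(60):.1f} ms per frame")
print(f"ultrafast at 10 Mb/s: {ms_per_frame(57):.1f} ms per frame")
```

At 60 FPS the encoder has about 16.7 ms per frame, so the ultrafast preset at 10 Mb/s, averaging about 17.5 ms, misses the budget by a small margin while the slower presets miss it by a wide one.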
The FFmpeg commands used to get the 30 FPS results are shown below. The first one contains the parameters needed to maintain a fixed bitrate, 10000 kBits/s in this example. The -r flag indicates the frame rate of the input stream, and the -re flag makes FFmpeg read the input at that native frame rate to emulate a live source. The second command is used for the variable bitrate tests, where the -crf (constant rate factor) flag tells FFmpeg to use a variable bitrate with average quality.
ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -re -i input1080.yuv -c:v libx264 -x264-params "nal-hrd=cbr" -b:v 10M -minrate 10M -maxrate 10M -bufsize 20M -preset ultrafast -tune zerolatency output.mp4
ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -re -i input1080.yuv -c:v libx264 -crf 22 -preset ultrafast -tune zerolatency output.mp4
GStreamer
The same test cases applied to FFmpeg were run with GStreamer. Table 5 shows the results for a 30 FPS stream. As before, at 1080p the veryslow and medium presets are not able to encode the stream at 30 FPS, except for the medium preset when the bitrate is lowered to 1 Mb/s. The ultrafast preset always reaches 30 FPS.
Table 5. GStreamer real time encoding performance results for a 30 FPS stream.
Resolution | Preset | Bitrate (kBits/s) | Encoding Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%) |
---|---|---|---|---|---|
1920x1080 | veryslow | Variable | 5 | 206 | 66.0 |
1920x1080 | medium | Variable | 20 | 73 | 48.8 |
1920x1080 | ultrafast | Variable | 30 | 29 | 29.7 |
1920x1080 | veryslow | 1000 | 6 | 209 | 57.8 |
1920x1080 | medium | 1000 | 30 | 71 | 47.1 |
1920x1080 | ultrafast | 1000 | 30 | 28 | 31.4 |
1920x1080 | veryslow | 10000 | 3 | 208 | 67.9 |
1920x1080 | medium | 10000 | 16 | 75 | 54.4 |
1920x1080 | ultrafast | 10000 | 30 | 33 | 42 |
1280x720 | veryslow | Variable | 10 | 98 | 64.7 |
1280x720 | medium | Variable | 30 | 36 | 46.3 |
1280x720 | ultrafast | Variable | 30 | 19 | 30.4 |
1280x720 | veryslow | 1000 | 11 | 105 | 66.0 |
1280x720 | medium | 1000 | 30 | 35 | 43.1 |
1280x720 | ultrafast | 1000 | 30 | 21 | 13.0 |
A similar result was obtained for a 60 FPS stream, as shown in Table 6. In this case, the ultrafast preset was not able to encode in real time when the bitrate was set to 10 Mb/s.
Table 6. GStreamer real time encoding performance results for a 60 FPS stream.
Resolution | Preset | Bitrate (kBits/s) | Encoding Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%) |
---|---|---|---|---|---|
1920x1080 | veryslow | Variable | 10 | 260 | 76.9 |
1920x1080 | medium | Variable | 22 | 93 | 62.9 |
1920x1080 | ultrafast | Variable | 60 | 68 | 40.3 |
1920x1080 | veryslow | 1000 | 10 | 238 | 74.4 |
1920x1080 | medium | 1000 | 39 | 89 | 56.1 |
1920x1080 | ultrafast | 1000 | 60 | 55 | 45.5 |
1920x1080 | veryslow | 10000 | 5 | 286 | 81.7 |
1920x1080 | medium | 10000 | 19 | 100 | 69.5 |
1920x1080 | ultrafast | 10000 | 56 | 77 | 53.9 |
1280x720 | veryslow | Variable | 13 | 111 | 75.9 |
1280x720 | medium | Variable | 48 | 47 | 57.4 |
1280x720 | ultrafast | Variable | 60 | 28 | 35.2 |
1280x720 | veryslow | 1000 | 16 | 119 | 68.5 |
1280x720 | medium | 1000 | 60 | 46 | 48.0 |
1280x720 | ultrafast | 1000 | 60 | 31 | 24.8 |
The pipelines used for these tests are the following. The first one is used for a fixed bitrate of 10000 kBits/s and the second one is used for variable bitrate.
gst-launch-1.0 filesrc location=input1080.yuv ! videoparse width=1920 height=1080 framerate=60/1 format=i420 ! identity sync=true ! x264enc pass=cbr bitrate=10000 insert-vui=true tune=zerolatency speed-preset=veryslow ! qtmux ! filesink location=output.mp4
gst-launch-1.0 filesrc location=input1080.yuv ! videoparse width=1920 height=1080 framerate=60/1 format=i420 ! identity sync=true ! x264enc pass=quant quantizer=22 insert-vui=true tune=zerolatency speed-preset=veryslow ! qtmux ! filesink location=output.mp4
Results Analysis
For this section, we are going to compare the results from FFmpeg and GStreamer to outline the main differences of both options mainly from a resource usage perspective.
Maximum Frame Rate
For the 1080p videos, Table 1 and Table 2 show that both tools fall in the same range, which is expected because they are configured almost identically. FFmpeg reached higher frame rates with fixed bitrates, while GStreamer was faster only with the ultrafast preset and variable bitrate. Depending on the bitrate configuration, the veryslow preset encoded between 3 and 6 FPS with GStreamer and between 5 and 8 FPS with FFmpeg. The medium preset ranged from 16 to 31 FPS with GStreamer and from 24 to 46 FPS with FFmpeg. Lastly, the ultrafast preset ranged from 45 to 85 FPS with GStreamer and from 43 to 105 FPS with FFmpeg. The graphs below show the difference between variable bitrate and a fixed 10 Mb/s bitrate.
In terms of CPU usage during the encoding, at 1080p with variable bitrate, FFmpeg and GStreamer show a slightly lower CPU usage than with fixed bitrate of 10 Mb/s. This behavior can be seen on the graphs below.
Encoding CPU Usage
The results in this section focus on the 1080p resolution, since it is the most demanding one and the other resolution tested follows a similar pattern. For both the 30 FPS and 60 FPS streams (Tables 3 through 6), the CPU usage with variable bitrate and with a fixed 10 Mb/s bitrate is too close to conclude that one tool uses more resources than the other. Only at 1 Mb/s do the results differ.
Memory usage follows the same trend in both tools across presets and configurations, but FFmpeg generally requires more memory. For the 30 FPS stream at 1080p (Tables 3 and 5), FFmpeg uses more memory than GStreamer with every preset. For the 60 FPS stream (Tables 4 and 6) the values are closer, and GStreamer uses more memory in some veryslow and ultrafast cases. The following graphs summarize the data for memory usage.
Finally, the impact of the bitrate configuration is shown in Figure 18 and Figure 19, where the CPU usage varies significantly depending on the bitrate. A higher bitrate translates to better quality but also requires much more CPU. In the end, you will need to consider the needs of your specific use case to select the appropriate settings. If good quality is needed, a higher bitrate can be used at the expense of much more CPU usage. Otherwise, lower bitrates can also provide decent quality while needing fewer CPU resources.
Conclusion
According to the tables and graphs explained above, we can conclude that:
- It would be possible to encode approximately four 1080p@30 FPS streams in parallel without maxing out the CPU, as long as a lower bitrate and a faster preset are selected. Setting the tune property to zerolatency also helps.
- It would be possible to encode approximately two 1080p@60 FPS streams in parallel if a low bitrate is selected with a faster preset.
- Using GStreamer to encode a 1080p 30 FPS stream with the medium preset leaves about 51% of the CPU for other tasks with variable bitrate, 53% at 1 Mb/s, and 46% at 10 Mb/s (Table 5). FFmpeg is close to these results as well.
- The maximum frame rate depends on the bitrate configuration: the higher the bitrate, the fewer frames can be encoded per second, but the higher the resulting quality.
- At 1080p, the veryslow preset was not able to encode the 30 or 60 FPS streams in real time at any bitrate, and the medium preset only reached 30 FPS at 1 Mb/s. The ultrafast preset sustained 30 FPS at every bitrate tested, but it reached 60 FPS only at 1 Mb/s with FFmpeg, and at 1 Mb/s or with variable bitrate with GStreamer.
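The CPU headroom mentioned in the conclusions is simply the complement of the Average CPU Load column. As a minimal sketch, the snippet below derives it from the GStreamer 1080p, 30 FPS, medium preset rows of Table 5; the dictionary and function names are illustrative. Keep in mind that the average load alone does not show whether the target frame rate was met: in Table 5 the medium preset reaches only 20 and 16 FPS with variable and 10 Mb/s bitrates.

```python
# Average CPU load (%) for GStreamer, 1920x1080, 30 FPS stream, medium preset (Table 5).
GST_1080P30_MEDIUM_CPU = {"Variable": 48.8, "1000": 47.1, "10000": 54.4}


def cpu_headroom(load_percent: float) -> float:
    """CPU percentage left for other tasks, rounded to one decimal place."""
    return round(100.0 - load_percent, 1)


for bitrate, load in GST_1080P30_MEDIUM_CPU.items():
    print(f"bitrate {bitrate}: {cpu_headroom(load)}% of the CPU left")
```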