NVIDIA Jetson TX2 - Argus vs V4L2 Latency Analysis

From RidgeRun Developer Connection
Revision as of 18:14, 21 June 2020

In this wiki, a practical example of measuring the latency when using libargus and V4L2 is shown. This wiki assumes that the reader is familiar with the Jetson TX2 VI concepts outlined in NVIDIA Jetson TX2 - Video Input Timing Concepts and the proposed measurement techniques in NVIDIA Jetson TX2 - VI Latency Measurement Techniques.

Where does the argus frame metadata timestamp come from?

Looking at previous experiments (see the TX2 - V4L vs ISP capture timestamps mismatch wiki, section Tests in JP 3.2.1), it looks like argus timestamps do not come from the RTCPU: when the RTCPU clock showed a difference of around 5 s with respect to the system clock, as seen in the v4l2 frame timestamps, the argus timestamps did not show such a difference. This is consistent with the VI4 top-level diagram depicted in figure 1.

Figure 1: VI4 top-level block diagram (Vi4.png)

Jetson TX2 latency measurement

Where is the latency being generated?

To address this question, we performed the following experiment:
The OV5693 will capture at 1280x720@120 fps, labeling each captured frame metadata_timestamp-system_timestamp.jpg, where metadata_timestamp is the timestamp extracted with getSensorTimestamp() and system_timestamp is the CLOCK_MONOTONIC timestamp captured right after the frame metadata is retrieved.

During camera capture, a kernel module captures the system timestamp (CLOCK_MONOTONIC) and then turns on a GPIO (initially off) connected to a LED placed in the field of view of the camera. The timestamp is printed only after the GPIO has been turned on, to avoid extra delay between the timestamp extraction and the LED turning on.

The board is rebooted before every experiment.

Then we can associate the first frame in which the LED is on, whose name includes the metadata timestamp (t1) and the system timestamp (t2), with the timestamp printed by the kernel module (t0), as shown in figure 2. In summary:

  • t0 is the system timestamp acquired in the kernel module when turning on the LED.
  • t1 is the metadata timestamp, that is, the timestamp obtained with getSensorTimestamp() and coming out of Argus (from either the ISP or the RTCPU).
  • t2 is the system timestamp obtained within the Argus application right at the moment we extract t1.

Before performing the test, the following assumptions are to be validated:

  • t1 is a timestamp set during ISP processing, after image capture. It shall be greater than the RTCPU timestamp (frame acquisition time).
  • t1 shall be greater than t0 since it includes post-capture processing.
  • t0 - t1 shall always be negative.
  • t2 - t0 shall always be positive and provide a measurement of the total time from sensor integration until the frame's acquisition at the user-space level.

We performed six experiments and measured the deltas; the (rounded) averages are summarized in the table below:

Delta values measured after six experiments
delta 0 4.6 ms
delta 1 29.4 ms
delta 2 34 ms

Before drawing conclusions from this table, we can take a look at the full data table:

There are some interesting conclusions based on the test results:

  • In tests 1 and 4, delta 0 (computed as t0 - t1) is positive, which makes no sense if the same clock is used for both timestamps. This might indicate that a different clock is used in the ISP to timestamp the frames, and that the delta between that clock and the system clock can change from one boot to the next.
  • The frame metadata timestamp (t1) does not seem to have nanosecond resolution, only microsecond, in contrast with the RTCPU and monotonic system clocks.
  • delta 2 is measured with the same clock as t0; for the resolution/framerate used, it takes approximately 31 ms on average from the moment the sensor is exposed to the moment the pixels are available.
  • delta 0 never goes higher than 8 ms.

At this point we turned to NVIDIA for clarification.

NVIDIA forum help

The following is the question we asked NVIDIA; their answer can also be found here.

Hi,

When extracting the captured frame timestamp in an argus app using getSensorTimestamp() the returned value does not show nanoseconds information (nanoseconds always come as zeroes), values look like:

3650072669000
3650072669000
3650124215000

In contrast, timestamps of the system clock (monotonic raw), and timestamps of frames captured with v4l2 bring nanosecond information. For instance, the system clock timestamps captured in the same argus app as above look like:

3650107542391
3650132798233
3650157586787

My questions are:

1. Is the ISP using a different clock to timestamp the frames that go through the ISP path, in contrast to v4l2's RTCPU (Memory controller path)?

2. Why does the timestamp extracted with getSensorTimestamp() not have nanosecond information?

3. Where does this timestamp come from (ISP path)? It does not seem to belong to either the RTCPU or the system clock.

Thanks


hello jcaballero,

these two pipeline have different mechanism to capture the timestamp, please also refer to below. 

v4l2
record the timestamp by system clock (monotonic raw), which the unit is nano-seconds already.
ISP
it's taking the system timestamp, which the unit is micro-seconds, then multiply it by 1000 to represent as nano-second. 

please note that, the getSensorTimestamp() function call returns the software shutter event time. 
please also access [TegraX2 Technical Reference Manual] and check [Figure 187: VI4 Top-Level Block Diagram], this shutter event time is reported by VI notify, which is based-on RTCPU.

suggest you may check similar discussion thread, Topic 1056995 as see-also.
thanks

New findings on timestamps origin

We checked the code related to the timestamps and found that, in JetPack 4.2, both the v4l2 and the argus app frame timestamps come from the same source, in contrast with JetPack 3.x. The relevant code is in the file kernel/nvidia/drivers/platform/tegra/rtcpu/vi-notify.c. There, the RTCPU timestamp is extracted and adjusted with an offset to keep it in sync with the system clock, working around the known issue of the RTCPU clock being out of sync with the system clock.

The function responsible for computing the offset is get_ts_adjustment. Interestingly, it extracts the TSC (RTCPU clock) timestamp with the following call:

s64 tsc = arch_counter_get_cntvct();

Curiously, this tsc value is not used directly to compute the delta with respect to the system clock; instead, it is first multiplied by ivn->ts_res_ns = 32.


Now, given that the original timestamps come from the TSC, and that we have a way to read the TSC, we ran a test where the frame timestamps were not adjusted with the offset to the system clock, and the LED module extracted the timestamp directly from the TSC and scaled it by 32, as is done to compute the offset. We computed delta 0 and again found a case where it was positive (1.7 ms).

With this result, we can conclude that there is no third clock, nor is the offset adjustment responsible for causing a positive delta 0.

Now, looking at NVIDIA's answer, we did check that the SOF timestamp (RTCPU timestamp) is the same one returned by the getSensorTimestamp() method, though the nanoseconds value is zeroed. They also mentioned that this timestamp is the software shutter timestamp. Given that delta 0 has never been greater than 8 ms (roughly the frame length expected at 120 fps), we can propose the following hypothesis:

If the timestamp of the frame (t1) is taken at the start of the frame length, there is a chance that in some cases the kernel module extracts its timestamp during the part of the frame length where the frame exposure has not started yet; once the exposure starts, the LED is already on and the picture captures its light, as shown in the following figure.


To verify this explanation, we propose to enable tracing and try to match the NOTIFY system event timestamps with the frame/LED timestamps, now that we know that the NOTIFY timestamps must be scaled by 32 to be on the same scale as the frame timestamps.

Experiment

This experiment consisted of enabling tracing and using the RTCPU reference for the LED timestamp (t0) and the frame timestamp (t1) (without the system clock offset adjustment), so that the VI tracing information could be associated with the frame of interest (the one in which the LED turns on). For extracting the frame ID, we found that getCaptureId() was misleading; instead, we extracted the value of status->frame directly in the vi-notify.c file.

Analysis

Before proceeding to the analysis, it's important to refresh some VI4 concepts.

  • According to the Parker TRM, the ISP is physically separated from the VI and works in a different clock domain, often running faster, so the VI must buffer data for the ISP. The TRM also mentions that there is a FIFO in the ISP that stores a small number of VI packets.
  • According to the L4T Multimedia API reference, getSensorTimestamp() (t1) returns the time when the first data from the capture arrives from the sensor.
  • According to NVIDIA's answer to our question, the getSensorTimestamp() (t1) function call returns the software shutter event time.
  • The VI has a notification engine that logs events timestamped with the RTCPU clock, which can be accessed from the CPU by enabling tracing. There are four events of interest for this experiment:
  1. ISPBUF_FS: Occurs when the Frame Start event that was detected in CHANSEL reaches the ISPBUF.
  2. CHANSEL_PXL_SOF: First pixel of the frame.
  3. CHANSEL_PXL_EOF: Last pixel of the frame.
  4. ISPBUF_FE: Occurs when the Frame End event that was detected in CHANSEL reaches the ISPBUF.

Results

The results obtained are shown in the table below.

We assume that t1 is taken when the first row of pixel data arrives at the VI, that is, as soon as the first sensor row's exposure ends. Also, we know that the VI has to buffer data for the ISP, so there must be a delay between t1 and the first ISP event, ISPBUF_FS. This delay is shown in the table below and is quite consistent at about 150 µs. With this result, we can answer the question of the origin of the timestamp, knowing that it is pre-ISP.

Consider the ISPBUF delta result, computed as ISPBUF_FE - ISPBUF_FS, which is quite consistent at about 8 ms. We don't know why this delta is so large; could it be equal to the sensor integration time? This would be confirmed if we had enough data on the EOF timestamp. This timestamp is usually zero as extracted by vi-notify; however, in these tests there were a couple of cases in which it was not zero, allowing us to calculate the SOF - EOF delta, which resulted in -7881952 ns in both cases, confirming an integration time of approximately 8 ms.

Finally, consider delta 0. These values are not as consistent, and we still see a couple of cases where delta 0 is positive. For this experiment, in contrast with the previous one, we ensured that the LED fully covered the vertical FoV of the camera. We found that in tests 3 and 10 there was a rolling shutter artifact: the LED light was observed only in the lower half of the image. We think the LED turned on in the middle of the sensor integration; given that t1 is captured when the first row of pixels arrives at the VI, presumably as soon as it is ready, delta 0 turns out positive because the LED actually turned on after the first row of data arrived at the VI.

Consider delta 0 for test 4: it is the smallest of the experiment, and it could be interpreted as the frame timestamp (t1) being taken only 0.5 ms before the LED was turned on.


Effects of the exposure time on the results

We ran three tests in which we set the exposure to a fixed value, to check whether the ISPBUF delta (ISPBUF_FE - ISPBUF_FS) would be reduced, given that each row's exposure period was shorter than in the previous tests. We found that the exposure time had no influence on the ISPBUF delta, which stayed stable at 8 ms. The data is shown below.

Positive delta0 actual explanation

Consider the picture below. We know that the frame is timestamped when the first pixel packet arrives at the CHANSEL module. Given that the sensor has a rolling shutter, it is possible that the LED turns on in the middle of the sensor exposure, causing some rows to capture the light while others do not. If the sensor sends pixels as soon as each row's exposure time finishes, then the top rows not exposed to the LED light trigger the frame timestamp, and the bottom rows in the same frame capture the LED light, causing the positive delta 0. For this experiment, we put the LED much closer to the camera so that we could capture these artifacts, and found that, in the cases where delta 0 was positive, the top rows indeed had little light in comparison with the bottom rows.

ISPBUF internals

VI has to buffer data for ISP, since ISP runs at a faster clock rate than VI. Also, ISP is physically separated from VI. NVIDIA provides the image below to describe ISPBUF internals.

V4l2 path vs ISP path timing comparison

We will define an additional delta, called delta 3, as the time elapsed from the ATOMP_FE (for the v4l2 case) or ISPBUF_FE (for the argus case) event to the moment when the frame data is available in user space. With this delta, we can estimate the additional delay of the ISP path with respect to the v4l2 path.

We will also denote by delta 4 the time elapsed from the PXL_SOF event to the ATOMP_FE (for the v4l2 case) or ISPBUF_FE (for the argus case) event, to get an idea of the time elapsed from the moment the first pixel packet arrives at the VI to the moment the frame data has finished arriving at the ISP (or at ATOMP in the v4l2 case).

We performed ten tests for each case, with a resolution of 2592x1458@30fps and averaged the results shown in the summaries below.

Summary of argus results

Delta Value (ms)
delta1 40.9
delta2 44.2
delta3 16.4
delta4 24.5

Summary of v4l2 results

Delta Value (ms)
delta1 76.6
delta2 90.4
delta3 52
delta4 24.5

Given these results, we looked into the code to find out why the deltas for v4l2 are so much larger. In the channel.c file, in the function tegra_channel_ring_buffer, we found the following comment: "release buffer N at N+2 frame start event". Based on this, we extracted the ATOMP_FS event for frame N+2 for each measurement, so that we could estimate the deltas appropriately. Assuming that frame N's data is ready when ATOMP_FE occurs for frame N, but it is not released to user space until the ATOMP_FS event occurs for frame N+2, this mechanism adds ATOMP_FS(N+2) - ATOMP_FE(N) of delay to all deltas that use the user-space timestamp (delta 1, delta 2, and delta 3). On average, the additional delay due to the queuing mechanism is 42.1 ms. The adjusted deltas are shown below.

Delta Value (ms)
delta1 34.4
delta2 48.2
delta3 9.9
delta4 24.5

Conclusion

The difference between the delta 3 values of the two paths is approximately 6.5 ms, so we think this is the actual ISP latency. The latency added prior to ISP processing is 24.5 ms.


Related links

  1. NVIDIA Jetson TX2 - Video Input Timing Concepts
  2. NVIDIA Jetson TX2 - VI Latency Measurement Techniques

