H.264 Motion Vector Extractor - H.264 Motion Vector Extractor Basics

From RidgeRun Developer Connection

Motion estimation is a resource-heavy stage in many computer vision applications, which often prevents the smallest embedded devices from running them in real time. However, many embedded systems already include a dedicated hardware module that performs H.264 video encoding without taxing the CPU; they therefore have the hardware required for motion estimation, just without an easily accessible API. This opens the possibility of harnessing the motion estimation results from the H.264 encoding process in a real-time video stabilization system: first encode the video stream, then parse the motion vectors out of the compressed bitstream.

Before libmotion, there was no dedicated solution for extracting motion vectors from an encoded H.264 bitstream; the only software capable of performing this task did so as a byproduct of video decoding. This library therefore aims to avoid the performance hit of fully decoding the video stream. Developing it as a separate library also lets any computer vision application take advantage of the motion estimation already performed by the H.264 hardware encoder.

Motion vectors

The H.264 coding standard was developed through a collaborative effort between the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG). It was designed to achieve a better compression rate and improved image quality over the earlier MPEG-4 Part 2 standard.

This standard uses a predictive approach to reduce the amount of information that has to be coded. Every video frame is divided into 16x16-pixel macroblocks, and each macroblock is coded as a motion vector pointing to the most similar macroblock in a reference frame, plus a residual that encodes the differences with the referenced macroblock. The following figure provides an example of this approach. The H.264 algorithm calculates the apparent 2D displacement of a macroblock with respect to the matching macroblock in the reference frame. This displacement is the motion vector, which describes the apparent movement of the 2D image from one frame to another.

Example of the motion estimation performed by the H.264 standard on a macroblock between a reference frame and the new frame.
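The prediction scheme described above can be sketched in a few lines. This is an illustrative simplification, not libmotion code: H.264 actually predicts at quarter-pel precision with interpolation filters and variable block sizes, but integer-pel prediction on a fixed 16x16 block keeps the idea visible.

```python
MB = 16  # macroblock size in pixels

def predict_macroblock(ref_frame, mb_x, mb_y, mv):
    """Fetch the predictor block that the motion vector (dx, dy) points to
    in the reference frame. ref_frame is a 2D list of luma samples."""
    dx, dy = mv
    return [[ref_frame[mb_y + dy + r][mb_x + dx + c] for c in range(MB)]
            for r in range(MB)]

def residual(cur_block, pred_block):
    """The encoder transmits only the motion vector plus this difference."""
    return [[cur_block[r][c] - pred_block[r][c] for c in range(MB)]
            for r in range(MB)]
```

When the motion vector matches the true displacement, the residual collapses to near zero, which is where the compression gain comes from.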

Every H.264 encoded video contains this motion information, which changes from frame to frame depending on the content. Although this information is originally produced to compress the video, it can be retrieved to get a notion of the overall apparent motion in the image. This is what libmotion uses to estimate the motion in the video.
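One common way to turn per-macroblock vectors into an overall motion estimate is to take the component-wise median over a frame, since the median ignores outlier vectors caused by locally moving objects. The function below is an illustrative sketch of that idea, not the libmotion API.

```python
from statistics import median

def global_motion(vectors):
    """vectors: iterable of (dx, dy) macroblock motion vectors for one frame.
    Returns the frame's dominant apparent translation as (dx, dy)."""
    xs = [dx for dx, _ in vectors]
    ys = [dy for _, dy in vectors]
    return median(xs), median(ys)
```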

H.264 basic units

H.264 encodes frames as three different types:

  • I-frames: Completely independent frames used as a reference for other frames. They can only contain I-type macroblocks, are coded without reference to any other frame, and provide the base for the reconstruction of the following frames. A special type of I-frame is the IDR-frame, which represents a reset point in the sequence: no frame after an IDR-frame may depend on frames before it.
  • P-frames: Frames predicted by motion compensation from I-frames or other P-frames. They can only reference frames in the past, which makes them the best choice for motion estimation. They can contain I and/or P macroblocks.
  • B-frames: Predicted from frames in both the past and the future. They achieve the most efficient compression but do not work as well for motion estimation.
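At the bitstream level, a parser distinguishes these units by the nal_unit_type field, the low 5 bits of the byte that follows each Annex-B start code: type 5 marks an IDR slice, type 1 a non-IDR slice, and types 7 and 8 the SPS and PPS parameter sets. The helper below is an assumed sketch of such a scan, not part of libmotion.

```python
NAL_NAMES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}

def nal_unit_types(data):
    """Return the nal_unit_type of every NAL unit in an Annex-B byte string.
    A 00 00 01 search also matches the longer 00 00 00 01 start code."""
    types = []
    i = 0
    while True:
        i = data.find(b"\x00\x00\x01", i)
        if i < 0:
            return types
        i += 3
        types.append(data[i] & 0x1F)  # low 5 bits of the NAL header byte
```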

The following figure depicts the basic dependency between frame types in H.264:

H.264 frame dependency. I-frames are used as a reference point for other P/B-frames. P-frames are reconstructed from previous frames, while B-frames may depend on both previous and future frames.

What does libmotion do?

libmotion extracts the motion information from H.264 encoded video and makes it available for further processing. This motion information can be used in different applications such as digital video stabilization and motion detection, among others.

H.264 defines profiles that determine the set of features used for encoding. The current implementation of libmotion targets the Baseline profile of the H.264 standard, which guarantees that only I- and P-frames are used and therefore allows better motion estimation from the extracted vectors.
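The profile in use can be read directly from the bitstream: in an SPS NAL unit the byte immediately after the NAL header is profile_idc, and a value of 66 indicates the Baseline profile. The helper below sketches such a check under those assumptions; its name is illustrative, not a libmotion function.

```python
BASELINE_PROFILE_IDC = 66

def is_baseline_sps(nal_unit):
    """nal_unit: bytes of one NAL unit, starting at the NAL header byte.
    True if this is an SPS (type 7) declaring the Baseline profile."""
    nal_type = nal_unit[0] & 0x1F
    return nal_type == 7 and nal_unit[1] == BASELINE_PROFILE_IDC
```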
