How to enable A/B redundancy in Jetson TX2

From RidgeRun Developer Connection











Introduction to A/B redundancy

A/B redundancy makes it possible to recover a system when one of its system partitions fails. Each partition has a mirror, and if the main partition fails, the system falls back to the mirror (or recovery) partition.

NVIDIA Jetson TX2 has this capability. The main partitions do not have any suffix, whereas the recovery partitions have a _b suffix. For example:

kernel-dtb: Main partition
kernel-dtb_b: Recovery partition

However, A/B redundancy is disabled by default on the Jetson TX2, so the partitions with the _b suffix are not used when a main partition fails.

This article describes how to enable A/B redundancy and how to recover a TX2 after a partition failure. Recovery requires A/B redundancy to be enabled, which you can do either by flashing a new C-Boot slot configuration with flash.sh or from user space with nv_update_engine.

General information

Relevant directories

Jetpack is divided into modules represented by directories. To keep the instructions short, we assign a variable name to each relevant directory:

1. Jetpack installation directory: $JETPACKDIR

2. Bootloader directory: JETSON_BOOTLOADER=$JETPACKDIR/64_TX2/Linux_for_Tegra/bootloader

Each directory contains important files:

1. $JETPACKDIR: It contains all the Jetpack files, the NVIDIA flashing binaries, and the files needed for Tegra boards to work properly. These files are in a folder named 64_TX2 inside it.

2. $JETSON_BOOTLOADER: It contains the boot image (./boot.img), the filesystem packed in an image file (./system.img), and the device tree that is written to the encrypted partition (if you do not define the device tree in the boot script configuration, the TX2 uses the DTB from this partition). It also contains the binaries used for signing, encrypting, enabling A/B redundancy, and writing files to the Jetson board.
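The directory variables above can be set up with a small helper. This is a sketch: the function names set_jetson_dirs and check_bootloader_dir are hypothetical, and the files it checks for (smd_info.cfg and nv_smd_generator in the bootloader directory, flash.sh one level up in Linux_for_Tegra) are simply the ones used later in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: set the directory variables used in this article
# and check that the bootloader directory has the tools used later on.

set_jetson_dirs() {
  # $1 is your Jetpack installation directory (placeholder, adjust it).
  JETPACKDIR="${1:?usage: set_jetson_dirs <jetpack-dir>}"
  JETSON_BOOTLOADER="$JETPACKDIR/64_TX2/Linux_for_Tegra/bootloader"
  export JETPACKDIR JETSON_BOOTLOADER
}

check_bootloader_dir() {
  local missing=0 f
  # flash.sh lives one level up, in Linux_for_Tegra.
  for f in smd_info.cfg nv_smd_generator ../flash.sh; do
    if [ ! -e "$JETSON_BOOTLOADER/$f" ]; then
      echo "missing: $JETSON_BOOTLOADER/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, set_jetson_dirs /path/to/jetpack && check_bootloader_dir, where /path/to/jetpack is a placeholder for your own installation directory.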

Test environment

All the steps and procedures below were tested on a Jetson TX2 with the following Jetpack versions:

Jetpack 3.2.1
Jetpack 3.3

Enabling the A/B redundancy from flash

A configuration file, $JETSON_BOOTLOADER/smd_info.cfg, enables or disables A/B redundancy. It typically contains the following settings:

...
# SMD metadata information
< VERSION 3 >

#
# Config 1: Disable A/B support (Default)
#

# slot info order is important!
# <priority>    <suffix>     <retry_count>  <boot_successful>
15                  _a          7               1

#
# Config 2: Enable redundancy support (by removing comments ##)
#
##< REDUNDANCY_USER 1 >

# slot info order is important!
# <priority>    <suffix>     <retry_count>  <boot_successful>
##15                  _a          7               1
##14                  _b          7               1

Config 1 disables redundancy and Config 2 enables it. To enable redundancy, remove the ## comment markers from the Config 2 lines and comment out the Config 1 slot line with ##. The resulting settings look like this:

# SMD metadata information
< VERSION 3 >

#
# Config 1: Disable A/B support (Default)
#

# slot info order is important!
# <priority>    <suffix>     <retry_count>  <boot_successful>
##15                  _a          7               1

#
# Config 2: Enable redundancy support (by removing comments ##)
#
< REDUNDANCY_USER 1 >

# slot info order is important!
# <priority>    <suffix>     <retry_count>  <boot_successful>
15                  _a          7               1
14                  _b          7               1
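The edit described above can also be scripted. The following is a sketch that assumes smd_info.cfg follows the layout shown (a "Config 1" comment block followed by a "Config 2" comment block); the function name enable_ab_config is hypothetical. It comments out the numeric slot lines under Config 1 and strips the leading ## from the lines under Config 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the default smd_info.cfg (Config 1, A/B off)
# into the A/B-enabled form (Config 2). Assumes the layout shown above.
enable_ab_config() {
  local in="$1" out="$2"
  awk '
    /Config 1:/ { section = 1 }
    /Config 2:/ { section = 2 }
    # Config 1: comment out the slot line(s), e.g. "15 _a 7 1".
    section == 1 && /^[0-9]/ { print "##" $0; next }
    # Config 2: drop the leading "##" from REDUNDANCY_USER and both slots.
    section == 2 && /^##/ { print substr($0, 3); next }
    { print }
  ' "$in" > "$out"
}
```

Keeping a backup makes it easy to compare the result with the expected settings, for example: cp smd_info.cfg smd_info.cfg.orig followed by enable_ab_config smd_info.cfg.orig smd_info.cfg, run from $JETSON_BOOTLOADER.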

After modifying smd_info.cfg, generate the slot metadata binary (slot_metadata.bin), which C-Boot uses to control the boot slots. From $JETSON_BOOTLOADER, run:

cd $JETSON_BOOTLOADER
sudo ./nv_smd_generator smd_info.cfg slot_metadata.bin

Finally, with the TX2 connected in recovery mode, run the NVIDIA flashing tool from Linux_for_Tegra, the parent directory of $JETSON_BOOTLOADER:

cd $JETSON_BOOTLOADER/..
sudo ./flash.sh jetson-tx2 mmcblk0p1
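The two host-side steps can be wrapped so that flashing only happens if the metadata generation succeeds. This is a sketch: flash_with_ab, DRY_RUN and REUSE_ROOTFS are hypothetical names; REUSE_ROOTFS adds flash.sh's -r option (mentioned in the conclusion) to skip rebuilding system.img.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the host-side flash steps above. Assumes
# JETSON_BOOTLOADER is set and the TX2 is in recovery mode.
#   DRY_RUN=1       print the commands instead of running them
#   REUSE_ROOTFS=1  pass -r so flash.sh reuses the existing system.img
flash_with_ab() {
  local run="" reuse=""
  if [ "${DRY_RUN:-0}" = 1 ]; then run="echo"; fi
  if [ "${REUSE_ROOTFS:-0}" = 1 ]; then reuse="-r"; fi
  (
    set -e   # stop at the first failing step
    cd "$JETSON_BOOTLOADER"
    $run sudo ./nv_smd_generator smd_info.cfg slot_metadata.bin
    cd ..    # Linux_for_Tegra, where flash.sh lives
    $run sudo ./flash.sh $reuse jetson-tx2 mmcblk0p1
  )
}
```

Running DRY_RUN=1 flash_with_ab first is a cheap way to confirm the commands before writing to the board.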

Enabling the A/B redundancy from user space

This option is useful for devices that were flashed with an A/B-disabled SMD image. To check whether A/B redundancy is disabled, run:

sudo nvbootctrl dump-slots-info

Its output shows whether redundancy is currently enabled. For example:

  • A/B disabled output
magic:0x43424e00,             version: 3             features: 0             num_slots: 1
slot: 0,             priority: 15,             suffix: _a,             retry_count: 7,             boot_successful: 1
slot: 1,             priority: 0,             suffix: ,             retry_count: 0,             boot_successful: 0
  • A/B enabled output
magic:0x43424e00,             version: 3             features: 3             num_slots: 2
slot: 0,             priority: 15,             suffix: _a,             retry_count: 7,             boot_successful: 1
slot: 1,             priority: 14,             suffix: _b,             retry_count: 7,             boot_successful: 1

When redundancy is disabled, the first line reports features: 0 and num_slots: 1, and slot 1 has 0 for priority, retry_count and boot_successful and no suffix.
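The check can be automated by reading the num_slots field from the dump-slots-info output, since it is 1 in the disabled example and 2 in the enabled one. This is a sketch: ab_status is a hypothetical helper, and the output format it parses is the one shown above, which may differ in other L4T releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "nvbootctrl dump-slots-info" output on stdin
# and report the A/B state from the num_slots field of the magic line
# (1 = disabled, 2 = enabled in the example outputs above).
ab_status() {
  local num_slots
  num_slots=$(sed -n 's/.*num_slots: *\([0-9][0-9]*\).*/\1/p' | head -n 1)
  case "$num_slots" in
    1) echo disabled ;;
    2) echo enabled ;;
    *) echo unknown ;;  # format not recognised
  esac
}

# Example use on the TX2: enable A/B only if it is currently disabled.
# if [ "$(sudo nvbootctrl dump-slots-info | ab_status)" = disabled ]; then
#   sudo nv_update_engine --enable-ab
# fi
```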

If redundancy is disabled, enable it by running:

sudo nv_update_engine --enable-ab

Verifying that A/B redundancy is enabled

To check whether the process was successful, run on the TX2:

sudo nvbootctrl dump-slots-info

It should show an output similar to this:

slot: 0,             priority: 15,             suffix: _a,             retry_count: 7,             boot_successful: 1
slot: 1,             priority: 14,             suffix: _b,             retry_count: 7,             boot_successful: 1

Both slots should show valid values in all their fields.

TX2 A/B redundancy groups partitions into "slots". A slot contains all the partitions the TX2 needs to boot properly. The TX2 has two slots: slot 0 holds the principal partitions and slot 1 holds the recovery partitions. Principal and recovery partitions differ only by their suffix: for example, the principal partition that stores the DTB is kernel-dtb and its recovery counterpart is kernel-dtb_b. The _b suffix marks a recovery partition.

The slot priority determines which slot is used for booting. In the example above, slot 0 has the higher priority (15), so C-Boot uses it during the boot process.
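As an illustration, the slot choice can be predicted from the dump-slots-info text. This sketch uses a simplified model that is an assumption, not NVIDIA's documented algorithm: the highest-priority slot wins, skipping slots with priority 0 and slots whose retry_count and boot_successful are both 0. The model agrees with the example outputs in this article, including the use case below; predict_boot_slot is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: predict the boot slot from "nvbootctrl
# dump-slots-info" text on stdin. Simplified model (an assumption): the
# slot with the highest priority wins, skipping slots whose priority is 0
# or whose retry_count and boot_successful are both 0.
predict_boot_slot() {
  awk '
    /^slot:/ {
      gsub(/,/, "")              # drop commas so "key: value" pairs split cleanly
      slot = ""; prio = 0; retry = 0; ok = 0
      for (i = 1; i < NF; i++) {
        if ($i == "slot:") slot = $(i + 1)
        else if ($i == "priority:") prio = $(i + 1) + 0
        else if ($i == "retry_count:") retry = $(i + 1) + 0
        else if ($i == "boot_successful:") ok = $(i + 1) + 0
      }
      if (prio > 0 && (retry > 0 || ok == 1) && prio > best_prio) {
        best_prio = prio
        best = slot
      }
    }
    END { if (best != "") print best; else exit 1 }
  '
}
```

For example, sudo nvbootctrl dump-slots-info | predict_boot_slot prints 0 for the enabled output shown above.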

Recovering the system after a partition failure

If A/B redundancy is enabled and a principal partition (for example, kernel-dtb) becomes corrupted, the TX2 falls back to the recovery partition (kernel-dtb_b), which holds the same content the main partition had. However, the principal slot then stays disabled indefinitely. To go back to the principal partitions after fixing them, re-enable slot 0 on the target board (TX2) with:

SLOT=0
sudo nvbootctrl set-active-boot-slot $SLOT

nvbootctrl set-active-boot-slot re-enables a slot. Slot 0 contains the principal partitions and slot 1 contains the redundant (recovery) partitions.
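A small sketch of this step: re-activate slot 0 only when the board is currently running from slot 1. It assumes nvbootctrl get-current-slot prints the bare slot number; restore_primary_slot and the NVBOOTCTRL override (useful for testing with a stub) are hypothetical. Run it as root on the TX2 after repairing the principal partition.

```shell
#!/usr/bin/env bash
# Hypothetical helper: after repairing a slot-0 partition, re-activate
# slot 0 if the board currently runs from the recovery slot (1).
# Assumes "get-current-slot" prints only the slot number.
restore_primary_slot() {
  local ctl="${NVBOOTCTRL:-nvbootctrl}" current
  current=$("$ctl" get-current-slot) || return 1
  if [ "$current" = 1 ]; then
    echo "running from slot 1; re-activating slot 0 for the next boot"
    "$ctl" set-active-boot-slot 0
  else
    echo "already on slot $current; nothing to do"
  fi
}
```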

Use case

Suppose that kernel-dtb gets corrupted during an upgrade attempt. When the TX2 reboots, it falls back to kernel-dtb_b and the system is usable again. After fixing kernel-dtb, the user reboots the TX2, but it keeps falling back to kernel-dtb_b.

To see what happened:

sudo nvbootctrl dump-slots-info

The output is:

slot: 0,             priority: 15,             suffix: _a,             retry_count: 0,             boot_successful: 0
slot: 1,             priority: 14,             suffix: _b,             retry_count: 7,             boot_successful: 1

Note that retry_count and boot_successful are both zero for slot 0, so slot 0 will not be used.

To fix this, the user has to re-enable slot 0 so that it is marked bootable again:

SLOT=0
sudo nvbootctrl set-active-boot-slot $SLOT

This leads to:

slot: 0,             priority: 15,             suffix: _a,             retry_count: 7,             boot_successful: 0
slot: 1,             priority: 14,             suffix: _b,             retry_count: 7,             boot_successful: 1

A nonzero retry_count gives slot 0 another attempt on the next boot. If it boots properly, the TX2 keeps booting from slot 0 and uses kernel-dtb again.

Disabling the A/B redundancy from user space

You can disable A/B redundancy, whether it was enabled with the flash method or from user space, by running:

sudo nv_update_engine --disable-ab

Conclusion

A/B redundancy is disabled by default on the TX2. To enable it, either modify smd_info.cfg, run nv_smd_generator and flash the TX2, or enable it from user space with nv_update_engine. If you use the flash method and do not want to rebuild the filesystem image, pass the -r parameter to flash.sh.

The following TX2 commands are also useful for checking the status of A/B redundancy:

nvbootctrl get-current-slot: It shows the current slot
nvbootctrl set-active-boot-slot $SLOT: It chooses the $SLOT for the next boot.
nvbootctrl dump-slots-info: It shows all slots details.
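These commands can be combined into a one-shot status report. This sketch prints the current slot and each slot line, and warns about any in-use slot whose retry_count and boot_successful are both 0, the state slot 0 was in during the use case. ab_report and the NVBOOTCTRL override are hypothetical; run it as root on the TX2.

```shell
#!/usr/bin/env bash
# Hypothetical status helper built on the nvbootctrl commands listed above.
# NVBOOTCTRL may point at a stub for testing; run as root on the TX2.
ab_report() {
  local ctl="${NVBOOTCTRL:-nvbootctrl}"
  echo "current slot: $("$ctl" get-current-slot)"
  "$ctl" dump-slots-info | awk '
    /^slot:/ {
      print                      # echo the original slot line
      gsub(/,/, "")              # drop commas so "key: value" pairs split cleanly
      slot = ""; prio = 0; retry = 0; ok = 0
      for (i = 1; i < NF; i++) {
        if ($i == "slot:") slot = $(i + 1)
        else if ($i == "priority:") prio = $(i + 1) + 0
        else if ($i == "retry_count:") retry = $(i + 1) + 0
        else if ($i == "boot_successful:") ok = $(i + 1) + 0
      }
      # Same condition as slot 0 in the use case: in use but exhausted.
      if (prio > 0 && retry == 0 && ok == 0)
        print "  warning: slot " slot " has no retries left and is not marked successful"
    }
  '
}
```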



RidgeRun Resources


Visit our Main Website for the RidgeRun Products and Online Store. Information about RidgeRun Engineering is available in the RidgeRun Professional Services, RidgeRun Subscription Model and Client Engagement Process wiki pages. Please email support@ridgerun.com for technical questions and contactus@ridgerun.com for other queries. Contact details for sponsoring the RidgeRun GStreamer projects are available in the Sponsor Projects page.

