DMF Upgrade Procedures

This chapter describes how to upgrade the DANZ Monitoring Fabric (DMF) Controller and fabric switches after initial installation.


Upgrading the Controller

The default serial console baud rate on DMF 6.1 and later hardware appliances is 115200. Arista recommends against using the serial interface to perform an upgrade. However, when upgrading an existing Controller appliance with a serial interface set to 9600 baud, change the terminal setting to 115200 after upgrading to DMF 6.1.0 or later.
Note: The upgrade process checks for supported versions when upgrading. Refer to the most recent DANZ Monitoring Fabric 8.5.0 Release Notes for a list of supported versions for upgrade.
The DANZ Monitoring Fabric (DMF) Controller platform supports two partitions for Controller software images. The active partition contains the active image currently running on the Controller. The alternate partition can be updated without interrupting service. The image used for booting the Controller is called the boot image. In addition, the Controller has an image repository in its local file system where you can copy upgrade images. The upgrade image is verified for integrity during the copy and is rejected if it fails the checksum test.
Note: Copying an upgrade bundle to the Controller overwrites any older image currently in the image repository. If the downloaded bundle is identical to the version already in the image repository, the operation fails during the checksum calculation after copying is complete.

After copying the upgrade image to the image repository, use the upgrade stage command to copy the upgrade image to the alternate partition and prepare the Controller for the upgrade.

The boot image remains in the active partition after copying the ISO image file to the alternate partition. After entering the upgrade launch command, the upgrade image is changed to the boot image and the Controller reboots.

Note: The upgrade process is not hitless.

The upgrade launch command copies the active state of the Controller to the alternate partition and reboots the Controller using the upgrade image. After the upgrade process completes, the alternate and active partitions are reversed, the Controller runs the upgrade image, and the older image remains available in the alternate partition until a new image overwrites it.

As long as the original image remains in the alternate partition, use the boot partition alternate command to restore the Controller to its previous software and configuration.

Log files are written in a separate partition, available from either of the two images by entering the show logging controller command.
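For example, to review the Controller logs from whichever image is currently booted:
controller-1# show logging controller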

Upgrade Procedure Summary

Complete the following steps to upgrade the Controllers. The switch upgrade process, using ZTF, happens automatically (see the DMF Switch Automatic Upgrade section for details).
Note: Only the admin user can perform an upgrade or enter the show image command.

Procedure

  1. Log in to the active Controller using its individual IP address or the virtual IP address for the cluster.
  2. From the active Controller, copy the ISO image file to the image repository on the active Controller (see the Copying the ISO Image File to the Controller section).
  3. From the active Controller, enter the upgrade stage command and respond to the prompts as required (see the Staging the Upgrade section for details).
  4. From the active Controller, enter the upgrade launch command and respond to the prompts as required (see the Launching the Upgrade section for details).
    Note: Do not attempt to launch or stage another upgrade process until the current process is either completed or times out.

    Wait for the process to complete and verify the upgrade before making any configuration changes.

  5. Verify the upgrade (see the Verifying the Upgrade section).
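In condensed form, the upgrade sequence on the active Controller looks like the following sketch; the source path is a placeholder, and each command is detailed in the sections that follow:
controller-1# copy scp://<user>@<host>:DMF-<X.X.X>-Controller-Appliance-<date>.iso image://cluster
controller-1# upgrade cluster stage
controller-1# upgrade cluster launch
controller-1# show version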

Upgrade Options

The following options are available for use with the upgrade command.

cluster: With the cluster option, the upgrade command is executed cluster-wide. The user must log in to the active Controller to run the cluster option.

launch: Complete the upgrade process by rebooting the Controller from the alternate partition and transferring state and configuration information from the current Controller to the upgraded Controller and its running-config. This keyword manages the transition from the current version to the next, which may include rebooting all the Controllers in the cluster and rebooting and upgrading the switches.

Use the following options with the launch keyword as required:
  • support-bundle: Generate a support bundle for use by Arista tech support.
  • switch-timeout: Optionally, specify the number of seconds to wait before terminating the command to a switch during the upgrade.

pre-launch-check: Identifies the status of the Controller regarding readiness for the upgrade.

stage: Prepares the platform for the upgrade ahead of the actual upgrade process by copying the upgrade image to the alternate partition on the Controller.
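As a sketch only: the stage and launch keywords appear elsewhere in this chapter in their cluster-wide forms (upgrade cluster stage and upgrade cluster launch). Assuming pre-launch-check and switch-timeout follow the same pattern, which should be verified against the CLI help on your release, the invocations would look like this:
controller-1# upgrade cluster pre-launch-check
controller-1# upgrade cluster launch switch-timeout <seconds>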

Copying the ISO Image File to the Controller

The ISO image file is a software image that includes the files required to complete the upgrade of the Controller. It also contains the files for installing or upgrading the Switch Light OS image on switches. The primary component of the upgrade image is a new root file system (rootfs) for the Controller.

When copying the upgrade image, if there is not enough file space on the local file system, the system prompts to create space on the local file system. During the image copy process, if the integrity of the image cannot be verified, the system displays a message, and the copy fails. After copying the image, a warning message appears if the image is not a compatible upgrade image for the existing application, or if the image is not a newer version.
Note: To copy the image file (or other files) from the workstation command prompt, refer to the Copying Files Between a Workstation and a DMF Controller section.
Note: Starting with DMF release 8.4, due to the increased image size, attempting to copy a new image will likely produce the following error:
Error: Invalid Use: No space left, use "delete image" command to free space
It is therefore required to remove all images from the image partition, including the image for the currently running version, to free up space.
CONTROLLER-1# show cluster image
# Checksum Cluster Nodes              Product                Version Build
-|--------|--------------------------|----------------------|-------|-----|
1 d301a    CONTROLLER-1, CONTROLLER-2 DANZ Monitoring Fabric 8.5.0   33
CONTROLLER-1#
 
CONTROLLER-1# delete cluster image all
all: delete: confirm ("y" or "yes" to continue): yes
CONTROLLER-1#
Note that you can also delete images on the individual Controllers like so:
CONTROLLER-1# show image
CONTROLLER-1# delete image all
Even though you have deleted all the images from the /images repository, you can still roll back to the current version after the next upgrade operation, as the current image is still present in the boot partition.
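If a rollback is needed later, boot the previous image with the boot command described in the Rolling Back an Upgrade section:
controller-1# boot partition alternate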
To use the Controller CLI to copy the ISO image file to the Controller Image repository from an external server, use the copy command, which has the following syntax:
controller-1# copy <source> <destination>
Replace source with a location accessible from the Controller node using one of the following protocols:
  • scp://<user>@<host>:path: Remote server filename, path, and user name with access permissions to the file. The remote system prompts for a password if required.
  • http://: HTTP URL including path and filename.
  • https://: HTTPS URL including path and filename.
Replace destination with image://cluster. This downloads the image from the remote server to all the nodes in the cluster. For example, the following command copies the file DMF-<X.X.X>-Controller-Appliance-<date>.iso from myscpserver.example.com to the image repository on both the active and standby Controllers:
controller-1# copy scp://<user>@myscpserver.example.com:DMF-<X.X.X>-Controller-Appliance-<date>.iso image://cluster
Use the show cluster image command to verify that the ISO image file has been successfully copied to both the active and standby Controllers, as shown in the following example:
controller-1# show cluster image
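The output resembles the example shown earlier in this section; the checksum, version, and build values here are illustrative only:
# Checksum Cluster Nodes              Product                Version Build
-|--------|--------------------------|----------------------|-------|-----|
1 d301a    CONTROLLER-1, CONTROLLER-2 DANZ Monitoring Fabric 8.5.0   33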

Staging the Upgrade

The upgrade cluster stage command copies the specified upgrade image into the alternate partition on both the active and standby Controllers. This command prepares the platform for an upgrade by populating the alternate partition with the contents of the new image. It is a separate step from the launch step, in which the upgraded image becomes the active image; staging only prepares the platform ahead of the upgrade process.

To view the upgrade images currently available, use the show upgrade image or show cluster image commands.
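For example:
controller-1# show upgrade image
controller-1# show cluster image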

Use the upgrade cluster stage command to prepare the Controllers for the upgrade. This action also copies the running-config to a safe location; the alternate partition does not become the boot partition until the upgrade is launched.

The system responds as shown in the following example:
controller-1# upgrade cluster stage
Upgrade stage: alternate partition will be overwritten
proceed ("yes" or "y" to continue): y
Note: Do not attempt to launch or stage another upgrade process until the current process is either completed or times out.
At this point, enter y to continue the staging process. The system continues the process and displays the following prompts:
Upgrade stage: copying image into alternate partition
Upgrade stage: checking integrity of new partition
Upgrade stage: New Partition Ready
Upgrade stage: Upgrade Staged
To verify that the system is ready for upgrade, enter the show boot partition command, as in the following example:
controller-1# show boot partition

This command lists the available partitions and the information about the Controller versions installed on each. In this example, the original image remains in active state and is still the boot image, meaning if a reboot occurs now, the original image is used to reboot. The upgrade image will not be used to boot until it has been changed to the boot image, which is one of the effects of the upgrade cluster launch command.

To display the current status of the upgrade process, you can use the show upgrade progress command, as in the following example:
controller-1# show upgrade progress
Note: Upgrade requires a successfully staged image partition. If the upgrade stage is interrupted, it is necessary to stage the image again to launch the upgrade.

Launching the Upgrade

The upgrade cluster launch command reboots the system using the upgrade image in the alternate partition and copies the current state from the previous Controller. The system collects information from the existing cluster, for example the current running-config, and saves this state in the staged partition so it is available for the new cluster.

Once the running-config is applied, the operational state of the existing cluster is requested and then applied to the new cluster. The new cluster requests the switches from the old cluster, adjusts its local configuration to describe this new ownership, and reboots the switches.

The upgrade cluster launch command manages the transition from the current version to the next for both the active and standby Controllers. This involves several steps, including rebooting all the Controllers in the cluster, rebooting all the switches, and upgrading the switches. After the standby node is upgraded, the roles of active and standby are reversed so that the upgraded node can assume control while the other node is upgraded and reboots.
Note: Do not attempt to launch or stage another upgrade process until the current process is either completed or times out.
Log in to the active Controller using the Controller management IP or virtual IP and enter the upgrade cluster launch command:
controller-1# upgrade cluster launch
The system prompts to proceed with the reboot. Enter yes. The standby Controller reboots and displays the following messages while the active Controller waits for the standby Controller to boot up:
Upgrade Launched, Rebooting
Broadcast message from root@controller (unknown) at 19:40 ...
The system is going down for reboot NOW!
User initiated reboot

While rebooting, the SSH terminal session is terminated. Reconnect after the reboot is complete.

DMF Switch Automatic Upgrade

When the fabric switch reboots, it compares its current software image, manifest, and startup-config to the information in the switch manifest that the Controller sends after the switch reboots. The switch optimizes the process by caching its last known good copies of the software image and the startup-config in its local flash memory.

The switch automatically starts the upgrade process when it reboots if the ZTF server has a software image or startup-config with a checksum different from the one the switch currently has. This check is performed every time the switch restarts.
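Because the check runs at every restart, rebooting the switches from the active Controller, as shown in the Rolling Back an Upgrade section, is sufficient to trigger the comparison:
controller-1# system reboot switch all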

Verifying the Upgrade

After completing the upgrade process, verify that both Controllers and the fabric switches have been upgraded to the correct DANZ Monitoring Fabric and Switch Light OS versions.

To verify the upgrade of the Controllers, log in to each Controller and enter the show version command, as shown in the following example:
controller-1# show version
To verify the upgrade of the switches, enter the show switch all command, as shown in the following example:
controller-1# show switch all desc

This command displays the current switch version.

Verify Persistent IPAM Assigned IP Addresses after Upgrade

IPAM-assigned IP addresses for switches are persistent across DMF upgrades.

Configuration

No additional configuration changes are required to keep IPAM-assigned IP address configuration persistent across cluster upgrades. However, this assumes that IPAM has been previously configured.

CLI Commands

Refer to the sections Using L2 ZTF (Auto-Discovery) Provisioning Mode through Static Address Assignment Troubleshooting and Limitations for IPAM and switch IP address configuration; that information remains valid after the upgrade.


Rolling Back an Upgrade

When deciding to roll back or downgrade the Controller software after an upgrade, note that the rollback or downgrade is not hitless and takes several minutes to complete. After both Controllers are up and have joined the cluster, check each switch version and reboot each switch as needed to ensure all the switches have an image compatible with the Controller version.
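For example, check the running switch versions against the Controller version with:
controller-1# show switch all desc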

To restore the system to the previous image, complete the following steps:

Procedure
  1. On both the active and standby Controllers, enter the show boot partition command to ensure that the previous image remains in the Alternate partition.
    controller-1# show boot partition
  2. Reboot the active Controller node from the alternate partition by entering the boot command from the CLI prompt of the active Controller.
    controller-1# boot partition alternate
    Next Reboot will boot into partition 1 (/dev/vda2)
    boot partition: reboot? ("y" or "yes" to continue): yes

    Answer yes when prompted.

  3. Reboot the standby Controller node from the alternate partition by entering the boot command from the CLI prompt of the standby Controller.
    controller-2# boot partition alternate
    Next Reboot will boot into partition 1 (/dev/vda2)
    boot partition: reboot? ("y" or "yes" to continue): yes

    Answer yes when prompted.

  4. After both Controller nodes have restarted, reboot all switches by entering the system reboot switch command on the active Controller.
    controller-1# system reboot switch all
    system switch reboot all: connected switches:
    00:00:70:72:cf:c7:06:bf
    00:00:70:72:cf:c7:c5:f9
    00:00:70:72:cf:ae:b7:38
    00:00:70:72:cf:c8:f9:25
    00:00:70:72:cf:c7:00:ad
    00:00:70:72:cf:c7:00:63
    reboot may cause service interruption
    system switch reboot all ("y" or "yes" to continue): yes
    Answer yes when prompted.
  5. Wait for all the switches to reconnect.
    controller-1# show switch
  6. Verify that the standby Controller has rejoined the cluster with the reverted active Controller by entering the show controller command from both nodes.
    controller-1# show controller
Any connected DMF Service Node or DMF Recorder Node must be upgraded after upgrading DMF fabric Controllers and switches.
Upgrade from DMF Release 7.1.0 to later versions uses Zero Touch Fabric (ZTF) and occurs automatically for connected nodes after the DMF Controller is upgraded.
Note: If the DMF Recorder Node is in a different Layer 2 segment than the DMF Controller, a fresh installation of the Recorder is required to upgrade to Release 7.1.0.
For details about upgrading DMF Service Node software, refer to Installing and Upgrading the DMF Service Node.