Prometheus Endpoint Support for Infrastructure Metrics

Prometheus is an open source monitoring and alerting toolkit. It collects and stores metrics from different sources in a time-series database. Prometheus offers a powerful query language that allows users to analyze and visualize the collected data in real time. With its robust alerting system, Prometheus can also notify users of potential issues, enabling their timely resolution.

Starting with the DMF 8.5.0 release, the Prometheus server can scrape metrics from a DMF (DANZ Monitoring Fabric) deployment. The DMF Controller exposes interface counters, CPU usage, memory usage, sensor states, and disk usage statistics from all the devices including the Controllers from a single Prometheus endpoint.

Deployment

Figure 1. DMF Deployment with Prometheus Server

The diagram above shows a DMF deployment with an active/standby Controller cluster. In this environment, each Controller collects metrics from all the devices it manages, as well as from both Controller nodes, and exposes them via the telemetry endpoint /api/v1/metrics/prometheus. This is an authenticated endpoint that listens on port 8443 and supports both the Prometheus and OpenMetrics exposition formats.

Even though each Controller can serve the telemetry information, use the cluster's virtual IP (VIP) in the Prometheus configuration to maintain seamless continuity in the event of a Controller failover.

Configuration

No additional configuration is necessary on the DMF Controller to enable metric collection. However, to allow the Prometheus service to access the HTTP telemetry endpoint, create a user access token. An admin can use an existing user or create a dedicated user with the correct privileges, then generate a token that Prometheus uses to fetch metrics from the fabric. The following sections describe the necessary configuration.

Permission

The group that the user belongs to needs to have sufficient permission to query the Prometheus endpoint. The following table summarizes the behavior.

Group            | Behavior
admin            | An access-token generated for a user in the admin group has access to the telemetry endpoint.
read-only        | An access-token for a user in the read-only group does not have access to the telemetry endpoint.
Any custom group | An access-token for a user in a group with the TELEMETRY or DEFAULT permission (but not the DEFAULT/SENSITIVE permission) has access.

To set telemetry permission for a custom group, use the following commands:

dmf-controller(config)# group group_name
dmf-controller(config)# permission category:TELEMETRY privilege read-only
dmf-controller(config)# associate user username

Access Token

Generate an access token for the user using the following command.

dmf-controller(config)# user username
dmf-controller(config-user)# access-token descriptive name for the access-token
access-token : ZxyHXL0QyOhDUogT8wjZj7ouSiVtWNB3

Prometheus Service

The following configuration needs to be added to the Prometheus server to fetch metrics from the Controller's /api/v1/metrics/prometheus endpoint periodically.

scrape_configs:
  - job_name: <job_name>
    scheme: https
    authorization:
      type: Bearer
      credentials: <access-token>
    metrics_path: /api/v1/metrics/prometheus
    scrape_interval: <interval>
    static_configs:
      - targets:
          - <vip>:8443
    tls_config:
      insecure_skip_verify: true

The table below depicts the recommended configurations for this feature.

Configuration   | Value
Credential      | The corresponding access token created on the Controller
Scrape Interval | The minimum supported interval is 10s
Target          | Use the VIP of the DMF Controller cluster
TLS             | If a self-signed certificate is used on the Controller, add insecure_skip_verify: true

Please refer to the configuration guidelines of the specific Prometheus version you are using in production.

Limitations

  • Not every device supports every metric. Check the Metrics Summary section for more details.
  • Software interfaces (for example, loopback, bond, and management) do not report counters for broadcast and unicast packets.
  • The reported interface names are the raw physical interface names (e.g., et1) rather than the user-configured names associated with the role of an interface (e.g., filter1).
  • Resetting the interface counter does not have any effect on the counter values reported by the telemetry. The value is monotonically increasing and corresponds to the total count since the device was last powered up. This value only gets reset when the device is rebooted.
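Because the reported counters are monotonic and reset only when a device reboots, Prometheus's rate() function is the natural way to turn them into per-second values; rate() automatically compensates for the reset a reboot causes. The following recording rule is a sketch only — the group and record names are hypothetical, and the metric name is taken from the Metrics Summary section:

```
groups:
  - name: dmf-interface-rates
    rules:
      # Per-interface receive throughput in bits/s, derived from the
      # monotonic octet counter exposed by the DMF Controller.
      - record: dmf:device_interface_in_bps:rate5m
        expr: rate(device_interface_in_octets_total[5m]) * 8
```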

Notes

  • The configured name of a managed device (e.g., switch, recorder node etc.) on the Controller is used as the value of the device_name label for all the metrics corresponding to it. In the case of a Controller, the configured hostname is used in the device_name label. Thus, these names are expected to be unique in a specific DMF deployment.
  • It is possible that no metrics are collected from a device for a short time period. This may happen when the device is rebooting or when the Controllers experience a failover event.
  • Prometheus will add additional metrics, e.g., scrape_duration_seconds, scrape_samples_post_metric_relabeling, scrape_samples_scraped, and scrape_series_added. These metrics do not collect any data from the DMF fabric and can be ignored.
  • A metric representing a status that can take a fixed set of values is represented as a StateSet metric. Each possible state is reported as a separate metric. The current state is reported with value 1 and the other states with value 0. The state itself is reported as a label with the same name as that of the metric. For example, device_psu_oper_status will display multiple metrics for the operational status of a PSU (Power Supply Unit). If a PSU, psu1 is in the failed state, then the metric device_psu_oper_status{name="psu1", device_psu_oper_status="failed"} will report value 1. At the same time, we will see the metric device_psu_oper_status{name="psu1", device_psu_oper_status="ok"} with value 0 for another state ok.
  • Internally, all metrics are fetched every 10 seconds, except those associated with sensors, which are currently collected once every minute.
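The StateSet convention described above can also be post-processed outside Prometheus. The following Python sketch is a hypothetical helper (not part of DMF or Prometheus) that scans exposition-format text and returns the active state of each entity for a given StateSet metric:

```python
import re

# Match a single exposition sample of the form: metric{labels} value
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+([0-9eE.+-]+)$')

def current_states(exposition_text, metric):
    """Return {entity name: active state} for a StateSet metric.

    The active state is the sample whose value is 1; by convention the
    state itself is carried in a label named after the metric.
    """
    states = {}
    for line in exposition_text.splitlines():
        m = SAMPLE_RE.match(line.strip())
        if not m or m.group(1) != metric:
            continue
        labels = {}
        for kv in m.group(2).split(","):
            k, _, v = kv.partition("=")
            labels[k.strip()] = v.strip().strip('"')
        if float(m.group(3)) == 1.0:
            states[labels.get("name")] = labels.get(metric)
    return states
```

For the PSU example above, a failed psu1 and a healthy psu2 would be reduced to a single state per unit.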

Troubleshooting

If no metrics are collected, or no change in the metrics is visible in Prometheus for a few minutes, follow these troubleshooting steps:

  • Verify that the Controller cluster is reachable over its VIP and that there is an elected active Controller.
  • Verify that you can retrieve the telemetry metrics in Prometheus exposition format using the token you created: curl -k -H "Authorization: Bearer <token>" https://<vip>:8443/api/v1/metrics/prometheus
  • Verify that the telemetry connection to the devices is active by running the show telemetry connection command. Check the Telemetry Collector section of the DANZ Monitoring Fabric User Guide for more details.

Resources

Appendix

Metrics Summary

The following table shows the details of each metric exposed by the DMF fabric. The supported device type columns indicate whether a metric is generally supported on that device type. However, some specific platforms or hardware might not report a particular metric.

Metric Description Supported device type
Ctrl SWL EOS SN RN
device_cgroup_cpu_percentage The normalized CPU utilization percentage by a control group Y Y N Y Y
device_cgroup_memory_bytes The memory used by a control group in bytes Y Y N Y Y
device_config_info The informational metrics of a managed device N Y Y Y Y
device_cpu_utilization_percentage The percentage utilization of a CPU on a device of the fabric Y Y Y Y Y
device_fan_oper_status The current operational status of a fan Y Y Y Y Y
device_fan_rpm The current rate of rotation of a fan Y Y Y Y Y
device_fan_speed_percentage The percentage of the max rotation capacity that a fan is currently rotating N Y N N N
device_interface_in_broadcast_packets_total The total number of broadcast packets received on the interface of a device Y Y Y Y Y
device_interface_in_discards_packets_total The number of discarded inbound packets by the interface of a device even though no errors had been detected Y Y Y Y Y
device_interface_in_errors_packets_total The number of inbound packets discarded at the interface of a device for errors Y Y Y Y Y
device_interface_in_fcs_errors_packets_total The number of received packets on the interface of a device with erroneous frame check sequence (FCS) Y Y Y Y Y
device_interface_in_multicast_packets_total The total number of multicast packets received on the interface of a device Y Y Y Y Y
device_interface_in_octets_total The total number of octets received on the interface of a device Y Y Y Y Y
device_interface_in_packets_total The total number of packets received on the interface of a device Y Y Y Y Y
device_interface_in_unicast_packets_total The total number of unicast packets received on the interface of a device Y Y Y Y Y
device_interface_operational_status The operational state of an interface of a device Y Y Y Y Y
device_interface_out_broadcast_packets_total The total number of broadcast packets transmitted from the interface of a device Y Y Y Y Y
device_interface_out_discards_packets_total The number of discarded outbound packets at the interface of a device even though no errors had been detected. Y Y Y Y Y
device_interface_out_errors_packets_total The number of outbound packets that could not be transmitted by the interface of a device because of errors Y Y Y Y Y
device_interface_out_multicast_packets_total The total number of multicast packets transmitted from the interface of a device Y Y Y Y Y
device_interface_out_octets_total The total number of octets transmitted from the interface of a device Y Y Y Y Y
device_interface_out_packets_total The total number of packets transmitted from the interface of a device Y Y Y Y Y
device_interface_out_unicast_packets_total The total number of unicast packets transmitted from the interface of a device Y Y Y Y Y
device_memory_available_bytes The current available memory at a device of the fabric Y Y Y Y Y
device_memory_total_bytes The total memory of this device Y Y N Y Y
device_memory_utilized_bytes The current memory utilization of a device of the fabric Y Y Y Y Y
device_mount_point_space_available_megabytes The amount of unused space on the filesystem Y Y N Y Y
device_mount_point_space_usage_percentage The used space in percentage Y Y N Y Y
device_mount_point_space_utilized_megabytes The amount of space currently in use on the filesystem Y Y N Y Y
device_mount_point_total_space_megabytes The total size of the initialized filesystem. Y Y N Y Y
device_psu_capacity_watts The maximum power capacity of a power supply N N Y N N
device_psu_input_current_amps The input current drawn by a power supply Y Y Y Y Y
device_psu_input_power_watts The input power to a power supply Y Y N Y Y
device_psu_input_voltage_volts The input voltage to a power supply Y Y Y Y Y
device_psu_oper_status The current operational status of a power supply unit Y Y Y Y Y
device_psu_output_current_amps The output current supplied by a power supply N Y Y N N
device_psu_output_power_watts The output power of a power supply N Y Y N N
device_psu_output_voltage_volts The output voltage supplied by a power supply N Y Y N N
device_thermal_oper_status The current operational status of a thermal sensor Y Y N Y Y
device_thermal_temperature_celsius The current temperature reported by a thermal sensor Y Y Y Y Y

* Ctrl = Controller, SWL = a switch running Switch Light, EOS = a switch running Arista EOS, SN = Service Node, RN = Recorder Node.

DMF Controller in Microsoft Azure

The DANZ Monitoring Fabric (DMF) Controller in Azure feature supports the operation of the Arista Networks DMF Controller on the Microsoft Azure platform and uses the Azure CLI or the Azure portal to launch the Virtual Machine (VM) running the DMF Controller.

Figure 1. Customer Azure Infrastructure

The DMF Controller in Azure feature enables the registration of VM deployments in Azure and supports auto-firstboot using Azure userData or customData.

Configuration

Configure Azure VM auto-firstboot using customData or userData. There is no data merging from these sources, so provide the data via either customData or userData, but not both.

Arista Networks recommends using customData as it provides a better security posture because it is available only during VM provisioning and requires sudo access to mount the virtual CDROM.

userData is less secure because it is available via Instance MetaData Service (IMDS) after provisioning and can be queried from the VM without any authorization restrictions.

If sshKey is configured for the admin account during Azure VM provisioning along with auto-firstboot parameters, then it is also configured for the admin user of the DMF Controllers.

The following table lists details of the first boot parameters for the auto-firstboot configuration.

Firstboot Parameters - Required Parameters

Key | Description | Valid Values
admin_password | The password to set for the admin user. When joining an existing cluster, this is the admin password of the existing cluster node. | string
recovery_password | The password to set for the recovery user. | string

Additional Parameters

Key | Description | Required | Valid Values | Default Value
hostname | The hostname to set for the appliance. | no | string | configured from the Azure Instance Metadata Service
cluster_name | The name to set for the cluster. | no | string | Azure-DMF-Cluster
cluster_to_join | The IP address that firstboot uses to join an existing cluster. Omitting this parameter causes firstboot to create a new cluster. Note: If this parameter is present, ntp_servers, cluster_name, and cluster_description are ignored; the existing cluster node provides these values after joining. | no | IP address string |
cluster_description | The description to set for the cluster. | no | string |

Networking Parameters

Key | Description | Required | Valid Values | Default Value
ip_stack | The IP protocols to set up for the appliance management NIC. | no | enum: ipv4, ipv6, dual-stack | ipv4
ipv4_method | How to set up IPv4 for the appliance management NIC. | no | enum: auto, manual | auto
ipv4_address | The static IPv4 address to use for the appliance management NIC. | only if ipv4_method is set to manual | IPv4 address string |
ipv4_prefix_length | The prefix length of the IPv4 subnet to use for the appliance management NIC. | only if ipv4_method is set to manual | 0..32 |
ipv4_gateway | The static IPv4 gateway to use for the appliance management NIC. | no | IPv4 address string |
ipv6_method | How to set up IPv6 for the appliance management NIC. | no | enum: auto, manual | auto
ipv6_address | The static IPv6 address to use for the appliance management NIC. | only if ipv6_method is set to manual | IPv6 address string |
ipv6_prefix_length | The prefix length of the IPv6 subnet to use for the appliance management NIC. | only if ipv6_method is set to manual | 0..128 |
ipv6_gateway | The static IPv6 gateway to use for the appliance management NIC. | no | IPv6 address string |
dns_servers | The DNS servers for the cluster to use. | no | List of IP address strings |
dns_search_domains | The DNS search domains for the cluster to use. | no | List of host name or FQDN strings |
ntp_servers | The NTP servers for the cluster to use. | no | List of host name or FQDN strings | 0.bigswitch.pool.ntp.org, 1.bigswitch.pool.ntp.org, 2.bigswitch.pool.ntp.org, 3.bigswitch.pool.ntp.org

Examples

{
  "admin_password": "admin_user_password",
  "recovery_password": "recovery_user_password"
}

Full List of Parameters

{
  "admin_password": "admin_user_password",
  "recovery_password": "recovery_user_password",
  "hostname": "hostname",
  "cluster_name": "cluster name",
  "cluster_description": "cluster description",
  "ip_stack": "dual-stack",
  "ipv4_method": "manual",
  "ipv4_address": "10.0.0.3",
  "ipv4_prefix_length": "24",
  "ipv4_gateway": "10.0.0.1",
  "ipv6_method": "manual",
  "ipv6_address": "be:ee::1",
  "ipv6_prefix_length": "64",
  "ipv6_gateway": "be:ee::100",
  "dns_servers": [
    "10.0.0.101",
    "10.0.0.102"
  ],
  "dns_search_domains": [
    "dns-search1.com",
    "dns-search2.com"
  ],
  "ntp_servers": [
    "1.ntp.server.com",
    "2.ntp.server.com"
  ]
}
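Before attaching the JSON to customData or userData, it can be useful to sanity-check it. The following Python sketch is a hypothetical pre-flight check, not an Arista tool; it encodes only the rules stated in the tables above (required passwords, and static addressing fields that become required when the corresponding method is manual):

```python
# Hypothetical validator for an auto-firstboot parameter dictionary,
# mirroring the parameter tables above.
def validate_firstboot(params):
    errors = []
    # admin_password and recovery_password are always required.
    for key in ("admin_password", "recovery_password"):
        if key not in params:
            errors.append(f"missing required key: {key}")
    # Static address and prefix length are required only for manual setup.
    for ver in ("ipv4", "ipv6"):
        if params.get(f"{ver}_method") == "manual":
            for suffix in ("address", "prefix_length"):
                if f"{ver}_{suffix}" not in params:
                    errors.append(
                        f"{ver}_{suffix} is required when {ver}_method is manual"
                    )
    return errors
```

An empty result means the parameters pass these basic checks; it does not guarantee the values themselves (addresses, host names) are valid.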

Limitations

The following limitations apply to the DANZ Monitoring Fabric (DMF) Controller in Microsoft Azure.

  • There is no support for any features specific to Azure-optimized Ubuntu Linux, including Accelerated Networking.

  • The DMF Controllers in Azure are only supported on Gen-1 VMs.

  • The DMF Controllers in Azure do not support adding the virtual IP address for the cluster.

  • There is no support for capture interfaces in Azure.

  • DMF ignores the Azure username and password fields.

  • There is no support for static IP address assignment that differs from what is configured on the Azure NIC.

  • The DMF Controllers are rebooted if the static IP on the NIC is updated.

Resources

Syslog Messages

  • Azure DMF Controller VMs can be accessed via an ssh login.

  • systemctl should be in a running state without any failed units for the Controllers to be in a healthy state as shown in the following example:

    dmf-controller-0-vm> debug bash
    admin@dmf-controller-0-vm:~$ sudo systemctl status
    dmf-controller-0-vm
    State: running
    Jobs: 0 queued
    Failed: 0 units

Troubleshooting

  • There are three possible failure modes:

    • VM fails Azure registration.

    • auto-firstboot fails due to a transient error or bug.

    • auto-firstboot parameter validation fails.

  • These failures can be debugged by accessing the firstboot logs, after manually booting the VM.
  • Azure DMF Controller VMs can be accessed via ssh. Firstboot logs can be accessed using debug bash as shown below:
    dmf-controller-0-vm> debug bash
    admin@dmf-controller-0-vm:~$ less /var/log/floodlight/firstboot/firstboot.log
  • For debugging parameter validation errors, access the parameter validation results using debug bash as shown below:
    dmf-controller-0-vm> debug bash
    admin@dmf-controller-0-vm:~$ less /var/lib/floodlight/firstboot/validation-results.json

Reforming Controller HA Cluster

Removing the default IPv4 or IPv6 permit entry before adding a specific permit rule to the sync access list permanently breaks communication between the active and standby Controllers. Follow the procedure outlined in this appendix to recover from a split Controller HA cluster.

Controller Cluster Recovery

Controller-1 (IP:192.168.55.11, node-id:23955) is active and Controller-2 (IP:192.168.39.44, node-id:1618) is standby. Retrieve the Node-id using the show controller details command in the CLI.
DMF-CTL2(config-controller-access-list)# show controller details
Cluster Name : DMF-7050
Cluster UID : a5de38214971de42aa7b51b96ac7345f4f228b20
Cluster Virtual IP : 10.240.130.18
Redundancy Status : redundant
Redundancy Description : Cluster is Redundant
Last Role Change Time : 2022-11-05 00:56:04.862000 UTC
Cluster Uptime : 2 months, 1 week
# IP            Hostname @ Node Id Domain Id State   Status    Uptime
-|-------------|--------|-|-------|---------|-------|---------|---------------|
1 192.168.39.44 DMF-CTL2 * 1618    1         active  connected 2 weeks, 2 days
2 192.168.55.11 DMF-CTL1   23955   1         standby connected 2 weeks, 2 days
~~~~~~~~~~~~~~~~~~~~~~~~~~~ Failover History ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# New Active Time completed                  Node  Reason                Description
-|----------|------------------------------|-----|---------------------|------------------|
1 22049     2022-11-05 00:55:35.994000 UTC  22049 cluster-config-change Changed connection state: cluster configuration changed
Procedure
  1. [Controller-1] Add the default rule sync 2 permit from 0.0.0.0/0 to the sync access list.
  2. Controller-2 remains standby, so the default rule cannot be added to its sync access list until it transitions to active.
  3. [Controller-2] Run the system reset-connection switch all command on Controller-2, changing Controller-2 to active.
  4. [Controller-2] On Controller-2, add the default rule sync 2 permit from 0.0.0.0/0 to the sync access list.
  5. [Controller-2] On Controller-2, use debug bash and then run the following command:
    sudo bootstraptool -ks /etc/floodlight/auth_credentials.jceks --set 23955,192.168.55.11,
    6642
    Node id 23955 is the node-id of the old active Controller-1, and the IP address is the old
    active Controller-1 IP address.
  6. Wait for the cluster to reform.
  7. Controller-1 and Controller-2 may change roles after this recovery procedure; that is, Controller-2 may become active.

Erasing DMF Appliance

To securely erase data stored on the DMF appliance, the data must be overwritten. This section describes how to do this using the Dell Lifecycle Controller. The DMF Recorder Node provides an erase API, but it does not securely remove the data from its disk. Instead, it unlinks files in the Index and Packet partitions so the file system can reclaim the space for future packets and indices; this was a design decision because unlinking files is faster than overwriting data. To securely erase data and prevent anyone from accessing it, use the following procedure.

Using the Dell Lifecycle Controller

  1. Restart the Recorder Node.
  2. During POST, press F10 to enter the Lifecycle Controller GUI.
    Figure 1. Dell Lifecycle Controller
  3. Select Hardware Configuration. Click on Repurpose or Retire System.
    Figure 2. Dell Lifecycle Controller

    The Retire or Repurpose System function enables the removal of data from the server by erasing server non-volatile stores and data stored on Hard Disk Drives (HDDs), Self-Encrypting Drives (SEDs), Instant Secure Erase (ISE) drives, and Non-Volatile Memory Express (NVMe) drives.

  4. Click View Storage and Disks to display all the drives attached to the server that are supported for erasure. Only detected drives that can be erased are displayed.
    Figure 3. Dell Lifecycle Controller Hardware Configuration
  5. Click Back to return to Step 1. Select Secure Erase Disk (Cryptographic Erase), or select both Secure Erase Disk (Cryptographic Erase) and Standard Disks (Overwrite Data) if the system detects both types.
    Figure 4. Dell Lifecycle Controller > Repurpose/Retire a system
  6. Click Next to view the Step 2 Summary page, which displays the drives that will be erased. Click Finish.
    Figure 5. Dell Lifecycle Controller Repurpose/Retire a system
  7. A Physical Disk Erasure warning message appears about the erasure of the disks. Click Yes.
    Figure 6. Dell Lifecycle Controller Repurpose/Retire a system
  8. A Critical message indicates that the erasure process cannot be stopped once started. Click Yes.
    Figure 7. Dell Lifecycle Controller Repurpose/Retire a system
  9. The DMF appliance will reboot, and during POST, the display will show the Automated Task Application to erase the disks.
    Figure 8. DMF Appliance
  10. The Automated Task Application dialog box displays the task of erasing drives along with a progress bar. After the task finishes, the DMF Recorder Node turns off.
    Note: Depending on the amount of data on the DMF appliance, the disk erasure process can take some time.
    Figure 9. Dell Lifecycle Controller Repurpose/Retire a system

Installing a Controller VM

This appendix describes how to install and configure a VM for the DANZ Monitoring Fabric Controller.

 

General Requirements

The minimum hardware required for installing the VM software image on a supported Azure, KVM or ESXi version is as follows:
  • 8 vCPUs with a minimum scheduling of 2 GHz.
  • 32 GB of virtual memory.
  • 400 GB hard disk.
  • 1 virtual network interface reachable from the physical switches.
Note: The DANZ Monitoring Fabric Hardware Appliance is recommended for production deployment because VM performance depends on many factors in the hypervisor setup.

Installing on VMware ESXi/vSphere

Prerequisites

  • A DMF Controller image in OVA format (.ova file).
  • An ESXi host, which can be part of a vSphere cluster. Refer to the DMF Hardware Compatibility List for supported versions of ESXi and vCenter.
  • A virtual network has already been created.
  • A machine with the vSphere Client or web client installed.
Note: Do not use ESXi 5.1 GA. A known issue exists where installing a large VM causes the ESXi host to crash. Check the VMware website for more information.

VM Installation

To install the VM, complete the following steps:
  1. Log in to vCenter or ESXi host with vSphere Client. The following uses vCenter 6.7.
  2. Right-click on the host where the Controller VM will be deployed and select Deploy OVF Template.
  3. Browse or enter the path to the OVA file and click Next.
  4. Enter the name of the VM and click Next.
  5. Select Compute Resource and click Next.
  6. Leave the provision format at the default, select the datastore, and click Next.
  7. Select Network Mapping. Map to the network created for the DMF Controller and switches. Click Next.
  8. Click Next and click Finish.
  9. Start the VM and complete the steps in Chapter 2, Installing and Configuring the DMF Controller, to set up the Controller.

vMotion support for Virtual Controller

VMware vMotion is supported for Controller pairs. This support is offered only for vCenter 7.0.2.

The following are some additional points to remember when performing vMotion of the virtual Controllers:
  • vMotion can be performed on each of the Controllers in the HA pair individually at separate times.
  • Performing vMotion of the standby Controller first is recommended.
  • Always back up the Controller configuration before performing vMotion.

Installing on Ubuntu KVM

Prerequisites

  • DMF Controller virtual disk in qcow2 format (.qcow2 file).
  • Ubuntu host with Virtual Machine Manager installed.
  • Ubuntu host connected to the management network via a bridge (br0).

VM Installation

To install the VM, complete the following steps:
  1. Copy DMF-Controller-VM.qcow2 to /var/lib/libvirt/images/.
  2. Start Virtual Machine Manager, and choose Create a new virtual machine.
  3. Provide a name for the new VM and select the Import existing disk image option. Click Forward to continue.
  4. Set the existing storage path to point to the provided DMF Controller image. Press Forward to continue.
  5. Set the Memory (RAM) and CPU. Allocate at least 4 GB of RAM and 2 CPUs for the image. Click Forward to continue.
  6. Select the checkbox Customize configuration before install. Expand Advanced options, change to Specify shared device name.
  7. Enter the Bridge name: br0, to bind the Controller virtual machine to the br0 bridge interface created previously. Click Finish to continue.
  8. Under the Processor section, expand Configuration. Select the Copy host CPU configuration option. This may improve performance dramatically, depending on your VM host. Click Apply to save the changes.
  9. Under the Disk 1 section, expand Advanced options and Performance options. Set the options on this page as follows:
    • Disk bus: VirtIO
    • Storage format: qcow2
    • Cache mode: Writeback
    • IO mode: default
  10. Click Apply to save the changes.
  11. Under the NIC section, set the Device model to virtio. Click Apply to save the changes.
  12. Select Begin installation to create the virtual machine.
  13. Follow the steps in Installing and Configuring the DMF Controller to set up the Controller.

Creating a USB Boot Image

This appendix provides details on how to create a bootable USB with Switch Light OS.

 

Creating the USB Boot Drive with MacOS X

Complete the following steps to create a bootable USB drive on MacOS X.

Procedure

  1. Insert the USB drive into a USB port on the Mac computer.
    Inserting the USB drive mounts the USB drive, but it must be unmounted to create a bootable disk.
  2. Open a Mac OS terminal window.
  3. Enter the diskutil command to list all the mounted disks, as in the following example:
    diskutil list
    Note: You can also use the MacOS Disk Utility GUI application (applications/utilities) to identify the mounted disks and unmount the USB drive.
  4. Identify the /dev/disk<x> label for the inserted USB drive.
  5. Unmount the USB drive (this is different from ejecting) using the following command.
    diskutil unmountdisk /dev/disk<x>
    Note: Replace <x> with the unique numeric identifier assigned by the system.
  6. Enter the sudo dd command in the terminal window to make the USB drive bootable.
    sudo dd if=<path to iso image> of=/dev/rdisk<x> bs=1024m
    Warning: Using the dd command with the wrong disk name can erase the installed OS or other vital information.

    Copy the Service Node appliance ISO image to the USB drive using this command. Using /dev/rdisk makes the copying faster (rdisk stands for a raw disk).

    Replace <x> with the drive identifier for the USB drive and replace <path to iso image> with the filename and path to the location where you downloaded the Service Node ISO image.

    For example, the following command copies the file dmf-service-node.iso to disk2:
    sudo dd if=dmf-service-node.iso of=/dev/rdisk2 bs=1024m

    Copying the image to the USB drive can take up to ten minutes.

    To monitor the progress of the write operation, enter the following command in a separate terminal window.
    $ while sudo killall -INFO dd; do sleep 5; done

    After the copy completes, eject the USB drive:
    diskutil eject /dev/disk<x>

    Alternatively, select Eject from the File menu.

Creating the USB Boot Image with Linux

Complete the following steps to create a bootable USB drive using Linux.

Procedure

  1. Insert the USB drive into a USB port on the Linux workstation.
  2. Enter the following command in a Linux terminal window to identify the USB drive.
    sudo fdisk -lu

    On Linux, the USB drive is typically /dev/sdb.

  3. Verify that the USB drive is not currently mounted, or unmount it if it is. Use the mount command to list the currently mounted devices.
  4. Use the sudo dd command to make the USB drive bootable by copying the Service Node ISO image.
    # sudo dd if=<path to iso image> of=/dev/sdb bs=4096
    Warning: Using the dd command with the wrong disk name can erase the installed OS or other vital information.
    Replace <path to iso image> with the filename and path to the location where you downloaded the Service Node ISO image. For example, the following command copies dmf-service-node.iso to the USB drive:
    # sudo dd if=dmf-service-node.iso of=/dev/sdb bs=4096

    Copying the image to the USB drive can take up to ten minutes.

  5. Eject the USB drive from your Linux workstation.

Creating a USB Boot Image Using Windows

Several Windows utilities are available for building a USB boot image from an ISO image. The following procedure uses the Rufus bootable image program.

To build a USB boot image using Windows, complete the following steps.

Procedure

  1. Download the Rufus utility from https://rufus.akeo.ie/.
  2. After downloading the utility, double-click the rufus.exe file.
    Figure 1. User Account Control
  3. Click Yes to allow the changes required for installation.
    Figure 2. Rufus: Create an ISO Image Option
  4. To create a bootable disk, select ISO Image.
    Figure 3. Rufus: Select ISO Image
  5. Click the CD-ROM icon.
    Figure 4. Open ISO Image File
  6. Select the file to use and click Open.
  7. Click Start to burn the ISO image to USB.
    Figure 5. Rufus: Start
    If an upgrade to syslinux is required, the system displays the following dialog box.
    Figure 6. User Account Control
  8. If this prompt appears, click Yes to continue.
  9. When prompted to use DD mode or ISO mode, choose ISO.
    The system displays a warning that the data on the USB drive will be destroyed, and a new image will be installed.
    Figure 7. Erasing Data Warning
  10. Click OK to confirm the operation.

Removing an Existing OS from a Switch

This appendix provides information on how to uninstall the OS from a switch.

Reverting from DMF (Switch Light OS) to EOS - 7050X3 and 7260CX3

This appendix details the procedure to revert the Arista 7050X3 and 7260CX3 Series switches from Switch Light OS to EOS.

Procedure

To revert from Switch Light OS (SWLOS) to EOS, complete the following steps:

  1. Connect to the switch via a serial connection and confirm that Switch Light OS is installed. The serial console output should resemble the following example.
    Connected to 10.240.130.2.
    Escape character is '^]'.
    Switch Light OS SWL-OS-DMF-8.2.0(0), 2022-03-25.05:22-fdf3fa6
    Site1-F1 login:
  2. Log in to the switch via a serial connection and reboot the switch.
  3. Interrupt the boot process by pressing Control-C to enter Aboot.
    Watchdog enabled, will fire in 2 mins
    CBFS: 'Master Header Locator' located CBFS at [200:ffffc0)
    CBFS: Locating 'normal/romstage'
    CBFS: Found @ offset 5b3d40 size 7b7c
    Aboot 9.0.3-4core-14223577
    Press Control-C now to enter Aboot shell
    ^CWelcome to Aboot.
    Aboot#
  4. Once in Aboot, change the directory to /mnt/flash.
    Press Control-C now to enter Aboot shell
    ^CWelcome to Aboot.
    Aboot#
    Aboot# cd /mnt/flash
    Aboot# pwd
    /mnt/flash
    Aboot#
  5. List the files under the /mnt/flash directory and check the boot-config file for the current SWI.
    Aboot# pwd
    /mnt/flash
    Aboot# ls
    AsuFastPktTransmit.log SsuRestoreLegacy.log lost+found
    EOS-4.23.3M.swi aboot-chainloader.swi onl
    Fossil boot-config persist
    SWL-INSTALLER.swi debug schedule
    SsuRestore.log fastpkttx.backup startup-config
    Aboot#
    Aboot# cat boot-config
    SWI=flash:aboot-chainloader.swi
    Aboot#
  6. Edit the boot-config file to point to the existing EOS SWI under the /mnt/flash directory.
    Aboot# vi boot-config
    Aboot#
    SWI=flash:aboot-chainloader.swi
    ~
    SWI=flash:/EOS-4.23.3M.swi
    ~
    Aboot# cat boot-config
    SWI=flash:/EOS-4.23.0F.swi
  7. Reboot the switch. The switch should boot up with the EOS image.
    Aboot# reboot
    Requesting system reboot
    Press Control-C now to enter Aboot shell
    Booting flash:/EOS-4.23.0F.swi
    [ 13.231125] Starting new kernel
    starting version 219
    Failed to apply ACL on /dev/kvm: Operation not supported
    Welcome to Arista Networks EOS 4.23.0F
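
For reference, the boot-config change made with vi in step 6 can also be made non-interactively. The sketch below creates a stand-in boot-config file so it can be tried off-switch; on the switch, the file lives in /mnt/flash and the SWI path must match the EOS image actually present there.

```shell
# Stand-in boot-config with the chainloader entry shown in step 5.
printf 'SWI=flash:aboot-chainloader.swi\n' > boot-config

# Repoint the SWI line at the EOS image, as the vi edit in step 6 does.
sed -i 's|^SWI=.*|SWI=flash:/EOS-4.23.3M.swi|' boot-config
cat boot-config
```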

Reverting from DMF (EOS) to EOS - 7280R

This appendix details the procedure to revert the Arista 7280R Series switch from DMF EOS to UCN EOS.

Procedure

To revert from DMF (EOS) to UCN EOS, complete the following steps:
  1. Log in to the switch via a serial connection and reboot the switch.
    Connected to 10.240.130.2.
    Escape character is '^]'.
    SAND-3 login: admin
    Output to this terminal is being recorded for diagnostic purposes.
    Note that only output that is visible on the console is recorded.
    SAND-3>en
    SAND-3#reload
    ! Signing certificate used to sign SWI is not signed by root certificate.
    Proceed with reload? [confirm]
  2. Interrupt the boot process by pressing Control-C to enter Aboot.
    agesawrapper_amdinitearly() returned AGESA_SUCCESS
    Watchdog enabled, will fire in 2 mins
    CBFS: 'Master Header Locator' located CBFS at [200:ffffc0)
    CBFS: Locating 'normal/romstage'
    CBFS: Found @ offset 5b3d40 size 7b7c
    Aboot 9.0.3-4core-14223577
    Press Control-C now to enter Aboot shell
    ^CWelcome to Aboot.
    Aboot#
  3. Once in Aboot, change the directory to /mnt/flash.
    Press Control-C now to enter Aboot shell
    ^CWelcome to Aboot.
    Aboot#
    Aboot# cd /mnt/flash
    Aboot# pwd
    /mnt/flash
    Aboot#
  4. List the files under the /mnt/flash directory and check the boot-config file for the current SWI.
    Aboot# pwd
    /mnt/flash
    Aboot# ls
    AsuFastPktTransmit.log
    EOS-4.25.2F.swi
    EOS-4.27.2F-26021868.uppsaladmfrel-x86_64.swi
    EOS-4.27.2F-26021868.uppsaladmfrel-x86_64.swi.tmp
    Fossil
    SsuRestore.log
    SsuRestoreLegacy.log
    boot-config
    debug
    fastpkttx.backup
    x86_64
    lost+found
    persist
    schedule
    startup-config
    zerotouch-config
    ztn-boot-info
    Aboot#
    Aboot# cat boot-config
    SWI=flash:/EOS-4.27.2F-26021868.uppsaladmfrel-x86_64.swi
    Aboot#
  5. Remove the startup-config file and edit the boot-config file to point to the existing EOS SWI under the /mnt/flash directory.
    Aboot# rm -rf startup-config
    Aboot# vi boot-config
    Aboot#
    SWI=flash:/EOS-4.27.2F-26021868.uppsaladmfrel-x86_64.swi
    ~
    SWI=flash:/EOS-4.25.2F.swi
    ~
    Aboot# cat boot-config
    SWI=flash:/EOS-4.25.2F.swi
  6. Reboot the switch. The switch should boot up with the EOS image.
    Aboot# reboot
    Requesting system reboot
    Press Control-C now to enter Aboot shell
    Booting flash:/EOS-4.25.2F.swi
    [ 13.231125] Starting new kernel
    starting version 219
    Failed to apply ACL on /dev/kvm: Operation not supported
    Welcome to Arista Networks EOS 4.25.2F
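
The file changes in step 5 can likewise be scripted. The sketch below uses stand-in files with illustrative contents so the steps can be tried safely; on the switch, both files live in /mnt/flash.

```shell
# Stand-ins for the files under /mnt/flash.
printf 'hostname SAND-3\n' > startup-config
printf 'SWI=flash:/EOS-4.27.2F-26021868.uppsaladmfrel-x86_64.swi\n' > boot-config

# Step 5: remove the DMF startup configuration ...
rm -f startup-config
# ... and repoint boot-config at the pre-DMF EOS image.
sed -i 's|^SWI=.*|SWI=flash:/EOS-4.25.2F.swi|' boot-config
cat boot-config
```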

Removing the Existing OS from a Switch

Some switch platforms ship with a preexisting operating system (OS) installed. Installing Switch Light OS on top of an existing OS can fail, so first uninstall any existing OS on the switch.

For example, on a Dell switch with Force 10 OS (FTOS) preinstalled, remove FTOS before installing Switch Light OS.

If only ONIE options are listed in the switch's GNU GRUB boot menu when you boot the switch, no OS is installed. The following example shows the boot menu for a switch without an OS installed.

Example 4: ONIE Prompts for a switch without an OS installed.
GNU GRUB version 2.02~beta2+e4a1fe391
+----------------------------------------------------------------------------+
|*ONIE: Install OS |
| ONIE: Rescue |
| ONIE: Uninstall OS |
| ONIE: Update ONIE |
| ONIE: Embed ONIE |
| |
+----------------------------------------------------------------------------+

If the boot menu resembles this example, skip this section and proceed directly to the following section to install Switch Light OS.

Procedure

To delete FTOS from a Dell switch, complete the following steps:

  1. Confirm an OS is installed on the switch.
  2. Another OS exists if options other than ONIE appear in the boot menu. The following example shows the options provided by FTOS installed on a Dell switch.
    GNU GRUB version 2.02~beta2+e4a1fe391
    +----------------------------------------------------------------------------+
    |*FTOS |
    | FTOS-Boot Line Interface |
    | DIAG-OS |
    | ONIE |
    | |
    +----------------------------------------------------------------------------+
  3. After the switch has booted and the prompt for FTOS is displayed, change to enable mode.
    DellEMC>enable
    The SupportAssist EULA acceptance option has not been selected. SupportAssist can be
    enabled once the SupportAssist EULA has been accepted. Use the: 'support-assist activate
    ' command to accept EULA and enable SupportAssist.
    DellEMC#Feb 13 22:36:44 %STKUNIT1-M:CP %SEC-4-ENABLE_PASSW_NOT_CONFIGURED: Enable password
    is required for authentication but
    not configured (by default from console)
    Feb 13 22:36:44 %STKUNIT1-M:CP %SEC-5-AUTHENTICATION_ENABLE_SUCCESS: Enable authentication
    success on console
    DellEMC#
  4. Reload the switch, do not save the configuration, and confirm the operation when prompted.
    DellEMC# reload
    System configuration has been modified. Save? [yes/no]: no
    Proceed with reload [confirm yes/no]: yes
    The following messages are displayed.
    Feb 13 22:37:17 %STKUNIT1-M:CP %CHMGR-5-RELOAD: User request to reload the chassis syncing
    disks... done
    unmounting file systems...
  5. To uninstall the OS, choose ONIE from the GNU GRUB boot menu, as shown in the following example.
    GNU GRUB version 2.02~beta2+e4a1fe391
    +----------------------------------------------------------------------------+
    | FTOS |
    | FTOS-Boot Line Interface |
    | DIAG-OS |
    |*ONIE |
    | |
    +----------------------------------------------------------------------------+
    The ONIE submenu is displayed, as shown in the following example.
    GNU GRUB version 2.02~beta2+e4a1fe391
    +----------------------------------------------------------------------------+
    | ONIE: Install OS |
    | ONIE: Rescue |
    |*ONIE: Uninstall OS |
    | ONIE: Update ONIE |
    | ONIE: Embed ONIE |
    | EDA-DIAG |
    | |
    +----------------------------------------------------------------------------+
  6. From the ONIE submenu, choose ONIE: Uninstall OS.

    The uninstall process can take up to 15 minutes. After completion, the switch will automatically reboot.

    The OS Uninstall log is displayed, starting with information about the existing OS, as shown in the following example.
    ONIE: OS Uninstall Mode ...
    Version : 3.27.1.1
    Build Date: 2016-09-07T16:44-0700
    Info: Mounting kernel filesystems... done.
    Info: Mounting ONIE-BOOT on /mnt/onie-boot ...
    <SNIP>
    The following messages are displayed when the process is complete and the switch reboots.
    Requesting system reboot
    sd 4:0:0:0: [sda] Synchronizing SCSI cache
    reboot: Restarting system
    reboot: machine restart
Note: FTOS was successfully uninstalled if only the ONIE options are shown after the switch reboots, as in the example at the beginning of this section. When only ONIE options appear, proceed to the next section to install Switch Light OS.

Switch ONIE Upgrade Procedure

For detailed instructions on updating ONIE (Open Network Install Environment) on the Dell switches, please refer to the Firmware Upgrade for Dell Switches document at https://www.arista.com/assets/data/pdf/Dell-Switches-Firmware-Upgrade-Manual.pdf.

ONIE is a small operating system typically pre-installed as firmware on bare metal network switches. It provides an environment for automated provisioning.