References
Related Documents
The following documentation is available for DANZ Monitoring Fabric:
Prometheus is an open source monitoring and alerting toolkit. It collects and stores metrics from different sources in a time-series database. Prometheus offers a powerful query language that allows users to analyze and visualize the collected data in real time. With its robust alerting system, Prometheus can also notify users of potential issues, helping with their timely resolution.
Starting with the DMF 8.5.0 release, a Prometheus server can scrape metrics from a DMF (DANZ Monitoring Fabric) deployment. The DMF Controller exposes interface counters, CPU usage, memory usage, sensor states, and disk usage statistics from all the devices it manages, including the Controllers, through a single Prometheus endpoint.
In a DMF deployment with an active/standby Controller cluster, each Controller collects metrics from all the devices it manages as well as from both Controller nodes. It then exposes them via the telemetry endpoint /api/v1/metrics/prometheus. This is an authenticated endpoint that listens on port 8443 and supports both the Prometheus and OpenMetrics exposition formats.
Even though each Controller is capable of serving the telemetry information, it is recommended to use the cluster's virtual IP (VIP) in the Prometheus configuration to achieve seamless continuity in the event of a Controller failover.
No additional configuration is necessary on the DMF Controller to enable metric collection. However, to allow the Prometheus service to access the new HTTP telemetry endpoint, a user access-token needs to be created. An admin user can choose an existing user or create a dedicated user with the correct privileges and generate a token so that Prometheus can fetch metrics from the fabric. The following sections describe the necessary configuration.
The group that the user belongs to needs to have sufficient permission to query the Prometheus endpoint. The following table summarizes the behavior.
Group | Behavior |
---|---|
admin | An access-token generated for a user in the admin group will have access to the telemetry endpoint. |
read-only | An access-token for a user in the read-only group will not have access to the telemetry endpoint. |
Any custom group | An access-token for a user in a group with TELEMETRY or DEFAULT permission, but not with DEFAULT/SENSITIVE permission, will have access. |
To set telemetry permission for a custom group, use the following commands:
dmf-controller(config)# group group_name
dmf-controller(config-group)# permission category:TELEMETRY privilege read-only
dmf-controller(config-group)# associate user username
Generate an access token for the user using the following command.
dmf-controller(config)# user username
dmf-controller(config-user)# access-token <descriptive name for the access-token>
access-token : ZxyHXL0QyOhDUogT8wjZj7ouSiVtWNB3
The following configuration needs to be added to the Prometheus server to periodically fetch metrics from the Controller's /api/v1/metrics/prometheus endpoint.
scrape_configs:
  - job_name: <job_name>
    scheme: https
    authorization:
      type: Bearer
      credentials: <access-token>
    metrics_path: /api/v1/metrics/prometheus
    scrape_interval: <interval>
    static_configs:
      - targets:
          - <vip>:8443
    tls_config:
      insecure_skip_verify: true
The table below depicts the recommended configurations for this feature.
Configuration | Value |
---|---|
Credential | The corresponding access token created on the Controller |
Scrape Interval | The minimum supported interval is 10s |
Target | Use the VIP of the DMF Controller cluster |
TLS | If a self-signed certificate is used on the Controller, add insecure_skip_verify: true |
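As a quick sanity check, the scrape configuration can be validated before starting or reloading Prometheus. The following is a minimal sketch that assumes the configuration above was saved to a file named prometheus.yml on the Prometheus server:

# Validate the Prometheus configuration file syntax (prometheus.yml is an assumed file name).
promtool check config prometheus.yml

After a successful check, restart or reload Prometheus according to how it is deployed in your environment.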
Please refer to the configuration guidelines of the specific Prometheus version you are using in production.
The interface metrics are reported using the switch interface name (e.g., et1) rather than the user-configured name associated with the role of an interface (e.g., filter1).

The name of a managed device is used in the device_name label for all the metrics corresponding to it. In the case of a Controller, the configured hostname is used in the device_name label. Thus, these names are expected to be unique in a specific DMF deployment.

Prometheus adds a few metrics of its own for every scrape, such as scrape_duration_seconds, scrape_samples_post_metric_relabeling, scrape_samples_scraped, and scrape_series_added. These metrics do not collect any data from the DMF fabric and can be ignored.

The status of a component is reported as a StateSet metric. Each possible state is reported as a separate metric. The current state is reported with value 1 and the other states with value 0. The state itself is reported as a label with the same name as that of the metric. For example, device_psu_oper_status displays multiple metrics for the operational status of a PSU (Power Supply Unit). If a PSU psu1 is in the failed state, the metric device_psu_oper_status{name="psu1", device_psu_oper_status="failed"} reports value 1. At the same time, the metric device_psu_oper_status{name="psu1", device_psu_oper_status="ok"} appears with value 0 for the other state ok.
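For illustration only (not part of the DMF configuration), a PromQL query like the following can be run against the Prometheus server that scrapes the fabric to list the PSUs currently reporting the failed state; the Prometheus host name and port are placeholders:

# Instant query via the Prometheus HTTP API; prometheus.example.com:9090 is a placeholder.
curl -sG 'http://prometheus.example.com:9090/api/v1/query' \
  --data-urlencode 'query=device_psu_oper_status{device_psu_oper_status="failed"} == 1'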
If no metric is collected or no change in the metrics is visible in Prometheus for a few minutes, follow these troubleshooting steps:

Verify that the telemetry endpoint is reachable and the access token is valid by querying the endpoint directly:

curl -k -H "Authorization: Bearer <token>" https://<vip>:8443/api/v1/metrics/prometheus

Check the state of the telemetry connections on the Controller using the show telemetry connection command. Check the Telemetry Collector section of the DANZ Monitoring Fabric User Guide for more details.

The following table shows the details of each metric exposed by the DMF fabric. The supported device type columns (Ctrl, SWL, EOS, SN, RN) show whether a metric is generally supported on that device type. However, some specific platforms or hardware might not report a specific metric.
Metric | Description | Ctrl | SWL | EOS | SN | RN |
---|---|---|---|---|---|---|
device_cgroup_cpu_percentage | The normalized CPU utilization percentage by a control group | Y | Y | N | Y | Y |
device_cgroup_memory_bytes | The memory used by a control group in bytes | Y | Y | N | Y | Y |
device_config_info | The informational metrics of a managed device | N | Y | Y | Y | Y |
device_cpu_utilization_percentage | The percentage utilization of a CPU on a device of the fabric | Y | Y | Y | Y | Y |
device_fan_oper_status | The current operational status of a fan | Y | Y | Y | Y | Y |
device_fan_rpm | The current rate of rotation of a fan | Y | Y | Y | Y | Y |
device_fan_speed_percentage | The percentage of the max rotation capacity that a fan is currently rotating | N | Y | N | N | N |
device_interface_in_broadcast_packets_total | The total number of broadcast packets received on the interface of a device | Y | Y | Y | Y | Y |
device_interface_in_discards_packets_total | The number of discarded inbound packets by the interface of a device even though no errors had been detected | Y | Y | Y | Y | Y |
device_interface_in_errors_packets_total | The number of inbound packets discarded at the interface of a device for errors | Y | Y | Y | Y | Y |
device_interface_in_fcs_errors_packets_total | The number of received packets on the interface of a device with erroneous frame check sequence (FCS) | Y | Y | Y | Y | Y |
device_interface_in_multicast_packets_total | The total number of multicast packets received on the interface of a device | Y | Y | Y | Y | Y |
device_interface_in_octets_total | The total number of octets received on the interface of a device | Y | Y | Y | Y | Y |
device_interface_in_packets_total | The total number of packets received on the interface of a device | Y | Y | Y | Y | Y |
device_interface_in_unicast_packets_total | The total number of unicast packets received on the interface of a device | Y | Y | Y | Y | Y |
device_interface_operational_status | The operational state of an interface of a device | Y | Y | Y | Y | Y |
device_interface_out_broadcast_packets_total | The total number of broadcast packets transmitted from the interface of a device | Y | Y | Y | Y | Y |
device_interface_out_discards_packets_total | The number of discarded outbound packets at the interface of a device even though no errors had been detected. | Y | Y | Y | Y | Y |
device_interface_out_errors_packets_total | The number of outbound packets that could not be transmitted from the interface of a device because of errors | Y | Y | Y | Y | Y |
device_interface_out_multicast_packets_total | The total number of multicast packets transmitted from the interface of a device | Y | Y | Y | Y | Y |
device_interface_out_octets_total | The total number of octets transmitted from the interface of a device | Y | Y | Y | Y | Y |
device_interface_out_packets_total | The total number of packets transmitted from the interface of a device | Y | Y | Y | Y | Y |
device_interface_out_unicast_packets_total | The total number of unicast packets transmitted from the interface of a device | Y | Y | Y | Y | Y |
device_memory_available_bytes | The current available memory at a device of the fabric | Y | Y | Y | Y | Y |
device_memory_total_bytes | The total memory of this device | Y | Y | N | Y | Y |
device_memory_utilized_bytes | The current memory utilization of a device of the fabric | Y | Y | Y | Y | Y |
device_mount_point_space_available_megabytes | The amount of unused space on the filesystem | Y | Y | N | Y | Y |
device_mount_point_space_usage_percentage | The used space in percentage | Y | Y | N | Y | Y |
device_mount_point_space_utilized_megabytes | The amount of space currently in use on the filesystem | Y | Y | N | Y | Y |
device_mount_point_total_space_megabytes | The total size of the initialized filesystem. | Y | Y | N | Y | Y |
device_psu_capacity_watts | The maximum power capacity of a power supply | N | N | Y | N | N |
device_psu_input_current_amps | The input current drawn by a power supply | Y | Y | Y | Y | Y |
device_psu_input_power_watts | The input power to a power supply | Y | Y | N | Y | Y |
device_psu_input_voltage_volts | The input voltage to a power supply | Y | Y | Y | Y | Y |
device_psu_oper_status | The current operational status of a power supply unit | Y | Y | Y | Y | Y |
device_psu_output_current_amps | The output current supplied by a power supply | N | Y | Y | N | N |
device_psu_output_power_watts | The output power of a power supply | N | Y | Y | N | N |
device_psu_output_voltage_volts | The output voltage supplied by a power supply | N | Y | Y | N | N |
device_thermal_oper_status | The current operational status of a thermal sensor | Y | Y | N | Y | Y |
device_thermal_temperature_celsius | The current temperature reported by a thermal sensor | Y | Y | Y | Y | Y |
* Ctrl = Controller, SWL = A switch running Switch Light, EOS = A switch running Arista EOS, SN = Service Node, RN = Recorder Node.
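As an illustration of how the counter metrics above are typically consumed (an example query, not a DMF requirement), the inbound bit rate of fabric interfaces can be derived from device_interface_in_octets_total with a PromQL rate expression; the Prometheus host name and port are placeholders:

# Average inbound bits per second per interface over the last 5 minutes.
curl -sG 'http://prometheus.example.com:9090/api/v1/query' \
  --data-urlencode 'query=rate(device_interface_in_octets_total[5m]) * 8'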
The DANZ Monitoring Fabric (DMF) Controller in Azure feature supports the operation of the Arista Networks DMF Controller on the Microsoft Azure platform and uses the Azure CLI or the Azure portal to launch the Virtual Machine (VM) running the DMF Controller.
The DMF Controller in Azure feature enables the registration of VM deployments in Azure and supports auto-firstboot using Azure userData or customData.
Configure Azure VMs auto-firstboot using customData or userData. There is no data merging from these sources, so provide the data via customData or userData, but not both.

Arista Networks recommends using customData as it provides a better security posture, because it is available only during VM provisioning and requires sudo access to mount the virtual CDROM.

userData is less secure because it is available via the Instance Metadata Service (IMDS) after provisioning and can be queried from the VM without any authorization restrictions.
If sshKey is configured for the admin account during Azure VM provisioning along with the auto-firstboot parameters, it is also configured for the admin user of the DMF Controllers.

The following table lists the details of the first boot parameters for the auto-firstboot configuration.
Firstboot Parameters - Required Parameters
Key | Description | Valid Values |
---|---|---|
admin_password | This is the password to set for the admin user. When joining an existing cluster, this must be the admin password of the existing cluster node. | string |
recovery_password | This is the password to set for the recovery user. | string |
Additional Parameters
Key | Description | Required | Valid Values | Default Value |
---|---|---|---|---|
hostname | This is the hostname to set for the appliance. | no | string | configured from Azure Instance Metadata Service |
cluster_name | This is the name to set for the cluster. | no | string | Azure-DMF-Cluster |
cluster_to_join | This is the IP which firstboot will use to join an existing cluster. Omitting this parameter implies that firstboot will create a new cluster. Note: If this parameter is present, ntp_servers, cluster_name, and cluster_description will be ignored. The existing cluster node will provide these values after joining. | no | IP Address String |
cluster_description | This is the description to set for the cluster. | no | string |
Networking Parameters
Key | Description | Required | Valid Values | Default Value |
---|---|---|---|---|
ip_stack | What IP protocols should be set up for the appliance management NIC. | no | enum: ipv4, ipv6, dual-stack | ipv4 |
ipv4_method | How to setup IPv4 for the appliance management NIC. | no | enum: auto, manual | auto |
ipv4_address | The static IPv4 address used for the appliance management NIC. | only if ipv4-method is set to manual | IPv4 Address String | |
ipv4_prefix_length | The prefix length for the IPv4 address subnet to use for the appliance management NIC. | only if ipv4-method is set to manual | 0..32 | |
ipv4_gateway | The static IPv4 gateway to use for the appliance management NIC. | no | IPv4 Address String | |
ipv6_method | How to set up IPv6 for the appliance management NIC. | no | enum: auto, manual | auto |
ipv6_address | The static IPv6 address to use for the appliance management NIC. | only if ipv6-method is set to manual | IPv6 Address String | |
ipv6_prefix_length | The prefix length for the IPv6 address subnet to use for the appliance management NIC. | only if ipv6-method is set to manual | 0..128 | |
ipv6_gateway | The static IPv6 gateway to use for the appliance management NIC. | no | IPv6 Address String | |
dns_servers | The DNS servers for the cluster to use | no | List of IP address strings | |
dns_search_domains | The DNS search domains for the cluster to use. | no | List of the host names or FQDN strings | |
ntp_servers | The NTP servers for the cluster to use. | no | List of the host names or FQDN strings |
The following example shows the minimum required parameters:
{
"admin_password": "admin_user_password",
"recovery_password": "recovery_user_password"
}
Full List of Parameters
{
"admin-password": "admin_user_password",
"recovery_password": "recovery_user_password",
"hostname": "hostname",
"cluster_name": "cluster name",
"cluster_description": "cluster description",
"ip_stack": "dual-stack",
"ipv4_method": "manual",
"ipv4_address": "10.0.0.3",
"ipv4_prefix-length": "24",
"ipv4_gateway": "10.0.0.1",
"ipv6_method": "manual",
"ipv6_address": "be:ee::1",
"ipv6_prefix-length": "64",
"ipv6_gateway": "be:ee::100",
"dns_servers": [
"10.0.0.101",
"10.0.0.102"
],
"dns_search_domains": [
"dns-search1.com",
"dns-search2.com"
],
"ntp_servers": [
"1.ntp.server.com",
"2.ntp.server.com"
]
}
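As a hedged sketch, the JSON above could be supplied as customData when launching the VM with the Azure CLI; the resource group, VM name, image reference, and file name below are placeholders, and a real deployment typically needs additional networking and sizing options:

# Create the Controller VM and pass the firstboot parameters file as custom data.
# All names below are placeholders; adjust to your deployment.
az vm create \
  --resource-group dmf-rg \
  --name dmf-controller-0-vm \
  --image <dmf-controller-image-id> \
  --custom-data ./firstboot.json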
The following limitations apply to the DANZ Monitoring Fabric (DMF) Controller in Microsoft Azure.
There is no support for any features specific to Azure-optimized Ubuntu Linux, including Accelerated Networking.
The DMF Controllers in Azure are only supported on Gen-1 VMs.
The DMF Controllers in Azure do not support adding the virtual IP address for the cluster.
There is no support for capture interfaces in Azure.
DMF ignores the Azure username and password fields.
There is no support for static IP address assignment that differs from what is configured on the Azure NIC.
The DMF Controllers are rebooted if the static IP on the NIC is updated.
Please refer to the following resources for more information.
Azure user data details: https://learn.microsoft.com/en-us/azure/virtual-machines/user-data
Azure custom data details: https://learn.microsoft.com/en-us/azure/virtual-machines/custom-data
Azure Gen1 vs Gen2 VMs: https://learn.microsoft.com/en-us/azure/virtual-machines/generation-2
Azure optimized Ubuntu Linux features: https://ubuntu.com/blog/microsoft-and-canonical-increase-velocity-with-azure-tailored-kernel
Azure NIC assignment behavior: https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/reset-network-interface-azure-linux-vm
Azure DMF Controller VMs can be accessed via an ssh login.

For the Controllers to be in a healthy state, systemctl should report the running state without any failed units, as shown in the following example:
dmf-controller-0-vm> debug bash
admin@dmf-controller-0-vm:~$ sudo systemctl status
dmf-controller-0-vm
State: running
Jobs: 0 queued
Failed: 0 units
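If the state is reported as degraded instead of running, the failed units can be listed with a standard systemd command (generic Linux tooling, not a DMF-specific command):

admin@dmf-controller-0-vm:~$ sudo systemctl --failed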
There are three possible failure modes:

The VM fails Azure registration.
auto-firstboot fails due to a transient error or a bug.
auto-firstboot parameter validation fails.

Debug these failures using the firstboot logs after manually booting the VM and logging in via ssh. Firstboot logs can be accessed using debug bash as shown below:
dmf-controller-0-vm> debug bash
admin@dmf-controller-0-vm:~$ less /var/log/floodlight/firstboot/firstboot.log
For auto-firstboot parameter validation failures, the validation results can be viewed in the same way:

dmf-controller-0-vm> debug bash
admin@dmf-controller-0-vm:~$ less /var/lib/floodlight/firstboot/validation-results.json
Removing the default IPv4 or IPv6 permit entry before adding a specific permit rule in the sync access list will permanently break communication between the active and standby Controllers. Follow the procedure outlined in this appendix to recover from a split Controller HA cluster.
Check the cluster status using the show controller details command in the CLI.
DMF-CTL2(config-controller-access-list)# show controller details
Cluster Name : DMF-7050
Cluster UID : a5de38214971de42aa7b51b96ac7345f4f228b20
Cluster Virtual IP : 10.240.130.18
Redundancy Status : redundant
Redundancy Description : Cluster is Redundant
Last Role Change Time : 2022-11-05 00:56:04.862000 UTC
Cluster Uptime : 2 months, 1 week
# IP            Hostname @ Node Id Domain Id State   Status    Uptime
-|-------------|--------|-|-------|---------|-------|---------|---------------|
1 192.168.39.44 DMF-CTL2 * 16181             active  connected 2 weeks, 2 days
2 192.168.55.11 DMF-CTL1   23955   1         standby connected 2 weeks, 2 days
~~~~~~~~~~~~~~~~~~~~~~~~~~~ Failover History ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# New Active Time completed                  Node  Reason                Description
-|----------|------------------------------|-----|---------------------|------------------|
1 22049      2022-11-05 00:55:35.994000 UTC 22049 cluster-config-change Changed connection state: cluster configuration changed
Procedure

Overwrite data to securely erase the data stored on the DMF appliance. This section describes how to do this using the Dell LifeCycle Controller. There is an erase API for the DMF Recorder Node, but it does not securely remove the data from its disk. Instead, it unlinks files in the Index and Packet partitions so the file system can reclaim the space for future packets and indices. This approach was a design decision because unlinking files is faster than overwriting data. To securely erase the data and prevent anyone from accessing it, use the following procedure.
This appendix describes how to install and configure a VM for the DANZ Monitoring Fabric Controller.
VMware vMotion is supported for Controller pairs. This support is offered only for vCenter 7.0.2.
The Controller VM image is provided in qcow2 format (.qcow2 file).

This appendix provides details on how to create a bootable USB with Switch Light OS.
Procedure
Procedure
To build a USB boot image using Windows, complete the following steps.
Procedure
This appendix provides information on how to uninstall the OS from a switch.
Procedure
To revert from SWLOS to EOS, complete the following steps:
Procedure
To revert from DMF (EOS) to UCN EOS, complete the following steps:

Some switch platforms may have a preexisting operating system (OS) installed. When installing Switch Light OS on top of an existing OS, there is a chance of failure. To avoid this issue, first uninstall any existing OS on the switch.
For example, to use a Dell switch with Force 10 OS (FTOS) pre-installed, remove FTOS before installing Switch Light OS. If FTOS is not removed first, the Switch Light OS installation may fail.
When you boot the switch, if only ONIE options are listed in the switch GNU GRUB boot menu, the switch does not have an existing OS installed. The following example shows the prompts that indicate no OS is installed on the switch.
GNU GRUB version 2.02~beta2+e4a1fe391
+----------------------------------------------------------------------------+
|*ONIE: Install OS |
| ONIE: Rescue |
| ONIE: Uninstall OS |
| ONIE: Update ONIE |
| ONIE: Embed ONIE |
| |
+----------------------------------------------------------------------------+
If the switch boot menu looks like this, skip this section and proceed directly to the following section to install Switch Light OS.
Procedure
To delete FTOS from a Dell switch, complete the following steps:
For detailed instructions on updating ONIE (Open Network Install Environment) on the Dell switches, please refer to the Firmware Upgrade for Dell Switches document at https://www.arista.com/assets/data/pdf/Dell-Switches-Firmware-Upgrade-Manual.pdf.
ONIE is a small operating system typically pre-installed as firmware on bare metal network switches. It provides an environment for automated provisioning.
For detailed instructions on upgrading the CPLD on the Dell switches, please refer to the Firmware Upgrade for Dell Switches document at https://www.arista.com/assets/data/pdf/Dell-Switches-Firmware-Upgrade-Manual.pdf.