Deploying CVP OVA on ESX
Deploying the CVP OVA file should be the first step in any setup. After the CVP OVA file is deployed, you can choose between the two configuration methods for CloudVision Portal (CVP).
Pre-requisites:
Using Deploy OVF Template requires the VMware Client Integration plugin, which is not supported by the Chrome browser after version 42.
VMware vMotion and Snapshot Support
CloudVision relies on the following open source components:
- Hadoop - open source framework from Apache that is used to store and process large datasets distributed across a cluster of servers
- HBase - open source database from Apache that runs on a Hadoop cluster
- ZooKeeper - centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
The CloudVision database (HBase/Hadoop) is deployed across the three nodes within the CloudVision cluster. The integrity of this database is critical to the correct functioning of CloudVision, so there are specific hypervisor and storage requirements for the virtual machines that form these nodes.
VMware Snapshots
Within the CloudVision infrastructure, all nodes constantly write data to Apache Hadoop. VMware disk snapshots have no hooks into the HBase quiesce states, so a snapshot of the disk state is almost always inconsistent and leads to database corruption during a restore. As a result, a snapshot has no meaningful use as a restore point, which is typical for database applications that use VMware Snapshots (VMware reference).
VMware Snapshots are very I/O intensive and leave almost no I/O for the virtual machines while a snapshot is in progress. The resulting impact on resources such as disk can lead to HBase and ZooKeeper failures. The support team has identified multiple cases in which snapshots were in progress before such failures occurred.
VMware does not recommend using VM Snapshots as backups (https://kb.vmware.com/s/article/1025279). Arista therefore recommends the other backup mechanisms outlined below.
VMware vMotion
The following conditions apply when using vMotion with CloudVision nodes:
- The virtual machine disks are shared between the source and target ESXi host
- Latency between ESXi hosts is less than 5ms
- Only one CloudVision node may be vMotioned at a time
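The three conditions listed above can be turned into a simple pre-migration check. The sketch below is illustrative only: the `VMotionRequest` type, its field names, and the `vmotion_blockers` function are hypothetical and are not part of any VMware or Arista API. It assumes the operator has already determined whether the disks are shared and has measured the latency between the ESXi hosts.

```python
from dataclasses import dataclass

MAX_LATENCY_MS = 5.0  # from the conditions above: latency must be less than 5 ms


@dataclass
class VMotionRequest:
    """Hypothetical description of a planned vMotion of one CloudVision node."""
    node: str
    disks_shared: bool       # disks are shared between source and target ESXi host
    host_latency_ms: float   # measured latency between the two ESXi hosts


def vmotion_blockers(request, migrations_in_progress):
    """Return the reasons a migration should not start (empty list if none)."""
    blockers = []
    if not request.disks_shared:
        blockers.append("virtual machine disks are not shared between the hosts")
    if request.host_latency_ms >= MAX_LATENCY_MS:
        blockers.append(
            f"latency of {request.host_latency_ms} ms is not below {MAX_LATENCY_MS} ms"
        )
    if migrations_in_progress:
        blockers.append("another CloudVision node is already being migrated")
    return blockers
```

A migration would proceed only when `vmotion_blockers(request, in_progress)` returns an empty list, where `in_progress` is the list of CloudVision nodes currently being moved.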
Backup Solutions for CloudVision
Daily backups of the CloudVision provisioning data are automatically scheduled for 2 AM UTC. The backup file is stored locally on the CloudVision cluster. A common customer practice is to schedule a copy of this backup file from the CloudVision cluster to an external data store.
An example script that helps automate copying the backup file is available on the Arista GitHub site (link).
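For illustration, the copy step can be sketched as follows. This is not the Arista script mentioned above; it is a minimal sketch that assumes the backup directory and the external data store are both reachable as local or mounted paths. The function name `copy_latest_backup` and both parameters are hypothetical, and the actual backup location on the CloudVision cluster must be supplied by the operator.

```python
import shutil
from pathlib import Path


def copy_latest_backup(backup_dir, dest_dir):
    """Copy the most recently modified file in backup_dir to dest_dir.

    Both arguments are caller-supplied paths (assumptions of this sketch).
    Returns the destination path, or None if backup_dir contains no files.
    """
    files = [p for p in Path(backup_dir).iterdir() if p.is_file()]
    if not files:
        return None

    # Pick the newest file by modification time.
    latest = max(files, key=lambda p: p.stat().st_mtime)

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / latest.name

    # copy2 preserves timestamps, which helps when comparing backups later.
    shutil.copy2(latest, target)
    return target
```

Such a function could be run from a scheduler shortly after the 2 AM UTC backup completes, with `dest_dir` pointing at a mounted external data store.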
CloudVision telemetry data received from switches is replicated between the nodes of the CloudVision cluster. If a single node becomes unavailable and a new node is added to the cluster, this telemetry data is replicated to the new node.
Arista EOS with the Streaming Telemetry agent (TerminAttr v1.7.1 and later) supports establishing connections to multiple CloudVision clusters. This enables the user to send the telemetry data to a backup CloudVision instance, to maintain an up-to-date redundant store.
A detailed explanation of this deployment model, which can assist with the design and deployment of this HA solution, is available on the Arista EOS Central site (https://arista.my.site.com/AristaCommunity/s/article/cvp-ha-deployment-guide).