BGP PIC Edge for EVPN VXLAN Routes for Remote VTEP Failures

When a remote VTEP goes down, the IGP and BGP must recompute a new best path for traffic destined to the affected BGP prefixes that were originally reachable through the failed VTEP. Currently, BGP PIC is restricted to locally identifiable failures such as link failures.

To address remote VTEP failures, support for EVPN-learned VTEPs ties the liveness detection provided by BFD sessions into the existing BGP PIC support for software fast-failover. When a BFD session to a remote VTEP goes down, the hardware forwarding agents update the affected adjacencies before the corresponding underlay route has been removed from the FIB, which can improve convergence times.

Figure 1. BGP PIC Edge for VXLAN

The diagram above outlines a scenario in which CE-2 sends traffic bound for 10.10.10.0/24 via PE-21 to PE-11 and on to CE-1. When PE-11 goes down, the BFD sessions from PE-21 to the remote VTEPs of PE-11 and PE-12 allow PE-21 to detect the failure and quickly update forwarding to send the traffic along the pre-computed backup path via PE-12 to CE-1.

Configuring BGP PIC Edge for EVPN VXLAN Routes for Remote VTEP Failures

Use the bfd vtep evpn command to configure BGP PIC Edge for EVPN VXLAN routes for remote VTEPs. This command is configured under the VXLAN Tunnel Interface (VTI).
switch# config
switch(config)# interface vxlan1
switch(config-if-Vx1)# bfd vtep evpn interval <interval> min-rx <min-rx> multiplier <multiplier>

This configuration uses the specified timer values to initiate BFD sessions for all VTEPs learned through EVPN VXLAN for this VTI.

  • interval – Transmit rate in milliseconds
  • min-rx – Expected minimum incoming rate in milliseconds
  • multiplier – BFD multiplier

Example

switch(config-if-Vx1)# bfd vtep evpn interval 100 min-rx 100 multiplier 3

In this example (assuming symmetric configuration on the other PE devices), any BFD for VXLAN session initiated on the VTI has a detect time of 300 ms (the 100 ms interval multiplied by 3).
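
The detect-time arithmetic can be sketched in Python. This is an illustrative calculation only, assuming the asynchronous-mode rule from RFC 5880 (the remote multiplier times the larger of the local min-rx and the remote transmit interval). The function name and its parameters are hypothetical and are not part of the switch software.

```python
def bfd_detect_time_ms(local_min_rx_ms, remote_tx_ms, remote_multiplier):
    """Approximate BFD detection time in milliseconds (asynchronous mode).

    Packets arrive at the slower of what the local side accepts (min-rx)
    and what the remote side wants to send (interval). The session is
    declared down after `remote_multiplier` such periods with no packet.
    """
    if min(local_min_rx_ms, remote_tx_ms, remote_multiplier) <= 0:
        raise ValueError("intervals and multiplier must be positive")
    return remote_multiplier * max(local_min_rx_ms, remote_tx_ms)


# Symmetric configuration from the example: interval 100, min-rx 100, multiplier 3.
print(bfd_detect_time_ms(100, 100, 3))  # 300
```

With asymmetric timers the slower side dominates: under the same assumption, a peer transmitting every 300 ms with multiplier 3 gives a 900 ms detect time even if the local min-rx is 100 ms.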

To benefit from these BFD sessions, traffic must have an alternate path in the event that a session goes down. This can be another path in an ECMP group or a backup path.

As mentioned, by default the above configuration initiates BFD sessions for all VTEPs learned through EVPN VXLAN for the VTI. To restrict the sessions to a subset of the learned VTEPs, supply a prefix list.
switch# config
switch(config)# interface vxlan1
switch(config-if-Vx1)# bfd vtep evpn prefix-list <PREFIX-LIST>

This command uses the supplied prefix list to filter and select the candidate VTEPs. An empty prefix list acts as a deny-all, so no BFD sessions are initiated with any learned VTEP.
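
The filtering behavior can be modeled with a short Python sketch. It is a simplified model, not the switch's implementation: it assumes ordered entries where the first match wins, treats an entry as matching any VTEP address inside its prefix, and denies anything unmatched. The function name and the tuple format are hypothetical.

```python
import ipaddress


def select_vteps(vteps, prefix_list):
    """Return the VTEP addresses permitted by an ordered prefix list.

    prefix_list is a sequence of (action, prefix) tuples, for example
    ("permit", "10.1.1.0/24"). The first entry containing the address
    decides. An address matching no entry is denied, so an empty list
    denies every VTEP.
    """
    entries = [(action, ipaddress.ip_network(prefix)) for action, prefix in prefix_list]
    selected = []
    for vtep in vteps:
        addr = ipaddress.ip_address(vtep)
        for action, network in entries:
            if addr in network:
                if action == "permit":
                    selected.append(vtep)
                break  # first match wins
    return selected


learned = ["10.1.1.2", "10.1.1.3", "10.2.2.2"]
print(select_vteps(learned, [("permit", "10.1.1.0/24")]))  # ['10.1.1.2', '10.1.1.3']
print(select_vteps(learned, []))                            # []
```

The second call shows the deny-all case described above: with no entries, no VTEP is selected and no BFD session would be started.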

Show Commands

Use the show interface <VTI> command to check whether BFD is enabled on the VTI and to see the timers used for the BFD sessions and any prefix list configured for filtering them.
switch# show interface vxlan1
vxlan1 is up, line protocol is up (connected)
  Hardware is vxlan
  Source interface is Loopback0 and is active with 10.1.1.1
  Replication/Flood Mode is headend with Flood List Source: CLI
  Remote MAC learning is disabled
  VNI mapping to VLANs
  Static VLAN to VNI mapping is
  Dynamic VLAN to VNI mapping for 'evpn' is
    [4092, 30000]     [4093, 20000]
  Dynamic VLAN to VNI mapping for 'vccbfd' is
    [4091, 0]
  Note: All Dynamic VLANs used by VCS are internal VLANs.
        Use 'show vxlan vni' for details.
  Static VRF to VNI mapping is
   [vrf0, 20000]
  MLAG Shared Router MAC is 0000.0000.0000
  BFD is enabled with transmit interval 50, receive interval 50, multiplier 3, VTEP prefix list pl-example

Use the existing show bfd peers command to view the state of the BFD for VXLAN sessions.
switch# show bfd peers
VRF name: default
-----------------
DstAddr    MyDisc       YourDisc    Interface/Transport    Type        LastUp           LastDown       LastDiag       State
-------- ----------  ------------- ---------------------- ---------  --------------  ------------    -------------- -----------
10.1.1.2   1965370229   3607849318                  NA       vxlan    01/12/21 10:45          NA       No Diagnostic       Up
10.1.1.3   1355343148   2407539267                  NA       vxlan    01/12/21 10:45          NA       No Diagnostic       Up
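
For monitoring scripts, output in this shape can be scanned for sessions that are not Up. The Python sketch below assumes the column layout shown above (peer address first, Type fifth, State last). The function name is hypothetical, and the second sample row has its State changed to Down purely for illustration.

```python
import ipaddress


def vxlan_peers_not_up(show_bfd_peers):
    """Return destination addresses of VXLAN BFD peers whose State is not Up."""
    not_up = []
    for line in show_bfd_peers.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # blank line or VRF banner
        try:
            ipaddress.ip_address(fields[0])  # data rows start with the peer address
        except ValueError:
            continue  # column header or separator row
        peer_type, state = fields[4], fields[-1]
        if peer_type.lower() == "vxlan" and state != "Up":
            not_up.append(fields[0])
    return not_up


sample = """\
DstAddr    MyDisc       YourDisc    Interface/Transport    Type        LastUp           LastDown       LastDiag       State
10.1.1.2   1965370229   3607849318                  NA       vxlan    01/12/21 10:45          NA       No Diagnostic       Up
10.1.1.3   1355343148   2407539267                  NA       vxlan    01/12/21 10:45          NA       No Diagnostic       Down
"""
print(vxlan_peers_not_up(sample))  # ['10.1.1.3']
```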

Use the show ip route vrf <VRF> command to determine which prefixes are eligible for fast-failover.
switch# show ip route vrf example-vrf

VRF: example-vrf
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - vxlan Control Service,
       DH - DHCP client installed default route, M - Martian,
       DP - Dynamic Policy Route, L - VRF Leaked,
       RC - Route Cache Route

Gateway of last resort is not set

 C        20.0.2.0/24 is directly connected, Ethernet14/1
 B E      99.99.0.0/24 [200/0] via VTEP 10.1.1.2 VNI 30000 router-mac fc:bd:67:3d:21:fd
                               via VTEP 10.1.1.3 VNI 30000 router-mac ba:ed:43:3f:ca:8e backup

In the above example, prefix 99.99.0.0/24 has a primary path over a VXLAN tunnel to VTEP 10.1.1.2 and a backup path over a VXLAN tunnel to VTEP 10.1.1.3. Both paths are monitored via BFD.

In other use cases, the prefix may have multiple ECMP paths, one or more of which are VXLAN tunnels to remote VTEPs monitored by these BFD for VXLAN sessions.
switch# show ip route vrf example-vrf

VRF: example-vrf
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - vxlan Control Service,
       DH - DHCP client installed default route, M - Martian,
       DP - Dynamic Policy Route, L - VRF Leaked,
       RC - Route Cache Route

Gateway of last resort is not set

 C        20.0.2.0/24 is directly connected, Ethernet14/1
 B E      99.99.0.0/24 [200/0] via VTEP 10.1.1.2 VNI 30000 router-mac fc:bd:67:3d:24:fe
                               via VTEP 10.1.1.3 VNI 30000 router-mac ba:ed:43:3f:ca:8e
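
The route outputs above can also be checked by script. The Python sketch below groups "via VTEP" paths by prefix and lists prefixes that have at least two such paths, whether as a backup or as ECMP members. It assumes the line layout shown in the examples and only counts VTEP next hops, so it is a simplification. The function names are hypothetical.

```python
import re

PREFIX_RE = re.compile(r"\b(\d+\.\d+\.\d+\.\d+/\d+)")
VTEP_RE = re.compile(r"via VTEP (\d+\.\d+\.\d+\.\d+)(.*)")


def vtep_paths_by_prefix(show_ip_route):
    """Map each prefix to its list of (vtep, role) paths, role being active or backup."""
    paths = {}
    current = None
    for line in show_ip_route.splitlines():
        prefix = PREFIX_RE.search(line)
        if prefix:
            current = prefix.group(1)  # continuation lines belong to this prefix
        vtep = VTEP_RE.search(line)
        if vtep and current:
            is_backup = vtep.group(2).split()[-1:] == ["backup"]
            paths.setdefault(current, []).append((vtep.group(1), "backup" if is_backup else "active"))
    return paths


def prefixes_with_alternate(paths):
    """Return prefixes that have two or more VTEP paths."""
    return [prefix for prefix, entries in paths.items() if len(entries) >= 2]


sample = """\
 C        20.0.2.0/24 is directly connected, Ethernet14/1
 B E      99.99.0.0/24 [200/0] via VTEP 10.1.1.2 VNI 30000 router-mac fc:bd:67:3d:21:fd
                               via VTEP 10.1.1.3 VNI 30000 router-mac ba:ed:43:3f:ca:8e backup
"""
paths = vtep_paths_by_prefix(sample)
print(paths)  # {'99.99.0.0/24': [('10.1.1.2', 'active'), ('10.1.1.3', 'backup')]}
print(prefixes_with_alternate(paths))  # ['99.99.0.0/24']
```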

MLAG

This feature applies only to the scenario in which a remote prefix is known via two different MLAG VTEP pairs.

Prior to 4.26.0F, this feature was supported only on the primary switch of an MLAG pair, because the MLAG pair uses a shared VTEP IP as the VXLAN tunnel source/destination. If BFD for VXLAN packets are received on the secondary MLAG switch, they are forwarded to the primary MLAG switch for processing. Because only the primary MLAG switch has BFD state for remote VTEPs, when a BFD session to a remote VTEP goes down only the primary MLAG switch performs the fast-failover, while the secondary MLAG switch retains its existing behavior. Therefore, it is not recommended to use this feature in conjunction with MLAG.

As of 4.26.0F, the primary MLAG switch syncs its BFD for VXLAN state to the secondary MLAG switch so that the secondary can also fail over to an alternate path. A new show command has been added to view the synced state.

switch# show bfd peers protocol vxlan mlag primary 
Remote VTEPS for vxlan1 on MLAG primary:
VTEP         BFD Status 
----------  ---------- 
10.1.1.2       up
10.1.1.3       up

However, because this state is synced across devices, the secondary MLAG switch will not be as performant in reacting to the BFD state transitions as the primary MLAG switch, which is natively responding to the BFD session.

Another exception is the multi-VTEP MLAG feature, which allows BFD for VXLAN to run on the secondary MLAG switch. When running with multi-VTEP MLAG, both the primary and secondary switches run independent BFD sessions to remote VTEPs and react to BFD state transitions separately. Each switch uses the local VTEP IP of the VTI as the source IP address for the BFD sessions, which must differ from the MLAG VTEP IP.

In summary, using MLAG with this feature is recommended only when the multi-VTEP MLAG feature referenced above is configured.

Troubleshooting

  • Ensure that the BFD configuration is present on the relevant VTI and that the VTI status shows BFD as enabled, using the show interface <VTI> command described above.
  • BFD state transitions are logged to syslog, so a log message appears when a BFD session to a remote VTEP goes down.
  • After fast-failover to a separate path, show ip route still displays FIB state, which may show the original path. To view the post-failover state of a prefix, use show ip hardware ale vrf <VRF> <prefix> instead.

Limitations

  • Support is limited to EVPN VXLAN.
  • An IPv6 VXLAN underlay is not supported with this feature.