EVPN VXLAN Single-Gateway Centralized Routing

In a traditional EVPN VXLAN centralized anycast gateway deployment, multiple L3 VTEPs serve as the centralized anycast gateway. For hosts to have a consistent ARP binding regardless of which centralized gateway VTEP responds, each VTEP operating as a centralized gateway is configured with a virtual router MAC (VARP MAC). A virtual VTEP IP (VARP VTEP IP) is shared between all of the L3 VTEPs operating as centralized gateways. Each centralized gateway VTEP also advertises an EVPN type-3 route for both its primary VTEP IP and the VARP VTEP IP, so both IPs end up in the overlay floodset.

The traditional configuration works, but in a network with only a single L3 VTEP centralized gateway (or a single MLAG pair operating as the L3 VTEP centralized gateway), it leads to unnecessary BUM traffic. When both the physical VTEP IP and the VARP VTEP IP are in the overlay floodset, BUM traffic is duplicated to the centralized gateway, which adds overhead for workloads with a lot of broadcast or multicast traffic. Because there is only a single centralized gateway, a VARP VTEP IP is not needed to provide a stable ARP binding for the gateway.

The EVPN VXLAN Single-Gateway Centralized Routing feature allows a single L3 VTEP (or single MLAG pair) operating as an anycast gateway to run without a VARP VTEP IP, thereby eliminating the duplicate BUM traffic caused by the VARP VTEP IP being in the overlay floodset. It accomplishes this with two changes:
  • A change to the default ARP behavior when a host ARPs for the MAC address associated with a virtual IP address (SVI IP).

    Previously, ARPing for a virtual IP returned different MAC address bindings, depending on whether or not a VARP VTEP IP was configured. If a VARP VTEP IP was configured, the ARP request returned the configured VARP MAC. If one was not, the ARP request returned the switch router MAC. This feature changes the behavior so that an ARP request for a virtual IP is always answered with the configured VARP MAC. This closes an exception to the rule that virtual IPs are always associated with the VARP MAC.

  • A new EVPN MAC VRF configuration command that generates an EVPN type-2 route for the VARP MAC with a nexthop of the physical VTEP IP.

    In a traditional EVPN centralized anycast gateway deployment, the presence of a configured VARP VTEP IP causes an EVPN type-2 route to be advertised for the VARP MAC with a nexthop of the VARP VTEP IP. This allows TOR switches to learn the ARP binding of the centralized anycast gateway (presumably, their default gateway). With this feature, no VARP VTEP IP is configured, so an alternative method is required to advertise the appropriate EVPN route. This feature adds a new EVPN MAC VRF configuration command, redistribute router-mac next-hop vtep primary, which when configured on a MAC VRF advertises an EVPN type-2 route for the VARP MAC with a nexthop of the primary VTEP IP. This allows TOR switches to learn the ARP binding of the centralized anycast gateway.
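
The following is a minimal sketch of how the command might be applied on a single centralized gateway. The AS number, VLAN, VARP MAC, virtual IP, route distinguisher, and route target are illustrative assumptions reused from the examples later in this document, not requirements of the feature. No VARP VTEP IP is configured.

```
! Sketch only: all values below are assumed for illustration
switch(config)# ip virtual-router mac-address 00:00:80:00:00:00
!
switch(config)# interface Vlan20
switch(config-if-Vl20)# ip address virtual 20.0.20.1/24
!
switch(config)# router bgp 300
switch(config-router-bgp)# vlan 20
switch(config-macvrf-20)# rd 10.255.0.0:20
switch(config-macvrf-20)# route-target both 64500:10020
switch(config-macvrf-20)# redistribute learned
! Advertise a type-2 route for the VARP MAC with the primary VTEP IP as nexthop
switch(config-macvrf-20)# redistribute router-mac next-hop vtep primary
```

With a configuration along these lines, TOR switches should learn the VARP MAC through the type-2 route, while only the primary VTEP IP remains in the overlay floodset.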

Configuration

On the multi-homing PEs, you must configure the Ethernet Segment (ES) to the CE. In addition, asymmetric IRB or symmetric IRB must be configured on the local and remote PEs.
Figure 1. Asymmetric IRB with IPv4


In the example, CE1 is a multi-homed CE in VLAN20. CE2 is a remote CE in VLAN30. Asymmetric IRB is configured for inter-VLAN traffic.

Configuration on PE1:
switch(config)# interface Port-Channel100
switch(config-if-Po100)# switchport access vlan 20
! 
switch(config-if-Po100)# evpn ethernet-segment
switch(config-evpn-es)# identifier 0033:3333:3333:3333:3333
switch(config-evpn-es)# route-target import 00:03:00:03:00:03
switch(config-evpn-es)# lacp system-id 1234.5678.0123
!
switch(config)# interface Ethernet1
switch(config-if-Et1)# switchport mode trunk
switch(config-if-Et1)# channel-group 100 mode on
!
switch(config)# interface Loopback0
switch(config-if-Lo0)# ip address 10.255.0.0/32
!
switch(config)# interface Vlan20
switch(config-if-Vl20)# ip address virtual 20.0.20.1/24
!
switch(config)# interface Vlan30
switch(config-if-Vl30)# ip address virtual 20.0.30.1/24
!
switch(config)# interface Vxlan1
switch(config-if-Vx1)# vxlan source-interface Loopback0
switch(config-if-Vx1)# vxlan udp-port 4789
switch(config-if-Vx1)# vxlan vlan 20 vni 10020
switch(config-if-Vx1)# vxlan vlan 30 vni 10030
!
switch(config)# ip virtual-router mac-address 00:00:80:00:00:00
!
switch(config)# router bgp 300
switch(config-router-bgp)# router-id 0.0.0.1
switch(config-router-bgp)# neighbor 10.0.0.1 remote-as 303
switch(config-router-bgp)# neighbor 10.0.0.1 ebgp-multihop
switch(config-router-bgp)# neighbor 10.0.0.1 send-community extended
switch(config-router-bgp)# neighbor 10.0.0.1 maximum-routes 12000
switch(config-router-bgp)# redistribute static
   !
switch(config-router-bgp)# vlan 20
switch(config-macvrf-20)# rd 10.255.0.0:20
switch(config-macvrf-20)# route-target both 64500:10020
switch(config-macvrf-20)# redistribute learned
   !
switch(config-router-bgp)# vlan 30
switch(config-macvrf-30)# rd 10.255.0.0:30
switch(config-macvrf-30)# route-target both 64500:10030
switch(config-macvrf-30)# redistribute learned
   !
switch(config-macvrf-30)# address-family evpn
switch(config-router-bgp-af)# neighbor 10.0.0.1 activate

The Ethernet segment to the multi-homed CE is configured on the port channel interface Port-Channel 100. SVI 20 and SVI 30 along with VARP IP are configured for inter-subnet routing. A VARP MAC is configured globally on PE1. The configuration on PE2 is similar to the configuration shown above. On PE3, SVI 20 and SVI 30 are configured along with VARP IP and VARP MAC.

Symmetric IRB with IPv4 example:

Configuration on PE1:
switch(config)# vrf instance red
switch(config-vrf-red)# rd 10.255.0.0:0
!
switch(config-vrf-red)# interface Port-Channel100
switch(config-if-Po100)# switchport access vlan 20
!
switch(config-if-Po100)# evpn ethernet-segment
switch(config-evpn-es)# identifier 0033:3333:3333:3333:3333
switch(config-evpn-es)# route-target import 00:03:00:03:00:03
switch(config-evpn-es)# lacp system-id 1234.5678.0123
!
switch(config-if-Po100)# interface Ethernet6/6/1
switch(config-if-Et6/6/1)# switchport mode trunk
switch(config-if-Et6/6/1)# channel-group 100 mode on
!
switch(config-if-Et6/6/1)# interface Loopback0
switch(config-if-Lo0)# ip address 10.255.0.0/32
!
switch(config-if-Lo0)# interface Vlan20
switch(config-if-Vl20)# vrf red
switch(config-if-Vl20)# ip address virtual 20.0.20.1/24
!
switch(config-if-Vl20)# interface Vxlan1
switch(config-if-Vx1)# vxlan source-interface Loopback0
switch(config-if-Vx1)# vxlan udp-port 4789
switch(config-if-Vx1)# vxlan vlan 10 vni 10010
switch(config-if-Vx1)# vxlan vlan 20 vni 10020
switch(config-if-Vx1)# vxlan vrf red vni 20000
!
switch(config-if-Vx1)# ip virtual-router mac-address 00:00:80:00:00:00
!
switch(config)# router bgp 300
switch(config-router-bgp)# router-id 0.0.0.1
switch(config-router-bgp)# maximum-paths 2
switch(config-router-bgp)# neighbor 10.0.0.1 remote-as 303
switch(config-router-bgp)# neighbor 10.0.0.1 ebgp-multihop
switch(config-router-bgp)# neighbor 10.0.0.1 send-community extended
switch(config-router-bgp)# neighbor 10.0.0.1 maximum-routes 12000
switch(config-router-bgp)# redistribute static
!
switch(config-router-bgp)# vlan 20     
switch(config-macvrf-20)# rd 10.255.0.0:20
switch(config-macvrf-20)# route-target both 64500:10020
switch(config-macvrf-20)# redistribute learned
!
switch(config-macvrf-20)# address-family evpn
switch(config-router-bgp-af)# neighbor 10.0.0.1 activate
!
switch(config-router-bgp-af)# vrf red
switch(config-router-bgp-vrf-red)# rd 10.255.0.0:0
switch(config-router-bgp-vrf-red)# route-target import evpn 64500:20000
switch(config-router-bgp-vrf-red)# route-target export evpn 64500:20000
switch(config-router-bgp-vrf-red)# router-id 10.255.0.0
!

The Ethernet segment to the multi-homed CE1 is configured on the port channel interface Port-Channel 100. SVI 20 along with VARP IP and VARP MAC is configured. Also, IP VRF is configured which is needed for symmetric IRB. The configuration on PE2 is similar. On PE3, IP VRF and SVI 30 are configured for symmetric IRB.
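
For comparison, a PE3 configuration along these lines might look as follows. This is a sketch only: the loopback address 10.255.0.2, local AS 302, and neighbor 10.0.2.1 are assumptions inferred from the show command output later in this section, and the VLAN 30 VNI, SVI address, and VARP MAC are reused from the PE1 examples.

```
! Sketch only: addresses, AS numbers, and neighbor are assumed for illustration
switch(config)# vrf instance red
!
switch(config)# interface Loopback0
switch(config-if-Lo0)# ip address 10.255.0.2/32
!
switch(config)# interface Vlan30
switch(config-if-Vl30)# vrf red
switch(config-if-Vl30)# ip address virtual 20.0.30.1/24
!
switch(config)# interface Vxlan1
switch(config-if-Vx1)# vxlan source-interface Loopback0
switch(config-if-Vx1)# vxlan udp-port 4789
switch(config-if-Vx1)# vxlan vlan 30 vni 10030
switch(config-if-Vx1)# vxlan vrf red vni 20000
!
switch(config)# ip virtual-router mac-address 00:00:80:00:00:00
!
switch(config)# router bgp 302
switch(config-router-bgp)# neighbor 10.0.2.1 remote-as 303
switch(config-router-bgp)# neighbor 10.0.2.1 ebgp-multihop
switch(config-router-bgp)# neighbor 10.0.2.1 send-community extended
!
switch(config-router-bgp)# vlan 30
switch(config-macvrf-30)# rd 10.255.0.2:30
switch(config-macvrf-30)# route-target both 64500:10030
switch(config-macvrf-30)# redistribute learned
!
switch(config-macvrf-30)# address-family evpn
switch(config-router-bgp-af)# neighbor 10.0.2.1 activate
!
! The IP VRF route targets match PE1 so that routes are exchanged for VNI 20000
switch(config-router-bgp-af)# vrf red
switch(config-router-bgp-vrf-red)# rd 10.255.0.2:0
switch(config-router-bgp-vrf-red)# route-target import evpn 64500:20000
switch(config-router-bgp-vrf-red)# route-target export evpn 64500:20000
```

PE3 has no Ethernet segment configuration because it does not attach to the multi-homed CE1.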

VXLAN example

A network with two VRFs, red and blue, has VLANs 10 and 20 in red and VLANs 30 and 40 in blue. The spines act as route reflectors. Multicast groups are used to encapsulate traffic arriving in a VRF such that it is delivered to VTEPs that have that VRF provisioned.

interface Loopback0
   ip address 10.0.0.20/32
!
vlan 10
vlan 20
vlan 30
vlan 40
!
interface Ethernet1
   switchport access vlan 10
!
interface Ethernet2
   switchport access vlan 20
!
interface Ethernet3
   switchport access vlan 30
!
interface Ethernet4
   switchport access vlan 40
!
interface Vlan10
   vrf red 
   ip address virtual 192.168.1.0/24
   ip igmp
   pim ipv4 local-interface loopback0
!
interface Vlan20
   vrf red 
   ip address 192.168.2.0/24
   ip igmp
!
interface Vlan30
   vrf blue
   ip address 192.168.1.0/24
   ip igmp
!
interface Vlan40
   vrf blue
   ip address 192.168.2.0/24
   ip igmp
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan vlan 10 vni 10
   vxlan vlan 20 vni 20
   vxlan vlan 30 vni 30
   vxlan vlan 40 vni 40
   vxlan vrf red vni 100
   vxlan vrf blue vni 200
   vxlan vlan 10 flood group 225.1.1.2
   vxlan vlan 20 flood group 225.1.1.3
   vxlan vlan 30 flood group 226.1.1.2
   vxlan vlan 40 flood group 226.1.1.3
   vxlan vrf red multicast group 225.1.1.1
   vxlan vrf blue multicast group 226.1.1.1
!

Show commands

The following examples are based on the sample topology and configuration in the previous sections.

On the remote VTEP, to display the EVPN routes to the multi-homed CE (20.0.20.2):
switch# show bgp evpn route-type mac-ip 20.0.20.2
BGP routing table information for VRF default
Router identifier 0.0.3.1, local AS number 302
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, 
                    e - ECMP S - Stale, c - Contributing to ECMP, b - backup
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  LocPref Weight  Path
 * >     RD: 10.255.0.0:20 mac-ip 0000.0101.0000 20.0.20.2
                                10.255.0.0            -       100     0       303 300 i
 * >     RD: 10.255.0.1:20 mac-ip 0000.0101.0000 20.0.20.2
                                10.255.0.1            -       100     0       303 301 i

As shown above, there are two EVPN MAC-IP routes for the multi-homed CE.

On the remote PE, to display the installed routes to the multi-homed CE:
switch# show ip route vrf red 20.0.20.2/32

VRF: red
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - VXLAN Control Service,
       DH - DHCP client installed default route, M - Martian,
       DP - Dynamic Policy Route, L - VRF Leaked

 B E      20.0.20.2/32 [200/0] via VTEP 10.255.0.0 VNI 20000 router-mac 00:00:78:01:00:00
                               via VTEP 10.255.0.1 VNI 20000 router-mac 00:00:78:04:00:00

Two routes to the multi-homed CE are installed from the two EVPN MAC-IP routes and they form L3 ECMP.

On the remote PE, to check the details of the two routes in BGP RIB:
switch# show ip bgp 20.0.20.2/32 vrf red
BGP routing table information for VRF red
Router identifier 10.255.0.2, local AS number 302
BGP routing table entry for 20.0.20.2/32
 Paths: 2 available
  303 300
    10.255.0.0 from 10.0.2.1 (0.0.1.1), imported EVPN route, RD 10.255.0.0:20
      Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, received 00:14:43 ago, 
      valid, external, ECMP head, ECMP, best, ECMP contributor
      Extended Community: Route-Target-AS:64500:10020 Route-Target-AS:64500:20000 
      TunnelEncap:tunnelTypeVXLAN EvpnMacMobility:1 EvpnRouterMac:00:00:78:01:00:00
      Remote VNI: 20000
      Rx SAFI: Unicast
  303 301
    10.255.0.1 from 10.0.2.1 (0.0.1.1), imported EVPN route, RD 10.255.0.1:20
      Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, received 00:14:43 ago, 
      valid, external, ECMP, ECMP contributor
      Extended Community: Route-Target-AS:64500:10020 Route-Target-AS:64500:20000 
      TunnelEncap:tunnelTypeVXLAN EvpnMacMobility:1 EvpnRouterMac:00:00:78:04:00:00
      EvpnNdFlags:pflag
      Remote VNI: 20000
      Rx SAFI: Unicast

The second route has EvpnNdFlags:pflag to indicate that this is a proxy MAC-IP route.

These commands show information about the SBD instances that are created when evpn multicast is configured under an IP VRF:
switch# show bgp evpn instance sbd red
EVPN instance: SBD red
  Route distinguisher: 100:1
  Service interface: VLAN-based
  Local IP address: 10.0.0.20
  Encapsulation type: VXLAN
switch# show bgp evpn instance sbd blue
EVPN instance: SBD blue
  Route distinguisher: 200:1
  Service interface: VLAN-based
  Local IP address: 10.0.0.20
  Encapsulation type: VXLAN

The OISM-supported capability is set in IMET routes for the SBD when evpn multicast is configured for a VRF. The IGMP proxy flag is not set in IMET routes for VLANs in the VRF unless redistribute igmp is configured in a VLAN:
switch# show bgp evpn route-type imet next-hop 10.0.0.10 detail
BGP routing table information for VRF default
Router identifier 0.0.0.1, local AS number 300
BGP routing table entry for imet 10.0.0.10, Route Distinguisher: 10:1
 Paths: 1 available
  Local
    10.0.0.10 from 10.0.0.1 (0.0.1.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: Route-Target-AS:10:1 TunnelEncap:tunnelTypeVXLAN
      VNI: 10
      PMSI Tunnel: PIM-SSM Tree, MPLS Label: 10, Leaf Information Required: false, 
      Tunnel ID: 10.0.0.10, 225.1.1.2
BGP routing table information for VRF default
Router identifier 0.0.0.1, local AS number 300
BGP routing table entry for imet 10.0.0.10, Route Distinguisher: 100:1
 Paths: 1 available
  Local
    10.0.0.10 from 10.0.0.1 (0.0.1.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: Route-Target-AS:100:1 TunnelEncap:tunnelTypeVXLAN Multicast 
      Flags: IGMP proxy, OISM-supported, SBD
      VNI: 100
      PMSI Tunnel: Ingress Replication, MPLS Label: 100, Leaf Information Required: false, 
      Tunnel ID: 10.0.0.10

This command shows information about the single SMET route for the group with the RD of VRF red:
switch# show bgp evpn route-type smet multicast 228.1.1.1
BGP routing table information for VRF default
Router identifier 0.0.1.1, local AS number 300
Route status codes: s - suppressed, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop
          Network                Next Hop              Metric  LocPref Weight  Path
 * >     RD: 100:1 smet (S, G): (*, 228.1.1.1) originating IP: 10.0.0.10
                                 10.0.0.10             -       100     0       i
 * >     RD: 100:1 smet (S, G): (*, 228.1.1.1) originating IP: 10.0.0.20
                                 -                     -        -      0       i
 * >     RD: 100:1 smet (S, G): (*, 228.1.1.1) originating IP: 10.0.0.30
                                 10.0.0.30             -       100     0       i
 * >     RD: 200:1 smet (S, G): (*, 228.1.1.1) originating IP: 10.0.0.20
                                 -                     -        -      0       i
 * >     RD: 200:1 smet (S, G): (*, 228.1.1.1) originating IP: 10.0.0.40
                                 10.0.0.40             -       100     0       i

This command shows information about the SPMSI route for the group:
switch# show bgp evpn route-type spmsi
BGP routing table information for VRF default
Router identifier 0.0.0.1, local AS number 300
Route status codes: s - suppressed, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop
          Network                Next Hop              Metric  LocPref Weight  Path
 * >     RD:  10:1 spmsi (S, G): (*, *) originating IP: 10.0.0.10
                                 10.0.0.10             -       100     0       i
 * >     RD:  10:1 spmsi (S, G): (*, *) originating IP: 10.0.0.20
                                 -                     -       -       0       i
 * >     RD:  20:1 spmsi (S, G): (*, *) originating IP: 10.0.0.20
                                 -                     -       -       0       i
 * >     RD:  30:1 spmsi (S, G): (*, *) originating IP: 10.0.0.20
                                 -                     -       -       0       i
 * >     RD:  40:1 spmsi (S, G): (*, *) originating IP: 10.0.0.20
                                 -                     -       -       0       i
 * >     RD:  10:1 spmsi (S, G): (*, *) originating IP: 10.0.0.30
                                 10.0.0.30             -       100     0       i
 * >     RD:  20:1 spmsi (S, G): (*, *) originating IP: 10.0.0.30
                                 10.0.0.30             -       100     0       i
 * >     RD:  30:1 spmsi (S, G): (*, *) originating IP: 10.0.0.30
                                 10.0.0.30             -       100     0       i
 * >     RD:  40:1 spmsi (S, G): (*, *) originating IP: 10.0.0.30
                                 10.0.0.30             -       100     0       i
 * >     RD:  30:1 spmsi (S, G): (*, *) originating IP: 10.0.0.40
                                 10.0.0.40             -       100     0       i
 * >     RD: 100:1 spmsi (S, G): (*, *) originating IP: 10.0.0.10
                                 10.0.0.10             -       100     0       i
 * >     RD: 100:1 spmsi (S, G): (*, *) originating IP: 10.0.0.20
                                 -                     -       -       0       i
 * >     RD: 200:1 spmsi (S, G): (*, *) originating IP: 10.0.0.20
                                 -                     -       -       0       i
 * >     RD: 100:1 spmsi (S, G): (*, *) originating IP: 10.0.0.30
                                 10.0.0.30             -       100     0       i
 * >     RD: 200:1 spmsi (S, G): (*, *) originating IP: 10.0.0.30
                                 10.0.0.30             -       100     0       i
 * >     RD: 200:1 spmsi (S, G): (*, *) originating IP: 10.0.0.40
                                 10.0.0.40             -       100     0       i

Limitations

The EVPN VXLAN all-active multi-homing integrated routing and bridging feature has the following limitations:
  • L2 loop-free protocols, such as Spanning Tree Protocol (STP), are not supported between the PE and CE with EVPN VXLAN All-Active Multi-homing. The topology must be loop-free. STP must be disabled for the multi-homed VLANs on PEs and CEs.

  • At most two multi-homing destinations are installed at one time for any particular MAC address. Any other valid destinations are ignored in the context of switching unicast traffic, until one of the active destinations is removed. The multi-homing PEs operate normally regardless of the number of PEs on the Ethernet segment; this limitation only affects the selection of a destination for unicast traffic.

  • VXLAN GPE is not supported, so packets cannot be flagged as having been broadcast. As a result, unicast packets that are known at the sender and unknown at the receiver are dropped if the receiving PE is not a designated forwarder. Similarly, unicast packets that are unknown at the sender and known at the receiver are duplicated at the receiving CE. This state automatically resolves itself as the BGP network distributes the appropriate type 1 auto-discovery and type 2 MAC/IP advertisement EVPN routes.

  • Fast mass withdrawal is not supported, so there may be a delay if an interface harboring a large number of MAC addresses goes down.

Flood Traffic Filtering with EVPN

A VXLAN fabric managed by EVPN does not always need to flood all broadcast, multicast, or unknown MAC traffic. In some deployments, only ARP request broadcast and ND multicast traffic should flood the VXLAN fabric. In other cases, flooding ARP plus some other traffic is allowed, but not all broadcast traffic.

Most ARP entries are learned through EVPN. In the cases where an ARP entry is not learned through EVPN, it is acceptable to flood the ARP request. However, ARP flooding into the VXLAN fabric should be rate limited.

Configuration

The following command disables flooding of different kinds of traffic into the VXLAN fabric, so that flooding can be restricted to ARP and ND traffic only.

switch(config-rtr-l2-vpn)# flooding default disabled

The following command enables flooding of ARP packets into the VXLAN fabric again.
switch(config-rtr-l2-vpn)# arp flooding

The above command resumes flooding of ARP traffic into the VXLAN fabric, while all other traffic, including ND traffic, is still prevented from being flooded into the fabric. The following command enables flooding of ND traffic.
switch(config-rtr-l2-vpn)# nd flooding

Note: These commands are in effect only when EVPN is enabled. ARP & ND flooding are enabled by default.
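
Taken together, the commands above might be entered as in the following sketch, which aims to restrict flooding into the VXLAN fabric to ARP and ND traffic. Entering the mode with router l2-vpn is an assumption inferred from the config-rtr-l2-vpn prompt shown above.

```
! Sketch only: the mode entry command is assumed from the prompt
switch(config)# router l2-vpn
! Stop flooding traffic into the VXLAN fabric by default
switch(config-rtr-l2-vpn)# flooding default disabled
! Re-enable flooding for ARP and ND traffic only
switch(config-rtr-l2-vpn)# arp flooding
switch(config-rtr-l2-vpn)# nd flooding
```

Omitting the nd flooding line would leave only ARP traffic flooded, matching the intermediate state described above.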

Show commands

The show vxlan counters software command shows the number of ARP and ND packets that were prevented from getting flooded into the VXLAN fabric.

switch# show vxlan counters software
. . . . . . . . 
Tx pkts after IPv6 encapsulation                     :  0
SW pkts forwarded to remote VTEPs via HW HER         :  4
SW pkts forwarding to remote VTEPs via HW HER failed :  0
Packets suppressed from getting flooded              :  0