Integrated Routing and Bridging
In traditional data center design, inter-subnet forwarding is provided by a centralized router: traffic traverses the network to a centralized routing node and back again to its final destination. In a large multi-tenant data center environment this operational model can lead to inefficient use of bandwidth and sub-optimal forwarding.
To provide a more optimal forwarding model and avoid traffic tromboning, the IETF draft Integrated Routing and Bridging in EVPN proposes integrating routing and bridging functionality directly onto the VTEP, thereby allowing the routing operation to occur as close to the end host as possible. The draft proposes two forwarding models for the Integrated Routing and Bridging (IRB) functionality, termed asymmetric IRB and symmetric IRB. These two models are described in the following sections.
In the asymmetric IRB model, the inter-subnet routing functionality is performed by the ingress VTEP, with the packet after the routing action being VXLAN bridged to the destination VTEP. The egress VTEP only then needs to remove the VXLAN header and forward the packet onto the local Layer 2 domain based on the VNI to VLAN mapping. In the return path, the routing functionality is reversed with the destination VTEP now performing the ingress routing and VXLAN bridging operation, hence the term asymmetric IRB.
To provide inter-subnet routing on all VTEPs for all subnets, an anycast IP address is utilized for each subnet and configured on each VTEP. The anycast IP acts as the default gateway for the hosts; therefore, regardless of where the host resides, the directly attached VTEPs can act as the host’s default gateway. The host MAC and MAC to IP bindings are learned by each VTEP based on a combination of local learning/ARP snooping and type-2 route advertisement from remote VTEPs.
In a typical implementation, the optional MAC and IP type-2 route is advertised separately from the MAC-only type-2 route. This ensures that if the MAC and IP route is withdrawn, for example because the ARP entry is flushed or the ARP timeout is set to less than the MAC timeout, the MAC-only route still exists.
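The effect of advertising the two routes separately can be sketched with a small model. This is an illustrative sketch only: the class and method names are hypothetical, a route is identified simply by a (MAC, IP) pair with the IP set to None for the MAC-only route, and the behavior on MAC expiry is a simplifying assumption rather than something stated above.

```python
from typing import Optional, Set, Tuple

# Hypothetical model: a type-2 route is identified here by (MAC, IP),
# with IP set to None for the MAC-only route.
RouteKey = Tuple[str, Optional[str]]


class Type2Advertiser:
    """Tracks which type-2 routes a VTEP currently advertises for local hosts."""

    def __init__(self) -> None:
        self.advertised: Set[RouteKey] = set()

    def mac_learned(self, mac: str) -> None:
        # Local MAC learning triggers the MAC-only route.
        self.advertised.add((mac, None))

    def arp_learned(self, mac: str, ip: str) -> None:
        # ARP snooping triggers the separate MAC and IP route.
        self.advertised.add((mac, ip))

    def arp_expired(self, mac: str, ip: str) -> None:
        # Only the MAC and IP route is withdrawn; the MAC-only route is untouched.
        self.advertised.discard((mac, ip))

    def mac_expired(self, mac: str) -> None:
        # Simplifying assumption: when the MAC ages out, all of its routes go.
        self.advertised = {key for key in self.advertised if key[0] != mac}


vtep = Type2Advertiser()
vtep.mac_learned("MAC-1")
vtep.arp_learned("MAC-1", "IP-1")
vtep.arp_expired("MAC-1", "IP-1")  # ARP timeout shorter than MAC timeout
remaining = sorted(vtep.advertised, key=str)
print(remaining)
```

After the ARP entry expires, only the MAC-only route for MAC-1 remains advertised, so remote VTEPs can still bridge traffic to the host.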
The format of the two type-2 routes advertised for Server-1 is illustrated below. The RD IP-A:1010 makes the route unique, and the route-target 1010:1010 allows the route to be imported into the correct remote MAC-VRF based on the route-target import policy of the VTEP.
For the traffic flow between Server-1 in subnet-10 and Server-4 in subnet-11, the ingress VTEP (VTEP-1) locally routes the packet into subnet-11/VNI 1011 and then VXLAN bridges the frame, inserting the VNI 1011 into the VXLAN header with an inner DMAC equal to the destination host, Server-4. This requires the receiving VTEP (VTEP-4) to only perform a local Layer 2 lookup, based on the VNI to VLAN mapping, for the DMAC of Server-4.
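The Server-1 to Server-4 flow described above can be sketched as two lookups, one on each VTEP. This is a minimal sketch, not a description of any particular implementation: the table names, the identifiers IP-4 and MAC-4 for Server-4, VLAN 11, and the port name Ethernet1 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VxlanFrame:
    outer_dst_vtep: str  # VTEP the encapsulated frame is sent to
    vni: int             # VNI carried in the VXLAN header
    inner_dmac: str      # destination MAC of the inner Ethernet frame


# State on the ingress VTEP (VTEP-1). All entries are illustrative.
SUBNET_TO_VNI: Dict[str, int] = {"subnet-10": 1010, "subnet-11": 1011}
ARP_TABLE: Dict[str, Tuple[str, str]] = {"IP-4": ("subnet-11", "MAC-4")}
REMOTE_MACS: Dict[Tuple[int, str], str] = {(1011, "MAC-4"): "VTEP-4"}

# State on the egress VTEP (VTEP-4). VLAN 11 and the port name are assumptions.
VNI_TO_VLAN: Dict[int, int] = {1011: 11}
LOCAL_MACS: Dict[Tuple[int, str], str] = {(11, "MAC-4"): "Ethernet1"}


def ingress_route_and_bridge(dst_ip: str) -> VxlanFrame:
    """Route into the destination subnet, then VXLAN bridge in that subnet's VNI."""
    subnet, dmac = ARP_TABLE[dst_ip]   # routing step: resolve the destination host
    vni = SUBNET_TO_VNI[subnet]        # VNI of the destination subnet
    vtep = REMOTE_MACS[(vni, dmac)]    # bridging step: find the remote VTEP
    return VxlanFrame(vtep, vni, dmac)


def egress_bridge(frame: VxlanFrame) -> str:
    """Layer 2 lookup only: map the VNI to a VLAN and look up the DMAC."""
    vlan = VNI_TO_VLAN[frame.vni]
    return LOCAL_MACS[(vlan, frame.inner_dmac)]


frame = ingress_route_and_bridge("IP-4")
out_port = egress_bridge(frame)
print(frame, out_port)
```

Note that the ingress VTEP needs the VNI, ARP, and MAC state for subnet-11 even though no local host resides in that subnet.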
- VNI Scaling: The number of VNIs supported on a hardware VTEP will be finite, so not all VNIs can reside on all VTEPs. This is especially true in data-center deployments, where the TORs have traditionally been more resource constrained than chassis-based edge systems.
- Forwarding memory scaling: Each VTEP needs to store the host MACs and ARP entries for all subnets in the network. On a leaf switch these entries consume hardware resources, which are again finite and defined by the specific hardware platform deployed at the leaf.
Symmetric IRB
To address the scale issues of the asymmetric model, in the symmetric model the VTEP is only configured with the subnets that are present on the directly attached hosts. Connectivity to non-local subnets on a remote VTEP is achieved through an intermediate IP-VRF. The subsequent forwarding model for symmetric IRB is illustrated in the figure below, for traffic between Server-1 on subnet-10 (Green) and Server-4 on the remote subnet-11 (Blue). In this model, the ingress VTEP routes the traffic between the local subnet (subnet-10) and the IP-VRF, of which both VTEPs are members. The egress VTEP then routes the frame from the IP-VRF to the destination subnet. The forwarding model results in both VTEPs performing a routing function, hence the term symmetric IRB.
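The two routing steps can be sketched as follows. This is a simplified sketch under stated assumptions: the IP-VRF is mapped to VNI 2000, the inner destination MAC of the routed frame is the router MAC of the egress VTEP, and the identifiers IP-4, MAC-4, and RMAC-VTEP-4 are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

IP_VRF_VNI = 2000  # VNI assumed to be mapped to the shared IP-VRF


@dataclass(frozen=True)
class VxlanFrame:
    outer_dst_vtep: str
    vni: int
    inner_dmac: str


# Ingress VTEP-1: host routes in the IP-VRF, host IP -> (remote VTEP, router MAC).
IP_VRF_ROUTES: Dict[str, Tuple[str, str]] = {"IP-4": ("VTEP-4", "RMAC-VTEP-4")}

# Egress VTEP-4: local ARP entries, host IP -> (subnet, host MAC).
LOCAL_ARP: Dict[str, Tuple[str, str]] = {"IP-4": ("subnet-11", "MAC-4")}


def ingress_route(dst_ip: str) -> VxlanFrame:
    """First routing step: local subnet -> IP-VRF, encapsulated with the IP-VRF VNI."""
    vtep, router_mac = IP_VRF_ROUTES[dst_ip]
    return VxlanFrame(vtep, IP_VRF_VNI, router_mac)


def egress_route(frame: VxlanFrame, dst_ip: str) -> Tuple[str, str]:
    """Second routing step: IP-VRF -> destination subnet on the egress VTEP."""
    if frame.vni != IP_VRF_VNI:
        raise ValueError("frame did not arrive on the IP-VRF VNI")
    return LOCAL_ARP[dst_ip]


frame = ingress_route("IP-4")
delivered = egress_route(frame, "IP-4")
print(frame, delivered)
```

In this sketch the ingress VTEP holds no state for subnet-11 beyond a host route in the IP-VRF, which is the source of the scaling benefit.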
To provide the inter-subnet routing when the subnet is stretched across multiple VTEPs, an anycast IP address is utilized for each subnet, but it is only configured on the VTEPs where the subnet exists. The host MAC and MAC to IP bindings are learned by each VTEP based on a combination of local learning/ARP snooping and type-2 route advertisements.
For the symmetric IRB model, the type-2 (MAC and IP) route is advertised with two labels and two route-targets, corresponding to the MAC-VRF the MAC address is learned on and to the IP-VRF. Remote VTEPs receiving the route import the IP host route into the corresponding IP-VRF based on the IP-VRF route-target. If the corresponding MAC-VRF exists on the VTEP, the MAC address is also imported into the local MAC-VRF based on the MAC-VRF's route-target. The import behavior for the type-2 route is illustrated in the diagrams below for the host Server-1.
If the MAC-VRF exists locally on the receiving router, the IP host route will be installed in the IP-VRF and the MAC address will be installed in the MAC-VRF. With both a MAC route in the MAC-VRF and an IP host route in the IP-VRF, the VNI used in the data-path will depend on whether the traffic is being VXLAN bridged between hosts in the same VNI (1010) or VXLAN routed (VNI 2000).
Compare this to the figure below, where the MAC-VRF does not exist on the receiving VTEP (VTEP-2). In this case, the MAC route is ignored and not installed, as there is no corresponding route-target on the VTEP. In this scenario, only the IP-VRF host route is installed on VTEP-2. Traffic from VTEP-2 destined to hosts on subnet-10 is therefore always VXLAN routed via the IP-VRF, VNI 2000.
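The two import cases can be sketched with a small model of a receiving VTEP. The MAC-VRF route-target 1010:1010 and the VNIs 1010 and 2000 come from the example; the IP-VRF route-target 2000:2000, the class names, and the name of the first VTEP are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Type2Route:
    mac: str
    ip: str
    mac_vrf_vni: int
    ip_vrf_vni: int
    route_targets: FrozenSet[str]


@dataclass
class Vtep:
    name: str
    mac_vrf_import_rts: Set[str]
    ip_vrf_import_rts: Set[str]
    mac_table: Dict[str, int] = field(default_factory=dict)    # MAC -> VNI (bridging)
    host_routes: Dict[str, int] = field(default_factory=dict)  # host IP -> VNI (routing)

    def receive(self, route: Type2Route) -> None:
        # The IP host route is imported when the IP-VRF route-target matches.
        if route.route_targets & self.ip_vrf_import_rts:
            self.host_routes[route.ip] = route.ip_vrf_vni
        # The MAC is imported only if a MAC-VRF with a matching route-target exists.
        if route.route_targets & self.mac_vrf_import_rts:
            self.mac_table[route.mac] = route.mac_vrf_vni


# Route for Server-1; "2000:2000" is an assumed IP-VRF route-target.
server1 = Type2Route("MAC-1", "IP-1", 1010, 2000,
                     frozenset({"1010:1010", "2000:2000"}))

with_mac_vrf = Vtep("vtep-with-mac-vrf", {"1010:1010"}, {"2000:2000"})
vtep2 = Vtep("VTEP-2", set(), {"2000:2000"})  # no MAC-VRF for subnet-10

for vtep in (with_mac_vrf, vtep2):
    vtep.receive(server1)
    print(vtep.name, vtep.mac_table, vtep.host_routes)
```

The VTEP with the MAC-VRF ends up with both entries, while VTEP-2 holds only the host route and therefore always routes via VNI 2000.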
- Multi-protocol Reachable NLRI (MP_REACH_NLRI) attribute is used to carry the next-hop for the advertised route. In the context of a VXLAN forwarding plane, this will be the source address of the advertising VTEP.
- Route Distinguisher of the advertising node’s MAC-VRF. For Server-1 in the example above, this would be IP-A:1010.
- MAC address field contains the 48-bit MAC address of the host being advertised. For Server-1 in the example above, this would be MAC-1.
- IP address and length fields contain the IP address and 32-bit mask for the host being advertised. For Server-1 in the example above, this would be IP-1.
- MAC-VRF label, this contains the VNI number (label) corresponding to the local Layer 2 domain/MAC-VRF the host MAC was learned on. For Server-1 in the example above, this would be VNI 1010.
- IP-VRF label, this contains the VNI number (label) corresponding to the MAC-VRF’s associated IP-VRF. For MAC-VRF 10 in the example above, this would be VNI 2000.
- Extended community Route Target for the IP-VRF. This contains the route-target of the IP-VRF associated with the learned MAC address.
- Extended community Router MAC. This field advertises the system MAC of the advertising VTEP and is used as the DMAC for any packet sent to the VTEP via the IP-VRF.
- Extended community Route Target for the MAC-VRF. This contains the route-target of the MAC-VRF associated with the learned MAC address.
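The fields listed above can be collected into a single record, which makes the role of the two labels easier to see. This is a sketch of the information carried, not of the wire encoding: the field names, the router MAC value RMAC-A, the IP-VRF route-target 2000:2000, and the use of IP-A as the next-hop are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Type2MacIpRoute:
    next_hop: str         # MP_REACH_NLRI next-hop: source address of the advertising VTEP
    rd: str               # Route Distinguisher of the advertising MAC-VRF
    mac: str              # 48-bit host MAC address
    ip: str               # host IP address (advertised as a host route)
    mac_vrf_label: int    # VNI of the Layer 2 domain the MAC was learned on
    ip_vrf_label: int     # VNI of the IP-VRF associated with the MAC-VRF
    mac_vrf_rt: str       # route-target extended community of the MAC-VRF
    ip_vrf_rt: str        # route-target extended community of the IP-VRF
    router_mac: str       # Router MAC extended community of the advertising VTEP

    def data_path_vni(self, routed: bool) -> int:
        """VNI a remote VTEP would use: IP-VRF label when routing, else MAC-VRF label."""
        return self.ip_vrf_label if routed else self.mac_vrf_label


# Values for Server-1; entries marked "assumed" do not appear in the example.
server1 = Type2MacIpRoute(
    next_hop="IP-A",          # assumed: VTEP address matching the RD prefix
    rd="IP-A:1010",
    mac="MAC-1",
    ip="IP-1",
    mac_vrf_label=1010,
    ip_vrf_label=2000,
    mac_vrf_rt="1010:1010",
    ip_vrf_rt="2000:2000",    # assumed
    router_mac="RMAC-A",      # assumed
)

print(server1.data_path_vni(routed=False), server1.data_path_vni(routed=True))
```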
IP VPN
- Maintaining privacy.
- Allowing for IP address overlap amongst customers.
- Constraining route distribution - so that only the service provider routers that need the routes have them.
This is achieved through the use of VRFs, Route Distinguishers, and Route-Targets.
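How a Route Distinguisher permits address overlap can be sketched with a table keyed by the (RD, prefix) pair. The RD values, customer names, and the table itself are hypothetical and exist only to illustrate the idea.

```python
from ipaddress import IPv4Network, ip_network
from typing import Dict, Tuple

# VPN route table keyed by (Route Distinguisher, prefix).
vpn_table: Dict[Tuple[str, IPv4Network], str] = {}


def advertise(rd: str, prefix: str, customer: str) -> None:
    # The RD is prepended to the prefix, so identical customer prefixes
    # become distinct VPN routes.
    vpn_table[(rd, ip_network(prefix))] = customer


# Two customers using the same private prefix (hypothetical RDs).
advertise("65000:1", "10.0.0.0/24", "customer-a")
advertise("65000:2", "10.0.0.0/24", "customer-b")

for (rd, prefix), customer in vpn_table.items():
    print(f"{rd}:{prefix} -> {customer}")
```

Without the RD in the key, the second advertisement would overwrite the first.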
- Specifies a BGP IPv4 VPN control plane with an MPLS data plane.
- BGP control plane, new address family to advertise IP VPN prefixes.
- This RFC obsoleted the original RFC 2547.
- MPLS data-plane defined in multiple RFCs and drafts.
The RED circle in the figure below highlights the main Drafts and RFCs used today for an MPLS data-plane.
IPv4 VPN and IPv6 VPN are extensions of the BGP protocol that introduce new address families, combining IPv4 (address family number 1) or IPv6 (address family number 2) with subsequent address family number 128 (MPLS Layer 3 VPN unicast). They are used to exchange overlay IP prefix reachability information between MP-BGP peers.
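The address family numbers can be sketched as enumerations, together with the way they appear at the start of an MP_REACH_NLRI attribute (a 2-octet AFI followed by a 1-octet SAFI). The enum and function names are illustrative.

```python
import struct
from enum import IntEnum


class Afi(IntEnum):
    IPV4 = 1
    IPV6 = 2


class Safi(IntEnum):
    MPLS_VPN_UNICAST = 128


def encode_afi_safi(afi: Afi, safi: Safi) -> bytes:
    # Network byte order: 2-octet AFI, then 1-octet SAFI.
    return struct.pack("!HB", afi, safi)


vpnv4 = encode_afi_safi(Afi.IPV4, Safi.MPLS_VPN_UNICAST)
vpnv6 = encode_afi_safi(Afi.IPV6, Safi.MPLS_VPN_UNICAST)
print(vpnv4.hex(), vpnv6.hex())
```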
- Update
- Withdrawal
Each route type has its own NLRI prefix format, and each route type advertises its own set of prefixes to update/withdraw.
The format of the IPv4 VPN prefix update route is illustrated in the following figure. As detailed, the update route contains the VPN route (prefix and RD), the next-hop for the route, the advertising router ID, and the MPLS label, along with a number of path attributes (where the RT extended communities are defined) that are associated with these IPv4 NLRIs.
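As a rough illustration of how the label, RD, and prefix are packed together, the following sketch encodes a single IPv4 VPN NLRI entry: a 1-octet length in bits, a 3-octet label field, an 8-octet RD, and the prefix octets. It assumes a single label with the bottom-of-stack bit set and a type 1 (IPv4 address:number) RD; the label, RD, and prefix values are made up for the example.

```python
import struct
from ipaddress import IPv4Address, ip_network


def encode_rd_type1(ip: str, number: int) -> bytes:
    # Type 1 RD: 2-octet type (1), 4-octet IPv4 address, 2-octet assigned number.
    return struct.pack("!H4sH", 1, IPv4Address(ip).packed, number)


def encode_label(label: int, bottom_of_stack: bool = True) -> bytes:
    # 20-bit label, 3 reserved bits, 1 bottom-of-stack bit, in 3 octets.
    value = (label << 4) | (1 if bottom_of_stack else 0)
    return value.to_bytes(3, "big")


def encode_vpnv4_nlri(label: int, rd: bytes, prefix: str) -> bytes:
    net = ip_network(prefix)
    prefix_octets = (net.prefixlen + 7) // 8
    # Length counts the label (24 bits), the RD (64 bits), and the prefix bits.
    length_bits = 24 + 64 + net.prefixlen
    return (bytes([length_bits]) + encode_label(label) + rd
            + net.network_address.packed[:prefix_octets])


# Hypothetical values: label 100, RD 192.0.2.1:10, customer prefix 10.1.1.0/24.
nlri = encode_vpnv4_nlri(100, encode_rd_type1("192.0.2.1", 10), "10.1.1.0/24")
print(nlri.hex())
```

The next-hop and the route-target extended communities travel in separate path attributes and are not part of this NLRI entry.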
The outputs in "IPv4 VPN route as shown on PE" and "IPv6 VPN route as shown on PE" offer a more detailed view of the route as displayed on a PE router.
The following is an illustration of a basic MPLS Layer 3 VPN topology.
An IP VRF is used on a PE router for each customer (Layer 3 overlay). VRF IP routes are exported into the MP-BGP table and advertised to remote PEs as VPN routes. The exported VPN routes carry the Route-Target (RT) extended communities that are configured as export route-targets on the IP VRF from which they were exported.
The RTs carried by the VPN routes received by a PE are matched against the VRF import route-target configuration. When a received route carries an RT that is configured as an import route-target on an IP VRF, the route is imported into the IPv4 or IPv6 table for that VRF.
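The export and import behavior can be sketched as a set intersection between the RTs carried by a route and the import route-targets of each VRF. The class names, RD and RT values, and customer names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class VpnRoute:
    rd: str
    prefix: str
    route_targets: FrozenSet[str]


@dataclass
class Vrf:
    name: str
    rd: str
    export_rts: Set[str]
    import_rts: Set[str]
    routes: Dict[str, VpnRoute] = field(default_factory=dict)

    def export(self, prefix: str) -> VpnRoute:
        # Exported VPN routes carry the VRF's export route-targets.
        return VpnRoute(self.rd, prefix, frozenset(self.export_rts))


def import_route(vrfs: List[Vrf], route: VpnRoute) -> List[str]:
    """Install the route in every VRF whose import RTs match; return their names."""
    imported = []
    for vrf in vrfs:
        if route.route_targets & vrf.import_rts:
            vrf.routes[route.prefix] = route
            imported.append(vrf.name)
    return imported


# Advertising PE: one VRF for customer A (values are illustrative).
pe1_vrf_a = Vrf("customer-a", "65000:1", {"65000:100"}, {"65000:100"})

# Receiving PE: VRFs for two customers.
pe2_vrfs = [
    Vrf("customer-a", "65000:1", {"65000:100"}, {"65000:100"}),
    Vrf("customer-b", "65000:2", {"65000:200"}, {"65000:200"}),
]

route = pe1_vrf_a.export("10.0.0.0/24")
imported = import_route(pe2_vrfs, route)
print(imported)
```

Only the VRF with a matching import route-target installs the route, which is how route distribution is constrained per customer.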
PE routers allocate per-VRF and address family Labels that are advertised as part of the VPN route NLRI. Forwarding of overlay packets between PEs across the underlay requires underlay MPLS connectivity provided by a backbone.