What is it?
Virtual Extensible LAN (VXLAN) is an overlay technology that allows layer 2 traffic to extend over a shared layer 3 infrastructure by means of encapsulation.
What problems does it solve?
Like many overlay solutions, VXLAN set out to solve several issues: increasing the number of VLANs available in a data center, providing multi-tenant isolation, and helping to scale the layer 2 infrastructure.
For more information on Network Virtualization Overlays in general click here.
VXLAN uses encapsulation to provide a means of extending Layer 2 networks across the data center. The following image provides an overview of the packet used:
Starting from the center of the image, you have the original layer 2 frame. This is encapsulated with a VXLAN header in which, just as with VLANs, an identifier is set to isolate the domain to selected hosts. The result is then wrapped in a UDP header and an outer IP header addressed to the target VXLAN peer.
The two key components used within VXLAN are:
- VXLAN Tunnel Endpoint (VTEP): The VTEP identifies a VXLAN device that can encapsulate and de-encapsulate VXLAN traffic.
- VXLAN Network Identifier (VNI): A VNI is the 24-bit identifier that is used to uniquely identify the VXLAN segment.
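To make the encapsulation concrete, here is a minimal Python sketch of the 8-byte VXLAN header as laid out in RFC 7348: an 8-bit flags field with the "VNI valid" bit set, 24 reserved bits, the 24-bit VNI, and a final 8 reserved bits. The function names are hypothetical, and the outer UDP/IP headers (which the VTEP's network stack would add) are left out.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: VNI in the top 24 bits, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(packet: bytes) -> int:
    """Read the VNI back out of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header to an original layer 2 frame."""
    return build_vxlan_header(vni) + inner_frame


header = build_vxlan_header(100)
print(header.hex())       # 0800000000006400
print(parse_vni(header))  # 100
```

The resulting bytes would become the payload of a UDP datagram sent to port 4789 on the remote VTEP.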
How it works
First we need a device, be it software or hardware based, that will take our layer 2 data, perform VXLAN encapsulation and identify other VXLAN-capable endpoints: our VTEP.
So let’s start with a simple example in which ‘Server-A’ wants to communicate with ‘Server-B’. For this first example we assume a VXLAN tunnel has already been established.
- As far as ‘Server-A’ [192.168.1.100] is concerned, ‘Server-B’ [192.168.1.101] is on the same network. So, as with any IP-based communication, an ARP broadcast is sent to determine the MAC address of the destination, in this case ‘Server-B’.
- The ARP request is seen by our local VTEP (VTEP-A), which encapsulates the ARP request with a VXLAN header including the VNI and then forwards the packet over the ‘Underlay IP Network’ to ‘VTEP-B’ (we’ll cover how VTEPs establish communication in a moment).
- ‘VTEP-B’ strips off the outer packet headers, since the packet was addressed to itself. Seeing a VXLAN header, it looks up the VNI (VNI 100) and sends the ARP request to all devices mapped to that VNI. In addition, ‘VTEP-B’ records the IP address of ‘VTEP-A’ and the MAC address of ‘Server-A’ in its forwarding table for VNI 100.
- ‘Server-B’ receives the ARP request forwarded by ‘VTEP-B’ and responds with its MAC address, learning the IP and MAC of ‘Server-A’ in the process.
- ‘VTEP-B’, on receiving the ARP reply from ‘Server-B’, finds the MAC address of ‘Server-A’ in its forwarding table, encapsulates the reply and forwards it back to ‘VTEP-A’.
- ‘VTEP-A’ receives the packet, strips off the outer headers and forwards the ARP response directly to ‘Server-A’, also noting the MAC address of ‘Server-B’ and the IP of ‘VTEP-B’ in its forwarding table.
- All subsequent packets are then sent between the servers transparently through the VXLAN tunnel.
The important thing to note is that neither host has any awareness of VXLAN; they both believe they are on exactly the same LAN segment.
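The learning behaviour in the walk-through can be sketched as a small Python model. This is an illustration only, not how any particular vendor implements a VTEP: the `Vtep` class, its method names, the underlay IPs and the MAC addresses are all made up for the example, and real frames are reduced to their source and destination MACs.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"
FLOOD = "flood"  # marker: send to every VTEP that serves the VNI


@dataclass
class Vtep:
    ip: str
    # (vni, mac) -> IP of the remote VTEP that the MAC sits behind
    remote: dict = field(default_factory=dict)

    def from_host(self, vni, dst_mac):
        """Pick the outer destination for a frame sent by a local host."""
        if dst_mac == BROADCAST:
            return FLOOD
        return self.remote.get((vni, dst_mac), FLOOD)

    def from_underlay(self, vni, outer_src_ip, inner_src_mac):
        """Learn where a remote MAC lives from a decapsulated packet."""
        self.remote[(vni, inner_src_mac)] = outer_src_ip


# Hypothetical addresses for the example.
MAC_A, MAC_B = "00:00:00:00:00:0a", "00:00:00:00:00:0b"
vtep_a, vtep_b = Vtep("10.0.0.1"), Vtep("10.0.0.2")
VNI = 100

# Server-A's ARP request is a broadcast, so VTEP-A has to flood it.
print(vtep_a.from_host(VNI, BROADCAST))  # flood
# VTEP-B decapsulates it and learns Server-A's MAC behind VTEP-A.
vtep_b.from_underlay(VNI, vtep_a.ip, MAC_A)

# Server-B's ARP reply is unicast to Server-A, now a known destination.
print(vtep_b.from_host(VNI, MAC_A))  # 10.0.0.1
# VTEP-A decapsulates the reply and learns Server-B's MAC behind VTEP-B.
vtep_a.from_underlay(VNI, vtep_b.ip, MAC_B)

# Later traffic from Server-A to Server-B goes straight to VTEP-B.
print(vtep_a.from_host(VNI, MAC_B))  # 10.0.0.2
```

Note that the servers never appear in the model as anything other than MAC addresses, which mirrors the point that the hosts themselves take no part in VXLAN.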
So the big question we skipped over is: how did VTEP-A and VTEP-B learn about each other?
The original specification of VXLAN used multicast, which required the underlay network to run a multicast routing protocol. In this setup each VNI needs to be mapped to a multicast group, which in turn is registered with a multicast Rendezvous Point (RP).
Using the previous example, when VTEP-A receives the ARP request from Server-A, it encapsulates the ARP request in a VXLAN packet and sends it to the multicast RP, which in turn sends it to all other VTEPs registered for that multicast group.
The remote VTEP (VTEP-B) strips the outer headers and records the source IP address of VTEP-A as well as the MAC address of Server-A. It then delivers the ARP request locally and sends back the reply containing the MAC address of Server-B, encapsulated in a VXLAN packet. This time, however, the VTEP knows that the MAC of Server-A is located behind VTEP-A and knows the IP address of VTEP-A, so it can send the packet directly. This finally allows VTEP-A to record the MAC address of Server-B and the IP address of VTEP-B.
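As a rough illustration of the VNI-to-group mapping, the Python sketch below models only the outcome of multicast delivery: who receives a flooded packet. It does not model the RP or the multicast routing protocol itself. The group addresses, VTEP IPs and function names are assumptions made up for the example.

```python
# Hypothetical VNI-to-multicast-group mapping, configured on every VTEP.
VNI_TO_GROUP = {100: "239.1.1.100", 200: "239.1.1.200"}

# Group membership as the multicast-enabled underlay would track it:
# group address -> set of VTEP IPs that have joined.
group_members = {}


def join(vtep_ip, vni):
    """A VTEP joins the multicast group of a VNI it serves."""
    group_members.setdefault(VNI_TO_GROUP[vni], set()).add(vtep_ip)


def flood_targets(sender_ip, vni):
    """VTEPs that receive a packet sent to this VNI's multicast group."""
    members = group_members.get(VNI_TO_GROUP[vni], set())
    return sorted(members - {sender_ip})


join("10.0.0.1", 100)  # VTEP-A
join("10.0.0.2", 100)  # VTEP-B
join("10.0.0.3", 200)  # a third VTEP that only serves VNI 200

# VTEP-A floods Server-A's ARP request into VNI 100.
print(flood_targets("10.0.0.1", 100))  # ['10.0.0.2']
```

Only VTEPs that joined the group for VNI 100 see the ARP request; the VTEP serving VNI 200 is never involved.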
Speak to almost any network engineer about deploying multicast and they will try to resolve your issue using anything but.
To this end, most vendors now offer a unicast VXLAN alternative; however, at present no reference to unicast has been added to RFC 7348.
Solving the Problems
Increase the number of VLANs available in a Data Center
The VLAN ID used to segment Ethernet frames into multiple broadcast domains is 12 bits long, which provides only 4094 usable VLANs. This has only started to become an issue over the past few years through the adoption of virtualization. In contrast, a VXLAN VNI is a 24-bit ID, allowing up to 16 million unique segments.
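The numbers follow directly from the field widths. A quick check in Python (the two VLAN IDs subtracted are 0 and 4095, which IEEE 802.1Q reserves):

```python
VLAN_ID_BITS = 12
VNI_BITS = 24

usable_vlans = 2**VLAN_ID_BITS - 2  # IDs 0 and 4095 are reserved
vni_values = 2**VNI_BITS

print(usable_vlans)  # 4094
print(vni_values)    # 16777216
```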
Provide Multi-tenant Isolation
Only devices in the same VXLAN segment can communicate with one another. With the 24-bit VNI already discussed, you can have not only a lot of networks but also a lot of tenants, each of which is capable of using overlapping IP or MAC addresses, provided they are not in the same VXLAN.
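One way to picture why overlapping addresses are safe is a forwarding table keyed on the VNI as well as the MAC address. The Python sketch below is a simplified assumption about how such a table could be organised, with made-up addresses and function names, not a description of a specific product.

```python
# (vni, mac) -> IP of the VTEP behind which the MAC was learned
forwarding = {}


def learn(vni, mac, vtep_ip):
    forwarding[(vni, mac)] = vtep_ip


def lookup(vni, mac):
    """Return the VTEP for a MAC within one VNI, or None if unknown."""
    return forwarding.get((vni, mac))


SAME_MAC = "00:00:00:00:00:0a"    # hypothetical MAC reused by two tenants
learn(100, SAME_MAC, "10.0.0.1")  # tenant 1, VNI 100
learn(200, SAME_MAC, "10.0.0.3")  # tenant 2, VNI 200

print(lookup(100, SAME_MAC))  # 10.0.0.1
print(lookup(200, SAME_MAC))  # 10.0.0.3
print(lookup(300, SAME_MAC))  # None
```

Because every lookup is scoped to a VNI, the same MAC in two tenants never collides, and a MAC is simply unknown in any VNI where it has not been learned.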
Scale the layer 2 infrastructure across the Data Center
To expand on the issue a little more, the alternative to using an L3 core is to stretch the VLAN and L2 broadcast domain across the data center. One of the many issues with this design is the number of MAC addresses that must be stored on the Top of Rack (ToR) switches. In a physical environment this was rarely an issue, but now that the bulk of hosts connecting to the ToR are hypervisor hosts, your typical 48-port ToR could easily have thousands of MAC addresses. When you then add in multiple ToR switches, the strain begins to show: you exhaust the switch’s CAM table, leading to packet loss.
With an L3 core the ToR still needs to maintain the MAC addresses for all connected devices, but all MAC addresses for devices on other ToR switches are offloaded to the VTEP.
NOTE: VTEPs must NOT fragment VXLAN packets. To this end, during VXLAN implementation the MTU should be increased throughout the network as necessary.
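As a rough guide to how much the MTU needs to grow, the Python sketch below adds up the standard header sizes. It assumes an IPv4 underlay and untagged inner frames; an IPv6 underlay or VLAN tags would add more. The function name is hypothetical.

```python
# Standard header sizes in bytes.
INNER_ETHERNET_HEADER = 14  # untagged inner frame (assumption)
VXLAN_HEADER = 8
UDP_HEADER = 8
OUTER_IPV4_HEADER = 20      # IPv4 underlay without options (assumption)
OUTER_ETHERNET_HEADER = 14

# Bytes added on the wire to each original frame.
ENCAP_OVERHEAD = (VXLAN_HEADER + UDP_HEADER
                  + OUTER_IPV4_HEADER + OUTER_ETHERNET_HEADER)


def required_underlay_ip_mtu(host_mtu=1500):
    """Smallest underlay IP MTU that carries a full-size host packet."""
    return (host_mtu + INNER_ETHERNET_HEADER
            + VXLAN_HEADER + UDP_HEADER + OUTER_IPV4_HEADER)


print(ENCAP_OVERHEAD)                  # 50
print(required_underlay_ip_mtu())      # 1550
print(required_underlay_ip_mtu(9000))  # 9050
```

In practice many deployments simply enable jumbo frames on the underlay rather than sizing the MTU to the exact byte.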
Several vendors have adopted VXLAN as their overlay of choice, and these are available as both software and hardware variants. The limitation with software is that you are restricted to communicating with virtual machines. In summary, if you have a 100% virtual environment, you’re good to go. However, if you have any physical nodes, you may wish to look into a hardware appliance.
- VMware NSX
- Cisco Nexus 9000 Series Switches
- F5 Big IP
- Arista 7150