A software-defined network: is it an evolution or a revolution in networking? The SDN hype has been around for several years, but so far the technology doesn’t seem to have gained much traction outside of the MSPs and Fortune 500 companies, or outside of the telcos in the case of SD-WAN. When, if ever, will the SDN meltwater reach the fertile plains of the LME?
For this, we really need to look to history.
Previously Published on TVP Strategy (The Virtualization Practice)
One of the frustrations of SDN has always been the fact that if you ask six different people for a definition of SDN, you’ll get ten different answers, at least. This stems in part from the usual IT buzzword symptoms. When a system is used for competitive advantage, each company wants to define its own brand of “The Thing”—to try to “own” the thing and become the de facto standard for it. There is also a deeper issue with SDN, precisely because it is networking.
When we talk about “the network,” we often think of one thing: one set of interconnected computers. Sometimes we think of the internet: of many interconnected networks. In reality, there are many different networks that even the smallest of companies use every day now. Each of these has different needs, different solutions, and different flavours of SDN. Add into that public and hybrid cloud, and we have many, many networks in use. Some of these we have control over, but many of them we don’t. However, that doesn’t mean that SDN isn’t playing its part.
I’ve written before about the difficulty, as a user, of getting hold of VMware’s NSX, and about other problems with the release, but a small recap is in order. Founded in 2007, Nicira was bought by VMware in 2012 for its SDN platform. This consists of deep integration that combines the open VXLAN standard with vSphere’s vShield-like products and some other bits of magic to yield a fully functioning microsegmentation system. Although Nicira is available for OpenStack, too, VMware’s focus has always been on the vSphere implementation: using NSX, combined with some of the vShield products, to replace VMware’s own vCNS (vCloud Networking and Security). This $1.2 billion acquisition has now been with VMware for almost as long as Nicira existed as a company. By now, we would expect it to simply be another part of the VMware product line.
Many years ago, when VMware was a little-known start-up, one of the biggest factors in the growth of its hypervisor was the ability of systems administrators to get ahold of the product and play with it. The trial licenses enabled the full product set, which was unusual at the time, and were simply time-limited. The VMTN subscription included non-production licenses for testing. This, combined with the then-unusual willingness of VMware staff to interact on the company’s forum, led to an immense community of enthusiasts who wanted to use the product and practically begged their bosses to bring it in.
VMware just released details about the latest version of NSX—6.2.2. What is interesting about this release is that it is the first that is split into tiers. The release pages detail the full feature set, and although pricing doesn’t appear to be available yet on the website, hopefully this will be a fully public release that doesn’t require jumping through hoops to get. Since VMware acquired Nicira in 2012, the NSX product has been a bit of a dark horse, kept well stabled and not allowed out to run free. The product has been available only to selected customers and partners, presumably with high-volume sales that will support a large amount of VMware employee time in each deployment.
Unlike VMware’s other products (including, tellingly, vCNS, vCloud Networking and Security), NSX was a single SKU with an all-or-nothing, full-feature-set approach. With 6.2.2, this has changed. We are now looking at VMware’s standard three-tier approach. This could be a positive step. It gives customers options, and the ability to start small and grow into the full NSX product set as their needs change. It also splits out some of the complex Service Provider features from the view of most customers, making it less intimidating and, at the same time, less like customers are paying for features they do not need.
With the recent layoffs at VMware, one of the biggest surprises was the loss of almost the whole Workstation/Fusion team. For many, this is the end of an era. Not only was Workstation one of VMware’s first products, but it was the one that gave numerous people the opportunity to play with new tech and ultimately show off the systems to and get buy-in from management. It let Devs test different builds quickly and easily, and it let server teams test updates and changes quickly and, importantly, safely.
A community built up around Workstation and Fusion that was fueled by the VMTN (VMware Technical Network) subscription and forums. I still have my VMTN T-shirt. The subscription and forums offered easy ways to share ideas and provided a cheap “in” to VMware’s software, which created a huge pool of evangelists who still promote the tech today. The combination of long-lasting trial versions, easily available and readable documentation (VMware’s docs have always been some of the best in the business), and a well-moderated community lowered the barrier to entry for VMware products in a way that no other company has achieved.
In networking, as in life, we often use the same terms to mean many different things. One of the biggest culprits of this in networking is “edge.” An edge device is usually considered to be a device that connects into a network in only one place. Traffic can flow from an edge device, or it can flow to an edge device, but it can never, ever flow through an edge device. I say never—that’s not entirely true, but I’ll get back to that later. In a campus network, the edge devices are things like users’ computers, laptops, and printers; mobile phones; and tablets.
In data centers, the edge devices are servers or, more than likely in the SDDC, virtual machines, or possibly containers. The exception to the rule about traffic not flowing through an edge device is the “edge router,” which more often than not takes the form of a firewall: a perimeter firewall. If we consider north/south versus east/west traffic flows, north/south traffic flows move between the edge and the core, and east/west circumnavigates the network, to take the globe analogy a step further. This distinction becomes important as we look at the direction that networking has taken, and the direction I believe it will continue to take.
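The north/south versus east/west distinction can be made concrete with a small sketch. This is purely illustrative: the CIDR ranges and function names below are hypothetical, chosen to show the classification logic, and are not drawn from any real deployment.

```python
# Illustrative sketch: classify a flow as north/south or east/west based on
# whether each endpoint sits inside the data centre. The internal ranges
# below are hypothetical examples.
import ipaddress

INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8"),
                 ipaddress.ip_network("192.168.0.0/16")]

def is_internal(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_NETS)

def flow_direction(src: str, dst: str) -> str:
    src_in, dst_in = is_internal(src), is_internal(dst)
    if src_in and dst_in:
        return "east/west"      # stays inside the data centre
    if src_in or dst_in:
        return "north/south"    # crosses the edge/perimeter
    return "external"           # never touches our network

print(flow_direction("10.1.2.3", "10.4.5.6"))   # east/west
print(flow_direction("10.1.2.3", "8.8.8.8"))    # north/south
```

Anything that crosses the perimeter is north/south; anything between two internal endpoints is east/west, and it is the east/west category that dominates modern data centre traffic.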
For the last eighteen months, VMware has been pushing NSX as the third pillar of its software-defined data center (SDDC). NSX has three big selling points that VMware promotes: taking control of the network, automation and orchestration, and microsegmentation. The first two are standard SDDC fare: first, pull the function into software, abstract where necessary, and orchestrate to bring operational advantage; second, break down silos and allow a more agile approach. But the last, microsegmentation, is a good place to focus for a moment.
The term “microsegmentation” is taken from the marketing world, where it is used to mean “a more advanced form of market segmentation that groups a number of customers of the business into specific segments based on various factors including behavioral predictions,” according to Wikipedia. Microsegmentation has an analogous definition in the networking world, where it is used to mean an advanced form of segmentation of groups of servers based on various factors. Microsegmentation is in many ways a crossover, or a subset, of both network functions virtualization (NFV) and software-defined networking (SDN), focused on the data center. The aim is to reduce and control east-west traffic in a way that hasn’t been possible before. But what’s the point?
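To make the idea concrete, here is a minimal sketch of what a microsegmentation policy engine does. This is not NSX’s actual rule model or API; the group names and rule format are invented for illustration. The point is that rules match on VM attributes (tags or security groups) rather than on subnets, so policy follows the workload wherever it runs.

```python
# Hypothetical microsegmentation policy: rules key off security groups,
# not IP subnets, so a VM keeps its policy regardless of where it runs.
RULES = [
    # (source group, destination group, port, action) - illustrative only
    ("web", "app", 8080, "allow"),
    ("app", "db",  5432, "allow"),
    ("web", "db",  5432, "deny"),   # the web tier may never talk to the database
]

def check(src_groups, dst_groups, port):
    for src, dst, p, action in RULES:
        if src in src_groups and dst in dst_groups and p == port:
            return action
    return "deny"  # default-deny: the usual microsegmentation posture

print(check({"web"}, {"app"}, 8080))  # allow
print(check({"web"}, {"db"}, 5432))   # deny
```

Because enforcement happens at each VM’s virtual NIC, two VMs on the same host and the same logical switch can still be firewalled from one another, which is exactly the east/west control the paragraph above describes.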
The big story of the last few weeks has been Dell’s $67B acquisition of EMC, and with it, VMware. This is big news for the industry—news that will have ramifications all over the software-defined data centre. One of the most interesting implications is how Dell will reconcile its own SDN strategy with VMware’s NSX vision. Do the two work together? VMware paid $1.2B for Nicira. With currently around 400 customers, as reported by VMware, and roughly one in four of those running in production, NSX is a relatively small but highly lucrative gem in the crown jewels of VMware. Dell will want to see something come from that aspect of this acquisition.
Since Dell acquired Force10 in 2011, it has had a stable of network offerings, though perhaps not with quite the clout of the more focused network vendors. Dell holds 3 to 5% of the switching market, depending on whom you ask. Dell gives those enterprises that want it a one-stop shop, with switch and router options at every level, from unmanaged modular switches to line-rate chassis switches right through the 40G and 100G space: options that neatly complement its server and storage offerings.
This is my final post in the NSX Packet Walks series. So far, I have discussed only so-called “East/West” traffic: traffic moving from one VM, or physical machine, in our network to another. This traffic never leaves the datacenter and, in a small NSX system, in many cases never even leaves the rack.
Traditional Network Design
In the traditional network, traffic would be separated by purpose onto different VLANs, and would all be funneled towards the network core to be routed. North-bound traffic (i.e. traffic leaving the network) would then be routed to a physical firewall, before leaving the network via an edge router. South-bound (i.e. traffic entering the network) would traverse in the opposite direction.
This has the very obvious disadvantage that for traffic to reach the servers, the correct VLANs must be in place, and the correct firewall rules must have been implemented at the edge. Historically the network and security teams would have each handled that, and requests that involved a new subnet would take a long time while those teams processed the request.
Virtualised Network – Physical Next Hop
As we’ve seen, internally we have, for the most part, removed the need for VLANs that span outside of our compute clusters. All of our East/West traffic is handled by Distributed Logical Routers. The first, and most obvious, step for North/South traffic is therefore to use the DLR’s ability to perform dynamic routing and pass traffic to a physical router as the next hop.
Using OSPF or BGP would mean that the next-hop router learns of our internal networks as and when we create them. The downside is that we still need to extend the VLAN to which the physical router is connected to all of the compute nodes.
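The benefit of dynamic routing here can be shown with a tiny model: when a new logical subnet is created on the DLR, the routing protocol advertises it upstream automatically, so no one has to add static routes by hand. The class names below are illustrative, not part of any NSX API, and the “advertisement” is a stand-in for what OSPF or BGP actually exchanges.

```python
# Conceptual sketch: with OSPF/BGP peering, a new logical subnet becomes
# reachable from the physical network as soon as it is created on the DLR.
# Class and method names are hypothetical.
class PhysicalRouter:
    def __init__(self):
        self.routes = set()

    def receive_advertisement(self, prefix):
        self.routes.add(prefix)  # installs the learned route

class DLR:
    def __init__(self, peer):
        self.peer = peer
        self.subnets = set()

    def create_logical_subnet(self, prefix):
        self.subnets.add(prefix)
        self.peer.receive_advertisement(prefix)  # OSPF/BGP does this for us

upstream = PhysicalRouter()
dlr = DLR(upstream)
dlr.create_logical_subnet("172.16.10.0/24")
print("172.16.10.0/24" in upstream.routes)  # True - reachable as soon as created
```

Contrast this with the traditional design above, where a new subnet waited on the network and security teams to provision VLANs and firewall rules.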
Virtualised Network – Edge Router
The next option would be to put a VM performing routing in the Edge Rack. We could then have dynamic routing updates from this VM to the DLR, and from this VM to the next-hop router.
As this VM is in the edge rack, the external VLAN only needs to be passed into the hosts in the Edge Rack.
The biggest constraints here are that all of the North/South traffic is pushed through the edge rack, and that the NSX Edge VM becomes a single point of failure: if the Edge VM fails, we lose all North/South traffic. VMware alleviates this by allowing multiple Edge VMs.
This VM is called the NSX Edge Services Gateway; it is an evolution of the vShield Edge that was first part of vCloud Director, and later of vCNS.
The Edge Services Gateway can have up to 10 internal, up-link, or trunk interfaces. This combines with the “edge router”, which we have so far referred to as the Distributed Logical Router (DLR), which can have up to 8 up-links and 1,000 internal interfaces. In essence, a given Edge Services Gateway can connect to multiple external networks, or multiple DLRs (or both), and a given DLR can utilise multiple Edge Services Gateways for load balancing and resilience.
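Those interface limits are easy to sanity-check with a little arithmetic. The sketch below encodes just the numbers quoted above (the constants come from this text, not from an API) and checks whether a proposed topology fits within them.

```python
# Back-of-the-envelope check of the interface limits described above:
# ESG - 10 interfaces total; DLR - 8 up-links and 1,000 internal interfaces.
# Purely illustrative; the constants are taken from the text.
ESG_MAX_INTERFACES = 10
DLR_MAX_UPLINKS = 8
DLR_MAX_INTERNAL = 1000

def esg_fits(external_nets: int, dlr_connections: int) -> bool:
    # every external network and every connected DLR consumes one ESG interface
    return external_nets + dlr_connections <= ESG_MAX_INTERFACES

def dlr_fits(esg_uplinks: int, logical_switches: int) -> bool:
    return esg_uplinks <= DLR_MAX_UPLINKS and logical_switches <= DLR_MAX_INTERNAL

print(esg_fits(external_nets=2, dlr_connections=8))   # True - exactly at the limit
print(dlr_fits(esg_uplinks=8, logical_switches=500))  # True
```

So a single ESG connecting two external networks to eight DLRs is exactly at its interface budget, while a DLR has ample headroom on the internal side.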
The figure below (taken from the VMware NSX Design Guide, version 2.1, figure 41) shows the logical and physical networks we will be thinking about.
In the top part of the figure, we can see that the green circle with arrows, which represents the combination of the DLR and Edge Services Gateway, is connected to both of the logical switches and also to the up-link to the L3 network. We can envisage how there could be other up-links to a WAN, a DMZ (or even multiple DMZs), or to other L3 networks if we had multiple ISPs, etc. These up-links come from the pool of 10 links on the Edge Services Gateway. The logical switches connect to the DLR, which can connect to up to 1,000 logical switches.
Connectivity between the DLR and the Edge is through a transit network.
It is possible to configure BGP or OSPF between the Edge Services Gateway and the DLR. This means that we can have multiple Edge Services Gateways (up to 8) connected to a given DLR, which can use ECMP (Equal-Cost Multi-Pathing) to spread the North/South traffic load over the multiple gateways and also provide resilience. This is very much an Active/Active setup.
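How ECMP actually spreads that load is worth a quick sketch: each flow’s 5-tuple is hashed, and the hash picks one of the equal-cost gateways, so all packets of a given flow take the same path and are never reordered. A real implementation hashes in hardware or in the kernel; the function below is just a model of the idea, with invented names.

```python
# Illustrative ECMP path selection: hash the flow's 5-tuple and pick one of
# the equal-cost edge gateways. Same flow -> same hash -> same gateway,
# which is why ECMP does not reorder packets within a flow.
import hashlib

EDGES = ["edge-1", "edge-2", "edge-3", "edge-4"]  # up to 8 in the scenario above

def pick_edge(src_ip, dst_ip, proto, src_port, dst_port):
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return EDGES[digest % len(EDGES)]

# The same flow always hashes to the same edge...
a = pick_edge("10.0.0.5", "93.184.216.34", "tcp", 49152, 443)
b = pick_edge("10.0.0.5", "93.184.216.34", "tcp", 49152, 443)
print(a == b)  # True
```

Distinct flows land on different gateways (statistically, not perfectly evenly), and if a gateway drops out of the routing table, only the flows hashed to it need to be rebalanced.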
The alternative is to deploy the Edge Services Gateway as an HA pair. This gives us an Active/Passive setup whereby, if one Edge fails, the other takes over within a few seconds. This is used when the Active/Active option above is not possible because other Edge services are in use, such as load balancing, NAT, and the Edge firewall.
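The Active/Passive behaviour boils down to heartbeat monitoring with a dead-time: the standby watches for heartbeats from the active node and promotes itself once enough are missed. The sketch below is a minimal model of that logic; the timer value and names are hypothetical, and NSX uses its own heartbeat mechanism and timers.

```python
# Minimal Active/Passive failover model: the standby Edge counts missed
# heartbeats and takes over once a dead-time threshold is reached.
# DEAD_TIME and all names here are illustrative, not NSX's real values.
DEAD_TIME = 3  # consecutive heartbeats missed before failover

class EdgePair:
    def __init__(self):
        self.active, self.standby = "edge-a", "edge-b"
        self.missed = 0

    def heartbeat_received(self):
        self.missed = 0  # active node is healthy; reset the counter

    def heartbeat_missed(self):
        self.missed += 1
        if self.missed >= DEAD_TIME:
            # standby promotes itself and picks up the stateful services
            # (NAT, load balancing, firewall) within a few seconds
            self.active, self.standby = self.standby, self.active
            self.missed = 0

pair = EdgePair()
for _ in range(3):
    pair.heartbeat_missed()
print(pair.active)  # edge-b - the standby has taken over
```

The few seconds of dead-time are the trade-off against Active/Active ECMP: failover is not instantaneous, but the stateful services that rule out ECMP survive the switchover.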
Of course, we can have multiple layers of Edge Services Gateways if necessary, with HA pairs running NAT close to the logical switches and ECMP aggregating the traffic outbound.
This ends our short series on NSX and Packet flows. Although the later posts have become much more generic and less about how the packets actually move, that to some extent is precisely the point of NSX. We gain the ability to think much more logically about our whole datacenter network, with almost no reliance on physical hardware. We can micro-segment traffic so that only the allowed VMs see it, regardless of where they are running. We can connect to existing networks and migrate slowly and seamlessly into NSX. We can even plug our internet transit directly into hosts and bypass physical firewall and routing devices.
This is the fourth post in the NSX Packet Walks series. You probably want to start at the first post.
Up to now we have focused on traffic from one VM to another VM somewhere within the NSX system, as well as how traffic moves between physical hosts. But what if your data centre isn’t 100% virtualised? Can you still use NSX? What are the constraints? This post will look at these questions.