This lesson covers the following exam topics from Cisco’s official 350-401 V1.0 Enterprise Network Core Technologies (ENCOR) exam blueprint.
Before we get into the specifics of network design discussion for an enterprise network, it behooves us to look at the big picture and the very fundamentals of network design.
The network is simply a resource and a means to an end. Every enterprise network is laid out to facilitate the applications running on top of it, and it meets its goals if those applications can run in a reliable and performant manner. With the increasing adoption of cloud applications (SaaS apps such as CRM or HRM), i.e., applications that are hosted by providers (such as Salesforce) in their own data centers rather than on-premise, the role of the network changes again. In the new world of cloud apps, the network still has to provide reliable and performant access to those off-premise apps, but it must also maintain the necessary user experience, security, compliance, visibility, and control with the help of solutions such as a Cloud Access Security Broker (CASB).
While you have to build your network for current requirements, it must also be able to evolve, so think in a modular fashion. Your core design choices (for example, 2-tier versus 3-tier architecture) can stay the same while other parts of the network evolve, much like Lego building blocks. Whether you are designing for only on-premise or for everything off-prem (SaaS/PaaS), your design will still need to be performant, resilient, and scalable.
Until 2009, i.e. before the advent of Software-Defined Networking (SDN), there was essentially only one way to pick your building blocks when designing a network: go with Cisco or Juniper for routing and switching. Today we have more choices available, so your network design should at least take into account the possibility of vendor lock-in and mitigate against it. That does not mean that open source and open standard vendors (such as Cumulus Networks) will give you the best-in-class solution, but keeping your design flexible and your options open pays off over time.
An enterprise campus network can span a single building or a group of buildings spread over a larger geographic area that are still in close proximity, much like a college campus. The primary goal of the campus design is to deliver the fastest speeds (say 1 or 10 Gbps) and a variety of access options (LAN, WLAN) to the endpoints.
Campus network design can be organized around four core principles, i.e.
In 1999, Cisco pioneered campus network design with the hierarchical design model, which uses a layered approach. Hierarchical network design helps break down otherwise complex and flat networks into multiple smaller, more manageable tiers or layers. Each layer is focused on a specific set of requirements and roles, so network designers can pick the most suitable platform and software features for each layer. As discussed earlier, regardless of how a network was designed, the ability to modify an existing design without rip and replace is of utmost importance. There can be many underlying reasons for such modifications, e.g. the addition of newer services, more bandwidth, and so on.
When you think of network design, you are likely thinking of the much-discussed three-tier (or three-layer) design. Three-layer design is best suited to large enterprise campus networks. The three well-known layers are access, distribution, and core.
Now, let me describe the primary functions of each layer.
Two-layer design is a modified three-layer design in which the core has been collapsed into the distribution layer. The main motivation for collapsing the core is lower cost and the operational simplicity that it brings. It is best suited to small and medium-size networks.
It is worth noting that the above discussion is about enterprise campus design and not the enterprise data center. The campus is where end users connect to the network, whereas the data center provides connectivity to servers and devices such as load balancers and storage arrays. Let me summarize the key differences between the two network designs before we move on.
| | Campus Network | Data Center Network |
|---|---|---|
| Architecture | Three or two-tier | Three-tier or Leaf-Spine Clos |
| Traffic Flow | Mostly north-south | North-south and east-west (depending on the applications) |
| Speeds and Feeds | Mostly 1G for access and 10/40G for uplinks | Mostly 10/40G for access and 10/40/100G for uplinks |
| Oversubscription | Typically 20:1 | Typically 1:1 or 4:1 |
| Failure domains | Mostly limited impact | Mostly larger impact |
| Access Medium | Wired and wireless | Wired only |
When designing an enterprise network, network engineers should try to include redundancy at each layer. Let’s first discuss the broad HA and redundancy considerations.
Let’s now discuss some specific redundancy considerations by each campus network layer.
| Consideration | Core | Distribution | Access |
|---|---|---|---|
| L2 versus L3 | L3 designs are better than L2 | L3 to Core switches; L2 to Access switches | |
| Link redundancy | Redundant p2p L3 interconnections lead to faster convergence | Dual L3 equal-cost paths to Core layer | SW redundancy via LACP or EC |
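As a rough illustration of the access-layer link redundancy mentioned above, the sketch below bundles two uplinks into one LACP EtherChannel on a Cisco IOS access switch. The interface names and the channel-group number are assumptions made for this example, and some platforms also require `switchport trunk encapsulation dot1q` before trunk mode can be set.

```
! Illustrative only: interface names and channel-group number are assumed
interface range GigabitEthernet1/0/49 - 50
 description Uplinks bundled toward the distribution layer
 ! "mode active" negotiates the bundle with LACP
 channel-group 1 mode active
!
interface Port-channel1
 description Logical uplink made up of both physical links
 switchport mode trunk
```

With both links in one bundle, losing a single member link reduces bandwidth but does not trigger a spanning-tree topology change on the uplink.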
The purpose of default gateway or first-hop redundancy is to protect against a single node failure, so that traffic from end hosts can continue flowing through the active default gateway device after a brief, sub-second convergence.
In the hierarchical design that we have discussed so far, distribution switches define the L2/L3 network boundary and act as the default gateway to the entire L2 domain facing the access layer. Without some form of redundancy in place, default gateway failure could result in a massive outage.
HSRP, VRRP, and GLBP are three popular first-hop redundancy protocols (FHRPs) for implementing default gateway redundancy. HSRP and GLBP are Cisco proprietary, whereas VRRP is an IETF standards-based protocol defined in RFC 3768 and RFC 5798.
HSRP and VRRP are the recommended protocols and, with some tuning, can provide sub-second failover for redundant distribution switches. If you are using Cisco switches, best practices indicate that you are better off using the feature-rich HSRP; however, VRRP is a must when your design requires vendor inter-op.
The configuration snippet below shows how you can use HSRP in an enterprise campus deployment and achieve sub-second failover times.
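A minimal sketch of such a configuration on the primary distribution switch might look like the following. The VLAN, IP addresses, group number, and priority are illustrative assumptions rather than values from a specific deployment; the millisecond timers are the part that brings failover below one second.

```
! Illustrative values only: VLAN 10, 10.1.10.0/24, HSRP group 1
interface Vlan10
 description Default gateway for access VLAN 10
 ip address 10.1.10.2 255.255.255.0
 ! Virtual gateway address shared by both distribution switches
 standby 1 ip 10.1.10.1
 ! Higher than the default priority of 100, so this switch becomes active
 standby 1 priority 110
 ! 250 ms hellos with a 750 ms hold time
 standby 1 timers msec 250 msec 750
```

The peer distribution switch would carry the same `standby 1 ip` and timer lines with its own interface address and a lower (or default) priority.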
It is strongly recommended to configure HSRP with preemption, which allows a previously failed device to reclaim its role upon recovery. This is the desired behavior because the STP/RSTP root should be the same device as the HSRP primary device for a given subnet or VLAN. If the HSRP primary and the STP root are not consolidated on a single device, the link between the distribution switches can become a transit link, and traffic to and from the default gateway takes multiple L2 hops. It is also recommended that the preemption delay be set to 150% of the time that it takes for the switch to boot up from scratch.
HSRP preemption needs to be configured with the switch boot time and overall connectivity to the rest of the network in mind. If preemption and neighbor adjacency occur before the switch has L3 connectivity to the core, traffic will not actually be forwarded and will remain blackholed until complete L3 connectivity is restored.
GLBP protects traffic against device or circuit failure much like HSRP or VRRP, but in addition it allows packet load sharing between a group of redundant routers. Prior to GLBP, you could only implement HSRP or VRRP hacks to get load balancing to work. For example, you could configure the distribution devices as alternate root switches and divide and direct traffic from the VLANs across both. Yet another hack was to use multiple HSRP groups on a single interface and use DHCP to alternate between the default gateways. As you can see, none of these hacks is clean, and any of them could very easily become an administrative nightmare.
HSRP uses a single virtual IP and MAC pair, which is always assumed by the active router, whereas GLBP uses one virtual IP address with multiple virtual MAC addresses.
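The preemption delay and the STP root alignment described above can be sketched as follows. The example assumes, purely for illustration, that the switch needs about 300 seconds to boot and regain L3 connectivity to the core, which puts the 150% rule at 450 seconds; the VLAN and group numbers are also assumed.

```
! Assumed boot time: ~300 s, so preempt delay = 150% of 300 s = 450 s
! Make this switch the STP root for the same VLAN it serves as HSRP primary
spanning-tree vlan 10 root primary
!
interface Vlan10
 standby 1 priority 110
 ! Wait at least 450 s after coming up before reclaiming the active role
 standby 1 preempt delay minimum 450
```

Measuring the real boot and convergence time on your own platform is what determines the delay; the 300-second figure here is only a placeholder.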
The configuration snippet below shows GLBP configuration.
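As a minimal sketch, a GLBP configuration on one distribution switch might look like the following. The VLAN, addresses, group number, and priority are assumptions for illustration, and round-robin is shown explicitly even though it is the default load-balancing method.

```
! Illustrative values only: VLAN 10, 10.1.10.0/24, GLBP group 1
interface Vlan10
 ip address 10.1.10.2 255.255.255.0
 ! Single virtual gateway IP; GLBP hands out different virtual MACs for it
 glbp 1 ip 10.1.10.1
 glbp 1 priority 110
 glbp 1 preempt
 glbp 1 timers msec 250 msec 750
 ! Alternate virtual MAC addresses across ARP replies
 glbp 1 load-balancing round-robin
```

Hosts all use the same default gateway IP, but successive ARP replies return different virtual MAC addresses, so traffic is spread across the redundant routers.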
Let’s now wrap up the FHRP discussion with a side-by-side comparison table.
| | HSRP | VRRP | GLBP |
|---|---|---|---|
| Vendor Interop | Cisco proprietary | IETF standard | Cisco proprietary |
| Redundancy mechanism | Active / Standby | Active / Standby | Active / Active |
| Preemption | Supported, disabled by default | Supported, enabled by default | Supported; AVG preemption disabled by default, forwarder (AVF) preemption enabled by default |
| Multicast address for hellos | 224.0.0.2 (v1) / 224.0.0.102 (v2) | 224.0.0.18 | 224.0.0.102 |
| Transport | UDP 1985 | IP (Protocol #112) | UDP 3222 |
Today, most network devices can provide a level of high availability within the box itself, i.e. in the form of redundant supervisors on platforms such as the Cisco Catalyst 6500, 4500, and Nexus 7K. When you have redundant supervisors, the box can also support Stateful Switchover (SSO), which ensures that the standby supervisor blade holds state information from the active blade and can thus switch over and become primary to assume the L2 forwarding function.
The Cisco Catalyst 6500 and N7K switches support L3 Non-Stop Forwarding or NSF which allows redundant supervisors to assume L3 forwarding functions without tearing down and rebuilding L3 neighbor adjacencies in the event of primary supervisor failure.
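A minimal sketch of enabling SSO and NSF on a Catalyst switch running Cisco IOS with redundant supervisors is shown below. The OSPF process ID is an assumption for illustration, and the syntax differs on NX-OS platforms such as the Nexus 7K, so treat this as an IOS-only example.

```
! Keep the standby supervisor synchronized with the active one
redundancy
 mode sso
!
! Let OSPF keep forwarding across a supervisor switchover
! (process ID 1 is an assumed value)
router ospf 1
 nsf
```

For NSF to help, the neighboring routers also need to be NSF-aware so that they keep the adjacency up while the restarting device resynchronizes.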
Now, in a hierarchical network design, the core and distribution nodes are connected via L3 p2p links, which means that distribution or core failures show up as loss of link. If a supervisor fails on a non-redundant device, its links go down and the network simply re-converges through the second core or distribution device, within roughly 200 milliseconds when using, say, EIGRP or OSPF.
With redundant supervisors, links are not dropped even if a supervisor fails, because SSO/NSF handles the convergence event. There will be a momentary interruption to traffic during SSO, and once SSO has completed, NSF follows suit, so some downtime during re-convergence is still to be expected.
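To make the L3 p2p interconnection concrete, the sketch below shows one routed uplink from a distribution switch to a core switch using OSPF. The interface name, the /30 addressing, the OSPF process ID, and the area are all assumptions chosen for illustration; reaching very low convergence times in practice typically needs additional timer tuning that is not shown here.

```
! Illustrative values only: interface, addressing, and OSPF IDs are assumed
interface TenGigabitEthernet1/1
 description Routed point-to-point uplink to core
 ! Turn the port into a routed (L3) interface
 no switchport
 ip address 10.0.0.1 255.255.255.252
 ! Skip DR/BDR election on a two-node link
 ip ospf network point-to-point
!
router ospf 1
 network 10.0.0.0 0.0.0.3 area 0
```

Because the link is routed end to end, a failure is detected as loss of link and the routing protocol moves traffic to the remaining equal-cost path.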
If the L2/L3 boundary is in the access layer, i.e. a routed access design, then SSO/NSF can provide an increased level of HA. If your access layer design is switched (or L2), then you should consider using dual redundant supervisors with SSO.