
Core Routing, Lecture 2

Lesson 2: Border Gateway Protocol (BGP)

Describe, implement, and troubleshoot IBGP, EBGP, and MP-BGP

Peer-group, template

In older versions of Cisco IOS software, BGP update messages were grouped based on peer group configurations. This method of grouping neighbors for BGP update message generation reduced the amount of system processing resources needed to scan the routing table. This method, however, had the following limitations:

  • All neighbors that shared the same peer group configuration also had to share the same outbound routing policies.
  • All neighbors had to belong to the same peer group and address family. Neighbors configured in different address-families could not belong to different peer groups.

These limitations existed to balance optimal update generation and replication against peer group configuration. These limitations also caused the network operator to configure smaller peer groups, which reduced the efficiency of update message generation and limited the scalability of neighbor configuration.

A peer template is a configuration pattern that can be applied to neighbors that share common policies. Peer templates are reusable and support inheritance, which allows the network operator to group and apply distinct neighbor configurations for BGP neighbors that share common policies. Peer templates also allow the network operator to define very complex configuration patterns through the capability of a peer template to inherit a configuration from another peer template.

There are two types of peer templates:

  • Peer session templates are used to group and apply the configuration of general session commands that are common to all address family and Network Layer Reachability Information (NLRI) configuration modes.
  • Peer policy templates are used to group and apply the configuration of commands that are applied within specific address-families and NLRI configuration modes.

Peer templates improve the flexibility and enhance the capability of neighbor configuration. Peer templates also provide an alternative to peer group configuration and overcome some limitations of peer groups. With the configuration of the BGP Configuration Using Peer Templates feature and the support of the BGP Dynamic Update Peer-Groups feature, the network operator no longer needs to configure peer groups in BGP and can benefit from improved configuration flexibility and faster convergence.

This command can be used to check whether the routes are being advertised:

Router#show ip bgp 172.16.10.0/24
BGP routing table entry for 172.16.10.0/24, version 24480684
Bestpath Modifiers: deterministic-med
Paths: (4 available, best #3)
  Not advertised to any peer   <--

Typical reasons for this include:

  • absence of a network statement with the exact prefix and mask
  • the exact route is not in the IP routing table
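The exact-match requirement can be illustrated with a short Python sketch. It is a simplified model, assuming the IP routing table is just a set of prefix strings; `can_originate` is a hypothetical helper, not a Cisco IOS function.

```python
import ipaddress


def can_originate(network_stmt: str, rib: set[str]) -> bool:
    """Model of a BGP network statement: the prefix is injected into BGP
    only if a route with exactly the same prefix and mask is in the RIB."""
    wanted = ipaddress.ip_network(network_stmt)
    # A covering (shorter) or more specific (longer) route does not count.
    return any(ipaddress.ip_network(prefix) == wanted for prefix in rib)


rib = {"172.16.0.0/16", "172.16.10.0/25"}
print(can_originate("172.16.10.0/24", rib))  # False: only a /16 and a /25 exist
print(can_originate("172.16.0.0/16", rib))   # True
```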

Further Reading

https://bit.ly/2YmG8xi

Active, passive

When a BGP speaker first initializes, it uses a local ephemeral TCP port, or random port number greater than 1024, and attempts to contact each configured BGP speaker on TCP port 179 (the well-known BGP port). The speaker initiating the session performs an active open, while the peer performs a passive open. It’s possible for two speakers to attempt to connect to one another at the same time; this is known as a connection collision. When two speakers collide, each speaker compares the local router ID to the router ID of the colliding neighbor. The BGP speaker with the higher router ID value drops the session on which it is passive, and the BGP speaker with the lower router ID value drops the session on which it is active (i.e., only the session initiated by the BGP speaker with the larger router ID value is preserved).
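The collision rule can be sketched in Python as follows. This is an illustrative model only: `surviving_session` is a hypothetical helper, and router IDs are compared as unsigned 32-bit integers, which is how RFC 4271 specifies the BGP Identifier comparison.

```python
import ipaddress


def surviving_session(local_rid: str, remote_rid: str) -> str:
    """Resolve a BGP connection collision from the local speaker's view.

    Only the session opened by the speaker with the higher router ID
    survives. Returns "local-initiated" or "remote-initiated"."""
    local = int(ipaddress.IPv4Address(local_rid))
    remote = int(ipaddress.IPv4Address(remote_rid))
    if local == remote:
        raise ValueError("router IDs must differ")
    return "local-initiated" if local > remote else "remote-initiated"


print(surviving_session("10.0.0.2", "10.0.0.1"))  # local-initiated
print(surviving_session("1.1.1.1", "2.2.2.2"))    # remote-initiated
```

Comparing the addresses as text would be wrong: "9.0.0.0" sorts after "10.0.0.0" as a string, although 10.0.0.0 is the higher router ID.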

States, timers

Idle State:

  • Refuses all incoming BGP connections.
  • Starts the initialization of event triggers.
  • Initiates a TCP connection with its configured BGP peer.
  • Listens for a TCP connection from its peer.
  • Changes its state to Connect.
  • If an error occurs at any state of the FSM process, the BGP session is terminated immediately and returned to the Idle state. Some of the reasons why a router does not progress from the Idle state are:
      • TCP port 179 is not open.
      • A random TCP port over 1023 is not open.
      • Peer address configured incorrectly on either router.
      • AS number configured incorrectly on either router.

Connect State:

  • Waits for successful TCP negotiation with peer.
  • BGP does not spend much time in this state if the TCP session has been successfully established.
  • Sends Open message to peer and changes state to OpenSent.
  • If an error occurs, BGP moves to the Active state. Some reasons for the error are:
      • TCP port 179 is not open.
      • A random TCP port over 1023 is not open.
      • Peer address configured incorrectly on either router.
      • AS number configured incorrectly on either router.

Active State:

  • If the router was unable to establish a successful TCP session, then it ends up in the Active state.
  • BGP FSM tries to restart another TCP session with the peer and, if successful, then it sends an Open message to the peer.
  • If it is unsuccessful again, the FSM is reset to the Idle state.
  • Repeated failures may result in a router cycling between the Idle and Active states. Some of the reasons for this include:
      • TCP port 179 is not open.
      • A random TCP port over 1023 is not open.
      • BGP configuration error.
      • Network congestion.
      • Flapping network interface.

OpenSent State:

  • BGP FSM listens for an Open message from its peer.
  • Once the message has been received, the router checks the validity of the Open message.
  • If there is an error, it is because one of the fields in the Open message does not match what the receiving router expects, e.g., a BGP version mismatch or an AS number in the Open message that differs from the remote AS configured on the receiving router. The router then sends a Notification message to the peer indicating why the error occurred.
  • If there is no error, a Keepalive message is sent, various timers are set, and the state is changed to OpenConfirm.

OpenConfirm State:

  • The router waits for a Keepalive message from its peer.
  • If a Keepalive message is received and no timer has expired before reception of the Keepalive, BGP transitions to the Established state.
  • If a timer expires before a Keepalive message is received, or if an error condition occurs, the router transitions back to the Idle state.

Established State:

  • In this state, the peers send Update messages to exchange information about each route being advertised to the BGP peer.
  • If there is any error in the Update message then a Notification message is sent to the peer, and BGP transitions back to the Idle state.
  • If the hold timer expires before a Keepalive or Update message is received, or if an error condition occurs, the router transitions back to the Idle state.
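The state transitions described above can be summarized as a lookup table. This is a deliberately simplified Python sketch, assuming made-up event names (`tcp_established`, `open_ok`, and so on) and ignoring the ConnectRetry timer and other details of the full RFC 4271 FSM.

```python
# Simplified BGP FSM: (state, event) -> next state.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",  # Open message sent
    ("Connect", "tcp_failed"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "tcp_failed"): "Idle",
    ("OpenSent", "open_ok"): "OpenConfirm",      # Keepalive sent
    ("OpenSent", "open_error"): "Idle",          # Notification sent
    ("OpenConfirm", "keepalive"): "Established",
    ("OpenConfirm", "hold_timer_expired"): "Idle",
    ("Established", "keepalive"): "Established",
    ("Established", "update"): "Established",
    ("Established", "update_error"): "Idle",     # Notification sent
    ("Established", "hold_timer_expired"): "Idle",
}


def run(events, state="Idle"):
    """Feed events to the FSM; any unexpected event is treated as an
    error and returns the session to Idle."""
    for event in events:
        state = TRANSITIONS.get((state, event), "Idle")
    return state


print(run(["start", "tcp_established", "open_ok", "keepalive"]))  # Established
print(run(["start", "tcp_failed", "tcp_failed"]))                  # Idle
```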

In Cisco IOS, the default BGP keepalive timer is 60 seconds and the default hold timer is 180 seconds. When a BGP session is established, the two peers negotiate the hold time, and the smaller of the two advertised values is used. Because the Internet is not a stable network, setting the hold timer too low can burden the router CPU, as routes are repeatedly withdrawn and re-added whenever sessions flap. The hold timer is therefore usually left at its default. However, if you use BGP in a stable WAN environment, you may choose to reduce the hold timer for faster convergence.
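The negotiation can be sketched as below. This minimal Python model assumes that the keepalive interval is derived as one third of the negotiated hold time (the ratio RFC 4271 suggests, which matches the 60/180 defaults); it ignores a separately configured keepalive value, and `negotiate_timers` is a hypothetical name. In Cisco IOS, the values themselves are set with the timers bgp command.

```python
def negotiate_timers(local_hold: int, peer_hold: int) -> tuple[int, int]:
    """Return (hold_time, keepalive) for a session.

    The negotiated hold time is the smaller of the two advertised values;
    the keepalive is assumed to be one third of it. A hold time of 0
    disables keepalives and the hold timer altogether."""
    for value in (local_hold, peer_hold):
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    hold = min(local_hold, peer_hold)
    return hold, hold // 3


print(negotiate_timers(180, 180))  # (180, 60)
print(negotiate_timers(180, 30))   # (30, 10)
```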

Dynamic neighbors

BGP dynamic neighbor support allows BGP peering to a group of remote neighbors that are defined by a range of IP addresses. Each range can be configured as a subnet IP address. BGP dynamic neighbors are configured using a range of IP addresses and BGP peer groups.

After a subnet range is configured for a BGP peer group and a TCP session is initiated by another router for an IP address in the subnet range, a new BGP neighbor is dynamically created as a member of that group. After the initial configuration of subnet ranges and activation of the peer group (referred to as a listen range group), dynamic BGP neighbor creation does not require any further CLI configuration on the initial router. Other routers can establish a BGP session with the initial router, but the initial router does not initiate sessions toward dynamic neighbors, and it does not accept a session whose remote peer address falls outside the configured ranges.
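Listen-range matching can be modeled with Python's ipaddress module. The ranges and peer-group names below are hypothetical, and the sketch assumes non-overlapping ranges; on an IOS router the equivalent ranges would be configured with the bgp listen range command.

```python
import ipaddress

# Hypothetical listen ranges: subnet -> peer group a new neighbor joins.
LISTEN_RANGES = {
    "192.0.2.0/24": "SPOKES",
    "198.51.100.0/25": "PARTNERS",
}


def match_listen_range(source_ip: str):
    """Return the peer group for an incoming TCP session, or None if the
    source address is outside every listen range (session refused)."""
    addr = ipaddress.ip_address(source_ip)
    for subnet, group in LISTEN_RANGES.items():
        if addr in ipaddress.ip_network(subnet):
            return group
    return None


print(match_listen_range("192.0.2.77"))      # SPOKES
print(match_listen_range("198.51.100.200"))  # None: outside the /25
```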

Implement and troubleshoot IBGP and EBGP

EBGP, IBGP

BGP is an exterior gateway protocol (EGP), used to perform inter-domain routing in TCP/IP networks. A BGP router needs to establish a connection (on TCP port 179) to each of its BGP peers before BGP updates can be exchanged. The BGP session between two BGP peers is said to be an external BGP (eBGP) session if the BGP peers are in different autonomous systems (AS). A BGP session between two BGP peers is said to be an internal BGP (iBGP) session if the BGP peers are in the same autonomous system.

By default, the peer relationship is established using the IP address of the outgoing interface toward the peer router. However, with the neighbor update-source command, any operational interface, including a loopback interface, can be specified as the source for establishing the TCP connection. Peering between loopback interfaces is useful when there are multiple paths between the BGP peers: if the physical interface used for the session goes down, the session stays up as long as the loopback remains reachable over another path, whereas a session sourced from that physical interface would be torn down. It also allows routers running BGP with multiple links between them to load balance over the available paths.

To allow the redistribution of iBGP routes into an interior gateway protocol such as IS-IS, OSPF, or EIGRP, use the bgp redistribute-internal command in router configuration mode.

4-byte AS number

Early in the development and standardization of BGP, it was assumed that a 16-bit binary number to identify the Autonomous System (AS) would be more than sufficient. The 16-bit AS number, also known as the 2-byte AS number, provides a pool of 65,536 unique Autonomous System numbers. The IANA manages the available BGP Autonomous System Number (ASN) pool, with the assignments being carried out by the Regional Internet Registries. With the growth of the Internet, the pool of 2-byte ASNs has been largely depleted.

A solution to this depletion is the expansion of the existing 2-byte AS number to a 4-byte AS number, which provides a theoretical 4,294,967,296 unique AS numbers.

The Cisco IOS BGP "4-byte ASN" feature allows BGP to carry an Autonomous System Number (ASN) encoded as a 4-byte entity.
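4-byte ASNs are often written in asplain (a single decimal number) or asdot (high.low) notation, and a speaker that supports only 2-byte ASNs sees the reserved AS_TRANS value 23456 in their place (RFC 6793). The Python sketch below converts between the two notations; the function names are hypothetical.

```python
AS_TRANS = 23456  # RFC 6793 placeholder seen by 2-byte-only speakers


def to_asdot(asn: int) -> str:
    """asplain -> asdot: values above 65535 are written as high.low."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("ASN must fit in 32 bits")
    return str(asn) if asn <= 0xFFFF else f"{asn >> 16}.{asn & 0xFFFF}"


def to_asplain(text: str) -> int:
    """asdot -> asplain."""
    if "." in text:
        high, low = (int(part) for part in text.split("."))
        return (high << 16) | low
    return int(text)


def as_seen_by_old_speaker(asn: int) -> int:
    """A 2-byte-only speaker sees AS_TRANS instead of a 4-byte ASN."""
    return asn if asn <= 0xFFFF else AS_TRANS


print(to_asdot(65546))             # 1.10
print(to_asplain("1.10"))          # 65546
print(as_seen_by_old_speaker(65546))  # 23456
```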

Private AS

Private autonomous system (AS) numbers, which range from 64,512 to 65,534 (65,535 is reserved), are used to conserve globally unique AS numbers. Globally unique 2-byte AS numbers (1 to 64,511) are assigned through the IANA and the Regional Internet Registries. Private AS numbers must not be leaked into the global Border Gateway Protocol (BGP) table because they are not unique (BGP best path calculation expects unique AS numbers). For this reason, Cisco IOS can strip private AS numbers out of the AS_PATH list before the routes are propagated to a BGP peer.

Generally, customer networks and their routing policies are an extension of the respective Internet Service Providers (ISPs). When a customer network is large, the service provider may assign an AS number using a couple of different methods in order to manage the network and routing policies.

  • One way is by permanently assigning an AS number in the range of 1 to 64,511. This is done when a customer network connects to two different ISPs (multi-homing). This situation requires the customer network to have a unique AS number so that it can propagate its BGP routes to the global BGP mesh via both ISPs.
  • A second way is by assigning a private AS number in the range of 64,512 to 65,534. This is done when a customer network connects to a single ISP (either single-homed or dual-homed to the same ISP) and the intention is to conserve AS numbers. It is not recommended that you use a private AS number if you are planning to connect to multiple ISPs in the future.

When a private AS number is allocated to the customer network, the BGP updates from the customer network to the ISP carry the private AS number in their AS_PATH list. When the ISP propagates its network information to the global BGP table (the Internet), it should not propagate an AS_PATH containing the customer's private AS number.

To remove the private AS number, use the neighbor x.x.x.x remove-private-as router configuration command. This per-neighbor command forces BGP to drop private AS numbers from outbound updates, and it applies only to external BGP neighbors. When the outbound update contains a sequence of private AS numbers, this sequence is dropped.
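The two behaviors commonly documented for remove-private-as can be modeled in Python. This is a simplified sketch with hypothetical function names: it assumes the private ranges defined in RFC 6996, that the keyword-less form strips private ASNs only when the AS_PATH contains nothing but private ASNs, and that the `all` keyword (available on newer IOS releases) strips them regardless.

```python
def is_private(asn: int) -> bool:
    """Private ASN ranges from RFC 6996 (2-byte and 4-byte)."""
    return 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294


def remove_private_as(as_path: list[int], remove_all: bool = False) -> list[int]:
    """Simplified model of neighbor remove-private-as on an outbound update.

    remove_all=False: strip private ASNs only if the path is entirely private
    (a mixed path is left unchanged).
    remove_all=True: strip every private ASN (the assumed 'all' keyword)."""
    if remove_all or all(is_private(asn) for asn in as_path):
        return [asn for asn in as_path if not is_private(asn)]
    return list(as_path)


print(remove_private_as([65001, 65002]))                # []
print(remove_private_as([65001, 200]))                  # [65001, 200]
print(remove_private_as([65001, 200], remove_all=True)) # [200]
```

The ISP then prepends its own AS number as usual, so an Internet peer sees only the provider's public AS for the customer's prefixes.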

Explain attributes and best-path selection

BGP assigns the first valid path as the current best path. BGP then compares the best path with the next path in the list, until BGP reaches the end of the list of valid paths. This list provides the rules that are used to determine the best path:

  1. Prefer the path with the highest WEIGHT.
  2. Prefer the path with the highest LOCAL_PREF.
  3. Prefer the path that was locally originated via a network or aggregate BGP subcommand or through redistribution from an IGP.
  4. Prefer the path with the shortest AS_PATH.
  5. Prefer the path with the lowest origin type.
  6. Prefer the path with the lowest multi-exit discriminator (MED). By default, MED is compared only between paths received from the same neighboring AS.
  7. Prefer eBGP over iBGP paths.
  8. Prefer the path with the lowest IGP metric to the BGP next hop.
  9. When both paths are external, prefer the path that was received first (the oldest one).
  10. Prefer the route that comes from the BGP router with the lowest router ID.
  11. If the originator or router ID is the same for multiple paths, prefer the path with the minimum cluster list length.
  12. Prefer the path that comes from the lowest neighbor address.
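The ordered comparison can be expressed as a sort key. This Python sketch is a simplification under stated assumptions: the `Path` fields are hypothetical, step 9 (oldest eBGP path) is omitted, and MED is compared between all paths rather than only between paths from the same neighboring AS. Because every step then compares consistently, taking the minimum key gives the same winner as walking the list pairwise.

```python
import ipaddress
from dataclasses import dataclass, field

ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}  # IGP < EGP < incomplete


@dataclass
class Path:
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: list[int] = field(default_factory=list)
    origin: str = "i"
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0
    router_id: str = "0.0.0.0"
    cluster_list_len: int = 0
    neighbor: str = "0.0.0.0"


def preference_key(p: Path) -> tuple:
    """Lower tuple = better path (False sorts before True)."""
    return (
        -p.weight,                                 # 1. highest weight
        -p.local_pref,                             # 2. highest LOCAL_PREF
        not p.locally_originated,                  # 3. locally originated
        len(p.as_path),                            # 4. shortest AS_PATH
        ORIGIN_RANK[p.origin],                     # 5. lowest origin type
        p.med,                                     # 6. lowest MED
        not p.ebgp,                                # 7. eBGP over iBGP
        p.igp_metric,                              # 8. lowest IGP metric to next hop
        int(ipaddress.IPv4Address(p.router_id)),   # 10. lowest router ID
        p.cluster_list_len,                        # 11. shortest cluster list
        int(ipaddress.IPv4Address(p.neighbor)),    # 12. lowest neighbor address
    )


def best_path(paths: list[Path]) -> Path:
    return min(paths, key=preference_key)


a = Path(as_path=[30, 40], router_id="2.2.2.2")
b = Path(as_path=[50], router_id="3.3.3.3")
c = Path(as_path=[50], router_id="1.1.1.1", local_pref=90)
print(best_path([a, b, c]) is b)  # True: c loses on LOCAL_PREF, a on AS_PATH length
```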

Further Reading

http://goo.gl/jTwpaQ

Implement, optimize and troubleshoot routing policies

Attribute manipulation

BGP is a protocol that uses route attributes to select the best path to a destination.

BGP Path Attributes

BGP uses several attributes for the path-selection process, and it uses path attributes to communicate routing policies. BGP path attributes include next hop, local preference, AS path, origin, multi-exit discriminator (MED), atomic aggregate, and aggregator. Of these, the AS path is one of the most important attributes: it lists the autonomous systems that a route has traversed to reach a destination network. BGP attributes can be categorized as well-known or optional. Well-known attributes are recognized by all BGP implementations. Optional attributes do not have to be supported by every BGP implementation; an unrecognized optional transitive attribute is still passed on to other peers, whereas an unrecognized optional non-transitive attribute is dropped. Well-known attributes can be further subcategorized as mandatory or discretionary. Mandatory attributes are always included in BGP update messages. Discretionary attributes might or might not be included in the BGP update message.

Next-Hop Attribute

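The next-hop attribute is the IP address of the next hop that will be used to reach the destination. The next-hop attribute is a well-known mandatory attribute. With eBGP, the advertising eBGP peer sets the next hop (normally to its own address) when it announces the route; by default, iBGP peers carry that next hop unchanged. On multi-access networks with more than one BGP router, the advertising router can set the next hop to the address of another router on the same segment when that router is the better exit.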

Local Preference Attribute

The local preference attribute indicates which path to use to exit the AS. It is a well-known discretionary attribute used between iBGP peers and is not passed on to external BGP peers. In Cisco IOS Software, the default local preference is 100. The higher local preference is preferred.

Origin Attribute

Origin is a well-known mandatory attribute that defines the source of the path information. Do not confuse the origin attribute with whether a route was learned from an external (eBGP) or internal (iBGP) peer. The origin attribute is set by the BGP router that originates the route.

There are three types of origin attributes:

  • IGP: indicated by an i in the BGP table. Set when the route is originated with a network statement.
  • EGP: indicated by an e in the BGP table. Learned from EGP.
  • Incomplete: indicated by a ? in the BGP table. Learned from redistribution of the route.

In terms of choosing a route based on origin, BGP prefers routes with an IGP origin over routes with an EGP origin, and routes with an EGP origin over routes with an incomplete origin.

AS_Path Attribute

The AS path is a well-known mandatory attribute that contains a list of AS numbers in the path to the destination. Each AS prepends its own AS number to the AS path. The AS path describes all the autonomous systems a packet would have to travel to reach the destination IP network. It is used to ensure that the path is loop-free. When the AS path attribute is used to select a path, the route with the fewest AS hops is preferred.
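AS_PATH handling on an eBGP boundary can be sketched as follows, treating the AS_PATH as a flat list of AS numbers (AS_SET segments are ignored). The function names are hypothetical.

```python
def advertise_to_ebgp(local_as: int, as_path: list[int]) -> list[int]:
    """When sending a route to an eBGP peer, an AS prepends its own number."""
    return [local_as] + as_path


def accept_from_ebgp(local_as: int, as_path: list[int]) -> bool:
    """Loop prevention: reject a route whose AS_PATH already contains our AS."""
    return local_as not in as_path


path = advertise_to_ebgp(200, [300, 400])  # [200, 300, 400]
print(len(path))                    # 3 AS hops, used in path selection
print(accept_from_ebgp(100, path))  # True
print(accept_from_ebgp(300, path))  # False: AS 300 is already in the path
```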

MED Attribute

The MED attribute, also known as a metric, tells external BGP peers the preferred path into the AS when multiple paths into the AS exist. In other words, MED influences which one of many paths a neighboring AS uses to reach destinations within the AS. It is an optional non-transitive attribute. A MED received from an eBGP peer is passed to iBGP peers within the receiving AS, but it is not passed on to other autonomous systems. The lowest MED value is preferred. In Cisco IOS, the default value is 0, and paths received with no MED are assigned a MED of 0.

Community Attribute

Although it is not an attribute used in the routing-decision process, the community attribute groups routes and applies policies or decisions (accept, prefer) to those routes. A community identifies a group of destinations that share some common property. The community attribute is an optional transitive attribute of variable length.

Atomic Aggregate and Aggregator Attributes

The atomic aggregate attribute informs BGP peers that the local router used a less specific (aggregated) route to a destination without using a more specific route.

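If a BGP router selects a less specific route when a more specific route is available, it must attach the atomic aggregate attribute when propagating the route. The atomic aggregate attribute lets the BGP peers know that the BGP router used an aggregated route, so some path information may have been lost. The aggregator attribute, an optional transitive attribute, identifies the AS number and router ID of the router that performed the aggregation. A more specific route must be in the advertising router's BGP table before it propagates an aggregate route.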

Conditional advertisement

Normally, routes are propagated regardless of the existence of a different path. The BGP conditional advertisement feature uses the non-exist-map and advertise-map keywords of the neighbor advertise-map command to track routes by route prefix. If the prefix matched by the non-exist-map is not present in the BGP table, the route specified by the advertise-map is announced. This feature is useful for multi-homed networks, in which some prefixes are advertised to one of the providers only if information from the other provider is not present (which indicates a failure in the peering session or partial reachability). The conditional BGP announcements are sent in addition to the normal announcements that a BGP router sends to its peers.
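The decision logic can be modeled in a few lines of Python. This sketch assumes the route maps have already been reduced to sets of prefixes and that the advertise-map route must itself be in the BGP table; `conditional_advertisements` is a hypothetical helper.

```python
import ipaddress


def conditional_advertisements(bgp_table: set[str], advertise_map: set[str],
                               non_exist_map: set[str]) -> set[str]:
    """Announce the advertise-map prefixes only while none of the prefixes
    tracked by the non-exist-map are present in the BGP table."""
    table = {ipaddress.ip_network(p) for p in bgp_table}
    tracked = {ipaddress.ip_network(p) for p in non_exist_map}
    if table & tracked:
        return set()  # tracked route exists: withhold the conditional routes
    return {p for p in advertise_map if ipaddress.ip_network(p) in table}


table = {"192.0.2.0/24", "203.0.113.0/24"}  # 203.0.113.0/24 learned from ISP A
print(conditional_advertisements(table, {"192.0.2.0/24"}, {"203.0.113.0/24"}))  # set()
table.discard("203.0.113.0/24")             # the ISP A session fails
print(conditional_advertisements(table, {"192.0.2.0/24"}, {"203.0.113.0/24"}))  # {'192.0.2.0/24'}
```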

Outbound route filtering

The BGP Prefix-Based Outbound Route Filtering feature uses Border Gateway Protocol (BGP) outbound route filter (ORF) send and receive capabilities to minimize the number of BGP updates that are sent between BGP peers. Configuring this feature can help reduce the amount of system resources required for generating and processing routing updates by filtering out unwanted routing updates at the source. For example, this feature can be used to reduce the amount of processing required on a router that is not accepting full routes from a service provider network.
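The effect of ORF is that the receiver's prefix list is evaluated on the sender. The Python sketch below models a prefix list with ge/le bounds and an implicit deny; the entries and function names are hypothetical. On IOS routers, the capability is negotiated with the neighbor capability orf prefix-list command.

```python
import ipaddress

# Hypothetical prefix list pushed by the receiver via ORF:
# (action, prefix, ge, le), evaluated in order, implicit deny at the end.
ORF_ENTRIES = [
    ("permit", "10.0.0.0/8", 16, 24),
    ("permit", "0.0.0.0/0", 0, 0),  # default route only
]


def permitted(prefix: str, entries=ORF_ENTRIES) -> bool:
    net = ipaddress.ip_network(prefix)
    for action, base, ge, le in entries:
        if net.subnet_of(ipaddress.ip_network(base)) and ge <= net.prefixlen <= le:
            return action == "permit"
    return False  # implicit deny


def updates_to_send(rib: list[str]) -> list[str]:
    """The sender filters its outbound updates with the receiver's list,
    so unwanted prefixes never cross the link."""
    return [p for p in rib if permitted(p)]


rib = ["10.1.0.0/16", "10.1.1.0/24", "10.1.1.128/25", "0.0.0.0/0", "8.8.8.0/24"]
print(updates_to_send(rib))  # ['10.1.0.0/16', '10.1.1.0/24', '0.0.0.0/0']
```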

Communities, extended communities

A community is a BGP attribute that may be added to each prefix. Communities are optional transitive attributes: BGP implementations do not have to recognize the attribute, and the network operator decides whether to carry it through an AS or pass it on to another AS. The community attribute can be thought of as simply a flat, 32-bit value that can be applied to any set of prefixes. It can be read as a 32-bit value or split into two portions, the first 2 bytes representing an ASN and the last 2 bytes a value with a predetermined meaning.

The values 0x00000000 through 0x0000FFFF and 0xFFFF0000 through 0xFFFFFFFF are reserved. Most modern router software displays communities as ASN:VALUE. In this format the communities 1:0 through 65534:65535 are available for use. The convention is to use the ASN of your own network as the leading 16 bits for your internal communities and communities that you accept from and send to your customers.
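The ASN:VALUE notation is just the two 16-bit halves of the 32-bit value. A minimal Python sketch of the conversion follows; the function names are hypothetical, and the three constants are the well-known communities defined in RFC 1997.

```python
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02
NO_EXPORT_SUBCONFED = 0xFFFFFF03


def encode(text: str) -> int:
    """'ASN:VALUE' -> 32-bit community value."""
    asn, value = (int(part) for part in text.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | value


def decode(community: int) -> str:
    """32-bit community value -> 'ASN:VALUE'."""
    return f"{community >> 16}:{community & 0xFFFF}"


def is_reserved(community: int) -> bool:
    """0x00000000-0x0000FFFF and 0xFFFF0000-0xFFFFFFFF are reserved."""
    return community <= 0x0000FFFF or community >= 0xFFFF0000


print(hex(encode("65000:100")))    # 0xfde80064
print(decode(NO_EXPORT))           # 65535:65281
print(is_reserved(encode("1:0")))  # False
```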

When OSPF is used as the PE-CE routing protocol, BGP uses extended communities to convey various OSPF attributes; however, there are a few exceptions (e.g., network type) to what attributes are conveyed.

Multi-homing

Border Gateway Protocol (BGP) is one of the key protocols used to achieve Internet connection redundancy. When you connect your network to two different Internet service providers (ISPs), it is called multi-homing. Multi-homing provides redundancy and network optimization: BGP can select the ISP that offers the best path to a resource. When you run BGP with more than one service provider, you run the risk that your autonomous system (AS) will become a transit AS. This causes Internet traffic between the providers to pass through your AS and potentially consume all of the bandwidth and the CPU resources of your router.
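A common way to avoid becoming a transit AS is to advertise only routes that originate inside your own AS, which on IOS routers is often done with an AS-path filter matching ^$ (an empty AS_PATH). The Python sketch below models that outbound policy; the prefixes and ASNs are illustrative.

```python
def export_to_isp(routes: dict[str, list[int]]) -> list[str]:
    """Advertise only prefixes whose AS_PATH is empty, i.e. originated
    inside our own AS. Routes learned from either ISP are not re-advertised,
    so the AS never offers transit between the two providers."""
    return [prefix for prefix, as_path in routes.items() if not as_path]


routes = {
    "192.0.2.0/24": [],             # our own prefix
    "198.51.100.0/24": [100, 300],  # learned from ISP A (AS 100)
    "203.0.113.0/24": [200],        # learned from ISP B (AS 200)
}
print(export_to_isp(routes))  # ['192.0.2.0/24']
```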

Implement and troubleshoot scalability

Route-reflector, cluster

BGP requires that all iBGP speakers be fully meshed. However, this requirement does not scale well when there are many iBGP speakers. Instead of configuring a confederation, another way to reduce the iBGP mesh is to configure a route reflector. When the route reflector receives an advertised route, depending on the neighbor, it takes the following actions:

  • A route from an external BGP speaker is advertised to all clients and non-client peers.
  • A route from a non-client peer is advertised to all clients.
  • A route from a client is advertised to all clients and non-client peers. Hence, the clients need not be fully meshed.
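The three reflection rules can be sketched as a Python function. The peer names are hypothetical, and the sketch covers only which iBGP peers receive the reflected best path.

```python
def reflect_targets(source_kind: str, sender: str,
                    clients: set[str], non_clients: set[str]) -> set[str]:
    """Return the iBGP peers a route reflector sends a best path to.

    source_kind is "ebgp", "client", or "non-client". A route is never
    sent back to the peer it came from."""
    if source_kind in ("ebgp", "client"):
        targets = clients | non_clients
    elif source_kind == "non-client":
        targets = set(clients)
    else:
        raise ValueError(f"unknown source kind: {source_kind}")
    return targets - {sender}


clients, non_clients = {"C1", "C2"}, {"N1", "N2"}
print(sorted(reflect_targets("non-client", "N1", clients, non_clients)))  # ['C1', 'C2']
print(sorted(reflect_targets("client", "C1", clients, non_clients)))      # ['C2', 'N1', 'N2']
```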

To configure a route reflector and its clients, use the following command in router configuration mode:

Router(config-router)# neighbor {ip-address | peer-group-name} route-reflector-client

Whenever an IBGP route is reflected (propagated to another IBGP peer), the route reflector appends two optional, non-transitive attributes to the BGP route:

  • If the route does not have the Originator ID attribute (it has not been reflected before), the router ID of the IBGP peer from which the route has been received is copied into the Originator ID attribute.
  • If the route does not have the Cluster list attribute, it’s added to the route.
  • The value configured with the bgp cluster-id router configuration command (or the router ID of the route reflector if the cluster-id is not configured) is prepended to the Cluster list attribute.

Route reflector does not change or remove any other attributes of the reflected routes (even non-transitive attributes), ensuring that the iBGP routes are not changed within the autonomous system.
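The purpose of the two attributes is loop prevention: a router discards a reflected route that carries its own router ID as the Originator ID, and a route reflector discards a route whose Cluster list already contains its cluster ID (RFC 4456). A minimal Python sketch, with hypothetical names:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: List[str] = field(default_factory=list)


def reflect(route: Route, received_from_rid: str, cluster_id: str) -> Route:
    """Set ORIGINATOR_ID only if absent; prepend our cluster ID."""
    return Route(
        prefix=route.prefix,
        originator_id=route.originator_id or received_from_rid,
        cluster_list=[cluster_id] + route.cluster_list,
    )


def accept(route: Route, my_rid: str, my_cluster_id: Optional[str] = None) -> bool:
    """Drop routes carrying our router ID as originator, or (on a route
    reflector) our cluster ID in the cluster list."""
    if route.originator_id == my_rid:
        return False
    return my_cluster_id is None or my_cluster_id not in route.cluster_list


r1 = reflect(Route("10.0.0.0/8"), received_from_rid="1.1.1.1", cluster_id="9.9.9.9")
print(r1.originator_id, r1.cluster_list)  # 1.1.1.1 ['9.9.9.9']
print(accept(r1, "1.1.1.1"))              # False: back at the originator
print(accept(r1, "2.2.2.2", "9.9.9.9"))   # False: cluster loop
print(accept(r1, "2.2.2.2", "8.8.8.8"))   # True
```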

Confederations

The implementation of BGP confederations reduces the iBGP mesh inside an AS. The key is to divide an AS into multiple sub-ASs and assign the whole group to a single confederation. Within each sub-AS, iBGP is fully meshed, and each sub-AS has connections to other sub-ASs inside the confederation. Even though the sub-ASs peer with one another using eBGP, they exchange routing information as if they used iBGP. In this way, the confederation preserves next hop, metric, and local preference information. To the outside world, the confederation appears to be a single AS.
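Inside a confederation, sub-AS numbers are recorded in separate AS_CONFED_SEQUENCE segments (RFC 5065), which are removed when the route leaves the confederation and are not counted in AS_PATH length. The Python sketch below models that under simplifying assumptions: an AS_PATH is a list of (segment type, ASNs) tuples with made-up type labels, and the function names are hypothetical.

```python
def send_to_confed_peer(path, my_sub_as):
    """eBGP to another sub-AS: record our sub-AS in an AS_CONFED_SEQUENCE."""
    if path and path[0][0] == "CONFED_SEQ":
        return [("CONFED_SEQ", [my_sub_as] + path[0][1])] + path[1:]
    return [("CONFED_SEQ", [my_sub_as])] + path


def send_outside(path, confed_id):
    """Leaving the confederation: drop confederation segments and prepend
    the confederation identifier, so outsiders see a single AS."""
    public = [seg for seg in path if seg[0] not in ("CONFED_SEQ", "CONFED_SET")]
    if public and public[0][0] == "SEQ":
        return [("SEQ", [confed_id] + public[0][1])] + public[1:]
    return [("SEQ", [confed_id])] + public


def path_length(path):
    """Confederation segments are not counted; an AS_SET counts as one."""
    seq_hops = sum(len(asns) for kind, asns in path if kind == "SEQ")
    set_hops = sum(1 for kind, _ in path if kind == "SET")
    return seq_hops + set_hops


external = [("SEQ", [300])]
inside = send_to_confed_peer(external, 65001)  # [('CONFED_SEQ', [65001]), ('SEQ', [300])]
outside = send_outside(inside, 100)            # [('SEQ', [100, 300])]
print(path_length(inside), path_length(outside))  # 1 2
```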

In order to configure a BGP confederation, issue this command:

bgp confederation identifier autonomous-system

The confederation identifier is the AS number of the confederation group, as seen by peers outside it. This command specifies the other sub-ASs within the confederation with which the router peers:

bgp confederation peers autonomous-system [autonomous-system]

Further Reading

http://goo.gl/oQ2Etm

Aggregation, AS set

Border Gateway Protocol (BGP) allows the aggregation of specific routes into one route with use of the aggregate-address address mask [as-set] [summary-only] [suppress-map map-name] [advertise-map map-name] [attribute-map map-name] command. When you issue the aggregate-address command without the as-set argument, there is no inheritance of the individual route attributes (such as AS_PATH or community), which causes a loss of granularity.

Use of the as-set argument creates an aggregate address with a mathematical set of autonomous systems (ASs). The as-set argument summarizes the AS_PATH attributes of all the individual routes. Because the aggregate carries the AS numbers of its component routes, a router in any of those ASs sees its own AS number in the AS_PATH and rejects the aggregate, which helps BGP detect and avoid loops.

In this way, as-set preserves the AS_PATH attributes of the specific routes. In some cases, you may also need to change other attributes of the aggregate route, such as metric, community, and origin; the attribute-map option of the aggregate-address command can be used for this.
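One way to build an as-set aggregate path, keeping the leading AS numbers shared by all component routes as an ordered sequence and collecting the rest into an unordered set, is sketched below in Python. This is a simplified model (the aggregating router's own AS, prepended on eBGP advertisement, is not shown), and the function names are hypothetical.

```python
def aggregate_as_path(paths: list[list[int]]) -> tuple[list[int], set[int]]:
    """Return (AS_SEQUENCE, AS_SET) for an aggregate built with as-set."""
    common = []
    for hops in zip(*paths):
        if all(h == hops[0] for h in hops):
            common.append(hops[0])  # shared leading AS: keep in order
        else:
            break
    as_set = {asn for p in paths for asn in p[len(common):]}
    return common, as_set


def rejects(local_as: int, aggregate: tuple[list[int], set[int]]) -> bool:
    """Loop check: an AS listed anywhere in the aggregate drops it."""
    sequence, as_set = aggregate
    return local_as in sequence or local_as in as_set


agg = aggregate_as_path([[100, 200], [100, 300, 400]])
print(agg[0], sorted(agg[1]))  # [100] [200, 300, 400]
print(rejects(300, agg))       # True: AS 300 sees itself in the AS_SET
```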

Implement and troubleshoot multiprotocol BGP

IPv4, IPv6, VPN address-family


In MPLS VPNs, routers using Multiprotocol BGP (MP-BGP) distribute the VPN routing information, using MP-BGP extended communities (route targets) to control which VRFs import the routes.

Router> show ip bgp
BGP table version is 5, local router ID is 200.200.200.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, r RIB-failure
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
r> 6.6.6.0/24       10.10.13.3               0    130      0 30 i
*> 7.7.7.0/24       10.10.13.3               0    125      0 30 i

When BGP tries to install the best path prefix into Routing Information Base (RIB) (for example, the IP Routing table), RIB might reject the BGP route due to any of these reasons:

  • A route to the same prefix with a better administrative distance is already present in the IP routing table, for example, a static route or an IGP route.
  • Memory failure.
  • The number of routes in VPN routing/forwarding (VRF) exceeds the route-limit configured under the VRF instance.

In such cases, the prefixes that are rejected for these reasons are identified by “r RIB-failure” in the show ip bgp command output and are not advertised to the peers.

With Route Target Constraint (RTC), the RR sends only wanted VPNv4/VPNv6 prefixes to the PE. This is supported through a new address family, rtfilter, for both VPNv4 and VPNv6.

The Route Target (RT) filtering information is obtained from the VPN RT import list from all the VRFs on the PE router. The PE router sends this filtering information as a BGP update in the address family rtfilter to the RR. This filtering information or RT membership is encoded in the Network Layer Reachability Information (NLRI) of the MP_REACH_NLRI and MP_UNREACH_NLRI attributes. The receiving BGP peer translates this NLRI into a filter and installs this filter outbound to the sending peer. The receiving BGP peer uses this filter to decide which VPNv4/6 prefixes to send or not send, dependent upon the presence of attached RTs.
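The outbound filter that the RR installs can be modeled as a set intersection. In this Python sketch the VPN prefixes are keyed by hypothetical RD:prefix strings, and `vpn_prefixes_for_pe` is an illustrative name.

```python
def vpn_prefixes_for_pe(vpn_routes: dict[str, set[str]],
                        pe_import_rts: set[str]) -> list[str]:
    """RT constraint: send a VPN prefix to the PE only if the prefix carries
    at least one route target that the PE imports into some VRF."""
    return sorted(prefix for prefix, rts in vpn_routes.items() if rts & pe_import_rts)


vpn_routes = {
    "100:1:10.1.0.0/16": {"100:1"},
    "100:2:10.2.0.0/16": {"100:2"},
    "100:3:10.3.0.0/16": {"100:3", "100:99"},
}
print(vpn_prefixes_for_pe(vpn_routes, {"100:1", "100:99"}))
# ['100:1:10.1.0.0/16', '100:3:10.3.0.0/16']
```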

Further Reading

http://goo.gl/BzTJGh

Implement and troubleshoot AS path manipulations

Local AS, allow AS in, remove private AS

The local-AS feature allows a router to appear to be a member of a second autonomous system (AS), in addition to its real AS. This feature can only be used for true eBGP peers; you cannot use it for two peers that are members of different confederation sub-ASs. The neighbor allowas-in command allows a router to accept updates whose AS_PATH contains its own AS number, which BGP loop prevention would otherwise reject. This is typically needed when two sites of the same AS are connected through another AS.
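The interaction between loop prevention and allowas-in can be sketched in Python, assuming a flat AS_PATH list. When no number is given, IOS allows the local AS to appear up to three times; the function name is hypothetical.

```python
def accept_route(local_as: int, as_path: list[int], allowas_in: int = 0) -> bool:
    """Accept the route if our own AS appears in the AS_PATH at most
    allowas_in times; allowas_in=0 is the normal loop-prevention check."""
    return as_path.count(local_as) <= allowas_in


# Site B of customer AS 65001 hears site A's prefix through provider AS 100.
path = [100, 65001]
print(accept_route(65001, path))                # False: rejected as a loop
print(accept_route(65001, path, allowas_in=3))  # True
```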

To remove the private AS number, use the neighbor x.x.x.x remove-private-as router configuration command.

The debug ip bgp updates command displays the prefixes received from the neighbor, together with their attributes.

Prepend

AS-path prepending is configured in Cisco IOS with route-map based per-neighbor outbound filter. The actual prepending is specified within the route-map with the set as-path prepend command.
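The effect of prepending on the path length seen by a neighbor can be modeled in Python. The sketch assumes a flat AS_PATH list and a hypothetical function name; it adds the copies from set as-path prepend on top of the one copy BGP always adds toward an eBGP peer.

```python
def outbound_as_path(local_as: int, as_path: list[int],
                     prepend_count: int = 0) -> list[int]:
    """AS_PATH an eBGP neighbor receives: the copies added by set as-path
    prepend plus the single copy BGP always adds on an eBGP advertisement."""
    return [local_as] * (prepend_count + 1) + as_path


primary = outbound_as_path(100, [])                  # [100]
backup = outbound_as_path(100, [], prepend_count=2)  # [100, 100, 100]
# All else being equal, the remote AS prefers the shorter AS_PATH (primary link).
print(len(primary), len(backup))  # 1 3
```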

Regexp

You can use regular expressions in the ip as-path access-list command with Border Gateway Protocol (BGP).

RegEx  Description
?      repeats the previous character one or zero times
*      repeats the previous character zero or many times
+      repeats the previous character one or more times
^      matches the beginning of a string
$      matches the end of a string
[]     is a range
_      matches the space between AS numbers or the end of the AS PATH list
\      is an escape character

The regular expression string “^[0-9]+$” matches an AS_PATH that consists of exactly one AS number. It therefore matches routes originated in any directly connected AS, in other words, routes originated by the neighboring ASs of your AS.
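Cisco's _ token has no direct equivalent in most regex engines, but it can be approximated when testing patterns offline. The Python sketch below expands _ into a group matching the start, the end, a space, a comma, or a brace or parenthesis; this translation is an assumption for illustration, and it does not handle an escaped \_.

```python
import re

# Approximation of what Cisco's "_" token can match around an AS number.
UNDERSCORE = r"(?:^|$|[ ,{}()])"


def cisco_to_python(pattern: str) -> str:
    return pattern.replace("_", UNDERSCORE)


def as_path_matches(pattern: str, as_path: str) -> bool:
    return re.search(cisco_to_python(pattern), as_path) is not None


print(as_path_matches("^[0-9]+$", "65001"))      # True: one AS, directly connected
print(as_path_matches("^[0-9]+$", "65001 300"))  # False
print(as_path_matches("_200_", "100 200 300"))   # True
print(as_path_matches("_200_", "100 2001 300"))  # False: 2001 is a different AS
```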

Further Reading

http://goo.gl/zwXNPX

Implement and troubleshoot other features

Multipath

The BGP Multipath Load Sharing for eBGP and iBGP feature allows you to configure multipath load balancing with both external BGP (eBGP) and internal BGP (iBGP) paths in Border Gateway Protocol (BGP) networks that are configured to use Multiprotocol Label Switching (MPLS) Virtual Private Networks (VPNs). This feature provides improved load balancing deployment and service offering capabilities and is useful for multi-homed autonomous systems and Provider Edge (PE) routers that import both eBGP and iBGP paths from multi-homed and stub networks.

BGP synchronization

If your AS passes traffic from another AS to a third AS, BGP should not advertise a route before all routers in your AS learn about the route via IGP. BGP waits until IGP propagates the route within the AS and then advertises it to external peers. A BGP router with synchronization enabled does not install iBGP learned routes into its routing table if it is not able to validate those routes in its IGP. Issue the no synchronization command under router bgp in order to disable synchronization. This prevents BGP from validating iBGP routes in IGP.
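The synchronization rule can be modeled as a set intersection in Python. The prefixes below are illustrative and `usable_ibgp_routes` is a hypothetical name.

```python
def usable_ibgp_routes(ibgp_routes: set[str], igp_routes: set[str],
                       synchronization: bool) -> set[str]:
    """With synchronization on, an iBGP-learned prefix is used only if the
    same prefix is also known through the IGP; with it off, all are used."""
    if not synchronization:
        return set(ibgp_routes)
    return ibgp_routes & igp_routes


ibgp = {"198.51.100.0/24", "203.0.113.0/24"}
igp = {"198.51.100.0/24"}
print(sorted(usable_ibgp_routes(ibgp, igp, synchronization=True)))   # ['198.51.100.0/24']
print(sorted(usable_ibgp_routes(ibgp, igp, synchronization=False)))  # ['198.51.100.0/24', '203.0.113.0/24']
```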

Soft reconfiguration, route refresh

When the routing policy of a BGP neighbor changes, the session must be reset (cleared) for the changes to take effect. Because resetting a BGP session can be disruptive to networks, a soft reset method is recommended for reconfiguring the routing table.

Previously, in order to reconfigure the inbound routing table, both the local BGP router and the BGP peer first needed to be configured to store incoming routing policy updates using the neighbor soft-reconfiguration command. Additional resources, particularly memory, were required to store the inbound routing table updates. The clear ip bgp command could then initiate the soft reset, which generated a new set of inbound routing table updates using the stored information. The route refresh capability provides an additional method for soft reset that allows the dynamic exchange of route refresh requests and routing information between BGP routers, and the subsequent re-advertisement of the respective outbound routing table. Soft reset using the route refresh capability does not require pre-configuration and consumes no additional memory resources.

Describe BGP fast convergence features

Prefix independent convergence

BGP Prefix Independent Convergence (PIC) improves convergence after a network failure. It applies to both core and edge failures in IP and MPLS networks. You can use this feature to create and store an alternate path in the routing information base (RIB), the forwarding information base (FIB), and Cisco Express Forwarding (CEF). When a failure is detected, the alternate path takes over immediately, enabling fast failover.

These are the benefits of the feature:

  • An alternate path for failover allows faster restoration of connectivity.
  • Reduced traffic loss.
  • Constant convergence time: the switchover time is the same regardless of the number of prefixes.
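The constant convergence time comes from a hierarchical forwarding table in which many prefixes point at one shared path object. The sketch below is a simplified model under stated assumptions: `PathList` and `HierarchicalFib` are hypothetical classes (real CEF objects are more elaborate), the backup next hop is precomputed, and a failure of the backup itself is not modeled.

```python
from __future__ import annotations


class PathList:
    """Shared forwarding object holding a primary and a precomputed backup next hop."""

    def __init__(self, primary: str, backup: str) -> None:
        self.primary = primary
        self.backup = backup
        self.active = primary

    def next_hop_failed(self, next_hop: str) -> bool:
        """Switch to the backup if the active next hop failed; return True if changed."""
        if self.active == next_hop and self.backup != next_hop:
            self.active = self.backup
            return True
        return False


class HierarchicalFib:
    """Prefixes point to a shared PathList instead of holding their own next hop."""

    def __init__(self) -> None:
        self.prefixes: dict[str, PathList] = {}
        self.path_lists: dict[int, PathList] = {}  # keyed by id() so each is stored once

    def install(self, prefix: str, path_list: PathList) -> None:
        self.prefixes[prefix] = path_list
        self.path_lists[id(path_list)] = path_list

    def lookup(self, prefix: str) -> str:
        return self.prefixes[prefix].active

    def next_hop_down(self, next_hop: str) -> int:
        """Repair forwarding after a failure; return the number of objects rewritten."""
        # The work depends on the number of shared path lists, not on the prefix count.
        return sum(pl.next_hop_failed(next_hop) for pl in self.path_lists.values())


fib = HierarchicalFib()
shared = PathList(primary="192.0.2.1", backup="192.0.2.2")
for i in range(10_000):
    fib.install(f"10.{i // 256}.{i % 256}.0/24", shared)

print(fib.next_hop_down("192.0.2.1"))  # 1 object rewritten for 10,000 prefixes
print(fib.lookup("10.0.5.0/24"))       # 192.0.2.2
```

A flat FIB would have to rewrite all 10,000 entries; here one shared object changes, so the switchover time does not grow with the table.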

Add-path

The BGP Additional Paths feature allows a router to advertise multiple paths for the same prefix over the same peering session without the new paths implicitly replacing any previous ones. This behavior promotes path diversity and reduces multi-exit discriminator (MED) oscillations. Without it, BGP routers and route reflectors (RRs) propagate only their best path over their sessions, and each advertisement of a prefix replaces the previous announcement of that prefix (this behavior is known as an implicit withdraw). Implicit withdraw achieves better scaling, but at the cost of path diversity.

This loss of alternate paths is known as path hiding. Path hiding can prevent efficient use of BGP multipath, prevent hitless planned maintenance, and lead to MED oscillations and suboptimal hot-potato routing. When a next hop fails, path hiding also inhibits fast local recovery, because the network has to wait for BGP control-plane convergence to restore traffic. The BGP Additional Paths feature offers path diversity in a generic way; the Best External and Best Internal features offer it only in limited scenarios.

Because multiple paths for the same prefix can be advertised without implicitly replacing one another, the BGP Additional Paths feature achieves path diversity instead of path hiding.
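Implicit withdraw versus Add-Path can be illustrated with a toy receive table. With Add-Path, each advertised path carries a Path Identifier (RFC 7911), so paths are keyed by prefix and identifier rather than by prefix alone. `AdjRibIn` and its methods are hypothetical names for this sketch.

```python
from __future__ import annotations

from collections import defaultdict


class AdjRibIn:
    """Paths received from one peer for each prefix (simplified model)."""

    def __init__(self, add_paths: bool) -> None:
        self.add_paths = add_paths
        # prefix -> {path_id: next_hop}
        self.paths: defaultdict[str, dict[int, str]] = defaultdict(dict)

    def update(self, prefix: str, next_hop: str, path_id: int = 0) -> None:
        if self.add_paths:
            # Each path carries its own identifier, so a path with a new
            # identifier is stored alongside the existing ones.
            self.paths[prefix][path_id] = next_hop
        else:
            # No identifier: a new advertisement implicitly withdraws the old one.
            self.paths[prefix] = {0: next_hop}

    def withdraw(self, prefix: str, path_id: int = 0) -> None:
        key = path_id if self.add_paths else 0
        self.paths[prefix].pop(key, None)

    def next_hops(self, prefix: str) -> list[str]:
        return sorted(self.paths.get(prefix, {}).values())


announcements = [("203.0.113.0/24", "192.0.2.1", 1), ("203.0.113.0/24", "192.0.2.2", 2)]
classic, add_path = AdjRibIn(add_paths=False), AdjRibIn(add_paths=True)
for prefix, nh, pid in announcements:
    classic.update(prefix, nh, pid)
    add_path.update(prefix, nh, pid)

print(classic.next_hops("203.0.113.0/24"))   # ['192.0.2.2']  first path hidden
print(add_path.next_hops("203.0.113.0/24"))  # ['192.0.2.1', '192.0.2.2']
```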

Next-hop address tracking

The BGP Support for Next-Hop Address Tracking feature is enabled by default when a supporting Cisco IOS software image is installed. BGP next-hop address tracking is event driven: the next hops of BGP prefixes are tracked automatically as peering sessions are established, and next-hop changes are reported to the BGP routing process as soon as they are updated in the Routing Information Base (RIB). This optimization improves overall BGP convergence by reducing the response time to next-hop changes for routes installed in the RIB. When a best-path calculation runs between BGP scanner cycles, only next-hop changes are tracked and processed.

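You can use the bgp nexthop command to configure next-hop address tracking; for example, bgp nexthop trigger delay sets how long BGP waits after a RIB change before processing next-hop changes.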
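Event-driven tracking with a batching delay can be sketched as follows. The `NextHopTracker` class is hypothetical and only illustrates the idea: RIB changes for untracked addresses are ignored, changes for BGP next hops are collected, and after a configurable delay only the prefixes behind those next hops are re-evaluated. The 5-second delay in the example is illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


class NextHopTracker:
    """Hypothetical event-driven next-hop tracker with a batching delay."""

    def __init__(self, delay: float) -> None:
        self.delay = delay                      # seconds to wait after the first RIB event
        self.prefixes_by_next_hop: defaultdict[str, set[str]] = defaultdict(set)
        self.pending: set[str] = set()          # next hops changed since the last walk
        self.deadline: Optional[float] = None   # when pending changes are processed

    def track(self, prefix: str, next_hop: str) -> None:
        self.prefixes_by_next_hop[next_hop].add(prefix)

    def rib_changed(self, next_hop: str, now: float) -> None:
        """Called by the RIB when the route toward next_hop changes."""
        if next_hop not in self.prefixes_by_next_hop:
            return  # not a BGP next hop: nothing to do
        self.pending.add(next_hop)
        if self.deadline is None:
            self.deadline = now + self.delay   # batch further events until then

    def poll(self, now: float) -> set[str]:
        """Return prefixes whose best path must be re-evaluated, once the delay expires."""
        if self.deadline is None or now < self.deadline:
            return set()
        affected: set[str] = set()
        for nh in self.pending:
            affected |= self.prefixes_by_next_hop[nh]
        self.pending.clear()
        self.deadline = None
        return affected


nht = NextHopTracker(delay=5.0)
nht.track("203.0.113.0/24", "192.0.2.1")
nht.track("198.51.100.0/24", "192.0.2.1")
nht.track("192.0.2.128/25", "192.0.2.9")
nht.rib_changed("192.0.2.1", now=100.0)
print(nht.poll(now=102.0))          # set(): still inside the delay window
print(sorted(nht.poll(now=105.0)))  # ['198.51.100.0/24', '203.0.113.0/24']
```

A shorter delay reacts faster but may act before the IGP has settled, which is why the delay is usually tuned against IGP response time.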

Describe and optimize BGP scale and performance

Tuning the BGP transport is an important factor in improving performance wherever reconvergence relies purely on BGP. TCP is the underlying transport for BGP UPDATE messages, so optimizing TCP performance directly benefits BGP. The full Internet routing table exceeded 300k prefixes in 2010; at about 4 bytes of NLRI encoding per prefix, transporting the prefixes alone takes over a megabyte (roughly 10 megabits), not counting path attributes and other information. Tuning TCP transport performance includes the following:

  1. Enabling TCP Path MTU discovery for every neighbor, so that TCP can select the optimum MSS. This requires that no firewall blocks the ICMP unreachable messages used during the discovery process.
  2. Tuning the router’s ingress queue size so that it can absorb a large number of TCP ACK messages. When a router starts replicating BGP UPDATEs to its peers, every peer normally acknowledges every second segment sent (TCP delayed ACK). The more peers a router has, the higher the pressure on the ingress queue.
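A back-of-the-envelope calculation shows why both items matter. The sketch assumes every prefix is a /24 (4 bytes of NLRI each: a 1-octet length plus 3 prefix octets), an MSS of 536 bytes (the classic TCP default) without PMTUD versus 1460 bytes with PMTUD on a 1500-byte path, 100 peers, and one ACK per two segments. It ignores BGP headers, path attributes and TCP/IP overhead, so real numbers are larger.

```python
def nlri_bytes(prefix_len: int) -> int:
    """Bytes to encode one IPv4 prefix in NLRI: 1-octet length + ceil(len/8) octets."""
    return 1 + (prefix_len + 7) // 8


def transport_estimate(num_prefixes: int, prefix_len: int, mss: int, peers: int):
    """Rough size of a full-table transfer and the ACK load it creates.

    Simplifications: every prefix has the same length; BGP message headers,
    path attributes and TCP/IP overhead are ignored.
    """
    payload = num_prefixes * nlri_bytes(prefix_len)
    segments = -(-payload // mss)            # integer ceiling: TCP segments per peer
    acks = peers * -(-segments // 2)         # delayed ACK: about 1 ACK per 2 segments
    return payload, segments, acks


for mss in (536, 1460):  # assumed MSS without PMTUD vs. with PMTUD
    payload, segments, acks = transport_estimate(300_000, 24, mss, peers=100)
    print(f"MSS {mss}: {payload / 1e6:.1f} MB, {segments} segments/peer, {acks} ACKs in")
# MSS 536: 1.2 MB, 2239 segments/peer, 112000 ACKs in
# MSS 1460: 1.2 MB, 822 segments/peer, 41100 ACKs in
```

A larger MSS cuts both the segment count per peer and the ACK burst arriving at the ingress queue by a factor of roughly 2.7.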

When convergence relies only on BGP, link failure detection is normally based on the BGP keepalive and hold timers, which default to 60 and 180 seconds. TCP keepalives could serve the same purpose, but because BGP already has equivalent mechanics they add little. The BGP timers can be tuned as low as 1 and 3 seconds, but with such settings the risk of peering session flapping becomes significant. Such instability is dangerous because BGP has no built-in dampening mechanism for session establishment. Therefore, another mechanism should be preferred: either BFD or fast BGP peering session deactivation (fast external fallover). The latter is on by default for eBGP sessions and tracks the outgoing interface associated with the BGP session.
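The cost of relying on the hold timer can be estimated with a simple model. It assumes KEEPALIVEs arrive exactly every `keepalive` seconds starting at t=0 (real implementations add jitter) and stop at the moment of a silent failure; the function names are hypothetical. The hold-time negotiation follows RFC 4271, where each side uses the smaller of its own and the peer's proposed value.

```python
def negotiated_hold_time(local_hold: int, remote_hold: int) -> int:
    """RFC 4271: the hold time is the smaller of the two proposed values."""
    return min(local_hold, remote_hold)


def detection_delay(keepalive: int, hold: int, failure_at: float) -> float:
    """Seconds from a silent failure until the hold timer expires.

    The hold timer restarts on every received KEEPALIVE, so it expires `hold`
    seconds after the last KEEPALIVE that arrived before the failure.
    """
    last_keepalive = (failure_at // keepalive) * keepalive
    return last_keepalive + hold - failure_at


hold = negotiated_hold_time(180, 180)
print(detection_delay(60, hold, failure_at=0.5))   # 179.5: failure just after a KEEPALIVE
print(detection_delay(60, hold, failure_at=59.5))  # 120.5: failure just before the next one
print(detection_delay(1, 3, failure_at=0.5))       # 2.5 with aggressive 1/3 timers
```

Even with aggressive timers, detection stays in the seconds range, while each missed KEEPALIVE under load brings the session closer to a false teardown.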

BFD is the best option on multipoint interfaces such as Ethernet, which lack fast link-down detection unless a mechanism such as Ethernet OAM is in place. BFD is especially attractive on platforms that implement it in hardware. The command to activate BFD fallover is neighbor fall-over bfd.
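For comparison with the BGP hold timer, BFD's detection time in asynchronous mode follows the formula in RFC 5880: the remote system's detect multiplier times the agreed interval, which is the larger of the local Required Min RX Interval and the remote Desired Min TX Interval. RFC 5880 expresses intervals in microseconds; this sketch uses milliseconds for readability. On IOS these values are typically configured per interface with the bfd interval command; the numbers below are examples only.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: float,
                          remote_desired_min_tx_ms: float,
                          remote_detect_mult: int) -> float:
    """Asynchronous-mode BFD detection time, per the RFC 5880 formula."""
    # The agreed interval is the slower of what we accept and what the peer sends.
    agreed_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval


print(bfd_detection_time_ms(50, 50, 3))   # 150 ms
print(bfd_detection_time_ms(300, 50, 3))  # 900 ms: receiver accepts no faster than 300 ms
```

At 50 ms intervals with a multiplier of 3, BFD detects a failure about 1200 times faster than a 180-second BGP hold timer, without shortening the BGP timers themselves.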

Even though BGP NHT enables fast reaction to IGP events, the convergence time is still not deterministic, because it depends on the number of prefixes BGP needs to process during best-path selection. Previously, we discussed how multiple equal-cost BGP paths could be used for redundancy and fast failover at the forwarding-engine level, without involving BGP best-path selection. What if the paths are unequal: can they still be used as backups? In fact, since BGP treats the local AS as a single hop, all BGP speakers select the same path consistently, and switching from one path to another synchronously on all speakers should not create any permanent routing loops. Thus, even where equal-cost BGP multipath is not possible, secondary paths may still be used for fast failover, provided that a signaling mechanism exists to detect the failure of the primary path.

We already know that BGP NHT can detect such a failure and propagate this information quickly to all BGP speakers, triggering a local switchover. Provided that a hierarchical FIB is in use, this switchover requires no BGP table walks or best-path re-election; it is simply a matter of changing the forwarding information. The process therefore does not depend on the number of BGP prefixes, and is thus known as Prefix Independent Convergence (PIC).

Achieving fast BGP convergence is not easy, because BGP is a complex routing protocol that runs as an overlay on top of an IGP. We have seen that tuning purely BGP-based convergence requires the following general steps:

  • Tuning BGP TCP transport and router ingress queues to achieve faster propagation of routing information.
  • Organizing outbound policies properly to achieve optimal update-group construction.
  • Tuning the BGP advertisement interval, if needed, to respond to fast “Down->Up” conditions.
  • Activating BGP fast external fallover and, where possible, BFD for fast deactivation of external peering sessions.
  • Tuning the underlying IGP for fast convergence. Even a large network's IGP can be tuned to converge in under one second.
  • Enabling BGP Next-Hop Tracking on all BGP speakers and tuning the BGP NHT delay to match the IGP response time.
  • Applying IGP summarization carefully to avoid hiding information that BGP NHT relies on.
  • Leveraging the IGP to propagate external peering link failures, in addition to relying on BGP peering session deactivation.
  • Using the Add-Path functionality on critical BGP speakers (for example, RRs) to propagate redundant paths, if the implementation supports it.
  • Using BGP PIC or fast backup switchover in environments that allow multiple paths to be propagated, for example multihomed MPLS VPN sites that use different RD values.
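The update-group step rests on the idea that neighbors with identical outbound policy can share a single formatted UPDATE, which is then replicated to each member. The sketch below is a simplified model: `Neighbor` and `build_update_groups` are hypothetical, and the three-field signature is an assumption, since real Cisco update-group membership compares more outbound settings.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple, Optional


class Neighbor(NamedTuple):
    address: str
    address_family: str
    ebgp: bool
    outbound_route_map: Optional[str]


def build_update_groups(neighbors: list[Neighbor]) -> dict[tuple, list[str]]:
    """Group neighbors whose outbound policy is identical (simplified criteria)."""
    groups: defaultdict[tuple, list[str]] = defaultdict(list)
    for n in neighbors:
        # Hypothetical signature: real implementations compare more settings.
        signature = (n.address_family, n.ebgp, n.outbound_route_map)
        groups[signature].append(n.address)
    return dict(groups)


neighbors = [
    Neighbor("10.0.0.1", "ipv4", False, None),
    Neighbor("10.0.0.2", "ipv4", False, None),
    Neighbor("192.0.2.1", "ipv4", True, "CUSTOMER-OUT"),
    Neighbor("192.0.2.5", "ipv4", True, "CUSTOMER-OUT"),
    Neighbor("198.51.100.1", "ipv4", True, "TRANSIT-OUT"),
]
groups = build_update_groups(neighbors)
print(f"{len(neighbors)} neighbors -> {len(groups)} updates to format per change")
# 5 neighbors -> 3 updates to format per change
```

Giving one neighbor a unique outbound policy creates a new group, so keeping outbound policies consistent directly reduces the update-generation work.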

Further Reading

http://goo.gl/cbm1en