IT Management Solutions

20 Aug

BGP (Border Gateway Protocol) is one of the most famous protocols that we have in networks. This protocol allows us to propagate network prefixes through the internet, and it is commonly used in big corporations and ISPs. Furthermore, it is one of the biggest sources of problems when we talk about reachability issues. This is not a surprise, as BGP is considered the most complex routing protocol.

In this blog post we are not going to talk about BGP itself (you have some links here and here to get a better understanding of the protocol itself), but about the basics of monitoring BGP within your company network, and more specifically, using SolarWinds Network Performance Monitor (NPM) for this task.

Based on my experience as a SolarWinds engineer (currently) and as a network engineer (in the past), I would divide basic BGP monitoring into three sections:

  • Peer Status
  • Prefixes Received
  • Routes

 

Peer Status

This is probably the simplest way to monitor BGP, and one of the most commonly used. To exchange network prefixes with its BGP peers, the router first needs to establish a session with each of them. If the state of the BGP neighbourship is not ‘established’, no routes are exchanged. There are two ways to get the status of the BGP peers: by polling SNMP OIDs, or by receiving SNMP traps from the routers. Polling is the more important option because it actively asks for the status, so we always know what the last polled state was. With SNMP traps we wait to be told of a change; traps alone are not enough to know the current state, but they are very good at telling us in real time that neighbour changes have occurred.

 

SNMP Polling (Active Collection)

SolarWinds has the built-in capability to monitor the status of the BGP peers. Basically, SolarWinds gets the value of the OID 1.3.6.1.2.1.15.3.1.2, which shows the status of the BGP peers.  To activate this feature, list resources on the router (Settings > Manage Nodes > Select device > List Resources) where you want to monitor BGP neighbours and tick the BGP neighbours option.

SolarWinds BGP

When this feature is enabled on a device, a view resource appears by default on the node details page of that device, showing the status of the BGP peers along with other information.

Routing Neighbors
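As a rough illustration of what the poller reads, the sketch below maps the integer values of bgpPeerState (the OID mentioned above) to their names and lists the peers that are not established. The state numbering comes from the standard BGP4 MIB; the `polled` dictionary and the function name are hypothetical sample data, not SolarWinds output.

```python
# Integer values of bgpPeerState (1.3.6.1.2.1.15.3.1.2) as defined in the BGP4 MIB.
BGP_PEER_STATES = {
    1: "idle",
    2: "connect",
    3: "active",
    4: "opensent",
    5: "openconfirm",
    6: "established",
}


def peers_not_established(polled):
    """Return {peer_ip: state_name} for every peer whose state is not established."""
    problems = {}
    for peer_ip, state in polled.items():
        name = BGP_PEER_STATES.get(state, f"unknown({state})")
        if name != "established":
            problems[peer_ip] = name
    return problems


# Hypothetical poll result: peer address -> bgpPeerState value.
polled = {"192.168.10.101": 1, "192.168.10.102": 6}
print(peers_not_established(polled))
```

With this sample data only 192.168.10.101 is reported, because it is in the idle state.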

 

SNMP Traps (Passive Collection)

SolarWinds polls neighbour status every 5 minutes by default, which is enough for most situations. In some cases, however, this frequency will not give us the information as quickly as we need it. SNMP traps deliver the information as soon as the event happens, and can also provide some extra detail.

How to enable BGP traps on Cisco

How to enable BGP traps on Juniper

There are three main SNMP traps that we want to receive when monitoring BGP:

  • Backwards transition: this trap is issued when the BGP peer moves to a state ‘lower’ than the previous one, for example when the peer goes from Established to Idle.
  • Established State: this trap is issued when a BGP peer reaches the established state.
  • State change: this trap is issued every time the peer state changes, either backwards or forwards.

All three SNMP traps are useful. However, more than one of them can be sent for the same event. For example, a backwards transition (established to idle) triggers the backwards transition trap and also the state change trap, because the state of the neighbourship has changed. The same happens with established state and state change.
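The overlap between the three traps can be sketched in a few lines. This is a simplified model of the description above, not vendor behaviour: the function name is hypothetical, and real devices may emit only some of these traps depending on platform and configuration.

```python
# bgpPeerState values, ordered from lowest (idle) to highest (established).
IDLE, CONNECT, ACTIVE, OPENSENT, OPENCONFIRM, ESTABLISHED = range(1, 7)


def expected_traps(previous_state, new_state):
    """Return the trap types (as described above) that one transition would trigger."""
    traps = set()
    if new_state == previous_state:
        return traps  # no transition, so no trap
    traps.add("state change")
    if new_state < previous_state:
        traps.add("backwards transition")
    if new_state == ESTABLISHED:
        traps.add("established state")
    return traps


print(expected_traps(ESTABLISHED, IDLE))
print(expected_traps(OPENCONFIRM, ESTABLISHED))
```

A single event therefore produces two entries in the trap log, which is worth keeping in mind when building alerts so that one outage does not raise two alerts.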

For most devices that support BGP, only one OID is sent for a backwards transition. However, some vendors, such as Cisco and Juniper, send an additional vendor-specific SNMP trap along with the default one. This happens because the BGP MIB under the standard branch (1.3.6.1.2.1), which in theory supports all messages related to BGP events, does not cover every event that some devices generate. These vendors therefore maintain another BGP MIB under the private enterprises branch (1.3.6.1.4.1), which lets them extend the event and polling structure beyond the shared standard branch.

Vendor    Branch           Event                  OID
All       Standard branch  Backwards Transition   1.3.6.1.2.1.15.0.2
All       Standard branch  Established State      1.3.6.1.2.1.15.0.1
Cisco     Private branch   Backwards Transition   1.3.6.1.4.1.9.9.187.0.2
Cisco     Private branch   State Change           1.3.6.1.4.1.9.9.187.0.1
Cisco     Private branch   Established State      1.3.6.1.4.1.9.9.187.0.5
Juniper   Private branch   Backwards Transition   1.3.6.1.4.1.2636.5.1.1.1.0.2
Juniper   Private branch   Established State      1.3.6.1.4.1.2636.5.1.1.1.0.1

NOTE: some vendors provide other SNMP traps as well; the ones above are the most important.
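To make the branch distinction concrete, here is a minimal sketch that classifies a trap OID as standard or private using only the prefixes and enterprise numbers that appear in the table above. The function name and return values are hypothetical, and any enterprise number other than 9 (Cisco) or 2636 (Juniper) is simply reported as unknown.

```python
STANDARD_PREFIX = "1.3.6.1.2.1."
PRIVATE_PREFIX = "1.3.6.1.4.1."

# Enterprise numbers that follow 1.3.6.1.4.1 in the table above.
ENTERPRISES = {"9": "Cisco", "2636": "Juniper"}


def classify_trap_oid(oid):
    """Return (branch, vendor) for a trap OID given in dotted notation."""
    oid = oid.lstrip(".")
    if oid.startswith(STANDARD_PREFIX):
        return ("standard", "All")
    if oid.startswith(PRIVATE_PREFIX):
        enterprise = oid[len(PRIVATE_PREFIX):].split(".")[0]
        return ("private", ENTERPRISES.get(enterprise, "unknown"))
    return ("unknown", "unknown")


print(classify_trap_oid("1.3.6.1.2.1.15.0.2"))
print(classify_trap_oid("1.3.6.1.4.1.9.9.187.0.2"))
```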

We have mentioned before that the SNMP Traps from the private branch normally extend the information available, compared to the traps from the standard branch. Let’s have a closer look. For example, these are the backwards transition traps that a Cisco device will send when these events occur.

Standard branch SNMP Trap:
TRAP:           CES-BGP-DEFAULTS-MIB:bgpTraps.0.2 :
Last Error:     bgpPeerLastError.192.168.10.101 = BAA=,
Current Status: bgpPeerState.192.168.10.101 = idle(1),
Device Up Time: sysUpTime = 14 days 16 hours 6 minutes 34.39 seconds,
Device IP:      experimental.1057.1.0 = 192.168.10.103,
Trap Origin:    snmpTrapEnterprise = CES-BGP-DEFAULTS-MIB:bgpTraps

Cisco branch SNMP Trap:
TRAP:           CISCO-BGP4-MIB:cbgpBackwardTransition :
Last Error:     bgpPeerLastError.192.168.10.101 = BAA=,
Current Status: bgpPeerState.192.168.10.101 = idle(1),
Last Status:    cbgpPeerPrevState.192.168.10.101 = established(6),
Reason:         cbgpPeerLastErrorTxt.192.168.10.101 = hold time expired,
Device Up Time: sysUpTime = 14 days 16 hours 6 minutes 34.39 seconds,
Device IP:      experimental.1057.1.0 = 192.168.10.103,
Trap Origin:    snmpTrapEnterprise = CISCO-BGP4-MIB:ciscoBgp4MIB

As you may have noticed, the Cisco branch trap gives you a little more information: in this case, the previous state and a text description of the last error.
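The standard trap does carry the error, but only in encoded form. The sketch below assumes that the value `BAA=` shown for bgpPeerLastError is the base64 rendering of the two-byte field (error code, then subcode); that is an assumption about how the trap viewer displays binary data. The error code names are the NOTIFICATION codes from RFC 4271, and the function name is hypothetical.

```python
import base64

# BGP NOTIFICATION error codes from RFC 4271 (0 means no error has occurred).
BGP_ERROR_CODES = {
    0: "no error",
    1: "message header error",
    2: "OPEN message error",
    3: "UPDATE message error",
    4: "hold timer expired",
    5: "finite state machine error",
    6: "cease",
}


def decode_last_error(encoded):
    """Decode a base64-rendered bgpPeerLastError into (code, subcode, description)."""
    raw = base64.b64decode(encoded)
    if len(raw) != 2:
        raise ValueError("bgpPeerLastError is expected to be exactly 2 bytes")
    code, subcode = raw[0], raw[1]
    return code, subcode, BGP_ERROR_CODES.get(code, "unknown")


print(decode_last_error("BAA="))
```

Under that assumption, `BAA=` decodes to error code 4 with subcode 0, which agrees with the "hold time expired" text that the Cisco trap provides directly.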

It is important to confirm which branch your device generates SNMP traps for (Standard or Private). If it generates both, use the Private branch trap, as it is likely to contain more information than the Standard branch message. The following link provides information on creating alerts within SolarWinds Orion:

How to create an alert for Traps in SolarWinds

 

Prefixes Received

When peering with ISPs, one of the common issues is that we stop receiving prefixes from the ISP router. This can be a big problem because it may go unnoticed if we only monitor the status of the BGP neighbourship.

It is also a problem when the ISP router advertises too many prefixes, as our router might start to receive more routes than the router memory can take. If this same router is peering internally with other routers that also perform critical routing functions within the network, this overhead could lead to a bad outcome for network function.

The standard branch of the BGP MIB does not contain an OID for this metric, so we have to rely on the private branch of each vendor. This means that some vendors expose this information and others do not, so a review is always necessary to determine whether an OID exists and what it is.

In the table below you will find the main metrics that we recommend monitoring via SNMP active polling.

Metric                    Description
Accepted Prefixes         The number of prefixes received from the BGP peer. If the number of prefixes stays at 0 for a long time (2 hours), this might indicate a problem with the peer.
Prefix Threshold          When configuring BGP on a Cisco router, we can define a threshold (in %). Once the threshold is reached, the router sends a trap reporting that the number of prefixes received from a peer has exceeded the threshold. We can monitor this value in SolarWinds in order to create our own automation processes.
Maximum Prefixes Allowed  The total number of prefixes allowed for this neighbour. One of the possible actions when the limit is reached is to bring down the BGP peer connection.
Advertised Prefixes       The number of prefixes we are advertising. This is important to monitor in order to know whether we are advertising too many prefixes or not enough.
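The "0 prefixes for 2 hours" condition on Accepted Prefixes can be sketched as follows. This is only an illustration of the logic, not how a SolarWinds alert is defined: the sample format, the function names and the 2-hour default are assumptions taken from the description in the table.

```python
from datetime import datetime, timedelta


def zero_prefix_duration(samples):
    """Return how long the most recent run of zero accepted prefixes has lasted.

    samples: list of (timestamp, accepted_prefixes) tuples, oldest first.
    """
    if not samples or samples[-1][1] != 0:
        return timedelta(0)
    run_start = samples[-1][0]
    for timestamp, accepted in reversed(samples):
        if accepted != 0:
            break
        run_start = timestamp
    return samples[-1][0] - run_start


def should_alert(samples, limit=timedelta(hours=2)):
    """True when accepted prefixes have been at zero for at least `limit`."""
    return zero_prefix_duration(samples) >= limit


# Hypothetical 5-minute polls: one healthy sample, then zero prefixes afterwards.
start = datetime(2024, 1, 1, 0, 0)
samples = [(start, 120)] + [(start + timedelta(minutes=5 * i), 0) for i in range(1, 26)]
print(should_alert(samples))
```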

Cisco is one of the vendors that gives us most of the metrics we need. The OIDs differ depending on how BGP is configured on the router, that is, whether you are using basic BGP or BGP with address families.

If BGP is configured without address families (basic BGP), then the OIDs are the following:

Metric                    OID
Accepted Prefixes         1.3.6.1.4.1.9.9.187.1.2.1.1.1
Maximum Prefixes Allowed  1.3.6.1.4.1.9.9.187.1.2.1.1.3
Advertised Prefixes       1.3.6.1.4.1.9.9.187.1.2.1.1.4

 

Otherwise, if BGP has been configured with address families, then the OIDs are the following:

Metric                    OID
Accepted Prefixes         1.3.6.1.4.1.9.9.187.1.2.4.1.1
Maximum Prefixes Allowed  1.3.6.1.4.1.9.9.187.1.2.4.1.3
Prefix Threshold          1.3.6.1.4.1.9.9.187.1.2.4.1.4
Advertised Prefixes       1.3.6.1.4.1.9.9.187.1.2.4.1.6

 

BGP Prefixes Accepted

In the above screenshots, you can see these values output in Orion in chart form, as this allows us to see the level of activity within the protocol the device is seeing.

Example of basic configuration for Cisco devices:

neighbor 192.168.10.101 maximum-prefix 500 80

  • the neighbor IP address is 192.168.10.101
  • the maximum number of prefixes allowed is 500
  • the warning threshold is 80% of the maximum (500 x 80% = 400 prefixes)
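The arithmetic behind this example can be written as a small sketch. The function names are hypothetical, and the strict "greater than" comparisons are an assumption; the exact point at which a router warns or tears down the session should be checked against the vendor documentation.

```python
def warning_prefix_count(maximum_prefixes, threshold_percent):
    """Number of received prefixes that corresponds to the warning threshold."""
    return maximum_prefixes * threshold_percent // 100


def prefix_status(received, maximum_prefixes, threshold_percent):
    """Classify a received-prefix count against the configured limits."""
    if received > maximum_prefixes:
        return "over maximum"
    if received > warning_prefix_count(maximum_prefixes, threshold_percent):
        return "over threshold"
    return "ok"


# Values from the configuration line above: maximum-prefix 500 80
print(warning_prefix_count(500, 80))
print(prefix_status(450, 500, 80))
```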

To demonstrate how the available data differs between vendors: on Juniper routers we only have the option to monitor received and advertised prefixes.

Metric               OID
Accepted Prefixes    1.3.6.1.4.1.2636.5.1.1.2.6.2.1.7
Advertised Prefixes  1.3.6.1.4.1.2636.5.1.1.2.6.2.1.10

 

These are the Universal Device Pollers (UnDPs) that can be imported into SolarWinds.

How to import UnDP

>>>DOWNLOAD UNIVERSAL DEVICE POLLER - JUNIPER BGP<<<

>>>DOWNLOAD UNIVERSAL DEVICE POLLER - CISCO BGP (BASIC)<<<

>>>DOWNLOAD UNIVERSAL DEVICE POLLER - CISCO BGP (AF)<<<

 

Routes

On this particular topic, there are two main areas that we should monitor: flapping routes, and AS path.

Flapping Routes

Monitoring flapping routes is not exclusive to BGP; we should monitor flapping routes for every routing protocol, such as OSPF or EIGRP, as well. The good news is that SolarWinds can monitor this out of the box. Just make sure you are monitoring the routing table when you list resources on a router (see the List Resources steps above).

Top 10 Flapping Routes
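To illustrate what a flap is, the sketch below counts how many times each prefix appears or disappears between consecutive routing table polls. This is not how SolarWinds calculates its flapping routes resource; the snapshot format and the function name are hypothetical.

```python
def count_flaps(snapshots):
    """Count, per prefix, how often it appeared or disappeared between polls.

    snapshots: list of sets of prefixes present in the routing table, oldest first.
    """
    flaps = {}
    for previous, current in zip(snapshots, snapshots[1:]):
        # Symmetric difference = prefixes present in exactly one of the two polls.
        for prefix in previous ^ current:
            flaps[prefix] = flaps.get(prefix, 0) + 1
    return flaps


# Hypothetical routing table contents over four polls.
snapshots = [
    {"10.0.0.0/24", "10.0.1.0/24"},
    {"10.0.0.0/24"},
    {"10.0.0.0/24", "10.0.1.0/24"},
    {"10.0.0.0/24"},
]
print(count_flaps(snapshots))
```

In this sample, 10.0.1.0/24 changes three times while 10.0.0.0/24 never changes, so only the first prefix is reported.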

 

AS Path

The other important metric here is the AS path. To describe the route that packets follow to reach a particular subnet, BGP uses the AS path attribute, which lists the Autonomous Systems that the packet will go through to reach the destination. It is important to monitor the existing AS paths in order to detect attacks that exploit BGP, such as DDoS attacks or route hijacking, or simply to know the route our traffic will take.

In Cisco, we can monitor the AS path using the OID 1.3.6.1.4.1.9.9.187.1.1.1.1.8, and for Juniper it is 1.3.6.1.4.1.2636.5.1.1.3.5.1.4.

If you are testing this metric with Cisco, the output is in HEX format by default and needs to be converted into something more human-readable. This can be done with a SQL query in the Orion Custom Table widget.

>>>DOWNLOAD SQL QUERY AS PATH<<<

AS Path

NOTE: this SQL script has only been tested with 16-bit AS numbers, not 32-bit AS numbers. It also only includes up to the third AS in the path; however, it could be edited to work with 32-bit AS numbers and longer paths.
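For readers who want to see what the conversion involves outside of Orion, here is a minimal Python sketch. It is not the downloadable SQL query. It assumes the hex value follows the usual AS path segment layout (one byte for the segment type, one byte for the number of AS entries, then the AS numbers), and the `as_size` parameter (2 bytes for 16-bit, 4 bytes for 32-bit AS numbers) is something you would need to verify for your device. The sample hex string is made up.

```python
def decode_as_path(hex_string, as_size=2):
    """Decode AS path segments from hex into a list of (segment_type, [AS numbers]).

    as_size is the number of bytes per AS number: 2 for 16-bit, 4 for 32-bit.
    """
    segment_types = {1: "AS_SET", 2: "AS_SEQUENCE"}
    data = bytes.fromhex(hex_string.replace(" ", "").replace(":", ""))
    segments = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated AS path segment header")
        seg_type, count = data[offset], data[offset + 1]
        offset += 2
        end = offset + count * as_size
        if end > len(data):
            raise ValueError("truncated AS path segment")
        numbers = [
            int.from_bytes(data[i:i + as_size], "big")
            for i in range(offset, end, as_size)
        ]
        segments.append((segment_types.get(seg_type, f"unknown({seg_type})"), numbers))
        offset = end
    return segments


# Made-up sample: one AS_SEQUENCE segment containing three 16-bit AS numbers.
print(decode_as_path("02 03 FD E8 FD E9 FD EA"))
```

The sample decodes to a single AS_SEQUENCE with AS numbers 65000, 65001 and 65002. Because the loop walks every segment and every entry, it is not limited to the first three AS numbers in the path.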

And that’s all I wanted to share with you guys and gals.  I hope this has been informative for you, and don’t hesitate to contact me with any question or ideas that you may have regarding the use of SolarWinds.

 
