IT Management Solutions

25 May

In monitoring, the importance of baselining can’t be overstated. It’s all well and good gathering utilisation data for memory, CPU, and other metrics, but without properly calculated thresholds, how do you know whether the performance you are seeing is normal? You can of course use global defaults, but the best practice approach is to set thresholds at the object level. These can be individual, statically set values, but there is a very useful alternative: Dynamic Thresholds.

Dynamic thresholds do not rely on a single static value set case by case by individual users, nor on a one-size-fits-all global default. Instead, they are calculated from real metrics polled from the object in question, and they change over time as new values are collected for that metric, using the data from the last 7 days. This is a significant benefit: because the thresholds are tied to true usage data, anomalous increases in resource usage, i.e. deviations from what has been ‘learnt’ to be normal, can be identified.

Here, we are going to look at how we can take advantage of calculated baseline values within SolarWinds.

[Screenshot: Manage SolarWinds Orion Thresholds]

Clicking the ‘Use Dynamic Baseline Thresholds’ button above to insert ${USE_BASELINE_CRITICAL} into the field is rarely the right choice, as the values it uses are too simplistic. As you will see below, there are more capable ways of creating an appropriate dynamic baseline.

[Screenshot: SolarWinds Alerting Thresholds]

When you use dynamic thresholds, SolarWinds performs the following calculations in the background:

  1. SolarWinds calculates the mean of the data points (the sum of the values divided by the number of data points).
  2. SolarWinds calculates the standard deviation (the difference between each data point and the mean is calculated and squared; these squares are summed and divided by the number of data points; the square root of the result is the standard deviation).
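For readers who like to see the arithmetic, the two steps above can be reproduced in a few lines of Python. This is a minimal sketch, not SolarWinds code: it assumes the population form of the standard deviation (dividing by the number of data points, exactly as described above), and the sample values are invented for illustration.

```python
import math

def mean(values):
    """Step 1: the sum of the data points divided by the number of data points."""
    return sum(values) / len(values)

def std_dev(values):
    """Step 2: the population standard deviation, following the steps above."""
    mu = mean(values)
    # Square the difference between each data point and the mean
    squared_diffs = [(v - mu) ** 2 for v in values]
    # Sum the squares, divide by the number of data points, then take the square root
    return math.sqrt(sum(squared_diffs) / len(values))

# Hypothetical memory utilisation samples (%)
memory = [32, 34, 34, 34, 35, 35, 37, 39]
print(f"mean={mean(memory)} std_dev={std_dev(memory)}")  # mean=35.0 std_dev=2.0
```

Python’s standard library provides the same calculations as statistics.mean and statistics.pstdev.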

We can reference these values using the following variables:

Variable                  Description
${MEAN}                   The sum of the polled values, divided by the number of data points
${STD_DEV}                The standard deviation: a measure of how widely the data points vary around the mean (subtract the mean from each data point and square each result, total the results and divide by the number of data points, then take the square root)
${USE_BASELINE_WARNING}   The mean of the data points, plus 2 × the standard deviation
${USE_BASELINE_CRITICAL}  The mean of the data points, plus 3 × the standard deviation
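The relationship between these variables can be sketched as follows. This is an illustration under assumptions rather than SolarWinds’ implementation: the function names (baseline_thresholds, classify) are hypothetical, the history is invented, and a polled value is treated as breaching a threshold when it is greater than or equal to it.

```python
import statistics

def baseline_thresholds(values):
    """Return (warning, critical) using the mean + 2/3 x standard deviation rules."""
    mean = statistics.mean(values)        # plays the role of ${MEAN}
    std_dev = statistics.pstdev(values)   # ${STD_DEV}, population standard deviation
    warning = mean + 2 * std_dev          # ${USE_BASELINE_WARNING}
    critical = mean + 3 * std_dev         # ${USE_BASELINE_CRITICAL}
    return warning, critical

def classify(value, warning, critical):
    """Label a newly polled value against the baseline thresholds (hypothetical helper)."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"

history = [32, 34, 34, 34, 35, 35, 37, 39]  # hypothetical utilisation (%)
warning, critical = baseline_thresholds(history)
print(warning, critical)                    # 39.0 41.0
print(classify(40, warning, critical))      # warning
```

Because both thresholds are derived from the history, they move automatically as the polled values change, which is the whole point of a dynamic baseline.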


I’m sure you are already thinking about the benefits of putting these in place. For many metrics, an atypical increase in the values being polled matters more than whether a single, unchanging threshold has been crossed.

For example, you might be monitoring the interface utilisation on a firewall. The utilisation can vary drastically depending on the time of year, the time of day, the number of users, and many other factors.

For a metric like this, alerting should be based on sudden, unusual spikes, which in the worst case might indicate a DDoS attack or a usage profile that exceeds your available resources. Imagine that over the Christmas period many people in the business are on holiday, so activity through the firewall is reduced. An unusual spike in connections during this period might still fall below a static threshold (which is likely to have been set based on the values expected during a busy period). Dynamic thresholds, however, would pick this up, informing you that the number of connections falls outside the norm for that period and warrants investigation.
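The Christmas scenario can be simulated with a handful of invented numbers. In this hypothetical sketch, a static threshold of 800 connections stands in for a value sized for a busy period, and the dynamic critical threshold uses the same mean + 3 × standard deviation rule as ${USE_BASELINE_CRITICAL}.

```python
import statistics

STATIC_THRESHOLD = 800  # hypothetical static value sized for a busy period

# Hypothetical hourly connection counts through the firewall during a quiet holiday week
quiet_week = [110, 95, 120, 105, 100, 90, 115, 98, 102, 108]
spike = 400  # an unusual burst of connections

mean = statistics.mean(quiet_week)
std_dev = statistics.pstdev(quiet_week)
dynamic_critical = mean + 3 * std_dev  # same rule as ${USE_BASELINE_CRITICAL}

static_alert = spike >= STATIC_THRESHOLD    # False: the spike is missed
dynamic_alert = spike >= dynamic_critical   # True: far outside the quiet-period norm
print(f"static alert: {static_alert}, dynamic alert: {dynamic_alert}")
# static alert: False, dynamic alert: True
```

With this data the dynamic critical threshold sits at roughly 130 connections, so the spike stands out clearly even though it is only half the static value.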

Of course, the default warning and critical dynamic thresholds can be customised even further. Let’s say that 3 × the standard deviation is a bit tight for a particular metric, and only the very largest jumps in polled values should breach a threshold. In this case, ${USE_BASELINE_CRITICAL} can be replaced with an expression built from the underlying variables:

${MEAN}

${STD_DEV}

In this way, your critical threshold could contain the following formula to represent a threshold value of 10 × the standard deviation above the mean:

${MEAN} + 10 * ${STD_DEV}
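SolarWinds evaluates these formulas itself; the sketch below only illustrates what the substitution amounts to. The helper evaluate_threshold is hypothetical and not part of any SolarWinds API. Python’s string.Template happens to use the same ${NAME} placeholder syntax, and a small walker over the parsed expression evaluates the arithmetic (numbers, +, -, * and /) without resorting to eval().

```python
import ast
import operator
from string import Template

# Arithmetic operators this sketch is willing to evaluate
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def _eval(node):
    """Recursively evaluate a parsed arithmetic expression (numbers and + - * / only)."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError(f"unsupported expression element: {ast.dump(node)}")

def evaluate_threshold(formula, mean, std_dev):
    """Substitute ${MEAN} and ${STD_DEV}, then evaluate the resulting arithmetic."""
    expression = Template(formula).substitute(MEAN=mean, STD_DEV=std_dev)
    return _eval(ast.parse(expression, mode="eval"))

# Hypothetical baseline: mean 35%, standard deviation 2%
print(evaluate_threshold("${MEAN} + 10 * ${STD_DEV}", mean=35, std_dev=2))  # 55
```

Passing "${MEAN} + 3 * ${STD_DEV}" with the same inputs returns 41, matching what ${USE_BASELINE_CRITICAL} would give for that baseline.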

[Screenshot: SolarWinds Alerting Thresholds]

This gives you even more control over your thresholds than the ${USE_BASELINE_CRITICAL} variable – if you are going to break the mold with dynamic thresholds, don’t pass up the opportunity to apply your own calculation!

Another noteworthy aspect of dynamic thresholds in the Server and Application Monitor (SAM) module is that the time period used for the calculation can be modified:

  1. Go to Settings > SAM Settings
  2. Click on Polling Settings in the Thresholds & Polling section
  3. Scroll down to Database Settings and adjust the number of days in the Baseline Data Collection Duration field
    1. Note: The Baseline Data Collection Duration cannot exceed the Detailed Statistics Retention
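The effect of the Baseline Data Collection Duration can be pictured as a sliding window over the stored statistics. The sketch below is hypothetical: baseline_from_window, its parameters and the sample data are invented for illustration, with duration_days standing in for the Baseline Data Collection Duration and retention_days for the Detailed Statistics Retention.

```python
import statistics
from datetime import datetime, timedelta

def baseline_from_window(samples, now, duration_days, retention_days):
    """Return (mean, std_dev) for the samples inside the baseline window.

    samples        -- list of (timestamp, value) tuples
    duration_days  -- stands in for 'Baseline Data Collection Duration'
    retention_days -- stands in for 'Detailed Statistics Retention'
    """
    if duration_days > retention_days:
        # Mirrors the note above: the window cannot exceed detailed retention
        raise ValueError("baseline duration cannot exceed detailed statistics retention")
    cutoff = now - timedelta(days=duration_days)
    # Keep only the samples polled within the window
    window = [value for timestamp, value in samples if timestamp >= cutoff]
    return statistics.mean(window), statistics.pstdev(window)

# Hypothetical daily readings; index 0 is today, index 9 is nine days ago
now = datetime(2024, 5, 25)
readings = [32, 34, 34, 34, 35, 35, 37, 39, 90, 95]
samples = [(now - timedelta(days=d), v) for d, v in enumerate(readings)]

print(baseline_from_window(samples, now, duration_days=7, retention_days=7))  # (35, 2.0)
```

Widening the window to 9 days pulls in the two older, much higher readings and raises the mean to 46.5, which shows why the collection duration should reflect how far back ‘normal’ behaviour for that metric really extends.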

Hopefully, this article has helped you fine-tune your environment with dynamically adjusting thresholds. When it comes to alerting, default thresholds that don’t account for the nature of your environment can seriously undermine the value of your alerts, so I heartily recommend reviewing this great feature. We look forward to hearing your thoughts, and hope to see you all on Thwack to continue the dynamic SolarWinds discussion!


Training Courses for SolarWinds Customers

Prosperon Networks are the UK's leading authority on SolarWinds IT Management Solutions. We run training courses to suit a number of roles in your organisation, catering for engineers, helpdesk operators and management personnel, who all use monitoring platforms differently. The SolarWinds products retain their simplicity and ease of use; however, some form of product training is recommended to get the most out of the tools we use every day.
