Analyzing Network Data for Capacity Planning

Mike Johns, Product Research Engineer, NetQoS, Inc.

Network capacity planning is fundamental to network engineering. But despite (or perhaps because of) this, there is little agreement as to how it should be done. Ultimately, there is no magic bullet when it comes to forecasting the utilization of resources, but there does exist a variety of techniques, each with strengths and weaknesses, that can help identify trends.

Choosing Meaningful Metrics

Perhaps the most crucial decision is which metrics to analyze. Different metrics apply depending on whether the focus is on finding existing bottlenecks or predicting future ones. Determining which network links may become congested in the future (but are not so now) can be done by trending utilization percentages. Figure 1 shows an ideal candidate for trending utilization. However, links that are already congested will show consistently high utilization numbers that oscillate, potentially leading to a relatively flat trend line (Figure 2). For a variety of reasons, particularly congestion control mechanisms within protocols, it is difficult using only this metric to distinguish whether such a link is overutilized or just well utilized. Therefore, it may be useful in such cases to trend additional metrics, such as dropped packets or TCP window size.

Figure 1: Utilization often shows a clear trend when relatively low

Figure 2: High utilization values make finding trends more difficult

Analysis Methods

At any given instant, a network resource is either 100% utilized or 0% utilized; thus, any meaningful utilization data is aggregated over a period of time. Different techniques for doing this aggregation provide different advantages.
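As a minimal sketch of what aggregating over a period means, the function below computes utilization for one sampling interval from the amount of data carried. The function name, the parameters, and the sample figures are illustrative assumptions, not part of any particular product.

```python
def utilization(bits_transferred: float, capacity_bps: float, interval_s: float) -> float:
    """Fraction of link capacity used over one sampling interval."""
    if capacity_bps <= 0 or interval_s <= 0:
        raise ValueError("capacity and interval must be positive")
    # Capacity of the interval is the most the link could have carried in that time.
    return bits_transferred / (capacity_bps * interval_s)


# Hypothetical 10 Mbit/s link that carried 1.5 Gbit during a 5-minute interval.
print(utilization(1.5e9, 10e6, 300))  # 0.5
```

The choice of interval matters: a shorter interval exposes brief bursts, while a longer one smooths them away.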

Averaging data is the most common and easily interpreted method. Averaged data can serve as a useful point of reference when looking at data created using other methods. Long-term averages tend to hide sustained activity at peak times because network usage patterns tend to vary cyclically. This results in, for instance, a link appearing to be 30% utilized when it is in fact 90% utilized during peak times and 1% utilized otherwise. Figure 3 illustrates such a situation. With averaging as the only technique in use, this situation is impossible to distinguish from a link that is 30% utilized all day. Filtering out certain hours of the day helps to alleviate this problem but also introduces additional complexity in determining which hours of the day are meaningful for a link.

Figure 3: Day/Night cycles can cause averages to skew low. This link is nearly 100% utilized through most of the day, but not busy at all during the night, causing the average to report as about 50% utilized.
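The effect can be reproduced with a few lines of Python. The hourly samples below are invented for illustration, and the 08:00 to 20:00 business window is an assumption; choosing that window for a real link is the added complexity mentioned above.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Hypothetical hourly utilization (fraction of capacity) for one day:
# busy for the 12 hours starting at 08:00, nearly idle otherwise.
hourly = [0.98 if 8 <= hour < 20 else 0.02 for hour in range(24)]

daily_mean = mean(hourly)           # averages busy and idle hours together
business_mean = mean(hourly[8:20])  # keeps only the assumed business hours

print(f"24-hour mean:        {daily_mean:.2f}")     # 0.50
print(f"business-hours mean: {business_mean:.2f}")  # 0.98
```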

Percentile analysis is particularly useful in capacity planning. The nth percentile of a data set is defined as the value within it such that n percent of the other values are below it. By examining, for instance, the 90th percentile of link utilization within a particular day, a network administrator can find the utilization level that the link met or exceeded for roughly 10% of that day. While averaging such data tends to lose information about temporary spikes in traffic, percentile analysis is sensitive to those spikes that have significant impact while retaining the desirable property of not overreacting to short-term anomalies.

Because percentile values are taken from within the data set, this analysis method is susceptible to issues that arise when values are not evenly distributed, as is often the case with network data. For example, consider a link (Figure 4) that is 90% utilized 10% of the time and 10% utilized for the rest. The 90th percentile, then, is 90%. The 89th percentile, however, is 10%. So depending on the particular percentile in use, the answer produced by this method can vary substantially. While in many cases percentile analysis has several desirable properties for capacity planning, it should be used with an awareness of this type of situation.

Figure 4: A contrived example demonstrating the pitfalls of using percentile analysis. The 90th percentile is 0.9, but the 89th percentile is 0.1.
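The contrived case is easy to check in code. The sketch below uses one simple reading of the definition given earlier: sort the samples and pick the one that has n percent of the samples ranked before it. Other percentile conventions exist (some interpolate between samples) and can give different answers; the `percentile` helper and the sample data are illustrative assumptions.

```python
def percentile(values, n):
    """Return the sample that has n percent of the samples ranked before it.

    n is an integer from 0 to 100. No interpolation is performed, so the
    result is always a value taken from the data set itself.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= n <= 100:
        raise ValueError("n must be between 0 and 100")
    ordered = sorted(values)
    # Integer arithmetic avoids floating-point surprises at the boundary.
    index = min(int(n) * len(ordered) // 100, len(ordered) - 1)
    return ordered[index]


# 100 samples: 90% utilized for 10% of the time, 10% utilized for the rest.
samples = [0.9] * 10 + [0.1] * 90

print(percentile(samples, 90))  # 0.9
print(percentile(samples, 89))  # 0.1
```

A one-point change in the chosen percentile moves the answer from 0.1 to 0.9, which is why it helps to inspect more than one percentile before acting on the result.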

Time-over-threshold analysis provides another angle for determining when a resource is becoming overutilized. This technique tracks the percentage of time in which the value of the metric is over a given threshold value. While the links in Figure 5 and Figure 6 have the same average, and the 90th percentile is actually lower in Figure 6, time-over-threshold analysis will indicate that the link in Figure 6 is more likely to provide a degraded user experience for at least part of the time.

Figure 5: This link is not likely to be causing performance degradation, but has a higher 90th percentile and higher average than the link in Figure 6, which is likely to cause degraded performance for part of the time.

The difference between the two links can be resolved by examining time over threshold for both, with the threshold set to 0.65.

Figure 6: This link has a low average and 90th percentile, but is likely to cause performance degradation towards the middle.
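A time-over-threshold calculation can be sketched as follows. The two series are invented for illustration and are not the data behind Figures 5 and 6; the 0.65 threshold is the one used above, and the function name is an assumption.

```python
def time_over_threshold(values, threshold):
    """Fraction of equally spaced samples strictly above the threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v > threshold) / len(values)


# Hypothetical samples at equal intervals (fraction of capacity).
steady = [0.60] * 20               # moderately busy, never near saturation
bursty = [0.30] * 19 + [0.95] * 1  # mostly quiet, with one saturated interval

print(time_over_threshold(steady, 0.65))  # 0.0
print(time_over_threshold(bursty, 0.65))  # 0.05
```

Here the bursty link has the lower average, and its single saturated interval is too brief to raise its 90th percentile, yet it is the only one of the two that ever exceeds the threshold.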

While no single technique provides a complete answer, using them together presents an opportunity for far more accurate analysis, each method compensating for the weaknesses of another.

Using Trend Lines

Given data analyzed in any of the above ways, we can perform a linear regression to detect trends. The line given by this technique can be used in several ways.
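For readers who want to see the regression itself, here is a minimal ordinary least-squares fit in plain Python. The `fit_line` name and the sample series are assumptions made for illustration; any statistics package offers an equivalent routine.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical daily 90th-percentile utilization for one week.
days = [0, 1, 2, 3, 4, 5, 6]
daily_p90 = [0.40, 0.42, 0.41, 0.45, 0.46, 0.48, 0.50]

slope, intercept = fit_line(days, daily_p90)
print(f"slope: {slope:.4f} per day, intercept: {intercept:.4f}")
```

The input can be any of the aggregated series discussed earlier: daily averages, daily percentiles, or daily time over threshold.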

The slope of the line tells us the rate of growth (or reduction) in the data set. For example, this allows for the examination of which network links will be the most problematic in the long term. However, while ordering a list of links by slope tells us which are growing fastest, it does not necessarily tell us which will need to be upgraded soonest. For example, an interface that is currently 80% utilized with a slope of 1% per day will need to be upgraded sooner than an interface that is 10% utilized with a slope of 3% per day. While the second interface is growing faster and would eventually surpass the first if utilization numbers over 100% were possible, the first will hit its threshold earlier.
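The arithmetic behind that comparison can be sketched as below, treating 100% utilization as the threshold and assuming the linear trend continues unchanged. The link names and the `days_until` helper are hypothetical.

```python
import math


def days_until(current, slope_per_day, threshold=100.0):
    """Days until a linear trend starting at `current` reaches `threshold`."""
    if current >= threshold:
        return 0.0
    if slope_per_day <= 0:
        return math.inf  # flat or shrinking trend never reaches the threshold
    return (threshold - current) / slope_per_day


# (current utilization in percent, growth in percentage points per day)
links = {"A": (80.0, 1.0), "B": (10.0, 3.0)}

for name in sorted(links, key=lambda k: days_until(*links[k])):
    print(name, days_until(*links[name]))
# A 20.0
# B 30.0
```

Link B has three times the slope, but link A reaches the threshold ten days earlier.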

To account for this, we can use the trend line to determine the projected value of the metric at a given date. Ordering a list this way tells us, for example, which links are projected to be most utilized on the day at which purchase orders for new hardware are due. This method combines the effects of the rate of change and the current value to give a more complete picture.
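Ranking by projected value can be sketched in the same way. The two links, the 14-day horizon, and the `projected` helper are illustrative assumptions; the projection is capped at 100% because utilization cannot exceed the capacity of the link.

```python
def projected(current, slope_per_day, days_ahead, ceiling=100.0):
    """Value of a linear trend after `days_ahead` days, clamped to 0..ceiling."""
    value = current + slope_per_day * days_ahead
    return max(0.0, min(value, ceiling))


# (current utilization in percent, growth in percentage points per day)
links = {"A": (80.0, 1.0), "B": (10.0, 3.0)}
horizon = 14  # hypothetical number of days until purchase orders are due

ranked = sorted(links, key=lambda k: projected(*links[k], horizon), reverse=True)
for name in ranked:
    print(name, projected(*links[name], horizon))
# A 94.0
# B 52.0
```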

Analyzing Constituent Data

Once a high-level analysis has identified, for example, those network links projected to spend the most time over 90% utilization at a future date, the question immediately arises as to why this is happening. This can be answered by performing the same analysis on, for instance, the volume of data per protocol on a particular interface. This allows one to determine which applications are most responsible for growth and whether any new applications are projected to contribute substantially to future resource utilization.

Such awareness is helpful in determining how resource utilization may be controlled in ways other than adding bandwidth. If Windows Media is projected to become the majority bandwidth consumer, for example, the answer may be bandwidth throttling, a QoS policy, revising or enforcing usage guidelines, or a variety of other possibilities that cost substantially less than upgrading the link.

Conclusion

While forecasting the future based exclusively on data from the past is a dodgy proposition, particularly when human behavior is involved, data easily produced by a variety of techniques can provide a basis for doing so with careful interpretation.

NetQoS - Network Performance Management products and services for the world's largest networks. © 2001-2008 NetQoS, Inc. All rights reserved.