With more and more customers complaining about their online experience, is network monitoring up to the job?

In the digital era, consumers have more choice than anyone imagined a little over a decade ago. The well-worn phrase, 'competition is only a click away', captures the expectation of consumer and supplier alike. The supplier is excited by the simplicity and efficiency of reaching a global market. The consumer is inundated with choice. The Internet delivers a model of intense expectation and competition on a global scale.

The rules are simple: businesses that fail to meet customer expectations risk an exodus without comment. Most customers frustrated by poor service do not complain; they simply move on to a competitor. Businesses that surpass customer expectations differentiate themselves in a heavily saturated marketplace. There are few exceptions; this is what defines the battle for the customer.

The Number One Complaint

I spend a large part of my working week helping a wide variety of customers, both consumers and providers, in the analysis of network connections that are failing to deliver an acceptable service. I receive ten or more requests per day and the number one complaint is always along the lines of, "I am having an application problem but my service provider states there is nothing wrong with the connection".

This kind of response from any provider violates the cardinal rule of good service. If the customer is complaining then there is definitely a problem; whether it is real or perceived makes no difference. Answering to the contrary is just plain bad customer service and only serves to frustrate.

Unfortunately, this level of disconnect between supplier and customer is far too common. Conventional network monitoring solutions fail at experience assessment because they measure only the availability and utilization of the resources deemed material to service delivery. The problem is that the provider has no means to extrapolate the customer experience from typical monitoring events and, in the battle for the customer, the experience is the only measure that matters. No customer leaves a provider because a router is 80% busy or the latency is 70 milliseconds (ms), but many a customer will walk because the VoIP is garbled or the movie is pixelated. The unprecedented growth of personal media devices and the explosion of real-time social applications mean that the tolerance for a poor experience has dramatically changed. When it comes to social applications, the user demands a first-class experience. There are no acceptable halfway measures and providers have no room for error.

Case Study

The following case study from a recent customer clearly demonstrates why a conventional monitoring solution fails in a modern network founded on policies. The customer purchased an expensive private 20Mbps fiber-optic service to connect a new distribution warehouse to the corporate HQ. The business application was critical because it drove the distribution of the company's product to its customers.

The fiber connection was 100Mbps, regulated to deliver 20Mbps. The problem reported to my office was that the application was failing in its purpose, and a test of the connection reported a throughput of only 1Mbps, considerably slower than the 20Mbps required.

The customer reported the poor performance results to the provider, only to receive the response, 'our monitoring shows the connection is performing correctly, the problem must be your application not the network'. On presenting the test results, the customer was met with the flat response that the results were 'bogus'.

To cut the support story short, the problem was triggered by a regulatory policy that contained the 100Mbps connection to the contracted 20Mbps. As long as data was flowing at the 20Mbps level, all the monitoring lights were green. What monitoring failed to detect is that, because the connection latency was only 2ms end-to-end, the pipe saturated to capacity with just 3 packets of data. As the TCP payload comprised many more packets than just 3, the excess data immediately triggered the regulatory policy, which simply discarded the excess. Of course, this policy exacerbated the application performance problem because, as fast as data was discarded, it was retransmitted, resulting in a payload that exceeded what the application required many times over.
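The '3 packets' figure can be checked with a bandwidth-delay product calculation. The following is a minimal sketch rather than the method used in the original analysis. It assumes the 2ms latency is treated as the round-trip time and that packets are 1500 bytes (a typical Ethernet frame payload); neither assumption is stated in the case study, and the function names are illustrative.

```python
# Sketch: how much data fills a rate-limited pipe at a given latency.
# Assumptions (not from the case study): the 2 ms figure is the
# round-trip time, and each packet is 1500 bytes.

def bdp_bytes(rate_bps, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return rate_bps * rtt_ms / 1000 / 8


def packets_to_fill(rate_bps, rtt_ms, packet_bytes=1500):
    """Number of packets of the given size that fill the pipe."""
    return bdp_bytes(rate_bps, rtt_ms) / packet_bytes


if __name__ == "__main__":
    rate = 20_000_000  # contracted 20 Mbps, in bits per second
    rtt = 2            # latency in milliseconds
    print(f"pipe capacity: {bdp_bytes(rate, rtt):.0f} bytes")
    print(f"packets to fill: {packets_to_fill(rate, rtt):.1f}")
```

Under these assumptions the pipe holds 5,000 bytes, a little over three full-size packets, so any burst larger than that runs into the 20Mbps policy and is discarded.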

Monitoring reported everything as fine because, from a pure bits and bytes perspective, it was. Latency at 2ms was perfect, data was flowing at 20Mbps, all lights were green. However, from a user and application experience perspective, about 90% of what comprised the monitored 20Mbps service was unusable data because a high percentage of bytes were lost in transit as a result of the regulatory policy. In short, the monitored utilization looked good but the application experience was bad. The provider's monitoring failed because it was unable to recognize that, out of 20Mbps flowing, only 1Mbps was usable data.
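The gap between monitored utilization and usable data can be expressed as a simple ratio. The sketch below is illustrative only: the function names and the 80% alert threshold are hypothetical, and the inputs are the round figures from the case study (20Mbps flowing, 1Mbps usable), which put the usable share at roughly 5%, broadly in line with the estimate above.

```python
# Sketch: compare usable data (goodput) with what the link monitor sees.
# The 0.8 threshold is a hypothetical value chosen for illustration.

def usable_fraction(goodput_mbps, throughput_mbps):
    """Share of the monitored traffic that the application can use."""
    if throughput_mbps <= 0:
        raise ValueError("throughput must be positive")
    return goodput_mbps / throughput_mbps


def experience_alert(goodput_mbps, throughput_mbps, min_usable=0.8):
    """True when too little of the monitored traffic is usable data."""
    return usable_fraction(goodput_mbps, throughput_mbps) < min_usable


if __name__ == "__main__":
    share = usable_fraction(1, 20)  # 1 Mbps usable out of 20 Mbps flowing
    print(f"usable share: {share:.0%}")
    print("alert:", experience_alert(1, 20))
```

A check of this kind would have raised an alert while every utilization light stayed green, because it measures what the application receives rather than what the link carries.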

The biggest part of the problem was not the poor throughput itself. The real problem was that the provider dismissed the customer's complaint on the basis of monitoring results that failed in their purpose. The first rule of business is that the customer is king. Because the provider dismissed the complaint and did not take the time to investigate and witness the issue, the customer was forced to tolerate a material business problem for a number of months when it could have been solved in a number of hours with the right approach and the right tools. This problem could easily have resulted in the loss of the customer's business.

