Beautiful Numbers


It's the interrogation - asking the right questions of your data - that is the key - Dr Inna Kolyshkina, Manager, PwC

John Forbes Nash Jr, the man at the centre of the Hollywood film A Beautiful Mind, has a gift for numbers. In the film his character stares at a wall of complex coded data and sees patterns and associations invisible to others. “He saw the world in a way that no one could have imagined,” runs the movie tagline.

Today a profession is emerging which has a similar ability to discern trends and see patterns within the massive stores of data that organisations hold about transactions and people. It’s called predictive business intelligence - a name that distinguishes the discipline from the kind of two-dimensional statistical work that sometimes goes by the name of data mining.

As Dr Richard Brookes, actuary, mathematics PhD and director for financial risk management at PricewaterhouseCoopers, observes: “Once you have those enormous databases, you have the information that you need to actually predict the behaviour of customers.”

Dr Inna Kolyshkina, another maths PhD working in data mining at PwC, explains that in the more simplistic forms of data analysis, theoretical understanding of cause and effect is achieved by constructing elegant models based on certain assumptions.

Interrogating the data

“What is happening today among advanced practitioners is very different,” she says. “Forget about elegant models. What we are doing is taking massive data sets - gigabytes or even terabytes of raw data - and then combining the full grunt power of modern computing technology with intelligent interrogation of the data.

“We are searching for the best possible, real-life predictive model, out of hundreds of thousands of possibilities. To get the right answer we have to ensure that data is being used to its full potential - and today’s technology makes the search both rapid and exhaustive. But it’s the interrogation - asking the right questions of your data - that is the key.”

Similarly advanced data mining is now being used to identify disease-causing proteins within human cells for targeting by new drugs. But in fact the type of complex questions that highly trained statistical analysts ask of biological systems can be asked of almost any large data set - or business system.

As a result, many businesses and government agencies are now acquiring unprecedented powers of highly targeted analysis for purposes ranging from assessing the loyalty and profitability of customers, to uncovering fraud and even potentially preventing disasters such as terrorist attacks.

At the moment, applications are particularly prevalent in mature markets, such as banking, energy and telecommunications, where companies tend to grow at the expense of their competitors. But the potential of advanced data mining to deliver powerful strategic insights, as shown in the WorkCover case discussed below, applies to almost any business with a large customer database.

Transforming CRM

“Advances in data mining are transforming, among other things, traditional customer relationship management (CRM),” explains Richard Brookes. “The traditional approach will usually show the actuarial present value of the future profits that are expected to be generated by different groups. Acquisition and retention strategies, including pricing, are based on such group profiles.

“But the traditional approach tends to focus on the product more than the customer: it will usually not show, for example, how a loss-making customer today may be highly profitable tomorrow; or how a loss-making customer with respect to one product may be profitable with respect to certain others. By contrast, the modelling fire-power of advanced data mining enables a company to ‘see’ and analyse the behaviour of individual consumers.”

Without such fire-power, many insurance and telecommunications companies are struggling to offer discounts to customers for buying multiple products (‘bundles’). Because they cannot automatically ‘see’ their customers, they must search manually or ask their customers to tell them that they own multiple products.

“Corporations are fairly aware of who their customers are,” says PwC data management specialist Suzanna Mladenovic. “What to do with that awareness - and how to get it at a much more granular level - is the challenge.” By delivering such ‘granular’ insights, intelligent interrogation of data helps unlock product synergies and target resources.

WorkCover: a model approach

One close-to-home illustration of the power of the more advanced forms of data mining is provided by work recently completed by PwC at WorkCover NSW. This is bringing significant benefits to NSW workers and some 360,000 employers across the State.

WorkCover integrates responsibilities covering prevention of worker injury, rehabilitation and compensation into a single authority that is funded through a levy on workers compensation premiums. Under this system, industry bears the direct cost of occupational health and safety services and the management of the system. The authority works closely with numerous licensed insurers that issue and administer workers compensation insurance policies on its behalf. These insurers collect premiums, inform the employers of their responsibilities and administer most claims.

The rising costs of the scheme - reflected in the current deficit in the WorkCover insurance portfolio of some $2.9 billion - have for some years been a matter of concern. The implementation of provisional liability in 2001 went a long way toward improving outcomes by reducing costly disputes. The number of claims classified as “not yet determined” fell dramatically, from 13-14 per cent prior to 2001 to 3-4 per cent in 2002-03. But the concerns remain.

To further improve the operation of the portfolio, the authority last year commissioned PwC to help it develop a model known as a statistical case estimation model (or SCE model). Similar but simpler models are traditionally used for actuarial reserving. However, after 18 months’ work, the use of advanced data mining techniques has produced a model that will in fact deliver far wider benefits for NSW employers and the millions of employees covered by the scheme.

Trials of the new model show that it can predict individual case costs more accurately than traditional methods. The authority will now be able to answer questions such as how much a new claim, such as a broken ankle or a bad back, is likely to cost over time.

While the traditional claims process can explain 19 per cent of statistical variability within claims, the new model can explain 49 per cent of such variations.
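
The article does not say which statistic lies behind the 19 and 49 per cent figures. One common reading of "variability explained" is the coefficient of determination (R-squared), and the sketch below works through that reading only. The claim costs and both sets of estimates are invented for illustration and are not WorkCover data.

```python
from statistics import mean


def r_squared(actual, predicted):
    """Share of the variance in `actual` accounted for by `predicted`.

    Computed as 1 - (residual sum of squares / total sum of squares).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    average = mean(actual)
    ss_total = sum((y - average) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical final costs of four closed claims (illustrative only).
actual_costs = [2000, 4000, 6000, 8000]

# A coarse estimate: every claim is assigned the overall average cost.
coarse_estimates = [5000, 5000, 5000, 5000]

# A finer estimate: claims are split into two groups with their own averages.
finer_estimates = [3000, 3000, 7000, 7000]

print(r_squared(actual_costs, coarse_estimates))  # 0.0
print(r_squared(actual_costs, finer_estimates))   # 0.8
```

On this reading, a model that explains 49 per cent of variability leaves roughly half of the claim-to-claim spread in costs unaccounted for, against about four-fifths for one that explains 19 per cent.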

“You could use this as the basis for setting the estimates on claim files - and setting premiums - across the scheme,” says Rob Thomson, Assistant General Manager of WorkCover’s insurance division.

Most significantly, the model will not only enable WorkCover to improve claims handling: it will also clearly show employers how and why they should change behaviours for their own benefit and that of their employees. “It’s not just a case estimating model,” says Thomson. “It has the potential to be an influencer of behaviours in the marketplace.”

WorkCover now hopes that by better understanding the costs of specific types of claims, and being able to analyse them at the level of individual workers and employers, it will be able to educate employers to prevent problems occurring or improve remediation processes when they do.

“The model will enable the case officer to understand what some of the key potential cost drivers are relative to a specific type of injury. It will give them a flavour for the things they need to look for as they manage a claim,” says Thomson.

PwC’s Richard Brookes points out that this is likely to produce further reductions in costly disputes: “We would hope that the model will mean less argument about costs and more focus on getting individuals back to work,” he says. “There should also be better correlations between employers’ claims and premiums.”

The model should also give insurers an objective means of estimating the value of future claims based on the known cost of past events. “At the moment, individual employers can put a lot of pressure on individual claims officers through case reviews of their files before renewal,” says Thomson. “Using a statistical case estimation model takes that sort of attempt to influence the outcome away from it.”

Implement strategies

Predictive business intelligence requires application of a continuum of data mining disciplines ranging from data processing to high level statistical analysis, allied with industry knowledge.

Fine-tuning the model - and the data

“Study the past if you would define the future,” said Confucius. The value of a good model is that by giving a better understanding of the past, it can be used to predict the future. The more accurate the model, and the data within it, the better the crystal ball.

“The thing that has impressed me with the model we’ve developed with PwC is the fairly high degree of certainty and credibility that it seems to have relative to what we’ve seen elsewhere,” says Thomson. “That’s a credit to the work that PwC has done because it’s fair to say that the quality of data across our scheme is a little dubious.”

The new WorkCover model holds about five years of data about claims, including the eventual cost of payouts and the efficacy of remediation approaches. To make it more useful, Thomson’s team is now focused on improving the quality of this data and the information collected.

“Data quality is one of the key things now on our agenda,” he says. “It is about getting the data right at the source. If this model is going to have continuing value it has to be updated, reviewed and monitored to ensure that it remains current.”

WorkCover is certainly not the only institution with data quality problems. In fact, most organisations are grappling with poorly maintained and inconsistent data sets. PwC’s Suzanna Mladenovic says that a study by the firm of 600 large organisations worldwide found fewer than 50 per cent had complete confidence in the quality of their own data. Fewer than 20 per cent trusted data passed on by other organisations.

Notwithstanding the well-known “garbage in, garbage out” computer principle, new data mining techniques can often ‘see’ patterns and associations even among messy data. The main tendencies are captured from within the “noise” of such data by using efficient, robust, self-testing algorithms and various forms of artificial intelligence such as decision trees, neural networks, clustering and link analysis.

Because some algorithms quickly become obsolete in this very rapidly developing field, there is a need to ensure that the methodology is up-to-the-minute. According to Inna Kolyshkina, a form of hybrid modelling - using successive layers of models to interrogate the data - is often most efficient as a way of producing increasingly accurate answers.
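
The article does not describe how the layers in such a hybrid are built. One simple way to read "successive layers of models" is that each layer is fitted to the errors left by the layer before it, and the sketch below assumes exactly that. It is not a description of the PwC or WorkCover method, and the claims, field names and figures are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_group_means(keys, values):
    """Return a lookup of {key: average of the values sharing that key}."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: mean(vals) for key, vals in groups.items()}


# Hypothetical claims: (injury type, how soon treatment began, final cost).
claims = [
    ("ankle", "early", 3000),
    ("ankle", "late", 5000),
    ("back", "early", 9000),
    ("back", "late", 11000),
]
injury = [c[0] for c in claims]
delay = [c[1] for c in claims]
cost = [c[2] for c in claims]

# Layer 1: average cost for each injury type.
layer1 = fit_group_means(injury, cost)

# What layer 1 gets wrong on each claim.
residuals = [y - layer1[i] for i, y in zip(injury, cost)]

# Layer 2: average leftover error for each treatment-delay group.
layer2 = fit_group_means(delay, residuals)


def predict(injury_type, treatment_delay):
    """Combine both layers: base estimate plus the learned correction."""
    return layer1[injury_type] + layer2[treatment_delay]


print(predict("back", "early"))  # 9000
print(predict("ankle", "late"))  # 5000
```

Each added layer only has to account for what the earlier layers missed, which is one reason a layered approach can produce increasingly accurate answers.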

The process of choosing the right modelling techniques, and refining the selected model, will usually itself reveal issues of data structure and quality that need to be addressed, or suggest other forms of data that may be more predictive.

It is here that communication with the business owners becomes critical. In the WorkCover case, Richard Brookes’ team had to engage in extensive open dialogue with business owners to ensure that the model being designed was accurate and would have application in the real world. Rob Thomson describes the approach of the PwC team as “open and candid, and willing to listen and learn.”

Referring to the involvement in the project of PwC statisticians, data quality experts and insurance industry specialists, Thomson says the model “has had a lot of discussion and a lot of peer review internally within PwC - it hasn’t been done in a vacuum. If they’re prepared to put in these extra resources to check that things stack up, it gives you confidence. They’ve also taken into account feedback from us so it has a certain amount of business ownership within WorkCover.”

But more than communication skills are required: success in this field depends upon the willingness of operational, IT and strategic groups to work together towards a common goal. That’s not always as easy as it sounds: as Richard Brookes observes, “It can be hard because these groups sometimes tend not to talk to each other and are not used to jointly owning a project which delivers value at the combined level.”

But the necessity of improved communication across a company may be seen as another of data mining’s indirect benefits.

Thomson says the next step at WorkCover - after the data issues are addressed - will be to introduce the model as an everyday, working tool: “We need to take it on board inside the business - rather than just on the technical side - and sit back and think, ‘Now we have this tool, how can we best use it to improve the scheme?’ Then we can move forward in a more positive and proactive way.

“I think there are probably some areas that we haven’t even thought of yet where we can benefit from this.”

For further information contact PwC partner Conor O’Dowd
conor.odowd@au.pwc.com
Tel: + 61 2 8266 2625


© 2008 PricewaterhouseCoopers. All rights reserved. PricewaterhouseCoopers refers to the network of member firms of PricewaterhouseCoopers International Limited, each of which is a separate and independent legal entity.