How Non-Quality Data Can Cost Money

Introduction

When viewed from a high level, the cost of poor quality data can affect a company's bottom line in two ways. First, there's the cost of scrap and rework, and second, missed opportunities.

An example of scrap and rework costs might be when an agent errs in recording a customer's address details, and consequently a marketing premium is sent to the wrong address. Later, the customer calls to complain. The complaint needs to be handled (extra call center time), the address details then need to be entered a second time (rework), and a second premium needs to be sent. The initial premium is scrapped.

An example of missed opportunity costs might be a credit card that is not granted because the calculated credit score (erroneously) falls below the cutoff score, and the customer is rejected. The opportunity to make a sale is lost, when marketing costs were already incurred.

In this whitepaper, I attempt to supply a comprehensive list of potential data quality costs.

Cost Categories of Information Quality

The costs of data quality can be broken down into three categories:

1. Immediate costs of non-quality data. This happens when the primary process breaks down as a result of erroneous data. Or, information scrap and rework, when immediately apparent errors or omissions in the data need to be circumvented in support of the primary business process. For example, data entry of a non-valid ZIP code requires back-office staff to look this up again and correct it before sending out a product.

2. Information quality assessment or inspection costs. These are the costs of efforts expended for (re)assuring that processes work properly. Every time a 'suspect' data source is handled, the time spent to seek reassurance of data quality is an irrecoverable expense.

3. Information quality process improvement and defect prevention costs. Broken business processes need to be improved to eliminate unnecessary information costs. When a data capture or processing operation malfunctions, it requires fixing. This is the long-term investment needed to avoid further losses.

1. Immediate costs of non-quality data

Process failure

For example, capturing erroneous customer data like address, contact information, account details.

- Irrecoverable costs; e.g. premiums sent in vain to non-existing customer addresses.
- Liability and exposure costs; for instance credit risk losses when data quality problems cause erroneously offering credit to a customer who is not considered creditworthy on the basis of self-supplied information.
- Recovery costs of unhappy customers; time spent handling complaints.

Information Scrap and Rework

- Redundant data handling; because many processes are 'known' to rely on inaccurate data, it is customary for front-line and back-office staff to maintain little private "lists" of all sorts. These serve merely as a backup or improved version of what is available in the primary database. Apart from further problems like 'maintenance' and 'recovery' not being possible for these private lists, such activities are redundant and non-value adding.
- Costs of chasing missing information; a field that has not been filled out properly, or not at all, needs to be looked up later on in the process. This means excess time and costs, inefficiency, and, not least, an aggravation factor. Time spent looking up missing information is not being spent servicing the customer better.
- Business rework costs; e.g. reissuing a credit card that was sent out with a misspelled customer name.
- Workaround costs; when a primary key is missing or faulty, laborious fuzzy matches need to be performed to match records. This kind of work is challenging, and eats up precious time of the most highly skilled database workers.
- Data verification costs; e.g. costs of reworking data entry. But also, analyses by knowledge workers must begin by checking the correctness of the data available before beginning analysis.
- Program rewrite costs; rewriting programs that fail to run because of invalid entries found in the data. E.g.: sometimes pre- or post-conversion scripts need to be written to deal with the content of source systems prior to loading in a Data Warehouse environment.
- Data cleansing and correction costs; when feeds are processed to load into the Data Warehouse, these data need to be transformed for reasons that stem from quality issues. Any data cleansing and scrubbing that needs to be performed in the ETL process is essentially redundant and unnecessary insofar as this is caused by faulty initial data entry. For example, when a mailing is done on the basis of a problematic customer file, dedicated scripts need to be run to deal with the (known!) errors in the address fields. This process needs to be repeated for every mailing. Since such customer files are often shared across departments and systems, source changes need to be negotiated with all end users of these data.
- Data cleansing software costs; data cleansing software (like Vality, Ascential, etc.) is usually very expensive. However, there's a tradeoff between scarce labor doing this 'by hand', and the fact that ETL data quality software to help with such tasks typically has very high license costs. Purchase may sometimes prove remarkably economical when related to (often unseen) labor costs for manually improving data quality.

Lost and missed opportunity costs

- Lost opportunity costs; when e.g. a misspelled customer name on the card causes the customer to not use their card (instead of calling up to complain about this), the business loses that customer's future revenue.
- Missed opportunity costs; when unhappy customers directly influence their social environment, they generate negative publicity. This will make it harder to sell to people in the social network of displeased customers.
- Lost shareholder value; poor information quality puts a drain on precious resources (scarce database experts), preventing knowledge workers from performing value-added work towards market share growth. Scarce human resources are often a bottleneck towards progress, like running one more marketing campaign, delivering insight in a product portfolio's performance, etcetera.

2. Information quality assessment or inspection costs

- People spend time in assessment processes when they are aware of suspect data quality; in any database project, each and every file of questionable quality needs to be inspected for data quality problems first. This time is irreplaceable, forever lost and never recouped in any way. Merely assessing if data is of sufficient quality is specialist work. This requires access to scarce resources that are often a bottleneck towards progress.

3. Information quality process improvement and defect prevention costs

- Development costs to rework existing front-end applications; data entry applications need to enforce data quality by performing validity checks, and minimizing keystrokes and eye-hand movements. On the basis of usability findings, interface improvements invariably lead to both higher efficiency and better data quality.
- Management attention to redefine accountabilities and monitor improved information quality; steering the organization towards higher data quality requires changing accountabilities and continuously monitoring improvement. This topic will need to stay high on management's agenda to create lasting improvement.

Conclusion

Problems in data quality often go unnoticed. They can be a source of both process inefficiencies (timeliness) and operational costs (direct and indirect losses). In neither of these cases is it apparent that improvement is possible from enhancing data quality.

One of the pernicious consequences of suboptimal data quality is that the cost of poor quality data is usually hidden. Lack of data quality is not obvious to those not deliberately looking for it. Quantifying costs isn't always easy. What makes the indirect costs of poor data quality so pernicious is that the relation between data quality problems and their consequences is non-obvious, and often only occurs with a substantial time delay. Therefore, the connection between downstream consequences and poor quality data is often not made, and the problems are not attributed to their true cause.

The cause of many downstream costs (i.e. data quality) can easily remain largely hidden, and therefore insufficiently subject to management attention and intervention. Also, progress after improvement efforts is gradual, relatively slow, in large part 'cultural', and therefore difficult to monitor and track.

Another, and probably the most significant, problem caused by poor-quality information is that it frustrates the most valuable resource of the company: its employees. Non-quality information prevents knowledge workers from performing their job effectively. On top of that, it alienates customers because of wrong information about them, and to them. Customer data is the raw material that needs to be managed for what it is: a strategic resource.

Data quality is far more than accurate data entry. It stems from monitoring downstream data usage, maintaining comprehensive and up-to-date meta data, and nurturing a corporate culture of naturally doing things right at the first attempt. Only then will knowledge workers learn to expect data quality, and enforce it because it's the natural thing to do. Letting data quality slide will promote a culture of negligence, and disdain for the use of one's most precious assets: customer information.

The case for accurate source data is further underlined when one realizes that the source in and of itself does little more than support primary processes, which is fine. However, the greater value to the organization comes from enhancing these data, from deriving new information from source data.

The investment in improving information quality is recouped several times in decreased costs, and improved value of information to accomplish strategic business goals.

Rapid access to high quality data is the decisive factor in an organization's ability to assess and adapt its business model to changing market conditions. As corporations become ever more 'digitized', those that get a grip on their data quality assurance processes can reap great rewards. In a highly turbulent market this may well be the critical factor in determining the survivors in a competitive business, and therefore prove to be ultimately priceless.

Resources

Larry P. English (1999) Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. Wiley, ISBN 0-471-25383-9

Jack E. Olson (2003) Data Quality: the Accuracy Dimension. Morgan Kaufmann, ISBN 1-55860-891-5

Sid Adelman, Larissa Moss & Majid Abai (2005) Data Strategy. Addison-Wesley, ISBN 0-321-24099-5

XLNT Consulting - Turning Data Into Dollars.

Tom Breur: Biographical Sketch

Tom Breur is a consultant out of deep passion for his work. He can be profoundly analytic, in his passionate quest to drive out the deepest business issues and the nexus point of a business model. It's all about finding where the least effort will generate the most results.

Once the business challenge becomes clear, Tom loves to roll up his sleeves and get his 'hands dirty', be it data analysis, market research, data mining or database work. Once the hands-on work gets started, his eyes begin to flicker, and he has a tendency to get carried away.

Tom has an academic background in Psychology, an education he took up twice. Initially he majored in Clinical Psychology (1986); years later he went back to college to study Economic Psychology (1996) with an emphasis on quantitative methods.

Tom is fluent in Dutch, English, French and German.