Introduction

When viewed from a high level, the cost of poor quality data can affect a company's bottom line in two ways. First, there's the cost of scrap and rework, and second, missed opportunities.

An example of scrap and rework costs might be when an agent errs in recording a customer's address details, and consequently a marketing premium is sent to the wrong address. Later, the customer calls to complain. The complaint needs to be handled (extra call center time), the address details then need to be entered a second time (rework), and a second premium needs to be sent. The initial premium is scrapped.

An example of missed opportunity costs might be a credit card that is not granted because the calculated credit score (erroneously) falls below the cutoff score, and the customer is rejected. The opportunity to make a sale is lost, even though marketing costs were already incurred.

In this whitepaper, I attempt to supply a comprehensive list of potential data quality costs.

Cost Categories of Information Quality

The costs of data quality can be broken down into three categories:

1. Immediate costs of non-quality data. These arise when the primary process breaks down as a result of erroneous data. They also arise as information scrap and rework, when immediately apparent errors or omissions in the data need to be circumvented in support of the primary business process. For example, entry of an invalid ZIP code requires back-office staff to look it up again and correct it before sending out a product.

2. Information quality assessment or inspection costs. These are the costs of efforts expended to (re)assure that processes work properly. Every time a 'suspect' data source is handled, the time spent seeking reassurance of data quality is an irrecoverable expense.

3. Information quality process improvement and defect prevention costs. Broken business processes need to be improved to eliminate unnecessary information costs. When a data capture or processing operation malfunctions, it requires fixing. This is the long-term investment needed to avoid further losses.

1. Immediate costs of non-quality data

Process failure

For example, capturing erroneous customer data like address, contact information, or account details.

- Irrecoverable costs: e.g. premiums sent in vain to non-existing customer addresses.
- Liability and exposure costs: for instance, credit risk losses when data quality problems cause credit to be erroneously offered to a customer who is not considered creditworthy on the basis of self-supplied information.
- Recovery costs of unhappy customers: time spent handling complaints.

Information Scrap and Rework

- Redundant data handling: because many processes are 'known' to rely on inaccurate data, it is customary for front-line and back-office staff to maintain little private "lists" of all sorts. These serve merely as a backup or improved version of what is available in the primary database. Apart from further problems, like 'maintenance' and 'recovery' not being possible for these private lists, such activities are redundant and non-value adding.
- Costs of chasing missing information: a field that has not been filled out properly, or not at all, needs to be looked up later on in the process. This means excess time and costs, inefficiency, and, not least, aggravation. Time spent looking up missing information is not being spent servicing the customer better.
- Business rework costs: e.g. reissuing a credit card that was sent out with a misspelled customer name.
- Workaround costs: when a primary key is missing or faulty, laborious fuzzy matches need to be performed to match records. This kind of work is challenging, and eats up precious time of the most highly skilled database workers.
- Data verification costs: e.g. costs of reworking data entry. In addition, knowledge workers must begin every analysis by checking the correctness of the available data.
- Program rewrite costs: rewriting programs that fail to run because of invalid entries found in the data. For example, pre- or post-conversion scripts sometimes need to be written to deal with the content of source systems prior to loading into a Data Warehouse environment.
- Data cleansing and correction costs: when feeds are processed to load into the Data Warehouse, these data need to be transformed for reasons that stem from quality issues. Any data cleansing and scrubbing that needs to be performed in the ETL process is essentially redundant and unnecessary insofar as it is caused by faulty initial data entry. For example, when a mailing is done on the basis of a problematic customer file, dedicated scripts need to be run to deal with the (known!) errors in the address fields. This process needs to be repeated for every mailing. Since such customer files are often shared across departments and systems, source changes need to be negotiated with all end users of these data.
- Data cleansing software costs: data cleansing software (like Vality, Ascential, etc.) is usually very expensive. However, there's a tradeoff between scarce labor doing this 'by hand' and the typically very high license costs of ETL data quality software that helps with such tasks. Purchase may sometimes prove remarkably economical when set against the (often unseen) labor costs of manually improving data quality.

Lost and missed opportunity costs

- Lost opportunity costs: when, for example, a misspelled customer name on a card causes the customer not to use the card (instead of calling to complain about it), the business loses that customer's future revenue.
- Missed opportunity costs: when unhappy customers directly influence their social environment, they generate negative publicity. This will make it harder to sell to people in the social network of displeased customers.
- Lost shareholder value: poor information quality puts a drain on precious resources (scarce database experts), preventing knowledge workers from performing value-added work towards market share growth. Scarce human resources are often a bottleneck towards progress, like running one more marketing campaign, delivering insight into a product portfolio's performance, etcetera.

2. Information quality assessment or inspection costs

- People spend time in assessment processes when they are aware of suspect data quality. In any database project, each and every file of questionable quality needs to be inspected for data quality problems first. This time is irreplaceable, forever lost and never recouped in any way. Merely assessing whether data is of sufficient quality is specialist work. This requires access to scarce resources that are often a bottleneck towards progress.

3. Information quality process improvement and defect prevention costs

- Development costs to rework existing front-end applications: data entry applications need to enforce data quality by performing validity checks, and by minimizing keystrokes and eye-hand movements. On the basis of usability findings, interface improvements invariably lead to both higher efficiency and better data quality.
- Management attention to redefine accountabilities and monitor improved information quality: steering the organization towards higher data quality requires changing accountabilities and continuously monitoring improvement. This topic will need to stay high on management's agenda to create lasting improvement.

Conclusion

Problems in data quality often go unnoticed. They can be a source of both process inefficiencies (timeliness) and operational costs (direct and indirect losses). In neither case is it apparent that improvement is possible by enhancing data quality.

One of the pernicious consequences of suboptimal data quality is that the cost of poor quality data is usually hidden. Lack of data quality is not obvious to those not deliberately looking for it. Quantifying costs isn't always easy. What makes the indirect costs of poor data quality so pernicious is that the relation between data quality problems and their consequences is non-obvious, and the consequences often only occur with a substantial time delay. Therefore, the connection between downstream consequences and poor quality data is often not made, and the problems are not attributed to their true cause.

The cause of many downstream costs (e.g. data quality) can easily remain largely hidden, and is therefore insufficiently subject to management attention and intervention. Also, progress after improvement efforts is gradual, relatively slow, in large part 'cultural', and therefore difficult to monitor and track.

Another problem caused by poor-quality information, and probably the most significant, is that it frustrates the most valuable resource of the company: its employees. Non-quality information prevents knowledge workers from performing their jobs effectively. On top of that, it alienates customers because of wrong information about them, and to them. Customer data is the raw material that needs to be managed for what it is: a strategic resource.

Data quality is far more than accurate data entry. It stems from monitoring downstream data usage, maintaining comprehensive and up-to-date metadata, and nurturing a corporate culture of naturally doing things right at the first attempt. Only then will knowledge workers learn to expect data quality, and enforce it because it's the natural thing to do. Letting data quality slide will promote a culture of negligence, and disdain for the use of one's most precious asset: customer information.

The case for accurate source data is further underlined when one realizes that the source in and of itself does little more than support primary processes, which is fine. However, the greater value to the organization comes from enhancing these data, from deriving new information from source data.

The investment in improving information quality is recouped several times over in decreased costs and in improved value of information to accomplish strategic business goals.

Rapid access to high quality data is the decisive factor in an organization's ability to assess and adapt its business model to changing market conditions. As corporations become ever more 'digitized', those that get a grip on their data quality assurance processes can reap great rewards. In a highly turbulent market this may well be the critical factor in determining the survivors in a competitive business, and therefore prove to be ultimately priceless.

Resources

Larry P. English (1999) Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. Wiley, ISBN 0-471-25383-9

Jack E. Olson (2003) Data Quality: the Accuracy Dimension. Morgan Kaufmann, ISBN 1-55860-891-5

Sid Adelman, Larissa Moss & Majid Abai (2005) Data Strategy. Addison-Wesley, ISBN 0-321-24099-5

Article: "How Non-Quality Data Can Cost Money". XLNT Consulting - Turning Data Into Dollars.

Tom Breur: Biographical Sketch

Tom Breur is a consultant out of deep passion for his work. He can be profoundly analytic in his passionate quest to drive out the deepest business issues and the nexus point of a business model. It's all about finding where the least effort will generate the most results.

Once the business challenge becomes clear, Tom loves to roll up his sleeves and get his 'hands dirty', be it data analysis, market research, data mining or database work. Once the hands-on work gets started, his eyes begin to flicker, and he has a tendency to get carried away.

Tom has an academic background in Psychology, an education he took up twice. Initially he majored in Clinical Psychology (1986); years later he went back to college to study Economic Psychology (1996) with an emphasis on quantitative methods.

Tom is fluent in Dutch, English, French and German.