Data quality is more important than ever as organizations move from ‘gut-feeling’ to data-based decision making. As Forrester puts it: ‘Insight is the engine of today’s businesses. The value of a company can be measured by the performance of its data.’ As a manager, you make decisions based on data every day, so the one thing you want to know is whether you can trust that data, especially as more of it is created every day.
In this article, we give you a quick and easy way to assess your data quality in collaboration with a few team members, plus some background information. The method is based on the ‘Friday Afternoon Measurement’ by Thomas C. Redman, which we describe further on.
Dimensions of Bad Data
Before you start, it is important to recognize that bad data has different dimensions. We follow the dimensions as defined by DAMA, the global community of Data Management Professionals.
Data completeness can refer both to the proportion of available records against potential records (for example, 100% of customers have a record in the database) and to the available values per record (for example, 80% of required fields are filled on a record).
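Both readings of completeness come down to a simple ratio. The following is a minimal sketch, not a prescribed method: the field names, the sample lead and the record counts are made-up assumptions for illustration.

```python
# Hypothetical list of required fields; adjust to your own object.
REQUIRED_FIELDS = ["first_name", "last_name", "email", "phone", "company"]


def record_completeness(record, required_fields):
    """Share of required fields that hold a non-empty value (0.0 to 1.0)."""
    filled = 0
    for field in required_fields:
        value = record.get(field)
        if value is not None and str(value).strip():
            filled += 1
    return filled / len(required_fields)


def coverage(available_records, potential_records):
    """Share of potential records that actually exist in the database."""
    return available_records / potential_records


# Made-up lead with one required field (phone) left empty.
lead = {
    "first_name": "Mike",
    "last_name": "Wazowski",
    "email": "mike@example.com",
    "phone": "",
    "company": "Monsters Inc.",
}

print(f"Field completeness: {record_completeness(lead, REQUIRED_FIELDS):.0%}")  # 80%
print(f"Record coverage: {coverage(950, 1000):.0%}")  # 95%
```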
Uniqueness means that each thing in the real world is recorded only once in the database. The company ACME Inc. should be represented by exactly one record. Deduplication solutions can help you with this issue.
Timeliness asks whether the data at a point in time reflects the real world at that same point in time. For example, when a sales agent enters meeting and call notes only at the end of each week, the data for the preceding days is not timely.
Data is valid if it adheres to the syntax of its definition. An email address, for example, has the syntax local-part@domain, and both the local part and the domain have their own syntax rules.
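A validity check of this kind can be sketched with a regular expression. The pattern below is deliberately simplified and is an assumption for illustration only: it checks the overall local-part@domain shape, not the full set of rules that real email addresses follow.

```python
import re

# Simplified shape check: a non-empty local part, one "@", and a domain made
# of at least two dot-separated labels. Real email syntax has more rules.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s.]+(\.[^@\s.]+)+")


def is_valid_email(value):
    """Return True if value matches the simplified local-part@domain shape."""
    return bool(EMAIL_PATTERN.fullmatch(value.strip()))


for candidate in ["mike@example.com", "mike.example.com", "mike@example", "@example.com"]:
    print(candidate, is_valid_email(candidate))
```

Only the first candidate passes. The others lack an ‘@’, a dot in the domain, or a local part.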
Accuracy is about whether the data correctly describes the real world. If your lead data states that Mike Wazowski is a ‘Scare Executive’ at ‘Monsters Inc.’, but he is actually ‘Chief Scare Officer’ at ‘Monsters University’, your data for the fields ‘role’ and ‘company’ is not accurate.
Consistency describes how closely multiple representations of the same real-world thing in separate databases match each other. If your CRM states that a customer was born in 1981 but your e-commerce system states 1980, your data is not consistent.
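A consistency check boils down to comparing the same fields for the same record across systems. Here is a minimal sketch, assuming both systems share a customer ID. The IDs, field names and values are made up for illustration.

```python
# Hypothetical extracts from two systems, keyed by a shared customer ID.
crm = {
    "C-001": {"birth_year": 1981, "email": "ann@example.com"},
    "C-002": {"birth_year": 1975, "email": "bob@example.com"},
}
ecommerce = {
    "C-001": {"birth_year": 1980, "email": "ann@example.com"},
    "C-002": {"birth_year": 1975, "email": "bob@example.com"},
}


def find_inconsistencies(system_a, system_b, fields):
    """List (record ID, field, value in A, value in B) for every mismatch."""
    mismatches = []
    # Only records present in both systems can be compared.
    for record_id in sorted(system_a.keys() & system_b.keys()):
        for field in fields:
            value_a = system_a[record_id].get(field)
            value_b = system_b[record_id].get(field)
            if value_a != value_b:
                mismatches.append((record_id, field, value_a, value_b))
    return mismatches


print(find_inconsistencies(crm, ecommerce, ["birth_year", "email"]))
# [('C-001', 'birth_year', 1981, 1980)]
```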
Data Quality Assessment – ‘Friday Afternoon Measurement’
Now that we know how to recognize the different types of bad data, it is time to look at the data quality assessment. We follow the process as described by Thomas C. Redman:
- 30 minutes: meeting preparation
- 1 hour: data assessment meeting with the team
- 1 hour: individual data assessment
- 30 minutes: analysis of the results
Plan a short meeting (max. 1 hour) with a few members of a department you want to gather input from. For this meeting, we will focus on completeness, timeliness and validity. Uniqueness, accuracy and consistency require more time to assess, and will be checked individually.
Step 1: Prepare
Prepare the meeting by gathering 100 random data records of the object(s) the department works with. Export these to an Excel or Google Sheets spreadsheet.
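If the object is first exported in full as a CSV file, drawing the random sample can be scripted. The following is a sketch under that assumption. The function name and the file names in the usage note are hypothetical.

```python
import csv
import random


def sample_records(source_path, target_path, sample_size=100, seed=None):
    """Copy a random sample of rows from one CSV file to another.

    Returns the number of rows written.
    """
    with open(source_path, newline="", encoding="utf-8") as source:
        reader = csv.DictReader(source)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    rng = random.Random(seed)
    # Take every row if the export holds fewer rows than requested.
    sample = rng.sample(rows, min(sample_size, len(rows)))

    with open(target_path, "w", newline="", encoding="utf-8") as target:
        writer = csv.DictWriter(target, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(sample)
    return len(sample)
```

A call such as sample_records("contacts_export.csv", "contacts_sample.csv", seed=42) would write 100 randomly chosen rows to the second file, which opens directly in Excel or Google Sheets. Passing a seed makes the selection repeatable.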
Step 2: Focus Columns
In the meeting, the first step is to identify the 10-15 most important columns. Delete the other columns and add an extra ‘Correct’ column after each remaining column, so you get columns like ‘First Name Correct’, ‘Last Name Correct’, ‘Email Address Correct’, and so on. Finally, add a last column called ‘Perfect Record’.
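The resulting column layout can be sketched as follows. The three focus columns are only examples, and in practice this step is just as easily done by hand in the spreadsheet.

```python
def build_assessment_header(focus_columns):
    """Interleave each focus column with its '<column> Correct' companion."""
    header = []
    for column in focus_columns:
        header.append(column)
        header.append(f"{column} Correct")
    # One final column that flags whether the whole record is free of issues.
    header.append("Perfect Record")
    return header


print(build_assessment_header(["First Name", "Last Name", "Email Address"]))
```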
Step 3: Label the Records
Walk through the records one by one. In the columns you added, mark each incorrect field with the type of bad data (completeness, timeliness or validity). When any field on a record is marked as incorrect, also mark the ‘Perfect Record’ column with a ‘No’. You have now gathered all the data you need from the meeting.
Step 4: Accuracy, Consistency & Uniqueness
With help from your colleagues, you have gathered information on data quality regarding completeness, timeliness and validity. In Step 4, we will assess uniqueness, accuracy and consistency.
Accuracy & Consistency
From your 100 random records, take a subset of 20 records. Compare these records to the latest information from the real world; for example, do a LinkedIn search to see if a contact or lead has an accurate associated company and role. If your organization stores data on the same object in different datasets, do a comparison between these records. Update your spreadsheet with your findings.
Uniqueness
You cannot assess uniqueness within your dataset of 100 random records. If a duplicate record exists, the chances are very slim that both copies were selected in your 100 random records.
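To see how slim those chances are, consider one specific pair of duplicate records in a database of N records. With a simple random sample of n records drawn without replacement, the probability that both end up in the sample is n(n-1) / (N(N-1)). The sketch below evaluates this for a few database sizes, which are arbitrary examples.

```python
def chance_both_sampled(total_records, sample_size=100):
    """Probability that two specific records both land in a random sample.

    Assumes a simple random sample drawn without replacement.
    """
    return (sample_size * (sample_size - 1)) / (total_records * (total_records - 1))


for total in (1_000, 10_000, 100_000):
    print(f"{total:>7} records: {chance_both_sampled(total):.4%}")
```

For a database of 10,000 records, the chance for any given pair is roughly 0.01%.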
To assess uniqueness, we recommend running a basic duplicate finding job on your objects. Do this with the basic out-of-the-box Salesforce Duplicate Management feature or with a more advanced solution.
A basic matching scenario is enough to get an indication of the number of duplicates. The real number of duplicates is probably higher, because a basic scenario misses the less obvious matches.
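For data that lives outside Salesforce, a basic scenario can be approximated with exact matching on normalized key fields. The sketch below is a stand-in for illustration, not how Salesforce Duplicate Management works. The records and the choice of key fields are assumptions.

```python
from collections import defaultdict


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences do not hide a match."""
    return " ".join(str(value or "").lower().split())


def find_duplicate_groups(records, key_fields):
    """Group records whose normalized key fields are identical."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalize(record.get(field)) for field in key_fields)
        if all(key):  # skip records where a key field is empty
            groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Made-up account records; IDs 1 and 2 differ only in case and spacing.
accounts = [
    {"id": 1, "name": "ACME Inc.", "city": "Springfield"},
    {"id": 2, "name": "acme  inc.", "city": "springfield"},
    {"id": 3, "name": "Monsters Inc.", "city": "Monstropolis"},
]

for group in find_duplicate_groups(accounts, ["name", "city"]):
    print([record["id"] for record in group])  # [1, 2]
```

Exact matching like this only catches the obvious cases. Variations such as ‘ACME Incorporated’ need fuzzy matching, which is where the more advanced solutions come in.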
Step 5: Analyze
Analyze the results you gathered in Step 3 and Step 4. With a simple pivot table, you can see which fields and which type of bad data should be addressed first. The duplicate number has to be assessed separately.
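The same pivot can be reproduced in a few lines if you export the labelled spreadsheet. This sketch assumes each ‘… Correct’ cell is empty when the field is fine and otherwise holds the type of bad data. The rows shown are made up.

```python
from collections import Counter

# Hypothetical labelled rows from the assessment spreadsheet.
labelled_rows = [
    {"Email Address Correct": "", "Phone Correct": "Completeness"},
    {"Email Address Correct": "Validity", "Phone Correct": ""},
    {"Email Address Correct": "", "Phone Correct": ""},
    {"Email Address Correct": "Validity", "Phone Correct": "Completeness"},
]


def summarize(rows):
    """Count issues per (field, issue type) and the share of perfect records."""
    issue_counts = Counter()
    perfect = 0
    for row in rows:
        issues = {
            column: value.strip()
            for column, value in row.items()
            if column.endswith(" Correct") and value.strip()
        }
        for column, issue_type in issues.items():
            field = column[: -len(" Correct")]
            issue_counts[(field, issue_type)] += 1
        if not issues:
            perfect += 1
    perfect_share = perfect / len(rows) if rows else 0.0
    return issue_counts, perfect_share


counts, perfect_share = summarize(labelled_rows)
for (field, issue_type), count in counts.most_common():
    print(f"{field:<15} {issue_type:<13} {count}")
print(f"Perfect records: {perfect_share:.0%}")  # 25%
```

Sorting by count puts the field and issue type combinations that occur most often at the top, which gives a natural order for deciding what to address first.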
Step 6: Improve
Do not try to improve all objects, fields and bad data types at once. Choose a few to start with and continue from there.
The first step is to do a root cause analysis. Find out how the data entered the system and who touched the data afterwards. Try to intervene as early in the process as possible (it is better to validate input in a web form than to fix it in your CRM). You can improve your data without spending huge sums by training your employees and implementing solutions that validate, format and deduplicate your data.
Business demands change over time. Plan the 3-hour data assessment as a recurring event to make sure you stay ahead of the curve. Get free trials of Duplicate Check and Record Validation, plus advice from our experts, to get the most out of your data quality efforts.