Ultimate Guide to Salesforce Data Quality and Data Cleansing

By Lucy Mazalon

Data management has always provided challenges. Poor quality data in Salesforce starts a vicious cycle where users mistrust the data, which hinders user adoption and further degrades data quality when users are not actively adding and updating records. It’s best to ‘nip it in the bud’ as soon as possible.

Salesforce data cleansing is no one’s favorite task (OK, maybe there are people out there who get a kick from Salesforce data clean-ups!). It naturally slips down the list in growing organizations that have other priorities, such as their product/service and customer experience.

Over time, orgs fill up with bad data, unused fields, and components that slow down page load times. Admins are tasked with projects that deliver what’s ‘shiny’ and new, but it’s important to take stock: poor data quality will slow down your digital transformation (and it will come back to bite you when you least expect it).

Cleaning Salesforce data can be overwhelming, so this guide aims to show you where to start. You’re not going to magically clean Salesforce data overnight. However, once you know how to distinguish good-quality data from bad, there are steps you can start taking and routines you can establish.

Data Quality Metrics

Understanding the current state of your data quality is the first stage. This baseline will show you not only the extent of bad data but also how much progress you make as you put the work in.

The Most Common Measures of Data Quality

  • Completeness: The proportion of available records against potential records, or of available values per record. Example: 100% of your customers have a record in the database; 80% of required fields are filled on a record.
  • Uniqueness: Avoiding duplicate records. Records that exist only once in the real world should be recorded only once in the database. Example: The company ACME Inc. should be represented by one record, instead of having ACME Inc. and ACME as two separate account records.
  • Timeliness: Does the data at a point in time reflect the real world at that same point in time? Example: When a sales rep enters their meeting notes only at the end of each week, the data is not representative during the days in between (there is a lag).
  • Validity: The data adheres to the syntax of its definition and is usable for its intended purpose. Example: An email address has the syntax local-part@domain, and both ‘local-part’ and ‘domain’ have specific guidelines.
  • Accuracy: The data correctly describes the real world. Example: If your lead data states that Mike Wazowski is a Scare Executive at Monsters Inc., but he is actually the Chief Scare Officer at Monsters University, your data for the role and company fields is not accurate.
  • Consistency: Does the data represent the real world in the same way across the multiple databases in your organization? Example: If your CRM states that a customer was born in 1981, but your e-commerce system states 1980, your data is not consistent.
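
To put a number on the two variants of completeness described above, here is a minimal Python sketch. It assumes records have been exported (for example, from a report or a data export) into plain dictionaries keyed by field API name; the sample lead and customer IDs are illustrative only.

```python
from typing import Mapping, Sequence


def is_filled(value: object) -> bool:
    """Treat None and empty or whitespace-only strings as missing values."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def value_completeness(record: Mapping[str, object], fields: Sequence[str]) -> float:
    """Share of the given fields that hold a value on a single record."""
    if not fields:
        return 1.0
    return sum(is_filled(record.get(f)) for f in fields) / len(fields)


def record_coverage(known_ids: set, expected_ids: set) -> float:
    """Share of expected real-world entities (e.g. customers) that have a record."""
    if not expected_ids:
        return 1.0
    return len(known_ids & expected_ids) / len(expected_ids)


lead = {"FirstName": "Mike", "LastName": "Wazowski", "Email": "",
        "Company": "Monsters Inc.", "Title": None}
print(value_completeness(lead, ["FirstName", "LastName", "Email", "Company", "Title"]))  # 0.6
print(record_coverage({"C1", "C2", "C3", "C4"}, {"C1", "C2", "C3", "C4", "C5"}))  # 0.8
```

Averaging `value_completeness` across all records gives a single baseline figure you can track between assessments.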

Later, we’ll come to the 1-hour data quality exercise, advised by data quality application vendor Plauti, that you should run every week or month to keep a pulse on how the state of your data changes over time.

There are also what Chris Hyde, SVP of Technology at Validity, termed the 3 Cs: a take on data quality that prioritizes what we should be focusing on in the changing data privacy landscape, namely Compliance (plus Completeness and Correctness).

Finally, your data won’t score poorly on all of these measures equally; you may find that one or two are worse than the others.

Data Capture (for Completeness and Timeliness)

Data can be captured at multiple points in your business processes. Both internal employees and external customers/prospects can contribute to the data collection effort.

The point here is that, for completeness, you need to put guardrails in place to ensure that the correct data points are captured, in the format you expect, when required. The same (or additional) capabilities can be used to make the capture process more straightforward. Here are some examples…

  • Validation rules: These rules ensure that a record contains the expected field data before it can be saved (i.e. supporting completeness). They are conditional, meaning that they can be set up to only ‘kick in’ when the record meets specific criteria. They also help direct users to the key information that needs to be filled in at that point in time (i.e. supporting timeliness).
  • Flows: This type of automation can help ‘fill in the blanks’ when other information is known. Plus, they can be used to populate a data point that’s a combination of multiple pieces of information – a round-up into one key data point, if you like.
  • Screen flows: Flows that include screens, giving users a slimmed-down interface (a screen or series of screens) that guides them through inputting data as part of an overall automated process.
  • Activity Capture: Activities are any form of communication your teams have with customers/prospects. These are notoriously challenging to get users to record in Salesforce, yet they are an important part of data timeliness. For example, a user won’t want to continually add every email they’ve sent into Salesforce, so automated solutions that identify the recipient/s and automatically sync emails from Gmail/Outlook to Salesforce are worth the investment.
  • Progressive profiling: A webform technique that is a feature of most marketing automation platforms. When someone returns to a form, only specific form fields that they have not previously completed will be shown. This keeps fields to a minimum, increasing the chance they will complete the form while capturing additional information that’s up to date. Account Engagement (Pardot) has this capability, and you can build it in Marketing Cloud.
  • Integrations: Your organization will inevitably use other databases. Integrations between Salesforce and other databases may sync on a real-time basis or in batches (e.g. nightly). The sync frequency will impact the timeliness of your data.
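
The progressive profiling behavior described in the list above can be illustrated in a few lines of Python. This is a sketch of the idea only, not how Account Engagement or Marketing Cloud implement it; the field names, their priority order, and the `max_fields` cap are all hypothetical.

```python
from typing import Mapping, Sequence


def fields_to_show(known: Mapping[str, object],
                   profile_fields: Sequence[str],
                   max_fields: int = 3) -> list:
    """Pick up to `max_fields` fields, in priority order, that this visitor hasn't completed yet."""
    missing = [f for f in profile_fields if known.get(f) in (None, "")]
    return missing[:max_fields]


# Hypothetical priority order for a gated-content form.
PROFILE = ["Email", "FirstName", "Company", "JobTitle", "Country"]

first_visit = fields_to_show({}, PROFILE)
return_visit = fields_to_show({"Email": "mike@monsters.com", "FirstName": "Mike"},
                              PROFILE, max_fields=2)
print(first_visit)   # ['Email', 'FirstName', 'Company']
print(return_visit)  # ['Company', 'JobTitle']
```

In practice you would usually keep an identifying field such as email on every form; that is left out here to keep the logic visible.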

When it comes to completeness, there’s a balance to strike. You need to have enough data to do business (for example, you need a company’s address in order to send someone a product). However, you don’t necessarily need the phone numbers and email addresses of 50 executives or 200 employees of that company. This brings in the seventh data quality metric: compliance. You may not have permission to hold that data, and it may not be necessary for you to do business. Growing regulatory pressure around the world is driving this metric.

Deduplication (for Uniqueness)

Records that exist only once in the real world should be recorded only once in the database. This is deduplication – the practice of avoiding duplicate records. However, there’s no ‘one size fits all’ approach to how it should be done.

Duplicate records pose great risks to businesses, ranging from misleading users who have to sift through multiple records and skewing reports to jeopardizing the customer/prospect experience when sales or service teams get their ‘wires crossed’ with colleagues who may already be working on that opportunity or case.

The first stage of tackling duplicates is to define what a duplicate record means to your organization, object by object. The criteria that identify a duplicate lead (e.g. email, company) may not be the same as those that determine a duplicate contact.
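
Defining duplicates object by object can be sketched as a set of match keys per object. In the minimal Python example below, the criteria are assumptions for illustration only: leads match on email plus company name, contacts on email alone, after simple normalization (case-folding and collapsing whitespace). Salesforce Matching Rules and third-party tools offer fuzzy matching that goes well beyond this; exact matching like this would not catch ‘ACME Inc.’ vs. ‘ACME’.

```python
from collections import defaultdict
from typing import Mapping, Optional, Sequence

# Hypothetical per-object match criteria.
MATCH_KEYS = {
    "Lead": ("Email", "Company"),
    "Contact": ("Email",),
}


def normalize(value: Optional[str]) -> str:
    """Case-fold and collapse whitespace so 'Monsters  Inc.' and 'monsters inc.' compare equal."""
    return " ".join((value or "").split()).casefold()


def find_duplicates(records: Sequence[Mapping[str, str]], object_name: str) -> list:
    """Return groups of record Ids that share the same normalized match key."""
    keys = MATCH_KEYS[object_name]
    groups = defaultdict(list)
    for rec in records:
        match_key = tuple(normalize(rec.get(k)) for k in keys)
        if all(match_key):  # skip records missing any of the match fields
            groups[match_key].append(rec["Id"])
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    {"Id": "001", "Email": "Mike@Monsters.com", "Company": "Monsters Inc."},
    {"Id": "002", "Email": "mike@monsters.com ", "Company": "monsters  inc."},
    {"Id": "003", "Email": "mike@monsters.com", "Company": "Monsters University"},
]
print(find_duplicates(records, "Lead"))     # [['001', '002']]
print(find_duplicates(records, "Contact"))  # [['001', '002', '003']]
```

The same three records produce different duplicate groups depending on the object’s criteria, which is exactly why the definition has to be agreed object by object.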

There’s also the concept of deliberate duplicates in orgs that are strictly partitioned, for example, by business unit or region.

Salesforce Duplicate Rules (and Matching Rules) are a good place to start. There are standard ones already set up when you purchase Salesforce. However, third-party applications take this much further and give admins an interface to define very granular deduplication rules (we’ll share some options later in this guide).

READ MORE: Complete Guide to Salesforce Duplicate Rules

Data Validation (for Validity)

Validation ensures that data adheres to the syntax of its definition and is usable for its intended purpose. The two prime examples of this are email addresses and mailing addresses.

  • Email addresses: Is the email address you have going to reach the intended recipient? Email addresses are made up of two components: the ‘local-part’ (before the @) and the domain (after the @). The local part makes the address unique to an individual user within that domain. Email addresses can become invalid, especially in B2B organizations; for example, when someone leaves their job at an organization, they don’t take their work email address with them. Email verification, therefore, checks that both parts are valid.
  • Mailing addresses: Address verification can confirm, for example, that a postcode is correct for someone’s business address. However, typing out addresses is time-consuming and prone to human error (and that’s before we get onto data formatting, which is discussed shortly).
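
A syntax check is the cheapest first layer of email validation. The Python sketch below applies a deliberately simplified rule set (an assumption, much looser than the full RFC 5322 grammar): exactly one ‘@’, a local part of at most 64 characters made of dot-separated atoms, and a domain of at least two letter/digit/hyphen labels. It cannot tell you whether the mailbox still exists (for example, after someone has left their job); that is what email verification services check.

```python
import re

# Simplified rules (assumption): dot-separated "atoms" in the local part,
# letter/digit/hyphen labels (not starting or ending with a hyphen) in the domain.
_ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
_LOCAL_RE = re.compile(_ATOM + r"(?:\." + _ATOM + r")*")
_LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_plausible_email(address: str) -> bool:
    """Syntax-only check of local-part@domain; says nothing about deliverability."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or "@" in local or len(local) > 64:
        return False
    labels = domain.split(".")
    if len(labels) < 2:
        return False
    return (_LOCAL_RE.fullmatch(local) is not None
            and all(_LABEL_RE.fullmatch(label) for label in labels))


for candidate in ["mike.wazowski@monsters-inc.com", "mike@@monsters.com",
                  "mike@localhost", "mike.@monsters.com"]:
    print(candidate, is_plausible_email(candidate))
```

Running a check like this before paying for verification credits filters out addresses that can never be delivered.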

Data Enrichment (for Accuracy)

A few years back, data enrichment was a common way for organizations to bulk out their databases. Enrichment providers essentially offered a library of data that you could purchase and sync to Salesforce (or any other database).

You may remember Data.com, a Salesforce product that has since been retired. The “Data Clean & Prospect” capability allowed you to filter a database of companies down to those you’d like to target and then compare your Salesforce (Accounts, Leads, and Contacts) data against the Dun & Bradstreet database.

However, several developments in the data privacy landscape have made data enrichment a questionable tactic for supporting data accuracy. Think GDPR, CCPA, and the death of third-party cookies.

Data enrichment encouraged data hoarding; in other words, holding any piece of data possible without a specified use for it.

You could consider it this way: should I have your phone number simply because we hold your first name, surname, and email address? There are data enrichment organizations that could provide the phone number, but do I have permission to hold it?

In some ways, accuracy is one data quality metric that has faded into obscurity in favor of completeness, timeliness, and compliance.

Standardization (for Consistency)

Data that doesn’t conform to the required format becomes a pain point when it needs to be used by people, a process, or an integrated system. Non-standardized data can be unusable or, in the case of automated processes, cause errors.

Data standardization involves transforming your existing data to conform to required formats so the data is usable across your business processes. The standards will differ according to the system that needs to process it – or, you could say, the ‘cocktail of systems in your tech stack’.

READ MORE: Mastering Data Standardization for Your Salesforce Org

Again, mailing addresses are a common culprit here. They need to be both valid and standardized; for example, populating the Country field in Salesforce with full country names instead of ISO abbreviations.
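
A standardization step like the country example can be sketched as a lookup that maps known variants onto one canonical value. The mapping in this Python example is hypothetical and deliberately tiny; a real one would cover every ISO 3166-1 code and every variant your systems actually send. Unrecognized values return `None` so they can be reviewed rather than guessed.

```python
from typing import Optional

# Hypothetical, deliberately small mappings.
ISO_TO_COUNTRY = {
    "GB": "United Kingdom",
    "US": "United States",
    "DE": "Germany",
    "FR": "France",
}
ALIASES = {  # other variants seen in the org (assumption)
    "uk": "United Kingdom",
    "usa": "United States",
    "u.s.a.": "United States",
}


def standardize_country(raw: Optional[str]) -> Optional[str]:
    """Return the standard country name, or None if the value isn't recognized."""
    if raw is None:
        return None
    value = raw.strip()
    if value.upper() in ISO_TO_COUNTRY:
        return ISO_TO_COUNTRY[value.upper()]
    if value.casefold() in ALIASES:
        return ALIASES[value.casefold()]
    # Already a standard name, just in the wrong case?
    for name in ISO_TO_COUNTRY.values():
        if value.casefold() == name.casefold():
            return name
    return None


for raw in ["gb", " USA ", "germany", "Narnia"]:
    print(repr(raw), "->", standardize_country(raw))
```

Flipping the mapping direction is trivial if a downstream system expects ISO codes instead of names; the point is that one canonical form is chosen and enforced.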

8 Actions to Take Towards Better Salesforce Data Quality

1. Report Folder Organization and Naming Conventions

With users creating reports and dashboards, and having the ability to access, edit, and write to all folders, this area of Salesforce can become messy very quickly. It’s a classic example of data inconsistency and something that I’ve seen organizations struggle to keep under control.

As you will be using reports to monitor data quality, let’s get this sorted first…

  • Naming conventions: Define a structure for how reports should be named. Stacy wrote a comprehensive guide on naming conventions, which includes examples.
READ MORE: Salesforce Naming Conventions (+ How to Enforce Them)

But it doesn’t stop at reports and dashboards. Salesforce records (such as opportunities and email templates) also need naming conventions, as do assets in other tools, like Account Engagement (Pardot).

READ MORE: Pardot Naming Convention and Account Organization Tips
  • Folders: Ensure only the users from the relevant departments are given access to their folders to stop cluttering or accidentally misplacing a report/dashboard.
  • Reports on dashboards: Make it clear which reports are used on dashboards (yes, you guessed it) with a naming convention that’s aligned to the dashboard, e.g. “Sales: Quarterly Revenue”.
  • The ‘colon trick’: Consider using Scott Hemmeter’s trick to reduce the length of your report names and possibly add more characters that would have otherwise been cut off.
  • Assign ‘report masters’: Super users can be an ally in keeping everything organized. These designated users will be the only ones with permission to edit and save to public folders.
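
Once a naming convention is agreed, checking names against it is easy to automate, for example over an export of report names. This Python sketch assumes a hypothetical convention of “<Department>: <Description>” (as in “Sales: Quarterly Revenue”) and a hypothetical list of department prefixes; adapt both to the convention your org actually uses.

```python
import re

# Hypothetical convention: "<Department>: <Description>", e.g. "Sales: Quarterly Revenue".
DEPARTMENTS = ("Sales", "Marketing", "Service")
_NAME_RE = re.compile(r"(?P<dept>[A-Za-z ]+): (?P<desc>\S.*)")


def check_report_name(name: str) -> list:
    """Return a list of problems with a report name; an empty list means it conforms."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        return ["Name should look like '<Department>: <Description>'"]
    problems = []
    if match.group("dept") not in DEPARTMENTS:
        problems.append(f"Unknown department prefix: {match.group('dept')!r}")
    return problems


for name in ["Sales: Quarterly Revenue", "Finance: Budget", "Quarterly Revenue"]:
    print(name, check_report_name(name))
```

Returning a list of problems, rather than a single True/False, makes the output usable as a to-do list for your ‘report masters’.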

2. Declutter Page Layouts

When page layouts carry an excessive number of fields, users tend not to populate them, resulting in incomplete record data.

While you may initially think a field is useful, it’s often the case that it isn’t used or that it doesn’t need to be on every (or any) page layout.

READ MORE: Guide to Page Layouts in Salesforce

There are a few ways to prevent and analyze poor field usage:

READ MORE: Salesforce Page Layouts Best Practices for Marketing Teams
  • Record types: Use these if page layouts need to be distinctly different according to the type of record within the same object, e.g. different campaign layouts for “email” vs. “tradeshow”.
READ MORE: When to Use Record Types vs. Page Layouts?
  • Dynamic forms: Dynamically display specific fields according to the data in the record, e.g. when an Opportunity reaches a certain stage, you can have a set of fields appear that are only relevant to the current stage.
READ MORE: Salesforce Dynamic Forms: Overview & Deep Dive Tutorial
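
To find fields that are candidates for removal from a layout, you can measure how often each field is actually populated. This Python sketch assumes records exported as dictionaries; the sample Account data and the 10% default threshold are arbitrary choices for illustration.

```python
from typing import Mapping, Sequence


def fill_rates(records: Sequence[Mapping[str, object]], fields: Sequence[str]) -> dict:
    """Share of records where each field holds a non-empty value."""
    total = len(records)
    rates = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        rates[field] = filled / total if total else 0.0
    return rates


def low_usage_fields(rates: Mapping[str, float], threshold: float = 0.1) -> list:
    """Fields populated on fewer than `threshold` of records, sorted by name."""
    return sorted(f for f, rate in rates.items() if rate < threshold)


accounts = [
    {"Industry": "Media", "Fax": "", "Rating": "Hot"},
    {"Industry": "Retail", "Fax": None, "Rating": ""},
    {"Industry": "", "Rating": None},
    {"Industry": "Technology"},
]
rates = fill_rates(accounts, ["Industry", "Fax", "Rating"])
print(rates)                         # {'Industry': 0.75, 'Fax': 0.0, 'Rating': 0.25}
print(low_usage_fields(rates))       # ['Fax']
print(low_usage_fields(rates, 0.3))  # ['Fax', 'Rating']
```

A low fill rate doesn’t automatically mean a field should go; it may simply belong on a different record type or appear only at a certain stage via Dynamic Forms.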

3. Create Validation Rules

Validation rules were mentioned earlier as supporting both completeness and timeliness. They are highlighted here because they are a simple, low-effort step towards better data quality.

Validation rules verify that the data entered by a user meets certain criteria before the user can save the record. Salesforce Admins set up the rules as statements that act like yes/no questions, and the answer must be “no” to all of them. If the answer to any of them is “yes”, an error message appears explaining what the user must do to correct the record.
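
The ‘yes/no question’ model can be mirrored in a short Python sketch: each rule is an error condition, and a record may only be saved if every condition evaluates to False. In Salesforce itself, the first hypothetical rule below would be written as a formula such as `AND(ISPICKVAL(StageName, "Closed Won"), ISBLANK(Amount))`; the Python only illustrates the logic and is not how Salesforce evaluates rules. The rule names follow a hypothetical naming convention.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Record = Mapping[str, object]


@dataclass(frozen=True)
class ValidationRule:
    name: str
    # Returns True when the record is INVALID, like a validation rule's error condition.
    error_condition: Callable[[Record], bool]
    error_message: str


def validate(record: Record, rules: list) -> list:
    """Collect messages of every rule whose condition is met; [] means the record can be saved."""
    return [rule.error_message for rule in rules if rule.error_condition(record)]


# Hypothetical rules that only 'kick in' at certain opportunity stages.
RULES = [
    ValidationRule(
        name="OPP_ClosedWon_RequireAmount",
        error_condition=lambda r: r.get("StageName") == "Closed Won" and r.get("Amount") is None,
        error_message="Enter an Amount before marking the opportunity Closed Won.",
    ),
    ValidationRule(
        name="OPP_Negotiation_RequireNextStep",
        error_condition=lambda r: r.get("StageName") == "Negotiation/Review" and not r.get("NextStep"),
        error_message="Describe the Next Step while the opportunity is in Negotiation/Review.",
    ),
]

print(validate({"StageName": "Prospecting"}, RULES))  # []
print(validate({"StageName": "Closed Won"}, RULES))
```

Because each condition is scoped to a stage, users only meet the rule at the point in the process where the data is actually needed.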

READ MORE: How to Use Validation Rules in Salesforce (+ Examples)

Of course, this ‘simple, low-effort’ tactic comes with caveats:

  • Be selective over when to enact validation rules; enforce too many, and you’ll frustrate your users.
  • Establish a naming convention, or you’ll no doubt get lost in your own configuration.

4. Set Up Monitoring Reports

Salesforce reports will give you a glimpse into most (if not all) of the data quality metrics we outlined. They can serve as a quick, ‘on demand’ way to check whether key fields are populated, spot missing values, or find duplicates by grouping on a field you consider unique (for example, email address).
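
The duplicate-finding report described above (group by a ‘unique’ field, then look for groups with more than one record) can be reproduced over exported data with Python’s `Counter`. This is a sketch under the assumption that records are plain dictionaries; values are compared exactly as stored, so a case or whitespace difference will hide a duplicate, just as it would in a simple grouping.

```python
from collections import Counter
from typing import Mapping, Sequence

BLANK = "(blank)"


def summarize_by(records: Sequence[Mapping[str, object]], field: str) -> Counter:
    """Count records per value of `field`, like a summary report grouped on that field."""
    counts = Counter()
    for r in records:
        value = r.get(field)
        counts[value if value not in (None, "") else BLANK] += 1
    return counts


def report_findings(records: Sequence[Mapping[str, object]], field: str):
    """Return (values shared by more than one record, number of records with a blank value)."""
    counts = summarize_by(records, field)
    duplicates = {v: n for v, n in counts.items() if v != BLANK and n > 1}
    return duplicates, counts[BLANK]


contacts = [
    {"Email": "mike@monsters.com"},
    {"Email": "mike@monsters.com"},
    {"Email": ""},
    {"Email": None},
    {"Email": "sulley@monsters.com"},
]
print(report_findings(contacts, "Email"))  # ({'mike@monsters.com': 2}, 2)
```

The blank count doubles as a completeness check on the same field, so one grouping answers two data quality questions.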

READ MORE: Quick Check for Duplicates in Your Salesforce Database

However, there are also apps on the AppExchange and beyond that could extend this effort. We’ll cover these in a later section.

5. Data Governance

Data governance is not the same as data management. Data governance is a collection of principles and practices that govern the “data life cycle” – from the point at which data is generated, through to its usage, and finally, to archiving/purging (i.e. deleting).

READ MORE: Data Governance vs. Data Management in Salesforce

Setting up data governance is more than putting one-off tactics in place; it’s about building a cross-functional team that brings everyone together, with the most important stakeholders at the table. These could very well be the management layer, who have oversight and are working towards a specific vision.

For example, your team could include the Salesforce Admin and the Sales Director (and/or a representative from their team). The same goes for marketing (director and/or representative) and customer service (director and/or representative). In a larger company, members of the CISO’s team are certainly essential.

6. Help Users Understand Why Data Is Important

We outlined the vicious cycle that Salesforce orgs can fall into – poor quality data breeds mistrust among your user base, which leads to poor adoption and inaccurate, inconsistent and stale data.

Getting your users on board should never be overlooked. How you approach this will differ according to the target persona. However, there are small wins you can achieve; for example, producing insightful reports for the sales team that guide them to sell into account ‘whitespace’ more effectively.

While validation rules enforce data entry, field tooltips can help too: when users hover over the icon, they can see why that data point is important.

7. Establish Long-Term Data Cleansing Routines

Depending on the size of your database and the scope of your operations, you can decide how often you need to check up on your data quality.

Whether it’s weekly or monthly, you can spend an hour conducting a data quality assessment. While it may seem manual, it’s a good exercise to keep you familiar with the state of your data as time goes on. Plus, it will help you see patterns in data quality that prompt you to amend any automated data manipulation (for example, adjusting deduplication rules).
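
If you record a score for each metric at every assessment, spotting patterns becomes a simple comparison between snapshots. This Python sketch assumes each metric is scored between 0 and 1 (higher is better); the 0.02 tolerance for ‘stable’, the metric names, and the scores are illustrative assumptions.

```python
from typing import Mapping


def compare_snapshots(previous: Mapping[str, float],
                      current: Mapping[str, float],
                      tolerance: float = 0.02) -> dict:
    """Label each metric as 'improved', 'worse', 'stable', or 'new' between two assessments."""
    result = {}
    for metric, now in current.items():
        before = previous.get(metric)
        if before is None:
            result[metric] = "new"
        elif now - before > tolerance:
            result[metric] = "improved"
        elif before - now > tolerance:
            result[metric] = "worse"
        else:
            result[metric] = "stable"
    return result


last_month = {"completeness": 0.80, "uniqueness": 0.95, "validity": 0.90}
this_month = {"completeness": 0.86, "uniqueness": 0.90, "validity": 0.91, "timeliness": 0.70}
print(compare_snapshots(last_month, this_month))
```

A metric that keeps landing in ‘worse’ (here, uniqueness) is the signal to revisit the related automation, such as your deduplication rules.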

READ MORE: The 1-Hour Salesforce Data Quality Assessment Exercise

On a quarterly/annual basis, you can tackle bigger-picture data quality issues. The two examples of “spring cleans” are below:

8. Explore Third-Party Tools

Once you know the capabilities of Salesforce’s built-in data tools, you may run into their limitations. In that case, it’s time to explore other options on the market. Some partner applications have already been mentioned throughout the guide, but here is a summary as a jumping-off point for your exploration and evaluation:

Tools for Completeness/Timeliness

  • ZoomInfo Field Trip: We’ve looked at ways to prevent poor field usage, so here’s a way to analyze the extent of the problem. Field Trip will analyze fields by object, showing you the percentage of fields populated, data type, and required fields. You can zoom in by field and see how many records have data populated. Finally, run your own Salesforce reports on its findings.
  • GridBuddy: If you’re struggling to get users to update record data, why not enhance their interface with all the data they need in a single, editable grid within Salesforce? With supercharged list view qualities, updating records becomes much faster, encouraging the input of complete and fresh data.
  • PipeLaunch: Pulls data in from LinkedIn ready for prospecting. This data is surfaced in a wide range of Lightning Web Components (LWCs) you can add to your Lightning Record Pages.
  • ZoomInfo SalesOS (similar to PipeLaunch).
  • Form builders: Sharing the data quality mission with your customers and prospects isn’t a negative move. Webforms enable these external users to keep their data up-to-date, improving their overall experience as your teams will have a better picture of who they are as individuals. Aside from adding validation that’s aligned with your own org’s rules, keep in mind that a valuable feature is the ability to have people create/update related records – for example, a contact can update their billing address (on the account) as well as their recent order details. This will reduce data management overheads, with data going to the correct destination, versus having to create funky workarounds in your CRM. Examples of webform platforms are FormAssembly and Formstack.
  • Revenue Grid: As we mentioned earlier, activities are any form of communication your teams have with customers/prospects, but they are notoriously challenging to get users to record in Salesforce. While known as a revenue intelligence platform, Revenue Grid also provides an activity capture tool that goes beyond Salesforce’s Einstein Activity Capture.

Tools for Uniqueness

  • DemandTools by Validity: A Salesforce Admin’s staple for over two decades. You’ll find supercharged deduplication capabilities, assessments summarized with a color-coded “quality” bar, and much more.
  • Duplicate Check by Plauti: Another option to find and merge duplicates; Plauti’s admin interface is especially worth checking out.

Tools for Validity/Consistency

  • Email verification: There are multiple tools on the market, most of which work on a pay-as-you-go, credits basis. Read our guide on the topic here.
  • AddressTools: Without address validation, users have to complete address details manually, and admins are burdened with cleaning up the errors they discover over time, as well as creating and managing complex validation rules. AddressTools adds that important layer of validation, providing a better experience for both users and admins.
  • DemandTools by Validity: As well as deduplication, DemandTools can help data standardization by transforming data into the correct format that’s usable for your organization.

There are other prospecting and pipeline data quality tools listed in the guide below.

READ MORE: The Top 12 Tools to Maximize Salesforce Data

Summary

Data management has always provided challenges. Over time, orgs will fill up with bad data, unused fields, and components that slow down page load times – and it will come back to bite you when you least expect it.

Cleaning Salesforce data can be overwhelming, so showing you where to start has been the objective of this guide. While you’re not going to magically clean Salesforce data overnight, you now know how to distinguish good-quality data from bad, which steps you can start taking, and which routines to establish.

The Author

Lucy Mazalon

Lucy is the Operations Director at Salesforce Ben. She is a 10x certified Marketing Champion and founder of The DRIP.
