
A Guide to Salesforce Duplicate Management in the Age of Data Cloud

By Mehmet Orun

Despite extensive resources on Salesforce duplicate management, Customer 360, and Data Cloud, organizations repeatedly encounter familiar challenges.

They initiate duplicate management without fully understanding their data, leading to errors. They believe that consolidating information into a single record is the solution, which often results in lost context and higher operational costs. They assess duplicates in isolation, without considering the broader Salesforce platform, thus missing potentially superior solutions. So how can these be mitigated?

Salesforce Match Rules

To manage duplicates effectively, you need to understand the fundamentals of matching rules. The same concepts apply to Salesforce’s own duplicate management features, third-party tools, and Data Cloud’s identity resolution. You define criteria using two or more fields; if the values in these fields match or are sufficiently similar, the records are considered duplicates.

Matching for fields like names can be specified as Exact or Fuzzy (e.g., Bob vs. Robert). Solutions like Salesforce Data Cloud intelligently handle formatting differences, recognizing similar phone numbers as identical despite variations in formatting (e.g. +1 415-555-1212 vs (415) 555-1212).
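To make the Exact vs. Fuzzy distinction and phone normalization concrete, here is a minimal Python sketch. It is not how Salesforce or Data Cloud implement matching: the nickname table is a hypothetical stand-in for a real fuzzy name algorithm, and `normalize_phone` assumes that a 10-digit number without a leading "+" is North American.

```python
import re

# Hypothetical nickname table for illustration only. A nickname can map to
# several formal names, which is why fuzzy name matching needs other evidence.
NICKNAMES = {
    "bob": {"robert"},
    "rob": {"robert"},
    "sam": {"samuel", "samantha"},
}


def normalize_phone(raw: str, default_country: str = "1") -> str:
    """Strip formatting and return '+' followed by digits only.

    Assumption: a 10-digit number without a leading '+' is North American,
    so the default country code is prepended.
    """
    has_plus = raw.strip().startswith("+")
    digits = re.sub(r"\D", "", raw)
    if not has_plus and len(digits) == 10:
        digits = default_country + digits
    return "+" + digits


def name_variants(name: str) -> set[str]:
    key = name.strip().lower()
    return NICKNAMES.get(key, {key})


def first_name_matches(a: str, b: str, fuzzy: bool) -> bool:
    if not fuzzy:
        return a.strip().lower() == b.strip().lower()
    # Fuzzy: the names match if they share any canonical variant.
    return bool(name_variants(a) & name_variants(b))


print(normalize_phone("+1 415-555-1212"))  # +14155551212
print(normalize_phone("(415) 555-1212"))   # +14155551212
print(first_name_matches("Bob", "Robert", fuzzy=False))  # False
print(first_name_matches("Bob", "Robert", fuzzy=True))   # True
```

Because "Sam" maps to both Samuel and Samantha in this toy table, a fuzzy first-name match on its own can easily produce false positives.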

The design of your match rules significantly impacts the outcome. Using multiple fields can increase the accuracy of matches but also risks missing legitimate matches due to overly stringent criteria. Conversely, minimal criteria can lead to false positives, especially if the data within the fields is unreliable or incorrect, such as the commonly found placeholders ‘noemail@noemail.com’ or ‘idk@idk.com.’

Let’s take the following simple example. You’ll likely have many more email or phone fields in your org, or need to account for regional or ethnic naming conventions (e.g. Anglo vs. Latin vs. Middle-Eastern).

You’ll need to evaluate the outcomes for different match rule scenarios.

Match rule scenarios and their outcomes:

  • Exact First Name, Exact Last Name, Exact Email. Outcome: No records match. Leaving out fields that could have been used in matching leads to lost opportunities.
  • Fuzzy First Name, Exact Last Name, Exact Email, Exact Phone, Exact Mobile. Outcome: No records match. Because data is compared only within the same field, the denormalized data model (separate Phone and Mobile fields) leads to lost opportunities.
  • Fuzzy First Name, Exact Last Name, Exact Email; OR Fuzzy First Name, Exact Last Name, Exact Phone (normalized); OR Fuzzy First Name, Exact Last Name, Exact Address (normalized). Outcome: The first two records match even though their email addresses are different.

This may or may not be a correct match, depending on whether we know if the address or phone number is a personal or a corporate contact point.

The scenario you want to avoid most is one where three records somehow match, get merged, and you lose the underlying details.
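The difference between a single AND-ed rule that compares each field only with itself and an OR-combined rule set that pools normalized phone numbers across fields can be sketched in Python. The two contacts are hypothetical (not the records from the article's example), the nickname table is illustrative, normalization simply keeps digits, and the address rule is left out for brevity.

```python
import re
from dataclasses import dataclass

# Hypothetical nickname table; not Salesforce's fuzzy-matching algorithm.
NICKNAMES = {"sam": {"samuel", "samantha"}, "bob": {"robert"}}


@dataclass
class Contact:
    first: str
    last: str
    email: str
    phone: str = ""
    mobile: str = ""


def digits(raw: str) -> str:
    # Naive normalization for the sketch: keep digits only.
    return re.sub(r"\D", "", raw)


def fuzzy_first(a: Contact, b: Contact) -> bool:
    va = NICKNAMES.get(a.first.lower(), {a.first.lower()})
    vb = NICKNAMES.get(b.first.lower(), {b.first.lower()})
    return bool(va & vb)


def phone_pool(c: Contact) -> set[str]:
    # Pool Phone and Mobile so values are compared across fields.
    return {digits(p) for p in (c.phone, c.mobile) if p}


def rule_exact(a: Contact, b: Contact) -> bool:
    # Exact First Name, Exact Last Name, Exact Email
    return a.first == b.first and a.last == b.last and a.email == b.email


def rule_all_fields(a: Contact, b: Contact) -> bool:
    # Fuzzy First Name, Exact Last Name, Exact Email, Exact Phone, Exact Mobile
    return (fuzzy_first(a, b) and a.last == b.last and a.email == b.email
            and a.phone == b.phone and a.mobile == b.mobile)


def rule_set_or(a: Contact, b: Contact) -> bool:
    # Fuzzy First + Exact Last, AND (Exact Email OR a shared normalized phone)
    if not (fuzzy_first(a, b) and a.last == b.last):
        return False
    return a.email.lower() == b.email.lower() or bool(phone_pool(a) & phone_pool(b))


a = Contact("Sam", "Smith", "sam.smith@acme.com", mobile="+44 20 1234 5678")
b = Contact("Samantha", "Smith", "ssmith@personalemail.com", phone="+44-20-1234-5678")

print(rule_exact(a, b), rule_all_fields(a, b), rule_set_or(a, b))  # False False True
```

The OR rule set matches these two contacts through the shared phone number, but the code cannot tell whether that number belongs to one person or is a shared corporate line; that judgment still depends on knowing your data.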

DO ensure that your match rules are comprehensive yet flexible enough to capture all potential duplicates without being overly restrictive. 

DO NOT create rules so stringent that they exclude valid matches, or so loose that they fail to distinguish between unique entries.

Identify Fields That Are Impactful in Matching

If you’re a CRM admin or architect, familiarize yourself with the various contact points within Contact, Lead, or Account records. Beyond standard Email and Phone fields, additional URL or String fields might be useful for matching. Data profiling techniques, as detailed in this article, help you identify the most effective fields for matching by analyzing data types, distinct ratios, and PII classifications. 

[Figure: Custom dashboard based on Cuneiform for CRM data profiling statistics]

DO use data profiling to identify and utilize the most effective fields for duplicate matching. 

DO NOT overlook additional fields that could provide crucial matching data.
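As a rough illustration of the kind of statistics profiling produces, the Python sketch below computes a fill rate, a distinct ratio, and the most frequent value for a field. It is a simplified stand-in, not how Cuneiform for CRM or any other profiling tool works, and the sample records and field names are hypothetical.

```python
from collections import Counter


def profile_field(records: list[dict], field: str) -> dict:
    """Basic profiling statistics for one field.

    fill_rate: share of records with a non-blank value
    distinct_ratio: distinct non-blank values / non-blank values
    top_value: (most frequent non-blank value, count), or None
    """
    values = [str(r.get(field) or "").strip().lower() for r in records]
    filled = [v for v in values if v]
    counts = Counter(filled)
    return {
        "fill_rate": len(filled) / len(records) if records else 0.0,
        "distinct_ratio": len(counts) / len(filled) if filled else 0.0,
        "top_value": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical sample records.
records = [
    {"Email": "a@example.com", "Phone": "555-0100", "Fax": ""},
    {"Email": "b@example.com", "Phone": "555-0100", "Fax": ""},
    {"Email": "c@example.com", "Phone": "555-0100", "Fax": ""},
    {"Email": "d@example.com", "Phone": "", "Fax": "555-0199"},
]

for field in ("Email", "Phone", "Fax"):
    print(field, profile_field(records, field))
```

A field that is mostly filled and highly distinct (Email here) is a stronger matching candidate than one dominated by a single repeated value (Phone here) or one that is rarely populated (Fax here).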

Once you have your set of fields, the next stage is to understand what kind of data is stored in them. For example, if we’re looking at a Contact or Lead record, does the Phone or Email field describe the individual, the associated Account, or a mix of both?

If clear patterns emerge (e.g. for one record type the values are 90% about the individual, but for another record type they are about the organization), you may be able to define more precise matching rules, depending on the technology you’re using.
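One way to test whether a Phone or Email value describes the individual or the organization is to compare it with the parent Account's value and to look for role-based mailboxes. The Python sketch below assumes a flat export with hypothetical `RecordType`, `Phone`, and `AccountPhone` columns and a hypothetical list of role-based prefixes; it reports, per record type, what share of phone values look personal versus organizational.

```python
import re
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical role-based mailbox prefixes; derive yours from profiling.
ROLE_PREFIXES = {"info", "sales", "support", "admin", "office", "contact"}


def digits(raw: Optional[str]) -> str:
    return re.sub(r"\D", "", raw or "")


def phone_context(contact_phone: Optional[str], account_phone: Optional[str]) -> str:
    """'organization' if the Contact repeats its Account's number, else 'individual'."""
    c = digits(contact_phone)
    if not c:
        return "blank"
    return "organization" if c == digits(account_phone) else "individual"


def email_context(email: Optional[str]) -> str:
    """'organization' for role-based mailboxes such as info@, else 'individual'."""
    if not email or "@" not in email:
        return "blank"
    local = email.split("@", 1)[0].strip().lower()
    return "organization" if local in ROLE_PREFIXES else "individual"


def phone_context_share(contacts: list[dict]) -> dict:
    """Share of each phone context per record type."""
    tallies: dict = defaultdict(Counter)
    for c in contacts:
        ctx = phone_context(c.get("Phone"), c.get("AccountPhone"))
        tallies[c["RecordType"]][ctx] += 1
    return {rt: {ctx: n / sum(t.values()) for ctx, n in t.items()}
            for rt, t in tallies.items()}


# Hypothetical flat export: one row per Contact with its Account's phone.
contacts = [
    {"RecordType": "Consumer", "Phone": "415-555-0101", "AccountPhone": "415-555-0000"},
    {"RecordType": "Consumer", "Phone": "415-555-0102", "AccountPhone": "415-555-0000"},
    {"RecordType": "Business", "Phone": "(415) 555-0000", "AccountPhone": "415-555-0000"},
    {"RecordType": "Business", "Phone": "415-555-0103", "AccountPhone": "415-555-0000"},
]
print(phone_context_share(contacts))
# {'Consumer': {'individual': 1.0}, 'Business': {'organization': 0.5, 'individual': 0.5}}
```

If one record type comes out overwhelmingly "individual", its phone values are plausible matching inputs; a mixed record type calls for more caution.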

Identify Field Values That May Be Problematic in Matching

Common issues like defaulting business Contact or Lead information to company addresses can increase the likelihood of incorrect matches. Mandatory validation rules that compel users to fill fields often lead to invalid entries, while users entering personal email addresses instead of official ones can further complicate data integrity.

Data profiling can uncover which field values frequently appear and guide efforts to clean up or reevaluate these entries.

[Figure: Disproportionately frequent field values that may indicate invalid values, based on data profiling analytics]

DO identify contact point values that show up disproportionately in Phone, Email, or Address fields.  

DO assess and classify field values as invalid (verifiable junk), wrong context (about the organization vs. person), or valid.  

DO clean up your data in the system of record when possible. 

DO NOT perform these cleanups in production first, and never without a backup that you can recover from.
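Working on an exported copy of the data rather than production, a first pass at finding disproportionately frequent values and sorting them into invalid, wrong context, or valid might look like the Python sketch below. The thresholds, the junk list, and the set of known organization values are hypothetical inputs you would derive from your own profiling.

```python
from collections import Counter

# Hypothetical placeholder values; extend this list from your own profiling.
KNOWN_JUNK = {"noemail@noemail.com", "idk@idk.com"}


def frequent_values(values, min_count=5, min_share=0.01):
    """Values appearing at least min_count times AND forming min_share of non-blank entries."""
    filled = [v.strip().lower() for v in values if v and v.strip()]
    counts = Counter(filled)
    total = len(filled)
    return {v: n for v, n in counts.items() if n >= min_count and n / total >= min_share}


def classify(value, org_values):
    """Sort a value into invalid, wrong context, or valid."""
    v = value.strip().lower()
    if v in KNOWN_JUNK:
        return "invalid"        # verifiable junk
    if v in org_values:
        return "wrong context"  # describes the organization, not the person
    return "valid"


# Hypothetical exported Email values.
emails = (["noemail@noemail.com"] * 6 + ["info@acme.com"] * 5
          + [f"person{i}@acme.com" for i in range(10)])
org_values = {"info@acme.com"}  # e.g. collected from Account-level records

for value, count in frequent_values(emails).items():
    print(value, count, classify(value, org_values))
```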

To Merge or Not to Merge

Deciding whether to merge duplicates depends on several factors:

  • Whether the duplicates represent the same individual in different contexts or roles.
  • How accurate the matching outcomes are.
  • How much data could be lost by merging even accurate matches.

In scenarios where merging could obscure or lose critical data, consider maintaining separate records or using a unified profile approach like Data Cloud’s Key Ring, which preserves original records while linking them to a unified profile.

DO carefully evaluate each potential duplicate case to determine the appropriate action. 

DO NOT rush into merging records without considering the broader implications for data integrity and user needs.

In the above example, we want to have a unified understanding of our engagement with Sam and/or Samantha Smith. However, we want to do this in a way where we don’t lose the various email addresses, phone numbers, or address details. If we merge the records, we’d need to choose what to keep. If we add even more email or phone number fields, this would quickly become unwieldy.

If you don’t have Data Cloud, you can use in-platform matching rules to identify records that appear related without merging them. You can then use the match-link pattern instead of the match-merge pattern to show the related transactional records based on the dedupe key. I’ve implemented this multiple times alongside experienced data and solution architects such as Alan Dray, so the approach works, but it may require development effort you may not be willing to invest in.

Data Cloud is your other, productized alternative. Its Key Ring approach keeps the source records as they are, creates a unified profile, maintains the relationship between each source record and its unified profile, and re-establishes those relationships as new information becomes available.
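The match-link idea, where source records stay untouched and a shared key links them, can be sketched with a union-find (disjoint-set) structure in Python. This is a generic clustering technique, not Data Cloud's identity resolution implementation, and the record IDs, the email-only match function, and the transactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cluster(records, is_match):
    """Group records with union-find; return {record_id: cluster_key}.

    Source records are never modified; the key acts as a dedupe key.
    """
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):  # O(n^2): acceptable for a sketch only
        if is_match(a, b):
            ra, rb = find(a["id"]), find(b["id"])
            if ra != rb:
                # Deterministic key: the smallest ID in a cluster stays its root.
                parent[max(ra, rb)] = min(ra, rb)
    return {r["id"]: find(r["id"]) for r in records}


records = [
    {"id": "003A", "email": "sam.smith@acme.com"},
    {"id": "003B", "email": "samantha.smith@acme.com"},
    {"id": "00QC", "email": "samantha.smith@acme.com"},  # e.g. a Lead
]
keys = cluster(records, lambda a, b: a["email"] == b["email"])

# Match-link: group transactions by dedupe key instead of merging records.
transactions = [
    {"record_id": "003B", "amount": 100},
    {"record_id": "00QC", "amount": 50},
    {"record_id": "003A", "amount": 20},
]
linked = defaultdict(list)
for t in transactions:
    linked[keys[t["record_id"]]].append(t)

print(keys)                  # {'003A': '003A', '003B': '003B', '00QC': '003B'}
print(len(linked["003B"]))   # 2
```

Clustering is transitive: if A matches B and B matches C, all three share one key even when A and C look nothing alike, which is exactly the three-record risk to watch for. Comparing every pair is quadratic, so production matching tools typically narrow candidate pairs first (often called blocking). Re-running the clustering after source records change recomputes the keys, which loosely mirrors re-establishing relationships as new information arrives.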

Data Cloud’s Unified Profile Approach – A Better Alternative to Merging Records

Creating a unified profile enables the maintenance of multiple contexts for the same entity, which is essential when dealing with the same person or organization across different scenarios. This approach allows for the easy mapping of various contact points from a denormalized record to a normalized profile.

Data Cloud implements match rules akin to those in Salesforce CRM but with a critical distinction: matched records contribute to a unified profile, and updates to source records reflect dynamically in this profile. For example, new information can correct associations automatically, ensuring accuracy in profile management. This method ensures that only verified contact points remain, leading to clean and reliable data profiles.

Let’s look at the practical implications of this data model for the Sam/Samuel vs. Samantha example above.

  • We can map any denormalized contact point field in our source data model to the normalized contact point model for matching.
  • Data Cloud will normalize phone numbers whenever possible, providing a standardized, consistent view across the various input formats.
  • As new information becomes available at the source record level, Data Cloud will re-match updated source records, continuously providing the highest quality unified profile possible.
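Mapping denormalized fields to normalized contact points while keeping lineage back to the source record and field can be sketched in Python as below. The field list, including the custom `Personal_Email__c` field, is hypothetical, the sample record is only loosely modeled on the Samantha Smith profile, and the phone normalization assumes values already include a country code. In Data Cloud, this mapping is typically configured through data stream mappings rather than written as code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactPoint:
    source_id: str     # lineage: which source record
    source_field: str  # lineage: which field on that record
    kind: str          # "email" or "phone"
    value: str         # normalized value


# Hypothetical mapping; Personal_Email__c is an illustrative custom field.
FIELD_MAP = {
    "Email": "email",
    "Personal_Email__c": "email",
    "Phone": "phone",
    "MobilePhone": "phone",
}


def normalize(kind: str, raw: str) -> str:
    if kind == "email":
        return raw.strip().lower()
    # Assumes the raw phone already carries its country code.
    return "+" + re.sub(r"\D", "", raw)


def to_contact_points(record: dict) -> list[ContactPoint]:
    points = []
    for field, kind in FIELD_MAP.items():
        raw = record.get(field)
        if raw and raw.strip():  # skip blank fields
            points.append(ContactPoint(record["Id"], field, kind, normalize(kind, raw)))
    return points


record = {
    "Id": "003B",
    "Email": "Samantha.Smith@acme.com",
    "Personal_Email__c": "ssmith@personalemail.com",
    "Phone": "",
    "MobilePhone": "+44 20 1234 9876",
}
for p in to_contact_points(record):
    print(p)
```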

This leaves us with two unified profiles:

  • Sam Smith, with the email sam.smith@acme.com, and personal mobile number +442012345678
  • Samantha (or Sam, depending on your data reconciliation rules) Smith, with two emails: samantha.smith@acme.com and ssmith@personalemail.com, and one personal mobile phone number +442012349876.

DO follow Data Cloud contact point mapping best practice guidelines to maintain data lineage while having robust, scalable logic for data cleansing or standardization.

DO NOT lose other valid contact point information, even if it’s irrelevant for matching, such as Business Phone or Address details.  

DO consider filtering out repeat values using a formula field in Data Streams.
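The repeat-value filter boils down to "blank this field if it repeats an earlier contact point on the same record." The Python sketch below expresses that logic; it is not Data Cloud formula syntax, and the field names and their order are assumptions.

```python
import re


def digits(raw) -> str:
    return re.sub(r"\D", "", raw or "")


def blank_repeated_phones(record: dict, fields=("Phone", "MobilePhone", "OtherPhone")) -> dict:
    """Return a copy in which a phone field repeating an earlier field is set to None."""
    seen = set()
    out = dict(record)  # never mutate the source record
    for field in fields:
        d = digits(record.get(field))
        if not d:
            continue
        if d in seen:
            out[field] = None
        else:
            seen.add(d)
    return out


rec = {"Phone": "(415) 555-1212", "MobilePhone": "415.555.1212", "OtherPhone": "415-555-9999"}
print(blank_repeated_phones(rec))
# {'Phone': '(415) 555-1212', 'MobilePhone': None, 'OtherPhone': '415-555-9999'}
```

Comparing digits only would still treat "+1 415-555-1212" and "(415) 555-1212" as different values, so pair this with a proper normalization step if such variations are common in your data.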

Do You Really Need to Merge Records?

Often, the impulse to merge CRM records stems from two pressures: automation interfaces need to associate transactions with the correct CRM record, and CRM users complain that they don’t know which of several intentional or unintentional duplicate records to work with, because each one appears incomplete and inconsistent. It’s crucial to understand your data thoroughly before merging to avoid errors:

  • DO assess whether a singular record approach is feasible without introducing errors or losing valuable information.
  • DO NOT overlook the complexity required to correct or prevent data loss from previous integration mishaps.
  • DO consider shifting your automation and profile linking processes to utilize the unified profiles in Data Cloud.
  • DO NOT neglect security requirements that mandate data segregation, as they often rule out merging.
  • DO deliver a holistic view of the business transactions in your CRM using Data Cloud Related Lists.

Once I Have Cleansed My Data I Have Nothing to Match On

After you’ve cleansed your data and determined which fields are reliable for match rules, you might find that some records no longer have enough fields populated for effective matching. It’s crucial to identify and classify these records based on their matchability and importance to your business operations.

To identify if a record has valid or invalid contact points, you can implement count or categorization formulas. This is applicable both in Salesforce CRM and Data Cloud since they both support formula fields.

Use your data profiling insights to turn the observed patterns into your formula design. Use lengths and string patterns as appropriate to continuously identify when bad data shows up in your org.
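A matchability formula could count usable contact points and bucket each record accordingly. The Python sketch below shows the logic only; the junk list, the email pattern, the 10-digit phone minimum, and the high/low/unmatchable thresholds are hypothetical and should come from your own profiling. In Salesforce CRM or Data Cloud, you would translate it into that platform's formula syntax.

```python
import re

# Hypothetical junk values and patterns; derive real ones from profiling.
JUNK_EMAILS = {"noemail@noemail.com", "idk@idk.com"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE)


def valid_email(value) -> bool:
    v = (value or "").strip().lower()
    return bool(EMAIL_RE.match(v)) and v not in JUNK_EMAILS


def valid_phone(value) -> bool:
    d = re.sub(r"\D", "", value or "")
    # Reject short numbers and single repeated digits such as 0000000000.
    return len(d) >= 10 and len(set(d)) > 1


def matchability(record: dict) -> tuple[int, str]:
    """Count usable contact points and bucket the record (thresholds are assumptions)."""
    count = (valid_email(record.get("Email"))
             + valid_phone(record.get("Phone"))
             + valid_phone(record.get("MobilePhone")))
    has_last_name = bool((record.get("LastName") or "").strip())
    if has_last_name and count >= 2:
        return count, "high"
    if has_last_name and count == 1:
        return count, "low"
    return count, "unmatchable"


print(matchability({"LastName": "Smith", "Email": "sam.smith@acme.com",
                    "Phone": "+44 20 1234 5678"}))   # (2, 'high')
print(matchability({"LastName": "Smith", "Email": "noemail@noemail.com",
                    "Phone": "0000000000"}))         # (0, 'unmatchable')
```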

DO utilize the matchability formula as a critical filter in your duplicate management strategy. This approach ensures that your efforts and resources are focused efficiently, only on records that can genuinely be aligned or reconciled.

DO NOT proceed with handling unmatched records without carefully assessing their relevance and necessity. For records that are essential but currently unmatchable, you should undertake targeted data enrichment efforts to improve their completeness and accuracy.

For records deemed important yet unmatchable, a targeted data cleanup initiative is essential. These records should be monitored over time using Data Quality KPIs to track improvements in their matchability score. This ongoing monitoring will help determine whether your data quality interventions are effective.

For records that are unimportant or consistently unmatchable, consider a robust data governance strategy involving archival or purging. Removing these records from active CRM systems can significantly streamline both the user experience and system performance while ensuring compliance with data retention policies. This approach not only cleanses your system but also optimizes it by reducing clutter and enhancing operational efficiency.
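The triage described in this section, together with a simple Data Quality KPI, can be sketched in Python. It assumes each record already carries a matchability bucket (for example from a formula of the kind described earlier in this section) and a hypothetical `important` flag defined by your business. The KPI shown, the share of important records that are matchable, is one possible choice, tracked per snapshot.

```python
from collections import Counter


def triage(record: dict) -> str:
    """Route a record using its precomputed matchability and importance."""
    if record["matchability"] != "unmatchable":
        return "match"
    return "enrich" if record["important"] else "review_for_archive"


def important_matchable_kpi(records: list[dict]) -> float:
    """Data Quality KPI: share of important records that are matchable."""
    important = [r for r in records if r["important"]]
    if not important:
        return 0.0
    matchable = sum(r["matchability"] != "unmatchable" for r in important)
    return matchable / len(important)


# Hypothetical snapshots before and after an enrichment effort.
before = [
    {"id": 1, "important": True, "matchability": "unmatchable"},
    {"id": 2, "important": True, "matchability": "high"},
    {"id": 3, "important": False, "matchability": "unmatchable"},
]
after = [dict(r) for r in before]
after[0]["matchability"] = "low"  # record 1 was enriched

print(Counter(triage(r) for r in before))
print(important_matchable_kpi(before), important_matchable_kpi(after))  # 0.5 1.0
```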

Summary

Tackling the complexities of duplicate records in Salesforce demands a nuanced understanding of your data. Effective duplicate management involves more than just merging records or employing tools; it requires strategic decisions about when to merge, when to create unified profiles, and when to enrich data to fill existing gaps.

Employing a thoughtful approach that includes assessing the matchability of records and considering the broader implications of data integration can significantly enhance your CRM’s functionality and data integrity. This holistic strategy ensures that every action taken contributes positively to the overall business objectives, enhancing user interactions, ensuring data compliance, and improving decision-making processes.

Enhance your understanding through data profiling, enrich your customer and partner knowledge with reliable data sources, and leverage these insights to guide your data governance decisions. Regularly back up and archive your data to maintain only what is useful and relevant, ensuring that what you deliver is always contextually relevant and compliant for your end users. Address unintentional duplicates in your system of record whenever possible, and unify intentional duplicates in Data Cloud to gain a complete understanding without taking on unnecessary risks.

The Author

Mehmet Orun

Salesforce Veteran and Data Management SME, working with Salesforce since 2005 as a Customer, Employee, Practice Lead, and Partner. Now GM and Data Strategist for PeerNova, an ISV partner focused on data reliability, as well as Data Matters Global Community Leader.
