
7 Common Data Design Mistakes in Data Cloud and How to Avoid Them

By Mehmet Orun

In the rapidly evolving landscape of data management, Salesforce Data Cloud stands out as a single technology platform to connect, harmonize, unify, and activate data. That said, integrating data from multiple sources remains a challenge. Each data source has its own naming conventions, schemas, data content, and levels of reliability. 

Without effective data designs, organizations cannot realize Data Cloud’s full value. In the best-case scenario, these implementations become significant drains on time and credits (i.e., dollars); in the worst case, companies end up making decisions with untrustworthy data.

In my experience as a data strategist, I’ve advised hundreds of customers on best practices for data design. Time and time again, I see Customer 360 implementations suffering from seven mistakes specific to data design. 

1. Not Using Deliberate Data Samples to Guide Data Model Designs

When starting any new Data Cloud project, be mindful of potential credit waste. You may be tempted to blindly land all of your data and start mapping, but this would be a mistake. Processing large volumes of data during the design phase will needlessly use credits and extend the time required to achieve results. Instead, I recommend working with not only a sample subset of records but, crucially, a deliberate sample. 

The foundation of any successful Data Cloud implementation lies in the accuracy and representativeness of the data sample used for design. Effective samples provide a proper representation of the characteristics, complexities, and business value proposition of the broader data set. Only a deliberate sample will provide correct, relevant insights for your analysis.

Do Not Rely on Random Data Samples

While random data samples might suffice for testing, they often fall short when transitioning to full-scale implementations. Random samples may overlook critical data patterns essential for informed data modeling, governance, mapping, transformation, cleansing, and identity resolution decisions.

Use Deliberate Data Samples 

Samples should capture a comprehensive snapshot of data patterns to guide decision-making. For initial proofs of concept, aim for a deep sample across a long historical period without going over one million records. I recommend snapshotting the current year plus 5-7 years of history (or 2-3 years of historical data for B2C) as a starting point. To narrow down larger data sets, filter further by geography.

Going wide with data sampling involves including a broad range of data across various categories. For example, include Opportunities from different sectors, such as software sales, services, and events, regardless of the specific focus of the Data Cloud project. If the goal is to build a predictive model for upsell Opportunities in software sales, going wide would mean incorporating data from all these sectors into the analysis. When credit costs are a concern, reduce the time horizon (the current year plus one) to shrink the data volume processed.
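To make the idea concrete, here is a minimal sketch of deliberate sampling in Python with pandas, run outside of Data Cloud itself. It assumes a hypothetical source extract with close_date, region, and sector columns: go deep by bounding the time horizon, narrow by geography if needed, and go wide by keeping every sector represented while staying under a record cap.

```python
import pandas as pd

def deliberate_sample(df: pd.DataFrame, years_back: int = 5,
                      regions: list[str] | None = None,
                      cap: int = 1_000_000) -> pd.DataFrame:
    """Build a deliberate (not random) sample: bounded time horizon,
    optional geography filter, and coverage across every sector."""
    # Hypothetical columns: close_date (datetime), region, sector.
    cutoff = pd.Timestamp.today() - pd.DateOffset(years=years_back)
    sample = df[df["close_date"] >= cutoff]              # go deep: bounded history only
    if regions:
        sample = sample[sample["region"].isin(regions)]  # narrow large sets by geography
    # Go wide: keep every sector represented, newest records first, under the cap.
    per_sector = max(cap // max(sample["sector"].nunique(), 1), 1)
    return (sample.sort_values("close_date", ascending=False)
                  .groupby("sector", group_keys=False)
                  .head(per_sector)
                  .head(cap))
```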

2. Insufficient Source Data Analysis

The objective of Data Cloud is to unify your data. Overlooking data analysis can lead to significant discrepancies in the unified data model. This undermines the data’s reliability and usability, making it harder to build solutions on the data model objects (DMOs).

Realizing the full power of Data Cloud requires accurately mapping fields, and their data values, into a single, comprehensive view. Merely mapping similar objects without considering their different fields and field values is a recipe for failure. We must reconcile the varying schemas, data content, and levels of data reliability present in different sources.

Do Not Make Assumptions About Data Content

Yes, with Data Bundles, mappings and relationships can be set up automatically for data streams. However, just because multiple sources share the same field names does not mean they have identical configurations. This applies to standard fields in Salesforce too. The standard Account “Billing Country” field can exhibit significant variations in values across sources, even when considering data solely from Salesforce orgs.

Conversely, do not add fields from a data lake object (DLO) to the DMO data model just because you have not seen the name before. Fields may have different names but represent the same data. This can even occur within the same data source.

For example, one org may use Account_Segment__c = Enterprise, while another uses Target_Market__c = EBU. Are these fields describing the same information? Should we consolidate them into a single field in Data Cloud?

Do Review Data Content to Inform Data Design

Use data analysis to accurately map and integrate semantically similar data. Native data profiling can help you identify overlapping field values across DLOs even when field names are different.

Field value distribution shows where different versions of the “same” value occur within the data set. For example, two sources may spell out the full country name while another two use the two-digit ISO code. I recommend mapping all four sources to one target field using a standardized format.
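The underlying check is straightforward to sketch. The snippet below is a Python/pandas illustration rather than Data Cloud’s native profiling, with hypothetical source extracts; it simply tallies value frequencies per source so that spelled-out names and ISO codes stand out side by side.

```python
import pandas as pd

# Hypothetical example: the same logical field profiled across four source DLOs.
sources = {
    "org_a": pd.Series(["United Kingdom", "Germany", "United Kingdom"]),
    "org_b": pd.Series(["United Kingdom", "France"]),
    "org_c": pd.Series(["GB", "DE", "GB"]),
    "org_d": pd.Series(["GB", "FR"]),
}

# Value frequency per source; mismatched formats (full name vs. ISO code) become obvious.
distribution = pd.DataFrame(
    {name: values.value_counts() for name, values in sources.items()}
).fillna(0).astype(int)
print(distribution)
```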


Also, look for potentially overlapping content in your data. First, identify when multiple fields share common values. Then compare the values across these fields to determine if they represent the same concept.

For example, Industry__c, Customer_Type__c, and Segment__c might all share the value “Government”. I recommend a deeper analysis to decide the best way to map these fields.
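A quick way to surface such candidates, again sketched in Python with a hypothetical DLO extract rather than any Data Cloud API, is to compare the value sets of each pair of category-style fields and flag the pairs that overlap.

```python
import pandas as pd
from itertools import combinations

# Hypothetical DLO extract with three category-style fields.
dlo = pd.DataFrame({
    "Industry__c":      ["Government", "Software", "Healthcare"],
    "Customer_Type__c": ["Government", "Commercial", "Commercial"],
    "Segment__c":       ["Government", "Enterprise", "SMB"],
})

# Flag field pairs that share values; shared values are candidates for a deeper
# look at whether the fields actually describe the same concept.
for left, right in combinations(dlo.columns, 2):
    shared = set(dlo[left].dropna()) & set(dlo[right].dropna())
    if shared:
        print(f"{left} and {right} share values: {sorted(shared)}")
```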

Understanding which fields to unify and which to keep distinct simplifies your data modeling task significantly.

3. Oversimplifying Data Modeling Decisions

Data modeling goes beyond defining what records to capture and how to map them. Success depends on granular decisions about the data content itself. To truly unify data, you need to spend time rationalizing discrepancies across your data sources.

How will you standardize field names and types? How do fields relate to each other? You may also want to define formula fields as part of your data model. 

Do Not Skip Over Data Granularity

When assessing your data, it’s common to find different levels of data granularity in different sources. For example, the United Kingdom is made up of England, Scotland, Wales, and Northern Ireland. In one source you may only be capturing “United Kingdom,” while another source may capture the four constituent countries. Relying on Data Cloud’s automatic field mapping will result in five values, which is simply wrong.

Do Determine the Right Level of Data Granularity to Persist in the DMO

When dealing with fields that categorize attributes, you must decide whether to maintain granularity or if values can be aggregated. Otherwise, you might be unable to meet certain use cases your stakeholders depend on. These decisions can also affect the bidirectional data integration from Data Cloud back to the original sources. You can always map Wales to the United Kingdom, but the reverse is not true. Not all UK citizens are Welsh. 
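One way to handle this, shown here as an illustrative Python sketch with hypothetical field names rather than a Data Cloud transform, is to persist the most granular value you receive and derive the aggregate from it, since the rollup only works in one direction.

```python
# Illustrative only: keep the granular value and derive the aggregate,
# rather than overwriting one with the other. Field names are hypothetical.
CONSTITUENT_TO_COUNTRY = {
    "England": "United Kingdom",
    "Scotland": "United Kingdom",
    "Wales": "United Kingdom",
    "Northern Ireland": "United Kingdom",
}

def harmonize_country(raw_value: str) -> dict:
    """Roll granular values up to the aggregate while preserving both levels."""
    return {
        "country_detail": raw_value if raw_value in CONSTITUENT_TO_COUNTRY else None,
        "country": CONSTITUENT_TO_COUNTRY.get(raw_value, raw_value),
    }

print(harmonize_country("Wales"))           # {'country_detail': 'Wales', 'country': 'United Kingdom'}
print(harmonize_country("United Kingdom"))  # {'country_detail': None, 'country': 'United Kingdom'}
```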

4. Skipping Data Governance in Data Standardization

Data governance is the comprehensive process of managing data across the enterprise. It ensures consistent policies and guidelines for integrating and utilizing information across various data sources, business functions, and use cases.

When implementing Data Cloud, you will need to resolve naming conventions and make other data standardization decisions. This is the perfect time to engage the data governance processes. 

Do Not Make Arbitrary Decisions on Standardization

Randomly selecting a field value for standardization can lead to user confusion and hinder long-term value. Avoid gambling with your data governance. 

Let Value Frequency Guide Design Decisions

Data consistency is essential to delivering reliable data, and being able to explain what drove standardization decisions is critical for adoption.

I recommend using data profiling insights to quantify actual field utilization. Address data granularity concerns first, then standardize on the most frequently used values within the data content. This approach simplifies decision-making and enables you to deliver business value faster.
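As a rough illustration of letting frequency guide the standard, the Python sketch below (hypothetical values, not a Data Cloud feature) pools a field’s values across sources and proposes the most common representation as the canonical one, which also makes the decision easy to explain.

```python
import pandas as pd

# Hypothetical profile of one field's values pooled across sources.
values = pd.Series([
    "Enterprise", "Enterprise", "EBU", "Enterprise Business Unit", "Enterprise", "EBU",
])

# The most frequent representation becomes the proposed standard, and the
# rationale is transparent ("it already covers half the rows").
frequency = values.value_counts(normalize=True)
canonical = frequency.idxmax()
print(frequency.round(2).to_dict())   # e.g. {'Enterprise': 0.5, 'EBU': 0.33, ...}
print(f"Proposed standard value: {canonical}")
```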

5. Hard Coding Data Mappings 

Field mapping is performed in Data Mapper, but that’s only part of the equation. Data mapping involves matching data elements (field values) from multiple DLOs to corresponding elements in the DMO, and it encompasses the identification, transformation, and reconciliation of those elements to ensure consistency and compatibility across source data content.

With Data Cloud, the end goal of data unification is not just standardized data in the DMOs. You also need to share the standardized data back to the source environments to support operational use cases.

Do Not Hard Code Source-to-Target Data Mappings

It’s tempting to add your data mapping logic directly in your code, especially if you only have a few values to map, but this is a bad idea. Hard coding data mappings in this way means that every time you need to change a mapping, you’ll have to update and redeploy the code.

Keep Data Mapping in a Crosswalk Table

Your data mappings should be easily accessible and searchable by anyone who needs to reference, use, or update them. I recommend using a crosswalk table to store the mappings, which the code can reference for mapping logic. When possible, build your mapping tables to support bi-directional mappings.
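As a simple illustration of the pattern (generic Python with hypothetical table and column names, not a Data Cloud feature), the mapping logic below reads from a crosswalk table that could just as easily live in a CSV file or a custom object, and it supports lookups in both directions.

```python
import pandas as pd

# Hypothetical crosswalk table; in practice this would live in a table that
# business users can review and update, not in code.
crosswalk = pd.DataFrame([
    {"source": "org_a", "source_value": "Enterprise", "target_value": "Enterprise"},
    {"source": "org_b", "source_value": "EBU",        "target_value": "Enterprise"},
    {"source": "org_b", "source_value": "SMB",        "target_value": "Small Business"},
])

def to_target(source: str, value: str) -> str:
    """Forward mapping: source value -> standardized target value."""
    match = crosswalk[(crosswalk["source"] == source) & (crosswalk["source_value"] == value)]
    return match["target_value"].iat[0] if not match.empty else value

def to_source(source: str, target: str) -> list[str]:
    """Reverse mapping: standardized value -> the source system's own value(s)."""
    match = crosswalk[(crosswalk["source"] == source) & (crosswalk["target_value"] == target)]
    return match["source_value"].tolist()

print(to_target("org_b", "EBU"))         # Enterprise
print(to_source("org_b", "Enterprise"))  # ['EBU']
```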

6. Not Identifying and Addressing Obvious Data Quality Issues

Nothing undermines credibility faster than a demo or training session with blatantly bad data. You don’t want to spend time and credits only to discover that your best customer is John Doe, with the email address email@email.com and the phone number 123-456-7890.

Do Not Assume Values Are Valid Because They Are Frequent

A field value that shows up a lot in your data set is not necessarily valid. In fact, it may signal just the opposite. Know what else shows up in high frequency? Null values and commonly used fake data values.

Use Data Cloud’s Transformation Capabilities to Remove Proven Bad Data 

Before you turn on identity resolution rules, profile all source data for any field you may use in matching. Examine identifier and contact point fields to find commonly occurring fake data in your data sources. Pay special attention to field values that show up disproportionately. Look for situations where users may have compromised data quality by populating required fields with erroneous data.

As a best practice, I recommend using transformations to cleanse and normalize data. The resulting second-level DLO will have standardized data values free from fake email addresses, phone numbers, tax IDs, and other bad data.
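The cleansing itself would run in Data Cloud transforms; the Python sketch below only illustrates the kind of rule to apply, using a hypothetical deny list of values that profiling showed to be implausibly frequent.

```python
import re

# Hypothetical deny lists built from profiling: values that appear far too often
# to be genuine contact points.
FAKE_EMAILS = {"email@email.com", "test@test.com", "noemail@none.com"}
FAKE_PHONES = {"1234567890", "0000000000", "9999999999"}

def clean_email(email: str | None) -> str | None:
    """Null out known fake or malformed email addresses before identity resolution."""
    if not email or email.lower() in FAKE_EMAILS:
        return None
    return email if re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", email) else None

def clean_phone(phone: str | None) -> str | None:
    """Normalize to digits and drop obviously fake phone numbers."""
    digits = re.sub(r"\D", "", phone or "")
    return None if not digits or digits in FAKE_PHONES else digits

print(clean_email("email@email.com"))  # None
print(clean_phone("123-456-7890"))     # None
```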

7. Assuming Data Never Changes

If we polled Data Cloud consultants, none of them would claim data is static. So why do so many implementations ignore the critical step of implementing data trend monitoring to detect changes? Investing in native data monitoring with flow-based alerts, even at a small level, can yield significant dividends in terms of design resilience. Your future self will thank you for identifying unexpected changes that need to be addressed.

Do Not Make Design Decisions as if Data is Static

Data changes – ignoring this fact will only cause headaches in the future. 

Use Monitoring to Respond to Data Changes

Let’s go back to the data standardization example. You had five source values and mapped and transformed them to the data standard without hard coding. What if that same data source now includes a sixth value? Can you detect it and alert someone that a new mapping is needed? Resilient designs are backed by monitoring, so when unexpected data changes occur, you can respond to them quickly. 
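Conceptually, the check is a set comparison between the values seen in the latest profiling run and the values your crosswalk already covers. The Python sketch below uses hypothetical values; in a real implementation, the alerting step would be handled by whatever notification mechanism you use, such as Flow.

```python
# Hypothetical: compare the values seen in the latest profiling run against the
# crosswalk table, and surface anything that has no mapping yet.
known_values = {"Enterprise", "EBU", "SMB", "Mid-Market", "Startup"}   # values mapped today
latest_profile = {"Enterprise", "EBU", "SMB", "Mid-Market", "Startup", "Public Sector"}

unmapped = latest_profile - known_values
if unmapped:
    # In practice this would raise an alert (e.g. via Flow or a notification),
    # prompting a data steward to extend the crosswalk table.
    print(f"New unmapped value(s) detected: {sorted(unmapped)}")
```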

If you are using a native rather than an external data profiling solution, you already have an advantage. Your profiling statistics are stored within Salesforce, where you can use Flow to take appropriate action.

One more bonus of data trend monitoring: you can track the rate of growth and change of the underlying data. This is the most important element in forecasting credit needs for long-term planning.

Steer Clear of Data Design Pitfalls With Data Profiling

You can easily avoid all of these data design mistakes with a better understanding of your data sources. By knowing where similarities and discrepancies exist, you can make better data design decisions and improve the resilience of your Data Cloud implementation.

A native data profiling solution, like Cuneiform for Data Cloud, ensures secure and efficient source data analysis and ongoing monitoring. This eliminates the need for time-consuming, manual queries and spreadsheets that become increasingly difficult to maintain as more data sources are added.

The Author

Mehmet Orun

Salesforce Veteran and Data Management SME, working with Salesforce since 2005 as a Customer, Employee, Practice Lead, and Partner. Now GM and Data Strategist for PeerNova, an ISV partner focused on data reliability, as well as Data Matters Global Community Leader.
