Data Cloud is one of the most elusive Salesforce products, marketed with its ‘magical’ capabilities. Recently, Salesforce announced they would be giving Data Cloud licenses for free (with limitations), so many organizations have been getting hands-on with this platform capability.
With any major part of the Salesforce platform you’re attempting to learn, there’s inevitably a learning curve. Part of getting started (and feeling comfortable) with the technology is understanding the terminology used and what it means in practice.
In this guide, we explain key terminology so you can understand data modeling concepts, and how data goes from one stage to another in Data Cloud, leading up to activating these perfected segments.
1. Primary Key vs. Foreign Key
These two data source concepts are fundamental to working with Data Cloud, as you will be streaming data sets from various sources that are not necessarily structured the same way. If you have worked with Marketing Cloud data extensions, you are likely already familiar with these.
- Primary key: The field that uniquely identifies each row in a data set. For example, the Salesforce record ID.
- Foreign key: Used to connect data between two distinct tables or sources. For example, OrderID could appear on a customer record and also in an order details dataset from your eCommerce site; the OrderID connects the two tables together.
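As a plain illustration (not Data Cloud code), the sketch below shows how a key shared between two datasets relates them. The record shapes and field names are hypothetical:

```python
# Illustrative only: relating two datasets via a shared key, the way a
# foreign key relates tables. All field names here are hypothetical.

customers = [
    {"CustomerID": "C-001", "Name": "Rebecca"},  # CustomerID is the primary key
    {"CustomerID": "C-002", "Name": "Sam"},
]

orders = [
    # CustomerID in this dataset is a foreign key pointing at a customer row
    {"OrderID": "O-100", "CustomerID": "C-001", "Total": 40.0},
    {"OrderID": "O-101", "CustomerID": "C-001", "Total": 15.5},
]

def orders_for(customer_id, orders):
    """Return every order whose foreign key matches the customer's primary key."""
    return [o for o in orders if o["CustomerID"] == customer_id]
```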
2. Data Ingestion
Data Cloud is hungry for data, so ingestion is how we feed it. There are various ways to pump data into Data Cloud, including:
- SDKs (software development kits): These are kits that developers can use to get integrations set up faster. Examples include the Interactions SDK and Engagement Mobile SDK, developed by Salesforce to capture events from popular sources.
- Connectors: Also developed by Salesforce, these are pre-built integrations that connect other Salesforce products to Data Cloud, and are more ‘plug-in-and-play’ in nature.
- Ingestion API: The option for developers to build integrations from scratch, from data sources that aren’t covered by the SDKs or connectors.
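To give a feel for the Ingestion API route, here is a rough sketch of assembling a request that pushes one record into Data Cloud. The endpoint path, source name, object name, and token handling are all placeholders, not a verified contract; consult the Ingestion API documentation for the real URL shape and authentication flow:

```python
import json
import urllib.request

# Sketch only: the endpoint, source and object names below are placeholders,
# NOT the documented Ingestion API contract.
TENANT_ENDPOINT = "https://example.my.salesforce.com"
SOURCE, OBJECT_NAME = "ecommerce", "orders"

def build_ingest_request(record, token):
    """Assemble (but do not send) an ingestion request for one record."""
    url = f"{TENANT_ENDPOINT}/ingest/{SOURCE}/{OBJECT_NAME}"
    body = json.dumps({"data": [record]}).encode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Content-Type", "application/json")
    return req
```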
3. Data Streams
Datasets (from various data sources) enter Data Cloud as data streams. They are called 'streams' because data is continually pumped into Data Cloud at the frequency you define.
The frequency will depend on a) how fresh you need the data to be, and b) the capacity of the various APIs at work. For instance, you may need some data to be immediately available in Data Cloud (to send an instant, highly personalized email from Marketing Cloud), whereas other data is fine to be refreshed on a nightly basis.
Data streams fall into two broad categories:
- Real-time data streams: Data is updated immediately.
- Batched data streams: Data is batched, to be updated on a frequency, such as hourly, daily, etc.
4. Data Source Object, Data Lake Object, and Data Model Object
Picture the Salesforce data model, where objects are related to one another. In Data Cloud, three types of object work together to capture and work with ingested data, in the order listed below.
- Data Source object: The first stop for ingested data, where it sits waiting for further instructions in its original, raw format.
- Data Lake object: Where data can be mapped to other data sources, and transformations applied (we’ll come back to both mapping and transformations later).
- Data Model object: This object is the closest to how a Salesforce object functions, with standard objects related to one another and the ability to create custom ones, too. Unlike Salesforce objects, data is not stored in the object; rather, data is referenced from the data lake object when a query is run.
5. Data Mapping (Mapping Canvas)
You've likely seen the 'magical' mapping canvas in Salesforce's demos at conferences – magical because of the animation of data being passed from one point to another.
The mapping canvas gives you a visual interface to see which data points are mapped together. When you have datasets from different sources, the same data point could have a different field name from one system to another.
In short, here you’re mapping data points from a data source object (the raw, ingested data) to a data lake object to make it usable for manipulation (i.e. transformations). You will be specifying the primary key, and match/reconciliation rules as part of this process.
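Conceptually, the mapping step boils down to a field-name translation like the sketch below (plain Python, with hypothetical source and model field names – the canvas does this declaratively, not in code):

```python
# Sketch: the kind of field mapping the mapping canvas captures, expressed
# as a plain dictionary. Source and model field names are hypothetical.
FIELD_MAP = {
    "email_addr": "Email",      # eCommerce system's name -> data model name
    "given_name": "FirstName",
    "surname":    "LastName",
}

def map_record(raw):
    """Rename raw source fields to their model names, dropping unmapped ones."""
    return {FIELD_MAP[k]: v for k, v in raw.items() if k in FIELD_MAP}
```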
6. Identity Resolution
Consider the variation that one individual customer could have when engaging on all the platforms in your organization. If my name is Rebecca, I could refer to myself as 'Rebecca', 'Becky', or 'Becca' depending on the situation or my mood; plus I could have moved house, and so have different mailing addresses – yet I would be the same person. Traditional deduplication rules may not pick this up; resolving it, however, is one of Data Cloud's strengths.
It’s important to note that Data Cloud doesn’t merge records – the records will still exist in the source system as they were when they were ingested. What Data Cloud is doing is compiling records together to render a ‘golden record’, which can be leveraged in the activation stage (coming later).
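The compile-don't-merge idea can be sketched as follows (illustrative Python, not Data Cloud's actual resolution logic; the first-non-empty-value rule stands in for the reconciliation rules covered later):

```python
# Sketch: compiling (not merging) source records into one profile view.
# Source records stay untouched; the unified view only references them.
def unify(records):
    """Build a 'golden record' by taking the first non-empty value per field,
    keeping pointers back to every contributing source record."""
    golden = {"sources": [r["id"] for r in records]}
    for record in records:
        for field, value in record.items():
            if field != "id" and value and field not in golden:
                golden[field] = value
    return golden
```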
7. Match Rules
You may be familiar with Salesforce duplicate and matching rules; Data Cloud's rules take this further. There are two broad categories to how Data Cloud performs matching:
- Deterministic: There is no doubt that the data points belong to the same individual, even if there is a difference in capitalization (i.e. not case-sensitive). This is comparable to exact matches in Salesforce duplicate rules.
- Probabilistic: This is comparable to fuzzy matches in Salesforce duplicate rules. This caters to the nuances in how people represent themselves in their data, catering to abbreviations, nicknames, etc. These can be set to a precision level (high, medium, low) depending on how much freedom you want the match rules to have.
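The distinction can be sketched in a few lines of Python. This is an analogy, not Data Cloud's matching engine: the nickname table is a made-up stand-in, and the similarity threshold plays the role of the high/medium/low precision setting:

```python
from difflib import SequenceMatcher

# Tiny illustrative nickname table -- a stand-in for real fuzzy-match logic.
NICKNAMES = {"becky": "rebecca", "becca": "rebecca"}

def deterministic_match(a, b):
    """Exact match ignoring case -- comparable to an exact duplicate rule."""
    return a.casefold() == b.casefold()

def probabilistic_match(a, b, threshold=0.8):
    """Fuzzy match: normalize known nicknames, then compare similarity.
    A higher threshold corresponds to a stricter precision level."""
    a = NICKNAMES.get(a.casefold(), a.casefold())
    b = NICKNAMES.get(b.casefold(), b.casefold())
    return SequenceMatcher(None, a, b).ratio() >= threshold
```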
8. Reconciliation Rules
Again, comparing this concept to Salesforce deduplication: if you've ever used a third-party tool, you will know that you can apply certain 'rules of thumb' to help deduplicate en masse. Reconciliation rules determine the value that should be used for a field when there are multiple candidate values. They can use:
- Last updated
- Most frequent
- Source priority (next point)
Note: Reconciliation rules can only be used on certain types of fields (text, number, alphanumeric) – in other words, they can’t be used for phone or email fields.
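The first two rules of thumb can be sketched like this (plain Python, with hypothetical candidate values; each candidate pairs a field value with a last-updated timestamp):

```python
from collections import Counter

# Sketch: two reconciliation 'rules of thumb' applied to candidate values
# for one field. Each candidate is (value, last_updated_iso_timestamp).
def last_updated(candidates):
    """Pick the value with the most recent timestamp."""
    return max(candidates, key=lambda c: c[1])[0]

def most_frequent(candidates):
    """Pick the value that appears most often across sources."""
    counts = Counter(value for value, _ in candidates)
    return counts.most_common(1)[0][0]
```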
9. Source Priority
As you just saw, source priority is one of the ‘rules of thumb’ that can be applied to a data point (i.e. field) when working with reconciliation rules. Out of all the data sources feeding into the data lake object, you can rank some sources above others, essentially telling Data Cloud that you trust one source more over another.
There are likely data sources in your tech stack that are prone to less accurate (even dummy) data, so you can push these to the bottom of the pile.
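Source priority reduces to a simple ranking, as in this sketch (the source names are hypothetical; the most trusted source sits at the top of the list, dummy-data-prone sources at the bottom):

```python
# Sketch: source priority as a ranked list -- lower index means more trusted.
# Source names are hypothetical.
SOURCE_PRIORITY = ["crm", "ecommerce", "web_forms"]

def by_source_priority(candidates):
    """Pick the value from the highest-ranked source; unknown sources lose.
    Each candidate is (value, source_name)."""
    def rank(candidate):
        _, source = candidate
        if source in SOURCE_PRIORITY:
            return SOURCE_PRIORITY.index(source)
        return len(SOURCE_PRIORITY)  # unranked sources go to the bottom
    return min(candidates, key=rank)[0]
```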
10. Unified Profile
The result of identity resolution is a unified profile. This is the 'golden record' that people have been striving for since the inception of CRM. Unified profiles are not static; they adapt as the streaming source data changes.
11. Calculated Insights
These allow you to create a new data point, to give further insight into data patterns. Like a Salesforce roll-up field, you can combine multiple data points to gain a new data point that can be used when activating segments.
Calculated insights can be built using the builder, or with SQL.
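As a rough analogy (plain Python rather than Data Cloud's SQL, with hypothetical field names), a calculated insight rolls raw rows up into a new per-customer data point:

```python
from collections import defaultdict

# Sketch: a roll-up in the spirit of a calculated insight -- total spend per
# customer, derived from raw order rows. Field names are hypothetical.
def total_spend(orders):
    totals = defaultdict(float)
    for order in orders:
        totals[order["CustomerID"]] += order["Total"]
    return dict(totals)
```

The resulting per-customer total is the kind of new data point you could then use when building segments.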
12. Streaming Insights
Remember the difference between real-time and batched data streams? Insights work in the same way. Streaming insights should be reserved for smaller datasets where you need faster (if not real-time) insights.
13. Activation Target
Activation is what we've been leading up to. There would be no point in going through the ingesting, modeling, mapping, reconciliation, and transformation only to have the perfected segment sit in Data Cloud.
Activation is the process of sending these 'golden records' to a destination where they can be used for highly personalized interactions. To avoid verging into jargon, here are some examples of activation:
- Marketing Cloud: To kick-start a Journey Builder automated campaign.
- Advertising platforms: To have more relevant adverts displayed to the individual, in turn saving your advertising budget by not 'throwing spaghetti at the wall to see what sticks' in terms of ad creative and messaging.
- Other cloud repositories: Such as Amazon S3, to have segments usable from that repository.
- Salesforce Sales/Service Cloud: (Note: This is a data target, which differs slightly from an activation target). To give reps and agents a granular event stream of what their prospects/customers are doing, in order to inform conversations.
14. Data Action
Similar to activation targets, data actions "send an alert or an event to a target based on streaming insights and engagement data to trigger an automation or data integration" (source). Available targets are Salesforce platform events, webhooks, and Marketing Cloud.
As with any major part of the Salesforce platform you’re attempting to master, there’s inevitably a learning curve. We’ve taken you through Data Cloud’s key terminology so you can understand data modeling concepts, and how data goes from one stage to another in Data Cloud, leading up to activating these perfected segments.
Thanks to Eliot Harper for reviewing this guide.