
Salesforce Data Transforms: What Is This Key Component of Data Cloud?

By Eunice Wong

Data Transforms is a relatively new Data Cloud feature, released in the summer of 2023. Yet the Data Cloud-focused content on Trailhead says little about what it does. So what exactly are Data Transforms, and how can you use them to enhance your Data Cloud solution?

As Salesforce and any data user will emphasize, the data used to feed a model must be clean, consistent, and well-structured for AI to produce the best and most accurate outcome. Data Cloud is no different: well-prepared, organized data is crucial to its key functions of profile unification, reconciliation, segmentation, and activation. While this may seem like a simple, generally understood principle of good data practice, it is often easier said than done…

Unprepared data is a barrier to many organizations that may cause them to “miss out” on the AI revolution. Fixing this may be a costly endeavor requiring additional data engineers and software platforms, but what if your AI platform allows you to prepare your data, too? Now you can, with the Data Transforms functionality on Data Cloud.

What Are Data Transforms?

Data Transforms promise near real-time data transformation within Data Cloud itself, giving you more ways to shape your data on the platform and making room for more advanced use cases and solutions. There are two types of data transforms: streaming and batch. They differ both in functionality and in how transformation processing is scheduled.

Batch transforms offer more functionality and allow for more complex calculations, such as aggregating data, joining your data lake objects, and filtering them. A batch data transform can output either a data lake object (DLO) or a data model object (DMO). When deciding between the two, consider where the output should sit with regard to data spaces and what the output will be used for.

If you select a DLO, you can map the data and use it as part of the unification and reconciliation process. This option is not available if you choose a DMO, because the output DMO is technically not mapped and therefore cannot go through the unification process. However, you can use this DMO output to create a custom DMO specifically to fuel more complex segmentation use cases.

On the other hand, streaming transforms only offer basic SQL functionality, where transformations and calculations are limited to a single data object (no joins allowed, unfortunately). In essence, a streaming transform reads a record, reshapes it, and writes the result into a target DLO, which must be a different object from the source. Streaming transforms produce DLOs that are not available for segmentation and activation, and they run continuously as a streaming process, picking up new or changed data.
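
To make the record-by-record behavior concrete, here is a minimal Python sketch of the idea. It is only an illustration of the concept, not Data Cloud's SQL syntax or API: the field names, the `reshape` function, and the `source_dlo`/`target_dlo` lists are all hypothetical stand-ins.

```python
from typing import Iterable, Iterator

# Hypothetical field names for illustration; not taken from a real Data Cloud schema.
def reshape(record: dict) -> dict:
    """Reshape one source record into the target layout, using only that record."""
    return {
        "contact_id": record["Id"],
        "email": record["Email"].strip().lower(),
        "full_name": f'{record["FirstName"]} {record["LastName"]}'.strip(),
    }

def streaming_transform(source: Iterable[dict]) -> Iterator[dict]:
    """Handle records one at a time as they arrive: no joins, no aggregates."""
    for record in source:
        yield reshape(record)

# Stand-in for a source DLO (hypothetical sample data).
source_dlo = [
    {"Id": "001", "Email": " Ada@Example.com ", "FirstName": "Ada", "LastName": "Lovelace"},
    {"Id": "002", "Email": "GRACE@example.com", "FirstName": "Grace", "LastName": "Hopper"},
]

# The output goes to a separate target object, never back into the source.
target_dlo = list(streaming_transform(source_dlo))
```

Each output row depends on exactly one input row, which is why this style of transform can run continuously but cannot join or aggregate.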

Batch transforms are a repeatable series of operations that you can run manually or on a schedule as data updates. They also let you transform your data for wider use, meaning you can produce new insights and fields for identity resolution, segmentation, and calculated insights. Batch data transforms allow you to aggregate, filter, and join data.
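
As a rough illustration of what join, filter, and aggregate mean in a batch transform, the following Python sketch combines two in-memory tables. Everything here is assumed for the example: the object names, fields, and the `batch_transform` function are hypothetical, and a real batch transform is configured in Data Cloud rather than written in Python.

```python
from collections import defaultdict

# Hypothetical data lake objects (DLOs); all names and fields are illustrative.
orders_dlo = [
    {"order_id": "o1", "customer_key": "c1", "amount": 40.0, "status": "completed"},
    {"order_id": "o2", "customer_key": "c1", "amount": 60.0, "status": "completed"},
    {"order_id": "o3", "customer_key": "c2", "amount": 25.0, "status": "cancelled"},
    {"order_id": "o4", "customer_key": "c2", "amount": 15.0, "status": "completed"},
]
customers_dlo = [
    {"customer_key": "c1", "individual_id": "ind-100"},
    {"customer_key": "c2", "individual_id": "ind-200"},
]

def batch_transform(orders, customers):
    # Join: look up each order's individual through the shared customer key.
    individual_by_key = {c["customer_key"]: c["individual_id"] for c in customers}
    totals = defaultdict(float)
    for order in orders:
        # Filter: keep completed orders only.
        if order["status"] != "completed":
            continue
        individual_id = individual_by_key.get(order["customer_key"])
        if individual_id is None:
            continue  # inner-join behavior: drop orders with no matching customer
        # Aggregate: total spend per individual.
        totals[individual_id] += order["amount"]
    return [{"individual_id": k, "total_spend": v} for k, v in sorted(totals.items())]

output_dlo = batch_transform(orders_dlo, customers_dlo)
```

Unlike the streaming case, the result for one individual depends on many input rows across two objects, which is why this kind of work is run as a batch.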

Data Transformation Use Cases 

  • Normalize your data by transposing it to enable ingestion into the Data Cloud data model. Data Cloud requires data to be normalized to fit its model, but not all source systems provide normalized data (e.g., Salesforce Marketing Cloud Engagement delivers its data in a flat-file format, so you may need to transpose it before you can use it).
  • Use batch data transforms to pre-process your data, curating new DLOs or manipulating existing ones to create suitable source data where none existed before (e.g., join different DLOs to find the proper Individual ID to unify with the rest of your data).
  • Create calculated insights and dimensions for your data modeling. This is an incredibly powerful tool as it enables you to generate new calculated insights and dimensions that you may also use in the identity resolution process, which is not otherwise possible with the calculated insights functionality. 
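
The transposing use case in the first bullet can be sketched in a few lines of Python. This is a conceptual illustration only: the flat export, its column names, and the `transpose` helper are hypothetical and do not reflect an actual Marketing Cloud Engagement file or Data Cloud function.

```python
# Hypothetical flat (wide) export: one row per subscriber, one column per contact channel.
flat_rows = [
    {"subscriber_key": "s1", "email": "a@example.com", "mobile": "+15550100"},
    {"subscriber_key": "s2", "email": "b@example.com", "mobile": ""},
]

def transpose(rows, key_field, value_fields):
    """Turn each wide row into one narrow row per non-empty value column."""
    out = []
    for row in rows:
        for field in value_fields:
            value = row.get(field)
            if value:  # skip missing or empty values
                out.append({
                    key_field: row[key_field],
                    "contact_type": field,
                    "contact_value": value,
                })
    return out

normalized_rows = transpose(flat_rows, "subscriber_key", ["email", "mobile"])
```

The wide input has one column per channel, while the normalized output has one row per subscriber and channel, which is the shape a normalized data model generally expects.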

Limitations

  • Having personally used both data transform options, my main observation is that the overall functionality is relatively limited. This is understandable: Data Cloud was not meant to be a data cleaning engine and instead expects data to be clean and structured before ingestion. Nevertheless, where necessary, there are ways to get creative with the limited functionality that is available.
  • At the time of writing, orgs on the Data Cloud SKU are charged based on their processing usage, so every run of a data transform affects the cost of their licensing. These costs can add up quite quickly.
  • At the time of writing, the limit is 100 batch data transforms per org and 25 streaming transforms. For some, this is a fair number of transforms, but for others, it calls for strategic usage. (If you hit the limit on batch data transforms, you can technically combine some of them, since each can produce more than one output, but creativity will be necessary.)

Summary

Despite their limitations, Data Transforms are a step in the right direction for the product, giving users the option to process and prepare their data in the platform itself, since clean data is not always readily available within organizations.

They give more organizations the opportunity to deepen their understanding of customer profiles across the business, and to use that information to create business impact and take part in the AI revolution.

The Author

Eunice Wong

Eunice is a Salesforce Consultant specialising in Marketing Cloud Engagement, Marketing Cloud Intelligence, and Data Cloud.
