A Dataflow is a file containing the instructions that create Datasets, which you can then use for Einstein Analytics data visualisations. The real power of Dataflows comes from applying transformations: the process of converting data from one format or structure into another, a fundamental part of most data integration and data management tasks.
Previously, I shared the power of Einstein Analytics with templates and by embedding the data visualisations within your Salesforce Lightning pages. Today we take an in-depth look at the machine that is Einstein Analytics, focusing on Dataflow transformations that you can do there.
What is a Dataflow?
A Dataflow is a file that contains instructions to create Datasets, which you can use for Einstein Analytics data visualisations. A Dataset is a collection of data; think of it as a table or set of values, where every column represents a particular variable and every row corresponds to a single record of the Dataset in question.
Transformation can be defined as the process of converting data from one format or structure into another. You can find the list of transformations in the top pane of a Dataflow:
This is exactly what I intend to explain today. Let’s cover the different transformations there are available for you and get to know why we would use them.
Transformations for Analytics Dataflows
Dataset Builder
This is the first one that you come across in the top pane of any Dataflow. Think of it as a wizard: its purpose is to build objects and relationships easily, letting you pick the objects, fields and relationships that you intend to use for your Dataset.
For example, here we are fetching Opportunity products with a couple of fields and their related Opportunities:
sfdcDigest and digest
These are the two transformations that fetch data. If you have connected to your local Salesforce org, you will have used the first one to fetch data from a Salesforce object.
In a dataflow, a digest transformation extracts data that has been synced over a connection. Use digest to extract data synced from an external Salesforce org, or data synced through an external connection. Use the sfdcDigest transformation to extract data from your local Salesforce org.
For example, I could use this one in our previous example to find the account information, related to the opportunities.
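As a rough sketch, an sfdcDigest node inside the dataflow JSON definition looks like this (the node name and field list here are illustrative, not taken from the screenshots):

```json
"Extract_Opportunities": {
  "action": "sfdcDigest",
  "parameters": {
    "object": "Opportunity",
    "fields": [
      { "name": "Id" },
      { "name": "Name" },
      { "name": "Amount" },
      { "name": "AccountId" }
    ]
  }
}
```

A digest node follows the same shape, but its action is "digest" and its parameters also name the connection to read the synced object from.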
Edgemart
This transformation with a cool name loads existing, registered Datasets that were created outside of the current dataflow, perhaps by a dataflow you already have in place.
For example, we could use an Edgemart to bring in data from an existing Dataset, say an Account balance from an external system that is already connected and registered as a Dataset in Analytics.
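In the dataflow JSON, an edgemart node only needs the alias (API name) of the registered Dataset it should load; the names below are hypothetical:

```json
"Load_Account_Balances": {
  "action": "edgemart",
  "parameters": {
    "alias": "Account_Balances"
  }
}
```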
Append
This is where you add one Dataset to another. It combines rows from multiple Datasets into a single Dataset.
A common use case is building a Dataset about activities: use computeExpressions to add any missing fields to each source, and when you append them, you get all the fields in the same Dataset.
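A minimal append node, assuming two earlier digest nodes named Extract_Tasks and Extract_Events (both names are made up for illustration):

```json
"Append_Activities": {
  "action": "append",
  "parameters": {
    "sources": [ "Extract_Tasks", "Extract_Events" ]
  }
}
```

If the sources do not share an identical schema, the optional parameter "enableDisjointedSchemaMerge": true lets the node merge them anyway, filling the gaps with nulls.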
Augment
This one is about adding data from one object to a related one; basically, it is about adding columns.
In our previous example, we could use Augment to bring the Account information into our initial flow of Opportunity Products and Opportunities.
As you can see in the above screenshot, there are two sfdcDigests where we bring over Salesforce objects (skip the Flatten node for a moment); then, to combine data from one into columns of the other, we use the Augment node named 105.
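A sketch of what an augment node like the one named 105 could contain; the source node names, keys and selected fields here are assumptions for illustration:

```json
"105": {
  "action": "augment",
  "parameters": {
    "left": "Extract_OpportunityProducts",
    "left_key": [ "OpportunityId" ],
    "right": "Extract_Opportunities",
    "right_key": [ "Id" ],
    "right_select": [ "Name", "StageName", "AccountId" ],
    "relationship": "Opportunity",
    "operation": "LookupSingleValue"
  }
}
```

The "relationship" value becomes the prefix of the added columns (e.g. Opportunity.Name), and "operation" controls what happens when the lookup matches more than one row.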
computeExpression and computeRelative
The two have similar names but different functions. computeExpression is a formula: a powerful way to create fields, working on one record at a time. It lets you look at the rest of the fields on that record and calculate with them. For example, we could take the Probability of our Opportunity and multiply it by the Amount field.
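That Probability-times-Amount example could be sketched as a computeExpression node like this (node and field names are illustrative):

```json
"Compute_Expected_Revenue": {
  "action": "computeExpression",
  "parameters": {
    "source": "Extract_Opportunities",
    "mergeWithSource": true,
    "computedFields": [
      {
        "name": "Expected_Revenue",
        "type": "Numeric",
        "precision": 18,
        "scale": 2,
        "saqlExpression": "Amount * (Probability / 100)"
      }
    ]
  }
}
```

With "mergeWithSource": true, the new field is added alongside all the existing columns of the source node.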
On the other hand, we have computeRelative, which is also expression-based. However, this time it acts across rows (over records). 'Partition By' is used to slice/group: for example, ordering the Opportunities of each Account, by Account, to find the latest one.
In either of the two ‘Computes’ you can add multiple fields in one node.
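A computeRelative sketch along the lines of the Account example, partitioning by Account and ordering by close date; the field pulled from the previous row ("Previous_Amount") is a hypothetical illustration:

```json
"Compute_Opportunity_Order": {
  "action": "computeRelative",
  "parameters": {
    "source": "Extract_Opportunities",
    "partitionBy": [ "AccountId" ],
    "orderBy": [
      { "name": "CloseDate", "direction": "desc" }
    ],
    "computedFields": [
      {
        "name": "Previous_Amount",
        "expression": {
          "sourceField": "Amount",
          "offset": "previous()",
          "default": "0"
        }
      }
    ]
  }
}
```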
Image ref: a fantastic series from the product team on YouTube to learn Einstein Analytics.
Midway Check In
Let’s take a break and recap. So far, we have looked at the first eight transformations and how data is brought in via a wizard, from a local connection to Salesforce or an existing DataSet. We also learnt how to enrich data using fields, as well as columns.
Now, we’ll look at the remaining transformations and how to shape your data once it is ready as a Dataset!
dim2mea
As the cryptic name tells us, you will use this transformation when you need to convert a dimension to a measure. It creates a new column in the Dataset with a measure value derived from a dimension field, preserving the original dimension so that existing lenses and dashboards do not break if they use it elsewhere.
Think of a dimension as something you can group by, like the Stage field, and a measure as something you can do calculations with: a number.
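A dim2mea node could look like the sketch below, assuming a hypothetical text field Discount_Text__c that actually holds numeric values:

```json
"Convert_Discount": {
  "action": "dim2mea",
  "parameters": {
    "source": "Extract_Opportunities",
    "dimension": "Discount_Text__c",
    "measure": "Discount_Value",
    "measureDefault": "0"
  }
}
```

"measureDefault" is the value used when a row's dimension cannot be converted to a number.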
Flatten
This is used to flatten hierarchies. The common use case is to get the user role hierarchy so that you can use it in security predicates (row-level security works a bit differently in Einstein Analytics; perhaps one for the backlog for a future article!).
For example, if we wanted to have one field in the Opportunity with a concatenation of the product families from the Opportunity Products, we could use flatten for that.
In this example, we bring in the UserRole object with an sfdcDigest, use the flatten node to concatenate the role path, and then combine it into the User object table with an augment node (seen earlier in this post).
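A sketch of that flatten node over the role hierarchy; the node name and output field names are assumptions:

```json
"Flatten_Roles": {
  "action": "flatten",
  "parameters": {
    "source": "Extract_UserRole",
    "self_field": "Id",
    "parent_field": "ParentRoleId",
    "multi_field": "Roles",
    "path_field": "RolePath"
  }
}
```

"multi_field" receives the set of ancestor role IDs for each row, and "path_field" the full path as a delimited string, which is what typically ends up in a security predicate.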
Filter
True to its name, this is used to filter records!
It has two types:
- the simple, structured one, e.g. Stage:EQ:Closed Lost, or
- the more complex, SAQL-based one.
You may have seen that you can also filter within a digest via its 'filter conditions'. However, it is better to use a separate filter node instead: filtering at the digest applies at the replication/sync object level, and therefore affects everything else that reads from that synced object.
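A filter node using the simple structured syntax could be sketched like this (node and source names are illustrative):

```json
"Filter_Lost": {
  "action": "filter",
  "parameters": {
    "source": "Extract_Opportunities",
    "filter": "StageName:EQ:Closed Lost"
  }
}
```

For the more complex variant, you would replace "filter" with a "saqlFilter" parameter holding a SAQL expression, such as "StageName == \"Closed Lost\"".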
sliceDataset
This one is about dropping some of the fields that you are carrying through the transformations of your Dataflow.
For example, you may have added fields from other objects to perform calculations or add extra columns; however, you do not necessarily need to keep them all the way to the end product/table.
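A sketch of a sliceDataset node dropping two hypothetical working fields:

```json
"Drop_Working_Fields": {
  "action": "sliceDataset",
  "parameters": {
    "source": "105",
    "mode": "drop",
    "fields": [
      { "name": "Probability" },
      { "name": "Opportunity.AccountId" }
    ]
  }
}
```

"mode" can be "drop" (remove the listed fields) or "select" (keep only the listed fields).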
Update
This transformation allows you to update specified field values in an existing Dataset, based on data from another Dataset; a little like a lookup on steroids.
For example, if the Product Code changes for our Opportunity Products, we could look at the Product object Dataset and update the value in the Opportunity Products.
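That Product Code example could look roughly like this (the source node names and key fields are assumptions):

```json
"Update_Product_Codes": {
  "action": "update",
  "parameters": {
    "left": "Extract_OpportunityProducts",
    "right": "Extract_Products",
    "left_key": [ "Product2Id" ],
    "right_key": [ "Id" ],
    "update_columns": { "ProductCode": "ProductCode" }
  }
}
```

"update_columns" maps each field to overwrite on the left Dataset to the field on the right Dataset that supplies the new value.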
sfdcRegister
Last but not least, this is where all that magic happens and gets recorded! You use an sfdcRegister to save your results into a Dataset as the final result of all the transformations: the final data table that you can use in your Einstein Analytics dashboards, Einstein Discovery stories or further Dataflows!
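A closing sfdcRegister node is a small one; the alias and display name below are illustrative:

```json
"Register_Opportunity_Products": {
  "action": "sfdcRegister",
  "parameters": {
    "source": "105",
    "alias": "Opp_Products",
    "name": "Opportunity Products"
  }
}
```

"alias" is the Dataset's API name and "name" is the label users see in Analytics Studio.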
Summary
You have multiple ways to bring in and transform data for Einstein Analytics. A Dataflow enables you to create Datasets: refined data tables which you can then use in visualisations.
It is not as complicated as you may think! Here we have covered some new names, words and concepts; hopefully they now feel familiar and clear.
We looked at how data is brought in via a wizard, from a local connection to Salesforce or an existing DataSet. We also learnt how to enrich data using fields, as well as columns.
Then, we looked at the remaining transformations and learnt how to shape the data once it is ready to become a Dataset!
Just like in other similar cases, this too requires practice. The best thing that you can do right now to reinforce it is to grab an Analytics-enabled dev org and play with it. Here’s a trail for you: Build and Administer Einstein Analytics.