Following the recent releases and enhancements with the CRM Analytics and Salesforce Data Cloud native Snowflake integration, it’s easier than ever to harness your Salesforce skills to perform more advanced data analysis in Snowflake.
As someone who constantly blends and analyzes Salesforce data with other data sources in Snowflake, I wanted to share some of the most common functions and features that help me on a daily basis, including how to restructure your data, useful SQL functions, and matching with Soundex and distance functions.
Why Salesforce + Snowflake Matters
There are many amazing solutions in the marketplace for building analytics, and Salesforce has continued to expand its portfolio of native data and machine learning integrations so you can leverage your existing investments. Salesforce's native integration with Snowflake extends the reach of the CRM into Snowflake's powerful data warehouse and analytics capabilities.
- Fast and Simple Setup: The setup and ongoing management is straightforward, minimizing the learning curve and technical overhead. Setting up Snowflake is like setting up a new Salesforce trial account.
- Low Initial Cost: From a cost perspective, this integration is efficient because you only pay for the compute resources you actually use. With a 30-day free trial, you can sharpen your skills cost-free. My personal Snowflake instance that I use for training and experimentation runs me less than $30/month.
- CRM Analytics and Data Cloud: Both CRM Analytics and Data Cloud are natively integrated with Snowflake. The Data Cloud integration provides direct access from Salesforce to Snowflake.
Use Cases for Salesforce in Snowflake
Advanced Analytics and Data Exploration in Healthcare
Leveraging Salesforce and Snowflake data together allows complex analysis of anonymized data that may exist across multiple systems and parties. Making transactional data from Salesforce available in Snowflake can give data and analytics teams another dimension for building benchmarks for patient quality of care.
Co-Mingling of Anonymized Data in Financial Services with Data Cleanroom
In financial services, integrating Salesforce with Snowflake and employing a data cleanroom allows for the secure blending of anonymized customer data from Salesforce with external financial datasets in Snowflake. This approach not only provides deeper insights into customer behavior and market trends, but also ensures compliance with stringent financial privacy regulations. The resulting data intelligence is pivotal for tailoring financial products and managing risk while upholding data privacy and security standards.
Staging and Lists Management in Manufacturing
Manufacturers can use the combined power of Salesforce and Snowflake for advanced staging and management of data lists, such as supply chain components, customer orders, and inventory levels. This allows for more sophisticated analysis and operational planning than what’s possible with standard Salesforce reporting, leading to optimized production and distribution processes.
Re-Structuring Your Data With Snowflake
Data Warehouse Structures
In Snowflake, traditional data warehouse structures (Star schema), characterized by the use of fact and dimensional tables, offer a stark contrast to the Salesforce data model. These structures, designed for analytics, allow for complex queries and in-depth historical data analysis. The Salesforce model is designed for operational processes and real-time transactions.
Snowflake offers flexibility to employ a “one big table” approach where facts and dimensions are commingled and transformed into a single schema that is ready for reporting, exploration, and analytics. For analytics tools like Tableau, this is the common and preferred approach to slice and dice data.
Snapshot Data
Data storage in Snowflake is quite inexpensive. Numerous use cases call for daily snapshot fact tables for point-in-time metrics like financial balances. This allows the effortless capture and storage of data states at regular intervals, such as month-end, daily, or hourly. This feature is particularly valuable in sectors like finance and retail, where it enables deeper analysis of trends and changes over time with simplicity and efficiency. Salesforce offers a similar concept for snapshot reports.
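As a minimal sketch (the snapshot table name and its schedule are my own assumptions, not from a specific setup), a daily snapshot can be as simple as an INSERT that stamps each row with the capture date:

```sql
-- Hypothetical snapshot fact table; this statement would typically be
-- scheduled (e.g. via a Snowflake TASK) to run once per day.
INSERT INTO "SF_LAB"."SFDC"."OpportunitySnapshot"
    ("SnapshotDate", "OpportunityId", "StageName", "Amount", "CloseDate")
SELECT
    CURRENT_DATE AS "SnapshotDate", -- stamp each row with the capture date
    "Id",
    "StageName",
    "Amount",
    "CloseDate"
FROM "SF_LAB"."SFDC"."Opportunity";
```

Filtering the snapshot table on a given "SnapshotDate" then reproduces your pipeline exactly as it stood on that day.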
Semi-Structured Data
One of the amazing features employed by Snowflake is native querying of semi-structured data like JSON. Additionally, Snowflake offers stored procedures, allowing data engineers to utilize Python or Javascript to account for more complex, deeply nested JSON structures. Pulling semi-structured data into Snowflake allows analysts and data scientists to immediately explore and extract new insights from data that would otherwise need to be parsed and structured.
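To illustrate (the table and JSON shape below are assumptions for the example, not from the article), Snowflake's colon path syntax and the FLATTEN table function let you query JSON stored in a VARIANT column directly:

```sql
-- Hypothetical "RawEvents" table with a VARIANT column "Payload" holding JSON like:
-- { "account": { "id": "001..." }, "line_items": [ { "sku": "A-1" }, ... ] }
SELECT
    e."Payload":account.id::STRING AS "AccountId", -- dot-path into nested objects
    li.value:sku::STRING           AS "Sku"        -- one row per array element
FROM "SF_LAB"."SFDC"."RawEvents" e,
    LATERAL FLATTEN(input => e."Payload":line_items) li;
```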
SQL Basics
Let’s dive into some SQL code! To start with a simple example, I prepared a table that joins Opportunities with Accounts, giving us a simple starting point to explore Snowflake functions. In this case, I use a left outer join from the Account table, meaning we have all Accounts, with and without Opportunities.
SELECT
    a."CreatedDate" AS "AccountCreatedDate",
    a."Id" AS "AccountId",
    a."Name" AS "AccountName",
    o."Amount",
    o."CreatedDate" AS "OpportunityCreatedDate",
    o."Id" AS "OpportunityId",
    o."StageName",
    o."CloseDate"
FROM
    "SF_LAB"."SFDC"."Account" a
    LEFT OUTER JOIN "SF_LAB"."SFDC"."Opportunity" o ON a."Id" = o."AccountId"
Resulting Data:
SQL Functions Built for Simple Data Transformation
Let’s start with some simple examples that you should be familiar with writing formula fields in Salesforce…
Case Statements
Case statements can be used in many different scenarios. One common analytics use case is binning data based on numeric values. Salesforce reports provide a great numeric binning interface that is much easier than writing SQL, but for illustrative purposes let’s explore this use case together:
CASE
    WHEN "Amount" < 0 THEN '<0'
    WHEN "Amount" >= 0 AND "Amount" < 100000 THEN '0-100K'
    WHEN "Amount" >= 100000 THEN '>100K'
    ELSE NULL
END "Amount Bin"
Snowflake features many functions designed to make preparation of data for analysis faster. For example, there is a standard function called “WIDTH_BUCKET” that can accomplish this same binning task when the objective is to quickly analyze the distribution of opportunities by amount.
WIDTH_BUCKET( <expr> , <min_value> , <max_value> , <num_buckets> )
WIDTH_BUCKET( "Amount" , 0 , 1000000 , 10 ) "Amount_WIDTHBUCKET"
Working with Dates
Converting from UTC
Because Salesforce stores data in UTC, DateTime fields need to be converted to avoid discrepancies between your analytics and Salesforce reports.
Let’s assume operations occur in California. To transform the data out of UTC, you can use the CONVERT_TIMEZONE function shown below:
CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', "CreatedDate") "Local TZ Create Date"
To use other global time zones, you can reference values from the IANA Time Zone Database. Snowflake also offers options to set session- and user-level time zones that globally convert all DateTimes, covered in the Snowflake date and time documentation.
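As a sketch, a session-level setting converts every local-time-zone timestamp for the rest of the session, with no per-column function calls:

```sql
-- Applies to the current session only; user- and account-level defaults
-- can be set the same way with ALTER USER / ALTER ACCOUNT.
ALTER SESSION SET TIMEZONE = 'America/Los_Angeles';
```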
Calculating Date Diffs and Adding Dates
Calculating the difference in times and adding dates is quite simple in Snowflake.
In the example below, I calculate the days from Account creation to Opportunity close:
DATEDIFF( <date_or_time_part>, <date_or_time_expr1>, <date_or_time_expr2> )
DATEDIFF( 'days', "AccountCreatedDate", "CloseDate") "Days from Account to Opportunity Close"
To add or subtract time from a specific date, you can use the DATEADD function.
DATEADD(<date_or_time_part>, <value>, <date_or_time_expr> )
DATEADD('years', 1, "CloseDate") "1 Year Renewal Date"
To bring together two concepts we have covered, I am blending the CASE expression with DATEADD to derive a next renewal date. When the Stage is “Closed Won”, we calculate the date 12 months from the CloseDate. If the Stage is not “Closed Won”, we leave the value empty (NULL).
CASE
WHEN "StageName" = 'Closed Won' THEN DATEADD('years',12,"CloseDate")
ELSE NULL
END "Next Renewal Date"
Dealing With Deduplication
If you are in the business of data wrangling, then you may have some tricks to deal with dirty, missing, and duplicate data. For this article, I don’t cover window functions for deduping data. I did, however, want to highlight some nifty functions in Snowflake that I use for managing duplicate data, merge/purge, and identity reconciliation.
What if you want to analyze your accounts or leads table with other data sources to determine accounts with similar names? That is where matching features help. In the next example, I will analyze the Salesforce Accounts table for duplicates.
Matching Features with Soundex and Distance functions
The first matching function is called Soundex, which returns a string that contains a phonetic representation of the input string.
SOUNDEX( <varchar_expr> )
The phonetic representation is not human-intelligible, but it provides a broad-stroke matching method. For example, 'Salesforce' and 'Salesfource' both return the SOUNDEX code S421.
The next matching function is called JAROWINKLER_SIMILARITY. You can think of it as a fuzzy match score.
JAROWINKLER_SIMILARITY( <string_expr1> , <string_expr2> )
Computes the Jaro-Winkler similarity between two input strings. The function returns an integer between 0 and 100, where 0 indicates no similarity and 100 indicates an exact match.
EDITDISTANCE( <string_expr1> , <string_expr2> [ , <max_distance> ] )
Computes the Levenshtein distance between two input strings: the number of single-character insertions, deletions, or substitutions needed to convert one string into the other. Edit distance works great, except for short names where there aren’t enough characters. I typically add a condition that requires an exact match when there are fewer than six characters in a company name.
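That short-name rule can be sketched as a CASE expression (the six-character threshold and the edit-distance cutoff of 2 are my own working assumptions, not fixed rules):

```sql
-- a1 and a2 are two aliases of the Account table being compared.
CASE
    WHEN LENGTH(a1."Name") < 6 OR LENGTH(a2."Name") < 6
        THEN IFF(a1."Name" = a2."Name", 'Match', 'No Match') -- short names: exact only
    WHEN EDITDISTANCE(a1."Name", a2."Name") <= 2
        THEN 'Match'                                         -- longer names: allow small typos
    ELSE 'No Match'
END "Match Flag"
```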
Here is an example that shows all three working together. As you can see, Soundex, the broadest matching function, helps narrow the field of potential matches. This is very important for processing large data sets, so you are not trying to run matches between values that are clearly not duplicates.
With some fancy new matching functions in your arsenal, here is a query that is similar to the first. In this case, we SELECT from the Account object and then cross join it to itself, which forces Snowflake to compare each account name against every other account name.
SELECT
    a1."Id" AS "AccountId1",
    a1."Name" AS "AccountName1",
    a2."Id" AS "AccountId2",
    a2."Name" AS "AccountName2",
    SOUNDEX(a1."Name") AS "Soundex AccountName1",
    SOUNDEX(a2."Name") AS "Soundex AccountName2",
    JAROWINKLER_SIMILARITY(a1."Name", a2."Name") AS "JaroWinklerSimilarity",
    EDITDISTANCE(a1."Name", a2."Name") AS "EditDistance"
FROM
    "GG_DEVELOPMENT_OPPS"."SFDC"."Account" a1
    CROSS JOIN "GG_DEVELOPMENT_OPPS"."SFDC"."Account" a2
WHERE
    a1."Id" <> a2."Id" -- To avoid comparing a row with itself
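To make the Soundex narrowing explicit, here is one way (a sketch, reusing the same table names) to pre-filter the cross join on matching SOUNDEX codes so the more expensive fuzzy scores are only computed for phonetically similar pairs:

```sql
SELECT
    a1."Name" AS "AccountName1",
    a2."Name" AS "AccountName2",
    JAROWINKLER_SIMILARITY(a1."Name", a2."Name") AS "JaroWinklerSimilarity",
    EDITDISTANCE(a1."Name", a2."Name") AS "EditDistance"
FROM
    "GG_DEVELOPMENT_OPPS"."SFDC"."Account" a1
    CROSS JOIN "GG_DEVELOPMENT_OPPS"."SFDC"."Account" a2
WHERE
    a1."Id" <> a2."Id"
    AND SOUNDEX(a1."Name") = SOUNDEX(a2."Name"); -- only score phonetic candidates
```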
Tip of the Iceberg With Snowflake
For data professionals, Snowflake has democratized the data warehouse and data lake. Today, you also have AI chatbots at your disposal to help you learn and find the right functions to help you answer real-world questions.
If you are interested in learning more Salesforce-focused SQL scenarios with Snowflake, keep your eyes peeled for the upcoming Salesforce Ben course on SQL.