The Salesforce Winter ’24 release shipped with a new Data Share feature that provides live data sharing from Salesforce to Snowflake. This is a big step in Salesforce’s Bring Your Own Lake (BYOL) strategy, which aims to provide bi-directional access between Data Cloud and numerous modern data lake solutions.
The end result is a Salesforce-managed, frictionless process to create a holistic view of customer data. This article focuses on the newly introduced data share feature, specifically for Snowflake data share.
Why Data Cloud Data Share and BYOL Are a Big Deal
Before we dive into the technical step-by-step configuration, I’d like to outline the facts within Salesforce’s recent announcements and integration with Snowflake in relation to the broader Data Cloud solution…
- Fundamentally, Data Cloud represents an evolution of multiple components, most notably the CDP (Customer Data Platform). Salesforce has evolved Data Cloud to standardize connecting a complex network of data sources that capture customer events (interactions and transactions).
- Snowflake is now a first-class data citizen within Data Cloud. It is integrated with Salesforce through direct collaboration between Salesforce and Snowflake product teams. More details about Snowflake and Salesforce are covered in my previous article, Data Synchronization to Snowflake: The Complete Guide.
- Data sharing highlights the importance of speed of access to data from Salesforce to support transactional, analytical (including ML models), and generative AI-augmented data analysis.
- At the time of writing, Data Cloud data share delivers data from Salesforce to Snowflake but not Snowflake to Salesforce.
- The data share architecture and design lowers latency, allowing data, analytics and Salesforce teams to participate in new ways. Salesforce has effectively removed friction points associated with moving data between cloud platforms.
- Data sharing to Snowflake reduces the overhead and effort to maintain multiple copies of the same data.
- Salesforce Data Cloud has introduced consumption-based pricing that aligns with platform and infrastructure service solution providers such as AWS, Google, Azure, and Snowflake.
What Can I Do in Snowflake That I Can’t Do in Salesforce?
Salesforce is a powerful transactional business application platform. However, in many scenarios, organizations find themselves replicating data, building formulas and rollup fields, and funneling lots of data into the Salesforce data model solely for reporting. These patterns, combined with hitting the limits of custom report types and snapshot reports and with business requirements for advanced analytics, are warning signs that you may need more firepower.
Snowflake, as a data warehouse, is designed specifically for the use cases where your data scientists, business intelligence, and data engineering teams can fully utilize the Salesforce relational data model but with complete control over every facet of data transformation and delivery for analytics use cases.
In the same way that Salesforce isn’t just a CRM, Snowflake isn’t just a database. Snowflake is a full-fledged data platform that lets you build sophisticated views of your Salesforce data, machine learning models, end user-friendly analytics apps, and now AI-enhanced data analysis.
What is Zero Copy Data Sharing?
Salesforce’s modern Data Cloud has embraced high standards that make it possible to share live data with Snowflake. With zero-copy data sharing, Snowflake can access and query Salesforce data in place, without copying all the data from Salesforce to Snowflake on a batch or event basis.
It is fundamentally different from how most customers push Salesforce data into Snowflake today, which is mostly on a scheduled basis.
I recently wrote about the Analytics Cloud Data Manager, which features bi-directional data synchronization between Salesforce and Snowflake.
While the notion of a zero-copy data share is new to Salesforce, Snowflake customers and partners are very familiar with the concept.
Data sharing is available in the partner data marketplace and widely used for intra-company and third-party data sharing, which enables effective collaborative data and analytics work. It is exciting to see these same principles now employed for business application data sharing from Salesforce.
Getting Started with Data Cloud Setup
Before you can use the data share features, you need to configure Salesforce Data Cloud. Setting up Data Cloud is free. Below, you can find a handy guide on how to do it…
In addition to installing Salesforce Data Cloud, you will need to properly configure your Data Cloud and arrange numerous Data Cloud records and objects.
Salesforce Data Cloud Permission Settings
Salesforce has introduced two distinct Data Cloud permission sets that are added upon completing the Data Cloud Setup. To set up a data share, you will need the Data Cloud Admin permission set.
- Data Cloud Admin: Grants access to Data Cloud Setup
- Data Cloud User: For general users
Snowflake Data Cloud Permission Settings
To complete your data share and create the necessary OAuth connection to Snowflake, your Snowflake user will require the ACCOUNTADMIN and/or SECURITYADMIN roles for the steps in this guide.
Data Cloud Configuration Required for Snowflake Data Share
This section will walk you through the end-to-end process for configuring your data space, data stream, and Data Cloud Data Model objects. There are multiple objects, records, and a series of configuration steps required to create the newly introduced data share for Snowflake.
These steps include but are not limited to:
- Connecting your instance.
- Creating and refreshing a Data Stream that pulls data into your Data Lake Object.
- Creating a Data Model Object that maps data from your Data Lake into your Data Model.
Create a Data Cloud Data Share in Salesforce and Snowflake
By integrating Salesforce’s Data Cloud with Snowflake, customers will have ‘Salesforce-managed’ data access inside Snowflake. The following steps will demonstrate the technical process in Salesforce and Snowflake to create the connection successfully.
1. Setting Up Access from Snowflake
Because Data Cloud routes on AWS East, I set up a Snowflake Account on AWS East for the sake of this article.
Once inside Snowflake, you will need to create a system user to provide proper permissions and handle the secure connection between Salesforce and Snowflake. To do so, you will need an ACCOUNTADMIN or SECURITYADMIN role. If you are using a brand new Snowflake account, you will also want to ensure you have a warehouse and a role created.
I will be using a LOADING role and a LOADING warehouse for demonstrative purposes.
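For a brand new account, the role and warehouse mentioned above could be created along these lines. This is a minimal sketch rather than a prescribed setup: the LOADING names follow this article, while the warehouse size and auto-suspend values are assumptions you should adjust to your own standards.

```sql
-- Run with a role that can create roles and warehouses (ACCOUNTADMIN here).
USE ROLE ACCOUNTADMIN;

-- Custom role that the Data Cloud connection will use.
CREATE ROLE IF NOT EXISTS LOADING;

-- Small warehouse for the connection; size and suspend settings are assumptions.
CREATE WAREHOUSE IF NOT EXISTS LOADING
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- Allow the role to run queries on the warehouse.
GRANT USAGE ON WAREHOUSE LOADING TO ROLE LOADING;
```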
2. Snowflake User Creation
You can create a new user from the Snowflake UI or simply use the following SQL command.
Tip: Keep this username and password handy. You will need them when you connect from Salesforce Data Cloud data share.
CREATE OR REPLACE USER SYSTEM_SFDC_DATACLOUD_ADMIN
PASSWORD = 'insert_strong_password_here'
LOGIN_NAME = SYSTEM_SFDC_DATACLOUD_ADMIN
DISPLAY_NAME = SYSTEM_SFDC_DATACLOUD_ADMIN
DEFAULT_ROLE = LOADING;
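Note that DEFAULT_ROLE only records a preference; it does not grant the role to the user. Assuming the LOADING role and warehouse already exist, a sketch of the follow-up statements might look like this:

```sql
-- Grant the role named in DEFAULT_ROLE so the user can actually assume it.
GRANT ROLE LOADING TO USER SYSTEM_SFDC_DATACLOUD_ADMIN;

-- Optional: give the user a default warehouse (assumes a warehouse named LOADING).
ALTER USER SYSTEM_SFDC_DATACLOUD_ADMIN SET DEFAULT_WAREHOUSE = 'LOADING';

-- Confirm what the user has been granted.
SHOW GRANTS TO USER SYSTEM_SFDC_DATACLOUD_ADMIN;
```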
3. Generate the Security Integration in Snowflake
This step requires you to enter SQL code. The following tips will help you as you create your SQL:
- The Data Cloud-tenant-url is captured when you set up your Data Cloud. If you don’t have it to hand, you can locate it by going to Setup, then Data Cloud Setup, and scrolling to the bottom of the setup page.
- LOADING is a custom role, not a standard Snowflake role. In PRE_AUTHORIZED_ROLES_LIST, you will need to pre-authorize the roles that the Data Cloud connection is allowed to use.
- Finally, SALESFORCE_DATA_CLOUD_OAUTH_INTEGRATION is a name that can be changed to anything you would like. Make sure you document and share this information as you set up your integration.
CREATE OR REPLACE SECURITY INTEGRATION
SALESFORCE_DATA_CLOUD_OAUTH_INTEGRATION
TYPE = OAUTH
OAUTH_CLIENT = CUSTOM
OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
OAUTH_REDIRECT_URI = 'https://login.salesforce.com/services/cdpSnowflakeOAuthCallback'
PRE_AUTHORIZED_ROLES_LIST = ('LOADING')
ENABLED = TRUE
OAUTH_ISSUE_REFRESH_TOKENS = TRUE;
The result should confirm the creation of your integration.
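If you want to double-check the integration beyond the confirmation message, the following statements are one way to inspect it. This is a sketch that assumes you kept the integration name used in this article.

```sql
-- List the integration and confirm it is enabled.
SHOW SECURITY INTEGRATIONS LIKE 'SALESFORCE_DATA_CLOUD_OAUTH_INTEGRATION';

-- Inspect its properties, such as the redirect URI and pre-authorized roles.
DESCRIBE SECURITY INTEGRATION SALESFORCE_DATA_CLOUD_OAUTH_INTEGRATION;
```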
4. Generate Client Credentials
In Snowflake, we need to generate client credentials that the Salesforce data share target object will require.
Tip: The parameter in this SQL labeled SALESFORCE_DATA_CLOUD_OAUTH_INTEGRATION refers to the previous step.
Once you execute the SQL, you will see an output containing the clientID and secret. Copy and store them in a secured location, preferably in an encrypted password/token management solution.
SELECT SYSTEM$SHOW_OAUTH_CLIENT_SECRETS( 'SALESFORCE_DATA_CLOUD_OAUTH_INTEGRATION') AS OAUTHDETAILS;
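The function returns a single JSON string. If you would rather see the client ID and secret as separate columns, a variant along these lines can help. It is a sketch that assumes the returned JSON uses the OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET keys; check the raw output above if the columns come back empty.

```sql
-- Parse the JSON returned by the system function into separate columns.
WITH secrets AS (
  SELECT PARSE_JSON(
    SYSTEM$SHOW_OAUTH_CLIENT_SECRETS('SALESFORCE_DATA_CLOUD_OAUTH_INTEGRATION')
  ) AS details
)
SELECT
  details:"OAUTH_CLIENT_ID"::STRING     AS client_id,
  details:"OAUTH_CLIENT_SECRET"::STRING AS client_secret
FROM secrets;
```

The same caution applies: treat the query result as sensitive and avoid leaving it in shared worksheets.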
5. Snowflake Network Policy
Snowflake has its own configuration steps required to enable programmatic access securely. If you have an existing Snowflake instance managed by your data, IT or analytics team, you may have some security and whitelist policies. In this case, you will need to provide the following link to whitelist the appropriate IP.
If you are using a brand new Snowflake instance or trial to prototype, you can skip this step initially, but you will want to implement the proper policies.
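When you do implement a policy, it might look roughly like the sketch below. The policy name is hypothetical and the IP addresses are placeholders from the reserved documentation ranges, not real Salesforce addresses; replace them with the IPs Salesforce publishes for Data Cloud.

```sql
-- Hypothetical policy name; the addresses are placeholders only.
CREATE NETWORK POLICY SFDC_DATACLOUD_POLICY
  ALLOWED_IP_LIST = ('192.0.2.0/24', '198.51.100.10');

-- Scope the policy to the system user instead of the whole account.
ALTER USER SYSTEM_SFDC_DATACLOUD_ADMIN SET NETWORK_POLICY = SFDC_DATACLOUD_POLICY;
```

Attaching the policy at the user level, as assumed here, keeps it from affecting other users; if your team already manages an account-level policy, the Salesforce IPs would be added to that policy instead.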
6. Setting Up the Data Share Target in Salesforce Data Cloud
Tip: If you do not see Data Share or Data Share Target in your Salesforce instance, you may have these tabs disabled by default. Simply go to your user Profile → Object Settings → then set the Tab Settings to “Default On”.
- Navigate to the Data Share Targets tab in Data Cloud.
- Click “New”, select Snowflake as the connection type, and proceed.
- Fill out the necessary details, including:
  - Label and API name
  - Account URL: Your Snowflake account URL. If you don’t know it, go to the lower left corner in Snowflake, click your account name, and then click the link icon to copy the URL.
  - Client ID and client secret: Use the values you generated in Step 4.
- Authenticate using the Snowflake credentials for the SYSTEM_SFDC_DATACLOUD_ADMIN user created in Step 2.
7. Creating a Data Share in Salesforce Data Cloud
- Navigate to the Data Cloud app.
- Open the Data Shares tab.
- Click “Create New Data Share”. Fill out the necessary details, including:
  - Label: The display name of the data share.
  - Name: Auto-populated to match the label name (can be customized).
  - Data Space: Choose “Default” if no other data space is provisioned.
  - Description: Add any relevant details.
- Select the data objects you wish to include in the data share.
- Click “Save”. Once successfully created, the data objects become accessible for executing your data share to Snowflake.
The last step is to link this data share to Snowflake. Since we have already created and tested our data share target, we can now link the two to complete the share.
8. Link the Data Share
- Navigate to your data share.
- Ensure the data share is active.
- Click “Link/Unlink Data Share Target”.
9. Viewing and Testing Your Data Share in Snowflake
- Navigate back to Snowflake.
- Select the data share from private sharing.
- Create a database on the data share by clicking “Get Data” and naming the database. If you are new to Snowflake and do not have a naming standard for your databases and schemas, use a convention that is easy to understand and reflects what you are syncing. In this example, I use a simple convention that highlights the source, that this is a staging database, and that it connects to prod data.
- You will see all the selected data objects within this database named after your Data Cloud instance. Your data is shared as secure views, so you can query your Salesforce objects just like any other Snowflake view.
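Once the database exists, a quick smoke test from a worksheet might look like the following. This is a sketch under stated assumptions: SFDC_DATACLOUD_STG_PROD is a hypothetical database name in the spirit of the convention described above, and MY_SCHEMA and MY_SHARED_VIEW are placeholders for whatever schema and view names appear in your share.

```sql
-- Let the LOADING role read the shared database (run as the database owner).
USE ROLE ACCOUNTADMIN;
GRANT IMPORTED PRIVILEGES ON DATABASE SFDC_DATACLOUD_STG_PROD TO ROLE LOADING;

USE ROLE LOADING;
USE WAREHOUSE LOADING;

-- List the secure views exposed by the share.
SHOW VIEWS IN DATABASE SFDC_DATACLOUD_STG_PROD;

-- Query one of them; the schema and view names are placeholders.
SELECT *
FROM SFDC_DATACLOUD_STG_PROD.MY_SCHEMA.MY_SHARED_VIEW
LIMIT 10;
```

Because the objects are shared rather than copied, they are read-only in Snowflake; any transformations belong in your own databases and views built on top of them.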
Summary
Whether you scanned the steps or followed the tutorial in sequence, you can see that Data Share with Data Cloud is not quite a plug-and-play solution. However, it does offer a glimpse into the near future, where the benefits of pushing and pulling data from Snowflake will allow you to expand your modern Data Cloud capabilities.
The most important outcome is the ability to put more resources and time into putting data assets to work in the form of analytics and model development. With your Data Cloud Data Models and Data Lake flowing into Snowflake, you get to extend and learn about the massive Snowflake modern data cloud ecosystem.
This is just the beginning of BYOL and data-sharing capabilities. I can’t wait to put Data Cloud to work in anticipation of live Data Access from Snowflake back to Salesforce!
Are you as excited as I am about the new Salesforce Data Cloud data sharing? Don’t forget to share your thoughts in the comments below!