I loved Salesforce’s recent blog that stated: “Bad data is junk food for AI.” However, after years of research showing that more than half (55%) of business leaders do not trust their data, the challenge remains. I believe one key reason is the gap between how IT and the business perceive data reliability. In fact, if you look closely at the recent Salesforce survey, 57% of Data and Analytics leaders are confident in data accuracy, whereas only 42% of Sales and Service leaders feel the same. How can you be confident data is reliable if you don’t understand how the business uses it?
It’s incumbent on Salesforce Architects to bridge this gap and guide their organization on the necessary capabilities to address data reliability challenges effectively.
In this blog, I’ll explain why data profiling solutions are an essential part of the Salesforce solution architecture. I’ll describe the pros and cons of different architectural approaches to data profiling, guide you through selection criteria, and share best practices and anti-patterns.
Capabilities Required for Sustainable Data Reliability
Salesforce provides a framework for administering data quality in CRM orgs:
My observation, having spent years at Salesforce advising customers on data strategy, is this: most organizations have implemented solutions for duplicate management, standardization, and to some extent data validation. Where organizations had implemented data profiling, it was mostly for point-in-time needs (for example, “what are my unused fields?”) rather than for ongoing data reliability. Very few organizations have put monitoring in place.
Data reliability starts with assessing your CRM org’s data and associated technical metadata health. Without a quantitative understanding of your data health, you cannot determine if your data reliability is sufficient to meet different business objectives. Even if your data is sufficient to meet your business needs today, you cannot detect unexpected deviations that put your business outcomes at risk without ongoing monitoring.
CRM data and metadata ailments will impact every downstream initiative: enterprise AI, data unification in Data Cloud or Tableau, automation with Flow, etc. Data quality is an ongoing need with moving targets as the business evolves. You must have an effective data profiling and monitoring solution for your stakeholders.
What is Data Profiling and Why Do You Need It?
The Data Management Association defines data profiling as “statistical analysis of data set contents to understand format, completeness, consistency, validity, and structure of the data.” For projects involving a significant amount of data, like Salesforce, they recommend a data profiling tool as the most efficient means of conducting this analysis.
Organizations require scalable profiling solutions for assessing data in Salesforce CRM. These tools must cater to various stakeholder needs across different record types (Sales, Partner Sales, Vendor Management, HR, etc). Similarly, companies need profiling solutions to effectively unify data from different sources in Salesforce Data Cloud. It is crucial to be able to perform rapid assessments and continuous data trend monitoring.
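To make the definition concrete, here is a minimal Python sketch of the two most basic profiling statistics, fill rate and distinct value count, computed per field. The `profile_fields` function and the sample records are hypothetical and only illustrate the idea; a real profiling tool would also use the org’s metadata and would need to handle far larger volumes.

```python
from collections import defaultdict

def profile_fields(records):
    """Return fill rate and distinct-value count for every field seen in records."""
    total = len(records)
    populated = defaultdict(int)
    distinct = defaultdict(set)
    fields = set()
    for record in records:
        for field, value in record.items():
            fields.add(field)
            # Treat None and blank strings as unpopulated.
            if value is None or (isinstance(value, str) and not value.strip()):
                continue
            populated[field] += 1
            distinct[field].add(value)
    return {
        field: {
            "fill_rate": populated[field] / total if total else 0.0,
            "distinct_values": len(distinct[field]),
        }
        for field in sorted(fields)
    }

# Hypothetical sample records, for illustration only.
accounts = [
    {"Name": "Acme", "Industry": "Retail", "Fax": None},
    {"Name": "Globex", "Industry": "Retail", "Fax": ""},
    {"Name": "Initech", "Industry": None, "Fax": None},
    {"Name": "Umbrella", "Industry": "Pharma", "Fax": "555-0100"},
]
profile = profile_fields(accounts)
for field, stats in profile.items():
    print(field, stats)
```

Even this toy version shows why fill rate alone is not enough: a field can be fully populated and still carry very few distinct values.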
Select a Profiling Solution That Meets Your Stakeholders’ Needs
Data profiling is not a new concept, and there are many tools available in the industry. In general, there are three deployment architectures for data profiling solutions.
- Native solutions that profile data within the boundaries of the business application.
- External tools that profile data from applications based on data exports.
- Hybrid solutions that have Salesforce user interfaces but process assessment and analytics outside of the org, often through APIs.
Understanding the pros and cons of each architecture is essential to drive approval and adoption within your organization.
Native Data Profiling
Built and hosted on the Salesforce platform, native solutions offer four key advantages.
- Data security: Native profiling offers the greatest data security of the three architectures because:
  - Data never leaves the org.
  - Native apps can leverage Salesforce’s field-level security and sharing rules.
  - These apps also come with the added assurance of having passed Salesforce’s security review process.
- Current data: Because native profiling solutions run in the org, they always profile current data and metadata.
- Easy installation and upgrades: The AppExchange makes installation and upgrades seamless.
- Familiar user interface: You and your users will have the advantage of the familiar Salesforce user interface.
Examples of native data profiling solutions include Cuneiform for CRM, Field Pro, FieldSpy, and Field Trip.
External Profiling Tools
External tools have historically been the purview of IT departments. They can assess data from any source and may have more specialized features. However, external tools have five key disadvantages:
- Data security: These tools typically require data to be exported outside of the business application to be analyzed, meaning analysis occurs outside of the security controls built into your CRM environment. This comes with further disadvantages:
  - Data context is lost. Exports standardize data types, so details such as string vs. picklist values disappear, and the associated configuration metadata is not available to external profiling tools.
  - Out-of-date data. The analysis is limited to the contents of the export, which also makes trend analysis more expensive at best.
  - Increased data governance complexity. Data copies and their export dates must be tracked, security and access controls need to be maintained across different technologies, and data access control and deletion processes must be expanded.
- Not accessible to business users or CRM admins: As IT tools, these applications are seldom accessible to CRM admins, never mind business users, who are primarily responsible for effective metadata configuration and data maintenance.
- Higher costs: Licensing costs aside, the need to handle data security concerns, additional integrations and processes, and the learning curve of external tools come with a higher cost than native solutions.
- Difficult to understand data trends: Your organization’s data retention policies may require purging data from the external solution before you have time to effectively monitor and respond to trends. Retaining that data is essential to building an effective history of the org’s data reliability.
Examples of external data profiling solutions include Ataccama, IBM InfoSphere, Informatica Data Explorer, and Talend Open Studio.
Hybrid Profiling Tools
Also found on the AppExchange, hybrid solutions have Salesforce user interfaces and have passed a Salesforce Security Review. However, they do process and persist data outside of your org.
Hybrid solutions have the advantage of ease of use but require much more rigor and effort in security review. They share the same disadvantages as external tools when it comes to data governance, data trending, and acquisition costs.
Examples of hybrid solutions include Hubbl Process Analytics and Metazoa Snapshot.
What About Reports and Queries?
When I heard at a recent Circle of Success that “you can create custom reports to identify empty fields or field value frequency,” I was initially confused. It is possible to create custom solutions that mimic data profiling results but these would be time-consuming with the additional burden of ongoing maintenance. I for one would not want to build a separate query for every single field I may need to assess, when my objects may have 200-500+ custom fields.
External query applications such as DBeaver and SoqlXplorer, while favorite tools on my laptop, have the same challenges as above, so I would not recommend them as a scalable “business solution”.
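To give a sense of the per-field query burden described above, the following Python sketch generates one SOQL fill-count query per field. The object, the field names, and the `fill_count_queries` helper are hypothetical, and the query shape is only an assumption about how a hand-built approach might look; some field types may not be filterable this way at all.

```python
def fill_count_queries(object_name, field_names):
    """Build one SOQL count query per field, counting populated records only."""
    return [
        f"SELECT COUNT(Id) FROM {object_name} WHERE {field} != null"
        for field in field_names
    ]

# Hypothetical object with 300 custom fields named Field_1__c ... Field_300__c.
fields = [f"Field_{i}__c" for i in range(1, 301)]
queries = fill_count_queries("Account", fields)

print(len(queries))
print(queries[0])
```

That is 300 queries for a single object, before adding distinct-value counts, business scenarios, or any history, and every new field means another query to maintain.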
Evaluation Guide
When evaluating data profiling solutions keep these patterns and anti-patterns in mind.
Do Start With Native Data Profiling Solutions
Begin by evaluating native data profiling apps. In addition to the advantages described above, many native apps are also free, making them an ideal starting point for evaluation.
Do Assess Profiling Solutions for Data Security and Access
Most effective data profiling solutions assess both CRM data and associated metadata. It’s also important to understand how the solution manages access and purging.
Does the app require the Administrator profile with read-all permissions or can it run under the user’s own permission levels? I prefer the latter. It ensures users only see the objects, fields, and records they are allowed to see.
Does the app support a read-only view of only the profiling results? This empowers data specialists to find patterns even under the most restrictive permission models.
Do Evaluate Solution Performance With Representative Data
Performance levels and feature breadth of data profiling solutions available on the AppExchange may vary. Performance assessments are key, especially in larger orgs.
Start by asking about the maximum field and record count that the solution provider has certified for their solution.
If you have a full copy Sandbox:
- Identify your largest objects by size as well as by the number of fields.
- Create and run the same profiling definition for the object(s) in each tool. Document the execution time and whether the tool can process the entire object without timing out.
If you cannot use real production data to evaluate tools, use synthetic data with a comparable number of fields and rows. A tool like Mockaroo will enable you to create synthetic data with custom parameters. Use the CSV to create a temporary custom object and assess.
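If you would rather script the synthetic data yourself, a small Python sketch along these lines can produce a CSV with a chosen number of fields and rows. The `write_synthetic_csv` function, the column names, and the 30% blank rate are all assumptions for illustration; tune them to resemble your real objects.

```python
import csv
import io
import random

def write_synthetic_csv(out, num_rows, num_fields, null_rate=0.3, seed=42):
    """Write a CSV with num_fields columns and num_rows rows of placeholder data."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    header = [f"Field_{i}" for i in range(1, num_fields + 1)]
    writer = csv.writer(out)
    writer.writerow(header)
    for _ in range(num_rows):
        # Leave roughly null_rate of the cells blank to mimic sparse fields.
        writer.writerow(
            "" if rng.random() < null_rate else f"value_{rng.randint(1, 50)}"
            for _ in range(num_fields)
        )
    return header

# Written to memory here; pass an open file object to write to disk instead.
buffer = io.StringIO()
header = write_synthetic_csv(buffer, num_rows=1000, num_fields=200)
print(len(header), "columns written")
```

The sketch writes to an in-memory buffer to stay self-contained. For a realistic test you would increase `num_rows` and write to a file that you then load into the temporary custom object.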
Of course, always remember how governor limits may skew initial results.
As a general benchmark, the latest native data profiling tools can assess 10 million records with 500+ fields in 20 minutes or less.
Do Use a Data Profiling Features Evaluation Matrix
You may not initially use all of the features below. However, identifying feature gaps can help you make your initial selection and plan for future expansion.
Capability | Description | Example | Why? |
---|---|---|---|
Data Profiling Fundamentals | Does the tool capture field fill rate and field distinct rate? | A field is 97% null with 3 distinct values | Foundational capability every profiling tool should have. |
Data with Configuration | How observed data compares to the configuration: distinct values vs. active picklist values; distinct values vs. data type; fill rates vs. field usage | Picklist field has 10 distinct values but 14 active value configurations; field is string data type but only has 7 distinct values; field is not used but is in 4 UIs and 8 reports | Assessing data and configuration metadata together can quickly provide insights on potential usability and data reliability challenges. |
Data Profiling Granularity | Ability to create multiple profiling scenarios to assess an object’s contents | Customer Accounts vs. Partner Accounts; Open Opportunities vs. Closed Won Opportunities | Salesforce’s flexible data model means it is common for multiple functions or business units to be supported by the same object. Granular scenarios are essential to assess business-specific data quality considerations. |
Advanced Profiling Features | Ability to infer additional insights based on data and metadata. e.g. net population rate | The field’s default value represents 90% of populated values | Demonstrates tool benefits to assess probable reliability |
Data Governance and Dictionary Support | Ability to capture Definition, Help Text, Data Owner, Data Classification, Data Management Rules, Encryption, and Usage details in a common view | Field is classified as Confidential, PII but not encrypted | Data Governance features in Salesforce CRM are spread across multiple parts of the setup tree. Consolidating all insights in a common UI and data model simplifies visualization and monitoring. |
Trend Monitoring | Ability to take snapshots and compare profiling insights over time | Profiling definition shows 12% record volume growth but field completeness has dropped | Snapshotting and trend analysis are critical for understanding data changes over time and can aid in proactive data management. |
Data Quality KPIs | Ability to incorporate data quality formulas into the assessment | Billing Address completeness; incorrect Account detection; junk account detection | The inclusion of KPIs for data quality provides a structured approach to measuring and improving data integrity. |
Reports and Dashboards | Out-of-the-box reports; ability to create custom reports with Salesforce tools | Fill rate visualizations | Accelerates value realization with common tools for stakeholder engagement. |
Data Health Scoring | Application providing a snapshot of data health based on out-of-the-box or configurable scoring models | Data Dictionary health is 47/100; Account object data health is 72/100 | Offering a holistic view of data health at a glance is crucial for quick assessments and prioritization. |
Data Quality Improvement Recommendations | Ability to look across all relevant/profiled objects to identify tactical next steps to improve data and org configuration | Convert string field to picklist; encrypt sensitive field | Every org has data quality and configuration health challenges. The ability to correlate findings to actions quickly brings value faster. |
User Experience and Usability | How user-friendly is the tool? Does it use the latest UX patterns (e.g. LDS) | N/A | A good UI/UX can significantly affect adoption rates and the effectiveness of data profiling activities. |
Customization and Flexibility | Ability to expose insights, e.g. data quality scores, within other CRM applications | Can the UI or data be exposed in other parts of the app? | The ability to customize the tool to fit specific organizational needs. |
Scalability | How well does the tool scale with the growing amount of data? | What are certified data volume and field counts per object? Does the vendor publish performance statistics? | Ensure the tool can scale with the growing amount of data and evolving business requirements. |
Shield Support | Can the solution profile encrypted fields? Are there limitations? | N/A | Some customer orgs may have Shield turned on. Understand limits. |
Compliance and Certifications | Can the vendor demonstrate current certifications or compliance? | SOC 2, ISO 27001, HIPAA, FedRAMP | Certain industries may require these certifications. |
Support and Community | The availability of support options, documentation, user community | Is the product well documented? Does the vendor offer free and premier support? | Effective documentation and an active user community can be valuable resources for troubleshooting and best practices. |
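Two rows of the matrix, “Data with Configuration” and “Advanced Profiling Features”, are easier to picture with a small example. The Python sketch below compares observed values with the active picklist configuration and computes a net population rate. Both function names are hypothetical, and the formula used for net population rate (the share of records whose value is populated and differs from the field’s default) is an assumption; vendors may define it differently.

```python
def picklist_gaps(observed_values, active_values):
    """Compare values found in the data with the picklist values configured as active."""
    observed = {v for v in observed_values if v not in (None, "")}
    active = set(active_values)
    return {
        "unused_active_values": sorted(active - observed),
        "values_not_in_picklist": sorted(observed - active),
    }

def net_population_rate(values, default_value):
    """Assumed definition: share of records populated with something other than the default."""
    if not values:
        return 0.0
    meaningful = [v for v in values if v not in (None, "", default_value)]
    return len(meaningful) / len(values)

# Hypothetical data, for illustration only.
observed_ratings = ["Hot", "Warm", "Hot", None, "Lukewarm"]
active_ratings = ["Hot", "Warm", "Cold"]
gaps = picklist_gaps(observed_ratings, active_ratings)

ratings_with_default = ["Warm"] * 9 + ["Hot"]
rate = net_population_rate(ratings_with_default, default_value="Warm")

print(gaps)
print(rate)
```

In the second example the field looks 100% filled, yet only one record in ten holds a value someone actually chose, which is the kind of insight fill rate alone hides.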
Don’t Let IT Tooling Get in the Way of Data Reliability
Your IT department may have already procured a data profiling tool to support their integration development initiatives. More mature IT departments may even possess internal data quality monitoring capabilities built on other systems (data warehouse, enterprise message bus, etc).
Stay vigilant and make the business case for why your Salesforce Admins, Data Specialists, and Business Data Stewards need to assess and monitor their data and metadata health. Asking “How many of these users are users of the IT tool?” is often an effective way to get the point across.
Do Work With Your Admins to Unlock the Full Potential of Data
As an architect, you guide the overall Salesforce architecture and roadmap. This includes how to identify and address scalability, security, integration, and data quality concerns to meet the organization’s strategic goals. Admins have always been key allies. As primary users of data profiling solutions, they can ensure your data assessments are impactful and help to maintain data quality and reliability over time.
If you do not have a data profiling solution in your CRM org, partner with and educate your admins on the benefits:
- Assessing data and metadata quality to guide tactical actions to improve data reliability, application usability, and maintainability. For example:
  - Identifying unused/underutilized fields and field values.
  - Using profiling insights to improve data models (e.g. picklist conversions, deactivating unused picklist values, or splitting a field into many to capture more granular data).
  - Understanding data dictionary health.
  - Identifying fields that can predict successful business outcomes to focus user adoption on key business data.
- Implementing monitoring solutions to catch deviations and ensure data remains reliable for the life of your Salesforce solutions.
Collaborate closely with admins to communicate the architectural vision, ensuring practical application and maintenance. This partnership ensures that the system not only meets current needs but is also poised for future growth and change. It utilizes tools like data profiling to uphold high data quality and system efficiency.
Do Profile Every Production Instance
Your evaluation of data profiling solutions will likely happen in your full or partial copy Sandbox. To take advantage of data quality monitoring, you will need to deploy data profiling solutions in production. This way, you can take snapshots of your data growth and correlate that to metadata changes over time.
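The value of snapshots comes from comparing them. As a rough illustration, this Python sketch flags fields whose fill rate dropped by more than a threshold between two snapshots. The snapshot format, the `fill_rate_drops` function, and the five-point threshold are assumptions made for the example, not a description of how any particular tool stores or compares results.

```python
def fill_rate_drops(previous, current, threshold=0.05):
    """Return fields whose fill rate fell by more than threshold between snapshots."""
    drops = {}
    for field, old_rate in previous.items():
        new_rate = current.get(field)
        if new_rate is None:
            continue  # field is absent from the newer snapshot; skip it here
        if old_rate - new_rate > threshold:
            drops[field] = round(old_rate - new_rate, 4)
    return drops

# Hypothetical fill-rate snapshots taken a month apart.
january = {"BillingCity": 0.92, "Industry": 0.80, "Fax": 0.10}
february = {"BillingCity": 0.78, "Industry": 0.79, "Fax": 0.10}

alerts = fill_rate_drops(january, february)
print(alerts)
```

A drop like the one flagged for `BillingCity` is the starting point for root cause analysis, for example checking whether a new integration or a changed page layout went live between the two snapshots.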
If your organization has multiple instances, deploy the tooling across each org. This assessment can clearly demonstrate the importance of enterprise-wide data reliability and governance strategies. It will also illustrate the importance of data unification for AI, analytics, automation, and activation initiatives, including but not limited to Data Cloud.
When Do You Augment Native Data Profiling?
My simple answer is when there is a specific set of insights that are impactful to the business outcomes you want to achieve and you are not able to get these insights from your native data profiling solution.
I prefer to have the smallest set of solutions working together, starting with native and hybrid solutions and then moving to external tooling due to the above-mentioned reasons. While I am comfortable with hybrid tools that analyze metadata (e.g. dependencies), I would have to have a very good reason to do data assessments outside of my org, especially given the importance of ongoing data quality monitoring, contextual root cause analysis, and in-app response.
Summary
Architects must help their organizations assess and improve data reliability to unlock the full potential of their Salesforce data ecosystem and drive long-term success. Putting in place the right data profiling and ongoing monitoring solution is a key component in achieving this outcome.
The first step to ensuring data reliability is assessing the data and associated metadata against business outcomes.
Start with a native data profiling solution from the AppExchange that can support multiple business scenarios. Assess the technical data health of your key objects and business data reliability for one scenario to build the foundation.
Set up data quality formulas based on data that matters to your business. Set up monitoring and alerts to detect and respond to unexpected changes. Scale to additional business use cases over time and, if you have them, to other orgs as well.
Also implement data profiling in Data Cloud, so you are not only ensuring your CRM data is reliable, but you are also monitoring data reliability across any data source that is powering your applications, AI, automation, analytics, and activation initiatives.