
From Data Chaos to AI-Readiness: A Salesforce Data Governance Playbook

By Mehmet Orun

Following Salesforce’s Agentforce announcement at Dreamforce ’24 and the global enablement sessions teaching thousands of Salesforce professionals how to configure their own agents, excitement and interest in Enterprise AI have skyrocketed. At the same time, concerns about AI hallucinations and whether the underlying data (and metadata) are ready to support AI have also increased significantly.

As a seasoned Salesforce Data Management practitioner, I am often approached by Salesforce Admins, Architects, Consultants, and CRM Managers about how to know if their data is ready for AI and how to kick off and right-size their data governance initiatives, as the effort can feel overwhelming.

In this article, we will explore the most common causes of AI hallucination and how to set up targeted data governance initiatives to ensure your data and metadata are ready for AI. First, let’s start with the fundamentals of the solutions we want to put in place…

Is It Automation, Analytics, or AI?

Having spoken to many consultants and customers seeking to define their “AI roadmap”, I noticed a pattern:

  • What “Artificial intelligence” is to a business user or leader may not be “AI” to a technologist.
  • We need to understand and leverage automation, analytics, and true AI capabilities to deliver the most effective results.
  • Data and metadata reliability (definitions, sensitivity classification, ownership) for fields that matter is essential for any of the initiatives.

Machine-Assisted Insights – A Comparison

Type: Analytics
Description: Visualize data through reports and dashboards to present business activity summary or trends.
Salesforce Product Examples and Business Use Cases:
  • Tableau or CRM-A: Visualize customer activity trends for account-based marketing.
  • Einstein Discovery: Identify correlations between sales activity and revenue outcomes.

Type: Automation
Description: Streamline workflows and repetitive task execution using predefined rules or logic.
Salesforce Product Examples and Business Use Cases:
  • Flow: Recommend the best account to associate with a new lead by automating account enrichment by matching account name and address details against reference databases (e.g. D&B, Bureau van Dijk).
  • Next Best Action: Provide guided recommendations for service agents to resolve customer issues effectively.

Type: Predictive AI
Description: Leverage historical data to identify patterns and make data-driven predictions. There is value in revisiting how Salesforce presented its AI solution in 2022.
Salesforce Product Examples and Business Use Cases:
  • Einstein Discovery: Forecast sales performance based on historical trends.
  • Einstein Prediction Builder: Anticipate customer churn to enable proactive retention strategies.

Type: Generative AI
Description: Create, summarize, or extract insights from unstructured data to deliver personalized and dynamic outputs.
Salesforce Product Examples and Business Use Cases:
  • Einstein GPT: Craft tailored email campaigns based on customer interaction history.
  • Agentforce: Generate personalized chat responses, e.g. analyze knowledge articles to respond to “What is your return policy?”

Type: Agentic AI
Description: Deliver recommendations and automate workflows based on contextual intelligence derived from structured and unstructured data.
Salesforce Product Examples and Business Use Cases:
  • Agentforce: Create a sales email based on the understanding of open and won opportunity history across account records from multiple orgs, associated email and activity history, including cases that can indicate customer sentiment.

Before diving into the details of the AI-readiness data governance playbook, I admit I am excited about Agentforce for three main reasons:

  1. Enterprise AI requires combining all of the machine-assisted capabilities above. I do not want to choose and wire different technologies, especially when it comes to establishing the data security layer.
  2. Enterprise AI requires complete, consistent, current, and correct data about the customer, leveraged in a contextual and compliant way. The Salesforce platform, with the combined power of Agentforce and Data Cloud, delivers on this key need.
  3. By tapping into Salesforce’s broader Data Management partnerships for data profiling, data cleansing, and data enrichment, we can assess data reliability and build robust solutions on reliable data, reducing the risk of AI hallucinations more predictably and faster than the alternatives.

Understanding the Main Causes of AI Hallucinations

If you ask your favorite GPT tool, you are sure to get a set of answers on what causes AI hallucinations. Here is my perspective:

1. Incomplete Understanding of the Customer

Duplicate or disconnected records, within and across data sources, lead to an incomplete understanding of the customer. This means the information is not guaranteed to be complete, current, or even correct.

If you are early in your Agentforce journey, your first use cases are likely driven by data in your CRM org. Assess whether you have intentional or unintentional duplicates, and quantify them. Once you have proven the technology works, enhance it by leveraging Data Cloud’s identity resolution to build a unified profile. This will be key to having a complete, consistent, current, and even contextual understanding of your customers.
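One way to put a number on duplicates in a CRM export is to group records by a normalized match key. The following is a minimal sketch, assuming records are plain dictionaries with hypothetical `id`, `name`, and `email` fields. The matching rule here is a deliberately crude illustration, not how Salesforce duplicate rules or Data Cloud identity resolution work.

```python
from collections import defaultdict

def match_key(record):
    """Build a crude match key: lowercased email if present, else normalized name."""
    email = (record.get("email") or "").strip().lower()
    if email:
        return ("email", email)
    name = " ".join((record.get("name") or "").lower().split())
    return ("name", name)

def duplicate_groups(records):
    """Return lists of record ids that share the same match key."""
    groups = defaultdict(list)
    for rec in records:
        key = match_key(rec)
        if key[1]:  # skip records with no usable key at all
            groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical sample data for illustration only.
contacts = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@example.com"},
    {"id": 2, "name": "ada  lovelace", "email": "ADA@example.com "},
    {"id": 3, "name": "Grace Hopper", "email": None},
    {"id": 4, "name": "grace hopper", "email": ""},
    {"id": 5, "name": "Alan Turing", "email": "alan@example.com"},
]

groups = duplicate_groups(contacts)
# Share of records that are redundant copies of another record.
duplicate_rate = sum(len(g) - 1 for g in groups) / len(contacts)
print(groups, duplicate_rate)
```

Whether a group is an intentional or unintentional duplicate still needs a human review; the sketch only sizes the problem.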

2. Missing Data, Often Due to Customer Adoption Challenges

Incomplete or unreliable source objects undermine the quality of AI outputs. For example, if a field is not populated and you are using that field in your prompt, AI engines may try to guess, leading to hallucinations.

Profile your source objects, focusing on recent periods, such as current plus one to three years.

  • Identify which fields are reliably populated. For example, if a picklist or string field holds only one value, and that value is the default in your source, the data is probably not reliable.
  • Review fields with fewer than 100 distinct values (250 at most), especially if they are well populated, and see if the top and bottom values reflect what you’d expect.
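The two profiling checks above can be sketched in a few lines. This is a minimal illustration, assuming records are dictionaries exported from a source object; the function name `profile_field`, the sample field names, and the default value are all hypothetical.

```python
from collections import Counter

def profile_field(records, field, default=None):
    """Return fill rate, distinct count, and top values for one field."""
    values = [r.get(field) for r in records]
    filled = [v for v in values if v not in (None, "")]
    counts = Counter(filled)
    return {
        "fill_rate": len(filled) / len(values) if values else 0.0,
        "distinct": len(counts),
        "top": counts.most_common(3),
        # A populated field that only ever holds the default value is suspect.
        "default_only": len(counts) == 1 and default in counts,
    }

# Hypothetical sample data for illustration only.
leads = [
    {"industry": "Other", "rating": "Hot"},
    {"industry": "Other", "rating": ""},
    {"industry": "Other", "rating": "Cold"},
    {"industry": "Other", "rating": None},
]

industry = profile_field(leads, "industry", default="Other")
rating = profile_field(leads, "rating")
print(industry)
print(rating)
```

Here `industry` is fully populated but flagged as `default_only`, while `rating` is only half populated. Neither would be a safe input to a prompt without further review.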

A Sales Cloud customer had Opportunity Stages that were 100% populated. However, reviewing the Stage value distribution uncovered two interesting data points:

  1. The Won vs. Lost ratio was almost even, despite this industry typically experiencing a ratio of 20/80. Either they were really good at selling, or many opportunities were not being entered into Salesforce.
  2. There were more deals in negotiation than prospecting, indicating an inverted funnel. This further suggested that sales reps were not entering deals into Salesforce until much later in the cycle.
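Both observations come from a simple value distribution, which can be checked mechanically. The sketch below assumes a list of stage values per opportunity; the stage names and their funnel order are hypothetical and would need to match your own sales process.

```python
from collections import Counter

# Hypothetical open-stage names in funnel order, earliest first.
FUNNEL = ["Prospecting", "Qualification", "Negotiation"]

def stage_checks(stages, won="Closed Won", lost="Closed Lost"):
    """Return the win rate and any adjacent stage pairs where the funnel is inverted."""
    counts = Counter(stages)
    closed = counts[won] + counts[lost]
    win_rate = counts[won] / closed if closed else None
    # The funnel is inverted when a later stage holds more deals than an earlier one.
    inverted = [
        (early, late)
        for early, late in zip(FUNNEL, FUNNEL[1:])
        if counts[late] > counts[early]
    ]
    return win_rate, inverted

# Hypothetical sample data for illustration only.
stages = (
    ["Prospecting"] * 2 + ["Qualification"] * 3 + ["Negotiation"] * 6
    + ["Closed Won"] * 5 + ["Closed Lost"] * 5
)

win_rate, inverted = stage_checks(stages)
print(win_rate, inverted)
```

A win rate far above your industry benchmark, combined with inverted stage pairs, points to late data entry rather than exceptional selling.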

Such gaps would undermine forecasting and sales rep productivity analysis, and keep AI from delivering on its full potential, which depends on access to complete data.

Use profiling insights to identify fields that are consistently populated with reliable data as part of your prompt design (or even identity resolution rules) so your agents can produce consistent results.

Also, use the same profiling insights to prioritize improvements to your page layouts, picklist configurations, and help text, so your users have an easier time capturing data and understanding the purpose of your key fields.

3. Fields with Incomplete Metadata

Without proper metadata definitions or sensitivity classifications, AI can misinterpret how to use the data.

Identify fields that matter to business outcomes by profiling your source objects separately for business transactions that completed successfully and those that did not. Then, analyze which fields are consistently populated for successful outcomes and which are populated more often for successful outcomes than for unsuccessful ones. This quickly narrows the focus of your work to the fields associated with the outcomes you want.
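As a rough sketch of this comparison, the code below computes each field's fill rate separately for successful and unsuccessful records and ranks fields by the difference. The field names, the `is_won` outcome flag, and the "lift" label are assumptions made for illustration.

```python
def fill_rate(records, field):
    """Share of records where the field holds a non-empty value."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def fill_rate_by_outcome(records, fields, outcome_field="is_won"):
    """Compare fill rates between successful and unsuccessful records."""
    won = [r for r in records if r.get(outcome_field)]
    lost = [r for r in records if not r.get(outcome_field)]
    report = {}
    for field in fields:
        won_rate, lost_rate = fill_rate(won, field), fill_rate(lost, field)
        report[field] = {"won": won_rate, "lost": lost_rate, "lift": won_rate - lost_rate}
    return report

# Hypothetical sample data for illustration only.
opportunities = [
    {"is_won": True, "next_step": "Send contract", "budget": 5000},
    {"is_won": True, "next_step": "Kickoff", "budget": None},
    {"is_won": False, "next_step": "", "budget": 1000},
    {"is_won": False, "next_step": None, "budget": None},
]

report = fill_rate_by_outcome(opportunities, ["budget", "next_step"])
# Fields populated more often on successful records come first.
ranked = sorted(report, key=lambda f: report[f]["lift"], reverse=True)
print(ranked, report)
```

A high lift is a signal that a field deserves metadata review first. It shows association with the outcome, not that populating the field causes success.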

In Salesforce CRM, make sure any field you plan to use in your prompts has an identified Data Owner, who is then tasked with reviewing descriptions, metadata classifications, and field content for reliability.

4. Outdated Data Skewing Results

Older, no longer relevant data can misguide AI models and results. For example, you may have ten years of case history with case resolutions based on older knowledge articles and interaction channels.

Rule of thumb: Always start with data that you know to be recent, relevant, and reliable.

For in-scope data sources, review your data retention policy and check whether you are actually archiving and purging data per that policy. There are plenty of backup and archival solutions in the ecosystem, including Own, Capstorm, and many others, that can ensure your data remains available but is no longer stored in your CRM. This improves usability by hiding irrelevant data in search results, optimizes costs by both saving on CRM storage and reducing data processed in a consumption-based model, and also reduces your compliance risk.
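A quick way to see whether the retention policy is being followed is to list records older than the cutoff. This is a minimal sketch, assuming each record carries a hypothetical `closed_on` date; the seven-year period in the example is illustrative, not a recommendation.

```python
from datetime import date

def years_before(day, years):
    """Return the same calendar day a number of years earlier."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # Feb 29 has no match in a non-leap target year; fall back to Feb 28.
        return day.replace(year=day.year - years, day=28)

def past_retention(records, retention_years, today):
    """Return ids of records closed before the retention cutoff."""
    cutoff = years_before(today, retention_years)
    return [r["id"] for r in records if r["closed_on"] < cutoff]

# Hypothetical sample data for illustration only.
cases = [
    {"id": "C1", "closed_on": date(2014, 3, 1)},
    {"id": "C2", "closed_on": date(2023, 6, 15)},
    {"id": "C3", "closed_on": date(2018, 1, 1)},
]

to_archive = past_retention(cases, retention_years=7, today=date(2025, 1, 1))
print(to_archive)
```

Records returned here are candidates for archiving, and should also be excluded from the data you ground your agents on.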

5. Insufficient Post-Deployment Monitoring

Data in applications, business processes, and the people who interact with the data are always changing. To ensure AI is delivering its intended business value, you need to monitor data reliability and respond to any unexpected changes.

What do you monitor for?

  • If you have data sources that help you create a unified profile in Data Cloud, identify disproportionately occurring values that may indicate data quality challenges (my favorite fake email is na@na.com), leading to bad data unification.
  • For fields that are powering your prompts, assess whether your fill rate or distinct density changes significantly, e.g. a 5-10% difference between time periods, which could be monthly or quarterly.
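Both monitoring checks can be automated with simple thresholds. The sketch below is one possible approach: the function names are hypothetical, the 5% thresholds are assumptions, and the drift check treats the 5-10% guideline as an absolute difference in percentage points.

```python
from collections import Counter

def dominant_values(values, threshold=0.05):
    """Return non-empty values whose share of all non-empty entries exceeds the threshold."""
    filled = [v for v in values if v not in (None, "")]
    if not filled:
        return {}
    counts = Counter(filled)
    return {
        value: count / len(filled)
        for value, count in counts.items()
        if count / len(filled) > threshold
    }

def has_drifted(previous_rate, current_rate, tolerance=0.05):
    """Flag a metric (e.g. fill rate) that moved by at least the tolerance between periods."""
    return abs(current_rate - previous_rate) >= tolerance

# Hypothetical sample data: one placeholder email repeated among unique ones.
emails = ["na@na.com"] * 10 + [f"user{i}@example.com" for i in range(90)]

flagged = dominant_values(emails)
print(flagged)
print(has_drifted(0.92, 0.80), has_drifted(0.92, 0.90))
```

A value like `na@na.com` that dominates an identifier field should be excluded from matching before it merges unrelated customers into one profile.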

Summary: Focus on One Business Use Case at a Time

If you do not already have a data governance program, starting one may feel complex and scary, with unknown costs and skill sets. If you already have a data governance program, it is likely understaffed.

The key to success is to focus on a single business outcome at a time, identifying the fields that matter to your business outcomes and ensuring the data and metadata are fit for purpose.

Do monitor your data and metadata after you baseline it, so you can focus your attention on the next business need vs. re-fixing the same content from before.

Do not forget to engage your end users and business stakeholders and share with them your findings about both the importance of data reliability at the source and how you are ensuring the data that powers their AI-enabled solutions can be trusted.

The Author

Mehmet Orun

Mehmet is a Salesforce veteran and data management SME, having worked with Salesforce since 2005 as a customer, employee, practice lead, and partner. He is now GM and Data Strategist for PeerNova, an ISV partner focused on data reliability, as well as Data Matters Global Community Leader.
