Using Generative AI to Clean Your Data In Salesforce

By Robert Gelo

The significance of accuracy in the field of data management cannot be overstated. The development of artificial intelligence (AI) technology, particularly generative AI (GenAI), has made it possible for nearly anyone to use sophisticated tools to easily clean and validate their data within Salesforce.

By using GenAI, Salesforce data cleaning may be streamlined, and high accuracy and efficiency can be achieved. Organizations can greatly improve the quality and integrity of their data by using this cutting-edge technology. With access to GenAI, you can eliminate a significant portion of labor-intensive manual data cleaning procedures, which are especially time-consuming when working with large datasets.

GenAI for Data Cleaning

GenAI platforms, such as Microsoft Copilot, Google Gemini, or OpenAI’s ChatGPT, can speed up your data cleaning process. You don’t need to be skilled in Python; if you know how to ‘talk’ to the GenAI tool, you can get results in a very short time. Data cleaning involves dealing with missing values, inconsistent formats, and errors that can skew analysis and lead to incorrect conclusions.

Manual data cleaning is time-consuming and prone to human error, making it an inefficient solution for large datasets. As your primary tool, you can still use Google Sheets, Microsoft Excel, or Apple Numbers, but GenAI will deliver the data you need to update or enter into your datasheet rapidly.

Creating Prompts for GenAI

What’s more, using GenAI doesn’t have to cost anything. For this article, we cleaned the data of a test database, stored in XLSX and CSV files, using the most widely used GenAI platforms. While the conventional “find and replace” spreadsheet feature could have been useful in this situation, the AI approach is far more direct, efficient, and easy to use.

Traditional data cleansing strategies involve manual inspection, rule-based cleaning, and statistical methods. While these methods can be effective, they often require significant time and effort. They may also fail to capture complex patterns and dependencies in the data. As previously mentioned, you don’t need highly specialized technical skills like Python programming. Instead, you just need to learn how to feed the right prompts into the right GenAI tool.

Step 1

Suppose you export an Excel file from Salesforce and you keep your client data in that file – this can be worked on row by row. 

Firstly, find the row with conflicting data after you have opened your exported file. In our case, we had two rows of data that described the location and status of our leads that needed to be cleaned. 
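One way to spot conflicting data before turning to the GenAI tool is to count the distinct values in a field. The sketch below is only an illustration: it assumes the export has been saved as CSV, and the `Status` column name and the sample values are hypothetical, not taken from our file.

```python
import csv
import io
from collections import Counter


def value_counts(csv_text: str, column: str) -> Counter:
    """Count how often each distinct value appears in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Strip surrounding whitespace so "Hot " and "Hot" are counted together.
    return Counter((row[column] or "").strip() for row in reader)


# Hypothetical export: the "Status" column mixes several spellings.
sample = "Name,Status\nAcme,Hot\nGlobex,On Fire!\nInitech,hot\nUmbrella,Cold\n"
for value, count in value_counts(sample, "Status").most_common():
    print(f"{value!r}: {count}")
```

Any value that appears in the counts but is not one of the categories you want to keep is a candidate for cleaning.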

The problematic row can then be located and fed into the GenAI. In our case, we used OpenAI’s ChatGPT 3.5, Google Gemini (formerly Bard), and Microsoft Copilot. It’s advisable to use two or more GenAI platforms simultaneously so that you can compare the outputs and select the best outcomes. 

Step 2

Secondly, create the AI prompt. It doesn’t have to be long or complicated. Just describe exactly what you want, in terms that the GenAI can interpret and execute with precision.

In our case, we used the following prompt in Microsoft Copilot:

Note: Typos are automatically detected and fixed.

You’ll notice that we copied our problematic row from Excel right below the prompt. This way, the AI will understand that we only want to keep three categories and that it needs to replace “On Fire!” with “Hot”.

In your prompt, it’s essential to specify the format in which you require your data to be returned. Here, all we needed was one simple text row that we could quickly copy back into the Excel file. 
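If you build prompts like this often, the same three parts (the task, the rules, and the required output format) can be assembled by a small helper. This is a minimal sketch, not the exact prompt we used: the function name is made up for illustration, and the categories “Warm” and “Cold” are assumptions, since only “Hot” and “On Fire!” come from our example.

```python
def build_cleaning_prompt(values, allowed, replacements):
    """Assemble a prompt that states the task, the rules, and the output format."""
    rules = "\n".join(
        f'- Replace "{old}" with "{new}".' for old, new in replacements.items()
    )
    lines = [
        "Clean the list of lead status values below.",
        f"Keep only these categories: {', '.join(allowed)}.",
        rules,
        "Return the cleaned values as plain text, one per line, in the same order,",
        "with no explanation.",
        "",
        *values,  # the problematic data, pasted right below the instructions
    ]
    return "\n".join(lines)


# Hypothetical values copied from the spreadsheet.
prompt = build_cleaning_prompt(
    values=["Hot", "On Fire!", "Cold", "Warm"],
    allowed=["Hot", "Warm", "Cold"],  # assumed category names
    replacements={"On Fire!": "Hot"},
)
print(prompt)
```

The printed text can be pasted into Copilot, Gemini, or ChatGPT as-is.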

Copilot provided a quick explanation along with the accurate data on the first attempt.

Step 3

Now, you need to carefully assess the data returned by the AI. Verify that there are no AI hallucinations and that only the necessary categories are present. 

Remember, the AI only understands what you want if you guide it. Errors can still occur, so expect to refine your prompt and correct the AI along the way.
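Part of this review can be done mechanically. The following sketch, with a hypothetical function name and assumed category names, checks that the AI returned the same number of values you sent and that every value is one of the allowed categories. It does not replace reading the output yourself.

```python
def check_ai_output(original, cleaned, allowed):
    """Return a list of problems found in the AI's output (empty if none)."""
    problems = []
    # A different length suggests the AI dropped or invented values.
    if len(cleaned) != len(original):
        problems.append(f"expected {len(original)} values, got {len(cleaned)}")
    allowed_set = set(allowed)
    for position, value in enumerate(cleaned, start=1):
        if value not in allowed_set:
            problems.append(
                f"value {position} is not an allowed category: {value!r}"
            )
    return problems


# Hypothetical data: what was sent to the AI and what came back.
sent = ["Hot", "On Fire!", "Cold"]
returned = ["Hot", "Hot", "Cold"]
for problem in check_ai_output(sent, returned, ["Hot", "Warm", "Cold"]):
    print(problem)
```

An empty result only means the format is right; a value can be an allowed category and still be the wrong one for that lead.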

Step 4

Finally, copy the text row the AI returned into the problematic row in your Excel XLSX file. Then save or export the sheet as a CSV file, which you can import into Salesforce.
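If you would rather not paste the values back by hand, the step can be sketched in Python as well. This example assumes the values being cleaned sit in a single field of a CSV export (the `Status` column here is hypothetical) and that the AI returned exactly one value per record.

```python
import csv
import io


def replace_column(csv_text, column, cleaned):
    """Return new CSV text with one column replaced by the cleaned values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if len(rows) != len(cleaned):
        raise ValueError(f"expected {len(rows)} values, got {len(cleaned)}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, value in zip(rows, cleaned):
        row[column] = value  # overwrite only the cleaned field
        writer.writerow(row)
    return out.getvalue()


# Hypothetical export and AI output.
export = "Name,Status\nAcme,Hot\nGlobex,On Fire!\n"
print(replace_column(export, "Status", ["Hot", "Hot"]))
```

The length check is deliberate: if the AI returned too few or too many values, it is safer to stop than to shift values onto the wrong records.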

Copilot vs. Gemini vs. ChatGPT

As previously mentioned, we used OpenAI’s ChatGPT 3.5, Google Gemini (formerly Bard), and Microsoft Copilot. When you use multiple platforms for the same tasks, you may assess each platform’s performance and compare capabilities. Remember, these outcomes are only as good as your prompts and the degree to which you can train algorithms to provide useful results. This dynamic environment is always changing.

While Gemini hallucinated slightly, we discovered that Copilot and ChatGPT produced useful results for this basic task. Furthermore, ChatGPT generated results without providing an explanation, whereas Copilot provided a comprehensive explanation of how it arrived at its conclusions. It’s worth noting that ChatGPT was also faster than Copilot, returning results in a matter of seconds.

As they were the most precise and comprehensive, we decided to use the results from Copilot in this particular case. Free versions can have restrictions, but these basic tools can still accomplish a lot. For example, while Copilot won’t produce 50,000 lines of random data, it will provide you with the Python code to do it.
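To give an idea of what such code might look like, here is a minimal sketch of a random test-data generator. It is not the code Copilot produced: the column names, cities, and statuses are all made up for illustration.

```python
import csv
import io
import random

# Hypothetical sample values; replace them with ones that suit your org.
STATUSES = ["Hot", "Warm", "Cold"]
CITIES = ["Zagreb", "London", "Chicago", "Sydney"]


def random_leads_csv(n, seed=0):
    """Return CSV text with a header row and n rows of random lead data."""
    rng = random.Random(seed)  # a fixed seed makes the output repeatable
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Name", "City", "Status"])
    for i in range(1, n + 1):
        writer.writerow([f"Lead {i}", rng.choice(CITIES), rng.choice(STATUSES)])
    return out.getvalue()


text = random_leads_csv(50_000)
print(len(text.splitlines()))  # header plus 50,000 data rows
```

Writing `text` to a file gives you a test dataset large enough to rehearse an import without touching real client data.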

Importing Clean Data in Salesforce

Once you have your clean lead data prepared and available in CSV format (it may be data about your accounts, leads, opportunities, or something else), you can launch the Data Import Wizard in your Salesforce app.

Launch the Data Import Wizard

  1. In the upper right-hand corner, click the gear icon and choose Setup.
  2. Initiate the wizard by typing “Data Import Wizard” into the Quick Find box in Setup and choosing Data Import Wizard.
  3. Read the welcome page and click Launch Wizard.
  4. The object-specific homepage’s Tools list also offers the option to open the Data Import Wizard. Data Import Wizard can also be accessed by non-administrators via their personal settings.

Choose the Data You Want to Import

  1. Click Standard Objects to import articles, contacts, leads, solutions, person accounts, and accounts. Click Custom Objects to import custom objects.
  2. Indicate if you wish to change records that already exist in Salesforce, add new records, or add and update records at the same time. Selecting Add new and Update existing records activates workflows that add objects during import; selecting Update existing records does not.
  3. As needed, specify matching and additional criteria. To find out more details about each choice, move your mouse over the question marks. Matching by Name for updates and upserts of custom objects is case-sensitive.
  4. Indicate if workflow rules and processes should be triggered when the imported records satisfy the requirements.
  5. Drag the CSV file to the designated location to identify the file containing your data.
  6. Click Next.

Map Your Data Fields to Salesforce Data Fields

  1. To view a list of standard Salesforce data fields, navigate to the fields section of the object’s management settings.
  2. Find the fields that are not mapped by going through the list of mapped data fields.
  3. Under each field that isn’t mapped, click Map.
  4. Select up to ten Salesforce fields to map to from the Map Your Field dialogue box by searching for them, and then clicking the Map button. 
  5. Click Change to the left of the relevant field to alter mappings that Salesforce carried out automatically. After selecting the Salesforce fields you wish to map and deleting the ones you don’t, click Map.
  6. Click Next.
  7. Evaluate the details of your import on the Review page. If you still wish to import unmapped fields, click Previous to go back to the previous page and specify your mappings.
  8. Click Start Import.

Typically, comma-separated CSV files are imported.

Fields that need to be mapped are indicated clearly in the interface.

In this example, you want to transfer the original CSV header “Address Line 1” to the Salesforce CRM-recognized “Street”.
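The wizard handles this mapping in its interface, but you can also rename the headers in the CSV file before uploading it, so that fewer fields need to be mapped by hand. The sketch below assumes a simple comma-separated file; the function name and sample record are hypothetical, and only the “Address Line 1” to “Street” mapping comes from the example above.

```python
import csv
import io


def rename_headers(csv_text, mapping):
    """Return CSV text with header names replaced according to mapping."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if rows:
        # Only the first row (the header) is changed; data rows stay as they are.
        rows[0] = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical file content with the header from the example.
original = "Name,Address Line 1\nAcme,1 Main St\n"
print(rename_headers(original, {"Address Line 1": "Street"}))
```

Headers that are not listed in the mapping are left unchanged, so the same helper can be reused across different exports.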

Keeping GenAI at Bay

Implementing GenAI for data cleaning offers several benefits, such as:

  • Increasing efficiency by automating the cleaning process. 
  • Improving accuracy by learning complex patterns in the data.
  • Providing scalability for large datasets.
  • Reducing bias by avoiding manual intervention in the cleaning process.

With GenAI, the data cleaning process takes only a few steps: you give the model your data and a clear prompt, check the cleaned output, and then import the cleaned data back into your system. This process can be automated, making it easy to maintain clean data over time.

However, it’s important to consider factors such as data quality, algorithm selection, and data privacy before implementing GenAI for data cleaning.

Summary

While GenAI can be a powerful tool for data cleaning, it’s not a silver bullet. It should be used as part of a broader data management strategy that includes manual data cleaning and quality control processes. Most importantly, always remember to review and verify the AI’s output before making changes to your data.

The Author

Robert Gelo

A journalist and editor with a technical background, Robert has gained experience in marketing, advertising, public relations, and social media. He was introduced to Salesforce as the CMO of a consultancy.
