Have you ever created a new Developer org, or spun up a new Sandbox, only to realize that you’re going to need a lot of data to test out some functionality? I’ve had this problem more times than I can count, going through the painful process of creating records one by one. This is not only frustrating, but a waste of time in the grand scheme of things. (Also, why is it so hard to think of random names when you are forced to?!)
Luckily, I decided to use my old friend Google to find something that could generate data en masse that I could later import. I landed on the amazing tool, Mockaroo. Before long, I realized that it didn’t stop there; there are at least two more great options to consider! Let’s take a closer look.
1. Mockaroo
Mockaroo isn’t a tool developed for the Salesforce ecosystem, but a tool designed for anyone to generate fake but realistic data in bulk. When you go to the site, you’ll see that the interface is super simple: you can choose your field names, choose the type of data you want to include, and apply filters or custom formulas.
In fact, Mockaroo has more than 143 types of data. These range from standard fields such as first name, company, and email to some pretty fun data types such as car make, movie title, or even Bitcoin address. So if you have a custom object you are trying to generate some random data for, this is perfect.
Generating Account and Contact Data
Generating account, contact, and lead data is pretty straightforward. Some default fields will already be filled in, but you can simply search for as many fields as you like to populate. I always think that demos look a lot more impressive if records are fully populated with data, so fill your boots!
Once you’ve selected all the fields you need, simply click Download Data and the CSV will be downloaded to your computer.
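To make the shape of that file concrete, here is a minimal Python sketch that builds a similar contact CSV locally. It is not Mockaroo’s API or output. The name lists, the `make_contacts` and `to_csv` helpers, and the `example.com` addresses are all assumptions made up for illustration; only the column headers (`FirstName`, `LastName`, `Email`) are standard Contact field names.

```python
import csv
import io
import random

# Tiny, made-up name pools; a real generator would use far larger lists.
FIRST_NAMES = ["Ava", "Liam", "Noah", "Mia", "Zoe", "Omar"]
LAST_NAMES = ["Patel", "Garcia", "Chen", "Okafor", "Novak", "Silva"]


def make_contacts(count, seed=0):
    """Return `count` fake contact rows keyed by Salesforce field names."""
    rng = random.Random(seed)  # fixed seed so the same rows come back each run
    rows = []
    for i in range(count):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        rows.append({
            "FirstName": first,
            "LastName": last,
            # example.com is reserved for documentation, so no real inbox is used
            "Email": f"{first}.{last}.{i}@example.com".lower(),
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


contacts = make_contacts(5)
csv_text = to_csv(contacts)
print(csv_text)
```

Because the headers match the Salesforce field names, a data loading tool should be able to map the columns without manual help.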
Generating Opportunity Data
Generating opportunity data is also very straightforward, but you get to have a bit of fun with the custom field types. As this isn’t a Salesforce tool, you will have to enter the field names and choose the corresponding field types for all of the required fields on an opportunity (name, close date, stage, and amount), as well as some other fields to give the record a bit of color. Keep the following points in mind:
- If you’re generating data for multiple objects, e.g. accounts, contacts, and opportunities, you’ll need to import the parent first (accounts), and then provide the parent IDs of the accounts to the other records.
- Remember that this isn’t a tool created by Salesforce, so you’ll sometimes just have to choose field types that have a close resemblance, like using a grocery product for the opportunity name.
- If you want your data loading tool to automatically map the columns in the CSV file to the correct fields in Salesforce, ensure that the field names you provide in Mockaroo are the same as in Salesforce.
- The free version of Mockaroo limits you to 1,000 generated rows, which should be sufficient for most demos or Sandboxes.
- You can also access Mockaroo through its developer API.
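The parent-first import order from the first tip can be sketched in Python as follows. This assumes you have already imported the accounts and exported their record IDs. The IDs, the account names, and the `make_opportunities` helper are hypothetical, and the stage values are assumptions that you should replace with the stages defined in your own org.

```python
import random
from datetime import date, timedelta

# Hypothetical result of the account import; these record IDs are made up.
IMPORTED_ACCOUNTS = [
    {"Id": "001000000000001AAA", "Name": "Acme"},
    {"Id": "001000000000002AAA", "Name": "Globex"},
]

# Assumed stage values; use the ones configured in your org.
STAGES = ["Prospecting", "Qualification", "Closed Won"]


def make_opportunities(accounts, per_account, start=date(2024, 1, 1), seed=0):
    """Build child opportunity rows that point at already-imported accounts."""
    rng = random.Random(seed)
    rows = []
    for account in accounts:
        for n in range(1, per_account + 1):
            close_date = start + timedelta(days=rng.randint(0, 180))
            rows.append({
                "Name": f"{account['Name']} deal {n}",
                "CloseDate": close_date.isoformat(),  # yyyy-mm-dd
                "StageName": rng.choice(STAGES),
                "Amount": rng.randrange(1000, 50001, 500),
                "AccountId": account["Id"],  # the parent ID from the first import
            })
    return rows


opportunities = make_opportunities(IMPORTED_ACCOUNTS, per_account=2)
for row in opportunities:
    print(row)
```

Every row carries the four required opportunity fields plus `AccountId`, so the child records attach to the right parents when you import them.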
2. Sandbox Seeding Automation
If you want the best test data for your dev environments, Salesforce sandbox seeding automation is probably the best option. Plus, it’s super fast.
What is Sandbox Seeding Automation?
Unlike the previous way of generating dummy data, Sandbox seeding automation actually replicates data from production and moves it into your Sandbox. Prodly Sandbox Seeding, for example, gives you the ability to instantly reproduce real production data into up to five Sandboxes or Scratch Orgs in one go.
The Benefits of Sandbox Seeding Automation
Sandbox seeding automation offers a number of distinct benefits:
- It gives you high-quality test data that’s specific to your org. The first and most important advantage of Sandbox seeding automation is that it doesn’t create random dummy data that bears no resemblance to the data in your production environment. Instead, it reproduces data from production in your Sandbox. This high-caliber test data contributes directly to the quality of your code. The best Sandbox seeding tools also offer features like data masking and data redaction, which let you maintain data security and compliance at all stages of development and testing.
- You can reuse your high-quality test data. Sandbox seeding solutions that provide templates give you the ability to define a standard set of test data you can use over and over again. As a result, you can cover the same test scenarios every time. On top of that, because you get used to working with the same test data, it becomes easier to work with.
- It lets you create production-grade environments. The high-quality test data you get from a data seeding solution like Prodly elevates your lower-level org from a basic playground to a valuable development environment that’s essentially a mini-copy of production. Working in a production-grade environment markedly improves the quality of your changes.
- It requires no coding. The best Sandbox seeding automation for Salesforce is developed with ease of use in mind. This means that both admins and developers can navigate the UI, or use a quick DX command, and populate a Sandbox with just a few clicks. Now everyone on the team can create their own high-quality environments.
- It’s fast. Data seeding automation is much faster than manually generating test data, and faster than most other methods of creating dummy data. All you have to do is select the data you want to replicate, redact it if necessary, and designate your destination environment(s). This frees up hours you’d otherwise spend on non-development tasks and gives you much more time to focus on higher-value work.
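To illustrate what the data masking and redaction mentioned above involve, here is a minimal Python sketch. It is not how Prodly or any other seeding tool works internally. The `mask_email` and `mask_record` helpers, the salt, and the sample record are assumptions for illustration, and a salted hash like this is a teaching device rather than a compliance-grade method.

```python
import hashlib


def mask_email(email, salt="demo-salt"):
    """Swap an email for a stable pseudonym at a reserved, non-routable domain."""
    normalized = email.strip().lower()
    digest = hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()
    return f"user_{digest[:12]}@example.invalid"


def mask_record(record):
    """Return a copy of the record with Email masked and Phone redacted."""
    masked = dict(record)  # copy, so the source record is left untouched
    if masked.get("Email"):
        masked["Email"] = mask_email(masked["Email"])
    if "Phone" in masked:
        masked["Phone"] = ""  # redaction: drop the value entirely
    return masked


# Made-up production-style record
source = {"LastName": "Rivera", "Email": "Dana.Rivera@corp.test", "Phone": "555-0100"}
safe = mask_record(source)
print(safe)
```

The same input always produces the same pseudonym, so masked records stay consistent across repeated seeding runs while the real address never reaches the Sandbox.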
As with any robust form of Salesforce automation, using a Sandbox seeding solution in a DevOps approach can greatly accelerate your release cycles. In addition, because it eliminates busy work, it unburdens everyone who’s involved with setting up environments in Salesforce.
3. Generative AI
As you probably know by now, large language models (LLMs) like Bard and ChatGPT can be super helpful for Salesforce admins, release managers, and developers. One task they can really streamline is creating dummy data for Salesforce.
There are two ways to use generative AI to generate sample data, depending on your preferred work method. The first is to ask an LLM like ChatGPT or Bard to create test data you can import into your org.
The second method involves asking an LLM to write an Apex script to generate dummy data.
Use Generative AI to Create Test Data to Import Into Your Org
If you’ve never created dummy data for Salesforce before, you can simply ask ChatGPT or Bard to do it for you.
Here’s how it works with ChatGPT: ask it for the sample records you need, and it returns them as a table.
All you have to do is copy the data in the table, paste it into a spreadsheet, and save it as a CSV file. Then you can use Data Loader to import it into your test org.
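If you’d rather skip the spreadsheet step, the conversion can be sketched in Python. This assumes the LLM returned a pipe-delimited (Markdown-style) table, which is common but not guaranteed. The `table_to_csv` helper and the sample rows are made up for illustration.

```python
import csv
import io


def table_to_csv(table_text):
    """Convert a pipe-delimited table into CSV text."""
    rows = []
    for line in table_text.strip().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the separator row that consists only of dashes and colons
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Made-up example of a table pasted from a chat response
pasted = """
| FirstName | LastName | Email |
|-----------|----------|-------|
| Ava | Patel | ava.patel@example.com |
| Liam | Chen | liam.chen@example.com |
"""

csv_text = table_to_csv(pasted)
print(csv_text)
```

Review the result before importing it, since nothing here checks that the values are valid for the fields you map them to.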
The same approach works with Bard.
Drawbacks of Using Generative AI to Create Dummy Data Declaratively
There are several drawbacks to using generative AI to create dummy data declaratively. First, if you use this method, you’re still stuck using Data Loader to get the dummy data into your test environment. That’s fine for relatively straightforward batches, but if you need more complex data, it can get very time-consuming.
Secondly, and more importantly, generative AI can’t yet ensure that the data you create is representative of the data in your production environment because it doesn’t know what kind of data that is. With low-quality test data, your entire development process is less streamlined. You have to deal with more bugs and delays, which results in frustration on your team – as well as your end users.
Use Generative AI to Write a Script for Generating Sample Data
The most efficient way to use ChatGPT or Bard to create dummy data is to ask the LLM to write an Apex script. Because both are proficient in most programming languages as well as natural language, they’re entirely capable of doing this.
Here’s an example of using ChatGPT 4.0 to write an Apex script for generating sample data:
Prompt: Please write the Apex script to generate a batch of sample data in Salesforce.
[Screenshot: ChatGPT’s response containing the generated Apex script]
Now all you have to do is review the code for accuracy. If it’s good to go, you can simply copy it and paste it in the Code Editor in Salesforce. ChatGPT 4.0 is pretty helpful here too, because it tells you exactly how to do that.
Drawbacks of Using LLMs to Generate Dummy Data in Salesforce
Unfortunately, there are several drawbacks to using LLMs to generate sample data.
First of all, you can’t use them directly in the Salesforce platform. ChatGPT can write a script for you in seconds – but you still have to review it for accuracy first. If it doesn’t need debugging, you have to copy, paste, and run the script in Salesforce. In addition, it’s challenging to get ChatGPT to write dummy data that’s actually useful.
ChatGPT 3.5, the free model, is frequently inaccurate. ChatGPT 4, the paid model, is generally both fast and accurate. However, if you’re looking to generate complex relational test data, you have to create a very detailed prompt that describes exactly what kind of test data you need.
In fact, it’s best if you follow the prompt with an example so ChatGPT understands what you’re looking for. Even then, you’ll likely have to fine-tune your prompt a number of times before you get it right. All this can be extremely time-consuming. Plus, if you have enough knowledge of Apex to provide an example, then you could probably write a script yourself in less time than it would take you to write and fine-tune a prompt for an LLM.
Of course, there are many pre-written prompts for ChatGPT floating around the internet. Some are very helpful; others aren’t. Whether a prompt is free or you pay for a subscription to gain access to it, it’s usually a process of trial and error to tailor it to your exact needs.
Finally, even after you’ve gotten your prompt right, it’s not a given that you can reuse it every time you need test data. Depending on what’s going on in your production environment and what you’re working on, you might need data that looks completely different; that means you have to start the process over again.
Ultimately, which method of sample data generation you use is your choice. Before making your selection, though, it’s important to consider the benefits of each in terms of data quality, data security, ease of use, and velocity. By carefully weighing the pros and cons of each method, you can make an informed decision that fits your needs and helps move your projects forward.