Testing Agentforce: Strategies for Impactful QA
By Nancy Roller
Quality assurance (QA) is a critical part of software development, and Agentforce QA is no exception. Unlike traditional scripted testing, testing AI-powered agents introduces distinct challenges: natural language varies from user to user, which makes it difficult to test consistently across scenarios and demands more flexible testing models. AI testing must account for a wide range of inputs and behaviors while still maintaining structured validation.
This makes a well-designed QA process crucial. Ensuring that AI-powered agents perform effectively and reliably can be the difference between a seamless user experience and one riddled with inconsistencies. Inadequate testing can lead to frustration, inefficiencies, and, potentially, missed opportunities for automation. A well-designed QA process, on the other hand, lays the foundation for success, helping architects refine their agents for optimal performance.
Foundational Components: Get the Ball Rolling Before QA
Before jumping into QA, keep in mind that Salesforce offers several excellent resources on their help page to guide your AI journey. Among many other fantastic articles, Considerations for Agents and Troubleshooting Agents offer foundational information and insights into preventing common issues before they become major roadblocks, including best practices for writing input instructions and action instructions.
In addition, the Agentforce Testing Center provides a structured, automated method to validate agent behavior before involving human QA testers. Taking the time to familiarize yourself with these and other helpful resources and being mindful of best practices when building your agent’s initial configurations will streamline the testing process and improve overall effectiveness.
Assemble and Empower an Amazing QA Team
A strong QA team is more than just a group of testers; it’s a team effort that requires planning, communication, collaboration, and creativity. To maximize the impact of your QA team, consider these seven key elements, with a few comments specific to some of my team’s recent projects.
1. Plan Ahead
Establish a QA schedule that allows enough time for thorough testing without unnecessary bottlenecks. A well-defined schedule with testing cycles ensures that testing is not rushed and allows time for iterative improvements.
Agents must be taken offline to update instructions or actions, which halts testing. Be aware that adjustments to an agent require in-flight tests to be restarted or rerun.
2. Define Success
Clearly define what a successful test looks like and ensure your QA team is aligned. If there is no agreed-upon definition of done, testers may approach testing aimlessly, leading to inconsistent results and incomplete coverage.
Our team initially used a ‘Pass / Fail’ status to categorize success but ended up adding ‘Partial Pass’ to account for longer testing scenarios where the majority of the acceptance criteria were met.
3. Encourage Communication
Ongoing interaction via a preferred channel helps keep your QA team in tune with one another and can help get them unstuck faster. A lack of communication can result in redundant efforts, missed insights, and can slow down progress.
Slack was a great resource used by our testers for communication and collaboration. Keeping an eye on the channel also helped the build team preview a list of potential refinements they could expect to see at the end of a testing cycle.
4. Build a Diverse Team
A mix of backgrounds, Salesforce experience, and project roles can help ensure a variety of user perspectives. A diverse team provides broader insights into user behaviors and potential issues, leading to more comprehensive testing.
Four individuals similar in background and experience may perform a test nearly identically, but four individuals with less in common may provide more variety in the way the test is run and the types of questions they ask, which can translate to a more thorough test of the agent’s capabilities.
5. Draft Tests Strategically
Provide clear guidelines for the testing scenario with sample utterances, but allow room for testers to explore and be creative. Rigid instructions can limit the ability to uncover unexpected behaviors, whereas flexibility encourages real-world scenario testing.
Once an agent is running ‘in the wild’, you will have very little control over what your end-users enter as utterances, making fully scripted testing a less-than-ideal fit.
Encourage testers to be creative and to try different variations of utterances, including using misspellings and missing punctuation, to see if they remain successful.
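To illustrate, a few mechanical variants of a base utterance can even be generated up front to seed a tester's exploration. The helper below is purely illustrative (the function name and specific perturbations are my own, not a tool our team used); it covers the kinds of typos, abbreviations, and dropped punctuation described above.

```python
def utterance_variants(base: str) -> list[str]:
    """Produce simple perturbations of a base utterance for exploratory testing."""
    variants = [base]
    variants.append(base.lower())                            # casing variation
    variants.append(base.replace(",", "").replace(".", ""))  # strip punctuation
    variants.append(base.replace("Quantity", "qty"))         # abbreviation
    variants.append(base.replace("Update", "Updaet"))        # deliberate misspelling
    return variants

for v in utterance_variants("Update line 2, Quantity = 6."):
    print(v)
```

Testers would still improvise beyond a list like this; the point is to lower the barrier to trying imperfect, real-world input early.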
6. Track Test Interactions
Encourage QA testers to document full prompt paths and responses, using screenshots to highlight agent behaviors. This will provide a clearer understanding of results, which can lead to faster troubleshooting and refinement.
In Agentforce testing, a test marked as ‘Fail’ without reproducible steps or a screen capture will not provide enough actionable information for the build team.
In the first version of the QA template, we included seven fields:
- Step #
- Directions
- Sample Inputs
- Exact Input
- Expected Output
- Exact Output
- Status (Pass, Partial Pass, Fail)
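As a rough sketch, one row of this template can be modeled as a simple record. The field names below mirror the seven fields above, and the statuses match the three our team used; the class itself is illustrative, not part of Agentforce.

```python
from dataclasses import dataclass
from enum import Enum

class TestStatus(Enum):
    PASS = "Pass"
    PARTIAL_PASS = "Partial Pass"
    FAIL = "Fail"

@dataclass
class QATestStep:
    """One row of the QA template: a single step of a test scenario."""
    step_number: int          # Step #
    directions: str           # what the tester should attempt
    sample_inputs: list       # suggested utterances to start from
    exact_input: str          # the utterance actually entered
    expected_output: str      # acceptance criteria for this step
    exact_output: str         # what the agent actually returned
    status: TestStatus        # Pass / Partial Pass / Fail

# Example: a completed row from a testing cycle
step = QATestStep(
    step_number=1,
    directions="Ask the agent to update a product line.",
    sample_inputs=["Update line 2", "Change the quantity on line 2"],
    exact_input="Update line 2, Qty = 12",
    expected_output="Line 2 shows Quantity 12 in a numbered list.",
    exact_output="Updated line 2: Quantity = 12",
    status=TestStatus.PASS,
)
print(step.status.value)  # prints "Pass"
```

Keeping both the sample inputs and the exact input in each row preserves the gap between what was suggested and what the tester actually tried, which is often where the interesting findings live.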
Testers were given examples of how to properly fill out the template, and we dedicated a meeting specifically to align on this particular process.
7. Analyze Results Effectively
Compile completed test results and review the data to see if any patterns and trends exist that should be addressed. Without sufficient analysis, valuable insights can be overlooked, and recurring issues may persist.
After completing each testing cycle, a summary was compiled from the completed tests of all QA users. This allowed the build team to more easily identify inconsistent behavior for each test and determine refinements and next steps. It can also help identify opportunities to improve tests, guidance, and communication.
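One lightweight way to build such a summary is to tally statuses per test across all testers, so that tests with mixed results stand out as inconsistent behavior. The rows below are illustrative stand-ins for exported template entries, not our actual data.

```python
from collections import Counter, defaultdict

# Illustrative export of completed template rows from four testers
results = [
    {"test": "Update product lines", "tester": "A", "status": "Pass"},
    {"test": "Update product lines", "tester": "B", "status": "Fail"},
    {"test": "Update product lines", "tester": "C", "status": "Pass"},
    {"test": "Update product lines", "tester": "D", "status": "Partial Pass"},
    {"test": "Show opportunity products", "tester": "A", "status": "Pass"},
    {"test": "Show opportunity products", "tester": "B", "status": "Pass"},
]

# Tally statuses per test; mixed statuses flag inconsistent agent behavior
summary = defaultdict(Counter)
for row in results:
    summary[row["test"]][row["status"]] += 1

for test, counts in summary.items():
    label = "consistent" if len(counts) == 1 else "INCONSISTENT"
    print(f"{test}: {dict(counts)} -> {label}")
```

A test where four testers split across Pass, Partial Pass, and Fail is exactly the kind of pattern worth escalating to the build team.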
Key Takeaways from the QA Process
Throughout the testing process, teams can uncover valuable insights that refine their approach.
1. Communication Is Key
End-of-day check-in meetings and a dedicated Slack channel were essential to keeping testers aligned and actively collaborating. Without consistent communication, important findings might be missed, and testers could end up working in isolation.
Brief, daily standups provide a quick way to touch base and address any immediate issues or questions.
Quick, ad hoc syncs were used to align on changes, such as updates to the agent or modifications to the testing template, and ensure that everyone was on the same page and that changes were understood and implemented without issue.
Utilizing Slack to keep an open line of communication was helpful to share new or different agent behaviors, clear up any questions about tests or the agent, and compare notes.
2. Encourage Exploratory Testing
Unlike traditional tests that follow a strict path, agent testing requires a more fluid approach. Instead of dictating rigid steps, outline the scenario and define the objective, provide sample utterances, and then allow testers to ultimately determine their own path.
This method not only evaluates how well your agent interprets varied utterances but also ensures that testing reflects more of the diverse perspectives of your user base.
A sample of a single, simple test is below. QA testers would run through many variations, which could include:
- Introducing different ways of requesting information or changes: This includes using sentences instead of lists or shuffling/rearranging the request to test the agent’s flexibility and understanding.
- Using synonyms, misspellings, abbreviations, bad grammar, and sloppy or nonexistent punctuation: This tests the agent’s ability to handle real-world user inputs, which are often imperfect and varied.
- Skipping steps to see if the agent was able to course-correct: This tests the agent’s ability to handle unexpected or incomplete input sequences and recover gracefully.
Let’s look at how we ran this test using our methodology.
Firstly, ask to update a product and specify fields and values to update. You can update multiple product lines at a time and multiple fields on each line or on separate lines.
It’s important to note that when updating the Quantity, you must also specify the Sales Price (If you do not specify a Sales Price, the Sales Price will be adjusted to equate to the Total Price / Quantity). The agent should warn you if you miss this step.
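The fallback arithmetic can be checked with a quick sketch (a hypothetical helper, not actual Agentforce logic): if a line's Total Price is 3,000 and the Quantity is updated to 6 with no Sales Price given, the Sales Price is back-calculated as 3000 / 6 = 500.

```python
def adjusted_sales_price(total_price: float, new_quantity: int,
                         new_sales_price: float = None) -> float:
    """If no Sales Price accompanies a Quantity update, the price is
    back-calculated as Total Price / Quantity (per the behavior above)."""
    if new_sales_price is not None:
        return new_sales_price
    return total_price / new_quantity

print(adjusted_sales_price(3000, 6))        # 500.0 - back-calculated
print(adjusted_sales_price(3000, 6, 450))   # 450.0 - explicit price wins
```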
If the agent did not successfully display a list of current Opportunity Products after your last add, you may have to request to show the products again to ensure proper context from a refreshed list. When updating a range of lines, state ‘1 through 8’ rather than ‘1-8’; the agent tends to treat a dash (-) as a minus sign.
Here is a sample input:
- Update lines 8 through 10
- Quantity = 6
- Sales Price = 500
- Add to the Scope of Work = “Product application at perimeter of apartments”
- Work Order Frequency = Bi-Monthly
- Start Date = 03/01/2025
Here is the exact input:
- Update lines 1 through 3
- Qty = 12
- Site = new value
- Sales Price = 250
- Frequency = Monthly
- Start Date = 02/01/2025
The expected output is the correct, corresponding records that should be updated with the accurate field values. The agent should display the updated records in the conversation window in a plain-text, numbered list. Occasionally, the agent will display the entire list of products after an update; this is acceptable behavior.
In this run, the exact output earned a ‘Partial Pass’: the data was saved correctly, but the confirmation display was inaccurate.
3. Be Strategic About Modifications
If several elements of an agent’s instructions or actions are changed at one time, diagnosing any resulting issues becomes more challenging. Take an incremental approach when introducing changes to keep chaos to a minimum.
My team’s change control process looked something like this:
- Document the current state: Before making any changes, record the existing instructions for the topic and agent actions. Keep these records in a “topic diary” for reference.
- Modify and test in preview: Make the necessary changes to the instructions or agent action and thoroughly test them in the Conversation Preview.
- Determine the complexity of the change:
- If the change was simple and worked as expected, you can go back to step 1 and introduce further changes.
- If the change is complex, use the Testing Center to validate it extensively before making any additional changes. This helps to ensure that complex changes are thoroughly vetted and minimizes the risk of introducing further issues.
- If the change didn’t work, you have a copy of the original instructions that you could copy back to the agent. Be sure to retest!
- Utilize the Testing Center for incremental validation: Before handing changes over to testers, use the Testing Center to conduct quick smoke tests. This can help identify major issues early, reducing wasted time for human testers.
- In an ideal situation, this would work perfectly every time. In practice, one piece of functionality was difficult to validate in the Testing Center due to its complexity, and it tended to pass Conversation Preview tests even when we supplied context, for example, running as a particular user or contact. Do your best to keep that noise away from your QA testers, but manage expectations with exceptional communication.
- Guard against testing fatigue: Seeing less testing output? Only seeing “Pass” or “Fail” with no notes? Keeping testers engaged and motivated is essential for a higher quality testing process. Something as simple as having them shift to pair testing or reviewing the results of other testers can change their perspective and help them regain velocity.
- Pair testing can transform a solitary testing experience into a collaborative, more engaging one, which can help keep testers motivated. Gamifying the process can add a dose of fun and creativity to testing tasks. Consider challenging testers to discover the most efficient conversational flow or encouraging them to share the most unexpectedly effective utterances.
- Document everything: Because Agentforce currently lacks version control (although improvements are expected soon), maintaining a structured build diary is essential. Aligning instruction versions – for topics and agent actions – with successful tests helps maintain clarity on which features were working with which versions.
- Adapt and evolve: A QA template is not set in stone – adapting the process can help to unlock stronger results and more effective testing strategies. Being open to iterative improvements to the QA process ensures you are making the most out of a tester’s time and are more likely to receive the feedback you need to make efficient progress.
- We enhanced the QA template with additional fields and a way to track path variations for different utterances, which was a change proposed by one of our testers. This improvement was especially useful as we moved into the later phases of testing, where chaining utterances and improving efficiency were critical.
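Assuming the diary lives in a spreadsheet or similar structured file, one entry per change might capture the fields below; all names here are illustrative. Pairing each instruction snapshot with the tests that passed against it stands in for the version control Agentforce currently lacks.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TopicDiaryEntry:
    """One change-control record for a topic or agent action (illustrative)."""
    entry_date: date
    topic: str
    previous_instructions: str    # snapshot before the change, for rollback
    new_instructions: str         # what was changed
    preview_tested: bool          # validated in Conversation Preview?
    testing_center_passed: bool   # smoke-tested in the Testing Center?
    passing_tests: list = field(default_factory=list)

entry = TopicDiaryEntry(
    entry_date=date(2025, 3, 1),
    topic="Update Opportunity Products",
    previous_instructions="When updating Quantity, require Sales Price.",
    new_instructions="When updating Quantity, warn if Sales Price is missing.",
    preview_tested=True,
    testing_center_passed=True,
    passing_tests=["Update lines 1 through 3"],
)
print(entry.topic)
```

Because the previous instructions are stored verbatim, rolling back a failed change is a copy-paste rather than an act of memory.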
Final Thoughts
Successful Agentforce QA requires a balance of preparation, structured processes, and adaptability. AI systems are continuously evolving, and with them, the challenges of testing will grow more complex. We found that the key to effective QA testing is not simply identifying issues but encouraging a mindset of collaboration and continuous learning.
A rigid approach may provide short-term validation, but long-term success depends on the ability to approach testing with a combination of flexibility, testing governance, and communication. By staying agile and committed to improvement, teams can ensure that their AI-powered agents perform reliably and effectively in real-world scenarios.
The Author
Nancy Roller
Nancy is a Sr. Consultant at Cirrius Solutions with a background in Field Service, business analysis, solution design, and process improvement.