
Agentforce Testing Best Practices: How to Ensure Reliable Deployments in Salesforce

By Timo Kovala

Developing an Agentforce agent is a feat of architecture: you must master programmatic (Apex) and declarative (Flow) development, data modelling, Salesforce Platform and Data Cloud configuration, and prompt engineering, not to mention navigating the intricacies of LLMs. A lot can go wrong, making systematic testing all the more important. 

Too often, teams end up building their own workarounds just to get a reliable deployment out the door: last-minute changes to instructions labeled as “hot fixes,” or agents approved for deployment after merely “simulated” end-to-end testing. We should be asking fundamental questions: what are we testing, why does it matter, and how do we know it works in production? In this article, we explore what “good” looks like in Agentforce testing and deployment.

What Should You Test?

The easy answer is: everything. In reality, however, we must cope with uncertainty, limited data, and tight timelines while trying to optimize lead time to production without cutting corners. Agentforce developers need to adopt a pragmatic mindset: when you cannot test everything all the time, run focused tests on the most impactful areas.

Agentforce developers face unique challenges that aren’t solved simply by bolting on another CI/CD tool or following past best practices. The built-in Agentforce Testing Center is useful for initial batch testing, but it doesn’t solve deeper issues, such as unrepresentative test data, or the fact that many agent behaviors are hard to simulate outside of production.

Agentforce requires us to rethink our DevOps process. The general rule of thumb still applies: if a test can be automated, do it. However, several areas of Agentforce testing require manual user input. Some tests can be run accurately in a partial copy sandbox or even a scratch org, but others require production data and usage context. Additionally, full agentic end-to-end testing requires various methods and tools, some more familiar than others. In the table below, you will find some tests that are crucial for successful Agentforce deployments.

| Test Item | Method | Environment | Tools | Notes |
| --- | --- | --- | --- | --- |
| Agent settings and configuration | Manual and batch testing | Scratch Org, Dev, UAT/Staging | Agentforce Builder, Testing Center, manual UAT | Ensure the agent has access to the required records, fields, settings, and other metadata |
| Topic scope and classification | Manual and batch testing | Scratch Org, Dev, UAT/Staging | Agentforce Builder, Testing Center, manual UAT | Verify topic classification even if the agent outcome is correct |
| Apex class logic | Unit tests with assertions | Scratch Org, Dev | Apex Testing Framework, VS Code | Focus on edge cases and governor limits |
| Flow logic | Manual testing with multiple users and scenarios | Dev, UAT/Staging | Flow debugging tool, Salesforce UI, Developer Console | Simulate Flow usage as part of an agent action with realistic data |
| Prompt templates | Manual and programmatic testing | Dev, UAT/Staging | Prompt Builder, Debug logs, VS Code | Use a testing sheet for qualitative test feedback from testers |
| Variables and filters | Manual testing with multiple users and scenarios | Dev, UAT/Staging | Agentforce Builder, Testing Center, manual UAT | Ensure topic and action filters are applied correctly in various scenarios |
| Escalation to a human | Manual and batch testing | Dev, UAT/Staging | Agentforce Builder, Testing Center, manual UAT | Escalation should be consistent in both typical and edge cases |
| Custom guardrails | Manual and batch testing | Dev, UAT/Staging | Agentforce Builder, Testing Center, manual UAT | Validate custom guardrails written into topics, actions, and instructions |
| Agent UI behavior (employee-facing agents) | End-to-end testing | Dev, UAT/Staging | Provar, Selenium, LWC test frameworks (Jest) | Validate chatbots, Screen Flows, and LWCs used with agents |
| Web or mobile deployment (customer-facing agents) | End-to-end testing | UAT/Staging + website/mobile app test environment, Prod | Botium, TestMyBot, JMeter, testRigor | Focus on the UX and interaction with Salesforce data |
| Deployment validation | Pre-deployment checks and smoke tests | UAT/Staging, Prod | AutoRABIT, Copado, Flosum, Gearset, DevOps Center | Always validate post-deployment behavior |
| Data integrity | Test with anonymized or synthetic data | Dev | Data Loader, Workbench, SOQL | Ensure no data loss or corruption during deployment |
| Data security | End-to-end testing, continuous monitoring | UAT/Staging, Prod | Event Monitoring (Shield), Arovy | Ensure agents and integrated apps can only access data they are allowed to |
| Integrations | Mock and live endpoint testing, continuous monitoring | SIT, UAT/Staging | Integration platform (e.g. MuleSoft), Postman, Arovy | Validate third-party handoffs and error handling |
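To make a row like “Topic scope and classification” concrete, here is a minimal batch-test harness sketched in Python. The `classify_topic` stub is a hypothetical stand-in for a real agent call (in practice you would route cases through the Agentforce Testing Center or Testing API); the point is the shape of the test: each utterance is checked against the topic the agent should select, not just against the final answer.

```python
# Hypothetical stand-in for a deployed agent's topic classifier; a real
# project would call the agent via the Testing Center or Testing API.
def classify_topic(utterance: str) -> str:
    rules = {"refund": "Order Management", "password": "Account Support"}
    for keyword, topic in rules.items():
        if keyword in utterance.lower():
            return topic
    return "General Inquiry"

# Each case pairs an utterance with the topic the agent should select,
# so we verify classification even when the final answer looks right.
TEST_CASES = [
    ("I want a refund for my last order", "Order Management"),
    ("I forgot my password", "Account Support"),
    ("What are your opening hours?", "General Inquiry"),
]

def run_batch(cases):
    """Return the list of (utterance, expected, actual) failures."""
    failures = []
    for utterance, expected in cases:
        actual = classify_topic(utterance)
        if actual != expected:
            failures.append((utterance, expected, actual))
    return failures

if __name__ == "__main__":
    failures = run_batch(TEST_CASES)
    print(f"{len(TEST_CASES) - len(failures)}/{len(TEST_CASES)} passed")
```

A harness like this pays off because the failure list names the misclassified utterance, which is exactly the evidence you need when refining topic descriptions.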

Why Is Testing More Important Than Before?

Too often, we treat testing and deployment like a checkbox exercise – something to rush through once the “real work” is done. But if you’ve ever watched a seemingly solid release unravel in production, you know that this mindset doesn’t hold up. Testing has never been something to overlook, and with AI agents now in the picture, it’s more important than ever.

Before Agentforce, we could at least count on user feedback for early bug detection. With autonomous agents, we may not have that luxury. Seemingly foolproof instructions may cause hallucinations in production that, if left unchecked, can lead to customer dissatisfaction and loss of revenue. The truth is, Agentforce deployments are complex, and the stakes are high, especially when customer-facing agents are involved. The complexity arises from several factors, which we explore below.

Architectural Complexity

Agents are complex architectural components that house topics, instructions, action definitions and reference actions, and the agent’s own configurations. On the system level, agents consist of third-party AI models integrated with Salesforce Platform and Data Cloud through the Einstein Trust Layer. The trust layer is a tightly coupled set of features that enable secure and trusted AI grounding with first-party data.

Considering the above, Agentforce testing deserves special attention; copy-pasting previous Salesforce testing procedures simply doesn’t cut it. 

Non-Determinism

One of the main challenges for testing is that agents are non-deterministic by nature. The Atlas Reasoning Engine that gives agents the ability to “think” can follow complex instructions, but there will always be slight variations in outputs that you would not find in a solution driven by rule-based logic, such as a Flow or an Apex class. As a result, there is always a risk of unexpected agent behavior, which emphasises the role of continuous post-deployment monitoring.
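One practical consequence of non-determinism is that asserting exact equality on a single run is the wrong test. A common pattern is to run the same input many times and assert a pass-rate threshold instead. The sketch below models this in Python with a stubbed agent whose output varies randomly; the stub and the 85% threshold are illustrative assumptions, not Agentforce behavior.

```python
import random

# Stub for a non-deterministic agent: real LLM-backed agents vary between
# runs, modelled here with a small random failure probability.
def agent_decision(prompt: str, rng: random.Random) -> str:
    return "escalate" if rng.random() < 0.95 else "resolve"

def pass_rate(prompt: str, expected: str, runs: int = 100, seed: int = 42) -> float:
    """Run the same prompt many times and return the fraction of correct outputs."""
    rng = random.Random(seed)  # seeded so the test run is reproducible
    hits = sum(agent_decision(prompt, rng) == expected for _ in range(runs))
    return hits / runs

# Assert a pass-rate threshold across many runs instead of exact
# equality on a single run.
rate = pass_rate("Customer threatens legal action", "escalate")
assert rate >= 0.85, f"pass rate {rate:.0%} is below threshold"
print(f"pass rate: {rate:.0%}")
```

The threshold itself becomes a tunable quality bar: tighten it for high-risk topics, relax it where occasional retries are acceptable.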

Autonomous Operation

“Human in the loop” works differently with Agentforce than traditional process automation. In principle, AI agents handle most of the work on their own, but they should know when to check in with a person. If something’s unclear, risky, or outside the usual, the agent asks a human for input before moving forward. Humans help steer the big-picture decisions, while agents take care of the details. All this behavior must be covered extensively during testing, of course. This is why edge cases are so important when testing agent guardrails.
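A lightweight way to pin down that behavior is to enumerate the boundary cases and assert the expected hand-off for each. In the Python sketch below, `should_escalate` is a hypothetical escalation policy written as code purely for illustration; in a real agent these rules live in topic instructions and guardrails, and the test cases would be run against the agent itself.

```python
# Hypothetical escalation policy, written as code only for illustration;
# in Agentforce these rules live in topic instructions and guardrails.
def should_escalate(message: str, confidence: float) -> bool:
    risky_terms = ("legal", "complaint", "cancel my contract")
    if any(term in message.lower() for term in risky_terms):
        return True          # risky or out-of-scope: hand off to a human
    return confidence < 0.6  # unclear intent: ask a person before acting

# Edge cases deliberately sit near the boundaries of the policy.
EDGE_CASES = [
    ("I will take legal action", 0.95, True),   # risky despite high confidence
    ("Where is my parcel?", 0.92, False),       # routine, agent handles it
    ("asdf ??", 0.10, True),                    # garbled input, low confidence
]

for message, confidence, expected in EDGE_CASES:
    assert should_escalate(message, confidence) == expected, message
print("all escalation edge cases passed")
```

Keeping the edge cases in a table like this makes it cheap to extend the suite every time production surfaces a new boundary condition.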

Credit Consumption

One factor that impacts both the “why” and “how” of Agentforce testing is credit consumption. Whether your contract follows the older conversation-based or Flex Credits model, the bad news is that testing will cost you credits, whether done within a sandbox or production org. Testing with Agentforce Testing Center or Testing API also consumes credits.

This means that any AI and data usage should be carefully monitored during testing to avoid unintended usage spikes. However, credit usage shouldn’t stop you from proper end-to-end testing. The cost of failures caused by untested topics, instructions, or actions will likely far exceed the cost of credits consumed in a controlled test sample. You simply have to be mindful of data and action usage when creating test cases.
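Being mindful of usage can be as simple as budgeting a test run before it starts. The Python sketch below does exactly that; the per-action credit costs are made-up placeholders, not Salesforce pricing, and the action names are illustrative.

```python
# Illustrative credit accounting for a test run; the per-action costs
# below are made-up placeholders, not Salesforce pricing.
CREDITS_PER_ACTION = {"llm_call": 4, "data_cloud_query": 2, "flow_action": 1}

def estimate_credits(actions):
    return sum(CREDITS_PER_ACTION[a] for a in actions)

def plan_run(test_cases, budget):
    """Greedily schedule test cases, skipping any that would blow the budget."""
    planned, total = [], 0
    for name, actions in test_cases:
        cost = estimate_credits(actions)
        if total + cost > budget:
            print(f"skipping {name}: would exceed budget of {budget} credits")
            continue
        planned.append(name)
        total += cost
    return planned, total

cases = [
    ("refund flow", ["llm_call", "flow_action"]),            # 5 credits
    ("order lookup", ["llm_call", "data_cloud_query"]),      # 6 credits
    ("full regression", ["llm_call"] * 10),                  # 40 credits
]
planned, total = plan_run(cases, budget=15)
print(planned, total)
```

Even a crude estimate like this makes usage spikes visible before they happen, rather than on the next invoice.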

How Do You Know It Works in Production?

To be perfectly honest, deploying agents to production scares me every time. Maybe it’s the fact that the technology is so new, or that there are so many unknowns. It could be the non-determinism of agentic behavior: it’s impossible to say for certain how the agent will behave. Whatever the reason, staying alert is probably your best asset for a successful agent deployment.

A successful Agentforce deployment isn’t just copying metadata from one org to the next. Old benchmarks like passing test cases or functioning API integrations aren’t enough. With AI agents, success means they handle real tasks, show good judgment, and know when to ask for help. Just because an agent produces expected outcomes in a staging environment doesn’t mean it will do so in production 100% of the time. In this sense, an agent deployment can be considered a success only when it stands the test of time.

In the restaurant industry, they say that a messy kitchen leads to messy dishes. I find that this applies to Agentforce just as well; if your environments are a mess, you cannot expect reliable deployments. Agents are particularly susceptible to inconsistencies between environments. Mismatching sharing and visibility settings, outdated data, or unsynchronized Flow and Apex class versions will seriously undermine agent deployments. It is crucial that staging and production environments are as identical in both data and metadata as possible.

Once your environments are synchronized, there are several measures you can take to ensure successful deployment. As with deployments in general, these steps are spread across three stages: pre-deployment, go-live, and post-deployment. The exact tasks and order vary between orgs, but the essentials are always the same: set measurable KPIs, validate every component individually and together, and keep a keen eye on every user story once deployed. Perhaps the hardest part is being ready to postpone or cancel a release that isn’t ready. A lot of love goes into building an agent, and letting go isn’t easy.
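“Set measurable KPIs and keep a keen eye on every user story” can itself be automated as a post-deployment check. The Python sketch below computes two example KPIs from a day’s conversation log and flags the release if they miss their targets; the log format, KPI names, and thresholds are all assumptions for illustration, not Agentforce metrics.

```python
# Hypothetical post-deployment check: compute agent KPIs from a day's
# conversation log and flag the release if they miss the targets.
conversations = [
    {"resolved": True,  "escalated": False},
    {"resolved": True,  "escalated": False},
    {"resolved": False, "escalated": True},
    {"resolved": True,  "escalated": False},
]

def kpis(logs):
    total = len(logs)
    return {
        "resolution_rate": sum(c["resolved"] for c in logs) / total,
        "escalation_rate": sum(c["escalated"] for c in logs) / total,
    }

# Example targets: at least 70% resolved, at most 30% escalated.
TARGETS = {"resolution_rate": 0.7, "escalation_rate": 0.3}

metrics = kpis(conversations)
healthy = (metrics["resolution_rate"] >= TARGETS["resolution_rate"]
           and metrics["escalation_rate"] <= TARGETS["escalation_rate"])
print("release healthy" if healthy else "investigate before next release")
```

Wiring a check like this into the post-deployment stage turns “keep a keen eye on it” into a concrete, repeatable gate.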

Below, you will find checklist templates targeted at Agentforce deployments. Feel free to modify and apply them to your org’s needs.

Final Thoughts

Agentforce testing and deployment isn’t about ticking boxes. It’s about staying focused in a space where things break quietly and unpredictably. Common pitfalls like relying on unrepresentative test data, skipping edge cases, or assuming that environments are in sync can lead to agents that behave well in theory but fail in the wild. And when agents fail, they don’t just crash – they mislead, misclassify, or quietly erode trust.

But here’s the good news: with the right mindset, focused testing, and a bit of healthy skepticism, you can build agents that not only work, but also continuously develop to meet user needs.

The Author

Timo Kovala

Timo is a Marketing Architect at Capgemini, working with enterprises and NGOs to ensure a sound marketing architecture and user adoption. He is certified in Salesforce, Marketing Cloud Engagement, and Account Engagement.
