
Agentforce Testing Best Practices: How to Ensure Reliable Deployments in Salesforce

By Timo Kovala

Developing an Agentforce agent is a feat of architecture: you must master programmatic (Apex) and declarative (Flow) development, data modelling, Salesforce Platform and Data Cloud configuration, and prompt engineering, not to mention navigating the intricacies of LLMs. A lot can go wrong, making systematic testing all the more important. 

Too often, teams end up building their own workarounds just to get a reliable deployment out the door: last-minute changes to instructions labeled as “hot fixes,” or agents approved for deployment after merely “simulated” end-to-end testing. We should be asking fundamental questions: what are we testing, why does it matter, and how do we know it works in production? In this article, we explore what “good” looks like in Agentforce testing and deployment.

What Should You Test?

The easy answer is: everything. In reality, however, we must cope with uncertainty, limited data, and tight timelines while trying to optimize lead time to production without cutting corners. Agentforce developers need to adopt a pragmatic mindset: when you cannot test everything all the time, run focused tests on the most impactful areas.

Agentforce developers face unique challenges that aren’t solved simply by bolting on another CI/CD tool or following past best practices. The built-in Agentforce Testing Center is useful for initial batch testing, but it doesn’t solve deeper issues, such as unrepresentative test data, or the fact that many agent behaviors are hard to simulate outside of production.

Agentforce requires us to rethink our DevOps process. The general rule of thumb still applies: if a test can be automated, do it. However, several areas of Agentforce testing require manual user input. Some tests can be run accurately in a partial copy sandbox or even a scratch org, but others require production data and usage context. Additionally, full agentic end-to-end testing requires various methods and tools, some more familiar than others. In the table below, you will find some tests that are crucial for successful Agentforce deployments.

| Test Item | Method | Environment | Tools | Notes |
| --- | --- | --- | --- | --- |
| Agent settings and configuration | Manual and batch testing | Scratch Org, Dev, UAT/Staging | Agentforce Builder, Testing Center, manual UAT | Ensure the agent has access to the required records, fields, settings, and other metadata |
| Topic scope and classification | Manual and batch testing | Scratch Org, Dev, UAT/Staging | Agentforce Builder, Testing Center, manual UAT | Verify topic classification even if the agent outcome is correct |
| Apex class logic | Unit tests with assertions | Scratch Org, Dev | Apex Testing Framework, VS Code | Focus on edge cases and governor limits |
| Flow logic | Manual testing with multiple users and scenarios | Dev, UAT/Staging | Flow debugging tool, Salesforce UI, Developer Console | Simulate Flow usage as part of an agent action with realistic data |
| Prompt templates | Manual and programmatic testing | Dev, UAT/Staging | Prompt Builder, Debug logs, VS Code | Use a testing sheet for qualitative test feedback from testers |
| Variables and filters | Manual testing with multiple users and scenarios | Dev, UAT/Staging | Agentforce Builder, Testing Center, manual UAT | Ensure topic and action filters are applied correctly in various scenarios |
| Escalation to a human | Manual and batch testing | Dev, UAT/Staging | Agentforce Builder, Testing Center, manual UAT | Escalation should be consistent in both typical and edge cases |
| Custom guardrails | Manual and batch testing | Dev, UAT/Staging | Agentforce Builder, Testing Center, manual UAT | Validate custom guardrails written into topics, actions, and instructions |
| Agent UI behavior (employee-facing agents) | End-to-end testing | Dev, UAT/Staging | Provar, Selenium, LWC test frameworks (Jest) | Validate chatbots, Screen Flows, and LWCs used with agents |
| Web or mobile deployment (customer-facing agents) | End-to-end testing | UAT/Staging + website/mobile app test environment, Prod | Botium, TestMyBot, JMeter, testRigor | Focus on the UX and interaction with Salesforce data |
| Deployment validation | Pre-deployment checks and smoke tests | UAT/Staging, Prod | AutoRABIT, Copado, Flosum, Gearset, DevOps Center | Always validate post-deployment behavior |
| Data integrity | Test with anonymized or synthetic data | Dev | Data Loader, Workbench, SOQL | Ensure no data loss or corruption during deployment |
| Data security | End-to-end testing, continuous monitoring | UAT/Staging, Prod | Event Monitoring (Shield), Arovy | Ensure agents and integrated apps can only access data they are allowed to |
| Integrations | Mock and live endpoint testing, continuous monitoring | SIT, UAT/Staging | Integration platform (e.g. MuleSoft), Postman, Arovy | Validate third-party handoffs and error handling |
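To make a row like “Topic scope and classification” concrete, here is a minimal batch-test harness sketched in Python. The `classify_topic` stub is a hypothetical stand-in for a real agent call (in practice you would route cases through the Agentforce Testing Center or Testing API); the point is the shape of the test: each utterance is checked against the topic the agent should select, not just against the final answer.

```python
# Hypothetical stand-in for a deployed agent's topic classifier; a real
# project would call the agent via the Testing Center or Testing API.
def classify_topic(utterance: str) -> str:
    rules = {"refund": "Order Management", "password": "Account Support"}
    for keyword, topic in rules.items():
        if keyword in utterance.lower():
            return topic
    return "General Inquiry"

# Each case pairs an utterance with the topic the agent should select,
# so we verify classification even when the final answer looks right.
TEST_CASES = [
    ("I want a refund for my last order", "Order Management"),
    ("I forgot my password", "Account Support"),
    ("What are your opening hours?", "General Inquiry"),
]

def run_batch(cases):
    """Return the list of (utterance, expected, actual) failures."""
    failures = []
    for utterance, expected in cases:
        actual = classify_topic(utterance)
        if actual != expected:
            failures.append((utterance, expected, actual))
    return failures

if __name__ == "__main__":
    failures = run_batch(TEST_CASES)
    print(f"{len(TEST_CASES) - len(failures)}/{len(TEST_CASES)} passed")
```

A harness like this pays off because the failure list names the misclassified utterance, which is exactly the evidence you need when refining topic descriptions.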

Why Is Testing More Important Than Before?

Too often, we treat testing and deployment like a checkbox exercise – something to rush through once the “real work” is done. But if you’ve ever watched a seemingly solid release unravel in production, you know that this mindset doesn’t hold up. Testing has never been something to overlook, and with AI agents now in the picture, it’s more important than ever.

Before Agentforce, we could at least count on user feedback for early bug detection. With autonomous agents, we may not have that luxury. Seemingly foolproof instructions may cause hallucinations in production that, if left unchecked, can lead to customer dissatisfaction and loss of revenue. The truth is, Agentforce deployments are complex, and the stakes are high, especially when customer-facing agents are involved. The complexity arises from several factors, which we explore below.

Architectural Complexity

Agents are complex architectural components that house topics, instructions, action definitions and reference actions, and the agent’s own configurations. On the system level, agents consist of third-party AI models integrated with Salesforce Platform and Data Cloud through the Einstein Trust Layer. The trust layer is a tightly coupled set of features that enable secure and trusted AI grounding with first-party data.

Considering the above, Agentforce testing deserves special attention; copy-pasting previous Salesforce testing procedures simply doesn’t cut it. 

Non-Determinism

One of the main challenges for testing is that agents are non-deterministic by nature. The Atlas Reasoning Engine that gives agents the ability to “think” can follow complex instructions, but there will always be slight variations in outputs that you would not find in a solution driven by rule-based logic, such as a Flow or an Apex class. As a result, there is always a risk of unexpected agent behavior, which emphasises the role of continuous post-deployment monitoring.
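One practical consequence of non-determinism is that asserting exact equality on a single run is the wrong test. A common pattern is to run the same input many times and assert a pass-rate threshold instead. The sketch below models this in Python with a stubbed agent whose output varies randomly; the stub and the 85% threshold are illustrative assumptions, not Agentforce behavior.

```python
import random

# Stub for a non-deterministic agent: real LLM-backed agents vary between
# runs, modelled here with a small random failure probability.
def agent_decision(prompt: str, rng: random.Random) -> str:
    return "escalate" if rng.random() < 0.95 else "resolve"

def pass_rate(prompt: str, expected: str, runs: int = 100, seed: int = 42) -> float:
    """Run the same prompt many times and return the fraction of correct outputs."""
    rng = random.Random(seed)  # seeded so the test run is reproducible
    hits = sum(agent_decision(prompt, rng) == expected for _ in range(runs))
    return hits / runs

# Assert a pass-rate threshold across many runs instead of exact
# equality on a single run.
rate = pass_rate("Customer threatens legal action", "escalate")
assert rate >= 0.85, f"pass rate {rate:.0%} is below threshold"
print(f"pass rate: {rate:.0%}")
```

The threshold itself becomes a tunable quality bar: tighten it for high-risk topics, relax it where occasional retries are acceptable.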

Autonomous Operation

“Human in the loop” works differently with Agentforce than traditional process automation. In principle, AI agents handle most of the work on their own, but they should know when to check in with a person. If something’s unclear, risky, or outside the usual, the agent asks a human for input before moving forward. Humans help steer the big-picture decisions, while agents take care of the details. All this behavior must be covered extensively during testing, of course. This is why edge cases are so important when testing agent guardrails.
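A lightweight way to pin down that behavior is to enumerate the boundary cases and assert the expected hand-off for each. In the Python sketch below, `should_escalate` is a hypothetical escalation policy written as code purely for illustration; in a real agent these rules live in topic instructions and guardrails, and the test cases would be run against the agent itself.

```python
# Hypothetical escalation policy, written as code only for illustration;
# in Agentforce these rules live in topic instructions and guardrails.
def should_escalate(message: str, confidence: float) -> bool:
    risky_terms = ("legal", "complaint", "cancel my contract")
    if any(term in message.lower() for term in risky_terms):
        return True          # risky or out-of-scope: hand off to a human
    return confidence < 0.6  # unclear intent: ask a person before acting

# Edge cases deliberately sit near the boundaries of the policy.
EDGE_CASES = [
    ("I will take legal action", 0.95, True),   # risky despite high confidence
    ("Where is my parcel?", 0.92, False),       # routine, agent handles it
    ("asdf ??", 0.10, True),                    # garbled input, low confidence
]

for message, confidence, expected in EDGE_CASES:
    assert should_escalate(message, confidence) == expected, message
print("all escalation edge cases passed")
```

Keeping the edge cases in a table like this makes it cheap to extend the suite every time production surfaces a new boundary condition.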

Credit Consumption

One factor that impacts both the “why” and “how” of Agentforce testing is credit consumption. Whether your contract follows the older conversation-based or Flex Credits model, the bad news is that testing will cost you credits, whether done within a sandbox or production org. Testing with Agentforce Testing Center or Testing API also consumes credits.

This means that any AI and data usage should be carefully monitored during testing to avoid unintended usage spikes. However, credit usage shouldn’t stop you from proper end-to-end testing. The cost of failures caused by untested topics, instructions, or actions will likely far exceed the cost of credits consumed in a controlled test sample. You simply have to be mindful of data and action usage when creating test cases.
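Being mindful of usage can be as simple as budgeting a test run before it starts. The Python sketch below does exactly that; the per-action credit costs are made-up placeholders, not Salesforce pricing, and the action names are illustrative.

```python
# Illustrative credit accounting for a test run; the per-action costs
# below are made-up placeholders, not Salesforce pricing.
CREDITS_PER_ACTION = {"llm_call": 4, "data_cloud_query": 2, "flow_action": 1}

def estimate_credits(actions):
    return sum(CREDITS_PER_ACTION[a] for a in actions)

def plan_run(test_cases, budget):
    """Greedily schedule test cases, skipping any that would blow the budget."""
    planned, total = [], 0
    for name, actions in test_cases:
        cost = estimate_credits(actions)
        if total + cost > budget:
            print(f"skipping {name}: would exceed budget of {budget} credits")
            continue
        planned.append(name)
        total += cost
    return planned, total

cases = [
    ("refund flow", ["llm_call", "flow_action"]),            # 5 credits
    ("order lookup", ["llm_call", "data_cloud_query"]),      # 6 credits
    ("full regression", ["llm_call"] * 10),                  # 40 credits
]
planned, total = plan_run(cases, budget=15)
print(planned, total)
```

Even a crude estimate like this makes usage spikes visible before they happen, rather than on the next invoice.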

How Do You Know It Works in Production?

To be perfectly honest, deploying agents to production scares me every time. Maybe it’s the fact that the technology is so new, or that there are so many unknowns. It could be the non-determinism of agentic behavior: it’s impossible to say for certain how the agent will behave. Whatever the reason, staying alert is probably your best asset for a successful agent deployment.

A successful Agentforce deployment isn’t just copying metadata from one org to the next. Old benchmarks like passing test cases or functioning API integrations aren’t enough. With AI agents, success means they handle real tasks, show good judgment, and know when to ask for help. Just because an agent produces expected outcomes in a staging environment doesn’t mean it will do so in production 100% of the time. In this sense, an agent deployment can be considered a success only when it stands the test of time.

In the restaurant industry, they say that a messy kitchen leads to messy dishes. I find that this applies to Agentforce just as well; if your environments are a mess, you cannot expect reliable deployments. Agents are particularly susceptible to inconsistencies between environments. Mismatching sharing and visibility settings, outdated data, or unsynchronized Flow and Apex class versions will seriously undermine agent deployments. It is crucial that staging and production environments are as identical in both data and metadata as possible.

Once your environments are synchronized, there are several measures you can take to ensure successful deployment. As with deployments in general, these steps are spread across three stages: pre-deployment, go-live, and post-deployment. The exact tasks and order vary between orgs, but the essentials are always the same: set measurable KPIs, validate every component individually and together, and keep a keen eye on every user story once deployed. Perhaps the hardest part is being ready to postpone or cancel a release that isn’t ready. A lot of love goes into building an agent, and letting go isn’t easy.
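“Set measurable KPIs and keep a keen eye on every user story” can itself be automated as a post-deployment check. The Python sketch below computes two example KPIs from a day’s conversation log and flags the release if they miss their targets; the log format, KPI names, and thresholds are all assumptions for illustration, not Agentforce metrics.

```python
# Hypothetical post-deployment check: compute agent KPIs from a day's
# conversation log and flag the release if they miss the targets.
conversations = [
    {"resolved": True,  "escalated": False},
    {"resolved": True,  "escalated": False},
    {"resolved": False, "escalated": True},
    {"resolved": True,  "escalated": False},
]

def kpis(logs):
    total = len(logs)
    return {
        "resolution_rate": sum(c["resolved"] for c in logs) / total,
        "escalation_rate": sum(c["escalated"] for c in logs) / total,
    }

# Example targets: at least 70% resolved, at most 30% escalated.
TARGETS = {"resolution_rate": 0.7, "escalation_rate": 0.3}

metrics = kpis(conversations)
healthy = (metrics["resolution_rate"] >= TARGETS["resolution_rate"]
           and metrics["escalation_rate"] <= TARGETS["escalation_rate"])
print("release healthy" if healthy else "investigate before next release")
```

Wiring a check like this into the post-deployment stage turns “keep a keen eye on it” into a concrete, repeatable gate.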

Below, you will find checklist templates targeted at Agentforce deployments. Feel free to modify and apply them to your org’s needs.

Final Thoughts

Agentforce testing and deployment isn’t about ticking boxes. It’s about staying focused in a space where things break quietly and unpredictably. Common pitfalls like relying on unrepresentative test data, skipping edge cases, or assuming that environments are in sync can lead to agents that behave well in theory but fail in the wild. And when agents fail, they don’t just crash – they mislead, misclassify, or quietly erode trust.

But here’s the good news: with the right mindset, focused testing, and a bit of healthy skepticism, you can build agents that not only work, but also continuously develop to meet user needs.

The Author

Timo Kovala

Timo is a Marketing Architect at Capgemini, working with enterprises and NGOs to ensure a sound marketing architecture and user adoption. He is certified in Salesforce, Marketing Cloud Engagement, and Account Engagement.
