How to Safely Deploy AI-Written Code in Salesforce

By Holly Bracewell

AI coding tools have dramatically increased developer throughput. Agentforce, Cursor, Claude, Codex, and Copilot are helping Salesforce teams ship more code faster than ever. But recent data shows that code review times are rising substantially as a result: AI boosts developer output while creating a bottleneck at the quality gate.

There’s also a volume problem. AI generates verbose code, often hundreds of lines at once. Senior developers who’d find ten issues in ten lines of code often just tick “looks good” when handed 500 lines to wade through, especially when it’s code they didn’t write. Teams can’t afford to rubber-stamp code just because it came from AI.

This article explores how to combat these bottlenecks, with best practices and actionable advice on how to scale AI safely without sacrificing quality.

The Salesforce-Specific Challenges of AI-Generated Code

AI-generated code should go through rigorous review regardless of what platform you’re developing on – but the nuances of Salesforce development mean that AI-generated code can carry extra risks.

  • Understanding metadata complexity: AI tools without deep org awareness lack the architectural context of the Salesforce platform and your org. Does your AI know whether your org is trigger-based or has migrated to Flows? When it suggests creating new objects, does it understand the knock-on effects on org limits you’re already hitting?
  • Shallow testing: AI-generated unit tests often cover only “happy paths” that don’t fully stress test your code. They check what happens when everything goes right, but tend to skip the scenarios where things are more likely to go wrong. For example, they often don’t test boundary conditions (where off-by-one errors hide), permission restrictions (what happens when a user without proper access tries something), or error handling. In our own experimentation with AI-generated unit tests, an Agentforce test achieved 94% code coverage, which the AI treated as good enough despite being given a clear organizational standard of 95% minimum coverage.
  • Governor limits ignorance: AI tools trained mostly on traditional languages are often unaware of Salesforce’s multi-tenant architecture, so they don’t always understand concepts like bulkification or SOQL query limits, which are fundamental constraints when working with the Salesforce platform.
  • Hard-coded traps: AI often generates code that includes hard-coded values, like record IDs or URLs tied to your dev sandbox. AI doesn’t automatically understand that Salesforce changes are promoted through separate environments, so it can’t identify that a value that works in dev won’t exist in UAT or production, and won’t flag that it should be replaced with a Custom Metadata Type, Custom Setting, or environment variable.
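
Hard-coded values are one of the easier traps to catch mechanically. The following is a minimal sketch of such a check, not a production scanner. It assumes record IDs appear in Apex as single-quoted literals of exactly 15 or 18 alphanumeric characters, and that sandbox URLs contain ".sandbox.my.salesforce.com". The names Finding and scan are hypothetical, and the heuristic will produce both false positives and misses.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    line: int   # 1-based line number in the scanned source
    rule: str   # hypothetical rule name, chosen for this sketch
    text: str   # the literal that triggered the rule


# Assumption: an ID shows up as a quoted literal of exactly 15 or 18 alphanumerics.
ID_LITERAL = re.compile(r"'([0-9a-zA-Z]{15}|[0-9a-zA-Z]{18})'")
# Assumption: sandbox hosts end in ".sandbox.my.salesforce.com".
SANDBOX_URL = re.compile(
    r"https?://[\w.-]*\.sandbox\.my\.salesforce\.com", re.IGNORECASE
)


def scan(source: str) -> list[Finding]:
    """Flag quoted literals that look like record IDs or sandbox URLs."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in ID_LITERAL.finditer(line):
            literal = match.group(1)
            # Require at least one digit to skip ordinary 15/18-letter words.
            if any(ch.isdigit() for ch in literal):
                findings.append(Finding(number, "hardcoded-id", literal))
        for match in SANDBOX_URL.finditer(line):
            findings.append(Finding(number, "sandbox-url", match.group(0)))
    return findings


if __name__ == "__main__":
    sample = (
        "Id ownerId = '005000000000001AAA';\n"
        "String endpoint = 'https://acme--dev.sandbox.my.salesforce.com/api';\n"
    )
    for finding in scan(sample):
        print(finding)
```

Because the check is a fixed pattern match, it returns the same findings for the same input every time, which makes it a reasonable candidate for a pre-merge gate.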

None of this means AI tools aren’t useful. It means their output needs to be reviewed with Salesforce-specific knowledge and with these patterns in mind.
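
One of those patterns, a SOQL query inside a loop body, can be approximated with a simple line-based check. This is a rough sketch under stated assumptions: loops use braces, the opening brace follows the loop header, and braces do not appear inside strings or comments. The function name soql_in_loops is hypothetical. A query in the header of an outermost for loop is deliberately not flagged, since it runs once rather than once per iteration.

```python
import re

LOOP_START = re.compile(r"\b(for|while)\s*\(")
INLINE_SOQL = re.compile(r"\[\s*SELECT\b", re.IGNORECASE)


def soql_in_loops(source: str) -> list[int]:
    """Return 1-based line numbers where inline SOQL sits inside a loop body."""
    flagged = []
    depth = 0             # current brace nesting depth
    loop_depths = []      # brace depths at which loop bodies were opened
    pending_loop = False  # saw a loop header, waiting for its opening brace
    for number, line in enumerate(source.splitlines(), start=1):
        # Checked before this line's braces are counted, so the header of an
        # outermost loop is never treated as part of its own body.
        if loop_depths and INLINE_SOQL.search(line):
            flagged.append(number)
        if LOOP_START.search(line):
            pending_loop = True
        for ch in line:
            if ch == "{":
                depth += 1
                if pending_loop:
                    loop_depths.append(depth)
                    pending_loop = False
            elif ch == "}":
                if loop_depths and loop_depths[-1] == depth:
                    loop_depths.pop()
                depth -= 1
    return flagged


if __name__ == "__main__":
    unbulkified = (
        "for (Account a : accounts) {\n"
        "    List<Contact> cs = [SELECT Id FROM Contact WHERE AccountId = :a.Id];\n"
        "}\n"
    )
    print(soql_in_loops(unbulkified))
```

A real static analysis tool would parse the code rather than count braces, and would also follow method calls made from inside the loop. The sketch only illustrates the kind of platform-specific rule a reviewer has in mind.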

AI-on-AI Review Compounds Risk

The instinctive response to a backlog of pull requests is to use AI to review them. Done incorrectly, however, this can increase the risk of vulnerabilities slipping through your safety nets.

As we’ve all learned by now, AI generation is probabilistic by nature – the same prompt won’t always produce the same output. Stack a probabilistic review tool on top of a probabilistic generation tool and risk compounds. When it comes to the review stage, you need a deterministic layer that applies the same rules consistently and returns the same result every time.
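
The 94% versus 95% coverage example from earlier shows the difference: a model can decide a near miss is close enough, while a fixed rule cannot. A minimal sketch of such a rule follows. The function name and parameters are illustrative assumptions, not any tool's API.

```python
def coverage_gate(covered_lines: int, total_lines: int, minimum_percent: int = 95) -> bool:
    """Return True only when coverage meets the configured minimum."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    # Integer arithmetic avoids floating-point rounding near the threshold.
    return covered_lines * 100 >= minimum_percent * total_lines


if __name__ == "__main__":
    print(coverage_gate(94, 100))  # below the 95% standard
    print(coverage_gate(95, 100))  # meets the 95% standard
```

The same inputs always give the same answer, so 94% fails the 95% standard on every run, regardless of how the result is described in a prompt.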

Essential Guardrails When Reviewing AI-Generated Code

Given all of this, what does a review process actually need to do when AI is writing the code? It’s about implementing tools and processes that enable you to move fast without accumulating unseen risk.

  • Deterministic validation: Rules should be applied consistently, with the same result every time. That’s what prevents your code quality from being slowly eroded by a tool that’s deciding what counts as close enough.
  • Full platform context: Reviewing shouldn’t be done in a vacuum. There needs to be an understanding of how components interact across an org’s entire configuration and code setup – Apex, Flows, permissions, sharing rules, Agentforce integrations. A tool that only reads one file at a time, or only covers Apex, is checking just a small portion of what can go wrong.
  • Focus on what’s new: Surfacing 200 legacy violations every time a developer changes five lines is noise, and noise gets ignored. Review tooling should isolate the issues introduced by the current change to keep reviews focused and actionable.
  • Risk-appropriate gates: Not all code carries the same level of risk. Payment processing logic and a cosmetic UI change warrant different levels of scrutiny. A good review framework lets teams configure what type of changes trigger a hard block versus a warning, based on their own priorities rather than generic best practices.
  • Human judgment layered in: Complex issues should still be flagged for human review. The goal is to find tools that reduce noise and save developer time, but not to hand off judgment entirely. Human reviews are still essential.
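
Two of these guardrails, focusing on what’s new and risk-appropriate gates, can be sketched together. The example below is a simplified illustration under assumptions: the Violation type, the GATE_POLICY patterns, and the file paths are all hypothetical, and a real tool would get changed lines from the version control diff.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Violation:
    path: str
    line: int
    rule: str


# Hypothetical team policy: the first matching pattern wins.
GATE_POLICY = [
    ("force-app/main/default/classes/Payment*", "block"),
    ("force-app/main/default/lwc/*", "warn"),
]
DEFAULT_ACTION = "warn"


def introduced_by_change(violations, changed_lines):
    """Keep only violations on lines touched by the current change."""
    return [v for v in violations if v.line in changed_lines.get(v.path, set())]


def action_for(path):
    """Look up the gate action configured for a file path."""
    for pattern, action in GATE_POLICY:
        if fnmatchcase(path, pattern):
            return action
    return DEFAULT_ACTION


def review(violations, changed_lines):
    """Group the newly introduced violations by gate action."""
    result = {"block": [], "warn": []}
    for violation in introduced_by_change(violations, changed_lines):
        result[action_for(violation.path)].append(violation)
    return result


if __name__ == "__main__":
    payment = "force-app/main/default/classes/PaymentService.cls"
    all_violations = [
        Violation(payment, 12, "soql-in-loop"),   # on a changed line
        Violation(payment, 240, "hardcoded-id"),  # legacy, untouched
    ]
    print(review(all_violations, {payment: {10, 11, 12}}))
```

Keeping the policy as data means the team, not the tool, decides which changes trigger a hard block, and anything the rules cannot classify can still be routed to a human reviewer.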

As Geoffrey Vauzefornier, SFXD founder, noted in the Abridged Spring ’26 release notes:

“The architectural principle here is treating AI as a suggestion layer with deterministic business logic validation—not as a production-critical autonomous system. Resist FOMO. Evaluate your actual needs. Build governance frameworks before building agents. And maybe wait for things to, you know, actually leave Beta before betting your business processes on them.”

Best Practices for Safely Building AI Into Your Development Processes

Choosing the right tool stack is only part of the equation for successful adoption of AI. How you introduce AI in the first place is also a determining factor for success.

The following steps might seem like a big upfront commitment, but investing the time to get them right sets you up to release securely and at pace. It lets you scale AI use with confidence rather than accumulating risk that slows you down in the long run.

  • Phase it in: Start with lower-stakes uses before expanding AI’s role into more complex areas of development. Trust in AI takes time to earn, and one hallucinated production bug can undo months of it.
  • Treat AI like a junior developer: Define your coding standards before AI generates anything, then make those standards available to the AI upfront, the same way you’d onboard a new team member. The more context it has, the less correction is needed.
  • Track your quality violations over time: The review stage should also be a source of insight into how effectively your process is working. Violations that keep recurring are a signal worth acting on. It might mean refining your prompts, updating your conventions, or identifying a training gap. 
  • Know where your data is going: If you’re using a public LLM, your codebase is leaving your environment. That may be fine, but it should be a deliberate, explicit decision, not an assumption.
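
Tracking violations over time can start very simply. The sketch below assumes each review produces a list of rule names (the rule names and the min_reviews threshold are hypothetical) and reports the rules that keep coming back.

```python
from collections import Counter


def recurring_rules(review_history, min_reviews=3):
    """Return rules that appeared in at least `min_reviews` separate reviews.

    review_history is a list of lists of rule names, one inner list per review.
    """
    seen_in = Counter()
    for rules in review_history:
        # Count each rule once per review, however often it fired in that review.
        seen_in.update(set(rules))
    return sorted(rule for rule, count in seen_in.items() if count >= min_reviews)


if __name__ == "__main__":
    history = [
        ["soql-in-loop", "hardcoded-id"],
        ["soql-in-loop", "soql-in-loop"],
        ["soql-in-loop", "missing-assert"],
    ]
    print(recurring_rules(history))
```

A rule that shows up review after review is the kind of signal worth acting on: it may point to a prompt that needs refining, a convention that needs updating, or a training gap.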

Final Thoughts: The Secure Way to Leverage AI-Assisted Development

Trust in AI takes months to build, and one hallucinated production bug can tear up that trust in an instant. As you scale AI in Salesforce development, remember: velocity without guardrails just means you’re accumulating risk faster. 

Shift your quality checks left, layer your defenses, and be wary of letting AI mark its own homework.

The Author

Holly Bracewell

Holly is a Technical Author for Gearset, the leading DevOps solution for Salesforce.
