A year ago, OpenAI released GPT-4, igniting a global frenzy. Almost everyone was experimenting with it, discussing its capabilities, and fueling the AI hype of 2023. Much has been written about the potential of AI innovations in the business world. This article takes a different approach, offering an honest account of the experiences we, at Aquiva Labs, have had over the past year.
In April 2023, our “AI Task Force” met to deliberate on how to respond to (and capitalize on) the latest developments for our clients, as well as ourselves. We later outlined company-wide guidelines for utilizing AI and committed significant investment to experimenting with generative AI immediately.
I will now introduce three apps we’ve developed, as they vividly illustrate the progression of our understanding of both the potential and the limitations of generative AI. All three are Salesforce apps, and you’ll notice that we aimed to leverage as much native functionality as possible. Each app was crafted for a genuine use case within Aquiva Labs and is actively used by our employees. Furthermore, they are all available for free on the AppExchange or as open source on GitHub.
Step 1: Text Reasoning With LLMs from Within Salesforce
We explored numerous business use cases but ultimately discarded many due to concerns about potential hallucinations and our lack of confidence in being able to prevent them. At one juncture, I likened a proficient AI tool to an assistant – capable of handling arduous tasks without assuming the responsibility of validating its preliminary work or making final decisions.
Drawing a parallel, I referenced the Static Code Analyzer PMD, which I ported to Salesforce in 2016. While PMD efficiently scours large codebases for problematic code, it still necessitates a skilled developer’s final assessment to confirm the issue and determine the appropriate resolution.
Following this, we swiftly conceptualized DMD (the Document Mess Detector) – a tool that harnesses LLMs’ prowess in comprehending and analyzing natural language to identify issues within business documents. Our Salesforce org’s repository of PDFs proved ideal for scrutiny, encompassing contracts, Statements of Work, slide decks, and NDAs.
Designed to mimic PMD closely, DMD operates on a similar principle. Users codify rules and anti-patterns specific to document types, organizing them into document-specific rulesets. Importantly, these rules are written in natural language, which is how a businessperson would describe requirements and expectations for a business document. To start an analysis, users choose a ruleset and hit “Run”. Minutes later, they receive results indicating pass or fail for each rule, accompanied by a natural-language justification elucidating the AI’s decision-making process.
The implementation, as outlined below, is quite straightforward. We utilize an external API to extract plain text from PDF or Word files. This text, together with the rules and some prompt engineering, is sent to OpenAI. The LLM is prompted to evaluate each rule and furnish a detailed JSON-formatted response, which we parse into custom objects.
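A minimal sketch of that flow is shown below, written in Python for brevity; the app itself does this via Apex callouts and parses the response into custom objects, and the prompt wording, model choice, and function name check_rules are illustrative only.

```python
# Minimal sketch of DMD's rule-evaluation idea (illustrative, not the app's Apex code).
# Assumes the OpenAI Python SDK (>= 1.0) and an OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()

def check_rules(document_text: str, rules: list[str]) -> list[dict]:
    """Ask the model to evaluate each natural-language rule against the document."""
    prompt = (
        "You are a reviewer of business documents. For each rule below, decide "
        "whether the document passes or fails and justify your decision.\n"
        "Respond with a JSON array of objects: "
        '{"rule": ..., "result": "pass"|"fail", "justification": ...}.\n\n'
        "Rules:\n" + "\n".join(f"- {r}" for r in rules) +
        "\n\nDocument:\n" + document_text
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",   # our balance of quality, speed, and cost
        messages=[{"role": "user", "content": prompt}],
        temperature=0,           # keep output stable enough to parse
    )
    return json.loads(response.choices[0].message.content)

results = check_rules(
    document_text="...extracted plain text of an NDA...",
    rules=["The document must name a governing law.",
           "The confidentiality term must not exceed five years."],
)
```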
DMD is heavily utilized within Aquiva Labs and is accessible in two forms: as a free app on the AppExchange, where users must provide their API keys, and as an MIT-licensed open-source repository on GitHub.
Key Learnings
- Model quality influences text understanding: Higher-quality models like GPT-4 result in slower response times and increased costs. We found GPT-4 to deliver the best results, whereas GPT-3 proved inadequate for handling complex rules. GPT-3.5 struck a favorable balance between quality, speed, and cost.
- Document size poses a significant constraint: DMD struggles to process documents exceeding a few dozen pages due to various technical limitations. Initially, challenges arose with heap size and callout size, but context size limits remained a hurdle even with these addressed.
- Being LLM agnostic is desirable yet challenging: Leveraging Custom Metadata and Apex interfaces, we replaced ChatGPT with alternative models such as Claude, Einstein GPT, and Meta’s Llama 2 on AWS. Developing a flexible plugin model compatible with diverse models without sacrificing specificity proved a noteworthy challenge (the sketch after this list illustrates the pattern).
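The plugin idea behind that last learning looks roughly like the following sketch. In the apps it is realized with Apex interfaces selected via Custom Metadata records; the Python classes and provider names here are purely illustrative.

```python
# Sketch of the provider-plugin pattern behind LLM agnosticism (illustrative only;
# the apps implement this with Apex interfaces plus Custom Metadata configuration).
from abc import ABC, abstractmethod

class LlmProvider(ABC):
    """Common contract every model plugin must fulfil."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAiProvider(LlmProvider):
    def complete(self, prompt: str) -> str:
        ...  # call the OpenAI Chat Completions API

class ClaudeProvider(LlmProvider):
    def complete(self, prompt: str) -> str:
        ...  # call Anthropic's Messages API

# A configuration record (Custom Metadata in the apps) decides which plugin to load.
PROVIDERS = {"OpenAI": OpenAiProvider, "Claude": ClaudeProvider}

def get_provider(name: str) -> LlmProvider:
    return PROVIDERS[name]()
```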
Step 2: Text Reasoning With RAG Using AWS Backend
As we prepared to start our second experiment, the landscape was abuzz with a new wave of tools and concepts to enable AI to learn from vast private knowledge bases and overcome previous limitations regarding context size.
One notable example was AskYourPDF.com, a platform that allows users to upload large documents and ask a chatbot to summarize, translate, or answer specific questions about the content. We developed a Salesforce adaptation dubbed Ask Your Document, which is available on the AppExchange as a freemium app.
However, delving into this endeavor proved to be quite an adventure. The learning curve was steep, and we found ourselves grappling with a plethora of new concepts: vector databases that store text based on semantic meaning rather than keywords, and LLMs capable of converting text into such vectors through a process called embedding.
We learned about the Retrieval-Augmented Generation (RAG) concept, where an LLM reformulates search results from a vector database into natural-sounding answers. Additionally, we had to familiarize ourselves with Python, deemed the perfect language for orchestrating AI-based software, and with LangChain, a Python framework that facilitates the orchestration of the various components and processes.
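A stripped-down version of that pipeline, written in plain Python so each step is visible, looks roughly like this; LangChain wraps the same steps in text splitters, vector stores, and retrieval chains, and the model names below are just examples.

```python
# Minimal RAG sketch: embed chunks, retrieve the closest ones, let the LLM answer.
# Assumes the OpenAI Python SDK (>= 1.0); a real setup would keep the vectors
# in a vector database instead of recomputing them per question.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in response.data])

def answer(question: str, chunks: list[str], top_k: int = 3) -> str:
    chunk_vectors = embed(chunks)              # in practice stored in a vector DB
    query_vector = embed([question])[0]
    # Cosine similarity between the question and every chunk.
    scores = chunk_vectors @ query_vector / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    context = "\n\n".join(chunks[i] for i in np.argsort(scores)[-top_k:])
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return response.choices[0].message.content
```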
Moreover, to host and integrate these components with Salesforce, we turned to AWS – an incredibly powerful platform that adds complexity to the solution.
Key Learnings
- Off-platform can quickly become daunting: While leveraging AWS offers immense computational power, it introduces significant complexity and costs. Proficiency in configuring, scaling, and optimizing requires a nuanced understanding.
- Integration overhead often leads to extensive off-platform migration: Initially, we only intended to host the vector DB on AWS. However, to keep the Salesforce app simple, we gradually offloaded tasks such as text extraction, chunking, and searching to AWS as well.
- RAG is just an enhanced search with no genuine text understanding: Our initial assumption that vector DBs would provide limitless context for LLMs was flawed. Without training a dedicated model yourself, current AI can only effectively reason within specific contextual boundaries. Consequently, a vector DB alone couldn’t sufficiently support DMD in analyzing documents of arbitrary size.
Step 3: Beyond Text – Autonomous Agents Perform User Tasks
Upon the release of Ask Your Document, we were initially overwhelmed by the multitude of technologies we needed to master, coupled with the realization that certain limitations persisted even though we were working at the forefront of the technology. It became evident that effective AI should transcend mere text comprehension and generation.
Around this time, AutoGPT emerged, sparking widespread discussions on autonomous AI agents. I vividly recall using our Ask Your Document tool to decipher a groundbreaking paper on the ReAct pattern. The concept was straightforward: combining LLMs with classic deterministic code to automate tasks on behalf of users. The AI receives a list of tool interfaces and the user’s objective, utilizing prompting techniques to devise a plan for executing each task. Each tool’s output feeds into subsequent tool calls until the task is completed, with the user promptly informed of the results.
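A bare-bones sketch of such an agent loop, using OpenAI’s Chat Completions tool-calling interface, is shown below; the tool schemas and the run_tool callback stand for whatever deterministic code is plugged in, and the model name is just an example.

```python
# Bare-bones agent loop in the ReAct spirit: the model plans, requests tool calls,
# sees their results, and repeats until it can answer the user directly.
import json
from openai import OpenAI

client = OpenAI()

def run_agent(goal: str, tools: list[dict], run_tool) -> str:
    """tools: JSON schemas of available tools; run_tool(name, args) executes one."""
    messages = [{"role": "user", "content": goal}]
    while True:
        response = client.chat.completions.create(
            model="gpt-4-turbo",      # any tool-calling-capable model
            messages=messages,
            tools=tools,
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content            # final answer for the user
        messages.append(message)              # keep the model's tool requests
        for call in message.tool_calls:       # execute each requested tool call
            result = run_tool(call.function.name, json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```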
We had already commenced building a Salesforce Proof of Concept utilizing the existing Chat Completion API when OpenAI provided invaluable assistance. At their inaugural Developer Conference in November, they released the Assistant API, enabling us to transition 90% of the ReAct agent’s complexities from Salesforce to an OpenAI Assistant configuration.
Thus, “My Org Butler” was born – a chat UI integrated into the ever-present taskbar of Salesforce orgs, designed to assist users with typical Salesforce tasks: querying data, manipulating metadata and org settings, or invoking other Salesforce features.
Recognizing Salesforce’s robust and well-documented public APIs, we realized their potential for automating many manual tasks within an org. By leveraging the Assistant’s knowledge, we could consolidate automation efforts into a single tool.
To check the feasibility of our idea, we did not have to write a single line of code; we merely configured an Assistant in OpenAI and checked whether it would work. We configured a single tool that allows the OpenAI Assistant to define and delegate arbitrary API calls back to the same org, executed with the credentials of the actual running user. We were amazed to see how well that worked.
The model could break down tasks into multiple syntactically correct API calls. Since OpenAI Assistants can also store knowledge files to use during their runs, we uploaded the Salesforce Platform API Postman Collection, a machine-readable description of all the APIs we planned to use.
All that was left to do was write a small wrapper app that displays a chatbot user interface and moderates a chat between the user and the OpenAI Assistant API. Whenever the assistant wants to call a tool, it sends machine-readable instructions back to Salesforce, where the actual API call is executed, respecting the permissions of the running user.
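Conceptually, the Assistant configuration boils down to a single generic tool. The sketch below shows that shape in Python; names such as call_salesforce_api and the instruction text are illustrative rather than the app’s actual identifiers.

```python
# Sketch of the Assistant configuration idea: one generic tool lets the model
# compose arbitrary Salesforce REST calls, which the org-side wrapper then executes
# with the running user's credentials. Names here are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Org Butler (sketch)",
    model="gpt-4-turbo",
    instructions=(
        "You help Salesforce users query data, change metadata and settings, and "
        "invoke platform features by composing REST API calls against their own org. "
        "Ask for approval before any risky or destructive action."
    ),
    tools=[{
        "type": "function",
        "function": {
            "name": "call_salesforce_api",
            "description": "Execute a Salesforce REST API call as the running user.",
            "parameters": {
                "type": "object",
                "properties": {
                    "method": {"type": "string", "enum": ["GET", "POST", "PATCH", "DELETE"]},
                    "path": {"type": "string", "description": "REST resource path"},
                    "body": {"type": "string", "description": "JSON payload for writes"},
                },
                "required": ["method", "path"],
            },
        },
    }],
)
```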
The assistant was not only able to detect and correct errors that happened during a run, but could also explain what it planned to do or ask the user for approval before a riskier action was performed. Of the three tools, I found this one the most surprising because it required minimal work to get tasks performed quite reliably by AI.
My Org Butler is available as an unlocked package, ready for use and customization, as MIT-licensed open source in the Aquiva Labs GitHub repository.
Key Learnings
- LLMs are amazing as glue code: LLMs can orchestrate old-fashioned deterministic code. In our experience, when tasked properly and given well-defined tools, they achieved near-perfect task completion.
- Tool design will be a new art: Do you remember when architects struggled to scope APIs or microservices? The same story might repeat with AI tools.
- Not yet efficient: Although generally feasible, our own experience and various studies show that LLM-based agents are still too slow, too unreliable, and far too expensive to replace humans in realistic scenarios.
Final Thoughts
Let’s face it: custom AI is hard. Therefore, it’s advisable to leverage native Salesforce AI features whenever possible. Our experiences have shown that integrating custom AI introduces a Pandora’s box of new and complex technologies and concepts. Without a deep understanding and effective management, these can lead to unexpected setbacks.
Custom development is an option, but for most standardized use cases, we recommend native Salesforce AI features as they become available. You may wonder whether and when they will transition from conference announcements to general availability. This concern is valid, especially considering Salesforce’s track record of delayed product launches following flashy announcements.
Despite the 2016 announcement, few customers and partners are utilizing Einstein AI features such as Prediction and Recommendation Builder or Next Best Action. But we are optimistic. AI speeds up everything. If you don’t deliver, the market will move elsewhere. We see how Salesforce is riding the AI wave and enabling businesses to succeed at reasonable prices.
While we advise our clients to always start with what’s possible with native Salesforce functionality, many want to create unique intellectual property that extends out-of-the-box capabilities with custom solutions, giving them unique differentiators for their business. In those cases, working with a partner that understands the power and limitations of standard features, and that invests the time and resources into learning and practicing AI and adjacent technologies, is a smart choice.
Looking at our three apps, we anticipate most of them becoming obsolete in the coming months as Salesforce introduces native replacements for the apps or their major components:
- Ask Your Document will likely be replaced by Einstein Semantic Search on Files. Additionally, Data Cloud will see enhancements with Vector Store functionality, offering a completely native and secure RAG environment.
- My Org Butler will be replaceable by Custom Copilots and Copilot Actions, Salesforce’s Agent technology, which operates within your org without needing external APIs but instead leverages Apex and Flow as tools.
- Only DMD remains relevant, although the Prompt Builder and the LLM Gateway could largely replace its technical underpinnings. This approach would allow for easy replacement of LLMs and ensure data security through the Salesforce AI Trust Layer.
At the time of writing, we can be quite sure that none of those native tools will be free; they will require additional costs per user and per month. However, the same is true for OpenAI, AWS, and other off-platform infrastructure, which also bring additional complexity and less trust.