Developers / Data / DevOps / Platform

How to Identify Unused Salesforce Fields With Python: A Step-by-Step Guide

By Nancy Al Kalach

Overgrown data models sometimes tend to slow down development, potentially leading to missed insights, confusing reports, or bad user experiences. The challenge is that sometimes there is no easy way to tell which fields are actually being used.

In this guide, I will walk you through how to use Python to identify potentially unused fields in your Salesforce org. We will identify every field for how often it is populated across your org and flag out empty or low-usage fields, making it easy to share findings with your team.

Note: This article builds on a previous guide, where I showed you how to use the Metadata API to extract field definitions and visualize object structures.

Step 1: Set Up Your Python Environment

Let’s start by setting up your tools:

  1. Install Python and Set up your Virtual Environment

If you don’t have Python 3.x installed yet, download it from here.

Create a working directory for this project called field-report

To create and activate a virtual environment, run the following commands in your terminal (make certain you’re executing these from the field-report directory:

Mac or Linux:

> python -m venv venv

> source venv/bin/activate  

Windows: 

> python -m venv venv

> venv\Scripts\activate

Note: In some environments, you’ll execute Python as python3.

The above commands create an isolated Python environment called venv and activates it. This allows for the neat organization of your project’s dependencies without impacting other Python projects.

  1. Install Required Packages

We will need the following Python libraries:

  • simple-salesforce: Handles API connections and requests for interacting with the Salesforce REST API.
  • pandas: Helps with easier data handling, analysis and manipulation.
  • python-dotenv: Allows the secure storage of credentials in environment variables.

To install them, run the following command in your terminal:

> pip install simple-salesforce pandas python-dotenv

Step 2: Securely Store Your Credentials

  1. Create a .env file

In your project folder, create a file called .env with the following contents:

SF_USERNAME=your_email@example.com
SF_PASSWORD=your_password
SF_TOKEN=your_security_token_from_email

The above code stores your Salesforce login information in a file that you can exclude from your version control and load it into your script using environment variables.You can log in using your Salesforce username, password, and a security token. If you don’t have your security token, here’s how to reset and get your Salesforce security token.

  1. Create the fieldreport.py file
  2. Add the following code to load environment variables in Python:
from dotenv import load_dotenv
import os
load_dotenv()
username = os.getenv("SF_USERNAME")
password = os.getenv("SF_PASSWORD")
token = os.getenv("SF_TOKEN")

This loads the values from your .env file and assigns them to Python variables. This helps keep sensitive info out of your source code.

Step 3: Connect to Salesforce

After completing step two and loading your credentials, you can establish a connection to Salesforce with the below code snippet:

from simple_salesforce import Salesforce
sf = Salesforce(
    username=username,
    password=password,
    security_token=token
)

This authenticates the Python script with your Salesforce org, and stores the session in the sf object, which you can then use to perform CRUD operations.

To test your connection and verify it was successful, you can check the current user’s Id using the built-in user_id property by adding the below:

print("Connected as user Id:", sf.user_id)

You can then run the test by executing the following command in your terminal.

> python fieldreport.py

Once the script has successfully run, if it prints a user Id, then the connection to Salesforce was successful.

Step 4: Audit Field Usage on Contact Object

In order to identify the most unused Salesforce fields, you need a Python Script that:

  • Fetches all API names from field on the Contact SObject.
  • Queries Contact records from within your Salesforce org.
  • Checks against populated and unpopulated fields in each record.
  • Saves and exports a usage report as an Excel sheet for visibility.

Python Script: Field Usage Report

#Import the necessary libraries for Salesforce access, Excel export, and secure credential loading.
import os
from dotenv import load_dotenv
from simple_salesforce import Salesforce
import pandas as pd


#Load credentials from .env
load_dotenv()
sf = Salesforce(
    username=os.getenv("SF_USERNAME"),
    password=os.getenv("SF_PASSWORD"),
    security_token=os.getenv("SF_TOKEN")
)


#Query field API names from Contact SObject in Salesforce
fields = sf.Contact.describe()['fields']
field_names = [f['name'] for f in fields]

#Query Contact records within Salesforce
query = f"SELECT {', '.join(field_names)} FROM Contact"
records = sf.query_all(query)['records']

#This helps clean up metadata attributes that are irrelevant for our analysis
for r in records:
    r.pop('attributes', None)


#Load into DataFrame
df = pd.DataFrame(records)
total_records = len(df)

#Identify null values
null_counts = df.isnull().sum()

#Build usage report
field_usage = pd.DataFrame({
    "Field API Name": null_counts.index,
    "Null Count": null_counts.values,
    "Populated Count": total_records - null_counts.values,
    "Percent Populated": ((total_records - null_counts.values) / total_records * 100).round(2)
})

This creates a summary table showing how many records are missing values for each field and what percentage of records have data for each specific field.

#Sort and export from least-used to most-used fields 
field_usage = field_usage.sort_values(by="Percent Populated")
field_usage.to_excel("contact_field_population_report.xlsx", index=False)

This sorts the report from least-used to most-used fields and exports the result to an Excel file named contact_field_population_report.xlsx.

The final generated Excel file will look like the below:

Note: As you can see in the above code snippet, after fetching the data from Contact SObject, we are removing the “attributes” field that Salesforce includes in each record. This field is not actual data, it is metadata about the object type and record URL that is not useful for the purpose of this exercise. This attributes node will look something like this:

{
  "attributes": {
    "type": "Contact",
    "url": "/services/data/v58.0/sobjects/Contact/003XXX"
  },
  "Id": "003XXX",
  "FirstName": "Nancy",
  "LastName": "Al Kalach"
}

Final Thoughts

This Python script provides you with an easy and effective way to clean up your Salesforce org, by identifying which fields are relevant. You can run it before a major cleanup initiative, after data migration, or as part of a quarterly hygiene check or technical enhancement projects.

If you’re looking to scale this further, I have also built a free, open-source CLI tool that automates this process. This will allow you to dynamically select any SObject (standard or custom), choose between Excel or CSV output, and generate usage reports without having to edit the script manually.

The Author

Nancy Al Kalach

Nancy Al Kalach is a Senior Salesforce Developer based in San Francisco, currently working at Illumina. Nancy holds multiple certifications, including Salesforce Agentforce Specialist and OmniStudio Developer.

Leave a Reply