Consultants / Admins / Developers

How to Write a Python Script to Simplify Salesforce Knowledge Migration

By Anastasia Posivets

When a company wants to centralize knowledge management, enhance customer support, improve accessibility, and seamlessly integrate knowledge with existing processes, they may decide to migrate their knowledge articles to Salesforce. 

The benefits of this migration are definitely worth it, but businesses may face some challenges and limitations. Overcoming them may require additional work, such as writing a Python script.

The Challenge

I recently needed to migrate knowledge articles to Salesforce on a one-time basis. The client provided me with two CSV files. One was an Excel file that contained information about the articles, such as external ID, language, title, and HTML body, and the other was a CSV file containing the title, related data category, and sub-category.

    From these, there were two challenges:

    1. I had to migrate more than 300 articles in English along with their seven language versions while maintaining the folder structure that will later be zipped for Article import, as follows:
      • Zip folder
      • Articles_to_migrate.csv, which contains references in the body to the knowledge_articles folder where separate article HTML files are stored
      • Content.properties
      • Knowledge_articles folder
      • Separate article folder N times (where N is the number of articles considering their translations)
      • Article .html file to import
      • Assets folder containing the image files referenced in the .html file
    1. Considering all that structure, I had to extract the HTML body from the initial file to a separate folder as an .html file and modify the body column in the CSV file to replace the HTML with a reference to the separate .html file. I then had to download images from the HTML as separate files into a different folder and replace the images in the .html file with references to the folder where the images were downloaded.

    Additionally, it is important to note that SVG files cannot be displayed in Salesforce articles; therefore, only images in formats such as PNG and JPEG are considered.

    Regarding other limitations, you can find information here in the official Salesforce documentation.

    To achieve these transformations (image downloading, and structure creation), I wrote a Python script that I will provide later that simplifies this process. I will go through the process of file transformations, formulas, etc., supported by screenshots.

    Let’s get started!

    Data Preparation and Transformation Steps

    1. The initial XLS file appeared as follows, with tabs for each language:
    1. Create a CSV file containing copied en_US articles with the following information: id, url, html_url, title, locale, and body.
    2. Add a column titled “channels” with the value “application+sites+csp” in every row, as these articles are intended for internal use and for their customers on the experience site.
    3. Add a column “IsMasterLanguage” with the value “1” for en_US articles, as the English version will serve as a master version, while later, the value “0” will be populated for other non-English versions.
    4. The “Locale” column was renamed to “Language”.
    5. Replace “en-us” values with “en_US”, and “nl” with “nl_NL”.
    6. Rename “title” to “Title”.
    7. Rename “body” to “Answer__c” since we needed to display HTML in that field on the layout.
    8. Add a column “RecordTypeId” populated with the 15-character ID of the required record type (which requires the 15-character, case-sensitive ID format).
    9. Add two columns for data categories; let’s assume we had two categories, “Business” and “Customer”, resulting in two new columns “datacategorygroup.Business” and “datacategorygroup.Customer”.
    10. Use the VLOOKUP function in the CSV to find relevant article data categories and sub-categories by ID mapping. I won’t elaborate on the VLOOKUP topic, but here are some details:
    1. I then need to map column A from the articles table [Lookup_value], which contains the article ID, to another table where the categories are located [Table_array] (it’s important that the ID is in the first column of the other table). I want to return values from the second column [Col_index_num] (in this example, it’s the Data Category). Later, the same formula can be used to return a value from the Macro-grouping column by using 3 as the Col_index_num in the formula.

      The initial file with data categories was as follows:

    I previously created data categories and subcategories to follow the provided hierarchy. I then use another formula to concatenate macro-grouping and sub-grouping to achieve the following format [macro-grouping+sub-grouping]:

    1. Then I add other article versions in different languages to the common CSV file where IsMasterLanguage = 0, channels = application + sites + csp, RecordTypeId = id in 15 characters, datacategorygroup.Business and datacategorygroup. Customer should not have any value since their master English version will have this value.
    2. The final step of the CSV modification was sorting the table by ExternalId__c and then by IsMasterLanguage (from largest to smallest, from 1 to 0).
      We need to load articles in a way that the English master article version is first, followed by its other translations.:

    Here is the final version of the file:

    Python Script Steps

    Now it’s time to take a look at the Python script. Here is the source code on GitHub.

    1. Make sure you have the necessary Python packages installed.
    1. The Python script should be placed in the folder where the final version of the CSV file is located; in my example, it is named “articles_to_migrate.csv”.
    1. It creates a folder where other folders containing articles will be placed.
    1. Since the CSV contains special characters from different languages, I wanted to sanitize them.
    1. There is a function to download assets, such as images, from HTML files to a separate “assets” folder for each article.
    1. The script Iterates through the rows and columns of the CSV file to create new subfolders for each article version, named [Language+ID], parses the HTML to find images, downloads them, and places them in the “assets” subfolder. It also replaces specific tags for Iframe videos in order to display them properly in Salesforce and creates separate HTML files for each article version, referencing them in the CSV file that will be used for uploading.
    1. You may notice that I generated a ZIP file here.

    I actually was not planning to use this ZIP for importing; I later created another ZIP file. However, I wanted to know how many MB the ZIP would be because there is a size limitation for ZIP files in Salesforce.

    1. Before running the script I had a folder with those files:
    1. articles_to_migrate.csv – the version of the CSV after all modifications described above. Here is content.properties with this content inside:
    1. I then need to run migrate_articles_with_assets.py.
    2. Run the following command in the console: “python3 migrate_articles_with_assets.py” (there are other options instead of python3; consider whatever is applicable for your laptop) and wait until you see “Knowledge articles have been zipped into knowledge_articles.zip” in the console.

    Some issues may occur in your use case; if so, it’s necessary to adjust the script accordingly.

    1. The script will generate a folder named “knowledge_articles”, inside which you will find another CSV file “articles_to_migrate.csv”.

    You will need to include that in your ZIP for article import. The “articles_to_migrate.csv” will have a reference to the folder where the correct HTML is stored in the “Anwer__c” column.

    1. The final version of the CSV file that must be included in the ZIP is as follows:
    1. The structure of the folder should be like this:
    1. Before importing articles to Salesforce, ensure that all necessary languages are enabled.
    2. You can now import them by selecting your ZIP file and checking the box for “Contains translations?” Then click “Import Now”.

    Wait and see the status/errors that you will receive by email.

    Summary

    Migrating knowledge articles to Salesforce can be a complex task, especially when dealing with multiple languages and ensuring the correct file structure. However, by utilizing a Python script, you can significantly streamline the process. This script automates the creation of necessary folders, extraction of HTML content, and management of assets, ultimately saving time and reducing the risk of errors. 

    While challenges like handling special characters and image formats may arise, with the right tools and preparation, you can achieve a smooth and efficient migration.

    The Author

    Anastasia Posivets

    Anastasia is a 9x certified Technical Lead and Solution Architect at Vimera. She's passionate about learning new features.

    Leave a Reply