Building a Layered Approach to Data Management, Part II: How to Obfuscate and Delete the Personal Information of Data Subjects
Part 2 of a 3-Part Blog Series
In a previous post, we laid out the steep challenges of ensuring data privacy in a world of complex digital architectures and the stiff compliance demands of the California Consumer Privacy Act (CCPA), the General Data Protection Regulation (GDPR), and other mandates. We showed that a successful strategy requires layers of control over personal information, with the foundational layer being clear visibility into your many data subjects, regardless of their nature or where they reside in your digital ecosystem. Now let’s turn our attention to the next layer: how to obfuscate and delete the personal information of data subjects.
Obfuscating and Deleting Data Elements and Data Subjects
The GDPR and the CCPA require companies to limit the use and sharing of personal information and to delete the personal information of data subjects upon request. The mandate is reflected in Article 32 of the GDPR, which explicitly lists pseudonymization and encryption among appropriate security measures; masking is a closely related technique. Its value is reflected in the fact that effective limitations on the use of personal information can save companies from the fallout of breaches and from the breach notification requirements that apply under both laws.
Unfortunately, obfuscating and deleting personal information are easier said than done, and both create real data management challenges for companies. When it comes to using obfuscation to limit the personal information that authorized users within the organization can access, many systems are simply not built to accommodate such functionality. Applications often allow some limits on data views through roles with different access rights, but those roles are rarely flexible enough or closely aligned with the company’s privacy policies. In addition, obfuscating data in large repositories such as shared drives full of documents and big data lakes is challenging, whether that data is structured, unstructured, or semi-structured.
Meanwhile, the deletion of data, especially data that pertains to a particular data subject, presents a different technical problem, namely replication. Data moves regularly between systems and is validated by comparing it to previous data. That interconnectedness means that when a line item—such as the record of a data subject—is unexpectedly missing, a replication error happens. When such errors happen, financial systems cannot balance their entries and transaction records do not add up.
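To make the replication problem concrete, here is a minimal sketch in Python. The ledger rows, field names, and the `reconcile` helper are all hypothetical and stand in for whatever comparison a real replication or reconciliation job performs.

```python
# Hypothetical ledger rows replicated from a source system to a target system.
source = [
    {"id": 1, "name": "Ana Ruiz", "amount": 120},
    {"id": 2, "name": "Ben Cole", "amount": 75},
    {"id": 3, "name": "Cy Dunn", "amount": 40},
]


def reconcile(source_rows, target_rows):
    """Return (ids missing from the target, difference in summed amounts)."""
    target_ids = {row["id"] for row in target_rows}
    missing = [row["id"] for row in source_rows if row["id"] not in target_ids]
    diff = sum(r["amount"] for r in source_rows) - sum(r["amount"] for r in target_rows)
    return missing, diff


# Deleting data subject 2 outright from the target breaks the comparison:
# the row is missing and the totals no longer balance.
hard_deleted = [row for row in source if row["id"] != 2]
missing_ids, amount_gap = reconcile(source, hard_deleted)
print(missing_ids, amount_gap)  # [2] 75
```

Removing the whole row satisfies the deletion request but leaves one system out of step with the other, which is exactly the kind of mismatch described above.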
The Right Approach to Obfuscation and Deletion
Addressing these data management challenges successfully involves several components. Beyond the obvious need to process personal information in systems that can handle different obfuscation techniques, there is the important question of correctly identifying the data that needs to be protected in the first place.
We can all recite a list of sensitive data elements that should only be made available in limited circumstances, but there are complexities. The reality of data sensitivity is that it’s often the combination of several data elements that defines the data’s risk. The “formula” for effective obfuscation must account for the user’s legitimate need for specific data elements, the risk associated with the combination of these data elements, and the minimum exposure of the data elements for the use at hand. Few solutions available today implement such a formula.
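One way to picture that “formula” is as a small policy function. The sketch below is only an illustration: the roles, the risky combinations, and the masking priority are assumptions made up for this example, not a recommended policy. It exposes only the fields a role needs, then masks one field from any high-risk combination that would otherwise be shown together.

```python
# Hypothetical policy data: which fields each role legitimately needs.
ROLE_NEEDS = {
    "support_agent": {"name", "email"},
    "analyst": {"zip_code", "birth_date", "gender"},
}

# Hypothetical combinations treated as high risk when exposed together.
RISKY_COMBINATIONS = [
    {"zip_code", "birth_date", "gender"},
    {"name", "ssn"},
]

# Hypothetical order in which fields are masked first to break a risky combination.
MASK_PRIORITY = ["ssn", "birth_date", "zip_code", "name", "gender", "email"]


def fields_to_expose(role, requested):
    """Return the subset of requested fields that may be shown in the clear."""
    # Step 1: legitimate need. Anything the role does not need is never exposed.
    allowed = set(requested) & ROLE_NEEDS.get(role, set())
    # Step 2: combination risk. Break each risky combination by masking
    # the highest-priority field it contains (minimum extra masking).
    for combo in RISKY_COMBINATIONS:
        if combo <= allowed:
            for field in MASK_PRIORITY:
                if field in combo:
                    allowed.discard(field)
                    break
    return allowed


def apply_view(record, role):
    """Return a copy of the record with non-exposed fields replaced by '***'."""
    exposed = fields_to_expose(role, record.keys())
    return {k: (v if k in exposed else "***") for k, v in record.items()}


record = {
    "name": "Ana Ruiz", "email": "ana@example.com", "ssn": "123-45-6789",
    "zip_code": "53202", "birth_date": "1984-03-12", "gender": "F",
}
analyst_view = apply_view(record, "analyst")
support_view = apply_view(record, "support_agent")
```

In this sketch the analyst keeps the ZIP code and gender but loses the birth date, because the three together are treated as identifying, while the support agent sees only the name and email address.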
As for data deletion, we must embrace techniques that address the challenge of unreconciled data entries following the deletion of a data subject. The most practical way to address this challenge is to obfuscate the identifying data elements while leaving intact the data that is necessary for replication purposes. One obfuscation technique that is especially useful for this purpose is Format Preserving Masking (FPM). FPM replaces real personal information with data elements that look like personal information but don’t belong to any data subject (a name is replaced with a different new name, an address with another address, etc.).
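The following Python sketch shows the general idea of masking that preserves format. It is an illustration under stated assumptions, not a description of any product’s algorithm: the key, the tiny name pools, and the helper names are all hypothetical, and the keyed-hash substitution used here is one simple approach among several (it is not format-preserving encryption).

```python
import hashlib
import hmac

# Assumption: in practice this would be a managed secret, not a literal.
SECRET_KEY = b"demo-key-not-for-production"
# Hypothetical replacement pools; a real deployment would use far larger ones.
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley", "Taylor", "Casey"]
LAST_NAMES = ["Hill", "Lane", "Brooks", "Reed", "Stone", "Frost"]


def _digest(value, purpose):
    """Keyed hash so the same input always maps to the same replacement."""
    msg = f"{purpose}:{value}".encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()


def mask_name(name):
    """Replace a name with a different, realistic-looking name from the pools."""
    d = _digest(name, "name")
    first = FIRST_NAMES[d[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[d[1] % len(LAST_NAMES)]
    return f"{first} {last}"


def mask_digits(value, purpose="digits"):
    """Replace each digit with a derived digit; keep separators and length."""
    d = _digest(value, purpose)
    out = []
    i = 0
    for ch in value:
        if ch.isdigit():
            out.append(str(d[i % len(d)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def mask_subject(row):
    """Mask identifying fields; leave keys and amounts intact for replication."""
    masked = dict(row)
    masked["name"] = mask_name(row["name"])
    masked["phone"] = mask_digits(row["phone"], "phone")
    return masked


row = {"id": 2, "name": "Ben Cole", "phone": "414-555-0137", "amount": 75}
masked_row = mask_subject(row)
```

The record keeps its `id` and `amount`, so downstream comparisons still balance, while the name and phone number are replaced with values of the same shape. With pools this small, a replacement could collide with a real person’s name, which is one reason a sketch like this needs much larger pools before real use.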
FPM has several advantages over simpler masking techniques, such as replacing characters with asterisks. First, users of repositories protected with FPM cannot tell which record was modified and therefore cannot try to reverse engineer the deleted identity. Second, FPM can be used to support analytics processes that include modified data elements. The algorithms used to convert personal information in FPM can be designed to keep data in ranges that are meaningful to the company. When this happens, modified data can still be used for analytics purposes: dates of birth can be kept within certain ranges, for instance, or the count of data subjects from certain areas can remain accurate. Finally, FPM can be applied consistently across systems so that the same manufactured data elements appear wherever the data subject in question is encountered.
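The range-preserving and consistency properties can be sketched together in a few lines of Python. The key, the `subject_id` parameter, and the choice of “same calendar year” as the meaningful range are assumptions for this example; a company would pick the ranges that matter to its own analytics.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Assumption: a key shared by every system that applies the masking.
SECRET_KEY = b"demo-key-not-for-production"


def mask_birth_date(subject_id, birth_date):
    """Move the date to a derived day within the same calendar year.

    The offset depends only on the key and the subject identifier, so any
    system holding the same key produces the same masked date.
    """
    digest = hmac.new(
        SECRET_KEY, f"dob:{subject_id}".encode("utf-8"), hashlib.sha256
    ).digest()
    year_start = date(birth_date.year, 1, 1)
    days_in_year = (date(birth_date.year + 1, 1, 1) - year_start).days
    offset = int.from_bytes(digest[:4], "big") % days_in_year
    return year_start + timedelta(days=offset)


# Two hypothetical systems mask the same data subject independently.
crm_value = mask_birth_date("subject-42", date(1984, 3, 12))
billing_value = mask_birth_date("subject-42", date(1984, 3, 12))
```

Both systems arrive at the same manufactured date, and the birth year survives, so an age-band report still works. A derived date can occasionally land on the real one, so a production design would need to decide whether that is acceptable.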
As with any technological solution for subjective problems such as identifiability, there are no silver bullets. We need data management standards that guide the use of obfuscation technologies so that companies can identify common and leading practices. We also need to educate the various stakeholders in the privacy management space, from privacy professionals to regulators. And, as we’ll see in the third post in this series, success requires that we employ obfuscation and data deletion processes that hold up in the real world of the extended enterprise and third-party vendor ecosystems.
- PKWARE January 13, 2025