March 12, 2021

Data Privacy and Protection at Global Biotech



This PKWARE customer is a California-based biotechnology company dedicated to researching and developing medicines for people with serious and life-threatening diseases. The company has almost 15,000 employees and generates nearly $2 billion in annual revenue.


  1. Different customer projects used a variety of data types. The customer needed a solution with flexible policy creation to find different kinds of sensitive data, including PHI and other HIPAA-regulated data. They had 17 different classifiers, and every data set would need a new policy.
  2. The customer was setting up a data lake to ingest a variety of third-party data sets, delivered primarily as JSON and CSV files.
  3. PKWARE needed to work with AWS as well as with a designated solution provider. The PS consultant was stationed on site with the customer. The proof of concept (POC) was partially completed with AWS. The next step was for the additional solution provider to independently test PKWARE APIs and automation. The final steps would be fine-tuning the rules and creating a de-identified data lake of fully protected data for the customer’s analytics team to consume.

Key requirements included:

  • Performance was key: the customer had strict time limits for data ingestion.
  • A hands-off, automated solution was extremely important.
  • Accuracy was also crucial, particularly minimizing false positives in unstructured data.

Company Profile

Global Biotech

Large Enterprise

California, USA

Our Approach

Using PKWARE, every element of sensitive data was identified and masked as required for compliance.

Use Cases

Step 1:

  • Leveraged certain out-of-the-box elements from the PKWARE platform, then used the PKWARE policy builder to build custom rules (elements) for data unique to the customer: patient IDs and serial numbers.
  • Fine-tuned rules to capture the nuances of their environment, resulting in no false positives and no false negatives.
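
To illustrate what a custom rule for patient IDs and serial numbers might look like, here is a minimal Python sketch using regular expressions. It is not the PKWARE policy builder and does not use any PKWARE API. The identifier formats (`PT-` followed by six digits, `SN-` followed by eight uppercase letters or digits) and the function names are assumptions made up for this example, since the case study does not describe the customer's real formats.

```python
import re

# Hypothetical identifier formats, invented for illustration only.
PATTERNS = {
    "patient_id": re.compile(r"\bPT-\d{6}\b"),
    "serial_number": re.compile(r"\bSN-[A-Z0-9]{8}\b"),
}


def find_sensitive(text):
    """Return (label, matched_text) pairs, ordered by position in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    hits.sort()
    return [(label, value) for _, label, value in hits]


def mask(text, fill="*"):
    """Replace every match with fill characters of the same length."""
    for pattern in PATTERNS.values():
        text = pattern.sub(lambda m: fill * len(m.group()), text)
    return text
```

The word-boundary anchors (`\b`) are one simple way to tighten a rule: a seven-digit value such as `PT-1234567` is not treated as a patient ID, which is the kind of fine-tuning that reduces false positives.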

Step 2:

  • Created an automated flow using PKWARE APIs and AWS Lambda functions.
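
An automated flow of this kind is often triggered when a new file lands in storage. The Python sketch below shows one possible shape for a Lambda handler that reacts to an S3 object-created event. The case study does not describe the actual trigger or the PKWARE API calls, so everything here is an assumption: `read_object`, `write_object`, and `protect` are hypothetical callables supplied by the caller (`protect` stands in for whatever call performs detection and masking), and `target_bucket` is a hypothetical name for the safe zone.

```python
from urllib.parse import unquote_plus


def make_handler(read_object, write_object, protect, target_bucket):
    """Build a Lambda-style handler from injected I/O and masking functions.

    All four arguments are hypothetical placeholders; in a real deployment
    the I/O functions would wrap a storage client and protect would call
    the data protection service.
    """

    def handler(event, context):
        processed = []
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys in S3 event notifications are URL-encoded.
            key = unquote_plus(record["s3"]["object"]["key"])
            raw = read_object(bucket, key)
            write_object(target_bucket, key, protect(raw))
            processed.append(key)
        return {"processed": processed}

    return handler
```

Injecting the I/O and masking functions keeps the handler testable without cloud access and makes it clear which parts are placeholders.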

Step 3:

  • Leveraged EMR at scale to complete detection and masking within a specific time window.


Fully Automated Solution
Processes that had been manual, slow, and inaccurate became rapid and accurate.

Full De-Identification
Every element of sensitive data was identified and masked as required for compliance.

Safe Zone for Analytics
Fully protected data now flows into the customer’s safe zone data lake. Analytics can deliver full value safely.