With $35 billion (USD) in total consolidated assets at the end of 2020, this commercial banking organization feeds large volumes of data in compressed Parquet format into its AWS S3 buckets via a third-party tool.
The bank needed to identify sensitive information in these Parquet files, remediate any false positives, and mask genuinely sensitive data in a way that preserved the format of the original data. It also required a fully automated workflow that could be applied to any new data feeds arriving in the AWS S3 bucket.
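To illustrate what masking "in a way that preserved the format" can mean, here is a minimal Python sketch. It is an assumption-laden illustration, not PK Protect's masking method: the function name `mask_preserving_format` and the keyed-hash substitution are hypothetical, and this is simple deterministic character substitution rather than a standardized format-preserving encryption scheme. Digits stay digits, letters keep their case, and separators are left untouched.

```python
import hashlib
import hmac
import string


def mask_preserving_format(value: str, key: bytes) -> str:
    """Replace each ASCII digit/letter with another of the same class.

    Hypothetical helper for illustration only. The output has the same
    length, separators, and character classes as the input.
    """
    # Keyed digest of the whole value drives the substitution, so the same
    # input and key always produce the same masked output.
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch in string.digits:
            masked.append(string.digits[b % 10])
        elif ch in string.ascii_uppercase:
            masked.append(string.ascii_uppercase[b % 26])
        elif ch in string.ascii_lowercase:
            masked.append(string.ascii_lowercase[b % 26])
        else:
            # Separators, spaces, and non-ASCII characters pass through.
            masked.append(ch)
    return "".join(masked)


if __name__ == "__main__":
    # The key and the sample value are made up for this example.
    print(mask_preserving_format("123-45-6789", b"demo-key"))
```

Because the shape of the value is unchanged, downstream consumers that validate field length or layout should continue to work on the masked data.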
The bank deployed PK Protect in its AWS environments with a custom policy that defined the sensitive data to be detected. PKWARE used an Amazon EMR cluster to run MapReduce (MR) jobs so that discovery could be performed at scale, and an automated utility handled bulk remediation. The PKWARE team also provided S3 Orchestrator, which made it possible to split the scanning and masking tasks by task size for any given S3 bucket folder.
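One way to picture splitting work by task size is to group the objects in a folder into batches that each stay under a byte budget. The sketch below is only a guess at the general idea, not a description of how S3 Orchestrator works: the function `split_by_size`, the `max_batch_bytes` threshold, and the sample keys and sizes are all hypothetical. In practice the `(key, size)` pairs would come from an S3 listing.

```python
from typing import Iterable, List, Tuple


def split_by_size(
    objects: Iterable[Tuple[str, int]], max_batch_bytes: int
) -> List[List[str]]:
    """Group (key, size_in_bytes) pairs into batches under a byte budget.

    Hypothetical helper for illustration. An object larger than the budget
    still gets a batch of its own rather than being dropped.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    # Largest objects first, so big files are isolated early.
    for key, size in sorted(objects, key=lambda o: o[1], reverse=True):
        if current and current_bytes + size > max_batch_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(key)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Made-up folder listing: key and size in bytes.
    listing = [
        ("feed/a.parquet", 60),
        ("feed/b.parquet", 50),
        ("feed/c.parquet", 30),
        ("feed/d.parquet", 10),
    ]
    for batch in split_by_size(listing, max_batch_bytes=100):
        print(batch)
```

Each batch could then be submitted as a separate scan or masking task, so that one very large folder does not become a single oversized job.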