The Data Scan feature uses machine learning and pattern matching to identify and classify your sensitive data in Amazon S3 buckets.
Activating Data Scanning lets you discover sensitive data such as:
- Sensitive personal information
- Developer secrets and credentials
- Financial and health data
Data Scanning uses the following concepts:
- Data Classes, e.g., US Social Security number.
- Data Collections (a group of data classes), e.g., Privacy Data.
- Data Scan Jobs.
Open Raven comes with a number of Default Data Classes and Default Data Collections. You can also create your own Data Classes and Data Collections. Data Scan Jobs run on the schedule you specify, scanning a target Asset Group for the Data Classes and Data Collections you select.
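As a rough illustration of how these three concepts relate, here is a minimal Python sketch. It is not Open Raven's implementation or API: the type names, the field names (`asset_group`, `schedule`), the example job values, and the simplified Social Security number pattern are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataClass:
    """One kind of sensitive data, matched here by a regular expression."""
    name: str
    pattern: re.Pattern


@dataclass(frozen=True)
class DataCollection:
    """A named group of data classes."""
    name: str
    data_classes: tuple


@dataclass(frozen=True)
class DataScanJob:
    """What to look for, where, and how often (all fields are hypothetical)."""
    name: str
    collections: tuple
    asset_group: str
    schedule: str  # assumed here to be a cron-style expression


# Simplified pattern: 3 digits, 2 digits, 4 digits, separated by dashes.
US_SSN = DataClass("US Social Security number",
                   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"))
PRIVACY_DATA = DataCollection("Privacy Data", (US_SSN,))

job = DataScanJob(
    name="weekly-privacy-scan",        # hypothetical job name
    collections=(PRIVACY_DATA,),
    asset_group="production-buckets",  # hypothetical asset group
    schedule="0 2 * * 1",              # hypothetical schedule
)


def scan_text(job, text):
    """Return {data class name: match count} for every class that matched."""
    findings = {}
    for collection in job.collections:
        for data_class in collection.data_classes:
            count = len(data_class.pattern.findall(text))
            if count:
                findings[data_class.name] = count
    return findings
```

Calling `scan_text(job, some_text)` reports only the data classes that matched, which mirrors the idea that a job looks for the specific classes and collections it was configured with. A real classifier would also validate matches and, as noted above, can rely on machine learning as well as patterns.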
Open Raven analyzes S3 bucket data in many different formats. When running a Data Scan Job, the full file and its contents are inspected. The type of file is determined by its MIME type. The following table describes the file formats that Open Raven supports today.
File Extensions or MIME Type
- .txt, .log, .json, .yml, .html, .htm, .csv, and others with MIME types
- .pdf, .doc, .docx, .ppt, .xls, .xlsx, .odt, .ods, .odp and others with MIME types (common document file formats such as Adobe PDF, Microsoft Word, PowerPoint, and Excel files, and more)
- Files with MIME types (Apache Parquet files and Avro object containers)
- Compression or archive: .ar, .arj, .br, .bz, .bz2, .cpio, deflate, .gtar, .gz, .gzip, .jar, .lz4, .lzma, .pack200, .rar, .tar, .xz, .z, .zip
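To make the idea of MIME-type-driven inspection concrete, the sketch below lists the files inside a ZIP archive and guesses a MIME type for each one. It is only an illustration under stated assumptions: it uses Python's standard `mimetypes` module, which guesses from the file name rather than from file contents, and it handles ZIP only. How Open Raven itself detects MIME types and unpacks archives is not described here, and the helper names `guess_mime` and `list_archive_members` are hypothetical.

```python
import io
import mimetypes
import zipfile


def guess_mime(name):
    """Guess a MIME type from a file name; fall back to a generic binary type."""
    mime, _encoding = mimetypes.guess_type(name)
    return mime or "application/octet-stream"


def list_archive_members(data):
    """Yield (name, guessed MIME type, uncompressed size) for each file in a ZIP.

    `data` is the raw bytes of the archive, for example an object already
    downloaded from a bucket.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, which have no content
            yield info.filename, guess_mime(info.filename), info.file_size
```

Each member's guessed type could then be used to pick a suitable parser, for example a plain-text reader for `text/plain` or a document parser for `application/pdf`.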
If there's a file format you wish to see supported by Open Raven, please reach out to us at [email protected].