Create a Data Class

To create a Data Class, you will need to:

  1. Go to "Data Classes" and Create Data Class
  2. Fill out the Form and Validator Function Editor
    a) Describe Your Data Class
    b) Exclude Data
    c) Define Match Patterns
    d) Enter Keywords and Distance
    e) Define a Validation Function
  3. Save Your Data Class

Let's get started.

Step 1. Go to "Data Classes" and Create Data Class

In "Configuration," select "Data Classes" and then click on Create Data Class.

Step 2. Fill out the Form and Validator Function Editor

A form with all the required fields for the Data Class will appear on the left and a Validator Function Editor on the right.

a) Describe Your Data Class

Begin by defining a Reference ID and Data Class Name. It is important to give the Data Class a unique ID and a meaningful name that is easy to reference.

A Data Class Description should be short and may provide more detail on what the Data Class specifies.

The Status of a Data Class determines whether it is in use. Active Data Classes are used when scanning data, whereas Inactive ones are not matched.

b) Exclude Data

Many data stores (including S3 buckets) contain data that should be excluded from a Data Scan. Otherwise, there is a high likelihood of false positives.

The Excludes field in the Data Class takes a regular expression that Open Raven uses to decide whether to scan an S3 object. Objects that match the expression are skipped.

For example, to avoid scanning any files written by the AWS Config service, use an excludes regex such as:

^[0-9]{12}_Config.*?\.json\.gz
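Before saving an excludes pattern, it can help to run it against a few sample object names. The sketch below uses the pattern above with its dots escaped so they match literal periods. It assumes the pattern is tested against the object's file name; the isExcluded helper and the sample names are hypothetical and exist only for illustration.

```javascript
// Excludes pattern from the example above, with the dots escaped.
const excludes = /^[0-9]{12}_Config.*?\.json\.gz/;

// Hypothetical object names, for illustration only.
const samples = [
  "123456789012_Config_us-east-1_ConfigHistory_20240101T000000Z.json.gz",
  "customer-export-2024.csv",
  "12345_Config_us-east-1.json.gz", // account ID has fewer than 12 digits
];

// Hypothetical helper: true when the name matches the excludes pattern.
function isExcluded(name) {
  return excludes.test(name);
}

for (const name of samples) {
  console.log(name, "->", isExcluded(name) ? "excluded" : "scanned");
}
```

Only the first sample name matches, so only that object would be skipped under these assumptions.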

c) Define Match Patterns

You must define one or several Match Patterns to ensure the Data Class is scanning for the correct data.

Match Patterns are regular expressions that describe the data belonging to the Data Class you are defining. Name each pattern for ease of reference, for example, “SSN-with-dashes,” and specify the regular expression to match, for example:

(\b(?!000|666)[0-8]\d{2}-(?!00)\d{2}-(?!0000)\d{4}\b) 
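To see what this pattern accepts and rejects, you can try it on a few strings in any JavaScript console. The snippet below is only a local test sketch: the findMatches helper and the sample values are made up for illustration and are not part of Open Raven.

```javascript
// The SSN-with-dashes pattern from above, with the global flag so that
// every occurrence in the text is returned.
const ssnWithDashes = /(\b(?!000|666)[0-8]\d{2}-(?!00)\d{2}-(?!0000)\d{4}\b)/g;

// Hypothetical helper: returns all matches in the text, or an empty array.
function findMatches(text) {
  return text.match(ssnWithDashes) || [];
}

console.log(findMatches("SSN: 123-45-6789")); // one match
console.log(findMatches("000-12-3456"));      // area number 000 is rejected
console.log(findMatches("123-00-6789"));      // group number 00 is rejected
console.log(findMatches("912-34-5678"));      // leading 9 is rejected
```

The negative lookaheads are what rule out the reserved values (000, 666, 00, and 0000), while the leading [0-8] rules out numbers that begin with 9.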

Verifying regular expressions

There are several places where regular expressions are used in Data Classes, and they are difficult to write and verify. We recommend using a tool like Regex101 to create and test regular expressions that match expected data before saving them.

d) Enter Keywords and Distance

Keywords and Keyword Distance give Open Raven additional context for identifying sensitive data, which increases accuracy.

Open Raven looks for Keywords within the specified distance before and after each match. For example, if the Data Class "Coordinates" has the Keywords "latitude" and "longitude" with a distance of 50, Open Raven will check for the Keywords in the 50 characters before and after any data that matches the specified Match Pattern.
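The idea of a keyword window can be sketched in a few lines. This is not Open Raven's implementation: the hasKeywordNearby function, its parameters, and the case-insensitive comparison are all assumptions made for illustration.

```javascript
// Hypothetical sketch: returns true if any keyword appears within
// `distance` characters before or after the matched span [start, end).
function hasKeywordNearby(text, start, end, keywords, distance) {
  const before = text.slice(Math.max(0, start - distance), start).toLowerCase();
  const after = text.slice(end, end + distance).toLowerCase();
  return keywords.some((keyword) => {
    const k = keyword.toLowerCase(); // case-insensitive matching is an assumption
    return before.includes(k) || after.includes(k);
  });
}

const keywords = ["latitude", "longitude"];

// Keyword sits right next to the matched value.
const near = "latitude: 40.7128";
const nearStart = near.indexOf("40.7128");
console.log(hasKeywordNearby(near, nearStart, nearStart + 7, keywords, 50));

// Keyword is 100 characters away, outside a distance of 50.
const far = "latitude" + " ".repeat(100) + "40.7128";
const farStart = far.indexOf("40.7128");
console.log(hasKeywordNearby(far, farStart, farStart + 7, keywords, 50));
```

A larger distance tolerates looser layouts such as wide tables, at the cost of accepting keywords that may be unrelated to the match.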

e) Define a Validation Function

You can define a Validation Function to further limit false positives (after Excludes and Keywords have already been applied).

For example, Validation Functions may perform simple check-digit calculations (e.g., Luhn for credit card numbers) or call external APIs to verify that the matched data is “live” (e.g., active codes, tracking numbers, etc.). Validation Functions are written in JavaScript.

Validation Functions start with the base Validation Function:

function validate(input) {
  return true;
}

The function takes a single input (the matched data) and returns a boolean (true/false) depending on whether the data is “valid” or not. For a simple Luhn check Validation Function, you may use:

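function validate(input) {
    // Luhn check: walking right to left, double every second digit,
    // subtract 9 from any result above 9, and sum everything.
    var len = input.length;
    var parity = len % 2; // index parity of the digits that get doubled
    var sum = 0;
    for (var i = len - 1; i >= 0; i--) {
        var d = parseInt(input.charAt(i), 10);
        if (i % 2 === parity) { d *= 2; }
        if (d > 9) { d -= 9; }
        sum += d;
    }
    // Valid when the total is a multiple of 10.
    // Non-digit characters make the sum NaN, so the check fails.
    return (sum % 10) === 0;
}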
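The Luhn function above fails any input that contains a non-digit character, because parseInt returns NaN for it. If your Match Pattern allows spaces or dashes inside card numbers, one option is to strip them before running the check. The variant below is a sketch under that assumption; the set of separators to remove is a choice made for this example.

```javascript
function validate(input) {
    // Remove spaces and dashes so "4111-1111-1111-1111" is checked as digits only.
    var digits = String(input).replace(/[\s-]/g, "");
    // Reject anything that is empty or not purely numeric after normalization.
    if (!/^\d+$/.test(digits)) { return false; }

    var sum = 0;
    var shouldDouble = false; // the rightmost digit is never doubled
    for (var i = digits.length - 1; i >= 0; i--) {
        var d = digits.charCodeAt(i) - 48; // "0" has character code 48
        if (shouldDouble) {
            d *= 2;
            if (d > 9) { d -= 9; }
        }
        sum += d;
        shouldDouble = !shouldDouble;
    }
    return sum % 10 === 0;
}

console.log(validate("4111 1111 1111 1111")); // common test number with spaces
console.log(validate("4111-1111-1111-1112")); // last digit changed
```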

For a more complex function that calls an external API to test whether a ZIP Code is valid, you may use:

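function validate(input) {
    var auth = "YOUR-AUTH-CODE";
    var token = "YOUR-AUTH-TOKEN";
    // URL-encode each value so special characters cannot alter the query string.
    var url = "https://us-zipcode.api.smartystreets.com/lookup?"
        + "auth-id=" + encodeURIComponent(auth)
        + "&auth-token=" + encodeURIComponent(token)
        + "&zipcode=" + encodeURIComponent(input);
    var request = new XMLHttpRequest();
    // The third argument (false) makes the request synchronous,
    // so the response is available as soon as send() returns.
    request.open("GET", url, false);
    request.send();

    // The match is valid unless the API reports an invalid ZIP Code.
    return !request.responseText.includes("Invalid ZIP Code.");
}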

The example above validates matched data through an external API call and uses the response to decide whether the match is valid. If any failure occurs during validation, an error is logged.
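Because every external call adds latency to a scan, it can help to reject obviously malformed matches locally before calling the API. The sketch below adds a format pre-check and a status check to the ZIP Code example. The looksLikeZip helper, the five-digit-only rule, and the decision to treat any non-200 response as invalid are assumptions made for illustration; adjust them to your Match Pattern and to the API you call.

```javascript
// Hypothetical helper: accepts exactly five digits, ignoring surrounding whitespace.
function looksLikeZip(input) {
    return /^\d{5}$/.test(String(input).trim());
}

function validate(input) {
    // Skip the network round trip for values that cannot be ZIP Codes.
    if (!looksLikeZip(input)) { return false; }

    var auth = "YOUR-AUTH-CODE";
    var token = "YOUR-AUTH-TOKEN";
    var url = "https://us-zipcode.api.smartystreets.com/lookup?"
        + "auth-id=" + encodeURIComponent(auth)
        + "&auth-token=" + encodeURIComponent(token)
        + "&zipcode=" + encodeURIComponent(String(input).trim());

    var request = new XMLHttpRequest();
    request.open("GET", url, false); // synchronous, as in the example above
    request.send();

    // Treating a non-200 status as "not valid" is a choice made for this sketch.
    if (request.status !== 200) { return false; }
    return !request.responseText.includes("Invalid ZIP Code.");
}
```

With this ordering, only plausible ZIP Codes reach the external service, which reduces the number of API calls made during a scan.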

Step 3. Save Your Data Class

Click Save.

For more help with Data Classes or Validation Functions, reach out to our support team at [email protected].

