
Data Classification Components

The Data Classification process assigns categories to business resources according to rules. Rules are composed of one or more rule criteria.

A rule criterion matches file content against one or more strings or patterns. A string can be defined as free text, as a regular expression, or as a stored policy object. A regular expression in a policy object can be accompanied by a verification algorithm that narrows the search further.

Note

Out-of-the-box policy objects and verification algorithms are available for standard searches, or you can create your own to fit your needs.

The classification rule is the main data classification component. Rules also contain subcomponents that complete the rule structure, simplify the rule management task, and provide extended functions.
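The relationship between rules, criteria, and categories can be pictured with a minimal Python sketch. It is an illustration only, not the product's data model: the class names, the sample rule, and the assumption that every criterion must match for a rule to be satisfied are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Criterion:
    """One rule criterion: a free-text string or a regular expression."""
    pattern: str
    is_regex: bool = False

    def matches(self, text: str) -> bool:
        if self.is_regex:
            return re.search(self.pattern, text) is not None
        return self.pattern in text


@dataclass
class Rule:
    """A classification rule that tags matching content with a category."""
    name: str
    category: str
    criteria: List[Criterion] = field(default_factory=list)

    def classify(self, text: str) -> Optional[str]:
        # Assumption: the rule is satisfied only when every criterion matches.
        if self.criteria and all(c.matches(text) for c in self.criteria):
            return self.category
        return None


# Hypothetical rule: a keyword plus a pattern shaped like 123-45-6789.
rule = Rule(
    name="Patient records",
    category="HIPAA",
    criteria=[
        Criterion("patient"),
        Criterion(r"\b\d{3}-\d{2}-\d{4}\b", is_regex=True),
    ],
)

print(rule.classify("patient SSN 123-45-6789"))
print(rule.classify("quarterly sales report"))
```

The first call returns the category name because both criteria match; the second returns None because neither does.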

File properties can be used to classify files that the customer has tagged manually or with a third-party application. Data Access Security reads the metadata on the files and can use it in data classification rules. This includes reading metadata from encrypted files.

Data Categories

The data category (the basic component of data classification) is the tag used when a classification rule is satisfied.

To define a data category, open the Manage Categories panel from any of the Data Classification screens.

  1. Navigate to Compliance > Data Classification > Policies > Actions > Manage Categories or Compliance > Data Classification > Rules > Actions > Manage Categories.
  2. In the Manage Categories window, type the category name in the Add New Category section.
  3. Select Add.

The system adds the new data category to the Current Categories list. Users can edit and delete existing user-defined categories from the Current Categories list. Users can also search categories by name, or filter them by selecting the Show user defined categories only checkbox.

Categories have a default sensitivity level of Medium. The sensitivity level can be set to Low or High according to the organization's policy for the business resources tagged with the configured category.

Data Classification Policy

The Data Classification Policy is a logical container for data classification rules. For example, all the rules that help identify content which may be subject to HIPAA regulation should be grouped under a HIPAA policy.

Data Access Security provides several predefined packaged policies and classification rules. Users can create additional user-defined policies, as well as adjust, extend, and customize existing ones.

Rules

Policies set the rules for detecting critical, sensitive, and regulated data to be protected by organizational procedures, governance processes, and access controls.

File Properties

Data Access Security analyzes standard attributes, including extension, size, and file name, as well as other metadata attributes and file properties. All file properties are discovered and created during the classification analysis process.

  1. In the web client, navigate to Compliance > Data Classification > Rules > Actions > Manage File Properties or Compliance > Data Classification > Policies > Actions > Manage File Properties to open the Manage File Properties window.
  2. Type the file property details.
  3. If relevant, check the Custom Properties checkbox.
  4. Select Add.

Encrypted Files

To classify encrypted files without Data Access Security reading the file contents, tag the files locally according to your classification scheme and use these tags in classification rules (see Local Classification).

Local Classification

You can classify files locally by tagging them with relevant tags. During the scanning process, the file metadata is uploaded to the Data Access Security database as file properties. These properties can then be used to create classification rules manually.

The file properties found are added automatically to the list of available properties for filtering after the first iteration. To make these properties available in the initial Data Classification run, add them to the property list as described in File Properties above.

Policy Objects

Policy objects are searches that are saved for use in rules. For example, predefined policy objects can search for credit card numbers.

  1. Navigate to Compliance > Data Classification > Policy Objects.
  2. Select New Policy Object to open the New Policy Object page.

Data classification policy object fields include:

  • Policy Object Name - Name of the policy object
  • Description - Additional information about the policy object
  • Type - The type of search that the policy object performs. It can be one of the following search types:
    • Keyword - A keyword may be one or more words. If multiple words are involved, the entire phrase is searched. Stop words such as "a" or "and" are stripped from the search keywords. To include stop words in the phrase, use a regex phrase instead. (For more information on ignoring stop words, see https://www.elastic.co/guide/en/elasticsearch/guide/current/stopwords.html)
    • Wildcard - Supports the following special characters: any number of asterisks (*) and only one question mark (?).
    • Regular Expression - Uses standard regex syntax to define the search.
  • Values - You can search for a single value or a list of matching values.
  • Mask Values (Regular Expression policy objects only) - Masks, on demand, portions of the matched values collected as classification evidence. A maximum of 3 characters can remain unmasked in any evidence snippet. The original value of masked characters is discarded, is not persisted, and cannot be inferred from the masked evidence value.
    • Display the first characters - Number of characters displayed from the left of the matched value.
    • Display the last characters - Number of characters displayed from the right of the matched value.
  • Verification Algorithm - A code-based algorithm that enables more complex filtering. See Data Classification Verification Algorithms for further details.
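The Mask Values settings can be illustrated with a short Python sketch. It is not the product's implementation: the function name, the asterisk mask character, and the reading of the 3-character limit as the combined total of first and last displayed characters are assumptions made for this example.

```python
def mask_value(value: str, show_first: int = 0, show_last: int = 0,
               max_unmasked: int = 3, mask_char: str = "*") -> str:
    """Mask a matched value, leaving a few leading/trailing characters visible."""
    if show_first < 0 or show_last < 0:
        raise ValueError("character counts must be non-negative")
    # Assumption: the limit applies to first + last characters combined.
    if show_first + show_last > max_unmasked:
        raise ValueError(f"at most {max_unmasked} characters may stay unmasked")

    # Never reveal more characters than the value contains.
    show_first = min(show_first, len(value))
    show_last = min(show_last, len(value) - show_first)

    hidden = len(value) - show_first - show_last
    tail = value[len(value) - show_last:] if show_last else ""
    return value[:show_first] + mask_char * hidden + tail


# Show the first character and the last two characters of a matched value.
print(mask_value("378282246310005", show_first=1, show_last=2))
```

Because only the masked string is returned, the original characters are not recoverable from the evidence value, which mirrors the behavior described for masked evidence.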

Policy objects are a good way to reuse searches containing complex definitions.
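As an illustration of the Wildcard search type, the following Python sketch translates a wildcard value into a regular expression. The source does not define what each special character matches, so the sketch assumes the conventional meaning (an asterisk matches any run of characters, a question mark matches exactly one character). The function name is hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard value into a compiled regular expression."""
    # The Wildcard type allows any number of '*' but only one '?'.
    if pattern.count("?") > 1:
        raise ValueError("only one question mark is supported")

    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # assumption: any run of characters
        elif ch == "?":
            parts.append(".")    # assumption: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))


print(bool(wildcard_to_regex("conf*").fullmatch("confidential")))
print(bool(wildcard_to_regex("gr?y").fullmatch("grey")))
print(bool(wildcard_to_regex("gr?y").fullmatch("groovy")))
```

Escaping the literal characters keeps symbols such as "." or "+" in a wildcard value from being interpreted as regex operators.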

Classification Types

Regular Expressions Within Policy Objects

Regular expressions form the basis for many content pattern searches. Data Access Security uses the .NET regular expression engine as its underlying engine for regular expression searches. All regular expression definitions and searches must conform to the engine’s restrictions, limitations, and standards.

When selecting a policy object of type Regular Expression, Admins and Compliance Managers can provide additional information and settings for the policy object and search criteria.

Verification Algorithm - A standard, out-of-the-box example is the Luhn verification algorithm. This algorithm checks that all phrases classified as credit cards are valid credit card numbers (as far as an algorithm can validate without contacting the bank). When selected, this verification runs only on strings that conform to the credit card regular expression entered, for example: “^3[47][0-9]{13}$”

See Data Classification Verification Algorithms for a full description of how to create verification algorithms.
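The two-stage check described above (regular expression first, Luhn verification second) can be sketched in Python as follows. This is a generic Luhn checksum written for illustration, not the product's verification algorithm. The function names are hypothetical, and the sample value is a widely published test card number.

```python
import re

# The example pattern from the text: 15 digits starting with 34 or 37.
CARD_PATTERN = re.compile(r"^3[47][0-9]{13}$")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


def is_credit_card(candidate: str) -> bool:
    """Run the Luhn check only on strings that match the pattern."""
    return CARD_PATTERN.match(candidate) is not None and luhn_valid(candidate)


print(is_credit_card("378282246310005"))  # matches the pattern and the checksum
print(is_credit_card("378282246310006"))  # matches the pattern, fails the checksum
```

The second value has the right shape but a wrong check digit, which is the kind of false positive a verification algorithm filters out.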

Mask Values - By default, the regular-expression matches are saved as part of the results. It is recommended to mask the values of the matches to avoid exposing critical data.

Regex Matching and Case

Regex matching is case sensitive by default. To make a regex ignore case, use the prefix “(?i)”.

For example: “home” will find “home”, but ignore “Home”.

The regex “(?i)home” will find “Home”, “HOME”, and “HoMe”.
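The case behavior can be checked with a short sketch. It uses Python's re module rather than the .NET engine the product uses. Both accept the inline option (?i) as a pattern prefix, and the sample strings are illustrative. The sketch also shows that the similar-looking (?!) is an empty negative lookahead, which never matches anything.

```python
import re

samples = ["home", "Home", "HOME", "HoMe"]

# Case-sensitive by default: only the exact-case string matches.
sensitive = [s for s in samples if re.search(r"home", s)]

# The (?i) prefix makes the whole pattern ignore case.
insensitive = [s for s in samples if re.search(r"(?i)home", s)]

# (?!) is an empty negative lookahead: it always fails, so nothing matches.
never = [s for s in samples if re.search(r"(?!)home", s)]

print(sensitive)
print(insensitive)
print(never)
```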

Identifying Line Breaks using Regex

For parsed files, line breaks are represented by a single CR (\r) rather than (\r\n) or (\n), and are therefore not identified by the regex line boundaries ^ and $.
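One way to work around this is to match the CR character explicitly instead of relying on ^ and $. The sketch below uses Python's re module, whose multiline anchors also recognize only \n. It is an assumption that the same pattern carries over unchanged to the product's .NET engine, and the sample text is made up for the example.

```python
import re

# Parsed text in which lines are separated by a bare carriage return.
parsed_text = "alpha\rbeta\rgamma"

# Multiline anchors look for \n, so the middle "line" is not found.
with_anchors = re.findall(r"^beta$", parsed_text, flags=re.MULTILINE)

# Treat \r as the line boundary: start of text or just after a CR,
# and end of text or just before a CR.
cr_aware = re.findall(r"(?:^|(?<=\r))beta(?=\r|$)", parsed_text)

print(with_anchors)
print(cr_aware)
```

The lookbehind and lookahead keep the CR characters out of the matched value, so the evidence contains only the text of the line.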