General Information

Classification and Flow Architecture

Data Classification content analysis and data processing are performed by the Data Access Security Data Classification Deploy-Anywhere collectors. These collectors are specialized virtual appliances deployed in the customer's environment, data center, or virtual private cloud, which ensures that data is processed on the customer's site.

The Data Access Security central Data Classification engine identifies data locations eligible for scanning and sends directions to the specialized data collectors in the customer's environment.

The collectors then traverse the data locations to be scanned; read, process, and analyze the content of the scanned files; classify and categorize that content based on the relevant classification policies and rules; and send metadata-only results back to the central classification engine and the customer's Data Access Security tenant.

Data Classification Content Analysis Process

The Data Classification Content Analysis Process comprises several steps that can execute concurrently and independently. These include:

  • Classification Policy Management and Evaluation
  • Running a Data Classification Content Analysis Task
  • Querying and Retrieving Results

Classification Policy Management and Evaluation

Data Access Security includes a variety of preset packaged classification policies and content classification rules. Additional rules and policies can be created and existing ones can be adjusted and customized at any point through the Data Access Security web user interface.

Once a Data Classification Content Analysis task is issued for a specific application, the Data Classification engine uses the most recent policy definition applicable to the scanned application. That policy definition persists for the duration of the content analysis and classification task. Any changes made to the policy definition after the task has started are not reflected in the current classification process.

Note

Changes to classification policies will trigger a complete re-scan and analysis of the application data content in the subsequent data classification task.
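The snapshot behavior described above can be illustrated with a minimal Python sketch. This is not the product's implementation: `Policy`, `ClassificationTask`, `issue_task`, and `needs_full_rescan` are hypothetical names, and the integer `version` field is an assumed way of detecting policy changes.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Policy:
    # Hypothetical policy record; "version" is an assumed change marker.
    name: str
    rules: list[str]
    version: int = 1


@dataclass
class ClassificationTask:
    application: str
    # Copy of the policies taken when the task is issued.
    policies: list[Policy] = field(default_factory=list)


def issue_task(application: str, current_policies: list[Policy]) -> ClassificationTask:
    """Freeze the current policy definitions for the lifetime of the task."""
    return ClassificationTask(application, copy.deepcopy(current_policies))


def needs_full_rescan(last_scanned_versions: dict[str, int],
                      current_policies: list[Policy]) -> bool:
    """Return True if any policy was added, removed, or changed since the last task."""
    current = {p.name: p.version for p in current_policies}
    return current != last_scanned_versions
```

In this sketch, editing a policy after `issue_task` has run leaves the running task's copy unchanged, while `needs_full_rescan` reports the change so that the next task can queue a complete re-scan.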

Data Classification Content Analysis Flow

  1. The Data Access Security Classification Engine identifies the data assets and locations to be analyzed and classified. A data asset (Business Resource) is selected for classification based on the following conditions:
    1. This is the first time the data asset is being analyzed and classified
    2. The data asset has been updated or modified since the last time it was classified (changes are evaluated based on the Business Resource Last Modified Date attribute)
    3. The Business Resource is included in the Classification Scope of the scanned application
    4. There is at least one classification policy configured to include the scanned application in its classification policy scope
    5. The Business Resource is not excluded from classification due to the de-duplication mechanism (see below)
    6. If any changes were made to the classification policies, all Business Resources in scope will be queued for re-scan, analysis, and classification
  2. The Central Classification Engine sends the information about data assets and locations to be scanned to the Classification Collectors.
  3. The collectors retrieve the list of files to be scanned and analyzed in each Business Resource.
  4. The Data Classification Data Collectors read the content and metadata of each file.
  5. The Data Classification Data Collectors evaluate the content of the files based on the classification policies and rules, classify and categorize the data content, and send the metadata results of the classification to the Central Classification Engine to inform the customer tenant.
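
The selection conditions in step 1 can be read as a decision function. The Python sketch below shows one possible interpretation, not the product's actual logic. It treats scope, policy coverage, and de-duplication as required gates, treats "first scan" and "modified since last scan" as alternatives, and lets a policy change force a re-scan. `BusinessResource` and `should_classify` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BusinessResource:
    # Hypothetical record of the attributes the conditions refer to.
    path: str
    last_modified: datetime
    last_classified: Optional[datetime] = None  # None means never classified
    in_scope: bool = True                       # inside the Classification Scope
    is_duplicate_path: bool = False             # excluded by de-duplication


def should_classify(resource: BusinessResource,
                    policy_covers_app: bool,
                    policies_changed: bool) -> bool:
    """Decide whether a Business Resource is queued for classification."""
    # Required gates: scope, an applicable policy, and not a duplicate path.
    if not resource.in_scope or not policy_covers_app or resource.is_duplicate_path:
        return False
    # A policy change queues every remaining in-scope resource for re-scan.
    if policies_changed:
        return True
    # First-time classification.
    if resource.last_classified is None:
        return True
    # Otherwise, re-scan only if modified since the last classification.
    return resource.last_modified > resource.last_classified
```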

Data Classification Deduplication Scan

In various storage solutions and file share applications, it is possible for multiple access paths or share paths to point to the same physical location and data content.

To minimize the running time of the Data Classification task, these duplicate access paths are identified and shared data is scanned only once.

To maintain ease of use and user readability, when a user reviews the classification results through reports, insights, or the Data Classification Forensics page, classification results will reflect all access paths including duplicate access and share paths.
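
The de-duplication idea can be sketched in Python as follows. The sketch assumes that each access path can already be resolved to a physical location; how the product performs that resolution is not described here. The names `group_by_physical_location` and `classify_once`, and the `scan` callback, are hypothetical.

```python
from collections import defaultdict
from typing import Callable


def group_by_physical_location(access_paths: dict[str, str]) -> dict[str, list[str]]:
    """Group access paths (keys) by the physical location they resolve to (values)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for access_path, physical in access_paths.items():
        groups[physical].append(access_path)
    return dict(groups)


def classify_once(access_paths: dict[str, str],
                  scan: Callable[[str], list[str]]) -> dict[str, list[str]]:
    """Scan each physical location once, then report results under every access path."""
    results: dict[str, list[str]] = {}
    for physical, paths in group_by_physical_location(access_paths).items():
        labels = scan(physical)  # one scan per physical location
        for path in paths:
            results[path] = list(labels)  # duplicate paths share the same result
    return results
```

With two share paths pointing at the same volume, `scan` is called once for that volume, yet both share paths appear in the returned results. This mirrors the reporting behavior described above.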