Supported File Types

The privacy engine indexes data based on a file’s content and attributes. For all supported file types, the system also supports file properties and custom properties. The engine uses the file extension to determine how to read a file’s content.

Image files can be analyzed and searched for keywords using optical character recognition (OCR). OCR is a resource-intensive process and is configured separately. See the Optical Character Recognition (OCR) section.

The Data Classification engine supports the following file types and extensions:

 

| File extension | Expected file type |
| --- | --- |
| docx doc xls xlsx ppt pptx | Microsoft Office files |
| txt csv | Plain text (including comma-separated values files) |
| htm html xml | Web files |
| cs js sql | Code script files |
| pdf | PDF files |
| zip gzip tar rar 7zip | Archive files |
| jpeg jpg tif tiff gif png wmf emf bmp pdf | Image files analyzed by the OCR module* |
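As a rough illustration of how an extension-based lookup like the table above might work, the following Python sketch maps a file name to its expected file types. It is not the product's implementation: the names `SUPPORTED_TYPES` and `expected_types` are hypothetical, and the "PDF files" label is assumed for illustration. The extensions themselves are copied from the table.

```python
from pathlib import Path

# Hypothetical lookup table mirroring the rows above.
# The "PDF files" label is an assumption made for this sketch.
SUPPORTED_TYPES = [
    ("Microsoft Office files", {"docx", "doc", "xls", "xlsx", "ppt", "pptx"}),
    ("Plain text", {"txt", "csv"}),
    ("Web files", {"htm", "html", "xml"}),
    ("Code script files", {"cs", "js", "sql"}),
    ("PDF files", {"pdf"}),
    ("Archive files", {"zip", "gzip", "tar", "rar", "7zip"}),
    (
        "Image files analyzed by the OCR module",
        {"jpeg", "jpg", "tif", "tiff", "gif", "png", "wmf", "emf", "bmp", "pdf"},
    ),
]


def expected_types(filename):
    """Return every expected file type whose extension list matches the file.

    Matching is case-insensitive and uses only the final extension,
    so "archive.tar.gz" is looked up as "gz".
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    return [label for label, extensions in SUPPORTED_TYPES if ext in extensions]
```

Because `pdf` appears in two rows, a PDF file matches both its own row and the OCR row, so the function returns a list rather than a single label. A file with an unlisted extension, or with no extension, returns an empty list.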

The system downloads files from cloud-based content stores and non-CIFS applications (for example, Box, Dropbox, Google Drive, OneDrive, SharePoint, and NFS) to a local directory on the indexing server. Once the indexing process finishes, the system deletes the downloaded files from that server.
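The download, index, and delete cycle described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the product's code: `index_remote_file`, `fetch`, and `index` are hypothetical names, with `fetch` standing in for whatever retrieves a file's bytes from the remote store and `index` for the indexing step. The sketch also assumes the local copy is removed even when indexing fails, which the text above does not state.

```python
import tempfile
from pathlib import Path


def index_remote_file(name, fetch, index):
    """Download a remote file to a temporary local directory, index it,
    and delete the local copy afterwards.

    fetch(name) -> bytes   hypothetical downloader for the remote store
    index(path) -> result  hypothetical indexing step given a local path
    """
    # TemporaryDirectory removes the directory and its contents on exit,
    # including when index() raises (an assumption of this sketch).
    with tempfile.TemporaryDirectory(prefix="indexing-") as workdir:
        local_path = Path(workdir) / Path(name).name
        local_path.write_bytes(fetch(name))
        return index(local_path)
```

Using a temporary directory as a context manager ties the lifetime of the downloaded copy to the indexing call, so no separate cleanup step is needed.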