Supported File Types
The privacy engine indexes data based on a file’s content and attributes. It also supports file properties and custom properties for all supported file types. The engine uses the file extension to determine how to read a file’s content.
Image files can be analyzed and searched for keywords using optical character recognition (OCR). OCR is a resource-heavy process and is configured separately. See the section Optical Character Recognition (OCR).
The Data Classification engine supports the following file types/extensions:
File Extension | Expected file type |
docx doc xls xlsx ppt pptx | Microsoft Office files |
txt csv | Plain Text (including Comma Separated Values files) |
htm html xml | Web files |
cs js sql | Code script files |
zip gzip tar rar 7zip | Archive files |
jpeg jpg tif tiff gif png wmf emf bmp pdf | Image files analyzed by the OCR module* |
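As an illustration of extension-based type detection, the list above can be expressed as a simple lookup. This is a minimal sketch only: the names `SUPPORTED_TYPES`, `EXTENSION_TO_TYPE`, and `expected_file_type` are hypothetical and do not reflect the product's internal implementation or API.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup transcribed from the supported-extensions list above.
# It is not the product's internal data structure.
SUPPORTED_TYPES = {
    "Microsoft Office files": {"docx", "doc", "xls", "xlsx", "ppt", "pptx"},
    "Plain Text": {"txt", "csv"},
    "Web files": {"htm", "html", "xml"},
    "Code script files": {"cs", "js", "sql"},
    "Archive files": {"zip", "gzip", "tar", "rar", "7zip"},
    "Image files (OCR)": {
        "jpeg", "jpg", "tif", "tiff", "gif", "png", "wmf", "emf", "bmp", "pdf",
    },
}

# Invert the table so each extension maps directly to its expected file type.
EXTENSION_TO_TYPE = {
    ext: file_type
    for file_type, extensions in SUPPORTED_TYPES.items()
    for ext in extensions
}


def expected_file_type(filename: str) -> Optional[str]:
    """Return the expected file type for a filename, or None if unsupported."""
    # Only the final extension is considered, compared case-insensitively.
    ext = Path(filename).suffix.lstrip(".").lower()
    return EXTENSION_TO_TYPE.get(ext)
```

Because the lookup relies on the extension alone, a file with a missing or unlisted extension returns `None` in this sketch, regardless of what the file actually contains.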
The system downloads files from cloud-based content stores and non-CIFS applications (for example, Box, Dropbox, Google Drive, OneDrive, SharePoint, and NFS) to a local directory on the indexing server. Once the indexing process finishes, the system deletes the downloaded files from the indexing server.
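The download, index, and delete lifecycle described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the product's implementation: `index_remote_files`, the `download` and `index` callables, and the staging-directory prefix are all hypothetical, and cleaning up even when indexing fails is an assumption of this sketch.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def index_remote_files(
    remote_names: Iterable[str],
    download: Callable[[str, Path], None],  # hypothetical: fetches one remote file to a local path
    index: Callable[[Path], None],          # hypothetical: indexes one local file
) -> List[str]:
    """Stage remote files locally, index them, then delete the local copies."""
    # Create a temporary local staging directory on the indexing host.
    staging = Path(tempfile.mkdtemp(prefix="index-staging-"))
    indexed: List[str] = []
    try:
        for name in remote_names:
            local_path = staging / Path(name).name
            download(name, local_path)
            index(local_path)
            indexed.append(name)
    finally:
        # Remove the downloaded copies once indexing is done.
        # Assumption: cleanup also runs if indexing raises an error.
        shutil.rmtree(staging, ignore_errors=True)
    return indexed
```

A caller would supply its own `download` function for each content store; the staging directory and everything in it is removed before the function returns.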