
Crawling

Crawling is the process that discovers the business resources (BRs) of a specific application type. It is the first task to run for an application, since BRs are required for many other application-related activities, such as Permissions Collection and Access Certification.

For example, a crawler may discover folders (BR) on a connected application.

Before beginning the crawling process, you must install and run the permissions collection service for each application.

The crawling process involves the following:

  • Discovery of business resources and the population of a business resource tree
  • Business resource size calculation
    | File Name | File Type | Size |
    |---|---|---|
    | Finance Balance Sheet.xls | Excel (*.xls) | 2 M |
    | Finance Salaries.docx | Word (*.docx) | 1 M |
    | Finance Departments.txt | Text (*.txt) | 3 M |
    | Finance Organization.ppt | PowerPoint (*.ppt) | 5 M |
    | Finance Other Files | (An uncommon file type) | 4 M |
  • Summary of business resource size by file type
    | Category Name | Size |
    |---|---|
    | Office Files | 8 M (2 M + 1 M + 5 M) |
    | Text Files | 3 M |
    | Finance Other Files | 4 M |
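
The size summary can be pictured as a simple aggregation. The following Python sketch is illustrative only and is not the product's implementation: the extension-to-category mapping, the `summarize_sizes` function, and the "Other Files" fallback label are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical mapping from file extension to summary category (an assumption).
CATEGORY_BY_EXTENSION = {
    ".xls": "Office Files",
    ".docx": "Office Files",
    ".ppt": "Office Files",
    ".txt": "Text Files",
}


def summarize_sizes(files):
    """Sum file sizes (in MB) per category.

    Files with an unknown or missing extension fall into "Other Files".
    """
    totals = defaultdict(int)
    for name, size_mb in files:
        dot = name.rfind(".")
        extension = name[dot:].lower() if dot != -1 else ""
        category = CATEGORY_BY_EXTENSION.get(extension, "Other Files")
        totals[category] += size_mb
    return dict(totals)


# The sample files from the table above, with sizes in MB.
files = [
    ("Finance Balance Sheet.xls", 2),
    ("Finance Salaries.docx", 1),
    ("Finance Departments.txt", 3),
    ("Finance Organization.ppt", 5),
    ("Finance Other Files", 4),  # stands in for a file of an uncommon type
]
summary = summarize_sizes(files)
```

With the sample data, the three Office files add up to 8 MB, matching the summary table.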

The Business Resource Trees display the results of crawling in various locations in Data Access Security.

Interaction of Crawling with Permission Analysis

The permissions analysis process works as follows:

  • The crawling process collects application business resources.
  • In parallel, the Identities Collector collects users and groups. This collection can run before the crawler collects the business resources, since the two collections are independent of each other.
  • The Permissions Collector collects the business resources, users, and groups, and associates them with permission types to create permissions.
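
As a rough mental model of how the three collections fit together, the Python sketch below joins crawled resources and collected identities with permission types. Everything in it is hypothetical: the `Permission` class, the `build_permissions` function, the sample paths and names, and in particular the choice to skip entries whose resource or identity has not been collected are assumptions for illustration, not documented product behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    resource: str
    identity: str
    permission_type: str


def build_permissions(resources, identities, raw_entries):
    """Associate known resources and identities with permission types.

    Assumption: entries that refer to an uncollected resource or identity
    are skipped.
    """
    return [
        Permission(resource, identity, permission_type)
        for resource, identity, permission_type in raw_entries
        if resource in resources and identity in identities
    ]


# Output of the crawler (hypothetical business resources).
business_resources = {"/Finance", "/Finance/Reports"}
# Output of the Identities Collector (hypothetical users and groups).
identities = {"alice", "Finance-Admins"}
# Raw access entries read from the application (hypothetical).
raw_entries = [
    ("/Finance", "alice", "Read"),
    ("/Finance/Reports", "Finance-Admins", "Modify"),
    ("/HR", "bob", "Read"),  # neither the resource nor the user was collected
]

permissions = build_permissions(business_resources, identities, raw_entries)
```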

Configuring and Scheduling the Crawler

To set or edit the Crawler configuration and scheduling, complete the following:

  1. Navigate to Admin > Applications.
  2. Scroll through the list or use the filter to find the application.
  3. Click the edit icon on the line of the application.
  4. Select Next until you reach the Crawler & Permissions Collection settings page.

Note

The actual entry fields vary according to the application type.

See Scheduling a Task to set a schedule.

Setting the Crawl Scope

You can set the crawl scope in the following ways:

  • Setting an explicit list of resources to include in or exclude from the scan.
  • Creating a regex that defines the resources to exclude.

Including and Excluding Paths by List

To set the paths to include or exclude in the crawl process for an application, complete the following:

  1. Navigate to Admin > Applications.
  2. Scroll through the list or use the filter to find the application.
  3. Click the edit icon on the line of the application.
  4. Select Next until you reach the Crawler & Permissions Collection settings page.

    Note

    The actual entry fields vary according to the application type.

  5. Scroll down to the Crawl configuration settings.

  6. Select Advanced Crawl Scope Configuration to open the scope configuration panel.
  7. Select Include / Exclude Resources to open the input fields.
  8. To add a resource to a list, type the full path to include or exclude in the top field, then select + to add it to the list.
  9. To remove a resource from a list, find the resource in the list and click the x icon on the resource row.

Note

When creating exclusion lists, excludes take precedence over includes.
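
The precedence rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's matching logic: the `in_crawl_scope` and `is_under` functions are hypothetical, paths are assumed to use "/" as a separator, a listed path is assumed to cover everything beneath it, and an empty include list is assumed to mean "include everything".

```python
def is_under(path, root):
    """Return True if path equals root or lies beneath it (separator-aware)."""
    path = path.rstrip("/")
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def in_crawl_scope(path, includes, excludes):
    """Decide whether a path is crawled, with excludes checked first."""
    # Excludes take precedence over includes.
    if any(is_under(path, excluded) for excluded in excludes):
        return False
    # Assumption: an empty include list means everything is included.
    if not includes:
        return True
    return any(is_under(path, included) for included in includes)


includes = ["/Finance"]
excludes = ["/Finance/Archive"]

crawled = in_crawl_scope("/Finance/Reports/Q1", includes, excludes)    # True
archived = in_crawl_scope("/Finance/Archive/2019", includes, excludes)  # False
outside = in_crawl_scope("/HR", includes, excludes)                     # False
```

Here `/Finance/Archive/2019` is skipped even though it sits under the included `/Finance` root, because the exclude entry is evaluated before the include entry.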

Excluding Paths by Regex

To define regex filters for the paths to exclude from the crawl process for an application, complete the following:

  1. Navigate to Admin > Applications.
  2. Scroll through the list or use the filter to find the application.
  3. Click the edit icon on the line of the application.
  4. Select Next until you reach the Crawler & Permissions Collection settings page.

    Note

    The actual entry fields vary according to the application type.

  5. Select Exclude Paths by Regex to open the configuration panel.

  6. Type the regex that matches the paths to exclude; see regex examples in the section below. Since the system does not collect business resources that match this regex, it also does not analyze them for permissions.

Note

To match a literal backslash or dollar sign, add a backslash before it as an escape character.

Note

To combine several conditions in a single expression, separate them with a pipe character “|”.
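
The two notes above can be illustrated with a short Python sketch. It uses Python's `re` module purely for demonstration; the regex flavor the product uses, and whether it matches anywhere in the path or against the whole path, are not stated here, so treat the pattern, the `is_excluded` function, and the sample paths as assumptions.

```python
import re

# Two conditions joined with a pipe:
#   1. any path containing a folder named "Temp"
#      (\\ is an escaped backslash; the group matches a backslash or end of path)
#   2. any path ending in a literal dollar sign, such as an administrative share
#      (\$ is an escaped dollar sign; the final $ anchors to the end of the path)
EXCLUDE_PATTERN = re.compile(r"\\Temp(\\|$)|\$$")


def is_excluded(path):
    """Return True if the pattern matches anywhere in the path (assumed semantics)."""
    return EXCLUDE_PATTERN.search(path) is not None


temp_folder = is_excluded(r"\\server\share\Temp\cache")  # True: contains \Temp\
admin_share = is_excluded(r"\\server\C$")                # True: ends with $
templates = is_excluded(r"\\server\share\Templates")     # False: "Templates" is not "Temp"
```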

Excluding Top Level Resources

Use the top level exclusion screen to select top level roots to exclude from the crawl. This setting is configured per application.

To exclude top level resources from the crawl process, complete the following:

  1. Open the application screen by navigating to Admin > Applications.
  2. Find the application to configure and click the dropdown menu on the application line. Select Exclude Top Level Resources to open the configuration panel.

    The Top Level Resource Exclusion overlay displays.

    Selecting the Run Task button runs a short scan that detects the current top level resources.

    If this is the first time the task runs, a note at the top of the overlay reads "Run task to detect the top level resources."

    If the top level resource list has changed in the application while you are on this screen, select the Run Task button to retrieve the updated structure.

    Once the task is triggered, you can see its status in Settings > Task Management > Tasks.

    Note

    This will only work if the user has access to the task page.

    When the task has completed, select Refresh to update the page with the list of top level resources.

  3. Select the top level resource dropdown list and select top level resources to exclude.

  4. Select Save to save the change.
  5. To refresh the list of top level resources, run the task again. Running the task will not clear the list of top level resources to exclude.
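
The last step says that re-running the detection task refreshes the detected list without clearing the saved exclusions. A minimal Python sketch of that behavior follows. The `TopLevelExclusionConfig` class and its methods are a hypothetical in-memory model written for this example, not a product API.

```python
class TopLevelExclusionConfig:
    """Hypothetical model of one application's top level exclusion settings."""

    def __init__(self):
        self.detected_roots = []
        self.excluded_roots = set()

    def run_detection_task(self, current_roots):
        # Replaces the detected list only; saved exclusions are left untouched.
        self.detected_roots = list(current_roots)

    def save_exclusions(self, roots):
        self.excluded_roots = set(roots)

    def roots_to_crawl(self):
        return [r for r in self.detected_roots if r not in self.excluded_roots]


config = TopLevelExclusionConfig()
config.run_detection_task(["Finance", "HR", "Legal"])
config.save_exclusions(["HR"])

# A new top level resource appears, so the task is run again.
config.run_detection_task(["Finance", "HR", "Legal", "Marketing"])
remaining = config.roots_to_crawl()
```

After the second run, "HR" is still excluded and the newly detected "Marketing" root is crawled.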

Business Resource Structure

The table below lists additional information on the Business Resource Structure.

| Application Type | Business Resource Type | Business Resource Full Path Structure | Example |
|---|---|---|---|
| Active Directory | Every LDAP Object | Distinguished Name | CN=Howard,CN=Users,DC=Example,DC=com |
| OneDrive for Business | Folder | Personal/<user email>/<Folder-Path> | Personal/watson@company.Example.com/Diagnostics/Recent |
| SharePoint Online | Site Collection/List/Folder | https://<Company-Name>.sharepoint.com/<Site-Collection>/<Site>/Lists/<List>/<Folder> | https://sailpoint.sharepoint.com/Wayback Site/Lists/Songs/New York/New York |
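
When writing include or exclude entries, it can help to see how a full path breaks down. The Python sketch below splits a OneDrive for Business path of the form `Personal/<user email>/<Folder-Path>` into its parts. The `parse_onedrive_path` function is a hypothetical helper written for this example, not part of the product.

```python
def parse_onedrive_path(full_path):
    """Split 'Personal/<user email>/<Folder-Path>' into (user_email, folder_path)."""
    # Split on the first two slashes only, so the folder path keeps its own slashes.
    parts = full_path.split("/", 2)
    if len(parts) < 2 or parts[0] != "Personal" or not parts[1]:
        raise ValueError(f"Not a OneDrive business resource path: {full_path!r}")
    user_email = parts[1]
    folder_path = parts[2] if len(parts) == 3 else ""
    return user_email, folder_path


user_email, folder_path = parse_onedrive_path(
    "Personal/watson@company.Example.com/Diagnostics/Recent"
)
```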