Configuring and Scheduling the Crawler
To set or edit the Crawler configuration and scheduling:
- Open the edit screen of the required application:
  - Navigate to Admin > Applications.
  - Scroll through the list, or use the filter to find the application.
  - Click the edit icon on the line of the application.
- Press Next until you reach the Crawler settings page. The actual entry fields vary according to the application type.
- In the Calculate Resource Size field, select when Data Access Security calculates resource sizes:
  - Never
  - Always
  - Second crawl and on (default)
- Schedule a task.
Setting the Crawl Scope
There are several options to set the crawl scope:
- Setting an explicit list of resources to include in, or exclude from, the scan.
- Creating a regex to define resources to exclude.
Note
External resources are collected when internal resources are crawled. When internal resources are excluded, their associated external resources are also excluded. Crawling only external resources is not supported at this time, but excluding external resources is supported.
Including and Excluding Paths by List
To set the paths to include or exclude in the crawl process for an application:
- Open the edit screen of the required application:
  - Navigate to Admin > Applications.
  - Scroll through the list, or use the filter to find the application.
  - Click the edit icon on the line of the application.
- Press Next until you reach the Crawler settings page. The actual entry fields vary according to the application type.
- Scroll down to the Crawl configuration settings.
- Click Advanced Crawl Scope Configuration to open the scope configuration panel.
- Click Include / Exclude Resources to open the input fields.
- To add a resource to a list, type the full path to include or exclude in the top field, and click + to add it to the list.
- To remove a resource from a list, find the resource in the list, and click the x icon on the resource row.
When creating exclusion lists, excludes take precedence over includes.
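The precedence rule above can be illustrated with a minimal Python sketch. This is not the product's implementation: the function name `is_in_scope`, the case-insensitive prefix matching, and the treatment of an empty include list as "include everything" are all assumptions made for illustration.

```python
def is_in_scope(path, includes, excludes):
    """Return True if `path` would be crawled under the assumed rules."""

    def under(root):
        # Assumed matching rule: a path is covered by a root when it equals
        # the root or sits below it on a path-segment boundary.
        p = path.rstrip("\\").lower()
        r = root.rstrip("\\").lower()
        return p == r or p.startswith(r + "\\")

    # Excludes are checked first, so they win over any include.
    if any(under(e) for e in excludes):
        return False
    # Assumption: an empty include list means "include everything".
    return not includes or any(under(i) for i in includes)


# Hypothetical paths, for illustration only.
includes = [r"\\server\share"]
excludes = [r"\\server\share\private"]

print(is_in_scope(r"\\server\share\docs\a.txt", includes, excludes))
print(is_in_scope(r"\\server\share\private\b.txt", includes, excludes))
```

In this sketch, the second path is rejected even though it sits under an included root, because the exclude check runs first.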
Crawler Regex Exclusion Examples
The following are examples of crawler Regex exclusions:
Exclude all drives which start with one or more user names:
- Starting with John.Doe:
  ^Team Members\/John\.Doe@.*
- Starting with John.Doe or Jane.Doe:
  ^Team Members\/(John|Jane)\.Doe@.*
Include ONLY drives which start with one or more user names:
- Starting with John.Doe:
  ^(?!Team Members\/John\.Doe@.*).*
- Starting with John.Doe or Jane.Doe:
  ^(?!Team Members\/(John|Jane)\.Doe@.*).*
Narrow down the selection:
- Include only the C$ drive share, \\server_name\C$:
  ^(?!\\\\server_name\\C\$($|\\.*)).*
- Include only one folder under a share, \\server\share\folderA:
  ^(?!\\\\server_name\\share\$($|\\folderA$|\\folderA\\.*)).*
- Include all administrative shares:
  ^(?!\\\\server_name\\[a-zA-Z]\$($|)).*
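Before saving an exclusion regex, it can help to try it against a few sample paths. The sketch below uses Python's `re` module to check the user-name patterns from the examples above. It assumes that a resource is excluded when the pattern matches from the start of its path, and that the crawler's regex engine treats these patterns the same way Python does. Neither assumption is confirmed by this document. The sample paths and the helper name `excluded` are hypothetical.

```python
import re

# Patterns copied from the user-name examples above.
EXCLUDE_JOHN = r"^Team Members\/John\.Doe@.*"
EXCLUDE_JOHN_OR_JANE = r"^Team Members\/(John|Jane)\.Doe@.*"
ONLY_JOHN_OR_JANE = r"^(?!Team Members\/(John|Jane)\.Doe@.*).*"

# Hypothetical drive paths, for illustration only.
SAMPLE_PATHS = [
    "Team Members/John.Doe@example.com",
    "Team Members/Jane.Doe@example.com",
    "Team Members/Alex.Smith@example.com",
]


def excluded(pattern, paths):
    """Return the paths that the pattern matches, i.e. that would be excluded."""
    rx = re.compile(pattern)
    return [p for p in paths if rx.match(p)]


for name, pattern in [
    ("exclude John", EXCLUDE_JOHN),
    ("exclude John or Jane", EXCLUDE_JOHN_OR_JANE),
    ("include only John or Jane", ONLY_JOHN_OR_JANE),
]:
    print(name, "->", excluded(pattern, SAMPLE_PATHS))
```

The "include only" form works by inversion: the negative lookahead `(?!...)` makes the pattern match, and therefore exclude, every path that does not start with the listed names.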
Notes
- To use a backslash or $ sign, add a backslash before it as an escape character.
- To add a condition in a single command, use a pipe character |.
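Escaping UNC paths by hand is error-prone, so one option is to generate the pattern. The following sketch uses Python's `re.escape` to add the backslashes described in the notes above, and joins several roots with the pipe character. The helper name `include_only_pattern` and the sample server paths are hypothetical, and the sketch assumes the crawler's regex engine accepts the same syntax as Python's `re` module.

```python
import re


def include_only_pattern(roots):
    """Build an exclusion regex that matches every path except the given
    roots and anything below them (the "include only" form)."""
    # re.escape puts a backslash before each backslash and $ sign.
    alternatives = "|".join(re.escape(root) for root in roots)
    # After a root, allow only end-of-path or a backslash, so that a
    # sibling such as C$x is not mistaken for the C$ share.
    return r"^(?!(?:" + alternatives + r")(?:$|\\)).*"


# Hypothetical shares, for illustration only.
pattern = include_only_pattern([r"\\server_name\C$", r"\\server_name\D$"])
print(pattern)

rx = re.compile(pattern)
for path in [r"\\server_name\C$\Users", r"\\server_name\E$"]:
    print(path, "excluded" if rx.match(path) else "kept")
```

Paste the printed pattern into the exclusion field only after checking it against representative paths from your own environment.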
Excluding Top Level Resources
Use the top-level exclusion screen to select top-level roots to exclude from the crawl. This setting is done per application.
To exclude top-level resources from the crawl process:
- Go to Admin > Applications.
- Find the application to configure and select the dropdown list menu on the application line. Select Exclude Top Level Resources to open the configuration panel.
- Select the Run Task button to trigger a task that runs a short detection scan to detect the current top-level resources. If the top-level resource list has changed in the application while you are on this screen, select the Run Task button to retrieve the updated structure.
- Once triggered, you can view the task status in Settings > Task Management > Tasks, depending on your access to the task page.
- When the task has completed, select Refresh to update the page with the list of top-level resources.
- Select the top-level resource list and choose the top-level resources to exclude.
  Note
  If all resources are selected and you want to clear the selection, select Deselect All. You can also select individual resources.
- Select Save to save the change.
- To refresh the list of top-level resources, run the task again. Running the task will not clear the list of top-level resources to exclude.