Aggregation

Aggregation is the process by which data about identities and their access is read from your enterprise systems into IdentityIQ. IdentityIQ aggregates Account data (which includes information about identities and their accounts and entitlements on the outside systems) and Account Group data (which includes account groups and application object types that are the basis for creating entitlements that represent group membership).

IdentityIQ uses configured applications to connect to these enterprise systems, and uses tasks to do the work of reading the data into IdentityIQ and correlating it to identities stored in IdentityIQ.

Aggregation and Applications

A configured Application is the component that lets IdentityIQ communicate with an enterprise system. The enterprise system is the source of information about accounts and account groups, which will be read into IdentityIQ.

Applications use a system-specific connector type (such as JDBC, LDAP, Active Directory, Azure, Workday, etc.) to set up a connection to the system that is the source of the data. The configuration options are flexible; many elements of the configuration depend on the connector type, but they all have several things in common:

Connection parameters – the information IdentityIQ needs in order to communicate with the data source. This typically includes a path to the data source and credentials for logging in/authenticating, but may include more.
Account schema – how IdentityIQ defines and organizes the data that is being read in.
Correlation logic – how IdentityIQ maps data from the source system to what is stored in IdentityIQ.
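
As a rough illustration only, the three common elements above can be pictured as fields on a single configuration object. The class, field names, and JDBC URL below are hypothetical and are not IdentityIQ's actual application model; the sketch only shows how connection parameters, an account schema, and a correlation attribute sit side by side.

```python
from dataclasses import dataclass


@dataclass
class ApplicationConfig:
    """Hypothetical model of the elements every application has in common."""
    name: str
    connector_type: str          # e.g. "JDBC" or "LDAP"
    connection: dict             # connection parameters: path, credentials, ...
    account_schema: list         # attribute names read from the source
    correlation_attribute: str   # source attribute matched against identities
    authoritative: bool = False


def missing_schema_attributes(app, record):
    """Return the schema attributes that a source record does not supply."""
    return [attr for attr in app.account_schema if attr not in record]


# Illustrative values only; the URL and user name are made up.
hr = ApplicationConfig(
    name="HR",
    connector_type="JDBC",
    connection={"url": "jdbc:example://hr-db", "user": "svc_iiq"},
    account_schema=["employeeId", "name", "email"],
    correlation_attribute="employeeId",
    authoritative=True,
)
```

Calling missing_schema_attributes(hr, {"employeeId": "1", "name": "Pat"}) reports that the email attribute is absent from that record.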

For more information on configuring applications, see the Connectors & Integrations section on the SailPoint Product Documentation portal.

What Data Is Aggregated?

Account aggregation is the process through which account data from a configured application is read into IdentityIQ and stored in Link (account) objects connected to Identities. Aggregation is an integral part of every IdentityIQ installation. Account aggregation reads in information about identities, which typically includes:

Account information – the accounts the identity has on the system being aggregated.
Entitlements – the access the identity has on the systems that it has accounts on.
From authoritative sources – information about the identity, such as name, department, email address, etc.

Account Group aggregation is used to create entitlements (managedAttributes) representing an application's group objects. See Entitlement Catalog.

Authoritative and Non-Authoritative Data Sources

The enterprise systems that provide information about identities and their access may be numerous, and information about identities may not always be synchronized across all systems. For this reason, some sources of data are designated as authoritative sources. An authoritative source is any repository of employee information for your enterprise that holds the primary and most trusted information about identities, such as a human resources application. This is in contrast to non-authoritative sources, which may contain some accurate information about identities but are not considered the system of record for information about the identity itself.

A simple example is when an employee's name changes – Pat Smith becomes Pat Jones. In this example, Human Resources will change the employee's name, and perhaps the email address, in an authoritative source, such as Active Directory. The changes then need to be propagated out to the other accounts that the user has, such as JIRA, Salesforce, and Outlook.

A system is designated as an authoritative source by checking the Authoritative Application flag in the application configuration for that source. For more information, see Using the Edit Application Page.

Note that your organization can have multiple authoritative sources.

Partitioned Aggregation

Partitioning can increase the throughput and speed of data processing by breaking operations into multiple pieces, or partitions, so that the work can be split across multiple hosts and across multiple threads per host. Aggregation is one of the areas in IdentityIQ where partitioning can be used to improve performance.

Partitioned aggregation can occur at either the application level or the task level.

Application Configuration for Partitioning

Some connector types support partitioning at the application level. To use partitioning for account and account group aggregation, you must configure the application for partitioning, and enable partitioning when defining an account aggregation or identity refresh task.

Application-level partitioning requires connector support and some use of partitioning statements to obtain mutually-exclusive data sets for parallel processing. This creates multiple connections to a target system and also spreads aggregation processing across the task servers.

Partitioning is supported by many but not all connectors, and can be enabled as part of the application's configuration. The way partitioning is configured varies by connector. For the most current information about a particular connector's partitioning support, refer to the IdentityIQ Connectors documentation on SailPoint's documentation portal.

Aggregation Task Configuration for Partitioning

Task-level partitioning is an alternative when application-level partitioning is not available or is undesirable. This variation pulls data from the target system into IdentityIQ using a single connection and multi-threads the processing across the task servers. The best practice is to use application-level partitioning whenever possible, as it has superior throughput potential when compared to task-level partitioning or traditional single-threaded aggregation, which uses a single thread for both data source connection and data processing within IdentityIQ.
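
The single-reader, many-workers shape of task-level partitioning can be sketched as follows. This is a simplified, single-process analogy, not IdentityIQ code: threads stand in for the task servers, and read_accounts and process_account are hypothetical placeholders for the connector read and the per-account processing.

```python
from concurrent.futures import ThreadPoolExecutor


def read_accounts():
    """Stand-in for the single connection to the source system."""
    for i in range(10):
        yield {"id": f"user{i}", "groups": ["staff"]}


def process_account(account):
    """Stand-in for per-account processing; returns a normalized id."""
    return account["id"].upper()


def aggregate(max_workers=4):
    # One reader feeds many workers: the read is serial,
    # while the processing of each account runs in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_account, read_accounts()))
```

In this picture, application-level partitioning would correspond to running several readers at once, each fetching its own mutually exclusive slice of the source data.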

Activating partitioning on an aggregation task only requires selecting the Enable Partitioning option on the task definition page. Because this setting is disabled by default, it must be enabled for each aggregation task that will use partitioning.

In addition, you can configure the number of objects per partition. This option sets the maximum number of records to include in each partition. IdentityIQ divides the accounts from the data source into as many partitions as required to create mutually-exclusive segments, with each containing no more than the specified number of accounts.
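
The effect of the objects-per-partition setting can be illustrated with a short sketch. The function below is hypothetical; how IdentityIQ actually orders and assigns accounts to partitions is not described here, so this only shows the stated guarantee that partitions are mutually exclusive and hold no more than the configured number of accounts.

```python
def partition(accounts, max_per_partition):
    """Split accounts into consecutive, non-overlapping partitions."""
    if max_per_partition < 1:
        raise ValueError("max_per_partition must be at least 1")
    return [
        accounts[start:start + max_per_partition]
        for start in range(0, len(accounts), max_per_partition)
    ]


# Seven accounts with at most three per partition need three partitions.
partitions = partition(list(range(7)), 3)
```

Here partitions is [[0, 1, 2], [3, 4, 5], [6]]: the last partition is smaller because only one account remains.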

Only some tasks support partitioning: account aggregation, account group aggregation, identity refresh, identity request maintenance, propagate role changes, and system maintenance.

Delta Aggregation

Delta aggregation is the process of aggregating only the accounts or account groups that have changed since the last aggregation.

Delta aggregation can be run as an alternative to a full aggregation, which brings in all accounts or account groups regardless of whether they have changed since the last aggregation.

Using delta aggregation to bring in only the changes can be much faster than full aggregations, and can allow processes to occur at a much more rapid pace.

The option to enable delta aggregation is set in the aggregation task. You can set this option in the tasks for aggregating accounts and for aggregating account groups. However, delta aggregation requires support by the connector; not all connector types support delta aggregation.
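
The difference between full and delta aggregation can be sketched as follows. How a connector detects changes is connector-specific; the modified timestamp on each record is an assumption made purely for illustration, and the function names are hypothetical.

```python
from datetime import datetime


def full_aggregation(records):
    """Bring in every record, changed or not."""
    return list(records)


def delta_aggregation(records, last_aggregation):
    """Bring in only the records modified after the last aggregation."""
    return [r for r in records if r["modified"] > last_aggregation]


records = [
    {"id": "psmith", "modified": datetime(2024, 1, 10)},
    {"id": "jdoe", "modified": datetime(2024, 3, 2)},
]
last_run = datetime(2024, 2, 1)
changed = delta_aggregation(records, last_run)
```

With these sample records, only jdoe was modified after last_run, so the delta pass handles one record where the full pass handles two.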

Partitioning in Delta Aggregation

Delta aggregation can in some cases support partitioning. Partitioning in delta aggregation relies on the connector having partitioning implemented; if the connector does not include partitioning functionality, the partitioning option will be ignored and delta aggregation will work in the default, single-threaded mode.

For the most current information about a particular connector's partitioning support, refer to the IdentityIQ Connectors documentation on SailPoint's documentation portal.

For more information on aggregation tasks, see Tasks for Aggregation.

Tasks for Aggregation

Tasks drive the actual work of retrieving information from the data source. There is a task type for aggregating accounts, and a task type for aggregating groups. You use the task type as a template to set up your own specific tasks, and you can have many defined tasks for each type – for example, it is typical to have a separate account aggregation task for each one of your source systems.

You can also have more than one aggregation task for a given system – for example, one that runs daily to pick up only the changes from that day (see Delta Aggregation), and a more thorough one that runs monthly to refresh all your data from that specific source.

The aggregation tasks can be configured with options that determine which of the task's available actions are performed in the aggregation.

An Account aggregation task is responsible for:

Reading the account data from the designated data source
Creating a Link object to represent the account or updating an existing Link object with any data changes for the account
Associating the accounts (Links) to an existing Identity in the system or creating new Identities to hold the accounts
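
The three responsibilities above can be sketched in simplified form. This is not IdentityIQ's implementation: Links and Identities are modeled as plain dictionaries, the nativeIdentity key and the owner attribute are assumed names, and correlation is reduced to an exact match between one record attribute and an existing identity name.

```python
def aggregate_accounts(app_name, records, correlation_attr, links, identities):
    """Read records, create or update Links, and attach them to Identities."""
    for record in records:
        native_id = record["nativeIdentity"]
        key = (app_name, native_id)
        # Create the Link if it is new, otherwise update the existing one.
        link = links.setdefault(key, {"identity": None, "attributes": {}})
        link["attributes"] = dict(record)
        if link["identity"] is None:
            # Correlate to an existing Identity, or create a new one.
            candidate = record.get(correlation_attr)
            if candidate not in identities:
                candidate = native_id
                identities.setdefault(candidate, set())
            link["identity"] = candidate
        identities[link["identity"]].add(key)


identities = {"pat.smith": set()}
links = {}
records = [
    {"nativeIdentity": "psmith", "owner": "pat.smith"},
    {"nativeIdentity": "svc01", "owner": "unknown"},
]
aggregate_accounts("LDAP", records, "owner", links, identities)
```

After this run, the psmith account is attached to the existing pat.smith identity, while svc01 has no match and gets a new identity of its own. Running the function again with changed records updates the existing Links rather than creating duplicates.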

There are several additional options that an Account aggregation task can be configured to perform, such as:

Deleting any Links for accounts that no longer exist
Recalculating active scopes for the installation when scoping is enabled
Executing some of the Identity Refresh task options

An Account Group aggregation task aggregates information about groups. Group aggregation can only be done for applications which have a group schema defined. IdentityIQ aggregates group data from one application at a time, repeating this process for each application specified in the aggregation task (in the "applications" parameter of the task).

Other tasks make updates based on aggregated data, and therefore should be run after aggregation:

Identity Refresh: This task scans all identities to ensure that all identity information is up-to-date and accurate. Identity Refresh scans are also used to detect and report on policy violations, which may arise due to changes in account or group associations.
Effective Access Indexing: Effective Access is any indirect access that was granted through another object, such as a nested group, an unstructured target, or another role. This task indexes effective access so that it can be shown on a single view of an identity.

For more information, see: