Partitioning

Partitioning breaks operations into multiple pieces, or partitions, so that data processing can be split across multiple hosts and across multiple threads per host. The overall goal of partitioning is to increase processing throughput and speed.

Partitioning is supported for account and account group aggregation tasks, for identity refresh tasks, for generation of manager and targeted certifications, and for role propagation.

Some connector types support partitioning at the application level. To use partitioning for account and account group aggregation, you must configure the application for partitioning, and enable partitioning when defining an account aggregation or identity refresh task.

If partitioning is enabled in an aggregation task that acts on an application whose connector does not support partitioning, partitioning takes place at the IdentityIQ database level rather than at the connector level. This is referred to as generic partitioning or task-level partitioning.

Applications are configured as part of the Account Settings on the Configuration tab of the Application Configuration page. See Configuring an Application.

For task details, see Account Aggregation, Identity Request Maintenance, and Identity Refresh.

For certification details, see Manager, Application Owner, and Advanced Access Reviews and Scheduling a Targeted Certification.

Note

Partitioning is not available on all tasks or certifications. Partitioning is available for Account Aggregation, Account Group Aggregation, Identity Refresh, Perform Identity Request Maintenance, and Perform Maintenance tasks, and for Manager and Targeted Certification generation. Partitioning is also not available on all application types. Partitioning is controlled by both the configuration of the applications you are using and the configuration of the connectors used to communicate with those applications.

How Partitioning Works

Each partition is placed in a global queue, and machines (or hosts) in a cluster compete to execute the partitions in the queue. Machines can be added to or removed from the cluster dynamically, and the work is balanced automatically. If a machine fails or is taken down while processing a partition, the partition is placed back into the queue and reassigned to a different machine.

A single result object is shared by all partitions and is continually updated so you can monitor the overall progress of the partitioned operation. When all partitions have finished executing, the result is marked complete.
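The queue behavior described above can be illustrated with a minimal, single-threaded simulation. This is only a sketch under stated assumptions: the names TaskResult, run_partitions, and failing_host are hypothetical, the round-robin host selection stands in for real hosts competing for work, and nothing here reflects IdentityIQ's actual classes or scheduling.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    """Stand-in for the single result object shared by all partitions (hypothetical)."""
    total: int
    completed: list = field(default_factory=list)

    @property
    def complete(self) -> bool:
        # The result is complete only when every partition has finished.
        return len(self.completed) == self.total


def run_partitions(partitions, hosts, failing_host=None):
    """Drain a shared queue; requeue the partition held by a host that fails."""
    queue = deque(partitions)
    result = TaskResult(total=len(partitions))
    live_hosts = list(hosts)
    turn = 0
    while queue:
        if not live_hosts:
            raise RuntimeError("no hosts left to process the queue")
        host = live_hosts[turn % len(live_hosts)]
        partition = queue.popleft()
        if host == failing_host:
            # Simulated failure mid-partition: put the work back, drop the host.
            queue.append(partition)
            live_hosts.remove(host)
            failing_host = None
            continue
        result.completed.append((partition, host))
        turn += 1
    return result
```

For example, run_partitions(["p1", "p2", "p3"], ["a", "b"], failing_host="b") finishes all three partitions on host "a", because the partition that "b" was holding is returned to the queue.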

Each instance of IdentityIQ includes a Server object containing information about what is happening in that instance. For machines running multiple instances of IdentityIQ, each instance must be assigned a unique iiq.hostname and have a unique Server object.

Each Server object includes a heartbeat that is updated by a system thread on a regular basis. By monitoring server heartbeats, machines in the cluster can detect when another machine fails. When this happens, any partitioned requests that were running on that machine are restarted and picked up by a different machine in the cluster, so that the failure of one machine does not terminate an entire long-running task.
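As a rough illustration of heartbeat-based failure detection, the sketch below treats a host as failed when its last heartbeat is older than a timeout, then selects the requests that were running on it. The function names, the timeout parameter, and the dictionary shapes are assumptions made for this example; the document does not describe how IdentityIQ stores heartbeats or what interval it uses.

```python
def find_failed_hosts(heartbeats, now, timeout_seconds):
    """Return hosts whose last heartbeat is older than the timeout.

    `heartbeats` maps host name -> timestamp of the last heartbeat (seconds).
    """
    return sorted(
        host for host, last in heartbeats.items()
        if now - last > timeout_seconds
    )


def requests_to_restart(running, failed_hosts):
    """Return the partitioned requests that were running on a failed host.

    `running` maps request name -> host currently executing it.
    """
    failed = set(failed_hosts)
    return [request for request, host in running.items() if host in failed]
```

A caller would typically pass the current time (for example from time.time()) as `now`, then place the returned requests back in the queue for another host to pick up.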

Server objects include some statistics, such as the number of request threads currently active, and the request types that are executing. You can view the state of the machines in your cluster on the Administrator Console page. See Using the Administrator Console.

Loss Limits

Some of the features that support partitioning also include an option to set loss limits for the identities or accounts being processed by a task. The loss limit sets the maximum number of identities or accounts that will be reprocessed if a partitioned task terminates unexpectedly.

In a partitioned task, each time the task reaches the loss limit (that is, it has processed a number of accounts or identities that matches the value of the loss limit), it commits a list of those accounts or identities to a requestState object. If the task fails, for example because a server or database goes down, the task checks the requestState object when it resumes so that it knows which accounts have already been processed. This means the task does not have to reprocess the entire partition. A lower loss limit results in less duplicated work following a crash, but may slow down the task due to increased database contention.
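The checkpointing idea behind loss limits can be sketched as follows. This is a simplified model, not the product's implementation: a Python set stands in for the requestState object, and process_partition, work_log, and fail_after are hypothetical names used only to simulate a crash and a resume.

```python
def process_partition(accounts, loss_limit, request_state, work_log, fail_after=None):
    """Process accounts, committing progress every `loss_limit` items.

    request_state: set standing in for the requestState object (assumption).
    work_log:      list recording every account actually processed, across runs.
    fail_after:    simulate a crash after this many items in the current run.
    """
    uncommitted = []
    done_this_run = 0
    for account in accounts:
        if account in request_state:
            continue  # committed before an earlier failure; skip it
        if fail_after is not None and done_this_run == fail_after:
            raise RuntimeError("simulated crash")
        work_log.append(account)  # the actual processing would happen here
        done_this_run += 1
        uncommitted.append(account)
        if len(uncommitted) >= loss_limit:
            # Loss limit reached: commit the batch so it is never redone.
            request_state.update(uncommitted)
            uncommitted.clear()
    request_state.update(uncommitted)  # commit the final partial batch
```

With ten accounts, a loss limit of 3, and a crash after seven items, the first six accounts are already committed when the crash occurs. On resume only the seventh account is processed a second time, which stays within the loss limit.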

Loss limit data that is stored in the requestState object is base-64 encoded and so is not human-readable. RequestState objects are not retained in the IdentityIQ database past their usefulness; in other words, once a loss limit has been reached, the object for that particular segment is automatically deleted.

Configuring Partitioning Request Objects

Partitioning is also controlled by RequestDefinition objects, which are defined for each request type. These objects control how each request type is processed. For example, they define the number of threads that run for each request type on the instances of IdentityIQ running on a specific machine. The RequestDefinition objects must be defined on each machine (host) in a cluster.

Note

By default, the maximum number of threads to run on each host is set to 1. You can change this number to maximize performance, but do so with caution and only after testing and tuning for your environment.

The following RequestDefinition objects are available:

Aggregation Partition – defines the maximum number of threads to run on each host during account aggregations
Identity Refresh Partition – defines the maximum number of threads to run on each host during identity refresh
Manager Certification Generation Partition – defines the maximum number of threads, the error action, and the orphan action for partitioned manager certification requests
Role Propagation Partition – defines the maximum number of threads to run on each host during role propagation
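To make the effect of a per-host thread limit concrete, the sketch below caps the number of worker threads by request type. The MAX_THREADS table, its values, and the run_requests function are hypothetical and exist only for illustration; they are not the RequestDefinition format. The fallback of 1 mirrors the default stated in the note.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-host limits keyed by request type (values are examples only).
MAX_THREADS = {
    "Aggregation Partition": 1,
    "Identity Refresh Partition": 2,
}


def run_requests(request_type, partitions, work):
    """Run `work` on each partition with at most the configured thread count."""
    limit = MAX_THREADS.get(request_type, 1)  # fall back to a single thread
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # pool.map preserves the input order of the partitions in its results.
        return list(pool.map(work, partitions))
```

Raising a limit lets more partitions of that type run concurrently on the host, which is why the note recommends testing and tuning before changing it.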

To work with the RequestDefinition objects, go to the IdentityIQ Debug page and select RequestDefinition from the Select an Object dropdown list.