Setting Up the Data Extract Task

The Data Extract Task selects objects from the IIQ database, processes each object, and publishes it to the configured publisher. The first time Data Extract runs, it performs a full extract of the defined objects. Subsequent runs extract only a delta: the changes made since the previous run.

To set up this task on your instance:

  1. Configure a Data Extraction YAMLConfig that defines which types of objects the task extracts. See Configuring Data Extraction.

  2. Configure a transformConfigurationName YAMLConfig that describes how to transform the object types listed under extractedObjects in the first YAMLConfig. See Configuring Data Transformation.

    1. Make sure every entry in extractedObjects in the first YAMLConfig has a corresponding imageConfigDescriptor, and that each has a valid objectClassName.

  3. Ensure that an appropriate publisher is registered and available. See Publisher Configuration.

  4. Navigate to Setup > Tasks.

  5. Select the New Task dropdown in the upper right corner.

  6. From the dropdown list, select Data Extract.

    Note: If you upgraded to IdentityIQ 8.4 from an earlier version and do not see the Data Extract option, make sure you completed the upgrade process, including importing upgradeObjects. On a clean installation, reimport init.xml.

  7. On the New Task screen, enter a Name for the task and fill in any optional fields as needed.

  8. Under Data Extract Options, select a Data Extract YAMLConfig and a Data Extract publisher.

  9. Select Save, Save & Execute, Cancel, or Refresh.

    1. After you save the task, you can use the debug page to set optional task arguments such as lossLimit and partition.

    2. When executed, the task identifies the objects configured for export, applies any filter criteria and limits you have set, translates those objects into JSON documents, and writes the documents to a JMS queue.

    3. After the task runs, review the Task Results, which display the differences and the attribute statistics. See Viewing Data Extract Task Results.
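The optional task arguments are stored on the saved task object (a TaskDefinition), which you can open from the debug page. The fragment below is a hypothetical sketch of that object's Attributes map: the key names lossLimit and partition come from this page, but the values are placeholders. Confirm the exact keys and accepted value formats for your IdentityIQ version in the debug page before relying on them.

```xml
<!-- Hypothetical fragment of a saved Data Extract TaskDefinition, as viewed in the debug page.
     Key names come from this page; the values are placeholders, not recommended settings. -->
<Attributes>
  <Map>
    <!-- Placeholder value; confirm the meaning and accepted format for your version -->
    <entry key="lossLimit" value="100"/>
    <!-- Placeholder value; confirm whether this key expects a boolean in your version -->
    <entry key="partition" value="true"/>
  </Map>
</Attributes>
```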

You can schedule this task to run on a regular cadence. See How to Schedule a Task.

If you create separate YAMLConfigs for different object types, you can also create separate tasks that run at different intervals. For example, YAMLConfig 1 might cover Object X and YAMLConfig 2 might cover Object Y. Task 1 (using YAMLConfig 1) could then be scheduled to run every week, while Task 2 (using YAMLConfig 2) runs every day.

 

Enabling Partitioning in Data Extract Task

Partitioning breaks an operation into multiple parallel executions, or partitions, so that data processing can be split across multiple hosts and across multiple threads on each host. The goal of partitioning is to increase processing throughput and speed.

For Data Extract, partitioning cannot be configured in the UI. Instead, it is configured in the RequestDefinition object for Data Extract. RequestDefinition objects govern how IdentityIQ handles items added to the Request queue for processing. There are many RequestDefinition objects, but only a few of them are relevant to partitioning.

To configure Partitioning for Data Extract:

  1. Select the Wrench icon dropdown at the top of the screen, then select Object.

  2. Select RequestDefinition from the Object Browser dropdown.

  3. Click the Data Extract Partition object in the list.

  4. IdentityIQ opens a window showing the object’s XML.

Sample XML

<RequestDefinition name="Data Extract Partition" executor="sailpoint.request.DataExtractRequestExecutor" retryMax="20">
  <Attributes>
    <Map>
      <!-- Number of partitions launched for each task execution -->
      <entry key="maxThreads" value="5"/>
      <!-- Number of times a failed operation can be retried -->
      <entry key="numDequeueRetries" value="5"/>
      <!-- Wait before retrying a failed operation, in milliseconds -->
      <entry key="dequeueRetryWaitInterval" value="2000"/>
    </Map>
  </Attributes>
</RequestDefinition>

These elements define partitioning for Data Extract:

The maxThreads value sets the number of partitions that IdentityIQ launches for each task execution.

The numDequeueRetries value is the number of times an operation can be retried if it fails. Its value should equal the value of the retryMax attribute on the RequestDefinition element.

The dequeueRetryWaitInterval value is the time, in milliseconds, to wait before restarting an operation after it fails.
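As a rough worked example with the sample values, assume IdentityIQ waits the full dequeueRetryWaitInterval before each retry (a fixed interval with no backoff, which this page does not confirm) and ignore the time the attempts themselves take. The total waiting time for one operation that keeps failing is then approximately:

```latex
T_{\text{wait}} \approx \text{numDequeueRetries} \times \text{dequeueRetryWaitInterval}
  = 5 \times 2000\,\text{ms} = 10\,000\,\text{ms} = 10\,\text{s}
```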

Note: These attribute values are configurable. The values shown above are the defaults used when no value is provided.
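For example, to launch more partitions per execution while keeping numDequeueRetries equal to retryMax, the edited object might look like the sketch below. The specific numbers are illustrative assumptions, not tuned recommendations. Suitable values depend on your hosts and workload.

```xml
<!-- Illustrative tuning only: the values below are examples, not recommendations. -->
<RequestDefinition name="Data Extract Partition" executor="sailpoint.request.DataExtractRequestExecutor" retryMax="10">
  <Attributes>
    <Map>
      <!-- Launch 10 partitions for each task execution (the sample above uses 5) -->
      <entry key="maxThreads" value="10"/>
      <!-- Kept equal to retryMax, as recommended above -->
      <entry key="numDequeueRetries" value="10"/>
      <!-- Wait 5000 ms (5 seconds) before retrying a failed operation -->
      <entry key="dequeueRetryWaitInterval" value="5000"/>
    </Map>
  </Attributes>
</RequestDefinition>
```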

For more information, see Configuring Partitioning Request Objects.

For more information about partitioning in general, see Partitioning.