Setting Up Data Extract Task
The Data Extract Task selects objects from the IdentityIQ database, processes each object, and publishes the results to the configured publisher. The first time Data Extract runs, it completes a full extract of the defined objects. Subsequent task runs extract a delta based on what has changed since the last run.
You can set up this task to run on your instance:
- Configure a Data Extraction YAMLConfig for the task that specifies which types of objects to extract. See Configuring Data Extraction.
- Configure a transformConfigurationName YAMLConfig that describes how to extract the types listed in the first YAMLConfig's extractedObjects. See Configuring Data Transformation.
- Make sure the extractedObjects from the first YAMLConfig all have a corresponding imageConfigDescriptor, and that each has a valid objectClassName.
- Ensure an appropriate publisher is currently registered and available. Refer to the publisher configuration.
- Navigate to Setup > Tasks.
- Select the New Task dropdown in the upper right corner.
- From the dropdown list, select Data Extract.
Note: If you upgraded to version 8.4 from another version of IdentityIQ and do not see the Data Extract option, make sure you completed the upgrade process by importing upgradeObjects. For a clean installation, reimport init.xml.
- On the New Task screen, enter a Name for your task and complete any other optional fields you would like.
- Under Data Extract Options, select a Data Extract YAMLConfig and a Data Extract publisher.
- Select Save, Save & Execute, Cancel, or Refresh.
- After you select Save, you can set optional Task Arguments, such as lossLimit and partition settings, from the debug page.
- Executing the task determines which objects are configured for export, applies the filter criteria and any limits you have set, translates those objects into JSON documents, and writes them to a JMS queue.
- If executed, review the Task Results, which display all the differences as well as the attribute statistics. See Viewing Data Extract Task Results.
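Taken together, the two YAMLConfig objects referenced in the steps above might look roughly like the following. This is an illustrative sketch only: the document names the keys extractedObjects, imageConfigDescriptor, and objectClassName, but the surrounding layout here is an assumption, not the product schema.

```yaml
# Illustrative sketch only -- the layout is assumed, not the product schema.
# First YAMLConfig: which object types the task extracts.
extractedObjects:
  - Identity
  - Link

# Second YAMLConfig (referenced via transformConfigurationName): each
# extracted type must have a corresponding imageConfigDescriptor with a
# valid objectClassName.
imageConfigDescriptors:
  - name: identityDescriptor
    objectClassName: sailpoint.object.Identity
  - name: linkDescriptor
    objectClassName: sailpoint.object.Link
```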
The results declared for the task are:

| Result Label | Result Variable | Description |
|---|---|---|
| Number of Objects Qualified for Extract | totalObjectMessages | Count of the objects qualified for processing. This is the sum of totalModifiedObjectMessages and totalReattemptObjectMessages. Always shown. |
| Number of Objects Qualified by Change | totalModifiedObjectMessages | Count of the modified objects qualified for processing. Only shown if totalReattemptObjectMessages > 0. |
| Number of Objects Qualified by Re-attempt | totalReattemptObjectMessages | Count of the previously failed objects qualified for another re-attempt at processing in this run. Only shown if > 0. |
| Number of Deletion Objects | totalDeletionExtractedObjects | Count of the rows in spt_intercepted_delete that were attempted to be published for this task. Shown if > 0. |
| Deletion Objects Published | totalDeletionExtractedObjectsDispatched | Count of the rows in spt_intercepted_delete that were successfully published by this task. Shown if > 0. |
| Number of Objects Processed | totalSeenObjects | Count of the objects processed (across all partitions). This does not indicate whether they were successfully extracted and published, only that a partition attempted to process them. Only shown if > 0. |
| Number of Objects Unprocessed | totalUnseenObjects | If any objects were left unprocessed because one or more partitions exited prematurely (for example, due to too many failures), totalUnseenObjects is populated with the count of unprocessed objects. Only shown if > 0. |
| Number of Objects Successfully Extracted | totalExtractedObjects | Count of the objects successfully extracted (across all partitions). Only shown if > 0. |
| Number of Objects Not Found | totalExtractedObjectsNotFound | Count of the objects not found in the database during extraction (across all partitions). Only shown if > 0. |
| Number of Objects that Failed to Extract | totalExtractedObjectsFailed | Count of the objects that encountered exceptions during extraction (across all partitions). Only shown if > 0. |
| Number of Objects Successfully Published | totalExtractedObjectsPublished | Count of the objects successfully published (across all partitions). Only shown if > 0. |
| Number of Objects that Failed to Publish | totalPublishingFails | Count of the objects that encountered exceptions during publishing (across all partitions). Only shown if > 0. |
| Number of Abandoned Re-attempts | totalDroppedObjects | Count of the failed objects that have exceeded their re-attempt limit and will not be attempted again. |
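The qualified-count relationship stated above can be expressed directly as a sanity check when reading task results. This is a plain illustration of the documented arithmetic, not an IdentityIQ API; the function name is invented.

```python
# Illustrative only: mirrors the documented relationship
# totalObjectMessages = totalModifiedObjectMessages + totalReattemptObjectMessages.
def qualified_total(modified: int, reattempted: int) -> int:
    """Objects qualified for extract = changed objects + re-attempted failures."""
    return modified + reattempted

# Example: 1200 changed objects plus 35 previously failed objects re-attempted.
print(qualified_total(1200, 35))  # 1235
```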
The default task arguments added are:

| Argument Name | Default | Description |
|---|---|---|
| lossLimit | 2500 | The state of each Data Extract partition is snapshotted to its RequestState object each time it processes an additional lossLimit objects. |
| maxObjectAttempts | 5 | If an object fails to be extracted or published during a run of Access History or Data Extract, that counts as a failed attempt. The failed object is processed again in subsequent runs of the task if it has failed fewer than maxObjectAttempts times. |
| maxFailuresAbsolute | 500 | If more than maxFailuresAbsolute objects fail to be extracted or published during a run of the task, the task is marked with an error, the NamedTimestamp date is not altered, and the failed objects are not saved. Thus, the next run is a redo. |
| maxFailuresPercent | 5 | If more than maxFailuresPercent percent of objects fail to be extracted or published during a run of the task, the task is marked with an error, the NamedTimestamp date is not altered, and the failed objects are not saved. Thus, the next run is a redo. |
| minExtractPartitions | 5 | A hint for the minimum number of data extract partitions to launch. This is ignored (exceeded) if there are more than maxObjectsPerExtractPartition * minExtractPartitions objects to process. |
| maxExtractPartitions | 50 | A hint for the maximum number of data extract partitions to launch. This is ignored (exceeded) if there are more than maxObjectsPerExtractPartition * maxExtractPartitions objects to process. |
| maxObjectsPerExtractPartition | 50000 | The maximum number of objects that will be delegated to a single data extract partition. |
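How these arguments interact can be sketched as plain logic. This is one plausible reading of the documented behavior for illustration only, not IdentityIQ code; all function and parameter names are invented.

```python
import math

def run_is_redo(failures: int, total: int,
                max_failures_absolute: int = 500,
                max_failures_percent: int = 5) -> bool:
    """If either failure threshold is exceeded, the task is marked with an
    error, the NamedTimestamp is not advanced, the failed objects are not
    saved, and the next run is a redo (illustrative reading)."""
    if failures > max_failures_absolute:
        return True
    return total > 0 and (failures / total) * 100 > max_failures_percent

def will_reattempt(prior_failures: int, max_object_attempts: int = 5) -> bool:
    """A failed object is retried in later runs while it has failed fewer
    than maxObjectAttempts times; beyond that it is abandoned (and counted
    in totalDroppedObjects)."""
    return prior_failures < max_object_attempts

def partition_count(total_objects: int,
                    min_parts: int = 5,
                    max_parts: int = 50,
                    per_part: int = 50_000) -> int:
    """One plausible reading of the partition hints: enough partitions are
    launched to respect maxObjectsPerExtractPartition, which can exceed
    either hint; otherwise the hints bound the count."""
    needed = max(1, math.ceil(total_objects / per_part))
    if needed > max_parts:
        return needed          # per-partition cap overrides the max hint
    return max(min_parts, needed)

# 600 failures out of 100,000 objects: under 5 percent, but over the
# absolute cap of 500, so the run is marked as a redo.
print(run_is_redo(600, 100_000))   # True
# An object that has already failed 5 times is abandoned.
print(will_reattempt(5))           # False
# 3,000,000 objects need 60 partitions, exceeding the max hint of 50.
print(partition_count(3_000_000))  # 60
```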
- You can schedule this task to run on a regular cadence. See How to Schedule a Task.
If you configure different YAML configurations for different object types, you can also configure separate tasks to run at different intervals. For example, YAML 1 may be configured for Object X and YAML 2 for Object Y. Task 1 for YAML 1 may be scheduled to run every week, while Task 2 for YAML 2 may be scheduled to run every day.
Enabling Partitioning in Data Extract Task
Partitioning is used to break operations into multiple parallel executions, or partitions, allowing data processing to split across multiple hosts, and across multiple threads per host. The overall goal with partitioning is to increase processing throughput and speed.
For Data Extract, partitioning cannot be configured in the UI. It is configured in the RequestDefinition object for Data Extract. RequestDefinition objects govern how IdentityIQ handles items added to the Request queue for processing. There are many different RequestDefinition objects, but only a few of them are relevant to partitioning.
To configure Partitioning for Data Extract:
- Select the Wrench icon dropdown at the top of the screen, then select Object.
- Select RequestDefinition from the Object Browser dropdown.
- Click the Data Extract Partition object in the list.
- IdentityIQ opens a window showing the object’s XML.
Sample XML
<RequestDefinition name="Data Extract Partition" executor="sailpoint.request.DataExtractRequestExecutor" retryMax="20">
  <Attributes>
    <Map>
      <entry key="maxThreads" value="5"/>
      <entry key="numDequeueRetries" value="5"/>
      <entry key="dequeueRetryWaitInterval" value="2000"/>
    </Map>
  </Attributes>
</RequestDefinition>
These elements define partitioning for Data Extract:
- The maxThreads value governs the number of partitions that IdentityIQ launches for each task execution.
- The numDequeueRetries value is the number of times the operation can be retried if it fails. The value of numDequeueRetries should be equal to the value of retryMax.
- The dequeueRetryWaitInterval value is the time to wait, in milliseconds, before restarting the operation once it has failed.
Note: The attribute values in the XML are configurable; the values shown above are the defaults applied if nothing is provided.
For more information about configuring partitioning request objects, see Configuring Partitioning Request Objects.
For more information about partitioning in general, see Partitioning.