PII Search and Relevancy Scoring
The File Access Manager Privacy Engine will search for all submitted PII search criteria.
Search criteria marked as "Required" will be mandatory. If a file does not match these criteria, it will not be returned as a result.
Search criteria that are not marked as "Required" will not exclude a file if not matched, but will contribute to the overall relevancy score.
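The difference between required and optional criteria can be illustrated with a minimal sketch. The `Criterion` class and `passes_required` function are hypothetical names chosen for illustration and are not part of the File Access Manager API; the sketch assumes the set of matched criteria for a file is already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """Hypothetical model of one PII search criterion."""
    name: str
    value: str
    required: bool = False


def passes_required(criteria, matched_names):
    """Return True when every criterion marked required was matched in the file."""
    return all(c.name in matched_names for c in criteria if c.required)


criteria = [
    Criterion("email", "john.smith@example.com", required=True),
    Criterion("name", "John Smith"),
]

# The file matched only the optional name: it is excluded from the results.
print(passes_required(criteria, {"name"}))   # False
# The file matched the required email: it is returned, and the unmatched
# optional name only lowers its relevancy score.
print(passes_required(criteria, {"email"}))  # True
```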
Relevancy Score
A document's relevancy score expresses, as a percentage, how closely the document content matches the search criteria.
The higher the relevancy score, the higher the probability that the matched information belongs to the individual whose details were entered in the search criteria.
The more detailed and well-defined the search criteria are, the more accurate the results the privacy engine can produce. For example, searching for just a first name or a nickname is likely to return a large number of false positives, since many documents are likely to contain that name.
However, searching for a specific email, name, and ID is likely to produce much more accurate results.
The relevancy score is calculated based on the number of search criteria and the accuracy of the data the privacy engine was able to match.
Each search criterion is allocated a relative relevancy score that contributes to the overall relevancy score.
For example, when you perform a search using four search criteria, each criterion has a relative score that accounts for 25% of the overall score.
The overall score will be based on the number of criteria matched. If the search matched only one out of four search criteria, the overall relevancy score would be 25%.
If two search criteria were matched, the relevancy score would be 50%, and so forth. A full match of all search criteria would yield a 100% relevancy score.
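The equal-weight arithmetic described above can be sketched in a few lines. This is an illustration of the documented behavior, not the privacy engine's actual implementation; the function name `relevancy_score` is an assumption, and partial name matches are not modeled here.

```python
def relevancy_score(total_criteria: int, matched_criteria: int) -> float:
    """Overall score (in percent) when every criterion carries an equal weight."""
    if total_criteria <= 0:
        raise ValueError("at least one search criterion is needed")
    if not 0 <= matched_criteria <= total_criteria:
        raise ValueError("matched count must be between 0 and the total")
    weight = 100.0 / total_criteria  # relative score of a single criterion
    return weight * matched_criteria


print(relevancy_score(4, 1))  # 25.0
print(relevancy_score(4, 2))  # 50.0
print(relevancy_score(4, 4))  # 100.0
```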
Name fields offer more granular relevancy scoring. If a name search criterion is matched in its entirety, it contributes the full amount of its relative relevancy score.
However, name fields (the Name and Alias fields) are also evaluated for partial matches. If a name search criterion is only partially matched, it contributes only 50% of its relative relevancy score to the overall score.
For example, in a search with four criteria where one criterion is the name "John Smith," the name field has a 25% relative relevancy score.
If the name is matched fully, that is, the full name "John Smith" appears in the document, the name search criterion contributes the full 25% to the overall relevancy score.
However, if the name is partially matched, for example, the file contains only "John" or only "Smith," the name contributes only half of its relative relevancy score (12.5%) to the document's overall relevancy score.
Thus, the more search criteria involved in the DSAR query, the less impact partial matching has on the overall score, since the likelihood that the identity searched for was actually matched is much higher.
So, in the previous example, if we're looking for four data points (e.g., ID, Address, Email, and Name), and the search matched the first three and fully matched the name, the relevancy score would be 100%.
However, if the first three criteria are matched and the name is matched partially, the relevancy score would be approximately 87% (75% from the three matched criteria plus 12.5% from the partial name match). It is still high, since the search hit all four data points and there's a high probability that the document matches the individual we're searching for, even though the name was not fully matched. By contrast, if the query searches for a name only, and the name is partially matched, the overall score would be 50%, as opposed to 100% for a fully matched name.
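The effect of partial name matches can be worked through with a short sketch. The `Match` enum and `relevancy_score` function are hypothetical and only mirror the arithmetic described above; the engine's real matching logic is not shown in this document. The exact result for three full matches plus a partial name is 87.5%, which corresponds to the 87% quoted above; how the product rounds the displayed value is not stated here.

```python
from enum import Enum


class Match(Enum):
    """Fraction of a criterion's relative score that a match contributes."""
    NONE = 0.0
    PARTIAL = 0.5  # applies to the Name and Alias fields only
    FULL = 1.0


def relevancy_score(matches: dict[str, Match]) -> float:
    """Overall score (in percent) for one document, with equal weights per criterion."""
    if not matches:
        raise ValueError("at least one search criterion is needed")
    weight = 100.0 / len(matches)
    return sum(weight * m.value for m in matches.values())


four_points = {
    "id": Match.FULL,
    "address": Match.FULL,
    "email": Match.FULL,
    "name": Match.PARTIAL,
}
print(relevancy_score(four_points))              # 87.5
print(relevancy_score({"name": Match.PARTIAL}))  # 50.0
print(relevancy_score({"name": Match.FULL}))     # 100.0
```

With four criteria, a partial name match costs 12.5 points; with a name-only query, the same partial match costs 50 points.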
Lower-probability documents are documents with a low relevancy score. They can easily be excluded from further DSAR processing. The decision to exclude files from further DSAR processing is at the discretion of the Privacy Manager and Reviewers.