22nd EANN 2021, 25 - 27 June 2021, Greece

A Fuzzy Approach to Identity Resolution

Asif Nawaz, Prof H Kazemian

Abstract:

  Identity resolution is crucial for law enforcement agencies globally, yet matching real-world identities in big data is difficult because of data inconsistencies such as typographical errors, naming variations, and abbreviations. This paper introduces a fuzzy approach to identity resolution that uses the Soundex and Jaro-Winkler distance algorithms in a cascaded manner to calculate an aggregate score for the full name, while the edit-distance algorithm is used to score the address and ethnicity description attributes. For this approach, the Soundex code has been modified to use digits only, with the code length increased to six digits, which allows the matching algorithm to overcome some of the limitations of Soundex in name matching. The approach accommodates three different name variations in an iterative search process that retrieves matching records based on the inputs. In an experiment that searched for a suspect in each of two different cases, the initial search retrieved 173 and 52 records for the respective target suspects. These records were grouped using the Mean-Shift clustering technique based on the similarity scores of three attributes. For further analysis, the record segmentation process matched 16 and 22 records for the two cases respectively, and graph analysis distinguished the target suspect's identity from the other matched identities through link associations to different addresses. The overall matching performance of this fuzzy approach is encouraging. It can help law enforcement agencies speed up the investigation process and, most importantly, identify a suspect even when only minimal information is available.
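
  The cascaded name scoring described above can be illustrated with a minimal Python sketch. The abstract does not give the exact encoding or aggregation rules, so several details here are assumptions made only for illustration: the first letter is encoded as a digit (0 for vowels and H, W, Y) to make the code numeric, the Soundex comparison acts as a gate before Jaro-Winkler is applied, and the aggregate is the mean over name parts. The function names numeric_soundex, jaro_winkler, and full_name_score are hypothetical.

```python
# Assumed digit classes: the standard Soundex consonant groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def numeric_soundex(name, length=6):
    """Digits-only Soundex variant padded or truncated to `length` (assumed scheme)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return "0" * length
    # Assumption: the first letter is also mapped to a digit; vowels map to 0.
    digits = [_CODES.get(letters[0], "0")]
    prev = digits[0]
    for ch in letters[1:]:
        code = _CODES.get(ch)
        if code is None:
            if ch not in "HW":
                prev = None  # a vowel separates repeated codes; H and W do not
            continue
        if code != prev:
            digits.append(code)
        prev = code
    return "".join(digits)[:length].ljust(length, "0")


def jaro(s1, s2):
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro-Winkler similarity with a common-prefix bonus (prefix capped at 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def full_name_score(query, candidate):
    """Assumed cascade: Soundex gate per name part, then mean Jaro-Winkler."""
    q_parts, c_parts = query.upper().split(), candidate.upper().split()
    if not q_parts or len(q_parts) != len(c_parts):
        return 0.0
    scores = []
    for q, c in zip(q_parts, c_parts):
        if numeric_soundex(q) == numeric_soundex(c):
            scores.append(jaro_winkler(q, c))
        else:
            scores.append(0.0)  # phonetic codes differ, so the part scores zero
    return sum(scores) / len(scores)
```

  Under these assumptions, "Mohammed" and "Muhammad" share the code 553000, so full_name_score("Mohammed Khan", "Muhammad Khan") passes the phonetic gate for both parts and is then ranked by Jaro-Winkler similarity.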
