Record linkage is the process of identifying records that refer to the same real-world entities, in situations where entity identifiers are unavailable. Records are linked on the basis of similarity between common attributes, with every pair being classified as a link or non-link depending on their degree of similarity. Record linkage is usually performed in a three-step process: first groups of similar candidate records are identified using indexing, pairs within the same group are then compared in more detail, and finally classified. Even state-of-the-art indexing techniques, such as Locality Sensitive Hashing, have potential drawbacks. They may fail to group together some true matching records with high similarity. Conversely, they may group records with low similarity, leading to high computational overhead. We propose using metric space indexing to perform complete record linkage, which results in a parameter-free record linkage process combining indexing, comparison and classification into a single step delivering complete and efficient record linkage. Our experimental evaluation on real-world datasets from several domains shows that linkage using metric space indexing can yield better quality than current indexing techniques, with similar execution cost, without the need for domain knowledge or trial and error to configure the process.
- When: 18th October 2018 13:00 - 14:00
- Where: Cole 1.33b
- Series: Systems Seminars Series
- Format: Seminar