The address connector: Noninvasive synchronization of hierarchical data sources
Subject: Approximate matching; Data quality; Entity resolution; Hierarchical data; Record linkage; Residential addresses; Similarity query; Trees
Different databases often store information about the same or related objects in the real world. To enable collaboration between these databases, data items that refer to the same object must be identified. Residential addresses are data of particular interest as they often provide the only link between related pieces of information in different databases. Unfortunately, residential addresses that describe the same location might vary considerably and hence need to be synchronized. Non-matching street names and addresses stored at different levels of granularity make address synchronization a challenging task. Common approaches assume an authoritative reference set and correct residential addresses according to the reference set. Often, however, no reference set is available, and correcting addresses with different granularity is not possible. We present the address connector, which links residential addresses that refer to the same location. Instead of correcting addresses according to an authoritative reference set, the connector defines a lookup function for residential addresses. Given a query address and a target database, the lookup returns all residential addresses in the target database that refer to the same location. The lookup supports addresses that are stored with different granularity. To align the addresses of two matching streets, we use a global greedy address-matching algorithm that guarantees a stable matching. We define the concept of address containment that allows us to correctly link addresses with different granularity. The evaluation of our solution on real-world data from a municipality shows that our solution is both effective and efficient.
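The global greedy matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance function or the threshold, so the function `greedy_match`, the integer house numbers, the absolute-difference distance, and the `max_distance` cutoff are all assumptions made for illustration. The sketch repeatedly picks the closest pair of still-unmatched addresses; with a symmetric distance and ties broken by list position, no two unmatched-to-each-other addresses both prefer each other over their assigned partners, which is the usual sense of a stable matching.

```python
from itertools import product


def greedy_match(left, right, distance, max_distance):
    """Global greedy matching (illustrative sketch, not the paper's code).

    left, right  -- addresses of two streets already known to match
    distance     -- hypothetical symmetric distance between two addresses
    max_distance -- hypothetical cutoff; farther pairs are never linked
    """
    # Collect every candidate pair within the cutoff, closest first.
    # Ties are broken by the positions (i, j) so the result is deterministic.
    candidates = sorted(
        (distance(a, b), i, j)
        for (i, a), (j, b) in product(enumerate(left), enumerate(right))
        if distance(a, b) <= max_distance
    )

    used_left, used_right, matching = set(), set(), []
    for _, i, j in candidates:
        # Accept a pair only if neither address has been matched yet.
        if i not in used_left and j not in used_right:
            used_left.add(i)
            used_right.add(j)
            matching.append((left[i], right[j]))
    return matching


# House numbers of the same street in two hypothetical databases.
pairs = greedy_match([1, 3, 7], [2, 3, 10], lambda a, b: abs(a - b), 2)
print(pairs)
```

In this example the exact match (3, 3) is taken first, then (1, 2); house number 7 stays unmatched because no candidate lies within the assumed cutoff.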
Showing items related by title, author, creator and subject.
Augsten N; Böhlen M; Gamper J (ACM, 2006): Several recent papers argue for approximate lookups in hierarchical data and propose index structures that support approximate searches in large sets of hierarchical data. These index structures must be updated if the ...
Jaber M; Papapetrou P; Helmer S; Wood PT (Springer, 2014): We study the problem of detecting hierarchical ties in a social network by exploiting the interaction patterns between the actors (members) involved in the network. Motivated by earlier work using a rank-based approach, ...
Fuchs S; Di Lascio FML; Durante F (CMStatistics, 2017): A copula-based notion of dissimilarity between continuous random variables is introduced and formalized. Such a concept aims at detecting rank-invariant dependence properties among random variables and, as such, it will ...