Identifying the Extent of Completeness of Query Answers over Partially Complete Databases
In many applications, including loosely coupled cloud databases, collaborative editing, and network monitoring, data from multiple sources is regularly used for query answering. Because of system failures, insufficient author knowledge, or network issues, some of this data may be temporarily unavailable or may not exist at all, so not all data needed to answer a query may be available. In this paper, we propose a natural class of completeness patterns, expressed by selections on database tables, to specify which parts of database tables are complete. We then show how to adapt the operators of relational algebra so that they manipulate these completeness patterns and compute completeness patterns for query answers. Our proposed algebra is computationally sound and complete with respect to the information that the patterns provide. We show that stronger completeness patterns can be obtained by considering not only the schema but also the database instance, and we extend the algebra to take this additional information into account. We develop novel techniques to efficiently implement the computation of completeness patterns on query answers and demonstrate their scalability on real data.
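To make the idea of a completeness pattern concrete, the following is a minimal illustrative sketch, not the algebra from the paper. It assumes a pattern is a tuple with one entry per column, where the hypothetical marker `*` stands for "any value", so that the pattern `("Mon", "*")` asserts that all rows with `day = "Mon"` are present. The table layout `(day, sensor)`, the names `covers` and `selection_is_complete`, and the check itself are assumptions made for illustration. The check is deliberately conservative: it only recognizes the case where a single pattern covers the whole selection.

```python
from typing import Sequence, Tuple

WILDCARD = "*"  # hypothetical marker meaning "any value in this column"

Pattern = Tuple[str, ...]
Row = Tuple[str, ...]


def covers(pattern: Pattern, row: Row) -> bool:
    """Return True if the row lies in the part of the table the pattern declares complete."""
    return all(p == WILDCARD or p == v for p, v in zip(pattern, row))


def selection_is_complete(patterns: Sequence[Pattern], column: int, constant: str) -> bool:
    """Conservative check for the selection `column = constant`.

    Returns True if a single pattern guarantees that every row with
    row[column] == constant is present. This is sufficient but not
    necessary: several patterns might jointly cover the selection.
    """
    for pattern in patterns:
        # All other columns must be unconstrained by the pattern.
        others_open = all(p == WILDCARD for i, p in enumerate(pattern) if i != column)
        if others_open and pattern[column] in (WILDCARD, constant):
            return True
    return False


# Assumed example: a table readings(day, sensor) with two completeness patterns.
patterns = [("Mon", "*"), ("Tue", "s1")]

monday_complete = selection_is_complete(patterns, column=0, constant="Mon")
tuesday_complete = selection_is_complete(patterns, column=0, constant="Tue")
sensor_complete = selection_is_complete(patterns, column=1, constant="s1")
```

Under these assumptions, a query selecting all Monday readings is known to be complete, while queries selecting all Tuesday readings or all readings of sensor `s1` are not, because no single pattern covers them.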
Showing items related by title, author, creator and subject.
Darari, F; Prasojo, RE; Nutt, W (Springer, 2014) With the increased availability of data on the Semantic Web, the question whether data sources offer data of appropriate quality for a given purpose becomes an issue. With CORNER, we specifically address the data quality ...
Nutt, W; Paramonov, S; Savkovic, O (Cambridge University Press (CUP): STM Journals, 2013) We address the problem of determining whether a query over a partially complete database can be answered completely, which arises in data integration and decision support. Using so-called table completeness statements, one ...
Razniewski S; Sadiq SW; Zhou X (ADC, 2016) In big data settings, the data can often be externally sourced with little or no knowledge of its quality. In such settings, users need to be empowered with the capacity to understand the quality of data sets and implications ...