Benchmarking and Optimization of OBDA Systems
Subject: Description logics; Ontology-based data access; OBDA; Ontologies; Web ontology language; Logic; Semantic web; Big data; INF/01
In this thesis we address two distinct but interleaved problems: the benchmarking and the optimization of ontology-based data access (OBDA) systems. OBDA is an approach to the emerging need to provide an understandable view of the data stored in legacy systems. OBDA addresses this need by separating the user from the data sources through a formal specification of domain knowledge, called an ontology, that exposes a conceptual view of the data. By accessing the data through this conceptual view, users can query it with a more convenient vocabulary, need not be aware of storage details, and can obtain richer answers because the data is combined with the domain knowledge. Although prototype OBDA systems are available, their questionable performance remains a significant obstacle to their wider adoption. There is now a need to shift the focus from theoretical studies, which have been very fruitful, as the abundant literature on the subject witnesses, to the study of practical solutions. This need motivates the work in this thesis. As a first step, we focused on how to assess the performance of an OBDA system through benchmarking. In doing so, we identified a series of guidelines on how such systems should be evaluated. These guidelines are based on real-world application scenarios, such as industrial applications of OBDA. Following these guidelines, we devised a novel benchmark based on real data from the oil industry. The benchmark comes with a data generator that, starting from an initial data seed, produces datasets of increasing size while taking into account the requirements dictated by the OBDA setting. The data generator is not specific to our benchmark: it can be reused in any setting where an ontology and an initial data instance are available, without manual input from the end user.
This design choice is meant to ease and encourage the development of future OBDA benchmarks. We then turned to the optimization of OBDA systems, with the aim of making them usable in practical scenarios. In this respect, we studied two different solutions. The first is based on the observation that certain storage details and policies, which before this work were entirely transparent to the OBDA paradigm, can be encoded into constraints that improve the performance of query answering by up to orders of magnitude in complex real-world industrial scenarios with large enterprise databases. In this thesis we formalize such constraints, explain how they can (or cannot) be used to improve the performance of an OBDA system, and single out the reasons why the performance improvements occur. The second solution is inspired by query processing in traditional relational database management systems (RDBMSs). In particular, we studied the possibility of enriching an OBDA system with a planner able to choose the best execution plan for the query at hand. The choice is made according to a cost model that estimates the resource consumption of each alternative. We devised a cost model specific to the OBDA scenario that uses statistics traditionally employed in RDBMSs as well as OBDA-driven measures. Our experiments show that execution plans other than the standard choice of current OBDA implementations can lead to major improvements in query-answering performance. Moreover, they appear to confirm that our cost model can estimate which plan is the best one to choose.
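To make cost-based plan selection concrete, here is a minimal Python sketch. It is not the cost model devised in the thesis, whose formulas are not given in this abstract. The plan names, the fields `branch_rows` and `duplicate_ratio`, the per-branch overhead, and the cost formula are all hypothetical. They are chosen only to show how an RDBMS-style statistic (estimated row counts) and an OBDA-driven measure (here, an assumed fraction of redundant answers produced by the rewriting) can be combined to rank alternative SQL translations of the same ontology query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate execution plan for one ontology query (hypothetical model)."""
    name: str
    # Estimated rows returned by each SQL branch, as an RDBMS would derive
    # them from table statistics (assumed to be given).
    branch_rows: tuple[float, ...]
    # Hypothetical OBDA-driven measure: fraction of redundant answers that
    # must later be removed by duplicate elimination.
    duplicate_ratio: float


def estimated_cost(plan: Plan, per_branch_overhead: float = 50.0) -> float:
    """Toy cost: fixed start-up cost per branch plus rows produced, inflated
    by the extra work of removing redundant answers."""
    rows = sum(plan.branch_rows)
    return per_branch_overhead * len(plan.branch_rows) + rows * (1.0 + plan.duplicate_ratio)


def choose_plan(plans: list[Plan]) -> Plan:
    """Pick the alternative with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


# Two hypothetical translations of the same ontology query.
single_union = Plan("single union", branch_rows=(1000.0,) * 20, duplicate_ratio=0.5)
two_subqueries = Plan("two subqueries", branch_rows=(8000.0, 4000.0), duplicate_ratio=0.1)
print(choose_plan([single_union, two_subqueries]).name)  # two subqueries
```

Here the single union is charged 31000 cost units and the two-subquery plan about 13300, so the planner prefers the latter. A real cost model would also account for joins, indexes, and the behaviour of the underlying RDBMS optimizer.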
Showing items related by title, author, creator and subject.
Chesani, F; Mello, P; Montali, M; Torroni, P (CEUR, 2008) The Service Oriented Architecture paradigm, and its implementation based on Web Services, have been the object of an intense research and standardization activity. One of the most challenging open research issues is the ...
Calvanese, D; Lenzerini, M; Riccardo, R; De Giacomo, G (Bozen-Bolzano University Press, 2007) We aim at representing and reasoning about actions and (high level) programs over ontologies expressed in Description Logics. This is a critical issue that has resisted good solutions for a long time. In particular, while ...
Calvanese, D; De Giacomo, G; Soutchanski, M (AAAI Press, 2015) In this paper we investigate situation calculus action theories extended with ontologies, expressed as description logics TBoxes that act as state constraints. We show that this combination, while natural and desirable, ...