Abstract
Query containment under constraints is the problem of checking whether for every database satisfying a given set of constraints, the result of one query is a subset of the result of another query, Recent research points out that this is a central problem in severa database applications, and we address it within A setting where constraints are specified in the form of special inclusion dependencies over complex expressions , built by using intersection and difference of relations, special forms of quantification, regular expressions over binary relations, and cardinality constraints. These types of constraints capture a great variety of data models, including the relational, the entity-relational, and the object-oriented model, We study the problem of checking whether q is contained in q' with respect to the constraints specified in a schema S, where q and q' are nonrecursive Datalog programs whose atoms are complex expressions. We present the following results on query containment. For the case where q does not contain regular expressions, we provide a method for deciding query containment, and analyze its computational complexity. We do the same for the case where neither S nor q, q' contain number restrictions. To the best of our knowledge , this yields the first decidability result on containment of conjunctive queries with regular expressions. Finally, we Provo that the problem is undecidable for the case where we admit inequalities in q', , 1 Introduction Query containment is the problem of checking whether for avcry database, the result of one query is a subset of the result of another query', Many recent papers point out that this problem is important in several contexts, including information integration [33], query optimization [2, 31, (mate-I 'Wo rofor to the eat semantics of query containment. Bag seman-tlca In atudlod, for oxnmplo, in (26]. rialized) view maintenance [Zl], data warehousing [36], and constraint checking [22]. We deal with the problem of query containment under constraints, i.e. the one of checking whether containment between two queries holds for every database satisfying a given set of constraints. This problem is relevant in every situation where the database schema is specified with a rich data definition language. In particuhrr in the case of information integration, queries are to be compared relatively to (inter-schema) constraints, which are used to declaratively specify the uglue " between two source schema, and between one source schema and the global schema [7,24,33,8,29]. The complexity of query containment in the absence of constraints has been studied in various settings. In [lo], NP-completeness has been established for conjunctive queries, and in [13] a multi-parameter analysis has been performed for the same case, showing that the intractability is due to certain types of cycles in the queries. In [26, 341, II;-completeness of containment of conjunctive queries with inequalities was proved, and in [32] the case of queries with the union and difference operators was studied. For various classes of Datalog queries with inequalities, decidability and undecidability results were presented in (121 and [34], respectively. Query containment under constraints has also been the subject of several investigations. For example, decidability of conjunctive query containment was investigated in [4] under functional and multi-valued dependencies, in [14] under functional and inch&on dependencies, in [9, 28, 301 under constraints representing is-a hierarchies and compIex objects , and in [17] in the case of constraints represented as Datalog programs. In this paper we address query containment in a setting where: l The schema is constituted by concepts (unary relations) and relations as basic elements, and by a set of constraints expressed in a logic-based formalism. Every constraint is an inclusion of the form al C 02, where al and a2 are compIex expressions bu% by using intersection and difference of relations, special forms of quantification, regular expressions over binary relation, and number restrictions (i.e. cardinal-ity constraints imposing limitations on the number of tuples in a certain relation in which an object may appear). The constraints express essentially inclusion dependencies between concepts and relations, and their expressive power is due to the possibility of using complex expressions in the specification of the dependen-149