Abstract
We address the cost of adding value joins to tree-pattern queries and monadic second-order queries over trees in terms of the tractability of query evaluation over two data models: XML and probabilistic XML. Our results show that the data complexity rises from linear, for join-free queries, to intractable, for queries with value joins, while combined complexity remains essentially the same. For treepattern queries with joins (TPJ) the complexity jump is only on probabilistic XML, while for monadic second-order logic over trees with joins (TMSOJ) it already appears for deterministic XML documents. Moreover, for TPJ queries that have a single join, we show a dichotomy: every query is either essentially join-free, and in this case it is tractable over probabilistic XML, or it is intractable. In this light we study the problem of deciding whether a query with joins is essentially join-free. For TMSOJ we prove that this problem is undecidable and for TPJ it is P2-complete. Finally, for TPJ we provide a conceptually simple criterion to check whether a given query is essentially join free.