Abstract
Streams of data often originate from many distributed sources. A user wanting to query the streams should not need to know where each stream originates but should be provided with a global view of the streams. R-GMA is a system that integrates distributed data streams to provide a global view of all the streams for users to query. R-GMA has been developed as a grid information and monitoring system, although the techniques developed can be applied wherever distributed streams need to be published and queried.
Stream data is important not only for its current values but also for the values it has produced in the past. To support this, the history of the stream must be archived and stream processing systems must support history queries. However, one problem then arises: data streams published by distributed sources are prone to missing data values, e.g. due to a network failure. Whenever a stream misses some values, its stored history contains gaps. This paper considers how to generate the most complete answer possible to a positive conjunctive query over the available stream history. A model for representing the incompleteness in the stream history is provided, along with an algorithm that distinguishes when and how the missing data affects the answer to a query.
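To make the idea of gaps affecting a query answer concrete, the following is a minimal sketch under stated assumptions. It is not the model or algorithm of this paper. It assumes a hypothetical schema (`Reading` with `source`, `time`, `value`), records missing data as hypothetical per-source half-open time intervals, and evaluates a simple conjunctive selection (time window AND optional source AND predicate). The answer is flagged as possibly incomplete only when a gap for a relevant source overlaps the queried window. Gaps belonging to other sources, or lying outside the window, are ignored. All names (`StreamHistory`, `record_gap`, `query`, `Answer`) are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Reading:
    """One archived stream tuple (hypothetical schema, for illustration only)."""
    source: str   # publishing source, e.g. a host name
    time: int     # timestamp of the measurement
    value: float


@dataclass
class StreamHistory:
    """Archived tuples plus known gaps, kept per source as [start, end) intervals."""
    readings: List[Reading] = field(default_factory=list)
    gaps: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

    def record_gap(self, source: str, start: int, end: int) -> None:
        # Hypothetical hook: called when a source is known to have missed values.
        self.gaps.setdefault(source, []).append((start, end))


@dataclass
class Answer:
    rows: List[Reading]
    # Gaps that overlap the query and could therefore hide qualifying tuples.
    affecting_gaps: List[Tuple[str, int, int]]

    @property
    def complete(self) -> bool:
        return not self.affecting_gaps


def query(history: StreamHistory, t0: int, t1: int,
          source: Optional[str] = None,
          predicate: Callable[[Reading], bool] = lambda r: True) -> Answer:
    """Conjunctive selection: t0 <= time < t1 AND source matches AND predicate."""
    rows = [r for r in history.readings
            if t0 <= r.time < t1
            and (source is None or r.source == source)
            and predicate(r)]
    # A gap matters only if it belongs to a queried source and overlaps [t0, t1).
    # The predicate cannot be tested on values that were never archived, so any
    # such overlap conservatively marks the answer as possibly incomplete.
    affecting = [(s, a, b)
                 for s, intervals in history.gaps.items()
                 if source is None or s == source
                 for (a, b) in intervals
                 if a < t1 and t0 < b]
    return Answer(rows, affecting)
```

For example, if source `"b"` missed values during `[3, 6)`, a query restricted to source `"a"`, or to source `"b"` over `[0, 3)`, is still reported as complete, while a query over source `"b"` for `[0, 10)` returns the archived rows together with the gap that may hide further results. The paper's own model and algorithm are more general than this conservative interval check.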