DigitHist: a Histogram-Based Data Summary with Tight Error Bounds
MetadataShow full item record
We propose DigitHist, a histogram summary for selectivity estimation on multi-dimensional data with tight error bounds. By combining multi-dimensional and one-dimensional histograms along regular grids of different resolutions, DigitHist provides an accurate and reliable histogram approach for multi-dimensional data. To achieve a compact summary, we use a sparse representation combined with a novel histogram compression technique that chooses a higher resolution in dense regions and a lower resolution elsewhere. For the construction of DigitHist, we propose a new error measure, termed $u$-error, which minimizes the width between the guaranteed upper and lower bounds of the selectivity estimate. The construction algorithm performs a single data scan and has linear time complexity. An in-depth experimental evaluation shows that DigitHist delivers superior precision and error bounds than state-of-the-art competitors at a comparable query time.