Abstract
We propose a framework for detecting medium-level events that span intervals of frames in a video stream. The detected events can serve as input to a previously developed framework for detecting high-level surveillance events. More specifically, we first define image processing algorithms that identify and track people and objects across frames. We then exploit a previously defined language, based on relational algebra extended with intervals, to develop both offline and online algorithms that label sequences of frames with descriptors such as "person A has package X" or "person B is in car C". An experimental evaluation on real-world data sets shows promising results in terms of both accuracy and performance.