Optimizing Shapelet Discovery for Efficient and Accurate Time Series Classification

Adam Charane

Dissertation

Free University of Bozen-Bolzano

Doctor of Philosophy (PHD), Free University of Bozen-Bolzano

16/05/2025

Handle: https://hdl.handle.net/10863/48443

Abstract

Time series data mining is a diverse field that offers algorithms for tasks ranging from anomaly detection to classification, underpinned by methods that assess similarity between subsequences within complex data. A critical yet under-explored area within this field is the assessment of dataset complexity, especially for multi-class time series. This thesis introduces the concept of empirical hardness, a dataset-specific measure of classification difficulty, and investigates the effectiveness of existing complexity measures in time series analysis. Our findings show that while many traditional complexity measures correlate with empirical hardness, they often provide redundant information and fail to adequately capture the nuances of multi-class datasets, highlighting the need for new multi-class metrics tailored to time series data. A second focus of this thesis is the efficient extraction and evaluation of shapelets, discriminative subsequences that serve as features in classification tasks. We identify the main challenge in shapelet discovery as the high computational cost of evaluating distances for a vast number of candidates. To address this, we introduce an algorithm that reduces the candidate pool by clustering similar subsequences and selecting representative patterns, which makes it feasible to explore multiple window lengths. Furthermore, we introduce an evaluation method that prioritizes the distinction between intra-class and inter-class distances; it yields higher accuracy even with a small number of shapelets, improving both efficiency and accuracy. Together, these techniques capture a wider range of informative subsequences and achieve strong classification performance with fewer, more discriminative shapelets. Lastly, we address the challenge of exactly computing correlations between all aligned subsequences of two time series and present a visualization tool that compactly represents these relationships.
This correlation analysis shows when and where two signals align or diverge, enabling a deeper exploration of time-dependent patterns within the data.
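The abstract names distance evaluation as the main cost of shapelet discovery. To make that cost concrete, here is a minimal sketch of the subsequence distance commonly used in the shapelet literature: the minimum Euclidean distance between a candidate and every same-length window of a series. It is not the thesis's own algorithm. The function names are hypothetical, and z-normalizing each window is an assumption (some variants use raw distances). One call costs O(n·m) for a series of length n and a candidate of length m, and discovery repeats it for every candidate against every training series, which is why shrinking the candidate pool pays off.

```python
import math
from typing import Sequence


def znormalize(values: Sequence[float]) -> list[float]:
    """Return a zero-mean, unit-variance copy; a constant input maps to zeros."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Minimum Euclidean distance between the z-normalized shapelet and every
    z-normalized window of the same length in the series (hypothetical helper)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    target = znormalize(shapelet)
    best = math.inf
    # Slide the candidate over the series: O(n * m) work per (series, candidate) pair.
    for start in range(len(series) - m + 1):
        window = znormalize(series[start:start + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, target)))
        best = min(best, dist)
    return best


if __name__ == "__main__":
    series = [0, 0, 1, 2, 1, 0, 0]
    print(shapelet_distance(series, [1, 2, 1]))  # prints 0.0 (exact match at offset 2)
```

A shapelet-based classifier would typically compute this distance from each selected shapelet to each series and use the resulting vector as features.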
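For the last contribution, the following is one straightforward way to compute the Pearson correlation of every aligned pair of windows x[i:j] and y[i:j]. It is a sketch under stated assumptions, not the thesis's method. Prefix sums of x, y, x², y² and xy give each window's correlation in O(1), so all O(n²) windows cost O(n²) in total. The function name, the `min_len` parameter, and the dictionary output are hypothetical choices for illustration.

```python
import math
from typing import Sequence


def aligned_correlations(
    x: Sequence[float], y: Sequence[float], min_len: int = 2
) -> dict[tuple[int, int], float]:
    """Pearson correlation of x[i:j] and y[i:j] for every aligned window with
    j - i >= min_len (hypothetical helper). Prefix sums make each window O(1).
    Windows where either series is constant get NaN (correlation undefined)."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    n = len(x)
    # p[k] holds the sum of the first k terms of each quantity.
    px, py, pxx, pyy, pxy = [[0.0] * (n + 1) for _ in range(5)]
    for k in range(n):
        px[k + 1] = px[k] + x[k]
        py[k + 1] = py[k] + y[k]
        pxx[k + 1] = pxx[k] + x[k] * x[k]
        pyy[k + 1] = pyy[k] + y[k] * y[k]
        pxy[k + 1] = pxy[k] + x[k] * y[k]

    result: dict[tuple[int, int], float] = {}
    for i in range(n):
        for j in range(i + min_len, n + 1):
            m = j - i
            sx, sy = px[j] - px[i], py[j] - py[i]
            # r = (m*Sxy - Sx*Sy) / sqrt((m*Sxx - Sx^2) * (m*Syy - Sy^2))
            cov = m * (pxy[j] - pxy[i]) - sx * sy
            var_x = m * (pxx[j] - pxx[i]) - sx * sx
            var_y = m * (pyy[j] - pyy[i]) - sy * sy
            if var_x <= 0.0 or var_y <= 0.0:
                result[(i, j)] = math.nan
            else:
                result[(i, j)] = cov / math.sqrt(var_x * var_y)
    return result
```

The result is exact up to floating-point rounding; for long or large-magnitude series, the subtractions of prefix sums can lose precision, and centering the data first is a common mitigation. Because the (i, j) pairs form a triangle, a triangular heatmap is one natural way to view the output, although the thesis's own visualization may differ.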

Files and links: Adam_Charane_Thesis (PDF, 3.13 MB), embargoed access

Details

Title: Optimizing Shapelet Discovery for Efficient and Accurate Time Series Classification
Creators: Adam Charane, Faculty of Engineering
Contributors: Johann Gamper (Supervisor), Faculty of Engineering
Awarding Institution: Free University of Bozen-Bolzano
Doctor of Philosophy (PHD)
Theses and Dissertations: Doctor of Philosophy (PHD), Free University of Bozen-Bolzano
Publisher: Free University of Bozen-Bolzano
Number of pages: 103
Identifiers: 991007091608101241
Academic Unit: Faculty of Engineering
Language: English
Resource Type: Dissertation
