Monitoring, Detecting, Identifying, and Healing Anomalous Workload in Clustered Computing Environments
Subject: Container Technology; Anomaly detection; Anomaly Analysis; Performance measurement; Cloud computing; Distributed Cluster; Adaptive software systems; Adaptive systems; Area 01
Context and Motivations. Container-based architecture is emerging as an approach for building distributed applications as collections of independent services that work together. As a result, applications can be scaled and updated according to the load on each individual container. Monitoring the workload in a distributed system is complex because performance degradation within a single container can cascade and reduce the performance of dependent containers. Such degradation may result from anomalous workload, which can be observed as an unacceptable application response time and would be considered a failure. Hence, knowing workload characteristics in advance helps in controlling system resources and thereby improving system performance. Workload prediction can be used to decide how many resources to allocate to each container or node in the future. The accuracy of workload prediction depends on the prediction method used and on the characteristics of the workload. Furthermore, the heterogeneity of resources offered in the cloud may lead to workload variations that affect the performance of the overall system, because some workloads are CPU intensive whereas others are memory intensive. Such variations may lead to violations of the service level agreements (SLAs) between service providers and users that specify the quality of the provided services. Because of the high complexity of cloud applications, modeling application behavior usually requires domain knowledge that is difficult to obtain. In such cases, anomaly detection, prediction, and localization can help capture and track behavior that deviates from normal behavior. We aim to investigate how to analyse anomalous resource behavior in a cluster consisting of nodes with applications deployed on containers as their load, from a sequence of observations emitted by the resource.

Objective. The objective of the thesis is to provide a self-adaptive architecture that detects, locates, and heals anomalous behavior in a containerized cluster environment based on the observed response time.

Method. We propose a self-adaptive architecture that comprises two models: the Fault Management Models and the Recovery Model. The Fault Management Models apply an anomaly type identification mechanism based on the detected anomaly. The Recovery Model provides multiple recovery actions, applied according to the type of the identified anomaly. Finally, the proposed architecture is evaluated to assess its accuracy in detecting, identifying, predicting, and recovering from anomalous behaviors within system components.

Results. Several experiments are conducted to show the performance of the proposed architecture. The experiments show that the proposed architecture can detect and locate anomalous behavior with an accuracy of more than 97%.

Thesis Statement. Analyzing anomalous behavior in a containerized cluster environment, and providing multiple recovery actions for the analyzed anomaly, not only increases system scalability but also reduces operating cost and system maintenance.
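
The detect, identify, and recover flow described in the Method part of the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the detection technique, so a simple z-score test on response time is swapped in here, and every name in it (`Sample`, `detect`, `identify`, `RECOVERY_ACTIONS`, `handle`), the 0.9 utilisation cut-offs, and the anomaly and action labels are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional, Tuple


@dataclass
class Sample:
    """One observation emitted by a container (hypothetical structure)."""
    container: str       # which container emitted the observation
    response_ms: float   # observed response time in milliseconds
    cpu: float           # CPU utilisation in the range 0..1
    memory: float        # memory utilisation in the range 0..1


def detect(history: List[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag the latest response time if it lies far above the recent mean.

    Assumption: a plain z-score test stands in for the detection model.
    """
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest > mu
    return (latest - mu) / sigma > z_threshold


def identify(sample: Sample) -> str:
    """Assign a coarse anomaly type from resource utilisation (assumed rules)."""
    if sample.cpu >= 0.9:
        return "cpu_saturation"
    if sample.memory >= 0.9:
        return "memory_saturation"
    return "unknown"


# Hypothetical mapping from anomaly type to recovery action.
RECOVERY_ACTIONS = {
    "cpu_saturation": "scale_out",
    "memory_saturation": "restart_container",
    "unknown": "alert_operator",
}


def handle(history: List[float], sample: Sample) -> Optional[Tuple[str, str]]:
    """Return (container, recovery action) for an anomaly, or None if normal."""
    if not detect(history, sample.response_ms):
        return None
    return sample.container, RECOVERY_ACTIONS[identify(sample)]


if __name__ == "__main__":
    baseline = [100.0, 102.0, 98.0, 101.0, 99.0]
    print(handle(baseline, Sample("c1", 400.0, 0.95, 0.40)))
    print(handle(baseline, Sample("c1", 101.0, 0.95, 0.40)))
```

Keeping the anomaly-to-action mapping in a separate table mirrors the split between fault management and recovery described in the abstract, so recovery actions can be changed without touching detection logic.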
Showing items related by title, author, creator and subject.
A Controller for Anomaly Detection, Analysis and Management for Self-Adaptive for Container Clusters Samir A; El Ioini N; Fronza I; Barzegar HR; Le VT; Pahl C (2019) Service computing in the cloud allows applications to be deployed remotely. These are managed by third-party service providers that make virtualised resources available for these services. Self-adaptive features for ...
Goal-recognition-based Adaptive Brain-Computer Interface for Navigating in Immersive Robotic Systems Abu-Alqumsan M; Ebert F; Peer A (2017) Objective. This work proposes principled strategies for self-adaptations in EEG-based Brain-computer interfaces (BCIs) as a way out of the bandwidth bottleneck resulting from the considerable mismatch between the low-bandwidth ...
Affordable and Energy-Efficient Cloud Computing Clusters: The Bolzano Raspberry Pi Cloud Cluster Experiment Abrahamsson P; Helmer S; Phaphoom N; Nicolodi L; Preda N; Miori L; Angriman M; Rikkilä J; Wang X; Hamily K; Bugoloni S (IEEE, 2013) We present our ongoing work building a Raspberry Pi cluster consisting of 300 nodes. The unique characteristics of this single board computer pose several challenges, but also offer a number of interesting opportunities. ...