Abstract
Cyber-physical systems (CPS) integrate computational and physical components to interact with humans across various modalities. Online Collaborative Artificial Intelligence Systems (OL-CAIS) are a subset of CPS that collaborate with humans to achieve shared goals through online learning. These systems are exposed to environmental changes that may degrade their performance (i.e., disruptive events). Decision-makers must therefore develop policies that restore performance (i.e., ensure resilience) while minimizing adverse energy impacts (i.e., ensure greenness), balancing the two objectives across collaborative actions. This research investigates how to balance greenness and resilience in OL-CAIS under disruptive events. Our objectives are to i) model OL-CAIS resilience for automatic state detection, ii) develop agent-based policies that optimize the trade-off between greenness and resilience, and iii) understand catastrophic forgetting in order to maintain consistent performance. By addressing these objectives, we equip decision-makers with tools and strategies to manage OL-CAIS effectively. The research employs a systematic, iterative methodology that combines theoretical modeling with empirical evaluation. We model OL-CAIS resilience through three operational states: steady, the initial state in which the system collaborates autonomously with humans; disruptive, in which performance degrades and recovery policies are required; and final, in which the system experiences memory degradation after the disruption has been resolved. To address the need for recovery policies in the disruptive state, we introduce the GResilience framework, which recommends recovery actions that balance greenness and resilience. These include: i) one-agent policies using multi-objective optimization, ii) two-agent policies that resolve trade-offs via game theory, and iii) reinforcement learning (RL) agent policies that maximize rewards based on greenness-resilience states.
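The three-state resilience model can be illustrated with a minimal sketch that labels a stream of performance samples. It assumes a single scalar performance metric (e.g., classification accuracy) and a fixed acceptability threshold; the names `State` and `detect_states`, the threshold rule, and the return to the disruptive state on a repeated disruption are illustrative assumptions, not the detection method used in this research. The sketch also does not model the memory degradation that characterizes the final state.

```python
from enum import Enum
from typing import Iterable, List


class State(Enum):
    STEADY = "steady"          # initial autonomous collaboration
    DISRUPTIVE = "disruptive"  # performance below the acceptable level
    FINAL = "final"            # after the disruption has been resolved


def detect_states(performance: Iterable[float], threshold: float) -> List[State]:
    """Label each performance sample with a resilience state.

    Hypothetical rule (an assumption for illustration): the system starts
    STEADY, becomes DISRUPTIVE when performance falls below `threshold`,
    and becomes FINAL once performance is back at or above `threshold`.
    """
    states: List[State] = []
    current = State.STEADY
    for p in performance:
        if current is State.STEADY and p < threshold:
            current = State.DISRUPTIVE
        elif current is State.DISRUPTIVE and p >= threshold:
            current = State.FINAL
        elif current is State.FINAL and p < threshold:
            # Assumed behavior: a repeated disruption re-enters the disruptive state.
            current = State.DISRUPTIVE
        states.append(current)
    return states
```

For example, `detect_states([0.9, 0.92, 0.6, 0.7, 0.85, 0.88], threshold=0.8)` labels the samples steady, steady, disruptive, disruptive, final, final. A fixed threshold keeps the sketch simple; a runtime detector would likely need smoothing to avoid reacting to single noisy samples.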
Then, to evaluate how effectively these agent-based policies achieve green recovery, we create a measurement framework. It defines measurable concepts for evaluating resilience (i.e., recovery speed and performance steadiness) and greenness (i.e., green efficiency and service autonomy). The research also explores catastrophic forgetting in online learning, analyzing its effects across multiple disruptions and proposing strategies to maintain steady performance. Additionally, we introduce a containerization methodology to optimize resource allocation and reduce energy consumption. We assess our models and theories through empirical evaluation, including real-world and simulated experiments with an industrial collaborative robot that learns object classification from human demonstrations. Our experiments demonstrate that the resilience model can trace the evolution of system performance across states at runtime during disruptive events. The effectiveness of the GResilience framework in balancing greenness and resilience was evaluated in four real-world experiments and over 800 simulated experiments for each agent-based decision-making policy. Results showed that our agents' policies improved green recovery over internal policies: they reduced the time needed for recovery, lowered performance fluctuations between acceptable and unacceptable levels, and decreased human dependency. However, this improvement came at the cost of higher CO2 emissions, resulting from the additional computational power required by the agents. RL-agent policies performed best, followed by two-agent and then one-agent policies. Furthermore, our experiments revealed catastrophic forgetting caused by repeated disruptions, but our policies maintained consistent performance steadiness. Finally, comparing bare-metal and containerized setups, we demonstrate that containerization significantly reduced CO2 emissions, halving those of the bare-metal setup.
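The four measurable concepts can be approximated as follows. This is a hypothetical sketch: the formulas (steps needed to recover, number of crossings between acceptable and unacceptable performance, fraction of steps without human intervention, and performance gain per gram of CO2) are assumed operationalizations chosen for illustration and may differ from the definitions in the measurement framework.

```python
from typing import Sequence


def recovery_steps(performance: Sequence[float], threshold: float) -> int:
    """Hypothetical proxy for recovery speed (fewer steps = faster recovery).

    Counts steps from the first drop below `threshold` until performance is
    back at or above it. Only the first disruption is considered. Returns 0
    if no disruption occurs and -1 if performance never recovers.
    """
    start = next((i for i, p in enumerate(performance) if p < threshold), None)
    if start is None:
        return 0
    for j in range(start + 1, len(performance)):
        if performance[j] >= threshold:
            return j - start
    return -1


def threshold_crossings(performance: Sequence[float], threshold: float) -> int:
    """Hypothetical proxy for performance steadiness (fewer crossings = steadier).

    Counts switches between acceptable (>= threshold) and unacceptable levels.
    """
    acceptable = [p >= threshold for p in performance]
    return sum(a != b for a, b in zip(acceptable, acceptable[1:]))


def service_autonomy(human_interventions: Sequence[bool]) -> float:
    """Hypothetical proxy: fraction of steps completed without human intervention."""
    if not human_interventions:
        return 1.0
    return 1.0 - sum(human_interventions) / len(human_interventions)


def green_efficiency(performance_gain: float, co2_grams: float) -> float:
    """Hypothetical proxy: performance recovered per gram of CO2 emitted."""
    if co2_grams <= 0:
        raise ValueError("co2_grams must be positive")
    return performance_gain / co2_grams
```

With `perf = [0.9, 0.6, 0.7, 0.85, 0.75, 0.9]` and a threshold of 0.8, `recovery_steps(perf, 0.8)` returns 2 and `threshold_crossings(perf, 0.8)` returns 4. Keeping each metric as a separate function makes it easy to compare policies on one dimension at a time before weighing greenness against resilience.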
In conclusion, this research advances the understanding of green resilient OL-CAIS by providing decision-makers with novel metrics, theoretical models, and actionable strategies. The frameworks and policies developed here empower stakeholders to navigate the trade-offs between resilience and greenness, enabling the design of environmentally sustainable and resilient systems.