Abstract
Recognizing workers’ actions is a key step toward improving safety and working conditions in industrial settings. The MECCANO dataset provides a benchmark for developing recognition models specialized for this domain. This study describes our second-place entry in the MECCANO 2023 challenge, which focuses on egocentric multimodal action recognition for industrial applications. We employ the Gate-Shift-Fuse (GSF) module, which is compatible with any 2D Convolutional Neural Network, and extend it to the RGB and Depth modalities together with a slow-fast inference approach. We trained multiple GSF instances that differ in backbone architecture, number of segments, batch size, and number of epochs, and combined them in an ensemble through soft and hard voting. The ensemble achieved a top-1 accuracy of 52.57% and a top-5 accuracy of 81.53% in the challenge. We also designed and prototyped an action recognition system that uses the trained models.
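To illustrate the two voting schemes named above, the following is a minimal sketch and not the challenge implementation. It assumes each trained model outputs a vector of per-class probabilities for a single clip; the function names `soft_vote` and `hard_vote`, the tie-breaking rule, and the toy numbers are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]]) -> int:
    """Average per-class probabilities across models, then take the argmax."""
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    mean = [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


def hard_vote(prob_sets: Sequence[Sequence[float]]) -> int:
    """Each model votes for its own argmax class; the majority class wins.

    Ties are broken by the lowest class index (an assumption of this sketch).
    """
    votes = [max(range(len(p)), key=p.__getitem__) for p in prob_sets]
    counts = Counter(votes)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)


# Toy example: three hypothetical models scoring one clip over three classes.
probs = [
    [0.9, 0.1, 0.0],  # model 1 is very confident in class 0
    [0.4, 0.6, 0.0],  # models 2 and 3 mildly prefer class 1
    [0.4, 0.6, 0.0],
]
soft_prediction = soft_vote(probs)
hard_prediction = hard_vote(probs)
```

In this toy case the two schemes disagree: soft voting follows the one highly confident model, while hard voting follows the majority. This is one reason an ensemble might make use of both.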