Abstract
The widespread adoption of embedded vision systems in industrial applications has highlighted a key limitation of deep learning models: their large number of parameters. This is a significant concern within the scientific community because of the computational resources and memory these models require for training and inference. To address this, we propose a flexible and effective methodology for neural network compression that integrates a pluggable dimensionality reduction layer with a Knowledge Distillation (KD) approach. The proposed compression framework allows various state-of-the-art techniques to be explored and compared as the reduction mechanism. Specifically, we investigate and implement reduction layers based on tensor decompositions, such as Averaged Higher-Order Singular Value Decomposition (AHOSVD), and on non-linear methods, such as bottleneck projection layers, convolutional autoencoders (CAEs), and MLP-Mixer architectures. In our approach, the reduction layer replaces certain layers of the original network and projects feature maps into a lower-dimensional space. The subsequent KD process then guides the compressed network to retain high performance. We conducted extensive experiments on image classification tasks, evaluating networks that incorporate these reduction strategies across multiple architectures (VGG19, ResNet101) and datasets (CIFAR-10, CIFAR-100, STL-10). We compared our approach against both the original, uncompressed models and quantization, a widely used reduction method, in terms of accuracy, model size, parameter reduction, and inference time. The results demonstrate the versatility and effectiveness of our approach in achieving substantial neural network compression and efficiency across various reduction layer instantiations, while consistently maintaining high accuracy.
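
To make the role of the KD step concrete, the following is a minimal sketch of a distillation objective for a single sample. It assumes the common temperature-softened formulation (a KL term between teacher and student outputs combined with a hard-label cross-entropy term); the abstract does not state which KD loss is used, so the function names `log_softmax` and `kd_loss`, the `temperature` and `alpha` parameters, and their default values are illustrative assumptions rather than the configuration used in this work.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Temperature-softened distillation loss for one sample (assumed formulation).

    alpha weights the soft (teacher) term; (1 - alpha) weights the hard-label term.
    The default temperature and alpha are illustrative, not values from the paper.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )

    # Standard cross-entropy with the ground-truth label at temperature 1.
    ce = -log_softmax(student_logits)[label]

    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem: teacher (original network)
    # and student (network with the reduction layer inserted).
    teacher = [2.0, 0.5, -1.0]
    student = [1.5, 0.7, -0.8]
    print(kd_loss(student, teacher, label=0))
```

In this sketch the teacher would be the original, uncompressed network and the student the network containing the reduction layer; when the two produce identical logits the soft term vanishes and only the hard-label term remains.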