Plug-and-play neural compression: A knowledge distillation framework with flexible dimensionality reduction

Laura Meneghetti; Edoardo Bianchi; Nicola Demo; Gianluigi Rozza

doi:10.1016/j.sysarc.2026.103778



Journal article

Peer reviewed


Laura Meneghetti, Edoardo Bianchi, Nicola Demo and Gianluigi Rozza

The Journal of Systems Architecture: Embedded Software Design, Vol.175, pp.1-15


2026

DOI: https://doi.org/10.1016/j.sysarc.2026.103778

Handle: https://hdl.handle.net/10863/51500

Keywords: Tensor decomposition; Deep learning; Image processing; Neural network compression

Abstract

The widespread adoption of embedded vision systems in industrial applications has highlighted the limitations of deep learning models, which are characterized by a high number of parameters. This has become a significant concern within the scientific community because of the computational resources and memory required to train these models and run inference with them. To address this, we propose a flexible and effective methodology for neural network compression that integrates a pluggable dimensionality reduction layer with a Knowledge Distillation (KD) approach. The proposed compression framework allows various state-of-the-art techniques to be explored and compared as the reduction mechanism. Specifically, we investigate and implement reduction layers based on tensor decompositions, such as Averaged Higher-Order Singular Value Decomposition (AHOSVD), and on non-linear methods, such as bottleneck projection layers, convolutional autoencoders (CAEs), and MLP-Mixer architectures. In our approach, the reduction layer replaces certain layers of the original network, projecting feature maps into a lower-dimensional space. The subsequent KD process then guides the compressed network to retain high performance. We conducted extensive experiments on image classification tasks, evaluating the efficacy of networks incorporating these reduction strategies across multiple architectures (VGG19, ResNet101) and datasets (CIFAR-10, CIFAR-100, STL-10). We then compared our approach against both the original, uncompressed models and quantization, a widely used reduction method, in terms of accuracy, model size, parameter reduction, and inference time. The results demonstrate the versatility and effectiveness of our approach in achieving substantial neural network compression and efficiency across various reduction layer instantiations, while consistently maintaining high accuracy.
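
As an illustration of the KD step mentioned in the abstract, the following is a minimal sketch of a distillation loss for a single sample. The abstract does not state which distillation objective the authors use; this sketch assumes the classic soft-target formulation (temperature-scaled softmax, KL divergence between teacher and student, mixed with cross-entropy on the true label). The function names `softmax` and `kd_loss` and the parameters `temperature` and `alpha` are hypothetical and chosen only for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Hypothetical soft-target distillation loss for one sample.

    Assumption (not taken from the paper): alpha weights the hard-label
    cross-entropy and (1 - alpha) weights the KL divergence between the
    softened teacher and student distributions.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), scaled by T^2 as in the classic formulation
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    soft_term = (temperature ** 2) * kl
    # Cross-entropy of the student (at T = 1) against the ground-truth label
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * hard_term + (1 - alpha) * soft_term
```

In this sketch the teacher would be the original, uncompressed network and the student the network containing the reduction layer. A student whose logits match the teacher's incurs no soft-target penalty, so only the cross-entropy term remains.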

Files and links (1)

url

https://doi.org/10.1016/j.sysarc.2026.103778

Details

Title: Plug-and-play neural compression: A knowledge distillation framework with flexible dimensionality reduction
Creators: Laura Meneghetti - Scuola Internazionale Superiore di Studi Avanzati
Edoardo Bianchi - Free University of Bozen-Bolzano
Nicola Demo - Scuola Internazionale Superiore di Studi Avanzati
Gianluigi Rozza - Scuola Internazionale Superiore di Studi Avanzati
Publication Details: The Journal of Systems Architecture: Embedded Software Design, Vol.175, pp.1-15
ISSN: 1383-7621
Series / Volume: 175
Publisher: Elsevier B.V.
Number of pages: 15
Identifiers: (UNIBZ)94280715
991007296250601241
Scopus ID: 2-s2.0-105032919184
Academic Unit: Faculty of Engineering
Language: English
Resource Type: Journal article
Author Names String: Meneghetti L, Bianchi E, Demo N, Rozza G
