L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers

Sofia Casarin; Sergio Escalera; Oswald Lanz

doi:10.1109/CVPR52734.2025.00419

Conference proceeding

Peer reviewed

2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.4441-4451

IEEE Conference on Computer Vision and Pattern Recognition (Nashville, TN, 11/06/2025–15/06/2025)

2025

DOI: https://doi.org/10.1109/CVPR52734.2025.00419

Handle: https://hdl.handle.net/10863/48864

Keywords: Measurement; Training; Computer vision; Neural networks; Graphics processing units (GPUs); Computer architecture; Benchmark testing; Transformers; Pattern recognition; Videos

Abstract

Training-free Neural Architecture Search (NAS) efficiently identifies high-performing neural networks using zero-cost (ZC) proxies. Unlike multi-shot and one-shot NAS approaches, ZC-NAS is both (i) time-efficient, eliminating the need for model training, and (ii) interpretable, with proxy designs often theoretically grounded. Despite rapid developments in the field, current SOTA ZC proxies are typically constrained to well-established convolutional search spaces. With the rise of Large Language Models shaping the future of deep learning, this work extends ZC proxy applicability to Vision Transformers (ViTs). We present a new benchmark using the Autoformer search space evaluated on 6 distinct tasks, and propose Layer-Sample Wise Activation with Gradients information (L-SWAG), a novel, generalizable metric that characterises both convolutional and transformer architectures across 14 tasks. Additionally, previous works highlighted how different proxies contain complementary information, motivating the need for an ML model to identify useful combinations. To further enhance ZC-NAS, we therefore introduce LIBRA-NAS (Low Information gain and Bias Re-Alignment), a method that strategically combines proxies to best represent a specific benchmark. Integrated into the NAS search, LIBRA-NAS outperforms evolution and gradient-based NAS techniques by identifying an architecture with a 17.0% test error on ImageNet1k in just 0.1 GPU days.

Files and links (1)

url

https://doi.org/10.1109/CVPR52734.2025.00419

Details

Title: L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers
Creators: Sofia Casarin (Free University of Bozen-Bolzano); Sergio Escalera (Computer Vision Center); Oswald Lanz (Free University of Bozen-Bolzano)
Publication Details: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.4441-4451
ISBN: 979-8-3315-4364-8
Conference: IEEE Conference on Computer Vision and Pattern Recognition (Nashville, TN, 11/06/2025–15/06/2025)
Publisher: Computer Vision Foundation
Number of pages: 11
Identifiers: 979-8-3315-4364-8; (UNIBZ)90476819; 991007112462801241
Scopus ID: n.a.
Academic Unit: Faculty of Engineering
Language: English
Resource Type: Conference proceeding
Author Names String: Casarin S, Escalera S, Lanz O
