NAS just once: Neural Architecture Search for joint Image-Video Recognition
Conference proceeding   Peer reviewed


Sofia Casarin, S Escalera and Oswald Lanz
2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp.6431-6441
IEEE International Conference on Computer Vision Workshops
IEEE International Conference on Computer Vision (Honolulu, Hawai'i, 19/10/2025–23/10/2025)
2025
Handle:
https://hdl.handle.net/10863/51551

Abstract

Keywords: Image classification; Neural Architecture Search; Video action recognition
Neural Architecture Search (NAS) for Video Understanding has slowly advanced compared to the Image-domain counterpart. Current approaches often focus on 3D networks, search for untied spatial and temporal components, or for pseudo-3D operators. As NAS methods for image-related tasks are often unsuitable for videos due to the lack of benchmarks like NASBench-101, many video-NAS methods use naïve search procedures and fail to leverage advancements in search mechanisms developed for NAS for image tasks. In this work, we propose the first approach to bridge the gap between NAS for Videos and IMages (VIM-NAS), proposing a unique solution to find high-performing and efficient neural networks across ImageNet, Kinetics-400, Kinetics-600, and Something-SomethingV2 datasets. We optimize the 2D space and 3D space-time tubes to tokenize images and videos, along with the architecture of a unique supernet Vision transformer, via a differentiable weight-entanglement mechanism. Leveraging a multi-dataset training strategy, VIM-NAS achieves 84.4% Top-1 accuracy on ImageNet, 90.7% on Kinetics-400, improves state-of-the-art on Kinetics-600 by 0.4%, and improves previous NAS SOTA by 13.4% on Something-SomethingV2, reducing the accuracy gap with hand-designed neural networks in Video Action Recognition.
url
https://openaccess.thecvf.com/content/ICCV2025W/Findings/html/Casarin_NAS_just_once_Neural_Architecture_Search_for_joint_Image-Video_Recognition_ICCVW_2025_paper.html
url
https://doi.org/10.1109/ICCVW69036.2025.00667
