SkillFormer: Unified Multi-View Video Understanding for Proficiency Estimation

Edoardo Bianchi; Antonio Liotta

doi:10.1117/12.3093974

Conference proceeding

Peer reviewed

Eighteenth International Conference on Machine Vision (ICMV 2025), Vol.14114

SPIE - The International Society for Optical Engineering, 14114

International Conference on Machine Vision (Paris, 19/10/2025–22/10/2025)

2026

DOI: https://doi.org/10.1117/12.3093974

Handle: https://hdl.handle.net/10863/51495

Keywords: Action Quality Assessment; Video Understanding; Proficiency Estimation

Abstract

Assessing human skill levels in complex activities is a challenging problem with applications in sports, rehabilitation, and training. We present SkillFormer, a parameter-efficient architecture for unified multi-view proficiency estimation from egocentric and exocentric videos. Building on the TimeSformer backbone, SkillFormer introduces a CrossViewFusion module that fuses view-specific features using multi-head cross-attention, learnable gating, and adaptive self-calibration. We use Low-Rank Adaptation (LoRA) to fine-tune only a small subset of parameters, significantly reducing training costs. Evaluated on the EgoExo4D dataset, SkillFormer achieves state-of-the-art accuracy in multi-view settings while using 4.5x fewer trainable parameters and requiring 3.75x fewer training epochs than prior baselines. It excels in multiple structured tasks, confirming the value of multi-view integration for fine-grained skill assessment.
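
The parameter savings attributed to Low-Rank Adaptation in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the class name `LoRALinear`, the layer dimensions, the rank, the scaling factor `alpha`, and the random initialisation are all assumptions chosen for illustration. It shows only the general idea of keeping a pretrained weight frozen and training a low-rank update in its place.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (hypothetical helper)."""

    def __init__(self, d_in, d_out, rank, alpha, seed=0):
        rng = random.Random(seed)
        # Frozen "pretrained" weight, shape (d_out, d_in); random here for illustration.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A has shape (rank, d_in), B has shape (d_out, rank).
        self.lora_a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, so the adapted layer initially matches the frozen one.
        self.lora_b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        """Map x of shape (batch, d_in) to x W^T + scale * x A^T B^T."""
        base = matmul(x, transpose(self.weight))
        update = matmul(matmul(x, transpose(self.lora_a)), transpose(self.lora_b))
        return [
            [b + self.scale * u for b, u in zip(base_row, update_row)]
            for base_row, update_row in zip(base, update)
        ]

    def trainable_parameters(self):
        """Count only the entries of the low-rank factors A and B."""
        return len(self.lora_a) * len(self.lora_a[0]) + len(self.lora_b) * len(self.lora_b[0])

    def frozen_parameters(self):
        """Count the entries of the frozen weight matrix."""
        return len(self.weight) * len(self.weight[0])


# Illustrative sizes only; they are not taken from the paper.
layer = LoRALinear(d_in=64, d_out=64, rank=4, alpha=8)
print(layer.trainable_parameters(), layer.frozen_parameters())  # prints "512 4096"
```

With these assumed sizes, the low-rank factors hold 512 trainable values against 4096 frozen ones. The reduction grows with layer width because the update costs `rank * (d_in + d_out)` parameters rather than `d_in * d_out`.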

Details

Title: SkillFormer: Unified Multi-View Video Understanding for Proficiency Estimation
Creators: Edoardo Bianchi - Free University of Bozen-Bolzano; Antonio Liotta - Free University of Bozen-Bolzano
Publication Details: Eighteenth International Conference on Machine Vision (ICMV 2025), Vol.14114
Editor(s): Osten W, Mamut E
ISBN: 9798902321873
EISBN: 9798902321880
ISSN: 0277-786X
Conference: International Conference on Machine Vision (Paris, 19/10/2025–22/10/2025)
Series / Volume: SPIE - The International Society for Optical Engineering, 14114
Publisher: SPIE, Washington
Identifiers: 9798902321873; (UNIBZ)94280969; 991007295439501241
Scopus ID: 2-s2.0-105032956766
Academic Unit: Faculty of Engineering
Language: English
Resource Type: Conference proceeding
Author Names String: Bianchi E, Liotta A
Additional Description: Editors/Supervisors: Osten W, Mamut E
