Abstract
The effectiveness of automatic speech recognition (ASR) systems in acoustically challenging environments directly influences their utility across a range of voice-activated applications. This paper presents an experimental analysis of the robustness of several ASR models to four acoustic perturbations (white noise, reverberation, time stretching, and pitch shifting) in Italian, a non-English and comparatively less-studied language. The results show a notable degradation in performance for all models when they are subjected to these audio transformations. By focusing on Italian, this work offers insight into the challenges and opportunities of optimizing ASR technologies for languages with lower research exposure.