Benchmarking Whisper Under Diverse Audio Transformations and Real-Time Constraints

Sergei Katkov; Antonio Liotta; Alessandro Vietti

doi:10.1007/978-3-031-77961-9_6

Book chapter

Peer reviewed

Speech and Computer: SPECOM 2024, Vol.15299, pp.82-91

Lecture Notes in Computer Science, 15299, Springer Nature Switzerland

2025

DOI: https://doi.org/10.1007/978-3-031-77961-9_6

Handle: https://hdl.handle.net/10863/45779

Abstract

Automatic speech recognition

The automatic speech recognition (ASR) domain has advanced considerably with the emergence of large transformer-based models, such as OpenAI’s Whisper. This paper presents an experiment-based evaluation of the Whisper models, focusing on their performance under various acoustic conditions and input configurations. We specifically examine the effects of audio transformations such as white and Gaussian noise, reverberation, time stretch, and pitch shift, as well as the impact of varying chunk lengths. The findings suggest that while Whisper models are capable of dealing with minimal background noise and demonstrate commendable performance in clean audio conditions, their performance degrades rapidly when subjected to more severe audio transformations and noise, particularly when using shorter chunk lengths. This study contributes valuable insights into the Whisper models’ capabilities and limitations, particularly when it comes to real-time speech recognition, offering guidance for future improvements in ASR technology.

Files and links (1)

url

https://link.springer.com/10.1007/978-3-031-77961-9_6

Details

Title: Benchmarking Whisper Under Diverse Audio Transformations and Real-Time Constraints
Creators: Sergei Katkov; Antonio Liotta; Alessandro Vietti
Publication Details: Speech and Computer: SPECOM 2024, Vol.15299, pp.82-91
Editor(s): Karpov A, Delić V
ISBN: 9783031779602
ISSN: 0302-9743
EISSN: 1611-3349
Series / Volume: Lecture Notes in Computer Science, 15299
Publisher: Springer Nature Switzerland, Cham
Number of pages: 10
Identifiers: 978-3-031-77960-2; (UNIBZ)86253367; 991006917324801241
Web of Science ID: WOS:001415332400006
Scopus ID: 2-s2.0-85210852475
Academic Unit: Faculty of Education; Faculty of Engineering
Language: English
Resource Type: Book chapter
Author Names String: Katkov S, Liotta A, Vietti A
Additional Description: Editors/Supervisors: Karpov A, Delić V
