The Inherence of Telicity: Unveiling Temporal Reasoning in Video Question Answering

O Loginova; Raffaella Bernardi

Conference proceeding

Open access

Peer reviewed

Proceedings of the 9th Italian Conference on Computational Linguistics [Venice, Italy, November 30 - December 2, 2023], Vol.3596, pp.1-5

CEUR Workshop Proceedings, 3596

9th Italian Conference on Computational Linguistics, CLiC-it 2023 (Venice)

2023

Handle:

https://hdl.handle.net/10863/46612

Abstract

Video question answering (VQA) requires models to understand video-related questions and generate natural language answers. In multiple-choice VQA, models must associate visual content with one of several predetermined answers. Because videos often contain intricate events and actions unfolding over time, these models must be able to reason across multiple frames and discern the relationships between them with respect to the answers. This paper focuses on the Answerer component of a multiple-choice VQA model, which predicts answers using language-infused key frames. We hypothesise that the Answerer's capacity for temporal reasoning is closely intertwined with its understanding of aspectuality. To investigate this, we augment NeXT-QA, a VQA dataset for causal and temporal reasoning, with annotations for telicity. We then evaluate SeViLA, a state-of-the-art multiple-choice VQA model, on the augmented dataset. Our findings show that the model generally handles aspect correctly, albeit with a bias that is inherent in human nature.

© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Files and links (3)

pdf

short16 (1.69 MB)

Open Access

url

https://ceur-ws.org/Vol-3596/

url

https://ceur-ws.org/Vol-3596/short16.pdf

Details

Title: The Inherence of Telicity: Unveiling Temporal Reasoning in Video Question Answering
Creators: O Loginova; Raffaella Bernardi
Publication Details: Proceedings of the 9th Italian Conference on Computational Linguistics [Venice, Italy, November 30 - December 2, 2023], Vol.3596, pp.1-5
ISSN: 1613-0073
Conference: 9th Italian Conference on Computational Linguistics, CLiC-it 2023 (Venice)
Series / Volume: CEUR Workshop Proceedings, 3596
Publisher: CEUR-WS
Number of pages: 5
Identifiers: (UNIBZ)89000020; 991007042754801241
Scopus ID: 2-s2.0-85181169591
Academic Unit: Faculty of Engineering
Language: English
Resource Type: Conference proceeding
Author Names String: Loginova O, Bernardi R
