Testing ChatGPT for Stability and Reasoning: A Case Study Using Italian Medical Specialty Tests
Conference proceeding   Open access   Peer reviewed


S Casola, Tiziano Labruna, A Lavelli and B Magnini
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023), Vol.3596, pp.113-119
CEUR Workshop Proceedings, 3596
9th Italian Conference on Computational Linguistics, CLiC-it 2023 (Venice, 30/11/2023–02/12/2023)
2023
Handle: https://hdl.handle.net/10863/51391

Abstract

Keywords: Large language models, ChatGPT, Stability
Although large language models (LLMs) achieve impressive performance in zero- and few-shot learning configurations, their reasoning capacities are still poorly understood. As a step in this direction, we present several experiments on multiple-choice question answering, a setting that allows us to evaluate the stability of the model under different prompts, its capacity to recognize when none of the provided answers is correct, and its ability to reason about specific answering strategies (e.g., recursively eliminating the worst answer). We use the Italian medical specialty tests administered yearly to admit medical doctors to specialty training. Results show that a gpt-3.5-turbo model achieves excellent performance in absolute score (an average of 108 out of 140) while still falling short in certain reasoning capacities, particularly in failing to recognize when none of the provided answers is correct.
PDF: 2-s2.0-85181174374 (1,018.93 kB), Open Access
URL: urn:nbn:de:0074-3596-0
