MLLMs Construction Company: Investigating Multimodal LLMs’ Communicative Skills in a Collaborative Building Task
Conference proceeding, open access, peer reviewed

M. Sarzotti, Giovanni Duca, C. Madge, Raffaella Bernardi and Massimo Poesio
CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics, Vol. 4112
Eleventh Italian Conference on Computational Linguistics (Cagliari, 24/09/2025–26/09/2025)
2025
Handle:
https://hdl.handle.net/10863/51711

Abstract

Keywords: Communication; dialogue; Multimodality; 3D understanding
How effective are the communication choices of Multimodal Large Language Models (MLLMs) when pursuing a common goal? Can they make use of common human dialogical patterns? We address these questions by engaging two agents based on the Mistral model in a collaborative building task, where one agent has to instruct the other on how to build a specific target structure. The aim of this work is to investigate whether different prompting techniques, with varying degrees of multimodality, can influence the performance of MLLM-based agents in the proposed task. Code and data are available in the project’s GitHub repository.
PDF (open access): 2025.clicit-1.97, 1.37 MB
URL: https://api.elsevier.com/content/abstract/scopus_id/105034261961
