Abstract
Spectral analysis of photovoltaic (PV) modules has proven to be a valuable tool for assessing module health, performance, and degradation over time. Among these techniques, electroluminescence (EL) imaging has gained significant momentum due to its ability to reveal microscopic defects that are not visible under standard visual inspection. In this work, we present a novel dataset designed for Visual Question Answering (VQA) models, specifically developed to enhance the automated analysis of PV modules using EL imaging. The dataset includes a diverse collection of module-level EL images, manually annotated with component-level details, defect classifications, and additional metadata about the modules. Annotations consist of bounding boxes for various module components and qualitative interpretations of anomalies following, and extending beyond, Annex D of IEC 60904-13 (2018). Additionally, we provide a comprehensive set of both quantitative and qualitative questions, ensuring that the dataset serves as a robust training resource for VQA models.
The dataset is planned to include approximately 1,000 module-level images, covering a diverse range of technologies, cell counts per module, and manufacturers. All images are annotated by human experts using a predefined list of defects, components, and questions to ensure consistency. Some modules are defect-free and therefore carry no defect annotations, while others are heavily damaged, with up to 100 annotated defects. On average, the dataset contains approximately six annotations per module. A set of 10 questions was defined for all modules, addressing fundamental qualitative and quantitative aspects of the health and state of PV modules based on their corresponding EL images.
In addition to the dataset, we introduce novel model architectures specifically designed for training on this dataset, alongside recommendations for fine-tuning existing multimodal foundation models for EL image analysis. By integrating expert-driven annotations and domain-specific question-answer pairs, this dataset will serve as a strong foundation for researchers to develop multimodal models for EL image analysis. Beyond assisting both researchers and industry in defect detection and classification tasks, it enhances interpretability by allowing AI models to respond to domain-specific queries.