Abstract
Analysing PV performance degradation requires sufficiently long time series of PV performance and irradiance measurements. Because the performance of PV systems shows clear seasonal variation, and the seasonal trend must be separated from the long-term trend, data with at least monthly resolution are required, and large gaps in the time series hamper degradation analysis. Short gaps can be filled with simple approaches such as interpolation, but longer gaps require more complex methods. In this paper, we analyse the imputation of missing plane-of-array (GPOA) irradiance values. We used a dataset with measurements of GPOA, global horizontal (GHI), diffuse horizontal (DH) and direct normal (DNI) irradiance to train several machine learning and irradiance transposition models to predict GPOA irradiance from the other irradiance measurements. We first optimised the model hyperparameters using a sample of the complete dataset. We then compared the accuracy of the models by calculating the root-mean-squared error (RMSE) of predictions made on a testing dataset. The three most accurate models were then reoptimised with the full dataset. Finally, we determined the most accurate model and estimated the missing GPOA irradiance values. Our results show that, in this application, the machine learning models yield far more accurate results than the irradiance transposition models (RMSE of ~30-40 W/m² vs. ~90-100 W/m², respectively). The most accurate results were obtained with gradient boosting regression models, with an RMSE of 28.1 W/m².
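As a minimal illustration of the two ingredients named above, the sketch below shows one simple transposition model and the RMSE metric used to score predictions. The abstract does not say which transposition models were evaluated; the isotropic-sky model, the function names, the default albedo of 0.2 and all numeric values here are assumptions chosen for illustration, not the authors' implementation or data.

```python
import math


def isotropic_poa(ghi, dhi, dni, aoi_deg, tilt_deg, albedo=0.2):
    """Estimate plane-of-array irradiance (W/m^2) with the isotropic-sky model.

    Assumed inputs: GHI, diffuse horizontal and DNI in W/m^2, the angle of
    incidence of the direct beam on the module (degrees) and the module tilt
    (degrees). The albedo default is an illustrative assumption.
    """
    tilt = math.radians(tilt_deg)
    # Direct beam projected onto the module plane; no beam from behind the module.
    beam = dni * max(math.cos(math.radians(aoi_deg)), 0.0)
    # Sky diffuse, assuming uniform radiance over the visible part of the sky dome.
    sky_diffuse = dhi * (1.0 + math.cos(tilt)) / 2.0
    # Ground-reflected irradiance seen by the tilted module.
    ground_reflected = ghi * albedo * (1.0 - math.cos(tilt)) / 2.0
    return beam + sky_diffuse + ground_reflected


def rmse(predicted, measured):
    """Root-mean-squared error between two equally long sequences."""
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up example values, only to show how the pieces fit together.
measured_gpoa = [710.0, 455.0]
predicted_gpoa = [
    isotropic_poa(ghi=600.0, dhi=100.0, dni=650.0, aoi_deg=20.0, tilt_deg=30.0),
    isotropic_poa(ghi=400.0, dhi=150.0, dni=350.0, aoi_deg=35.0, tilt_deg=30.0),
]
print(f"RMSE: {rmse(predicted_gpoa, measured_gpoa):.1f} W/m^2")
```

A machine learning model would replace `isotropic_poa` with a regressor fitted on the timestamps where GPOA was measured, and the same `rmse` function would score both kinds of model on a held-out test set.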