Abstract
Accurate decomposition of all-sky hourly global horizontal irradiance (GHI) into direct normal irradiance (DNI) and diffuse horizontal irradiance (DHI) is essential for photovoltaic (PV) modeling and building simulations. However, traditional models such as DIRINDEX (Direct Index Model), DIRINT (Direct Interpolated Model), DISC (Direct Insolation Simulation Code), and ERBS (Erbs Model) struggle to generalize across climates. This study presents RF-SR (Random Forest - Symbolic Regression), a hybrid approach trained on hourly ground-measured irradiance data from Bolzano, Italy. RF-SR achieved high accuracy (coefficient of determination R² = 0.934 for DNI and R² = 0.946 for DHI; root mean square error RMSE ≈ 20-21 W/m²; energy deviation ratio EDR = 0.117) and generalized across several Köppen-Geiger climate zones. Unlike traditional machine learning (ML) models, RF-SR produces explicit mathematical expressions, which improves interpretability and allows direct use in energy modeling, as demonstrated for the PV system simulated in this work. In the studied PV configuration, RF-SR reduced the cumulative absolute error (CAE) in continental climates from 2945.0 kWh (DIRINDEX) to 1250.0 kWh. Future work will validate RF-SR on diverse on-site measured datasets to assess the need for site-specific calibration and to improve generalization.
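For orientation, the quantities named above can be sketched in a few lines of Python. This is a minimal illustration, not the RF-SR model or the evaluation code used in this study. It assumes the standard closure relation GHI = DNI·cos(θz) + DHI, the usual textbook definitions of RMSE and R², and a CAE taken as the sum of absolute deviations between predicted and measured values (an assumption, since the exact formula is not given here). All function names are hypothetical, and EDR is omitted because its definition is not stated in this section.

```python
import math

def dhi_from_closure(ghi, dni, zenith_deg):
    """DHI from the closure relation GHI = DNI*cos(zenith) + DHI.

    The result is clipped at zero; a sun below the horizon contributes
    no direct component (cos(zenith) is clipped at zero as well).
    """
    cos_z = max(math.cos(math.radians(zenith_deg)), 0.0)
    return max(ghi - dni * cos_z, 0.0)

def rmse(measured, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)

def r2(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def cumulative_absolute_error(measured, predicted):
    """Assumed CAE: sum of absolute deviations (same unit as the inputs)."""
    return sum(abs(p - m) for m, p in zip(measured, predicted))
```

Given a DNI estimate from any decomposition model, `dhi_from_closure` yields the matching DHI, and the three metric functions can then be applied to hourly series of measured and modeled values.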