Abstract
Infrared spectroscopy techniques offer a convenient and non-destructive way to rapidly collect vast amounts of data. Nowadays, these data are used effectively in a wide range of fields, such as medicine, astronomy and food science. From a statistical viewpoint, however, they pose relevant challenges, mainly concerning their high dimensionality and the complex relationships among spectral variables (wavelengths), which often stem from convoluted chemical processes. In this framework, factor analysis is a sensible strategy, as it aims to produce parsimonious representations of the data while focusing on their correlation structure. Nonetheless, its standard formulation does not account for redundancies among the features. Therefore, a modification of factor analysis is proposed that maps the data into a lower-dimensional latent space while simultaneously clustering the variables. A flexible Bayesian estimation procedure is adopted to fit the model. On the one hand, this approach yields an even more parsimonious summary of the data, highlighting which wavelengths carry similar information. On the other hand, the resulting partition of the wavelengths offers useful insights from a chemical standpoint. The method is applied to mid-infrared spectroscopy data on milk from cows under different feeding regimens, providing a useful tool for guaranteeing milk authenticity.