Abstract
Predictor-based Neural Architecture Search (NAS) has become a fundamental topic in the NAS domain, as predictors are used to narrow down the number of architectures for which the true validation accuracy must be computed. Prior works on predictor-based algorithms focus on a single proxy dataset, e.g., CIFAR-10, which may lead to accuracy degradation and poor generalization on other datasets. Some works address improving the generalization abilities of predictors; however, none of them investigate the possibility of sharing and re-using predictor knowledge across different datasets. We argue that one reason for this gap is the absence of NAS datasets with distribution shifts that would enable a meaningful analysis. In this paper, we propose a new search space definition and introduce a dataset composed of architectures trained on four different datasets, namely CIFAR-10, Fashion-MNIST, CIFAR-100, and Tiny-ImageNet. We thoroughly analyze the statistics, the structural elements of successful networks, and the ranking correlations both during training and across datasets. We highlight clear differences among well-performing networks and propose an early-stopping technique to speed up predictor-based NAS algorithms.