Abstract
Background. Optical flow estimation is the computer vision task of producing flow fields, i.e. the per-pixel correspondences between two (or more) video frames. It is an important but challenging task that enables various applications, such as video coding, tracking, pose estimation, and structure-from-motion. Recently, the introduction of deep learning models for optical flow estimation has brought about a veritable paradigm shift in the field. Deep learning models have considerably improved accuracy in challenging conditions. However, such approaches can be biased towards their specific training data and training metric.
Motivation. This thesis is motivated by the observation that state-of-the-art deep learning optical flow models underperform on real data. Specifically, the accuracy of the produced flow fields has been observed to vary with the direction of motion. We identified this problem as a lack of geometric equivariance, i.e. of the model's capability to perform equally well when different geometric transformations are applied to the inputs.
Problem statement. The goal of this thesis is 1) to analyze to what extent state-of-the-art optical flow estimators lack equivariance when reflections are applied to the input data, and 2) to propose strategies to mitigate this lack of equivariance. When applying reflections to the test data, we observed a considerable difference in estimation quality depending on the direction of motion; we call this specific lack of equivariance sign imbalance.
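To make the notion concrete, the identities below state what an ideally equivariant estimator would satisfy under horizontal and vertical reflections. The notation (estimator f, frames I_1 and I_2 of width W and height H, flow F = (u, v), reflection operators M_h and M_v) is introduced here for illustration and is not necessarily the notation used in the thesis.

```latex
% Reflections of a frame I of width W and height H
(M_h I)(x, y) = I(W-1-x,\; y), \qquad (M_v I)(x, y) = I(x,\; H-1-y)

% With F = (u, v) = f(I_1, I_2), an equivariant estimator satisfies
f(M_h I_1, M_h I_2)(x, y) = \big(-u(W-1-x,\, y),\; v(W-1-x,\, y)\big)
f(M_v I_1, M_v I_2)(x, y) = \big(u(x,\, H-1-y),\; -v(x,\, H-1-y)\big)
```

The sign flip follows from the geometry. A point at column x that moves to x + u becomes, after horizontal mirroring, a point at W-1-x that moves to W-1-x-u, so its horizontal displacement is -u. A sign-imbalanced estimator violates these identities systematically, for example by estimating motion in one direction more accurately than the mirrored motion.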
Approach. This thesis develops a methodology and a metric to measure the extent of sign imbalance for any optical flow estimator, with a focus on deep-learning-based estimators. The methodology is a framework that assesses sign imbalance by applying transformations that change the orientation of the data. The proposed metric does not require data with ground-truth optical flow. To mitigate sign imbalance, we integrate the proposed metric into training as an auxiliary loss. A comprehensive ablation study evaluates the effects of the models, aggregation metrics, hyperparameter tuning, training data, and fine-tuning strategies.
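The Python sketch below illustrates how an orientation-changing transformation can yield a score that needs no ground truth. It compares an estimator's flow on a frame pair with its flow on the horizontally mirrored pair, after mapping the latter back to the original orientation. This is a hypothetical stand-in, not the thesis's metric. The function names, the restriction to horizontal mirroring, and the mean endpoint distance used as aggregation are all assumptions. The two toy estimators exist only to exercise the score.

```python
import numpy as np


def mirror_h(img):
    """Mirror an array of shape (H, W, ...) along its width axis."""
    return img[:, ::-1, ...]


def unmirror_flow_h(flow):
    """Map a flow field of shape (H, W, 2), estimated on mirrored frames, back to the
    original orientation: mirror it spatially and negate the horizontal component u."""
    back = flow[:, ::-1, :].copy()
    back[..., 0] *= -1.0
    return back


def sign_imbalance_h(estimator, frame1, frame2):
    """Hypothetical ground-truth-free score: mean endpoint distance between the flow
    estimated on the original pair and the un-mirrored flow estimated on the
    horizontally mirrored pair. 0.0 means fully consistent under this reflection."""
    flow = estimator(frame1, frame2)
    flow_back = unmirror_flow_h(estimator(mirror_h(frame1), mirror_h(frame2)))
    return float(np.mean(np.linalg.norm(flow - flow_back, axis=-1)))


def global_shift_estimator(frame1, frame2, max_shift=5):
    """Toy estimator: the circular horizontal shift that best aligns frame1 with frame2,
    returned as a constant flow field (u = shift, v = 0)."""
    errors = {s: np.mean((np.roll(frame1, s, axis=1) - frame2) ** 2)
              for s in range(-max_shift, max_shift + 1)}
    best = min(errors, key=errors.get)
    flow = np.zeros(frame1.shape[:2] + (2,))
    flow[..., 0] = best
    return flow


def always_right_estimator(frame1, frame2):
    """Toy, deliberately biased estimator: predicts one pixel of rightward motion everywhere."""
    flow = np.zeros(frame1.shape[:2] + (2,))
    flow[..., 0] = 1.0
    return flow


rng = np.random.default_rng(0)
frame1 = rng.random((16, 32))
frame2 = np.roll(frame1, 3, axis=1)  # every pixel moves 3 columns to the right (circularly)
print(sign_imbalance_h(global_shift_estimator, frame1, frame2))  # 0.0
print(sign_imbalance_h(always_right_estimator, frame1, frame2))  # 2.0
```

The shift-search estimator treats both motion directions symmetrically, so its score is zero. The biased estimator predicts rightward motion even on the mirrored pair, which maps back to leftward motion and produces a nonzero score. Using such a score as an auxiliary training loss, as the thesis does with its own metric, would require computing it in a differentiable framework and adding it to the main flow loss with some weight. That weighting scheme is an assumption of this sketch.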
Results. The analyzed state-of-the-art deep learning techniques show a substantial bias towards certain directions of motion, as large in magnitude as the endpoint error, the most commonly used optical flow evaluation metric. Moreover, mirroring data augmentation has only a marginal effect on mitigating the imbalance. In contrast, applying the proposed metric during the training of state-of-the-art optical flow estimators yields a dramatic reduction in sign imbalance and better overall optical flow quality.
Conclusions. The proposed framework and training scheme can help researchers and practitioners develop robust models for optical flow estimation. The dataset generation scripts and the testing and training code will be made available online. The code can easily be extended to other datasets, metrics, and models, or to test for other biases affecting optical flow estimators.