Abstract
The ability to understand human actions, plans, and goals in real environments will be an important requirement for future robotic systems to interact with humans in intuitive and safe ways. Indeed, we humans use this skill constantly in our daily lives when interacting with or observing others, and it makes our communication and interactions more efficient. Consequently, if we want to achieve natural and intuitive human-robot interaction, it seems necessary to develop systems that can recognize actions and plans with close-to-human performance. However, current computational systems are still far from the remarkably good human performance on this task, especially in real open environments, where the number of actions and plans that can be observed seems unbounded. A possible approach to overcoming these limitations is to understand and mimic some of the main brain mechanisms that allow humans to perform so well on this task. This thesis explores this direction by bringing ideas from the structure and function of the neocortex to the well-established fields of machine learning and neural networks. The resulting system is a hierarchical architecture that learns representations of temporal input data at different levels of abstraction in a self-supervised way. Once these representations have been learned, classifiers can be trained on top of them; these classifiers are simpler, perform better, and require fewer training samples than those trained directly on the input, which makes the system better suited to open environments. The system has been shown to outperform other state-of-the-art self-supervised representation learning systems for temporal data on this task across different datasets, both in the accuracies and F1 scores achieved and in the number of training samples required.
Its internal behavior has also been compared with that of the primary visual cortex and shown to be analogous, validating it as a computational model of this area (and possibly other areas) of the neocortex. In addition, the architecture was designed to be easily adaptable to applications other than classification, including fully unsupervised ones. Following the same brain-inspired approach, it has already been extended to action prediction, goal prediction, and action selection tasks.