Abstract
In this dissertation, we combine model-based (Model Predictive Control, MPC) and model-free (Neural Network) action-value functions for the control and path planning of robotic systems. We begin with algorithms that rely primarily on model-based action-value functions in order to reduce the number of training episodes. A reinforcement learning (RL) algorithm, Expected Sarsa, is used to adjust the weights of the MPC cost function, while the unknown parameters of the MPC model are estimated using a gradient-based method. In the next step, we increase the role of RL by eliminating the gradient-based system identification and allowing the RL algorithm to determine the parameters of the MPC model as well. However, since no system identification is employed, the model output and the system output are not necessarily the same, and steady-state errors may occur. To address the steady-state error, we employ another RL algorithm, Double Q-Learning, in which we switch between two MPCs, leading to increased exploration. To further expand the model-free role and reduce real-time computation, the target MPC is then replaced by a Neural Network that is updated online. Finally, an offline MPC-RL method based on Q-Learning is introduced, in which the MPC is trained in simulation and then deployed on a real robotic manipulator as a path planner in a human-robot interaction task. This dissertation demonstrates the potential of using MPC as a model-based action-value function, either instead of or together with Neural Networks, in various value-based RL algorithms. The results show that the proposed algorithms substantially reduce the number of required training episodes and eliminate the need for an accurate MPC model, while the MPC can still be tuned optimally.
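For orientation, the two value-based updates named above, Expected Sarsa and Double Q-Learning, can be sketched in their standard tabular form. This is a minimal sketch under stated assumptions and not the implementation developed in this dissertation: the lookup tables stand in for the MPC-based and Neural Network action-value functions, the policy is assumed to be epsilon-greedy, rewards are maximized (a cost-based MPC formulation would minimize instead), and the function names `expected_sarsa_target` and `double_q_update` are hypothetical.

```python
import random


def expected_sarsa_target(q_next, reward, gamma, epsilon):
    """Expected Sarsa target under an (assumed) epsilon-greedy policy.

    q_next is the list of action values Q(s', a) for the next state.
    """
    n = len(q_next)
    greedy = max(range(n), key=lambda a: q_next[a])
    # Epsilon-greedy probabilities: epsilon / n for every action,
    # plus the remaining 1 - epsilon on the greedy action.
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    expected_value = sum(p * q for p, q in zip(probs, q_next))
    return reward + gamma * expected_value


def double_q_update(q_a, q_b, s, a, reward, s_next, alpha, gamma, rng=random):
    """One tabular Double Q-Learning step.

    q_a and q_b map each state to a list of action values. In this sketch
    they are plain tables; they only stand in for the two value functions.
    """
    # Pick at random which table is updated; the other one evaluates
    # the action that the updated table considers best.
    if rng.random() < 0.5:
        update, evaluate = q_a, q_b
    else:
        update, evaluate = q_b, q_a
    best = max(range(len(update[s_next])), key=lambda i: update[s_next][i])
    target = reward + gamma * evaluate[s_next][best]
    update[s][a] += alpha * (target - update[s][a])
```

Expected Sarsa averages the next-state values over the policy rather than sampling one action, which lowers the variance of the target. Double Q-Learning separates action selection from action evaluation across two value functions, which is the role played by the two MPCs described above.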