Abstract
In this work, we present a new method for tuning a Model Predictive Controller (MPC) with a Reinforcement Learning (RL) algorithm called Double Q-Learning, in which two function approximators with different sets of parameters are trained simultaneously. First, the nonlinear MPC is parametrized by the weights of its cost function and by the unknown parameters of its equality and inequality constraints. The parametrized MPC is then used as the action-value function of the Double Q-Learning algorithm. We show that randomly switching between the two parameter sets in the MPC increases the exploration of the proposed algorithm. Because model error terms are added to the baseline stage cost, this additional exploration yields a smaller model mismatch, and hence a less biased MPC than the one obtained with an MPC-based Q-Learning algorithm. Simulation results on a coupled-tanks system show that training is faster than with the MPC-based Q-Learning method and that the final control performance is better.
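For readers unfamiliar with Double Q-Learning, the core idea of training two estimators and randomly choosing which one to update can be sketched in tabular form. This is a minimal, illustrative sketch only: it uses lookup tables instead of the parametrized MPC described above, and the function names (`double_q_update`, `act`), the epsilon-greedy behavior policy on the sum of both tables, the step sizes, and the one-state toy task are all hypothetical assumptions, not the method of this work.

```python
import random
from collections import defaultdict


def double_q_update(q_a, q_b, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, rng=random):
    """One tabular Double Q-Learning step; returns "A" or "B" for the table updated."""
    # Flip a fair coin to decide which estimator is updated on this step.
    if rng.random() < 0.5:
        upd, ev, name = q_a, q_b, "A"
    else:
        upd, ev, name = q_b, q_a, "B"
    # The table being updated selects the greedy next action...
    a_star = max(actions, key=lambda b: upd[(s_next, b)])
    # ...and the other table evaluates it, decoupling selection from evaluation.
    target = r + gamma * ev[(s_next, a_star)]
    upd[(s, a)] += alpha * (target - upd[(s, a)])
    return name


def act(q_a, q_b, s, actions, eps=0.1, rng=random):
    """Epsilon-greedy behavior policy on the sum of both estimators (an assumption)."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda b: q_a[(s, b)] + q_b[(s, b)])


if __name__ == "__main__":
    rng = random.Random(0)
    q_a, q_b = defaultdict(float), defaultdict(float)
    actions = [0, 1]
    # Hypothetical one-state task: action 1 pays 1.0, action 0 pays 0.0.
    for _ in range(2000):
        a = act(q_a, q_b, 0, actions, rng=rng)
        r = 1.0 if a == 1 else 0.0
        double_q_update(q_a, q_b, 0, a, r, 0, actions, gamma=0.5, rng=rng)
    print(act(q_a, q_b, 0, actions, eps=0.0))
```

Selecting the greedy action with one estimator and evaluating it with the other is what reduces the overestimation bias of standard Q-Learning; in the MPC-based setting, the two tables would be replaced by two parameter sets of the MPC.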