Abstract
Recent advances in reinforcement learning (RL) have opened promising opportunities for autonomous drone navigation. However, bridging RL-based methods with high-fidelity simulation environments remains an open challenge. In this paper, we introduce a multi-task approach for real-time RL-based autonomous drone navigation inside Unreal Engine 5. The idea is simple but effective: using a physically accurate drone model, we systematically increase the complexity of the simulated scenarios, from predefined path following to dynamic visual tracking with advanced sensing modalities, and achieve generalization to multiple dynamically changing tasks. Our method addresses critical gaps in drone autonomy research, such as obstacle avoidance in swarm coordination and robust visual target tracking under varying environmental conditions. Through comprehensive experimental evaluation on pedestrian detection data in an urban scenario, we highlight the essential factors influencing drone learning performance. Our fine-tuned YOLOv8 model improves pedestrian detection recall on synthetic aerial images from 72.41% to 84.71% (a gain of 12.30 percentage points), demonstrating the effectiveness of drone-collected data for domain adaptation. Project page: https://mmlab-cv.github.io/DroneAgents/