Adaptive Policy Switching for Efficient Multi-Robot Coordination Using Reinforcement Learning

Nadour, Mohamed; Cherroun, Lakhmissi; Tibermacine, Imad Eddine; Rabehi, Abdelaziz; Ma'arif, Alfian

Nadour, Mohamed and Cherroun, Lakhmissi and Tibermacine, Imad Eddine and Rabehi, Abdelaziz and Ma'arif, Alfian (2025) Adaptive Policy Switching for Efficient Multi-Robot Coordination Using Reinforcement Learning. International Journal of Robotics and Control Systems, 5 (6). pp. 3350-3375.

Official URL: https://pubs2.ascee.org/index.php/IJRCS/article/vi...

Abstract

Multi-robot systems operating in diverse environments require coordination strategies that balance efficiency and safety. This paper presents an adaptive framework that combines heuristic planning with learning-based control to achieve that balance. The proposed system switches dynamically between a classical heuristic controller and a Q-learning-based policy according to real-time obstacle density, adapting to varying environmental complexity. The framework was evaluated in three representative scenarios of increasing difficulty: a simple case with a single robot and one task in an obstacle-free environment, a moderate case with three robots and five tasks among eight obstacles, and a complex case with five robots and eight tasks amid fifteen obstacles. Performance was analyzed using metrics including task completion time, near-miss frequency, operational efficiency, and energy consumption. Results show that the baseline policy performs best in sparse environments, whereas the reinforcement-learning policy achieves faster completion in dense ones, at the cost of more frequent near-misses caused by its efficiency-driven behavior. The adaptive method reconciles this trade-off, reducing near-misses by 25–40% while maintaining competitive completion times and minimal energy usage. These findings demonstrate that adaptive policy selection provides robust, context-sensitive coordination across heterogeneous environments and can support missions in logistics, exploration, and disaster-response robotics by autonomously optimizing safety and performance according to real-time conditions.
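
The density-based switching described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's density measure, threshold, or switching direction, so everything here is an assumption made for illustration: the names `local_obstacle_density`, `select_policy`, and `q_update`, the sensing radius, the threshold, the choice to use the learned policy above the threshold, and the learning rate and discount factor. The update shown is the standard tabular Q-learning rule, not necessarily the paper's exact formulation.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple

Point = Tuple[float, float]
QTable = Dict[Tuple[Hashable, Hashable], float]


def local_obstacle_density(robot: Point, obstacles: Sequence[Point],
                           radius: float) -> float:
    """Obstacles per unit area inside a disc around the robot.

    Hypothetical density measure; the paper's definition may differ.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    nearby = sum(1 for ob in obstacles if math.dist(robot, ob) <= radius)
    return nearby / (math.pi * radius ** 2)


def select_policy(density: float, threshold: float) -> str:
    """Pick a controller label from the current density.

    Assumption: the learned policy is used at or above the threshold,
    the heuristic controller below it.
    """
    return "q_learning" if density >= threshold else "heuristic"


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Standard tabular Q-learning update; unseen entries default to 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    # Illustrative values only: two of three obstacles lie within radius 2.
    density = local_obstacle_density((0.0, 0.0),
                                     [(1.0, 0.0), (0.0, 1.0), (5.0, 5.0)],
                                     radius=2.0)
    print(select_policy(density, threshold=0.1))
```

In this sketch the switch is a single threshold comparison re-evaluated at every control step. A practical controller would likely add hysteresis or a minimum dwell time so that a robot near the threshold does not flip between policies on consecutive steps.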

Item Type:	Article
Subjects:	T Technology > TK Electrical engineering. Electronics Nuclear engineering
Depositing User:	IJRCS ASCEE
Date Deposited:	29 Apr 2026 12:26
Last Modified:	29 Apr 2026 12:26
URI:	https://alxiv.org/id/eprint/241