A Decentralized Multi-Agent Reinforcement Learning Framework for Cooperative UAV Search: Navigating Non-Stationarity via Reward Shaping

DOI:

https://doi.org/10.63944/e1p7kr68

Keywords:

Multi-Agent Reinforcement Learning, Cooperative Search, Decentralized Partially Observable Markov Decision Process, Reward Shaping, Non-Stationarity

Abstract

Because of the complex coupling between high-dimensional state spaces and the stringent constraints of local perception, collaborative search by multi-UAV swarms in unknown environments poses a formidable challenge in the field of autonomous systems. Although early algorithmic attempts often assumed stationary targets or perfect communication networks, the inherent non-stationarity of real-world dynamic environments renders traditional independent learning paradigms highly inefficient, sometimes even leading to complete divergence during training. To overcome these bottlenecks, this paper explores a Decentralized Partially Observable Markov Decision Process framework and introduces specific Multi-Agent Reinforcement Learning methods under a Centralized Training with Decentralized Execution architecture. Fully validating and elucidating the resulting emergent collaborative behaviors will require extensive verification in real-world physical environments, as well as further research specifically addressing communication-constrained settings.

Published

2026-04-02

Section

Research Articles

How to Cite

A Decentralized Multi-Agent Reinforcement Learning Framework for Cooperative UAV Search: Navigating Non-Stationarity via Reward Shaping. (2026). International Journal of Computer Science and Engineering, 1(03), 101-109. https://doi.org/10.63944/e1p7kr68