Multi-Agent Reinforcement Learning for Joint User Association and Resource Allocation in Heterogeneous Cellular Networks

Pothula Pavan Kumar Reddy; C. Kamalanathan

doi:10.51983/ijiss-2026.16.2.23

Authors

Pothula Pavan Kumar Reddy
Dr. C. Kamalanathan

DOI:

https://doi.org/10.51983/ijiss-2026.16.2.23

Keywords:

Multi-Agent Reinforcement Learning (MARL), User Association (UA), Resource Allocation (RA), Energy Efficiency, Heterogeneous Cellular Networks (HetNets), Reinforcement Learning (RL), Throughput, Fairness

Abstract

This paper proposes MARL-UA-RA (Multi-Agent Reinforcement Learning for User Association and Resource Allocation), a model that addresses the challenges of heterogeneous cellular networks (HetNets) by using Multi-Agent Reinforcement Learning (MARL) to optimize resource allocation and improve energy efficiency. The framework models each base station as an autonomous cooperative agent that observes detailed local state information, including user signal quality metrics, neighbor cell information, and temporal network features; the agents jointly optimize a global network objective through a monotonic QMIX mixing network. Each agent uses a dual-headed Q-network with separate association and resource allocation heads, which allows joint per-user handover and resource-level decisions under an ε-greedy exploration policy with experience replay. User association enforces a handover margin of 2.5 dB and a maximum cell load of 0.85, while resource allocation dynamically adjusts user throughput and spectral efficiency using a proportional resource fraction approach. A composite reward function that combines spectral efficiency, signal strength, load balancing, handover cost, joint SINR capacity, and global load fairness drives the cooperative learning process. In a comparison against D3QN, DQN, Q-Learning, Genetic Algorithm, and MRSP-based baselines, the proposed MARL-UA-RA-QMIX achieves the highest Average System Capacity (75.70) and Average Network Utility (34.39), together with a near-optimal fairness index of 0.99 and the lowest load imbalance (0.5132), indicating that it jointly optimizes network capacity, spectral efficiency, energy efficiency, and fairness in dynamic heterogeneous cellular networks.
The main benefits of the proposed model are its scalability to large networks, its improved energy efficiency, and its ability to handle dynamic network conditions, which make it well suited to real-time use in 5G and other applications.
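
The user association constraints stated in the abstract (a handover margin of 2.5 dB and a maximum cell load of 0.85) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `choose_cell`, the dictionary inputs, the strict comparison against the load cap, and the tie-breaking by strongest signal are all assumptions made for illustration. In the proposed model the association decision comes from a learned Q-network head; the sketch shows only how the two constraints might filter candidate cells.

```python
# Constants taken from the abstract.
HANDOVER_MARGIN_DB = 2.5
MAX_CELL_LOAD = 0.85


def choose_cell(serving, signal_db, load):
    """Pick a cell for one user under the margin and load constraints.

    serving   -- id of the current serving cell (hypothetical input)
    signal_db -- dict mapping cell id to signal quality in dB
    load      -- dict mapping cell id to load fraction in [0, 1]
    """
    # A candidate must beat the serving cell by at least the handover margin.
    threshold = signal_db[serving] + HANDOVER_MARGIN_DB
    eligible = [
        cell
        for cell in signal_db
        if cell != serving
        and load[cell] < MAX_CELL_LOAD      # assumed: strictly below the cap
        and signal_db[cell] >= threshold
    ]
    if not eligible:
        return serving                      # no handover
    # Assumed tie-break: strongest eligible candidate wins.
    return max(eligible, key=lambda cell: signal_db[cell])


if __name__ == "__main__":
    signal = {"macro": -95.0, "pico1": -92.0, "pico2": -90.0}
    load = {"macro": 0.60, "pico1": 0.50, "pico2": 0.90}
    print(choose_cell("macro", signal, load))
```

In this example `pico2` has the strongest signal but is excluded because its load exceeds 0.85, so the user would move to `pico1`, which clears the 2.5 dB margin over the serving macro cell.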