A study of wireless communications with reinforcement learning
Time: Tue 2022-06-14 14.00
Location: F3, Lindstedtsvägen 26 & 28, Stockholm
Subject area: Electrical Engineering
Doctoral student: Wanlu Lei, Teknisk informationsvetenskap
Opponent: Professor Geoffrey Ye Li, Imperial College, London
Supervisor: Ming Xiao, Teknisk informationsvetenskap; Chenguang Lu; Mikael Skoglund, Teknisk informationsvetenskap
The explosive proliferation of mobile users and wireless data traffic in recent years poses imminent challenges for wireless system design. The trend toward more complicated, decentralized and intelligent wireless communications is inevitable. Many key issues in this field are decision-making problems, such as resource allocation, transmission control, and intelligent beam tracking in millimeter wave (mmWave) systems. Reinforcement learning (RL) was once a languishing field of AI for solving various sequential decision-making problems. However, it was revived in the late 80s and early 90s when it was connected to dynamic programming (DP). More recently, RL has progressed in many applications, especially when the underlying models do not have explicit mathematical solutions and simulations must be used. For instance, the success of RL in AlphaGo and AlphaZero has motivated much recent research in RL in both academia and industry. Moreover, since computational power has increased dramatically within the last decade, simulation and online learning (planning) methods have become feasible for implementing and deploying RL. Despite its potential, the application of RL to wireless communications is still far from mature. Therefore, it is of great interest to investigate RL-based methods and algorithms that adapt to different wireless communication scenarios. More specifically, the work in this thesis on RL in wireless communications can be roughly divided into the following parts:

In the first part of the thesis, we develop a framework based on deep RL (DRL) to solve the spectrum allocation problem in the emerging integrated access and backhaul (IAB) architecture with large-scale deployment and a dynamic environment. We propose to use the latest DRL method by integrating an actor-critic spectrum allocation (ACSA) scheme and a deep neural network (DNN) to achieve real-time spectrum allocation in different scenarios.
The proposed methods are evaluated through numerical simulations and show promising results compared with some baseline allocation policies.

In the second part of the thesis, we investigate decentralized RL algorithms using the alternating direction method of multipliers (ADMM) in edge IoT applications. For RL in a decentralized setup, edge nodes (agents) connected through a communication network aim to work collaboratively to find a policy that optimizes the global reward, defined as the sum of local rewards. However, communication costs, scalability and adaptation in complex environments with heterogeneous agents may significantly limit the performance of decentralized RL. ADMM has a structure that allows for decentralized implementation and has shown faster convergence than gradient-descent-based methods. Therefore, we propose an adaptive stochastic incremental ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IoT networks. We provide convergence properties for the proposed algorithms by designing a Lyapunov function and prove that asI-ADMM has an O(1/k) + O(1/M) convergence rate, where k and M are the number of iterations and batch samples, respectively.

The third part of the thesis considers the problem of joint beam training and data transmission control for delay-sensitive communications over mmWave channels. We formulate the problem as a constrained Markov decision process (MDP), which aims to minimize the cumulative energy consumption over the whole considered period of time under delay constraints. By introducing a Lagrange multiplier, we reformulate the constrained MDP as an unconstrained one. Then, we solve it using the parallel-rollout-based RL method in a data-driven manner.
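As a rough sketch of the Lagrangian reformulation mentioned here, the constrained problem and its relaxation might be written as follows. The notation is an illustrative assumption and not taken from the thesis: e(s_t, a_t) is the per-step energy cost, d(s_t, a_t) the per-step delay cost, T the horizon, D_max the delay budget, and the constraint is assumed to be on the expected cumulative delay.

```latex
% Assumed notation (not from the thesis): policy \pi, horizon T,
% energy cost e, delay cost d, delay budget D_{\max}.

% Constrained MDP: minimize expected cumulative energy under a delay constraint.
\[
\begin{aligned}
\min_{\pi}\quad & \mathbb{E}_{\pi}\Big[\sum_{t=0}^{T-1} e(s_t, a_t)\Big] \\
\text{s.t.}\quad & \mathbb{E}_{\pi}\Big[\sum_{t=0}^{T-1} d(s_t, a_t)\Big] \le D_{\max}
\end{aligned}
\]

% Lagrangian relaxation with multiplier \lambda \ge 0.
\[
L(\pi, \lambda) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{T-1}
  \big( e(s_t, a_t) + \lambda\, d(s_t, a_t) \big)\Big] - \lambda D_{\max}
\]
```

Under these assumptions, for a fixed multiplier the term with D_max is constant, so minimizing over the policy reduces to an ordinary unconstrained MDP with per-step cost e + λd, which is the kind of problem a rollout-based method can then address.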
Our numerical results demonstrate that the optimized policy obtained from parallel rollout significantly outperforms other baseline policies in both energy consumption and delay performance.

The final part of the thesis is a further study of the beam tracking problem using a supervised learning approach. Due to computation and delay limitations in real deployments, a lightweight algorithm is desired for the beam tracking problem in mmWave networks. We formulate the beam tracking (beam sweeping) problem as a binary classification problem and investigate supervised learning methods for the solution. The methods are tested both in simulation scenarios, i.e., a ray-tracing model, and on real test data from the Ericsson over-the-air (OTA) dataset. The results show that the proposed methods can significantly improve cell capacity and reduce overhead consumption when the number of UEs in the network increases.