Autonomous Control of Simulated Fixed Wing Aircraft using Deep Reinforcement Learning

Gordon Rennie

Research output: Book/Report › Other report



Autonomous control of aircraft is a challenging high-dimensional continuous control problem with applications in Unmanned Aerial Vehicles, autopilot systems and flight simulators. The problem domain appears well-suited to reinforcement learning (RL), a machine learning subfield in which agents learn by interacting with an environment. Recent advances in the application of deep neural networks to RL have allowed agents to perform well in increasingly complex tasks, including continuous control tasks.

The problem of heading and altitude control of fixed-wing aircraft is formulated in the RL framework as a Markov decision process. A new software package implementing flight control environments is developed by integrating the JSBSim flight dynamics model. The resulting software package, Gym-JSBSim, provides configurable and fast flight control environments with 3D visualisation. Gym-JSBSim conforms to the OpenAI Gym interface and is published under an open source license.
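The OpenAI Gym interface that Gym-JSBSim conforms to centres on `reset` and `step` methods that return observations and rewards to the agent. A minimal sketch of an environment in that style is shown below; the class name, toy dynamics and reward shaping are illustrative stand-ins, not the actual Gym-JSBSim implementation.

```python
# Minimal sketch of an OpenAI Gym-style environment interface,
# as conformed to by packages such as Gym-JSBSim. The dynamics
# here are a toy placeholder, not the JSBSim flight dynamics model.
class AltitudeHoldEnv:
    def __init__(self, target_altitude=5000.0):
        self.target_altitude = target_altitude  # illustrative units (ft)
        self.altitude = 0.0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.altitude = self.target_altitude - 500.0
        self.steps = 0
        return self._observation()

    def step(self, action):
        """Apply a continuous action (e.g. an elevator command in [-1, 1])
        and return (observation, reward, done, info), per the Gym API."""
        self.altitude += 50.0 * float(action)  # toy dynamics
        self.steps += 1
        reward = -abs(self.altitude - self.target_altitude)
        done = self.steps >= 200
        return self._observation(), reward, done, {}

    def _observation(self):
        return [self.altitude, self.target_altitude - self.altitude]


# Typical agent-environment interaction loop:
env = AltitudeHoldEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = 0.5  # a real agent would sample this from its policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

Because any environment exposing this interface can be driven by the same agent code, RL algorithms can be evaluated on flight control tasks without being coupled to the simulator.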

A series of experiments evaluating the performance of deep RL agents using the proximal policy optimisation (PPO) algorithm is then conducted. The results demonstrate that the agents are able to learn effective control policies for maintaining a target altitude and heading by directly adjusting control surface positions with continuous actions. Agents perform less well in a more complex flight environment which requires the aircraft to be turned, and further work concentrating on improved action exploration is identified.
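The proximal policy optimisation algorithm used in these experiments maximises a clipped surrogate objective that discourages large policy updates. A minimal NumPy sketch of that objective (batch shapes and the default clipping range are assumptions; this is not the thesis's training code) is:

```python
import numpy as np

def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).

    ratios:      pi_new(a|s) / pi_old(a|s) for a batch of actions
    advantages:  advantage estimates for the same batch
    epsilon:     clipping range (0.2 is a commonly used default)
    """
    ratios = np.asarray(ratios, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # The elementwise minimum makes the objective pessimistic: a policy
    # change beyond the clipping range earns no extra credit, which
    # stabilises learning on continuous control tasks.
    return np.minimum(unclipped, clipped).mean()
```

For example, a probability ratio of 2.0 with a positive advantage of 1.0 is clipped to contribute only 1.2, so the gradient gives the agent no incentive to push the policy further in that direction within the update.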
Original language: English
Number of pages: 88
Publication status: Published - Sept 2018

Publication series

Name: Department of Computer Science Technical Report Series
ISSN (Electronic): 1740-9497


