Abstract
Stratospheric balloons offer a unique and cost-effective platform for a wide range of use cases: for example, deploying networks in areas that lack infrastructure, Earth observation, and atmospheric analysis. In particular, high-altitude balloons hold an advantage over conventional unmanned aerial vehicles due to their extended flight times. One key navigation strategy that supports these use cases is station-keeping, which involves maintaining the balloon within a desired region. The widespread adoption of balloon-based systems is hindered by several challenges, particularly the high cost of commonly used stratospheric balloons. To democratise access to this technology, it is essential to improve the viability of low-cost alternatives.
Reinforcement learning controllers, commonly used for station-keeping, are computationally expensive to train, often requiring weeks of simulation time. Longer training times also limit the ability to rapidly re-train in response to updated weather forecasts, restricting generalisation. This is addressed through a difficulty measure based on a Lagrangian drift model, which is used to approximate the value function and to filter seeded environments, and is shown to reduce training time. Furthermore, in comparison to current forecast scores, the proposed difficulty measure exhibits an approximately monotonic relationship with task difficulty, making it a more reliable difficulty indicator.
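The filtering idea described above can be illustrated with a minimal sketch. The functions, wind model, and threshold below are all hypothetical stand-ins, not the thesis's actual implementation: a passive Lagrangian drifter is released at the target centre, the fraction of time it spends outside the region serves as a difficulty proxy, and seeded environments above a difficulty threshold are discarded before training.

```python
import math
import random

def drift_difficulty(wind, steps=100, dt=60.0, radius=20_000.0):
    """Difficulty proxy (illustrative): advect a passive drifter from the
    target centre under the wind field and return the fraction of time it
    spends outside the target radius. Higher means harder to station-keep."""
    x = y = 0.0
    inside = 0
    for t in range(steps):
        u, v = wind(x, y, t)        # wind velocity (m/s) at the drifter
        x += u * dt
        y += v * dt
        if math.hypot(x, y) <= radius:
            inside += 1
    return 1.0 - inside / steps

def filter_seeds(seeds, make_wind, max_difficulty=0.8):
    """Keep only seeded environments whose approximate difficulty is below
    a threshold, so training avoids near-impossible wind fields."""
    return [s for s in seeds if drift_difficulty(make_wind(s)) <= max_difficulty]

# Toy wind model: each seed yields a constant wind (purely for illustration).
def make_wind(seed):
    rng = random.Random(seed)
    u0, v0 = rng.uniform(-15, 15), rng.uniform(-15, 15)
    return lambda x, y, t: (u0, v0)

easy_seeds = filter_seeds(range(50), make_wind)
```

Because the drift simulation is far cheaper than a full controller rollout, the score can be evaluated for every candidate seed before any training begins.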
Station-keeping for low-cost alternatives, such as latex balloons, remains a challenge, partly due to their limited resources and complex dynamics without a constrained volume. We build on this research and train a reinforcement learning controller to show that these balloons can perform comparably to current state-of-the-art super-pressure balloons while minimising venting and ballasting actions. We frame the task as a continuous control problem, incorporate transparency into the action space, and finally add constraints to the action space to prevent exploitation. Furthermore, reward shaping is investigated by proposing a reward function that biases towards the centre, preventing reward hacking that arises from the target geometry: flying around the target accrues a higher reward than flying through it, yielding policies that perform worse in unseen wind fields.
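The centre-bias idea can be sketched as follows. This is an assumed, simplified form of such a reward, not the function proposed in the thesis: a flat in-region reward lets a policy orbit just inside the boundary, whereas adding a term that grows towards the centre removes that incentive.

```python
import math

def shaped_reward(x, y, radius=50_000.0, bias=0.3):
    """Illustrative station-keeping reward with a centre bias.
    Inside the region the reward rises linearly towards the centre,
    so orbiting the boundary is strictly worse than holding the centre."""
    d = math.hypot(x, y)            # distance from the target centre
    if d > radius:
        return 0.0                  # no reward outside the target region
    return (1.0 - bias) + bias * (1.0 - d / radius)
```

With `bias = 0`, the function collapses to the flat in-region reward that invites the boundary-orbiting exploit; any positive bias makes the gradient point at the centre.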
Finally, international regulators require balloon operators to submit a pre-flight notification to air traffic services days in advance. Although temporal factors such as seasonality have been well studied, spatial variability in launch position remains unexplored. It is shown that incorporating aleatoric uncertainty through ensemble wind forecasts improves the selection of optimal launch positions over forecast lead times from 24 up to 72 hours.
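One way to use an ensemble forecast for launch selection is to average a station-keeping proxy over the ensemble members; the sketch below assumes hypothetical `proxy` and candidate data and is not the thesis's method.

```python
import statistics

def launch_score(candidate, ensemble_members, proxy):
    """Expected performance of a launch position under forecast uncertainty:
    average a station-keeping proxy over all ensemble members."""
    return statistics.mean(proxy(candidate, m) for m in ensemble_members)

def best_launch(candidates, ensemble_members, proxy):
    """Pick the candidate launch position with the highest expected score."""
    return max(candidates, key=lambda c: launch_score(c, ensemble_members, proxy))

# Toy example: members are drift offsets; a good launch cancels the drift.
members = [(-3.0, 1.0), (-2.0, 0.5), (-4.0, 1.5)]
proxy = lambda c, m: -abs(c[0] + m[0]) - abs(c[1] + m[1])
chosen = best_launch([(0, 0), (3, -1), (5, 0)], members, proxy)
```

Averaging over members, rather than scoring a single deterministic forecast, is what folds the aleatoric weather uncertainty into the decision.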
| Date of Award | 10 Dec 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Wenbin Li (Supervisor), Alan Hunter (Supervisor) & Özgür Şimşek (Supervisor) |