Abstract: This paper proposes a reinforcement learning-based method for multi-UAV formation flight control. The approach combines curriculum learning, the leader-follower model, and proximal policy optimization (PPO). Following the curriculum learning idea, the complex formation control task is decomposed into two learning stages. In the first stage, the leader UAV is trained with PPO to fly along a preset trajectory. In the second stage, the follower UAVs are trained while the leader's control policy is fixed to the neural network obtained in the first stage, and the delayed position of the leader serves as the tracking target for the followers. The reward function is designed from partially observable information to guide the UAVs to maintain a stable linear formation during flight. To validate the method, a four-UAV formation flying a three-dimensional figure-eight pattern was simulated in Unity. The results show that, compared with traditional control methods, the approach lets agents learn effective policies through interaction with the environment using a relatively simple training process, without requiring a precise mathematical model. This reduces the complexity of formation control design and offers a new solution for UAV formation control.
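The delayed-position tracking idea in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the class name `DelayedLeaderTarget`, the delay length, the `figure_eight` parametrization, and the exponential form of `tracking_reward`. The paper's actual observation space and reward terms are not given in the abstract.

```python
import math
from collections import deque


class DelayedLeaderTarget:
    """Returns the leader position recorded `delay_steps` steps earlier.

    Hypothetical helper: the abstract only states that the leader's delayed
    position is the follower's tracking target, not how it is stored.
    """

    def __init__(self, delay_steps):
        if delay_steps < 1:
            raise ValueError("delay_steps must be at least 1")
        # Holds the current position plus the previous `delay_steps` positions.
        self._buffer = deque(maxlen=delay_steps + 1)

    def update(self, leader_pos):
        """Record the current leader position and return the delayed one."""
        self._buffer.append(tuple(leader_pos))
        # Until the buffer is full, the oldest recorded position is returned.
        return self._buffer[0]


def figure_eight(t, radius=10.0, height=5.0, climb=1.0):
    """Assumed 3D figure-eight reference path (not the paper's trajectory)."""
    x = radius * math.sin(t)
    y = radius * math.sin(t) * math.cos(t)
    z = height + climb * math.sin(t)
    return (x, y, z)


def tracking_reward(follower_pos, target_pos, scale=1.0):
    """Assumed reward shape: 1.0 at the target, decaying with distance."""
    distance = math.dist(follower_pos, target_pos)
    return math.exp(-scale * distance)


def rollout_targets(num_steps=50, dt=0.1, delay_steps=5):
    """Generate (leader position, follower target) pairs along the path."""
    tracker = DelayedLeaderTarget(delay_steps)
    pairs = []
    for step in range(num_steps):
        leader_pos = figure_eight(step * dt)
        pairs.append((leader_pos, tracker.update(leader_pos)))
    return pairs
```

In this sketch a second follower could reuse the same helper with the first follower's position as input, so each UAV trails the one ahead of it by a fixed delay. Whether the paper chains followers this way or gives each follower a different delay on the leader is not stated in the abstract.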
Keywords: multi-agent reinforcement learning; formation control; proximal policy optimization; curriculum learning; leader-follower model; UAV control
Article ID: 20250202 CLC number: TP273 Document code:
Funding: National Natural Science Foundation of China (62225308, 62003188)
Cite this article:
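吴婷 (Wu Ting), 叶林奇 (Ye Linqi), 杨君 (Yang Jun), 芦维宁 (Lu Weining). 基于强化学习的多无人机避障编队控制 (Reinforcement learning-based obstacle-avoidance formation control for multiple UAVs) [J]. 飞控与探测, 2025, 8(2): 09-17.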