MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation

Published on 2024-06-27


AI Summary

This paper introduces MAGIC (Meta-Ability Guided Interactive Chain-of-Distillation), a method that uses knowledge distillation to obtain lightweight student models, addressing the excessive parameter counts and computational demands that keep large models out of robotics. Concretely, it proposes a Meta-Ability Knowledge Distillation (MAKD) framework for decoupling and refining the meta-abilities a VLN agent needs. It further introduces Meta-Knowledge Randomization Weighting (MKRW) and Meta-Knowledge Transferable Determination (MKTD) modules to dynamically adjust aggregation weights at the meta-ability and sample levels, respectively. The paper also proposes an interactive distillation learning strategy that lets students give feedback to teachers, forming a new multi-step teacher-student co-evolution pipeline. Experiments show that MAGIC-S, at only 5% (11M) of its teacher's size, excels on the public leaderboard of the R2R test set, outperforming all previous methods trained on the same data, while MAGIC-L surpasses the previous state of the art in SPL and SR. In addition, the paper presents a new dataset collected and annotated from everyday living environments, on which MAGIC-S demonstrates superior performance and real-time efficiency. The authors have released their code at https://github.com/CrystalSixone/VLN-MAGIC.

[PDF] [Site] [Kimi]

Despite the remarkable developments of recent large models in Embodied Artificial Intelligence (E-AI), their integration into robotics is hampered by their excessive parameter sizes and computational demands. For the Vision-and-Language Navigation (VLN) task, a core task in E-AI, this paper reveals the great potential of using knowledge distillation for obtaining lightweight student models by proposing a Meta-Ability Guided Interactive Chain-of-distillation (MAGIC) method. Specifically, a Meta-Ability Knowledge Distillation (MAKD) framework is proposed for decoupling and refining the necessary meta-abilities of VLN agents. A Meta-Knowledge Randomization Weighting (MKRW) and a Meta-Knowledge Transferable Determination (MKTD) module are incorporated to dynamically adjust aggregation weights at the meta-ability and sample levels, respectively. Moving beyond traditional one-step unidirectional distillation, an Interactive Chain-of-Distillation (ICoD) learning strategy is proposed to allow students to give feedback to teachers, forming a new multi-step teacher-student co-evolution pipeline. Remarkably, on the R2R test unseen public leaderboard, our smallest model, MAGIC-S, with only 5% (11M) of the teacher's size, outperforms all previous methods under the same training data. Additionally, our largest model, MAGIC-L, surpasses the previous state-of-the-art by 5.84% in SPL and 3.18% in SR. Furthermore, a new dataset was collected and annotated from our living environments, where MAGIC-S demonstrated superior performance and real-time efficiency. Our code is publicly available at https://github.com/CrystalSixone/VLN-MAGIC.
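To make the distillation idea concrete, below is a minimal PyTorch sketch of meta-ability guided distillation: per-ability student heads are distilled against teacher heads, and the per-ability losses are aggregated with dynamically sampled weights. The toy agent, the three ability names, and the Dirichlet-sampled weights (a simple stand-in for MKRW-style dynamic weighting) are all illustrative assumptions, not the paper's released implementation.

```python
"""Minimal sketch of meta-ability guided distillation.

Assumptions (not from the paper's code): a toy linear agent, three
illustrative meta-abilities, and Dirichlet-sampled aggregation weights
standing in for the MKRW module.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F

ABILITIES = ["landmark_grounding", "direction_reasoning", "progress_estimation"]  # illustrative

class ToyAgent(nn.Module):
    """Shared backbone with one output head per meta-ability."""
    def __init__(self, dim=32, n_actions=6, hidden=64):
        super().__init__()
        self.backbone = nn.Linear(dim, hidden)
        self.heads = nn.ModuleDict({a: nn.Linear(hidden, n_actions) for a in ABILITIES})

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        return {a: head(h) for a, head in self.heads.items()}

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Standard KL-based distillation loss between temperature-softened distributions."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * T * T

def sample_ability_weights(concentration=1.0):
    """Random positive weights summing to 1, varying per step (MKRW stand-in)."""
    w = torch.distributions.Dirichlet(torch.full((len(ABILITIES),), concentration)).sample()
    return dict(zip(ABILITIES, w))

teacher = ToyAgent(hidden=128)   # larger teacher
student = ToyAgent(hidden=64)    # smaller student
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(8, 32)       # stand-in for fused vision-language features
    with torch.no_grad():
        t_outs = teacher(x)
    s_outs = student(x)
    weights = sample_ability_weights()
    loss = sum(w * kd_loss(s_outs[a], t_outs[a]) for a, w in weights.items())
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the full ICoD strategy described in the abstract, a step like this would be repeated over multiple rounds, with the student's feedback updating the teacher so that both models co-evolve rather than distilling once in a single direction.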

Last updated on 2024-08-02