Guiding Video Prediction with Explicit Procedural Knowledge

Published on 2024-06-27


[PDF] [Site] [Kimi]

We propose a general way to integrate procedural knowledge of a domain into deep learning models. We apply it to the case of video prediction, building on object-centric deep models, and show that this leads to better performance than using data-driven models alone. We develop an architecture that facilitates latent space disentanglement in order to use the integrated procedural knowledge, and establish a setup that allows the model to learn the procedural interface in the latent space through the downstream task of video prediction. We contrast the performance with a state-of-the-art data-driven approach and show that problems where purely data-driven approaches struggle can be handled by using knowledge about the domain, providing an alternative to simply collecting more data.
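The abstract's core idea can be illustrated with a small sketch. This is not the paper's actual code; it assumes a hypothetical latent layout in which an interpretable slice (object position and velocity) is updated by a hand-written procedural rule, while the remaining residual dimensions are left to a learned, data-driven component (stubbed out here):

```python
import numpy as np

def procedural_step(pos, vel, dt=1.0):
    """Explicit domain knowledge: constant-velocity motion."""
    return pos + vel * dt, vel

def learned_residual_step(residual):
    """Stand-in for the data-driven part of the predictor;
    a trained network would go here."""
    return residual

def predict_next_latent(latent, dt=1.0):
    # Assumed disentangled layout: [x, y, vx, vy, residual...]
    pos, vel, residual = latent[:2], latent[2:4], latent[4:]
    pos, vel = procedural_step(pos, vel, dt)
    return np.concatenate([pos, vel, learned_residual_step(residual)])

z = np.array([0.0, 0.0, 1.0, -0.5, 0.3, 0.7])
z_next = predict_next_latent(z)
print(z_next[:2])  # position advanced by the procedural rule
```

Disentanglement matters here: the procedural rule can only be applied if the model learns to route position and velocity into the dimensions the rule expects, which is what the paper's training setup encourages via the video-prediction objective.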

Last updated on 2024-08-02