Memory

FindingDory: A Benchmark to Evaluate Memory in Embodied Agents

Large vision-language models have recently demonstrated impressive performance in planning and control tasks, driving interest in their application to real-world robotics. However, their deployment for reasoning in embodied scenarios is constrained …

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning

To enable embodied agents to assist and operate effectively over extended timeframes, it is crucial to develop models capable of forming and accessing memories to remain contextualized in an environment. In the current paradigm of training …

Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs

Video Large Language Models (Video-LLMs) excel at understanding videos in-context, assuming full access to the video when answering queries. However, these models face challenges in streaming scenarios where hour-long videos must be processed online, …