To enable embodied agents to assist and operate effectively over extended timeframes, it is crucial to develop models capable of forming and accessing memories to remain contextualized in an environment. In the current paradigm of training …
Fine-tuning vision-language foundation models has emerged as a powerful approach to leveraging internet-scale data for generalization in downstream applications. A particularly promising source of representations already used in supervised learning …
This paper introduces a novel method for enhancing the effectiveness of the Asynchronous Advantage Actor-Critic (A3C) algorithm by incorporating state-aware exploration. We achieve this improvement through three simple yet impactful modifications (1) …