Attention to Action: Leveraging Attention for Object Navigation [conference paper]


The 32nd British Machine Vision Conference (BMVC) - November 22-25, 2021


Shi Chen (Ph.D. student), Qi Zhao (associate professor)


Navigation towards different objects is prevalent in daily life. State-of-the-art embodied vision methods accomplish the task by implicitly learning the relationship between perception and action or by optimizing them with separate objectives. While effective in some cases, they have not yet developed (1) a tight integration of perception and action, and (2) the capability to address the visual variance that is significant in the moving and embodied setting. To close these research gaps, we introduce a new attention mechanism, which represents the pursuit of visual information that highlights the potential directions of final targets. Instead of working conventionally as a weighted map for aggregating visual features, the new attention is defined as a compact intermediate state connecting visual observations and action. It is explicitly coupled with action to enable joint optimization through a consistent action space, and it also plays an important role in learning features that are more robust against visual variance. Our experiments show significant improvements in navigation across various types of unseen environments with known and unknown semantics. Ablation analyses indicate that the proposed method correlates attention patterns with the directions of action, and overcomes visual variance by distilling useful information from visual observations into the attention distribution.

Link to full paper


computer vision