Dissertation Title: Bridging Visual Perception and Reasoning: A Visual Attention Perspective

Doctoral Candidate: Shi Chen

Faculty Advisor: Dr. Catherine Qi Zhao

Defense Date and Time: Wednesday, May 24, 2023, 10 AM - 12 PM

Abstract:

One of the fundamental goals of Artificial Intelligence (AI) is to develop visual systems that can reason about the complexity of the world. Advances in machine learning have revolutionized many fields in computer vision, achieving human-level performance on several benchmark tasks and in industrial applications. While the performance gap between machines and humans seems to be closing, recent debates on the discrepancies between machine and human intelligence have also received considerable attention. These contradictory observations strike at the very heart of AI research, and raise the question: How can AI systems understand the comprehensive range of visual concepts and reason with them to accomplish various real-life tasks, as we do on a daily basis?

Humans learn much from little. With just a few relevant experiences, we are able to adapt to different situations. We also take advantage of inductive biases that generalize easily, and avoid distraction from statistical biases of all kinds. This innate generalizability is a result of not only our profound understanding of the world but also the ways we perceive and reason with visual information. For instance, unlike machines that develop holistic understanding by scanning the whole visual scene, humans prioritize their attention with a sequence of eye fixations. Guided by visual stimuli and a structured reasoning process, we progressively locate regions of interest and understand their semantic relationships as well as their connections to the overall task. Research on humans' visual behavior can provide abundant insights into the development of vision models, and has the potential to contribute to AI systems that are practical for real-world scenarios.

With the overarching goal of building visual systems with human-like reasoning capability, we focus on understanding and enhancing the integration of visual perception and reasoning. We leverage visual attention as an interface for studying how humans and machines prioritize their focus when performing visual reasoning tasks, and shed light on two important research questions: What roles does attention play in decision-making? How do we characterize attention in different scenarios?

We provide insights into these questions by making progress from three distinct perspectives: (1) From the visual perception perspective, we study how humans and machines allocate their attention when interacting with a variety of visual environments. We investigate the fine-grained characteristics of attention, revealing the significance of different visual concepts and how they contribute to perception. (2) From the reasoning perspective, we focus on the connections between reasoning and visual perception, and develop vision models that make decisions in ways that agree with humans' reasoning procedures. (3) From the explainability perspective, we observe that humans not only capture and reason about important information with high accuracy, but can also justify their rationales with supporting evidence. We study the impact of explainability on human-like intelligence and build generalizable and interpretable models. Our efforts provide an extensive collection of observations for demystifying the relationships between perception and reasoning, and offer insights into the development of trustworthy AI systems.

