Past Events

Dissertation Title: Bridging Visual Perception and Reasoning: A Visual Attention Perspective

Doctoral Candidate: Shi Chen

Faculty Advisor: Dr. Catherine Qi Zhao

Defense Date and Time: Wednesday, May 24, 10 AM - 12 PM


One of the fundamental goals of Artificial Intelligence (AI) is to develop visual systems that can reason about the complexity of the world. Advances in machine learning have revolutionized many areas of computer vision, achieving human-level performance on several benchmark tasks and industrial applications. While the performance gap between machines and humans appears to be closing, recent debates on the discrepancies between machine and human intelligence have also received considerable attention. These contradictory observations go to the heart of AI research and raise a central question: How can AI systems understand the full range of visual concepts and reason with them to accomplish various real-life tasks, as we do every day?

Humans learn much from little. With just a few relevant experiences, we are able to adapt to different situations. We also take advantage of inductive biases that generalize easily, and we avoid being distracted by statistical biases of all kinds. This innate generalizability results not only from our profound understanding of the world but also from the ways we perceive and reason with visual information. For instance, unlike machines, which develop a holistic understanding by scanning the whole visual scene, humans prioritize their attention through a sequence of eye fixations. Guided by visual stimuli and a structured reasoning process, we progressively locate regions of interest and understand their semantic relationships as well as their connections to the overall task. Research on human visual behavior can provide abundant insights into the development of vision models and has the potential to contribute to AI systems that are practical for real-world scenarios.

With the overarching goal of building visual systems with human-like reasoning capability, we focus on understanding and enhancing the integration of visual perception and reasoning. We leverage visual attention as an interface for studying how humans and machines prioritize their focus when performing visual reasoning tasks, and we shed light on two important research questions: What role does attention play in decision-making? How do we characterize attention in different scenarios?

We provide insights into these questions from three distinct perspectives: (1) From the visual perception perspective, we study how humans and machines allocate their attention when interacting with a variety of visual environments. We investigate the fine-grained characteristics of attention, which reveal the significance of different visual concepts and how they contribute to perception. (2) From the reasoning perspective, we examine the connections between reasoning and visual perception, and develop vision models that make decisions in ways that agree with human reasoning procedures. (3) Humans not only capture and reason about important information with high accuracy but can also justify their rationales with supporting evidence. From this explainability perspective, we study the role of explainability in human-like intelligence and build generalizable and interpretable models. Our efforts provide an extensive collection of observations for demystifying the relationships between perception and reasoning, and offer insights into the development of trustworthy AI systems.

MnRI Master In Robotics Town Hall - Fall 2023 Admitted Students

Please join MnRI Director Nikos Papanikolopoulos at this event. Topics include course registration and preparing to start the program in Fall 2023.


MnRI in conjunction with OVPR: Opportunity to present your research to the Office of the Undersecretary of Defense


In conjunction with OVPR (following the visit of the VP for Research to MnRI), we are starting a series of short events that will highlight your work to various funding agencies. The first event is on Monday, May 15, 2023, 9 am – 10 am (Central Time - Chicago) with Dr. Kimberly Sablon in the Office of the Undersecretary of Defense. Her interests are in trustworthy AI and autonomy.

This needs to be a focused and technically rich discussion. Start with a 10-minute overview and then have 2-3 faculty from your team provide 7-10 minute presentations of their research.

We will send the presentation material to the DOD in advance. The meeting will be audio-only rather than video, as many rooms in the Pentagon do not have cameras. If you are interested, please send us your material by April 25 so the VP's office can select the material to send to the DOD.

I suggest preparing 5-7 slides that capture your current work, some promising results, and plans. Keep it short since we will try to fit 3-4 of you in the presentation. 

Does New York City's Community Preference Policy Violate the Fair Housing Act?

Abstract: Many constraints dictate the allocation of New York City's affordable housing. Each unit is reserved for households of a particular size and income level, and preferences are given to certain groups (people with disabilities, community residents, and municipal employees) for a fraction of the units in each building. We demonstrate that these policies combine to make it difficult for low-income applicants to win lotteries for buildings outside their own neighborhoods. This finding is relevant to an ongoing lawsuit challenging the city's policy of favoring community residents. 

Bio: Nick Arnosti is an Assistant Professor in the Department of Industrial and Systems Engineering at the University of Minnesota. His research focuses on giving away social goods such as affordable housing, public school seats, visas, and scarce medical supplies. He has also studied the allocation of hunting licenses, hiking permits, and discounted event tickets. Previously, he was an Assistant Professor at Columbia Business School. He received a Ph.D. in Operations Research from Stanford University in 2016.


Deep Learning for Robot Perception and Manipulation Lect. 3

Guest Lecture 3:
Who: Sai Vemprala, Senior Researcher, Microsoft, Seattle
When: Apr 25, 2023, 2:30 pm Central Time
Notable works: ChatGPT for Robotics, LATTe, PACT

MnRI Research Town Hall

Please join MnRI Director Nikos Papanikolopoulos at this event to discuss research opportunities and how teams can be formed.

Deep Learning for Robot Perception and Manipulation Lect. 2

Guest Lecture 2:
Who: Mohit Shridhar, Ph.D. Student, University of Washington, Seattle
When: Mar 23, 2023, 2:30 pm Central Time
Notable works: Perceiver-Actor, CLIPort, ALFRED

Deep Learning for Robot Perception and Manipulation Lect. 1

Guest Lecture 1:
Who: Yu Xiang, Assistant Professor, University of Texas at Dallas
When: Mar 16, 2023, 2:30 pm Central Time
Notable works: PoseCNN, DeepIM, LatentFusion

MnRI Master In Robotics Program Open House

Prospective students are invited to learn about the Master of Science in Robotics (MSR) program offered by the Minnesota Robotics Institute and to get all the information they need about it.

  • Explore the MSR program and how it can help you achieve your goals.
  • Learn how to apply and what you can do now to get the most out of the program. 
  • Enjoy light refreshments and interact with the director, advisor, and students.

Video Super-resolution for Low Bitrate Streams

Abstract: This talk presents a novel model for the joint problems of video super-resolution, compression-artifact removal, and overall enhancement of low-bitrate video streams. The model combines Generative Adversarial Networks with Dynamic Upsampling Filters and uses a novel progressive training strategy based on perceptual metrics. We derive real-time models for the cloud and evaluate them on publicly available high-resolution datasets.

Bio: Roland Miezianko is a Senior Applied Scientist and Technical Lead at Amazon Grand Challenge, working on computer vision and machine learning models. His projects include the Amazon Glow communication device; Amazon Comprehend Medical, an NLP service for extracting entities from medical documents; Amazon Care healthcare telepresence; medical knowledge graphs and chatbots; and low-level computer vision detection using infrared (IR) and electro-optical (EO) sensors, with model deployment on edge devices and in the cloud. Roland received his Ph.D. from Temple University, where he focused on artificial intelligence software systems and video-based analytics. He received a Master’s degree in CIS from La Salle University and a BS in Electrical Engineering from Boston University.